Monday, November 12, 2018

RatioKey project posted to GitHub

I've placed the Xcode project for RatioKey 1.1 in a GitHub repository.

This is an outdated Xcode project, unchanged since late 2010. Attempting to load it into a modern version of Xcode results in many errors.

Casually reviewing the code, eight years on, I'm reminded of the painful effort involved in getting it into a shippable state, and am mildly horrified at the needless complexity it oozes. Brother Ockham would not be pleased.

Friday, October 26, 2018

Pursuing clarity through openness, part 9: crafting voice by shifting amplitude among harmonics over time

Just as every member of the harmonic series composing a structure is also a member of the harmonic series defined by the Highest Common Fundamental, so too are their own harmonics. We can make use of this to transform them from simple sine waves into complex tones, perhaps even phonemes, by specifying how much each of those secondary harmonics should contribute to voicing the primary harmonics in response to user actions.

A fairly simple and straightforward way of doing this can most easily be described by analogy to a row of decrepit fenceposts and the nonparallel rails (or wire, if you prefer) between them. The fenceposts represent particular points in time, specified for the purpose of rendering in terms of samples, strung out between the beginning and end of a note. The rails represent an intensity (volume factor) for each of the secondary harmonics contributing to the overall sound to be rendered, but unlike a fence in good repair these rails may cross each other, and either end (or the entire rail) may lie on the ground between any pair of posts. Typically, the rails will at least sit at an angle between any two posts, representing interpolated intensity values.
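
For those who'd rather see it in code, here's a minimal Swift sketch of the fencepost-and-rails idea, assuming intensities are specified at explicit sample positions (the posts) and linearly interpolated between them (the rails). The names are illustrative, not anything taken from RatioKey:

```swift
// A "post": the intensity of one secondary harmonic at a specific sample position.
struct Breakpoint {
    let sample: Int       // position within the note, in samples
    let intensity: Double // volume factor at that position
}

// A "rail": the intensity envelope of one secondary harmonic across the note,
// linearly interpolated between posts (and flat before the first and after the last).
struct IntensityRail {
    let breakpoints: [Breakpoint]   // assumed sorted by sample position

    func intensity(at sample: Int) -> Double {
        guard let first = breakpoints.first, let last = breakpoints.last else { return 0.0 }
        if sample <= first.sample { return first.intensity }
        if sample >= last.sample { return last.intensity }
        for (left, right) in zip(breakpoints, breakpoints.dropFirst()) where sample <= right.sample {
            let span = Double(right.sample - left.sample)
            let t = span == 0.0 ? 0.0 : Double(sample - left.sample) / span
            return left.intensity + t * (right.intensity - left.intensity)
        }
        return last.intensity
    }
}
```

A rail whose interpolated value reaches zero is the one lying on the ground between posts.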

What is really at play here is a movement of acoustic energy among secondary harmonics in the interest of creating the voice of a primary harmonic. Because those secondary harmonics are also part of the harmonic structure, the same process of multiplying the phase of the Highest Common Fundamental (for the current sample, expressed in cycles) by the harmonic number in terms of the HCF for those secondary harmonics, keeping only the fractional part, multiplying by the size of the sine table, and truncating to produce an index for sine lookup, still applies. In fact this replaces going through these steps for the primary harmonic, since its voice is now composed of the intensities of its own harmonics, remembering that it is its own first harmonic.

For each sample and each secondary harmonic, the result of the sine table lookup is multiplied by the intensity factor calculated for that secondary harmonic and that sample, then the results of those multiplications are added together to arrive at the contribution that note makes to the overall sample value. These totals for multiple simultaneous notes are simply added together.
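
Here's a minimal sketch of that per-sample loop for a single note, assuming a precomputed sine table covering one full cycle and a per-harmonic intensity envelope like the rail sketch above; again, the names and structure are mine:

```swift
// One secondary harmonic contributing to a note's voice.
struct Voiced {
    let hcfHarmonicNumber: Int       // harmonic number in terms of the HCF
    let intensity: (Int) -> Double   // e.g. IntensityRail.intensity(at:) from the sketch above
}

// Contribution of one note to a single output sample.
// hcfPhase is the phase of the Highest Common Fundamental, in cycles;
// sineTable holds one full cycle of sine values.
func noteSample(hcfPhase: Double, sampleWithinNote: Int,
                voices: [Voiced], sineTable: [Double]) -> Double {
    var total = 0.0
    for voice in voices {
        // Multiply the HCF phase by the HCF harmonic number, keep only the
        // fractional part, scale by the table size, and truncate to an index.
        let phase = (hcfPhase * Double(voice.hcfHarmonicNumber)).truncatingRemainder(dividingBy: 1.0)
        let index = Int(phase * Double(sineTable.count))
        total += sineTable[index] * voice.intensity(sampleWithinNote)
    }
    return total
}
```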

It might be more efficient to combine intensities for various HCF-harmonics before multiplying by the result of the sine table lookup for each, but that is a more complex coding problem, so I'll leave this as a mission for the reader, should you decide to accept it.

If the fencepost and rails analogy doesn't work for you, and you have an old-style bulletin board and some push-pins and string handy, you can use columns of push-pins to represent posts (samples for which the intensities of secondary harmonics are explicitly specified) and string stretched between those push-pins to represent interpolated values.

More elaborate versions, using graphically defined Bézier curves or explicit functions to specify the per-sample intensities of secondary harmonics, are also options. So too are modifications to those specifications based on user action parameters like velocity, pressure, and time-to-release.

Okay, take a breath, step back, let it sink in, and see if a playground full of cool toys doesn't gel in front of you, along with perhaps some appreciation for why a part-time developer like myself might find such a project daunting, and why I have chosen to lay it all out.

Even this isn't exhaustive; there's plenty of room for expanding upon this vision, and I invite any with the motivation to do so to take it and run with it.

I'm sure I'll have more to say, details to be filled in, loose ends to be tied up, but this marks the end of my whirlwind introduction to the topic.

Wednesday, October 24, 2018

Pursuing clarity through openness, part 8: integer-ratio intervals against a logarithmic scale

While integer-ratio intervals generally sound better than irrational intervals, the sensation of pitch is roughly logarithmic. Ascending octaves on a piano sound like they each rise about the same amount, but the frequency doubles from one to the next.

To accommodate this, when adapting harmonic series to a playable interface, whether on-screen or physical, the spacing between the buttons or touch-pads or whatever represents notes should diminish moving from lower to higher harmonics. The distance between the 1st and 2nd harmonics should be the same as the distance between the 2nd and 4th, between the 4th and 8th, and between the 8th and 16th. Depending upon how many harmonics you include, it may not be possible to have them all on-screen or within reach simultaneously, and the higher ones will present increasingly smaller targets.
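
In code, such spacing amounts to positioning each harmonic's control in proportion to the base-2 logarithm of its harmonic number. A sketch, with one unit of distance per octave:

```swift
import Foundation

// Offset of a harmonic's button from the fundamental's, measured in octaves:
// harmonic 2 sits 1 unit away, harmonic 4 sits 2 units away, harmonic 8 sits 3, etc.
func interfaceOffset(harmonicNumber: Int) -> Double {
    return log2(Double(harmonicNumber))
}

// The gaps shrink as the harmonic numbers climb:
// interfaceOffset(harmonicNumber: 2) - interfaceOffset(harmonicNumber: 1) == 1.0
// interfaceOffset(harmonicNumber: 4) - interfaceOffset(harmonicNumber: 2) == 1.0
// interfaceOffset(harmonicNumber: 3) - interfaceOffset(harmonicNumber: 2) ≈ 0.585
```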

The problem of smaller targets can be alleviated by using multiple harmonic series, which is what harmonic structures are all about. It can also be alleviated by removing harmonics that are irrelevant to a particular purpose, leaving a sparse structure that might be termed a lattice. This filtering is another case where prime factors can be useful.
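
Exactly which harmonics count as irrelevant depends on the purpose, but as one plausible example, a prime-limit filter keeps only the harmonics whose numbers factor entirely into a chosen set of primes, say 2, 3, and 5 for a 5-limit lattice. A sketch:

```swift
// Hypothetical prime-limit filter: keep a harmonic only if its number factors
// entirely into the allowed primes (e.g. [2, 3, 5] for a 5-limit lattice).
func isInLattice(_ harmonicNumber: Int, allowedPrimes: [Int] = [2, 3, 5]) -> Bool {
    var n = harmonicNumber
    for p in allowedPrimes {
        while n % p == 0 { n /= p }
    }
    return n == 1
}

let kept = (1...16).filter { isInLattice($0) }
// kept == [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16]; 7, 11, 13, and 14 drop out
```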

A perk of using a logarithmic scale for pitch is that it allows having multiple harmonic series that are copies of a template, all with exactly the same dimensions. These duplicate series can be moved up or down-scale without distorting the correlation between their position and the frequencies they produce. Even better, everything at the same position along that scale will have the same pitch.

I'm rather fond of the notion of a physical instrument interface patterned generally on the shape of the saguaro cactus, which has branches that emerge almost horizontally from the main stem and then turn sharply upwards. Vertical pieces each representing a single harmonic series could be mounted on a central post so they would slide up/down through slots, or pivot on a parallelogram linkage, the idea being that the higher they were positioned the higher the frequencies they would generate, again using a logarithmic frequency scale.

There is one more major topic to cover, and probably some loose ends to tie up, but I think I'll be taking a break before proceeding with the next installment.

Pursuing clarity through openness, part 7: from structure to sound

Digital sound is a complex subject, with many variations on the theme. Most use Pulse-code Modulation (PCM) in some fashion. PCM is a sequence of numbers representing the amplitude of a sound wave, the instantaneous pressure, measured frequently at regular intervals in the case of a microphone capturing sound from the environment. The frequency of those measurements, the sample rate, is most commonly 44100 per second, too low to capture the nuances of the squeaks made by mice and bats but more than adequate for human voices.

The way those measurements are encoded varies, with 16-bit signed integers being a common format made popular by its use on CDs. Apple uses that format for its microphones and speakers, but internally Apple's OSes use 32-bit floats to pass data around, waiting until the very last step to convert those to integers for output. So, at least for Apple devices, synthesizing sound means generating a sequence of 32-bit float 'samples' quickly enough to stay at least a little ahead of the output, so that it never runs out.

However, if you're working in an interactive context, where the delay between a user action and the sound it generates needs to be imperceptibly small, you don't want to get too far ahead of the output. If, for example, the length of a note depends on the time between touch-down and touch-up events, it cannot be entirely precomputed. Even if it could be, playing more than a single note at once raises the issue of combining each note into the stream of sample values at just the right moment to preserve proper phase-alignment and avoid destructive interference.

The most straightforward approach is to generate the stream of samples to be fed to the output on the fly, just in time. Apple's older Core Audio framework provides callbacks for this purpose; you supply a block/closure, or a pointer to a function, to an audio unit, which then calls that code whenever it is ready for more data. This is a low-latency process. The challenge is to craft code that will return in time, so you don't leave the output without data. You stand a better chance of doing this in C than in Swift, but even in C you need to be careful not to try to do too much in a callback; anything that can be precomputed should be.

AVAudioNodes provide callbacks, but it's not clear to me whether these are appropriate for an interactive context. AVAudioNodes also wrap AUAudioUnits, which have callbacks of their own. I think it should be possible to make use of these and avoid the need to set up an audio unit in C, but I already had that code so I haven't yet put this theory to the test.

At this point you'll be staring, figuratively if not actually, at an empty callback routine. At least in the case of Core Audio audio units, you will have been passed a structure with a pointer to an array of pointers to buffers. Assuming only a single channel, you'll get the pointer from the [0] cell of the array and begin writing sample values into the buffer. When done, you return a status code, and the audio unit picks up the samples you wrote into the buffer. Anything that needs to be retained from one such call to the next, such as a phase alignment, will need to have broader scope than the callback routine, or perhaps be static if that can work in this context.
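
As a rough Swift sketch of that shape, assuming a single channel of Float32 samples and some renderer object holding the state that must persist between calls (the renderer and these names are placeholders of mine, not anything dictated by Apple's API):

```swift
import AudioToolbox

// State that must outlive any single callback: the HCF phase, the sine table,
// the set of sounding notes, and so on. Placeholder only.
final class RenderState {
    var phase: Double = 0.0           // phase of the HCF, in cycles
    func nextSample() -> Float {
        return 0.0                    // placeholder: compute one sample and advance the phase
    }
}

// Fill the single (mono) buffer of an AudioBufferList with frameCount samples.
func fillBuffers(_ ioData: UnsafeMutablePointer<AudioBufferList>,
                 frameCount: UInt32,
                 state: RenderState) {
    let buffers = UnsafeMutableAudioBufferListPointer(ioData)
    guard let raw = buffers[0].mData else { return }
    let samples = raw.bindMemory(to: Float.self, capacity: Int(frameCount))
    for frame in 0..<Int(frameCount) {
        samples[frame] = state.nextSample()
    }
}
```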

I've mentioned phase-alignment a couple of times, but haven't yet said what we need to track the phase of, nor what alignment might mean.

The samples we'll be adding to the buffer mentioned above will be derived from sine values. Because sine values take some effort (CPU cycles) to compute, they should be precomputed, so we'll want a table (an array) of them from which particular values can be extracted using a simple index. The table should represent one complete cycle of a sine wave, from sin(0.0) up to but not including sin(2.0 * pi), assuming you're using a sine function that takes radians as an argument.
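
Building the table is straightforward; the size here is arbitrary (as discussed below, making it equal to the sample rate has its advantages):

```swift
import Foundation

// One full cycle of sine values: table[i] = sin(2π · i / size),
// covering sin(0.0) up to but not including sin(2π).
let sineTableSize = 4096              // an arbitrary size, for illustration
let sineTable: [Double] = (0..<sineTableSize).map {
    sin(2.0 * Double.pi * Double($0) / Double(sineTableSize))
}
```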

Frequency, also called pitch, can be expressed in terms of the rate of progression through the values in this sine table. When using this approach, the size of the sine table (the number of elements it contains) becomes an important component of the calculations.

I've managed the business of tracking the phase-alignment of a synthetic sound wave as a progression through a sine table several different ways. Originally I used radians/second to represent frequency, which meant that the phase-alignment for the current sample had to be multiplied by the sine_table_size/two_pi and the result of that truncated to produce an integer index. Then I realized I might just as well be using sine-table-indices/second, which only needs to be checked for being out of range and adjusted by sine_table_size if it is. At some point it occurred to me that this approach, if combined with a sine_table_size equal to the sample rate, would eliminate the need for converting from cycles per second to sine-table-indices per second, since they would be equivalent, requiring only a type conversion from double to int just before the table lookup.

When I began to experiment with complex tones, I also began to use the current phase-alignment of the fundamental to generate the phase alignments of any harmonics to be included, multiplying it by their harmonic numbers, applying modulo division by the sine_table_size to reduce this product to the proper range, and using the result of that modulo division as an index into the sine table. Again, at some point it occurred to me that this same approach would work with harmonic structures composed of multiple harmonic series, if I were to track the phase of the Highest Common Fundamental (HCF) and multiply that by the harmonic numbers of the members of the structure as determined by their position within the harmonic series defined by the HCF, their HCF harmonic numbers.

Then, finally, I realized that, if I were to keep frequency in cycles per second, I could eliminate the modulo divisions, since modulo 1.0 is equivalent to simply dropping the non-fractional part of a floating point value. The tradeoff in doing this is the need to reintroduce multiplication by sine_table_size followed by truncation to produce integer indices for table lookup.
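
Here's a sketch of that final approach for a single constant-pitch tone, assuming the sine table from the sketch above; the frequency is just an arbitrary test value:

```swift
import Foundation

// Frequency kept in cycles per second; assumes sineTable and sineTableSize from above.
let sampleRate = 44100.0
let frequency = 220.0                  // an arbitrary test frequency
var phase = 0.0                        // phase of the tone, in cycles

func nextSineValue() -> Double {
    phase += frequency / sampleRate                 // advance by cycles-per-sample
    phase -= floor(phase)                           // "modulo 1.0": drop the integer part
    let index = Int(phase * Double(sineTableSize))  // scale up and truncate for the lookup
    return sineTable[index]
}
```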

By this time I'd lost track of the distinction between these various approaches and began to combine elements of them inappropriately, leading to confusion – vaguely analogous to random mutation in genetics, it occasionally works out but mostly you get no discernible difference, or monsters.

So, now you know several ways to specify a frequency and how the choice of which to use will affect the process of using it to generate a stream of sine values to be passed along to the next step, whether that's directly to the output or to code that applies modifications to the sine values before passing them along to the output. Right?

That's probably enough for one day.

Tuesday, October 23, 2018

Pursuing clarity through openness, part 6: the anchor, HCF, and tracing the chain of tonality

From this point on I'll (mostly) refer to the Highest Common Fundamental as HCF.

While you might actually choose an anchor that initially matches the pitch of the HCF, they have different purposes, and different behavior resulting from changes to the structure. The anchor is like a handle, a means of hanging onto the structure as a whole, and exactly where it's positioned is somewhat arbitrary. The HCF, on the other hand, even though it will end up being defined in terms of the anchor, is not arbitrary at all. Its position is dictated by the fundamentals of the series composing the structure. Change one of those fundamentals and it's likely that the HCF will also need to change, and its relationship to every member of the structure along with it.

The anchor is the only component of the structure which is directly tied to the pitch-scale; everything else is connected to the pitch-scale through the anchor. For the purpose of sound generation that connection also passes through the HCF. Starting with the ratios relating the fundamentals of the series composing the structure to the anchor, expressed in lowest terms, the denominators of those ratios will need to be reduced to their prime factors, for example 1/6 becomes 1/(2*3) and 1/9 becomes 1/(3*3). For each prime used, the greatest number of times it is used in a single denominator (in this example 2 and 3*3) is included in a multiplication to produce the denominator of a ratio relating the anchor to the HCF (2 * 3*3 = 18, yielding a ratio of 1/18). This is the application of prime factors for which I can make an argument in favor of their relevance.
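
Here's a sketch of that computation, using the denominators from the example; the function names are mine:

```swift
// Denominators of the ratios relating each series' fundamental to the anchor:
// 1/6 and 1/9 in the example above.
let denominators = [6, 9]

// Factor an integer into prime exponents, e.g. 6 -> [2: 1, 3: 1] and 9 -> [3: 2].
func primeExponents(of number: Int) -> [Int: Int] {
    var n = number
    var p = 2
    var result: [Int: Int] = [:]
    while n > 1 {
        while n % p == 0 {
            result[p, default: 0] += 1
            n /= p
        }
        p += 1
    }
    return result
}

func power(_ base: Int, _ exponent: Int) -> Int {
    return (0..<exponent).reduce(1) { product, _ in product * base }
}

// For each prime, take the greatest exponent found in any single denominator,
// then multiply those maxima together: 2 * (3 * 3) = 18, so the HCF sits at 1/18 of the anchor.
var maxExponents: [Int: Int] = [:]
for d in denominators {
    for (prime, count) in primeExponents(of: d) {
        maxExponents[prime] = max(maxExponents[prime] ?? 0, count)
    }
}
let hcfDenominator = maxExponents.reduce(1) { $0 * power($1.key, $1.value) }
// hcfDenominator == 18
```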

To determine which members of the harmonic series defined by the HCF represent the fundamentals of the series composing the structure, we'll divide the ratios relating those fundamentals to the anchor by the ratio relating the HCF to the anchor, or rather invert and multiply (1/6 * 18 = 3, and 1/9 * 18 = 2).

At this point it will be helpful to introduce some shorthand notation. To differentiate between native harmonic numbers (multiples of the fundamentals of the harmonic series composing the structure) and harmonic numbers in the series defined by the HCF, I'll prefix the latter group with "HCF-" (HCF-3 and HCF-2 in the above example).

To go on to determine which members of the harmonic series defined by the HCF represent the members of the other series, we'll multiply their native harmonic numbers by the HCF harmonic number of the fundamental of the series to which they belong (using the 5th harmonic in each case, 5 * HCF-3 => HCF-15 and 5 * HCF-2 => HCF-10).

We can carry this one step further, crafting complex tones using harmonics of the harmonics, which are also part of the structure, by simply adding another multiplication factor. Continuing with the above example, the 2nd harmonic of the 5th harmonic of each series will be two times the basic harmonic (2 * HCF-15 => HCF-30 and 2 * HCF-10 => HCF-20) and the 3rd harmonic of the 5th harmonic of each series will be three times the basic harmonic (3 * HCF-15 => HCF-45 and 3 * HCF-10 => HCF-30).
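
Strung together, the whole chain from the worked example is nothing more than a pair of multiplications. A sketch, using the numbers from the example above:

```swift
// The fundamentals sit at 1/6 and 1/9 of the anchor; the HCF at 1/18 (computed above).
// Invert and multiply to get each fundamental's HCF harmonic number.
let fundamentals = [6, 9].map { 18 / $0 }          // [3, 2]   -> HCF-3 and HCF-2

// The 5th native harmonic of each series:
let fifths = fundamentals.map { 5 * $0 }           // [15, 10] -> HCF-15 and HCF-10

// Harmonics of those harmonics, for crafting complex tones:
let secondOfFifths = fifths.map { 2 * $0 }         // [30, 20]
let thirdOfFifths  = fifths.map { 3 * $0 }         // [45, 30]
```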

Note that it's quite possible to arrive at multiple instances of the same HCF harmonic, and there may be code efficiencies to be built around that.

What we now have is a chain of connection, beginning with the pitch-scale, passing through the anchor, through the HCF, to the fundamentals of the harmonic series composing the structure, and from them to the other members of those series, and from them to their own harmonics, expressed in terms of multiples of the HCF.

So how do we get sound out of this? That'll be next.

Monday, October 22, 2018

Pursuing clarity through openness, part 5: the highest common fundamental

Given a harmonic structure, two or more harmonic series all connected together through identically pitched members at different harmonic numbers, an anchor having nominally stable integer-ratio relationships to the fundamentals of those series, and a means of specifying the pitch of that anchor, what more do you need?

Perhaps nothing else, depending upon what you have in mind to do, but if you intend to use this assemblage to synthesize sound there's another important piece to the puzzle.

Say you have two tuning forks that are very nearly the same pitch, within a fraction of 1 Hz of being exactly the same. If you strike them both and hold them near each other and close to one ear, the sound you hear will rise and fall as the sound waves from the tuning forks alternate between constructive and destructive interference. The rate at which this happens is the beat frequency, and the closer the tuning forks are to being exactly the same pitch, the lower the beat frequency.

With digital sound, it is quite possible to have two sound sources at exactly the same constant pitch, meaning that how they interact at first is how they will continue to interact for as long as both sources persist, whether constructively, destructively, or something in between. What sort of interference you get depends entirely on the relative alignment of their phases at the outset, something that isn't easy to control if, for example, you're passing user events to code that spawns a new render thread for each new note. That scenario will reward you with random results.

To backtrack a bit, a property of harmonic structures is that they always imply a harmonic series that includes all of the members of all of the harmonic series composing the structure. Generally, the fundamental of that implied series will be lower than any of the fundamentals of the series composing the structure, although it might be the same as the lowest of them. Actually there are many such fundamentals, since, if x defines a harmonic series which includes all the members of a structure, so too will x/2, x/3, x/4, and so forth. To keep things simple, what we really want is the highest value satisfying the requirement that it define a series including every member of the structure. This is what I'll call the 'highest common fundamental'.

You can think of that highest common fundamental as a constant tone, albeit one which may be well below the threshold of human hearing and, in any case, isn't part of the sound produced. It only exists as a kind of metronome, establishing a phase alignment that advances at a steady pace and drives the phase alignments of all of the members of the structure, allowing control over how they interact, whether constructively or destructively. I'll save the details of how this works for later.

For now just soak in the idea that every harmonic structure implies a fundamental defining a series that includes every member of the structure, and that implied fundamental can be used to drive sound generation while controlling for constructive vs. destructive interference.

Pursuing clarity through openness, part 4: treating harmonic structures as units

Just as it's easier to carry a bucket that has a handle than one that doesn't, it's easier to manipulate and make use of a collection of related harmonic series if they are all tied to a single frame of reference that both specifies the relationships between them and relates them to an external context.

That external context is the pitch scale, frequencies measured in Hz. This is a bit like Schrödinger's cat in that it doesn't matter for the purpose of the relationships between harmonic series and their members, not until those relationships begin to participate in the generation of sound, whether tinkering, composition, practice, or performance. The relationships between series in a structure are pitch-independent (beat frequencies excepted).

However, in the interest of being able to apply that external pitch scale when the time comes, it will be convenient to have a sort of handle to connect to, a node which is stable with respect to the rest of the structure (at least between edits), so that by assigning a frequency to it you also assign frequencies to every member of every series in the structure. That node need not actually be a member of any of those series. The only requirement is that it be in integer-ratio relationship with them; exactly what's used is somewhat arbitrary. There are probably better choices and less good choices, based on the complexity of the ratios needed to specify the fundamentals of the component series, and it may be desirable to recompute this central point of reference after an edit, for the sake of keeping those ratios as simple as possible, but these issues are out of scope for the moment.

I'll be referring to this recomputable central point of reference as the 'anchor'. I say this with some trepidation as I have also used 'anchor' to mean something different, but this seems both the best word for what is intended and the best use of the word, so I'll let it be. Again, I welcome pointers to any standard terminology having the same meaning.

The anchor might start out as the fundamental of the first series in a structure, before anything else is added, but later you might find yourself moving that series either higher or lower with respect to the rest of the structure, meaning that either the anchor follows it, and the ratios relating it to the other fundamentals will need to be recalculated, or it remains stable with respect to the rest of the structure and the ratio relating it to that initial series will need to be recalculated. However one deals with this, after it's over the anchor must remain in integer-ratio relationship with the fundamentals of all series in the structure.

There's also the matter of how to specify the anchor within the external pitch-scale context. You can simply assign it a frequency, and then recompute that frequency as needed; however, I recommend using the combination of an infinitely variable scalar (a floating-point value) and an integer ratio, multiplied together to produce the anchor's frequency. The ratio makes consonant transposition simple, and the scalar allows fine tuning.
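
A minimal sketch of that representation (the names are mine):

```swift
// The anchor's frequency as a fine-tuning scalar multiplied by an integer ratio.
struct Anchor {
    var scalar: Double      // infinitely variable fine tuning, in Hz
    var numerator: Int      // integer ratio, for consonant transposition
    var denominator: Int

    var frequency: Double {
        return scalar * Double(numerator) / Double(denominator)
    }
}

// For example, a 3:2 transposition of a 100 Hz reference:
let anchor = Anchor(scalar: 100.0, numerator: 3, denominator: 2)
// anchor.frequency == 150.0
```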

Next, the nature of the 'highest common fundamental' and what it's good for.