Thursday, September 05, 2019

Pointer to relevant post on parallel blog

I have four blogs, three of which have significant overlap. The post linked below is an example of this.

It's primarily about my (modest) progress in catching up with the latest developments in the Swift programming language, but it bears on the project described here, to which I intend to first apply those developments.

Dog-paddling behind the bow wave

Tuesday, June 25, 2019

More complications, leading to a potential solution

This evening it occurred to me that varying the pitch of a note, while generating its phase from a multiple of the phase of a base tone, might result in artifacts. I'm not certain of this, and cannot yet articulate why I think it could happen, but it seems at least plausible.

A solution also occurred to me, which is to use the base tone only to generate the initial phase of the note, and from that point on track its phase independently. That thought led to another complication: when you want the varying pitch to come to rest on a specific tone, the phase of the note may not align with a newly generated note of the same frequency.

I first thought about pacing the change in pitch so it would end up phase-aligned on the target frequency. This would work for scripted compositions, but in live performance it isn't possible to know what the target frequency will be until it happens.

So it seems as though a better solution would be to cross-fade from the sliding note to a newly generated note which is stable on the target frequency.
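
One common way to implement such a cross-fade is an equal-power fade. Here's a minimal sketch in Swift; the function name and the equal-power choice are my assumptions, not settled design:

    import Foundation

    // Blend the sliding note out while the newly launched, phase-stable
    // note blends in; progress runs from 0.0 (all sliding) to 1.0 (all stable).
    func crossfade(sliding: Double, stable: Double, progress: Double) -> Double {
        let fadeOut = cos(progress * Double.pi / 2.0)   // equal-power taper down
        let fadeIn = sin(progress * Double.pi / 2.0)    // equal-power taper up
        return sliding * fadeOut + stable * fadeIn
    }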

But, if this mechanism (independently tracking the phase of each note, after initiating it using the phase of the base frequency) is in place for notes with varying pitch, why not just use it for all notes, and not have to worry about whether they will remain at a constant pitch?

Applying this technique to all notes would mean that the base tone is only used to initiate new notes, which would mean precision is no longer an issue, so we can dispense with 80-bit floats!
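
To make that concrete, here's a minimal sketch in Swift of what I have in mind; all names are hypothetical, and this is a thought experiment rather than working project code:

    // Each note seeds its phase from the base tone once, then advances
    // independently per sample, so its frequency is free to vary.
    struct Note {
        var phase: Double         // in cycles, 0.0 ..< 1.0
        var frequency: Double     // in Hz; may be respecified per sample
        let sampleRate: Double

        init(harmonicNumber: Double, basePhase: Double,
             frequency: Double, sampleRate: Double) {
            let seeded = basePhase * harmonicNumber
            self.phase = seeded - seeded.rounded(.down)   // keep the fractional cycle
            self.frequency = frequency
            self.sampleRate = sampleRate
        }

        mutating func advance() {
            phase += frequency / sampleRate
            phase -= phase.rounded(.down)                 // wrap past 1.0
        }
    }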

[7/5/19: The thought that set this all in motion, that varying the pitch of a note while generating its phase from a multiple of the phase of a base tone might result in artifacts (noise), remains a matter of conjecture. I haven't yet hit upon a way of determining whether this is an actual concern. However, eliminating the need for 80-bit floats is sufficient motivation to proceed as though it were established fact.]

Wednesday, June 19, 2019

Ground-shifting changes

We've all had a couple weeks to assimilate all that was announced at WWDC, and those who surf the bleeding edge have been very busy getting up to speed and producing blog posts, newsletters, podcasts, and videos paving the way for the rest of us.

Just listing all of the resources already available would be a formidable task, so instead I'll mention a couple of good starting points.

For anything related to the Swift programming language, the Swift.org website is the center of the universe. What you won't find there is much in the way of links to blogs, newsletters, podcasts, or YouTube channels, even those relating to Swift development.

That gap is nicely filled by Dave Verwer's iOS Dev Directory, which does not include a link to this blog, nor should it!

I don't expect to have much to say here for a few months. In the meantime, you can catch me on Twitter at https://twitter.com/harmonicLattice.

Sunday, June 16, 2019

Navigating a larger problem space

On the same day as my most recent post here, I also began a thread on Twitter, in which I laid out the opportunities and constraints presented by various approaches to generating tones by multiplying the phase of a base tone by frequency ratios.

This took several hours, and I had to finish it the next morning; nevertheless, except for a minor glitch or two, I think I managed to get it straight, possibly for the first time.

Only generating tones that are all integer multiples of the base tone is significantly simpler, but taking that simple approach precludes the use of any musical practice involving pitch variation — bending, sliding, or vibrato.

For the purpose of producing a fuller sound, more like a physical instrument, the set of pure tones that are all integer multiples of the base tone is just too confining. Unfortunately, the alternative seems to be to use phases that continue to increase indefinitely, tracking them using high-precision floating-point numbers to keep it working long enough to be usable. I keep thinking there must be a clever hack that would make this all unnecessary, but so far this has just led me down rabbit holes.

The rabbit holes have become a problem because I cannot hold everything in that Twitter thread in my mind at once; I have to deal with it as I posted it there, in Tweet-sized bites, and have more than once lost track of one detail or another.

If you think of a cycle as being a circle, and phase as being an angle superimposed on that circle, or a position on its circumference, continuously increasing phases can be thought of as wrapping, winding, or coiling around that circle.

The need for high precision comes in because this approach involves multiplying the phase of the base tone by a frequency ratio that might have a value as high as 20,000, then discarding everything to the left of the decimal point, leaving only whatever significant figures were to the right of the decimal point. As the base tone's phase increases, so too does the result of multiplying it by the frequency ratio, meaning there are fewer and fewer significant figures remaining on the right, and sooner or later there is insufficient precision for the next step: conversion, either into an index for a lookup table or, by means of an algorithm, directly into the magnitude of a sound wave for a particular sample. Using higher-precision (80-bit) floating-point numbers buys time.
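
To illustrate the squeeze, with made-up numbers (this is only a sketch of the arithmetic, not project code):

    let ratio = 20_000.0               // a frequency ratio near the top of the audible range
    let basePhase = 3_600.123456789    // base tone phase after roughly an hour at 1 Hz
    let product = basePhase * ratio    // about 7.2e7
    let fractional = product - product.rounded(.down)
    // A Double carries roughly 15-16 significant decimal digits in total;
    // with 8 of them consumed to the left of the decimal point, only 7-8
    // remain for the fraction we actually need, and the deficit worsens
    // as the base phase continues to grow.
    print(fractional)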

This inelegant approach grates on my sensibilities as a programmer, but, short of returning to only trying to produce tones that are integer multiples of the base tone, I haven't yet found any way around it.

Thursday, June 06, 2019

Cognitive paralysis: hopefully temporary

I'm presently doing a pretty good emulation of a robot that's got itself 'trapped' in a corner its programming is inadequate to escape. With any luck, this will pass, but I consider myself fortunate to have recognized the symptoms and desisted from digging myself even deeper into confusion.

Saturday, May 11, 2019

A Larger Vision: one piece falls into place

Over the past couple months the scope of this project has expanded rather suddenly, from one tightly focused on enabling music based on harmonics (also representable as integer-ratio intervals) to one which is still motivated by the desire to support harmonic tonality, but which also strives to be more generally useful. This means more work, but also something I might actually be proud to release into the world, if and when I get it into a state where it's ready for that.

One result of this reconceptualization is that I'll be repurposing the term "base frequency" from "an intermediary object which may be used in conjunction with the anchor, providing the scalar factor" to something more concrete: the sample rate divided by the size of one or more lookup tables used to represent wave forms that aren't easily calculated on the fly, for example sine waves. As such it will be a minor detail of the implementation, not something user-facing, except as the user might be a programmer working with a framework, if that turns out to be the direction the project evolves.
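
For example, assuming a hypothetical table size:

    let sampleRate = 44_100.0
    let tableSize = 65_536.0
    let baseFrequency = sampleRate / tableSize   // about 0.6729 Hz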

(Update, 06June2019: At this time, ALL custom terminology should be considered temporary and subject to redefinition, replacement, or deprecation. If/when this all stabilizes, I'll post an updated lexicon.)

Wednesday, April 24, 2019

Moving Targets

I've been letting this project steep on the back burner while firming up my understanding of the basics of the Swift programming language, which I will be using, likely in combination with C for the most demanding real-time code. This has been a propitious pause, as it has surfaced rather gaping oversights in how I've thought about what I've set out to do. What follows is the current state of my evolving understanding and intention.

Caveat: My custom terminology is also still in flux, and usage going forward may not correlate exactly with what came before. I will endeavor to nail down this slippery business sooner rather than later.

Most fundamentally, while making harmonic-based melody more accessible is the primary motivation driving my interest in this project, baking that into the design in a form that makes working with or folding in other tonal systems unnecessarily difficult would be a mistake. This is easily accommodated by defining the frequencies of available tones in terms of floating point numbers rather than integers. To keep compound error to a minimum, these should be double precision (64 bit).

Since, as previously mentioned, the simplest way to calculate sine table indices begins with tracking the per-sample phase of a 1.0 Hz base frequency, there no longer seems to be a clear purpose for the HCF (Highest Common Fundamental). However, I'm not confident this concept won't still prove valuable, so let's put it on the shelf for the time being. If it comes back off that shelf, it might well be under another, hopefully less clumsy name.

If tones can be specified simply in terms of their first-harmonic frequencies in Hz, expressed as double precision floating point numbers, rudimentary support for pitch bending and sliding becomes a simple matter of respecifying that first-harmonic frequency on a per-sample basis. I say 'rudimentary' because I suspect providing such support while avoiding artifacts will turn out to be more complicated than this.

Next there's the matter of the phases of overtones not necessarily being perfectly aligned with (typically trailing) the phase of a tone's first harmonic. For the moment let's call this overtone offset, since accommodating this can be as simple as adding an offset to the per-sample phase calculated for each overtone. That offset might be calculated as a fraction of the first harmonic's cycle time, and applied before conversion to a sine table index, although moving at least part of that calculation outside of the real-time context and passing the result in as a simple quantity would make sense.
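
As a sketch of one way to realize this (all names hypothetical), the offset could be expressed as a fraction of the first harmonic's cycle and folded in before conversion to a table index:

    func overtoneTableIndex(fundamentalPhase: Double,  // in cycles
                            harmonicNumber: Double,
                            offset: Double,            // fraction of the fundamental's cycle
                            tableSize: Int) -> Int {
        let raw = (fundamentalPhase + offset) * harmonicNumber
        let wrapped = raw - raw.rounded(.down)         // keep the fractional cycle
        return Int(wrapped * Double(tableSize))        // truncating conversion
    }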

Given overtones with phase offsets, the question arises whether we might want the option of defining tones in terms of multiple instances of overtones, each with its own per-sample offset and amplitude. Since this could so complicate real-time calculations that polyphony becomes problematic, I'm inclined to also put this idea on the shelf, until I've given more thought to the possibility of voices with some/all of the complicated rendering having been precomputed.

The main obstacle I see in the path of precomputation is the aspiration to make the sound output responsive to factors like velocity, pressure, up/down-scale movement, and time-to-release, which can't be known in advance. As a workaround, it should at least be possible to capture these while producing a less nuanced rendering in real time, then apply them after the fact, editing as needed to achieve the desired effect.

In any case, multiple overlapping notes using the same tone should be available, each with its own set of overtones and their variable attributes, with offsets also optionally applied to their first harmonics, for the purpose of generating echoes if nothing else. Considering this, providing multiple per-note instances of overtones might simply be needless complication.

Finally, because there's a temptation to withhold functionality from the real-time context in order to make sure rendering can happen in a timely manner, this project really wants to split into two components (modes), one (stage) focused on real-time performance, and the other (studio) focused on providing a full set of features. The communication between these two modes is a sort of bidirectional funnel, and needs to be well defined. An advantage of this requirement is that it is an obvious place to look for an organizing principle, around which to build out the rest of the model and basic functionality.

As such, it may also prove a suitable focal point for any open source initiative, allowing 'stage' and 'studio' applications from different vendors to interoperate. But I'm really getting way ahead of myself in even mentioning that. First I need to build out my own project, then maybe I can think about turning it into an open-source project.

Addendum (25April2019): This is not even close to being a final decision, but I'm thinking it makes the most sense to specify, for any given note, the per-sample frequency, amplitude, and phase offset of the first harmonic, and then to specify the same attributes for higher harmonics (overtones) relative to that, although, for the sake of efficiency, it will be desirable to precompute as much of this as can be without sacrificing responsiveness to the performer.

Saturday, February 16, 2019

The Elements of Voice

In a previous post on this blog, I defined voice as "Any attributes in the synthesis of a note other than its basic frequency and the overall volume, for example the ADSR Envelope or emphasis on different harmonics as the note progresses." You can also find a brief explanation of the ADSR Envelope in that same post.

In RatioKey 1.1 (removed from the App Store more than two years ago), I provided the means to edit the duration of each phase of the ADSR envelope, as well as the volume at the point where each phase transitions into the next. This helped make up for that app only being capable of generating a single simple sine wave at a time, with each new note interrupting the previous note, and no support at all for overtones.

Even back in 2010, while working on that app, I wanted to be able to synthesize more interesting voices, composed of harmonics (what I'd now term secondary harmonics), with the intensity of each varying independently over time, and to craft a simple interface for editing such voices, but at that time I had no clear idea how to generate multiple simultaneous notes, much less how to build them from harmonic components.

Over the intervening years, I've ferreted out solutions for various aspects of this problem space, but it wasn't until I'd experienced the absence of phase alignment that I reevaluated my approach. That reevaluation led to the idea of 1) determining the Highest Common Fundamental (HCF), 2) tracking its phase on a per-sample basis, and 3) using that phase to generate indices for sine table lookup, on a per-sample basis, for members of a harmonic structure. Only then did I feel confident I could actually do it. That was, for me, the key missing piece of the puzzle.

In the process of fleshing out that idea, I had another eureka moment when I realized that this approach would not only facilitate the synthesis of any member of a harmonic structure while guaranteeing phase alignment, but it would also enable per-sample modulation of the harmonics of those structure members (secondary harmonics) by the very same method, since they are also part of the harmonic structure.

Given the ability to independently control the intensity of secondary harmonics over time, my sense is that this should supersede the ADSR paradigm. Yes, you might still want to ramp up the volume very quickly, drain some of it back off almost as quickly, then hold it nearly steady for awhile, before tapering off to silence, but this is just as easily achieved by controlling the intensity of component harmonics as by controlling that of the basic pitch.
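
As a sketch of the idea, using hypothetical types and entirely fanciful intensity curves, a voice might be a collection of per-harmonic envelopes rather than a single ADSR:

    import Foundation

    // Each secondary harmonic carries its own intensity curve over time,
    // replacing one ADSR envelope applied to the note as a whole.
    struct HarmonicEnvelope {
        let harmonicNumber: Double
        let intensity: (Double) -> Double   // time in seconds -> relative intensity
    }

    let voice = [
        HarmonicEnvelope(harmonicNumber: 1.0, intensity: { t in min(t * 50.0, 1.0) * exp(-t) }),
        HarmonicEnvelope(harmonicNumber: 2.0, intensity: { t in 0.5 * exp(-2.0 * t) }),
        HarmonicEnvelope(harmonicNumber: 3.0, intensity: { t in 0.25 * exp(-4.0 * t) }),
    ]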

Per-sample control of harmonic intensities, translated into physical terms, equates to moving acoustic energy around among harmonics, much as we do with our tongues and the way we shape our mouths while speaking. This might be approached with the discipline of an engineer applying the conservation of energy, or utterly fancifully, or anywhere in between. It could be used to mimic familiar sounds, or to create sounds even a veteran sound collector or foley artist would be hard pressed to find in the wild or produce physically.

There are also elements of voice that this approach, as currently conceived, does not support, notably any sort of pitch bending or sliding, except as these might be applied to a harmonic structure as a unit, rather than to individual notes. In the current version, all members of the harmonic structure, including the secondary harmonics, are discrete pitches.

(Yes, it should be possible to support pitch bending and sliding by allowing variable factors relating the HCF to parts of the structure. Strictly speaking, in that event, it would at least intermittently cease to be a harmonic structure. This may be a case where accommodation is more important than conceptual cohesion, and worth the added complexity. Further contemplation is indicated.)

Tuesday, January 29, 2019

From Harmonic Structure to HCF to Sample Value, Part 5: Focusing on Pitch Specification and Alteration

Up to this point I've treated the Anchor (and Base Frequency, possibly not mentioned here since 2010) as more-or-less integral aspects of a Harmonic Structure, but really the Anchor only exists to provide a couple of services.

First, and most obviously, the Anchor is a point of reference for specifying the pitches of the fundamentals of the harmonic series composing the structure, and also of the HCF (Highest Common Fundamental). For this purpose it is enough that the Anchor's own frequency be unambiguous. Tuning would simply involve incremental alterations to that frequency.

The other service the Anchor provides is the ability to move a harmonic structure up/down-scale as a unit, by integer-ratio factors. This is what I previously referred to as "Consonant Transposition" on the theory that such a change is likely to be more consonant than using an irrational factor.

There could be other ways to provide these services, of course, including the option of separating the scalar component of the definition of the Anchor's frequency from the integer-ratio component, by bringing back the concept of a Base Frequency.

The Base Frequency would be specified simply using a Double (double precision floating point value), which you could think of as a multiplication factor that is always applied to 1.0 Hertz.

The Anchor would then be specified as an integer-ratio multiple of the Base Frequency.

Tuning would be accomplished by altering the factor relating the Base Frequency to 1.0 Hz, and consonant transposition would be accomplished by altering the ratio relating the Anchor to the Base Frequency.

This seems a little cleaner to me than combining a Double and an integer ratio into a 'dual-component' type, but your mileage may vary.
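
Expressed as a rough Swift sketch (the names are hypothetical, not settled terminology), the separated scheme might look like this:

    struct IntegerRatio {
        var numerator: Int
        var denominator: Int
        var value: Double { Double(numerator) / Double(denominator) }
    }

    struct Anchor {
        var baseFrequency: Double   // the scalar factor, applied to 1.0 Hz
        var ratio: IntegerRatio     // relates the Anchor to the Base Frequency
        var frequency: Double { baseFrequency * ratio.value }
    }

    var anchor = Anchor(baseFrequency: 264.0,
                        ratio: IntegerRatio(numerator: 3, denominator: 2))
    anchor.baseFrequency = 265.0                               // tuning
    anchor.ratio = IntegerRatio(numerator: 2, denominator: 1)  // consonant transposition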

In any case, these details need not be exposed to the user! What matters is that the pitches of the fundamentals of the series composing the harmonic structure are tunable as a unit and editable by integer-ratio factors, collectively as well as individually, and that those pitches as well as that of the HCF are clearly specified.

Sunday, January 27, 2019

From Harmonic Structure to HCF to Sample Value, Part 4: Focusing on Phase & Phase Advancement

So maybe you're a little hazy on what is meant by phase, even more so regarding phase advancement, and not at all convinced I know what I'm talking about in suggesting that repeatedly multiplying the phase of a lower frequency by a positive integer can be used to generate a higher frequency. Like, how does that work?

Phase relates back to the sine wave, which itself relates back to the unit circle, but this is beginning to feel like a circular definition. What does it really mean?

Let's approach this from a different direction, using an analogy. Say you have a shaft, rotating at one degree per second. It's going to take that shaft 360 seconds to complete one rotation. Now say you have another shaft, the position of which is updated once per second according to the rule that its new position should be twice that of the first shaft. If the first shaft has moved 10 degrees, the second shaft will have moved 20 degrees. If the first shaft has moved 50 degrees, the second shaft will have moved 100.

But what happens when the first shaft has moved 180 degrees and the second shaft has moved 360 degrees? The second, faster shaft is already back where it started while the first shaft is still only halfway around. Fine, no problem, it's free to keep right on moving, starting a second rotation while the first shaft finishes its first, but because doubling the number of degrees the first shaft has turned will now result in a number larger than 360, we'll need to remove the first 360 degrees to bring the result into a range we can work with. So, essentially, when it gets to 360 degrees the second shaft resets to 0 degrees and keeps on moving.

Likewise, when the first shaft gets to 360 degrees, it also resets to zero and keeps moving.

But what if, for every degree the first shaft moves, the second shaft moves 5 degrees? The same principle applies, but because we're getting the position of the second shaft by multiplying the position of the first shaft by 5, it won't be enough to subtract 360 degrees after its first rotation; we'll need something that will work no matter how many rotations it has already completed. That something is modulo division.

In this example, after multiplying the position of the first shaft by 5 we'll take the result of that and apply modulo 360, to remove all of the full turns and leave only the amount by which the second shaft's new position exceeds a full turn. We could use the same approach for the first shaft, but in that case it's simpler to just subtract 360 degrees every time it completes a full rotation.
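
In Swift, the two shafts might be simulated like so (a toy illustration, nothing more):

    // The second shaft's position is always derived from the first,
    // by multiplication followed by modulo division.
    var firstShaft = 0.0           // degrees
    let multiplier = 5.0

    for _ in 0..<720 {             // two full rotations of the first shaft
        firstShaft += 1.0          // one degree per second
        if firstShaft >= 360.0 { firstShaft -= 360.0 }   // simple subtraction suffices here
        let secondShaft = (firstShaft * multiplier)
            .truncatingRemainder(dividingBy: 360.0)      // modulo 360
        print(secondShaft)
    }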

You may recall, in a previous installment I said that if you measure phase (rotation) in cycles, modulo division isn't necessary. This is because if we were to use modulo division in that case, it would be modulo 1.0, which is exactly equivalent to simply keeping the fractional portion of a decimal number and discarding everything to the left of the decimal point.

So, to ease back into more standard terminology: phase equates to how much the rotation of a shaft, at any given point in time, exceeds an indeterminate number of complete rotations; in other words, how far beyond the start/end point of a cycle it has progressed. Phase advancement equates to how much rotation occurs between one point in time and the next, one second and the next in the above example. It is a rate of change.

Note that in the above example we only applied phase advancement to the first shaft, to determine its phase at the next point in time, and used that to calculate the phase at the same point in time for the second shaft. The rate of phase advancement for the second shaft is only implied, never explicit.

Using this approach we might add a third shaft, applying the same multiplier to the phase of the first shaft as we did for the second shaft, and be confident that the second and third shafts would always be perfectly synchronized, rotating in lockstep.

A cycle is a cycle, whether it's a sine wave or a rotating shaft or the interplay of the tilt of Earth's rotational axis with its movement around the sun, creating seasons.

Phase is what portion of the next full cycle has been completed, and phase advancement is the rate of change of the phase, change/time. For a shaft, phase advancement is how fast it is turning. For a sound, phase advancement is its frequency, its pitch. For Earth's seasons, phase advancement is how quickly one passes into the next.

If you were confused before, I hope that you are now at least less confused.

Tuesday, January 22, 2019

From Harmonic Structure to HCF to Sample Value, Part 3: Clarifying Terminology

This is very much a work in progress. No doubt the list will grow over time, as inspiration strikes and I have time to give to it. Some items link to Wikipedia (or other) articles, and some of those might not be included except that the articles they link to are so well done and include relevant material.

Array
A common way of structuring data, a list of items, usually all of the same type.
Big O notation
A standard method of expressing the computational complexity of an algorithm.
ADSR Envelope
Attack: the initial, usually abrupt escalation of volume at the beginning of a note.
Decay: the rapid loss of some of that volume immediately following the attack phase.
Sustain: a period of more stable volume following the decay phase.
Release: the final attenuation of volume to zero.
Anchor
My name for an intermediary object used to establish the frequencies of the fundamentals of the harmonic series composing a harmonic structure, and the frequency of their Highest Common Fundamental. The frequency of the Anchor is specified by the combination of two factors multiplied together, a scalar and an integer ratio.
Base Frequency
My name for an intermediary object which may be used in conjunction with the Anchor, providing the scalar factor.
Beat Frequency
A periodic variation in volume at a rate that is the difference between the frequencies of two simultaneous tones.
C-family Programming Languages
For the present purpose, C, C++, and Objective-C.
Callback
Code you provide to a framework which it calls when the conditions are right or when the time comes.
CD Quality
Two channels of 16-bit integer values at 44100 samples per channel per second.
Consonance
A quality of "simultaneous or successive sounds...associated with sweetness, pleasantness, and acceptability" best exemplified by chords composed of frequencies all related by ratios of small integers.
Consonant Transposition
Moving a harmonic structure up/down-scale as a unit, by an integer-ratio factor.
CPU Cycle
Not exactly a precise unit of measure, because various instructions take differing amounts of time to complete, because multiple instructions may be 'in-flight' simultaneously, and because it is becoming increasingly common to offload much of the work to coprocessors better adapted for particular classes of algorithms. Even so, it still works as a rough measure of computational effort.
Cycle
One repetition of a repeating pattern or event.
Cycles per Second
The number of repetitions of a repeating pattern or event with each passing second.
Digital Audio
The encoding of audio signals into digital form (or their synthesis in that form), subsequent processing, and decoding to analog signals to drive speakers.
Double Precision
A floating point number with relatively high precision, usually occupying 64 bits.
Floating Point Number
A means of expressing very large, very small, and fractional values.
Frequency
The rate of repetition of a repeating pattern or event; for sound usually expressed in cycles per second (Hertz or Hz).
Fundamental
The lowest member of a harmonic series, every other member of the series being an integer multiple of the fundamental.
Harmonic
A member of a harmonic series, an integer multiple of the fundamental.
Harmonic Number
An integer representing both the factor by which the frequency of the fundamental of a harmonic series is multiplied to produce the frequency of a particular harmonic and the position of that harmonic within the series, where the fundamental itself is the first harmonic.
Harmonic Series
A sequence of integer multiples of a fundamental, of a fundamental frequency in the context of sound.
Harmonic Structure
Two or more harmonic series the fundamentals of which are related by integer ratios, having members with the same frequency at different harmonic numbers (although these may occur at harmonic numbers too high for inclusion in a given implementation).
Hertz (Hz)
Cycles per second.
Highest Common Fundamental (HCF)
The highest frequency which can serve as the fundamental of a harmonic series including every member of every harmonic series constituting a harmonic structure.
Index (plural: Indices)
A means of specifying a particular member of an array.
Integer
A whole number: ..., -3, -2, -1, 0, 1, 2, 3, ...
Integer Ratio
A ratio in which both the numerator and denominator are positive integers. In the context of ratio-based music, ratios composed of small integers are strongly preferred.
Intensity
An abstract representation of volume, which may or may not scale linearly.
Inverse (multiplicative)
The result of reversing the numerator and denominator of a ratio.
Modulo Division
Extraction of the remainder from a division, as opposed to its truncation or expression as a fractional result.
Note
An instance of a tone, generated either programmatically or in response to a user event.
Phase
The state of completion of the current cycle of a repeating pattern or event.
Phase Advancement
The amount by which the phase changes between one point in time (one sample) and the next.
Pi (𝜋)
The ratio between the circumference and the diameter of a circle.
Pitch
Used interchangeably with frequency, but occasionally with the suggestion of subjectivity.
Radian (rad)
The angle traversed by wrapping the radius of a circle around its circumference; commonly used as the unit for an argument in functions that calculate trigonometric values.
Ratio (fraction)
A proportionality between two quantities, calculated by dividing one (the numerator or dividend) by the other (the denominator or divisor), using a variation on division that preserves any remainder as a fractional component of the result, for example a quotient of type Double.
Real-time
Any computational context where both the initiation and completion of a sequence of operations are time-constrained to the extent that efficiency becomes a high priority.
Sample
A single value, representing a single instant, in a sequence of values composing a digital audio signal.
Sample Rate
The number of samples per second composing a digital audio signal.
Secondary Harmonics
The harmonics of a member of a harmonic structure.
Sine
A repeating trigonometric function.
Sine Wave
A graph of the sine function, and, by analogy, any phenomenon having a similar pattern, like sound.
Sound
The sensory experience of a sound wave.
Sound Wave
Propagating variations in air pressure, or a graph of those variations.
Table
An ordered list of values of the same type, frequently implemented as an array.
Tone
Used interchangeably with frequency, but occasionally with the implication of a voice being applied to that frequency.
Truncation
Discarding the fractional portion of a floating point value, as when performing conversion to an integer. Also discarding the remainder in integer division.
Unit Circle
A circle with a radius of 1.0, frequently centered on the origin of a two-dimensional coordinate system (x = 0.0 and y = 0.0); the foundational concept for much/most of trigonometry.
Unsigned Integer
An integer with no sign bit, representing a value that is greater than or equal to zero.
Voice
Any attributes in the synthesis of a note other than its basic frequency and the overall volume, for example the ADSR Envelope or emphasis on different harmonics as the note progresses.
Zero-based Indexing
The first element of an array has index 0.

Feel free to comment with suggestions: terms to include, definitions, or disagreements with a definition I've supplied. If I use a definition that you've supplied, I'll provide attribution by linking to the comment, unless you specify that I should not do so.

Sunday, January 20, 2019

From Harmonic Structure to HCF to Sample Value, Part 2: Multiples of Sine Phase Advancement per Time

Beginning in Part 5 of the previous series, I've already gone into some detail regarding what I've termed the Highest Common Fundamental (HCF). I may revisit this, but that existing explanation seems adequate for the present purpose.

The main reason for caring about the HCF, perhaps the only reason, is that it can be used to generate any tone in the harmonic structure associated with it. To achieve this, some conceptual agility is required.

The first step is to determine the position of the HCF relative to some reference which is generally stable with regard to the harmonic structure (the Anchor), expressed as an integer ratio, and to use that ratio to determine its frequency. Any change to the structure will necessitate recalculation of this ratio and the resulting frequency.

Next, that frequency is recast as a rate of sine phase advancement. The units for this are the same as for frequency, and, as mentioned in the previous installment, there are various ways of expressing this:

  • cycles per second (Hz)
  • cycles per sample
  • radians per second
  • radians per sample
  • sine table indices per second
  • sine table indices per sample

The default choice for specifying the frequency of the HCF is cycles per second (Hz), but those may not be the most appropriate units for specifying the HCF's rate of sine phase advancement. Let's take a closer look at how we'll be using that quantity.

When sound generation starts, we'll be setting the phase of the HCF in motion. For each sample, it will be advanced by an amount determined by the frequency. If that amount is expressed 'per sample' rather than 'per second' the advancement can be a simple addition, with a check for exceeding (>=) 1.0 cycles, 2𝜋 radians, or the number of elements in the sine table (and, if that check returns true, subtracting 1.0 cycles, 2𝜋 radians, or the number of elements in the sine table).

Since we'll be using the phase on a 'per sample' basis, let's remove the 'per second' options from the list, leaving us with:

  • cycles per sample
  • radians per sample
  • sine table indices per sample

To produce the contribution of a particular harmonic to a single sample, we'll multiply the phase (cycles, radians, or sine table indices) of the HCF for that sample by the harmonic number (in terms of the HCF) of the harmonic we want to generate, extract from that product just the portion by which it exceeds the nearest multiple of 1.0 cycles, 2𝜋 radians, or the number of elements in the sine table (modulo division, or the equivalent), and translate that into an index into the sine table to retrieve a sine value.

For phase expressed in cycles, instead of using modulo division, from the product of the first step above we can simply extract the fractional portion (x - trunc(x)), multiply that by the number of elements in the sine table, and truncate that result to produce a usable index.
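
In Swift, the cycles path comes down to just a few fast operations (the phase and table size here are made-up values):

    let tableSize = 65_536
    let hcfPhase = 0.7342                  // HCF phase for this sample, in cycles
    let harmonicNumber = 12.0              // member's harmonic number, in terms of the HCF
    let product = hcfPhase * harmonicNumber
    let fraction = product - product.rounded(.down)   // x - trunc(x)
    let index = Int(fraction * Double(tableSize))     // truncation yields a usable index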

For phase expressed in sine table indices, modulo division by the number of elements in the sine table is necessary, but once that's done a single truncation is all that's required to produce an index for table lookup.

Phase expressed in radians has neither of these advantages. It requires both the modulo division and multiplication by a conversion factor, followed by truncation, so let's eliminate it, leaving us with just two choices — cycles or sine table indices.

It comes down to which is more expensive (in terms of CPU cycles): modulo division, or an additional truncation, a subtraction, and a multiplication. That seems like a pretty easy call; modulo division is probably several times more expensive than the combination of three fast operations. This might seem trivial, but if you want to be able to generate multiple simultaneous tones, each composed of multiple secondary harmonics, 44100 times per second, wringing out those extra CPU cycles becomes important.

So, the winner is phase expressed in cycles and phase advancement in cycles per sample.

Now that we have our units nailed down, let's make another pass through the context and the process of arriving at sample values. The Anchor is like a handle, a convenient point of reference which is nominally stable with regard to the Harmonic Structure, at least between changes to that structure. The Highest Common Fundamental (HCF) is a downward projection of the structure; it cannot be higher than the lowest fundamental of a harmonic series included in the structure, and would typically be even lower, very possibly subsonic. While its position is dictated by the structure, the HCF is defined in terms of the Anchor, by means of an integer ratio, which is used to determine its frequency, in cycles per second (Hz). That frequency is then re-expressed in terms of cycles per sample (simply divide by the sample rate in samples/second), which are also appropriate units for per sample phase advancement.

Everything up to this point is only done once, unless the harmonic structure itself is edited, in which case it is done again. What we now have is an HCF defining a harmonic series which includes every member of the harmonic structure, including all of their secondary harmonics, and a rate of phase advancement for the HCF. We could, at this point, calculate rates of phase advancement for each member of the structure and launch separate phase tracking for each in response to user events, but this would result in entirely random interference patterns. So, instead, we will launch phase tracking for the HCF alone, and calculate phase alignment on a per sample basis for each member of the structure which is currently participating in sound generation.

Because we have chosen to express phase (advancement) in cycles (per sample), this calculation is as simple as multiplying the phase of the HCF for the current sample by the harmonic number of the member (in the harmonic series defined by the HCF), and keeping only the fractional portion of the resulting value, the part after the decimal point (x - trunc(x)). That fractional portion of the product is multiplied by the number of elements in the sine table, and the result of that converted to an integer using a truncating initializer.

This index is then used to retrieve a sine value from the table, which is then multiplied by a volume factor (calculated separately) for that member and the current sample, and these values for each of the currently participating members are added together to produce the overall sample value.
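
Pulled together, the per-sample calculation might look something like this in Swift; a sketch with hypothetical names, not the final design:

    struct Member {
        let harmonicNumber: Double   // in the series defined by the HCF
        var volume: Double           // volume factor for the current sample
    }

    func sampleValue(hcfPhase: Double,        // in cycles
                     members: [Member],
                     sineTable: [Double]) -> Double {
        let tableSize = Double(sineTable.count)
        var sample = 0.0
        for member in members {
            let product = hcfPhase * member.harmonicNumber
            let fraction = product - product.rounded(.down)   // x - trunc(x)
            let index = Int(fraction * tableSize)             // truncating conversion
            sample += sineTable[index] * member.volume
        }
        return sample
    }

A real implementation would run this inside the real-time audio callback, subject to the constraints discussed next.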

Note that the list of currently participating members and the volume factor for each must be maintained on a per-sample (or at the very least per-callback) basis, and must either be copied into the real-time context or protected by some simple locking mechanism, so that modifications based on user events wait until the real-time callback is done with them. Even this is probably best done in C or C++, with your Swift code passing in user events by calling C/C++ functions.

For best effect, you'll probably want to either insert an equalizer downstream or incorporate the function of an equalizer into the calculation of volume factors. The latter approach seems preferable, since it removes some load from the real-time pipeline, but equalizers and UI to match are readily available plug-ins, so that might be one optimization too many, at least initially.

That's it in a nutshell, although I may have more to say about specifics as I get further into it myself.

Thursday, January 17, 2019

From Harmonic Structure to HCF to Sample Value, Part 1: Laying the Foundation

I've glossed over this subject previously, but here I'll go into it with greater care, and in greater detail. The context for all that follows is computer software which generates sound (a sequence of sample values) on-the-fly, constrained both by the need to make them available within the time allowed and by the need to minimize the latency experienced by the user of the software, the delay in response to user actions.

So, what is pitch?

My take is that it's very nearly interchangeable with frequency, if perhaps slightly more subjective.

So what is frequency?

Frequency is the rate at which some specific type of event occurs, the number of events over a given time span, or events per unit time. The beating of your heart, for example, would ordinarily be measured in beats per minute. The speed of your car would (in the U.S.) be measured in miles per hour, and the speed of its engine would be measured in revolutions (of the crankshaft) per minute.

Pitch, the frequency of a sound, is typically measured in cycles per second (Hertz, or Hz in abbreviated form). A cycle is a single unit in the repeating pattern of a sine wave, a construct from trigonometry that provides a decent approximation for the manner in which sound is transmitted by successive waves of higher and lower pressure passing through air.

Trigonometry is based around the idea of a unit circle, a circle with a radius of exactly 1. We commonly think of circles as being divided into 360°, but in trigonometry it is more common to express the magnitude of an angle, arc (a partial circle), or rotation in terms of radians. A radian is the same length as the radius, but wrapped around the perimeter of the circle. A unit circle has a circumference of 2𝜋, or about 6.283185307179586 radians. (Radians are also commonly used in standard functions that compute sine values.)

One cycle of a sine wave is analogous to one complete rotation (2𝜋 radians) of a unit circle. In fact, if you were to roll a unit circle along a horizontal line and track the vertical displacement of some point on that circle, tracing that displacement onto a vertical line moving with the center of the circle, the result would be a sine wave. If you can deal with the 3-dimensional projection, the following animated GIF is an even better visualization. (source: Wikipedia)

What is actually being traced in this animation is a cosine, but the shape is the same as a sine curve.

What all of the foregoing is leading up to is the point that, even in the context of sound, there are other valid measurements of frequency beside cycles per second. We might just as well express frequency in terms of radians per second, if there were any advantage to doing so, and there are even more options.

When dividing one cycle of a sine wave into smaller segments, with the intention of precomputing sine values at evenly spaced intervals and using these to populate an array for later access by means of index values (to avoid the performance hit of having to compute sine values on-the-fly), the number of segments used, which will also be the size of the array, is almost completely arbitrary, at least above a lower threshold at which the division into segments is fine-grained enough to produce acceptable fidelity. One might use 360 segments, or 44,100 segments, the same as the number of samples per second in CD-quality audio. Two other options, 256 and 65536 (2^8 and 2^16 respectively, where '^' is an exponentiation operator), are suggested by potential performance advantages around particular integer math operations. The only downsides to using more, smaller segments are the amount of fast memory consumed by the array, since it must remain in memory whenever sound generation is running, and the time and computational effort (battery power) needed to create the array, if you choose to remove it from memory whenever sound generation ceases.

From here on I will begin using 'array' and 'table' more or less interchangeably. Conceptually, the collection of sine values constitutes a table, but it is necessarily implemented as an array. In this context, both refer to a sequence of index-accessible sine values beginning with sin(0.0) and progressing at even intervals up to but not including sin(2𝜋).
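
Populating such an array in Swift is nearly a one-liner (the size shown is one of the options mentioned above):

    import Foundation

    let tableSize = 65_536
    let sineTable: [Double] = (0..<tableSize).map { i in
        sin(2.0 * Double.pi * Double(i) / Double(tableSize))
    }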

As with various ways of delineating events to be timed, time need not be measured in seconds. Another altogether valid unit of time is the interval between successive audio samples, 1/44100 second in the case of CD-quality audio.

Taken together, the above offers us six different ways of expressing frequency, only one of which is cycles per second. Remember that a cycle is one complete sine wave, that a radian is equal to 1/(2𝜋) of a cycle (about 0.159154943091895 cycles), and that the magnitude of sine table indices as a measure of a fraction of a cycle depends upon the size of the array holding the sine values.

  • cycles per second (Hz)
  • cycles per sample
  • radians per second
  • radians per sample
  • sine table indices per second
  • sine table indices per sample
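
For the record, converting a frequency in Hz into each of the other five expressions is simple arithmetic (the sample rate here is CD-quality; the table size is hypothetical):

    let sampleRate = 44_100.0
    let tableSize = 65_536.0
    let hz = 440.0                                        // cycles per second
    let cyclesPerSample = hz / sampleRate
    let radiansPerSecond = hz * 2.0 * Double.pi
    let radiansPerSample = radiansPerSecond / sampleRate
    let indicesPerSecond = hz * tableSize
    let indicesPerSample = indicesPerSecond / sampleRate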

Which of these is chosen has implications for the complexity and performance of the algorithms involved, which will be the subject of the next installment.