Saturday, May 11, 2019

A Larger Vision: one piece falls into place

Over the past couple months the scope of this project has expanded rather suddenly, from one tightly focused on enabling music based on harmonics (also representable as integer-ratio intervals) to one which is still motivated by the desire to support harmonic tonality, but which also strives to be more generally useful. This means more work, but also something I might actually be proud to release into the world, if and when I get it into a state where it's ready for that.

One result of this reconceptualization is that I'll be repurposing the term "base frequency" from "an intermediary object which may be used in conjunction with the anchor, providing the scalar factor" to something more concrete: the sample rate divided by the size of one or more lookup tables used to represent waveforms that aren't easily calculated on the fly, for example sine waves. As such it will be a minor detail of the implementation, not something user-facing, except insofar as the user might be a programmer working with a framework, if that turns out to be the direction in which the project evolves.
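To make the repurposed term concrete, here is a minimal sketch in C. The function name and the 4096-entry table size used in the note below are assumptions for illustration, not settled details of the project. It computes the frequency that results from stepping through a lookup table one entry per sample.

```c
/* Hypothetical helper (name is illustrative only): the frequency produced
   by reading a lookup table of table_size entries at one entry per sample. */
double table_base_frequency_hz(double sample_rate, unsigned table_size)
{
    return sample_rate / (double)table_size;
}
```

With a sample rate of 44100 and an assumed table of 4096 entries, this works out to roughly 10.77 Hz.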

Wednesday, April 24, 2019

Moving Targets

I've been letting this project steep on the back burner while firming up my understanding of the basics of the Swift programming language, which I will be using, likely in combination with C for the most demanding real-time code. This has been a propitious pause, as it has surfaced rather gaping oversights in how I've thought about what I've set out to do. What follows is the current state of my evolving understanding and intention.

Caveat: My custom terminology is also still in flux, and usage going forward may not correlate exactly with what came before. I will endeavor to nail down this slippery business sooner rather than later.

Most fundamentally, while making harmonic-based melody more accessible is the primary motivation driving my interest in this project, baking that into the design in a form that makes working with or folding in other tonal systems unnecessarily difficult would be a mistake. This is easily accommodated by defining the frequencies of available tones in terms of floating point numbers rather than integers. To keep compound error to a minimum, these should be double precision (64 bit).

Since, as previously mentioned, the simplest way to calculate sine table indices begins with tracking the per-sample phase of a 1.0 Hz base frequency, there no longer seems to be a clear purpose for the HCF (Highest Common Fundamental). However, I'm not confident this concept won't still prove valuable, so let's put it on the shelf for the time being. If it comes back off that shelf, it might well be under another, hopefully less clumsy name.

If tones can be specified simply in terms of their first-harmonic frequencies in Hz, expressed as double precision floating point numbers, rudimentary support for pitch bending and sliding becomes a simple matter of respecifying that first-harmonic frequency on a per-sample basis. I say 'rudimentary' because I suspect providing such support while avoiding artifacts will turn out to be more complicated than this.

Next there's the matter of the phases of overtones not necessarily being perfectly aligned with (typically trailing) the phase of a tone's first harmonic. For the moment let's call this overtone offset, since accommodating this can be as simple as adding an offset to the per-sample phase calculated for each overtone. That offset might be calculated as a fraction of the first harmonic's cycle time, and applied before conversion to a sine table index, although moving at least part of that calculation outside of the real-time context and passing the result in as a simple quantity would make sense.

Given overtones with phase offsets, the question arises whether we might want the option of defining tones in terms of multiple instances of overtones, each with its own per-sample offset and amplitude. Since this could so complicate real-time calculations that polyphony becomes problematic, I'm inclined to also put this idea on the shelf, until I've given more thought to the possibility of voices with some/all of the complicated rendering having been precomputed.

The main obstacle I see in the path of precomputation is the aspiration to make the sound output responsive to factors like velocity, pressure, up/down-scale movement, and time-to-release, which can't be known in advance. As a workaround, it should at least be possible to capture these while producing a less nuanced rendering in real time, then apply them after the fact, editing as needed to achieve the desired effect.

In any case, multiple overlapping notes using the same tone should be available, each with its own set of overtones and their variable attributes, with offsets also optionally applied to their first harmonics, for the purpose of generating echoes if nothing else. Considering this, providing multiple per-note instances of overtones might simply be needless complication.

Finally, because there's a temptation to withhold functionality from the real-time context in order to make sure rendering can happen in a timely manner, this project really wants to split into two components (modes), one (stage) focused on real-time performance, and the other (studio) focused on providing a full set of features. The communication between these two modes is a sort of bidirectional funnel, and needs to be well defined. An advantage of this requirement is that it is an obvious place to look for an organizing principle, around which to build out the rest of the model and basic functionality.

As such, it may also prove a suitable focal point for any open source initiative, allowing 'stage' and 'studio' applications from different vendors to interoperate. But I'm really getting way ahead of myself in even mentioning that. First I need to build out my own project, then maybe I can think about turning it into an open-source project.

Addendum (25April2019): This is not even close to being a final decision, but I'm thinking it makes the most sense to specify, for any given note, the per-sample frequency, amplitude, and phase offset of the first harmonic, and then to specify the same attributes for higher harmonics (overtones) relative to that, although, for the sake of efficiency, it will be desirable to precompute as much of this as can be without sacrificing responsiveness to the performer.

Saturday, February 16, 2019

The Elements of Voice

In a previous post on this blog, I defined voice as "Any attributes in the synthesis of a note other than its basic frequency and the overall volume, for example the ADSR Envelope or emphasis on different harmonics as the note progresses." You can also find a brief explanation of the ADSR Envelope in that same post.

In RatioKey 1.1 (removed from the App Store more than two years ago), I provided the means to edit the duration of each phase of the ADSR envelope, as well as the volume at the point where each phase transitions into the next. This helped make up for that app only being capable of generating a single simple sine wave at a time, with each new note interrupting the previous note, and no support at all for overtones.

Even back in 2010, while working on that app, I wanted to be able to synthesize more interesting voices, composed of harmonics (what I'd now term secondary harmonics), with the intensity of each varying independently over time, and to craft a simple interface for editing such voices, but at that time I had no clear idea how to generate multiple simultaneous notes, much less how to build them from harmonic components.

Over the intervening years, I've ferreted out solutions for various aspects of this problem space, but it wasn't until I'd experienced the absence of phase alignment, motivating a reevaluation of my approach, which led to the idea of 1) determining the Highest Common Fundamental (HCF), 2) tracking its phase on a per-sample basis, and 3) using that phase to generate indices for sine table lookup on a per-sample basis for members of a harmonic structure, that I felt confident I could actually do it. That was, for me, the key missing piece to the puzzle.

In the process of fleshing out that idea, I had another eureka moment when I realized that this approach would not only facilitate the synthesis of any member of a harmonic structure while guaranteeing phase alignment, but it would also enable per-sample modulation of the harmonics of those structure members (secondary harmonics) by the very same method, since they are also part of the harmonic structure.

Given the ability to independently control the intensity of secondary harmonics over time, my sense is that this should supersede the ADSR paradigm. Yes, you might still want to ramp up the volume very quickly, drain some of it back off almost as quickly, then hold it nearly steady for a while, before tapering off to silence, but this is just as easily achieved by controlling the intensity of component harmonics as by controlling that of the basic pitch.

Per-sample control of harmonic intensities, translated into physical terms, equates to moving acoustic energy around among harmonics, much as we do with our tongues and the way we shape our mouths while speaking. This might be approached with the discipline of an engineer applying the conservation of energy, or utterly fancifully, or anywhere in between. It could be used to mimic familiar sounds, or to create sounds even a veteran sound collector or foley artist would be hard pressed to find in the wild or produce physically.

There are also elements of voice that this approach, as currently conceived, does not support, notably any sort of pitch bending or sliding, except as these might be applied to a harmonic structure as a unit, rather than to individual notes. In the current version, all members of the harmonic structure, including the secondary harmonics, are discrete pitches.

(Yes, it should be possible to support pitch bending and sliding by allowing variable factors relating the HCF to parts of the structure. Strictly speaking, in that event, it would at least intermittently cease to be a harmonic structure. This may be a case where accommodation is more important than conceptual cohesion, and worth the added complexity. Further contemplation is indicated.)

Tuesday, January 29, 2019

From Harmonic Structure to HCF to Sample Value, Part 5: Focusing on Pitch Specification and Alteration

Up to this point I've treated the Anchor (and Base Frequency, possibly not mentioned here since 2010) as more-or-less integral aspects of a Harmonic Structure, but really the Anchor only exists to provide a couple of services.

First, and most obviously, the Anchor is a point of reference for specifying the pitches of the fundamentals of the harmonic series composing the structure, and also of the HCF (Highest Common Fundamental). For this purpose it is enough that the Anchor's own frequency be unambiguous. Tuning would simply involve incremental alterations to that frequency.

The other service the Anchor provides is the ability to move a harmonic structure up/down-scale as a unit, by integer-ratio factors. This is what I previously referred to as "Consonant Transposition" on the theory that such a change is likely to be more consonant than using an irrational factor.

There could be other ways to provide these services, of course, including the option of separating the scalar component of the definition of the Anchor's frequency from the integer-ratio component, by bringing back the concept of a Base Frequency.

The Base Frequency would be specified simply using a Double (double precision floating point value), which you could think of as a multiplication factor that is always applied to 1.0 Hertz.

The Anchor would then be specified as an integer-ratio multiple of the Base Frequency.

Tuning would be accomplished by altering the factor relating the Base Frequency to 1.0 Hz, and consonant transposition would be accomplished by altering the ratio relating the Anchor to the Base Frequency.
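A minimal sketch of that separation, with hypothetical type and function names: the Base Frequency is a plain double in Hz, and the Anchor is derived from it by an integer ratio.

```c
/* Hypothetical type: an integer ratio with positive numerator and denominator. */
typedef struct {
    unsigned numerator;
    unsigned denominator;
} IntegerRatio;

/* The Anchor's frequency as an integer-ratio multiple of the Base Frequency. */
double anchor_frequency_hz(double base_hz, IntegerRatio ratio)
{
    return base_hz * (double)ratio.numerator / (double)ratio.denominator;
}
```

In this sketch, tuning means changing base_hz (say from 220.0 to 221.0) while leaving the ratio alone, and consonant transposition means changing the ratio (say from 3/2 to 4/3) while leaving base_hz alone.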

This seems a little cleaner to me than combining a Double and an integer ratio into a 'dual-component' type, but your mileage may vary.

In any case, these details need not be exposed to the user! What matters is that the pitches of the fundamentals of the series composing the harmonic structure are tunable as a unit and editable by integer-ratio factors, collectively as well as individually, and that those pitches as well as that of the HCF are clearly specified.

Sunday, January 27, 2019

From Harmonic Structure to HCF to Sample Value, Part 4: Focusing on Phase & Phase Advancement

So maybe you're a little hazy on what is meant by phase, even more so regarding phase advancement, and not at all convinced I know what I'm talking about in suggesting that repeatedly multiplying the phase of a lower frequency by a positive integer can be used to generate a higher frequency. Like, how does that work?

Phase relates back to the sine wave, which itself relates back to the unit circle, but this is beginning to feel like a circular definition. What does it really mean?

Let's approach this from a different direction, using an analogy. Say you have a shaft, rotating at one degree per second. It's going to take that shaft 360 seconds to complete one rotation. Now say you have another shaft, the position of which is updated once per second according to the rule that its new position should be twice that of the first shaft. If the first shaft has moved 10 degrees, the second shaft will have moved 20 degrees. If the first shaft has moved 50 degrees, the second shaft will have moved 100.

But what happens when the first shaft has moved 180 degrees and the second shaft has moved 360 degrees? The second, faster shaft is already back where it started while the first shaft is still only halfway around. Fine, no problem, it's free to keep right on moving, starting a second rotation while the first shaft finishes its first, but because doubling the number of degrees the first shaft has turned will now result in a number larger than 360, we'll need to remove the first 360 degrees to bring the result into a range we can work with. So, essentially, when it gets to 360 degrees the second shaft resets to 0 degrees and keeps on moving.

Likewise, when the first shaft gets to 360 degrees, it also resets to zero and keeps moving.

But what if for every degree the first shaft moves the second shaft moves 5 degrees? The same principle applies, but because we're getting the position of the second shaft by multiplying the position of the first shaft by 5, it won't be enough to subtract 360 degrees after its first rotation; we'll need something that will work no matter how many rotations it has already completed. That something is modulo division.

In this example, after multiplying the position of the first shaft by 5 we'll take the result of that and apply modulo 360, to remove all of the full turns and leave only the amount by which the second shaft's new position exceeds a full turn. We could use the same approach for the first shaft, but in that case it's simpler to just subtract 360 degrees every time it completes a full rotation.

You may recall, in a previous installment I said that if you measure phase (rotation) in cycles, modulo division isn't necessary. This is because if we were to use modulo division in that case, it would be modulo 1.0, which is exactly equivalent to simply keeping the fractional portion of a decimal number and discarding everything to the left of the decimal point.

So, to ease back into more standard terminology, phase equates to how much the rotation of a shaft, at any given point in time, exceeds an indeterminate number of complete rotations: how far beyond the start/end point of a cycle it has progressed. Phase advancement equates to how much rotation occurs between one point in time and the next, one second and the next in the above example. It is a rate of change.

Note that in the above example we only applied phase advancement to the first shaft, to determine its phase at the next point in time, and used that to calculate the phase at the same point in time for the second shaft. The rate of phase advancement for the second shaft is only implied, never explicit.

Using this approach we might add a third shaft, applying the same multiplier to the phase of the first shaft as we did for the second shaft, and be confident that the second and third shafts would always be perfectly synchronized, rotating in lockstep.

A cycle is a cycle, whether it's a sine wave or a rotating shaft or the interplay of the tilt of Earth's rotational axis with its movement around the sun, creating seasons.

Phase is what portion of the next full cycle has been completed, and phase advancement is the rate of change of the phase, change/time. For a shaft, phase advancement is how fast it is turning. For a sound, phase advancement is its frequency, its pitch. For Earth's seasons, phase advancement is how quickly one passes into the next.

If you were confused before, I hope that you are now at least less confused.

Tuesday, January 22, 2019

From Harmonic Structure to HCF to Sample Value, Part 3: Clarifying Terminology

This is very much a work in progress. No doubt the list will grow over time, as inspiration strikes and I have time to give to it. Some items link to Wikipedia (or other) articles, and some of those might not be included except that the articles they link to are so well done and include relevant material.

Array
A common way of structuring data, a list of items, usually all of the same type.
Big O notation
A standard method of expressing the computational complexity of an algorithm.
ADSR Envelope
Attack: the initial, usually abrupt escalation of volume at the beginning of a note.
Decay: the rapid loss of some of that volume immediately following the attack phase.
Sustain: a period of more stable volume following the decay phase.
Release: the final attenuation of volume to zero.
Anchor
My name for an intermediary object used to establish the frequencies of the fundamentals of the harmonic series composing a harmonic structure, and the frequency of their Highest Common Fundamental. The frequency of the Anchor is specified by the combination of two factors multiplied together, a scalar and an integer ratio.
Base Frequency
My name for an intermediary object which may be used in conjunction with the Anchor, providing the scalar factor.
Beat Frequency
A periodic variation in volume at a rate that is the difference between the frequencies of two simultaneous tones.
C-family Programming Languages
For the present purpose, C, C++, and Objective-C.
Callback
Code you provide to a framework which it calls when the conditions are right or when the time comes.
CD Quality
Two channels of 16-bit integer values at 44100 samples per channel per second.
Consonance
A quality of "simultaneous or successive sounds...associated with sweetness, pleasantness, and acceptability" best exemplified by chords composed of frequencies all related by ratios of small integers.
Consonant Transposition
Moving a harmonic structure up/down-scale as a unit, by an integer-ratio factor.
CPU Cycle
Not exactly a precise unit of measure, because various instructions take differing amounts of time to complete, because multiple instructions may be 'in-flight' simultaneously, and because it is becoming increasingly common to offload much of the work to coprocessors better adapted for particular classes of algorithms. Even so, it still works as a rough measure of computational effort.
Cycle
One repetition of a repeating pattern or event.
Cycles per Second
The number of repetitions of a repeating pattern or event with each passing second.
Digital Audio
The encoding of audio signals into or their synthesis in digital form, subsequent processing, and decoding to analog signals to drive speakers.
Double Precision
A floating point number with relatively high precision, usually occupying 64 bits.
Floating Point Number
A means of expressing very large, very small, and fractional values.
Frequency
The rate of repetition of a repeating pattern or event; for sound usually expressed in cycles per second (Hertz or Hz).
Fundamental
The lowest member of a harmonic series, every other member of the series being an integer multiple of the fundamental.
Harmonic
A member of a harmonic series, an integer multiple of the fundamental.
Harmonic Number
An integer representing both the factor by which the frequency of the fundamental of a harmonic series is multiplied to produce the frequency of a particular harmonic and the position of that harmonic within the series, where the fundamental itself is the first harmonic.
Harmonic Series
A sequence of integer multiples of a fundamental, of a fundamental frequency in the context of sound.
Harmonic Structure
Two or more harmonic series the fundamentals of which are related by integer ratios, having members with the same frequency at different harmonic numbers (although these may occur at harmonic numbers too high for inclusion in a given implementation).
Hertz (Hz)
Cycles per second.
Highest Common Fundamental (HCF)
The highest frequency which can serve as the fundamental of a harmonic series including every member of every harmonic series constituting a harmonic structure.
Index (plural: Indices)
A means of specifying a particular member of an array.
Integer
A whole number: ..., -3, -2, -1, 0, 1, 2, 3, ...
Integer Ratio
A ratio in which both the numerator and denominator are positive integers. In the context of ratio-based music, ratios composed of small integers are strongly preferred.
Intensity
An abstract representation of volume, which may or may not scale linearly.
Inverse (multiplicative)
The result of reversing the numerator and denominator of a ratio.
Modulo Division
Extraction of the remainder from a division, as opposed to its truncation or expression as a fractional result.
Note
An instance of a tone, generated either programmatically or in response to a user event.
Phase
The state of completion of the current cycle of a repeating pattern or event.
Phase Advancement
The amount by which the phase changes between one point in time (one sample) and the next.
Pi (𝜋)
The ratio between the circumference and the diameter of a circle.
Pitch
Used interchangeably with frequency, but occasionally with the suggestion of subjectivity.
Radian (rad)
The angle traversed by wrapping the radius of a circle around its circumference; commonly used as the unit for an argument in functions that calculate trigonometric values.
Ratio (fraction)
A proportionality between two quantities, calculated by dividing one (the numerator or dividend) by the other (the denominator or divisor), using a variation on division that preserves any remainder as a fractional component of the result, for example a quotient of type Double.
Real-time
Any computational context where both the initiation and completion of a sequence of operations are time-constrained to the extent that efficiency becomes a high priority.
Sample
A single value, representing a single instant, in a sequence of values composing a digital audio signal.
Sample Rate
The number of samples per second composing a digital audio signal.
Secondary Harmonics
The harmonics of a member of a harmonic structure.
Sine
A repeating trigonometric function.
Sine Wave
A graph of the sine function, and, by analogy, any phenomenon having a similar pattern, like sound.
Sound
The sensory experience of a sound wave.
Sound Wave
Propagating variations in air pressure, or a graph of those variations.
Table
An ordered list of values of the same type, frequently implemented as an array.
Tone
Used interchangeably with frequency, but occasionally with the implication of a voice being applied to that frequency.
Truncation
Discarding the fractional portion of a floating point value, as when performing conversion to an integer. Also discarding the remainder in integer division.
Unit Circle
A circle with a radius of 1.0, frequently centered on the origin of a two-dimensional coordinate system (x = 0.0 and y = 0.0); the foundational concept for much/most of trigonometry.
Unsigned Integer
An integer with no sign bit, representing a value that is greater than or equal to zero.
Voice
Any attributes in the synthesis of a note other than its basic frequency and the overall volume, for example the ADSR Envelope or emphasis on different harmonics as the note progresses.
Zero-based Indexing
The first element of an array has index 0.

Feel free to comment with suggestions, terms to include and/or definitions, or if you disagree with a definition I've supplied. If I use a definition that you've supplied, I'll provide attribution by linking to the comment, unless you specify that I should not do so.

Sunday, January 20, 2019

From Harmonic Structure to HCF to Sample Value, Part 2: Multiples of Sine Phase Advancement per Time

Beginning in Part 5 of the previous series, I've already gone into some detail regarding what I've termed the Highest Common Fundamental (HCF). I may revisit this, but that existing explanation seems adequate for the present purpose.

The main reason for caring about the HCF, perhaps the only reason, is that it can be used to generate any tone in the harmonic structure associated with it. To achieve this, some conceptual agility is required.

The first step is to determine the position of the HCF relative to some reference which is generally stable with regard to the harmonic structure (the Anchor), expressed as an integer ratio, and to use that ratio to determine its frequency. Any change to the structure will necessitate recalculation of this ratio and the resulting frequency.
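One possible way to find that ratio, which is not necessarily the procedure described in the earlier series, relies on a standard arithmetic fact: if each fundamental is given as a ratio to the Anchor in lowest terms, the largest ratio that divides all of them evenly is the greatest common divisor of the numerators over the least common multiple of the denominators. The type and function names below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical type: a positive integer ratio, assumed to be in lowest terms. */
typedef struct {
    unsigned long numerator;
    unsigned long denominator;
} Ratio;

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static unsigned long lcm_ul(unsigned long a, unsigned long b)
{
    return a / gcd_ul(a, b) * b;
}

/* Ratio of the HCF to the Anchor, given the ratios of the fundamentals to
   the Anchor. Assumes count >= 1 and every input ratio in lowest terms. */
Ratio highest_common_ratio(const Ratio *ratios, size_t count)
{
    Ratio result = ratios[0];
    for (size_t i = 1; i < count; i++) {
        result.numerator = gcd_ul(result.numerator, ratios[i].numerator);
        result.denominator = lcm_ul(result.denominator, ratios[i].denominator);
    }
    return result;
}
```

For fundamentals at 1/1, 5/4, and 3/2 of the Anchor, this yields 1/4, which places those fundamentals at harmonic numbers 4, 5, and 6 of the series defined by the HCF.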

Next, that frequency is recast as a rate of sine phase advancement. The units for this are the same as for frequency, and, as mentioned in the previous installment, there are various ways of expressing this:

  • cycles per second (Hz)
  • cycles per sample
  • radians per second
  • radians per sample
  • sine table indices per second
  • sine table indices per sample

The default choice for specifying the frequency of the HCF is cycles per second (Hz), but those may not be the most appropriate units for specifying the HCF's rate of sine phase advancement. Let's take a closer look at how we'll be using that quantity.

When sound generation starts, we'll be setting the phase of the HCF in motion. For each sample, it will be advanced by an amount determined by the frequency. If that amount is expressed 'per sample' rather than 'per second' the advancement can be a simple addition, with a check for exceeding (>=) 1.0 cycles, 2𝜋 radians, or the number of elements in the sine table (and, if that check returns true, subtracting 1.0 cycles, 2𝜋 radians, or the number of elements in the sine table).
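For phase measured in cycles, the addition and check might be sketched like this (the function name is hypothetical, and the advance per sample is assumed to be less than 1.0 cycle, as it will be for any frequency below the sample rate):

```c
/* Advance the HCF's phase by one sample. Phase is in cycles, in [0.0, 1.0). */
double advance_hcf_phase(double phase_cycles, double advance_per_sample)
{
    phase_cycles += advance_per_sample;
    if (phase_cycles >= 1.0) {
        phase_cycles -= 1.0;  /* completed a cycle: wrap around */
    }
    return phase_cycles;
}
```

The radian and table-index variants would differ only in the constant used for the comparison and subtraction.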

Since we'll be using the phase on a 'per sample' basis, let's remove the 'per second' options from the list, leaving us with:

  • cycles per sample
  • radians per sample
  • sine table indices per sample

To produce the contribution of a particular harmonic to a single sample, we'll multiply the phase (cycles, radians, or sine table indices) of the HCF for that sample by the harmonic number (in terms of the HCF) of the harmonic we want to generate, extract from that product just the portion by which it exceeds the largest multiple of 1.0 cycles, 2𝜋 radians, or the number of elements in the sine table that does not exceed it (modulo division, or the equivalent), and translate that into an index into the sine table to retrieve a sine value.

For phase expressed in cycles, instead of using modulo division, from the product of the first step above we can simply extract the fractional portion (x - trunc(x)), multiply that by the number of elements in the sine table, and truncate that result to produce a usable index.

For phase expressed in sine table indices, modulo division by the number of elements in the sine table is necessary, but once that's done a single truncation is all that's required to produce an index for table lookup.

Phase expressed in radians has neither of these advantages. It requires both the modulo division and multiplication by a conversion factor, followed by truncation, so let's eliminate it, leaving us with just two choices — cycles or sine table indices.

It comes down to which is more expensive (in terms of CPU cycles), modulo division or an additional truncation, a subtraction, and a multiplication. That seems like a pretty easy call: modulo division is probably several times more expensive than the combination of three fast operations. This might seem trivial, but if you want to be able to generate multiple simultaneous tones, each composed of multiple secondary harmonics, 44100 times per second, wringing out those extra CPU cycles becomes important.

So, the winner is phase expressed in cycles and phase advancement in cycles per sample.

Now that we have our units nailed down, let's make another pass through the context and the process of arriving at sample values. The Anchor is like a handle, a convenient point of reference which is nominally stable with regard to the Harmonic Structure, at least between changes to that structure. The Highest Common Fundamental (HCF) is a downward projection of the structure; it cannot be higher than the lowest fundamental of a harmonic series included in the structure, and would typically be even lower, very possibly subsonic. While its position is dictated by the structure, the HCF is defined in terms of the Anchor, by means of an integer ratio, which is used to determine its frequency, in cycles per second (Hz). That frequency is then re-expressed in terms of cycles per sample (simply divide by the sample rate in samples/second), which are also appropriate units for per sample phase advancement.
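The chain from Anchor to per-sample phase advancement can be sketched in a few lines (the function and parameter names are hypothetical):

```c
/* Rate of phase advancement for the HCF, in cycles per sample, given the
   Anchor's frequency in Hz, the integer ratio relating the HCF to the
   Anchor, and the sample rate in samples per second. */
double hcf_advance_per_sample(double anchor_hz,
                              unsigned ratio_numerator,
                              unsigned ratio_denominator,
                              double sample_rate)
{
    double hcf_hz = anchor_hz * (double)ratio_numerator / (double)ratio_denominator;
    return hcf_hz / sample_rate;
}
```

For example, an Anchor at 441 Hz with an HCF ratio of 1/4 gives an HCF of 110.25 Hz, which at 44100 samples per second is 0.0025 cycles per sample.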

Everything up to this point is only done once, unless the harmonic structure itself is edited, in which case it is done again. What we now have is an HCF defining a harmonic series which includes every member of the harmonic structure, including all of their secondary harmonics, and a rate of phase advancement for the HCF. We could, at this point, calculate rates of phase advancement for each member of the structure and launch separate phase tracking for each in response to user events, but this would result in entirely random interference patterns. So, instead, we will launch phase tracking for the HCF alone, and calculate phase alignment on a per sample basis for each member of the structure which is currently participating in sound generation.

Because we have chosen to express phase (advancement) in cycles (per sample), this calculation is as simple as multiplying the phase of the HCF for the current sample by the harmonic number of the member (in the harmonic series defined by the HCF), and keeping only the fractional portion of the resulting value, the part after the decimal point (x - trunc(x)). That fractional portion of the product is multiplied by the number of elements in the sine table, and the result of that is converted to an integer using a truncating initializer.

This index is then used to retrieve a sine value from the table, which is then multiplied by a volume factor (calculated separately) for that member and the current sample, and these values for each of the currently participating members are added together to produce the overall sample value.

Note that the list of currently participating members and the volume factor for each must be maintained on a per-sample (or at the very least per-callback) basis. Either they are copied into the real-time context, or modifications based on user events are held back until the real-time callback is done with them, by means of some simple locking mechanism. Even this is probably best done in C or C++, with your Swift code passing in user events by calling C/C++ functions.

For best effect, you'll probably want to either insert an equalizer downstream or incorporate the function of an equalizer into the calculation of volume factors. The latter approach seems preferable, since it removes some load from the real-time pipeline, but equalizers and UI to match are readily available plug-ins, so that might be one optimization too many, at least initially.

That's it in a nutshell, although I may have more to say about specifics as I get further into it myself.