Tuesday, June 25, 2019

More complications, leading to a potential solution

This evening it occurred to me that varying the pitch of a note, while generating its phase from a multiple of the phase of a base tone, might result in artifacts. I'm not certain of this, and cannot yet articulate why I think it could happen, but it seems at least plausible.

A solution also occurred to me, which is to only use the base tone to generate the initial phase of the note, and from that point on track its phase independently. That thought led to another complication: when you want the varying pitch to come to rest on a specific tone, the phase of the note may not align with a newly generated note of the same frequency.

I first thought about pacing the change in pitch so it would end up phase-aligned on the target frequency. This would work for scripted compositions, but in live performance it isn't possible to know what the target frequency will be until it happens.

So it seems as though a better solution would be to cross-fade from the sliding note to a newly generated note which is stable on the target frequency.

But, if this mechanism (independently tracking the phase of each note, after initiating it using the phase of the base frequency) is in place for notes with varying pitch, why not just use it for all notes, and not have to worry about whether they will remain at a constant pitch?

Applying this technique to all notes would mean that the base tone is only used to initiate new notes, which would mean precision is no longer an issue, so we can dispense with 80-bit floats!

[7/5/19: The thought that set this all in motion, that varying the pitch of a note while generating its phase from a multiple of the phase of a base tone might result in artifacts (noise), remains a matter of conjecture. I haven't yet hit upon a way of determining whether this is an actual concern. However, eliminating the need for 80-bit floats is sufficient motivation to proceed as though it were established fact.]

Wednesday, June 19, 2019

Ground-shifting changes

We've all had a couple weeks to assimilate all that was announced at WWDC, and those who surf the bleeding edge have been very busy getting up to speed and producing blog posts, newsletters, podcasts, and videos paving the way for the rest of us.

Just listing all of the resources already available would be a formidable task, so instead I'll just mention a couple of good starting points.

For anything related to the Swift programming language, the Swift.org website is the center of the universe. What you won't find there is much in the way of links to blogs, newsletters, podcasts, or YouTube channels, even those relating to Swift development.

That gap is nicely filled by Dave Verwer's iOS Dev Directory, which does not include a link to this blog, nor should it!

I don't expect to have much to say here for a few months. In the meantime, you can catch me on Twitter at https://twitter.com/harmonicLattice.

Sunday, June 16, 2019

Navigating a larger problem space

On the same day as my most recent post here, I also began a thread on Twitter, in which I laid out the opportunities and constraints presented by various approaches to generating tones by multiplying the phase of a base tone by frequency ratios.

This took several hours, and I had to finish it the next morning. Nevertheless, except for a minor glitch or two, I think I managed to get it straight, possibly for the first time.

Only generating tones that are all integer multiples of the base tone is significantly simpler, but taking that simple approach precludes the use of any musical practice involving pitch variation — bending, sliding, or vibrato.

For the purpose of producing a fuller sound, more like a physical instrument, the set of pure tones that are all integer multiples of the base tone is just too confining. Unfortunately, the alternative seems to be to use phases that continue to increase indefinitely, tracking them using high precision floating-point numbers to keep it working long enough to be usable. I keep thinking there must be a clever hack that would make this all unnecessary, but so far this has just led me down rabbit holes.

The rabbit holes have become a problem because I cannot hold everything in that Twitter thread in my mind at once; I have to deal with it as I posted it there, in Tweet-sized bites, and have more than once lost track of one detail or another.

If you think of a cycle as being a circle, and phase as being an angle superimposed on that circle, or a position on its circumference, continuously increasing phases can be thought of as wrapping, winding, or coiling around that circle.

The need for high precision arises because this approach involves multiplying the phase of the base tone by a frequency ratio that might have a value as high as 20,000, then discarding everything to the left of the decimal point, leaving only whatever significant figures were to the right of it. As the base tone's phase increases, so too does the result of multiplying it by the frequency ratio, meaning fewer and fewer significant figures remain on the right, and sooner or later there is insufficient precision for the next step: conversion, either into an index for a lookup table or, by means of an algorithm, directly into the magnitude of a sound wave for a particular sample. Using higher-precision (80-bit) floating-point numbers buys time.

This inelegant approach grates on my sensibilities as a programmer, but, short of returning to only trying to produce tones that are integer multiples of the base tone, I haven't yet found any way around it.

Thursday, June 06, 2019

Cognitive paralysis: hopefully temporary

I'm presently doing a pretty good emulation of a robot that's got itself 'trapped' in a corner its programming is inadequate to escape. With any luck, this will pass, but I consider myself fortunate to have recognized the symptoms and desisted from digging myself even deeper into confusion.

Saturday, May 11, 2019

A Larger Vision: one piece falls into place

Over the past couple months the scope of this project has expanded rather suddenly, from one tightly focused on enabling music based on harmonics (also representable as integer-ratio intervals) to one which is still motivated by the desire to support harmonic tonality, but which also strives to be more generally useful. This means more work, but also something I might actually be proud to release into the world, if and when I get it into a state where it's ready for that.

One result of this reconceptualization is that I'll be repurposing the term "base frequency" from "an intermediary object which may be used in conjunction with the anchor, providing the scalar factor" to something more concrete: the sample rate divided by the size of one or more lookup tables used to represent wave forms that aren't easily calculated on the fly, for example sine waves. As such it will be a minor detail of the implementation, not something user-facing, except insofar as the user might be a programmer working with a framework, if that turns out to be the direction the project evolves.
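
For concreteness, with an assumed 44,100 Hz sample rate and a 4,096-entry table (neither figure is settled; both are just for illustration), the repurposed term works out like so:

```c
/* Sketch of the repurposed "base frequency": the sample rate divided by
 * the lookup-table size. The 44,100 Hz rate and 4,096-entry table are
 * assumed values for illustration, not figures from the project. */
double base_frequency(double sampleRate, int tableSize) {
    return sampleRate / (double)tableSize;
}
```

With those numbers it comes to about 10.77 Hz, the frequency at which stepping through the table one entry per sample would complete one cycle.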

(Update, 06June2019: At this time, ALL custom terminology should be considered temporary and subject to redefinition, replacement, or deprecation. If/when this all stabilizes, I'll post an updated lexicon.)

Wednesday, April 24, 2019

Moving Targets

I've been letting this project steep on the back burner while firming up my understanding of the basics of the Swift programming language, which I will be using, likely in combination with C for the most demanding real-time code. This has been a propitious pause, as it has surfaced rather gaping oversights in how I've thought about what I've set out to do. What follows is the current state of my evolving understanding and intention.

Caveat: My custom terminology is also still in flux, and usage going forward may not correlate exactly with what came before. I will endeavor to nail down this slippery business sooner rather than later.

Most fundamentally, while making harmonic-based melody more accessible is the primary motivation driving my interest in this project, baking that into the design in a form that makes working with or folding in other tonal systems unnecessarily difficult would be a mistake. This is easily accommodated by defining the frequencies of available tones in terms of floating point numbers rather than integers. To keep compound error to a minimum, these should be double precision (64 bit).

Since, as previously mentioned, the simplest way to calculate sine table indices begins with tracking the per-sample phase of a 1.0 Hz base frequency, there no longer seems to be a clear purpose for the HCF (Highest Common Fundamental). However, I'm not confident this concept won't still prove valuable, so let's put it on the shelf for the time being. If it comes back off that shelf, it might well be under another, hopefully less clumsy name.

If tones can be specified simply in terms of their first-harmonic frequencies in Hz, expressed as double precision floating point numbers, rudimentary support for pitch bending and sliding becomes a simple matter of respecifying that first-harmonic frequency on a per-sample basis. I say 'rudimentary' because I suspect providing such support while avoiding artifacts will turn out to be more complicated than this.

Next there's the matter of the phases of overtones not necessarily being perfectly aligned with (typically trailing) the phase of a tone's first harmonic. For the moment let's call this overtone offset, since accommodating this can be as simple as adding an offset to the per-sample phase calculated for each overtone. That offset might be calculated as a fraction of the first harmonic's cycle time, and applied before conversion to a sine table index, although moving at least part of that calculation outside of the real-time context and passing the result in as a simple quantity would make sense.

Given overtones with phase offsets, the question arises whether we might want the option of defining tones in terms of multiple instances of overtones, each with its own per-sample offset and amplitude. Since this could so complicate real-time calculations that polyphony becomes problematic, I'm inclined to also put this idea on the shelf, until I've given more thought to the possibility of voices with some/all of the complicated rendering having been precomputed.

The main obstacle I see in the path of precomputation is the aspiration to make the sound output responsive to factors like velocity, pressure, up/down-scale movement, and time-to-release, which can't be known in advance. As a workaround, it should at least be possible to capture these while producing a less nuanced rendering in real time, then apply them after the fact, editing as needed to achieve the desired effect.

In any case, multiple overlapping notes using the same tone should be available, each with its own set of overtones and their variable attributes, with offsets also optionally applied to their first harmonics, for the purpose of generating echoes if nothing else. Considering this, providing multiple per-note instances of overtones might simply be needless complication.

Finally, because there's a temptation to withhold functionality from the real-time context in order to make sure rendering can happen in a timely manner, this project really wants to split into two components (modes), one (stage) focused on real-time performance, and the other (studio) focused on providing a full set of features. The communication between these two modes is a sort of bidirectional funnel, and needs to be well defined. An advantage of this requirement is that it is an obvious place to look for an organizing principle, around which to build out the rest of the model and basic functionality.

As such, it may also prove a suitable focal point for any open source initiative, allowing 'stage' and 'studio' applications from different vendors to interoperate. But I'm really getting way ahead of myself in even mentioning that. First I need to build out my own project, then maybe I can think about turning it into an open-source project.

Addendum (25April2019): This is not even close to being a final decision, but I'm thinking it makes the most sense to specify, for any given note, the per-sample frequency, amplitude, and phase offset of the first harmonic, and then to specify the same attributes for higher harmonics (overtones) relative to that, although, for the sake of efficiency, it will be desirable to precompute as much of this as can be without sacrificing responsiveness to the performer.

Saturday, February 16, 2019

The Elements of Voice

In a previous post on this blog, I defined voice as "Any attributes in the synthesis of a note other than its basic frequency and the overall volume, for example the ADSR Envelope or emphasis on different harmonics as the note progresses." You can also find a brief explanation of the ADSR Envelope in that same post.

In RatioKey 1.1 (removed from the App Store more than two years ago), I provided the means to edit the duration of each phase of the ADSR envelope, as well as the volume at the point where each phase transitions into the next. This helped make up for that app only being capable of generating a single simple sine wave at a time, with each new note interrupting the previous note, and no support at all for overtones.
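
For reference, a piecewise-linear version of that editable envelope might look like the following C sketch. All the numbers and names are illustrative, and there's one simplification worth flagging: a real ADSR sustains until the key is released, whereas this sketch gives the sustain phase a fixed duration, matching RatioKey's edit-the-durations model described above.

```c
/* Sketch of a piecewise-linear ADSR envelope with editable durations for
 * each phase and editable levels at each phase transition. */
typedef struct {
    double attackTime, decayTime, sustainTime, releaseTime; /* seconds */
    double peakLevel, sustainLevel;                         /* 0.0 - 1.0 */
} ADSR;

/* Envelope value at time t (seconds) after note onset. */
double adsr_level(const ADSR *e, double t) {
    if (t < e->attackTime)   /* ramp from silence up to the peak */
        return e->peakLevel * t / e->attackTime;
    t -= e->attackTime;
    if (t < e->decayTime)    /* fall from peak to sustain level */
        return e->peakLevel + (e->sustainLevel - e->peakLevel) * t / e->decayTime;
    t -= e->decayTime;
    if (t < e->sustainTime)  /* hold steady */
        return e->sustainLevel;
    t -= e->sustainTime;
    if (t < e->releaseTime)  /* taper off to silence */
        return e->sustainLevel * (1.0 - t / e->releaseTime);
    return 0.0;
}
```

Multiplying each output sample by `adsr_level` at the current time shapes the note's volume over its lifetime.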

Even back in 2010, while working on that app, I wanted to be able to synthesize more interesting voices, composed of harmonics (what I'd now term secondary harmonics), with the intensity of each varying independently over time, and to craft a simple interface for editing such voices, but at that time I had no clear idea how to generate multiple simultaneous notes, much less how to build them from harmonic components.

Over the intervening years, I've ferreted out solutions for various aspects of this problem space, but it wasn't until I'd experienced the absence of phase alignment that I was motivated to reevaluate my approach. That reevaluation led to the idea of 1) determining the Highest Common Fundamental (HCF), 2) tracking its phase on a per-sample basis, and 3) using that phase to generate indices for sine table lookup, on a per-sample basis, for members of a harmonic structure. With that idea in hand, I finally felt confident I could actually do it. That was, for me, the key missing piece of the puzzle.

In the process of fleshing out that idea, I had another eureka moment when I realized that this approach would not only facilitate the synthesis of any member of a harmonic structure while guaranteeing phase alignment, but it would also enable per-sample modulation of the harmonics of those structure members (secondary harmonics) by the very same method, since they are also part of the harmonic structure.

Given the ability to independently control the intensity of secondary harmonics over time, my sense is that this should supersede the ADSR paradigm. Yes, you might still want to ramp up the volume very quickly, drain some of it back off almost as quickly, then hold it nearly steady for a while, before tapering off to silence, but this is just as easily achieved by controlling the intensity of component harmonics as by controlling that of the basic pitch.

Per-sample control of harmonic intensities, translated into physical terms, equates to moving acoustic energy around among harmonics, much as we do with our tongues and the way we shape our mouths while speaking. This might be approached with the discipline of an engineer applying the conservation of energy, or utterly fancifully, or anywhere in between. It could be used to mimic familiar sounds, or to create sounds even a veteran sound collector or foley artist would be hard pressed to find in the wild or produce physically.

There are also elements of voice that this approach, as currently conceived, does not support, notably any sort of pitch bending or sliding, except as these might be applied to a harmonic structure as a unit, rather than to individual notes. In the current version, all members of the harmonic structure, including the secondary harmonics, are discrete pitches.

(Yes, it should be possible to support pitch bending and sliding by allowing variable factors relating the HCF to parts of the structure. Strictly speaking, in that event, it would at least intermittently cease to be a harmonic structure. This may be a case where accommodation is more important than conceptual cohesion, and worth the added complexity. Further contemplation is indicated.)