Wednesday, April 24, 2019

Moving Targets

I've been letting this project steep on the back burner while firming up my understanding of the basics of the Swift programming language, which I will be using, likely in combination with C for the most demanding real-time code. This has been a propitious pause, as it has surfaced some rather gaping oversights in how I've thought about what I've set out to do. What follows is the current state of my evolving understanding and intention.

Caveat: My custom terminology is also still in flux, and usage going forward may not correlate exactly with what came before. I will endeavor to nail down this slippery business sooner rather than later.

Most fundamentally, while making harmonic-based melody more accessible is the primary motivation driving my interest in this project, baking that into the design in a form that makes working with or folding in other tonal systems unnecessarily difficult would be a mistake. This is easily accommodated by defining the frequencies of available tones in terms of floating point numbers rather than integers. To keep compound error to a minimum, these should be double precision (64-bit).
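
To make that concrete, here is a minimal sketch of what I have in mind, with the caveat that every name in it is provisional and nothing here is settled API:

```swift
// A minimal sketch: a tone is identified by its first-harmonic frequency in Hz,
// stored as a 64-bit Double so it is not constrained to integer ratios.
struct Tone {
    var firstHarmonicHz: Double   // e.g. 440.0, or any non-integer value

    // Frequency of the nth harmonic (n = 1 is the first harmonic itself).
    func harmonicHz(_ n: Int) -> Double {
        return firstHarmonicHz * Double(n)
    }
}

let a4 = Tone(firstHarmonicHz: 440.0)
print(a4.harmonicHz(3))   // 1320.0
```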

Since, as previously mentioned, the simplest way to calculate sine table indices begins with tracking the per-sample phase of a 1.0 Hz base frequency, there no longer seems to be a clear purpose for the HCF (Highest Common Fundamental). However, I'm not confident this concept won't still prove valuable, so let's put it on the shelf for the time being. If it comes back off that shelf, it might well be under another, hopefully less clumsy name.
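
For reference, here is roughly how I picture that 1.0 Hz base phase being used to index a sine table. The table size and names are placeholders, and a real implementation would need to worry about long-run precision, which is exactly where double precision earns its keep:

```swift
import Foundation

// A sketch of the 1.0 Hz base-phase idea, as I currently understand it.
let sampleRate = 44_100.0
let tableSize = 4_096
let sineTable = (0..<tableSize).map { sin(2.0 * Double.pi * Double($0) / Double(tableSize)) }

// Phase of a notional 1.0 Hz oscillator, in cycles. Deliberately left unwrapped
// here, so any tone's phase can be derived from it by simple multiplication;
// this is where 64-bit precision matters over long runs.
var basePhase = 0.0

func nextSample(frequencyHz: Double) -> Double {
    // The tone's phase is just the base phase scaled by its frequency.
    let phase = (basePhase * frequencyHz).truncatingRemainder(dividingBy: 1.0)
    let index = Int(phase * Double(tableSize)) % tableSize
    basePhase += 1.0 / sampleRate   // advance the 1.0 Hz base by one sample
    return sineTable[index]
}

// e.g. one sample of a 440 Hz tone:
let s = nextSample(frequencyHz: 440.0)
```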

If tones can be specified simply in terms of their first-harmonic frequencies in Hz, expressed as double precision floating point numbers, rudimentary support for pitch bending and sliding becomes a simple matter of respecifying that first-harmonic frequency on a per-sample basis. I say 'rudimentary' because I suspect providing such support while avoiding artifacts will turn out to be more complicated than this.
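
As a sketch of what even the rudimentary version might look like, here is a voice that ramps its first-harmonic frequency toward a target on every sample. Note that, to keep the phase continuous while the frequency moves, this sketch accumulates phase per voice rather than deriving it from the shared 1.0 Hz base phase; that difference is a first taste of the complication I'm anticipating. All names here are placeholders:

```swift
import Foundation

// A sketch of 'rudimentary' pitch sliding: the first-harmonic frequency is
// respecified every sample, ramping linearly toward a target.
struct SlidingVoice {
    var frequencyHz: Double
    var targetHz: Double
    var phase: Double = 0.0              // in cycles, kept in [0, 1)
    let sampleRate: Double
    let slideRateHzPerSample: Double     // how fast the slide moves

    mutating func nextSample() -> Double {
        // Respecify the first-harmonic frequency for this sample.
        if frequencyHz < targetHz {
            frequencyHz = min(frequencyHz + slideRateHzPerSample, targetHz)
        } else if frequencyHz > targetHz {
            frequencyHz = max(frequencyHz - slideRateHzPerSample, targetHz)
        }
        let sample = sin(2.0 * Double.pi * phase)
        // Accumulate phase per voice so the slide stays continuous.
        phase = (phase + frequencyHz / sampleRate).truncatingRemainder(dividingBy: 1.0)
        return sample
    }
}
```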

Next there's the matter of the phases of overtones not necessarily being perfectly aligned with (typically trailing) the phase of a tone's first harmonic. For the moment let's call this overtone offset, since accommodating this can be as simple as adding an offset to the per-sample phase calculated for each overtone. That offset might be calculated as a fraction of the first harmonic's cycle time, and applied before conversion to a sine table index, although moving at least part of that calculation outside of the real-time context and passing the result in as a simple quantity would make sense.
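
Here is a rough sketch of that, assuming the offset has already been converted outside the real-time context, from a fraction of the first harmonic's cycle time into cycles of the overtone itself, and is passed in as a simple quantity. Again, the names are placeholders:

```swift
// A sketch of the 'overtone offset' idea.
struct Overtone {
    let harmonic: Int        // 2, 3, 4, ...
    let amplitude: Double
    let phaseOffset: Double  // precomputed outside real time, in cycles of this overtone
}

func overtoneSample(basePhase: Double,
                    firstHarmonicHz: Double,
                    overtone: Overtone,
                    sineTable: [Double]) -> Double {
    // Apply the offset before converting the phase to a sine table index.
    let rawPhase = basePhase * firstHarmonicHz * Double(overtone.harmonic) + overtone.phaseOffset
    let phase = rawPhase - rawPhase.rounded(.down)   // wrap into [0, 1), even if the offset is negative
    let index = Int(phase * Double(sineTable.count)) % sineTable.count
    return overtone.amplitude * sineTable[index]
}
```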

Given overtones with phase offsets, the question arises whether we might want the option of defining tones in terms of multiple instances of overtones, each with its own per-sample offset and amplitude. Since this could so complicate real-time calculations that polyphony becomes problematic, I'm inclined to also put this idea on the shelf, until I've given more thought to the possibility of voices with some/all of the complicated rendering having been precomputed.

The main obstacle I see in the path of precomputation is the aspiration to make the sound output responsive to factors like velocity, pressure, up/down-scale movement, and time-to-release, which can't be known in advance. As a workaround, it should at least be possible to capture these while producing a less nuanced rendering in real time, then apply them after the fact, editing as needed to achieve the desired effect.

In any case, multiple overlapping notes using the same tone should be available, each with its own set of overtones and their variable attributes, with offsets also optionally applied to their first harmonics, for the purpose of generating echoes if nothing else. Considering this, providing multiple per-note instances of overtones might simply be needless complication.

Finally, because there's a temptation to withhold functionality from the real-time context in order to make sure rendering can happen in a timely manner, this project really wants to split into two components (modes), one (stage) focused on real-time performance, and the other (studio) focused on providing a full set of features. The communication between these two modes is a sort of bidirectional funnel, and needs to be well defined. An advantage of this requirement is that it is an obvious place to look for an organizing principle around which to build out the rest of the model and basic functionality.
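
Just to give a flavor of what I mean by a well-defined funnel, something like the following pair of message types might be a starting point. Every case here is hypothetical; the only point is that the interface between the two modes is narrow and explicit:

```swift
// A very rough sketch of the stage/studio 'funnel' as two message types,
// one for each direction. Nothing here is a committed design.
enum StudioToStage {
    case loadTone(id: Int, firstHarmonicHz: Double)
    case setOvertones(toneID: Int, amplitudes: [Double], phaseOffsets: [Double])
}

enum StageToStudio {
    case noteOn(toneID: Int, velocity: Double, timestamp: Double)
    case noteOff(toneID: Int, timestamp: Double)
    case controlChange(name: String, value: Double, timestamp: Double)
}
```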

As such, it may also prove a suitable focal point for any open source initiative, allowing 'stage' and 'studio' applications from different vendors to interoperate. But I'm really getting way ahead of myself in even mentioning that. First I need to build out my own project, then maybe I can think about turning it into an open-source project.

Addendum (25 April 2019): This is not even close to being a final decision, but I'm thinking it makes the most sense to specify, for any given note, the per-sample frequency, amplitude, and phase offset of the first harmonic, and then to specify the same attributes for higher harmonics (overtones) relative to that. For the sake of efficiency, though, it will be desirable to precompute as much of this as possible without sacrificing responsiveness to the performer.
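
Expressed as data, that layout might look something like the following; all of these names are provisional:

```swift
// A sketch of the per-note layout described above: the first harmonic carries
// absolute per-sample values, and each overtone is specified relative to it.
struct HarmonicFrame {
    var frequencyHz: Double   // absolute first-harmonic frequency for this sample
    var amplitude: Double
    var phaseOffset: Double   // in cycles
}

struct RelativeOvertone {
    var frequencyRatio: Double    // e.g. 2.0, 3.0, or slightly detuned values
    var amplitudeRatio: Double    // relative to the first harmonic's amplitude
    var phaseOffset: Double       // relative to the first harmonic's phase, in cycles
}

struct NoteFrame {
    var firstHarmonic: HarmonicFrame
    var overtones: [RelativeOvertone]
}
```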