In a previous post on this blog, I defined voice as "Any attributes in the synthesis of a note other than its basic frequency and the overall volume, for example the ADSR Envelope or emphasis on different harmonics as the note progresses." You can also find a brief explanation of the ADSR Envelope in that same post.
In RatioKey 1.1 (removed from the App Store more than two years ago), I provided the means to edit the duration of each phase of the ADSR envelope, as well as the volume at the point where each phase transitions into the next. This helped compensate for the app's limitations: it could generate only a single, simple sine wave at a time, each new note interrupted the previous one, and it had no support at all for overtones.
Even back in 2010, while working on that app, I wanted to synthesize more interesting voices, composed of harmonics (what I'd now term secondary harmonics) with the intensity of each varying independently over time, and to craft a simple interface for editing such voices. At the time, however, I had no clear idea how to generate multiple simultaneous notes, much less how to build them from harmonic components.
Over the intervening years, I ferreted out solutions for various aspects of this problem space, but the key missing piece didn't fall into place until I'd experienced the absence of phase alignment for myself. That experience motivated a reevaluation of my approach, which led to the idea of 1) determining the Highest Common Fundamental (HCF), 2) tracking its phase on a per-sample basis, and 3) using that phase, again per sample, to generate indices for sine table lookup for the members of a harmonic structure. With that idea in hand, I finally felt confident I could actually do it.
In the process of fleshing out that idea, I had another eureka moment: this approach would not only facilitate the synthesis of any member of a harmonic structure while guaranteeing phase alignment, it would also enable per-sample modulation of those members' own harmonics (the secondary harmonics) by the very same method, since they too are part of the harmonic structure.
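To make the idea concrete, here is a minimal sketch in Python (RatioKey itself is not written in Python, and every name and number here is my own hypothetical illustration, not the app's actual code): the HCF's phase is accumulated once per sample, and each member of the harmonic structure, secondary harmonics included, indexes a shared sine table at an integer multiple of that phase, so alignment is guaranteed by construction.

```python
import math

TABLE_SIZE = 4096
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def synthesize(hcf_hz, harmonics, sample_rate=44100, n_samples=44100):
    """Render a harmonic structure whose members are integer multiples
    of the Highest Common Fundamental (HCF).

    harmonics: dict mapping integer multiple of the HCF -> amplitude.
    """
    out = []
    phase = 0.0  # HCF phase, tracked per sample, normalized to [0, 1)
    for _ in range(n_samples):
        sample = 0.0
        for multiple, amp in harmonics.items():
            # The nth member's phase is exactly n times the HCF phase,
            # so every member stays phase-aligned by construction.
            index = int(phase * multiple * TABLE_SIZE) % TABLE_SIZE
            sample += amp * SINE_TABLE[index]
        out.append(sample)
        phase = (phase + hcf_hz / sample_rate) % 1.0
    return out

# Two notes sharing a (made-up) HCF of 110 Hz: multiples 2 and 3, each
# with a secondary harmonic of its own (multiples 4 and 6).
samples = synthesize(110.0, {2: 1.0, 3: 0.8, 4: 0.3, 6: 0.2})
```

Because the secondary harmonics (multiples 4 and 6 above) are addressed exactly the same way as the primary members, per-sample modulation of their amplitudes requires no additional machinery.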
Given the ability to independently control the intensity of secondary harmonics over time, my sense is that this should supersede the ADSR paradigm. Yes, you might still want to ramp up the volume very quickly, drain some of it back off almost as quickly, then hold it nearly steady for a while before tapering off to silence, but this is just as easily achieved by controlling the intensity of component harmonics as by controlling that of the basic pitch.
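As an illustration of that point: an ADSR-like contour is nothing more than a piecewise-linear intensity envelope, and such an envelope could be attached to any harmonic independently. A sketch (the function name and breakpoint values are hypothetical, chosen only to trace the attack/decay/sustain/release shape described above):

```python
def breakpoint_envelope(breakpoints):
    """Return a function of time (seconds) that linearly interpolates
    between (time, level) breakpoints -- an ADSR-like contour expressed
    as plain intensity control."""
    def level(t):
        if t <= breakpoints[0][0]:
            return breakpoints[0][1]
        for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
            if t <= t1:
                # Linear interpolation within the current segment.
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return breakpoints[-1][1]
    return level

# Fast ramp up, quick partial drain, near-steady hold, taper to silence.
env = breakpoint_envelope(
    [(0.0, 0.0), (0.01, 1.0), (0.05, 0.6), (0.8, 0.55), (1.0, 0.0)]
)
```

Evaluated per sample, `env(t)` would simply scale the amplitude of whichever harmonic it is attached to, giving each member of the structure its own contour rather than one envelope for the note as a whole.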
Per-sample control of harmonic intensities, translated into physical terms, equates to moving acoustic energy around among harmonics, much as we do with our tongues and the way we shape our mouths while speaking. This might be approached with the discipline of an engineer applying the conservation of energy, or utterly fancifully, or anywhere in between. It could be used to mimic familiar sounds, or to create sounds even a veteran sound collector or Foley artist would be hard pressed to find in the wild or produce physically.
There are also elements of voice that this approach, as currently conceived, does not support, notably any sort of pitch bending or sliding, except as these might be applied to a harmonic structure as a unit, rather than to individual notes. In the current version, all members of the harmonic structure, including the secondary harmonics, are discrete pitches.
(Yes, it should be possible to support pitch bending and sliding by allowing variable factors relating the HCF to parts of the structure. Strictly speaking, in that event, it would at least intermittently cease to be a harmonic structure. This may be a case where accommodation is more important than conceptual cohesion, and worth the added complexity. Further contemplation is indicated.)