Just as every member of the harmonic series composing a structure is also a member of the harmonic series defined by the Highest Common Fundamental, so too are their own harmonics, and we can make use of this to transform them from simple sine waves to complex tones, perhaps even phonemes, by specifying how much each of those secondary harmonics should contribute to voicing the primary harmonics in response to user actions.
A fairly simple way of doing this is most easily described by analogy to a row of decrepit fenceposts and the nonparallel rails (or wire, if you prefer) between them. The fenceposts represent particular points in time, specified for the purpose of rendering in terms of samples, strung out between the beginning and end of a note. The rails represent an intensity (volume factor) for each of the secondary harmonics contributing to the overall sound to be rendered, but unlike a fence in good repair, these rails may cross each other, and either end (or the entire rail) may lie on the ground between any pair of posts. Typically, the rails will at least sit at an angle between any two posts, representing interpolated intensity values.
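To make the fence a little more concrete, here is a minimal sketch in Python, assuming plain linear interpolation between posts. The names `posts`, `rails`, and `rail_intensity` are my own inventions for illustration, not part of any existing code.

```python
from bisect import bisect_right

def rail_intensity(posts, heights, sample):
    """Linearly interpolate one rail's height at a given sample.

    posts   -- ascending sample positions of the fenceposts (hypothetical name)
    heights -- this secondary harmonic's intensity at each post
    """
    # Before the first post or after the last, hold the end value.
    if sample <= posts[0]:
        return heights[0]
    if sample >= posts[-1]:
        return heights[-1]
    # Find the post at or just before this sample.
    i = bisect_right(posts, sample) - 1
    span = posts[i + 1] - posts[i]
    t = (sample - posts[i]) / span
    return heights[i] + t * (heights[i + 1] - heights[i])

# Four posts strung out across a 1000-sample note (made-up values).
posts = [0, 100, 400, 1000]

# One rail per secondary harmonic. Rails 1 and 2 cross between the second
# and third posts, and rail 3 lies on the ground for the whole note.
rails = {
    1: [0.0, 1.0, 0.6, 0.0],
    2: [0.0, 0.2, 0.8, 0.0],
    3: [0.0, 0.0, 0.0, 0.0],
}

# Intensity of every secondary harmonic at sample 250.
at_250 = {k: rail_intensity(posts, h, 250) for k, h in rails.items()}
```

Sharing one list of posts among all the rails is just a convenience of this sketch; nothing above requires every rail to be pinned at the same samples.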
What is really at play here is the movement of acoustic energy among secondary harmonics in the interest of creating the voice of a primary harmonic. Because those secondary harmonics are also part of the harmonic structure, the same process still applies to them: multiply the phase of the Highest Common Fundamental (for the current sample, expressed in cycles) by the secondary harmonic's number in terms of the HCF, keep only the fractional part, multiply by the size of the sine table, and truncate to produce an index for sine lookup. In fact, this replaces going through these steps for the primary harmonic, since its voice is now composed of the intensities of its own harmonics, remembering that it is its own first harmonic.
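The lookup steps just described might be sketched like this in Python. The table size of 4096 and the names `SINE_TABLE` and `sine_index` are assumptions of mine, chosen only for illustration.

```python
import math

TABLE_SIZE = 4096  # assumed; any table size works the same way
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def sine_index(hcf_phase_cycles, hcf_harmonic_number, table_size=TABLE_SIZE):
    """Turn the HCF phase (in cycles) into a sine table index for one harmonic."""
    phase = hcf_phase_cycles * hcf_harmonic_number  # multiply by harmonic number
    fraction = phase % 1.0                          # keep only the fractional part
    index = int(fraction * table_size)              # scale and truncate
    return index % table_size                       # guard against a rounding edge case

# A primary harmonic that is the 3rd harmonic of the HCF: its own k-th
# harmonic is harmonic 3 * k of the HCF (k = 1 is the primary itself).
primary = 3
indices = [sine_index(0.125, primary * k) for k in (1, 2, 3)]
```

The final modulo is only a safety net of this sketch; with a non-negative phase, the fractional part should already keep the index in range.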
For each sample and each secondary harmonic, the result of the sine table lookup is multiplied by the intensity factor calculated for that secondary harmonic and that sample, and the results of those multiplications are then added together to arrive at the contribution that note makes to the overall sample value. These totals for multiple simultaneous notes are simply added together.
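As a rough sketch of that summation in Python, assuming each note is given as a primary harmonic number (in terms of the HCF) plus a dictionary of per-sample intensities keyed by secondary harmonic number. That layout, and the names `note_sample` and `mix_sample`, are hypothetical.

```python
import math

TABLE_SIZE = 4096  # assumed table size
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def note_sample(hcf_phase, primary, intensities):
    """Contribution of one note to one sample.

    hcf_phase   -- phase of the HCF for this sample, in cycles
    primary     -- the note's harmonic number in terms of the HCF
    intensities -- {secondary harmonic number: intensity for this sample}
    """
    total = 0.0
    for k, intensity in intensities.items():
        phase = hcf_phase * primary * k
        index = int((phase % 1.0) * TABLE_SIZE) % TABLE_SIZE
        total += intensity * SINE_TABLE[index]
    return total

def mix_sample(hcf_phase, notes):
    """Simply add the totals of all simultaneous notes for one sample."""
    return sum(note_sample(hcf_phase, primary, intensities)
               for primary, intensities in notes)

# Two simultaneous notes with made-up intensities for a single sample.
notes = [
    (2, {1: 0.5, 2: 0.25}),
    (3, {1: 0.4}),
]
value = mix_sample(0.125, notes)
```

In a real renderer the intensities would be recalculated for every sample rather than held fixed as they are here.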
It might be more efficient to combine intensities for various HCF-harmonics (multiple instances of the same pitch originating from different locations within the structure) before multiplying by the result of the sine table lookup, rather than doing this once per sample for each of them, but that is a more complex coding problem, so I'll leave this as a mission for the reader, should you decide to accept it.
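For anyone inclined to accept, here is one possible starting point in Python. This is only a sketch of the idea, not a tuned implementation, and the note layout and the names `combine_by_hcf_harmonic` and `mixed_sample` are assumptions made for illustration.

```python
import math
from collections import defaultdict

TABLE_SIZE = 4096  # assumed table size
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def combine_by_hcf_harmonic(notes):
    """Sum the intensities of secondary harmonics that land on the same pitch.

    notes -- list of (primary, {secondary harmonic number: intensity}) pairs,
             where primary is the note's harmonic number in terms of the HCF
    """
    combined = defaultdict(float)
    for primary, intensities in notes:
        for k, intensity in intensities.items():
            combined[primary * k] += intensity  # same HCF-harmonic, same pitch
    return dict(combined)

def mixed_sample(hcf_phase, notes):
    """One sine lookup per distinct HCF-harmonic instead of one per instance."""
    total = 0.0
    for hcf_harmonic, intensity in combine_by_hcf_harmonic(notes).items():
        index = int(((hcf_phase * hcf_harmonic) % 1.0) * TABLE_SIZE) % TABLE_SIZE
        total += intensity * SINE_TABLE[index]
    return total

# The 3rd harmonic of primary 2 and the 2nd harmonic of primary 3 are both
# the 6th harmonic of the HCF, so they share a single lookup.
notes = [
    (2, {1: 0.5, 3: 0.2}),
    (3, {1: 0.4, 2: 0.1}),
]
combined = combine_by_hcf_harmonic(notes)
```

Whether this actually saves time depends on how often pitches coincide, since the combining step itself has to be repeated whenever the intensities change.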
If the fencepost and rails analogy doesn't work for you, and you have an old-style bulletin board and some push-pins and string handy, you can use columns of push-pins to represent posts (samples for which the intensities of secondary harmonics are explicitly specified) and string stretched between those push-pins to represent interpolated values.
More elaborate versions, using graphically defined Bézier curves or explicit functions to specify the per-sample intensities of secondary harmonics, are also options. So too are modifications to those specifications based on user action parameters like velocity, pressure, and time-to-release.
Okay, take a breath, step back, let it sink in, and see if a playground full of cool toys doesn't gel in front of you, and maybe also some appreciation for why a part-time developer like myself might find such a project daunting (as well as chronically engaging), and why I have chosen to lay it all out.
Even this isn't exhaustive; there's plenty of room for expanding upon this vision, and I invite any with the motivation to do so to take it and run with it.
I'm sure I'll have more to say, details to be filled in, loose ends to be tied up, but this marks the end of my whirlwind introduction to the topic.