Building on some of the technologies I developed as part of my Master’s degree, I set out to make something fairly advanced that could utilise things like Dynamic Time Warping, IR Timbral Modification, and Metadata Analysis.
Video demonstrations are hopefully coming soon – in the meantime, here are some words to give you an idea of the processing going on…
The first stage of the program was to ‘rectify’ the incoming note messages from a keyboard or sequencer. If you’ve scripted in Kontakt before, you’ll know it’s not natively set up to deal with monophonic instruments on an advanced level.
I really wanted note releases to act as legato transitions to another note (if another note was still held down), so this required “on note” and “on release” information to be sent to my own function: the Note Controller.
The Note Controller is in charge of determining the intention behind the note messages. As well as deciding whether the note activity was an attack, legato, or release, it also stores when each note was played and its velocity for a configurable number of recent notes (a variable called “fingers” sets the size of this memory). This allows a much more intricate interpretation of the incoming note information, as the script is able to distinguish between:
- Attacks (can be split into multiple groups by velocity)
- Legatos (triggered by both note on and note off; time and velocity can also be interpreted)
Most interesting to me is the ability to detect detaché note information (in this context, I define detaché as notes that have a short time gap between the end of a note, and the start of a new note – like a legato, but only just separated). A way of recording the sample content for this feature must be investigated.
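To make the classification above concrete, here is a minimal Python sketch of the idea. It is not the Kontakt script itself: the class and method names, the 80 ms detaché threshold, and the rule that releasing the most recent note falls back to the previously held one are all assumptions made for illustration. Only the “fingers” memory size comes from the description above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class NoteEvent:
    pitch: int
    velocity: int
    time_ms: float
    kind: str  # "attack", "legato", "detache" or "release"


class NoteController:
    """Hypothetical sketch: classifies raw note on/off messages."""

    def __init__(self, fingers=4, detache_gap_ms=80.0):
        self.held = {}                        # pitch -> velocity, oldest first
        self.history = deque(maxlen=fingers)  # memory of recent events
        self.last_release_ms = None
        self.detache_gap_ms = detache_gap_ms  # assumed threshold

    def _record(self, pitch, velocity, time_ms, kind):
        event = NoteEvent(pitch, velocity, time_ms, kind)
        self.history.append(event)
        return event

    def note_on(self, pitch, velocity, time_ms):
        if self.held:
            kind = "legato"    # another note is still held down
        elif (self.last_release_ms is not None
              and time_ms - self.last_release_ms <= self.detache_gap_ms):
            kind = "detache"   # only just separated from the previous note
        else:
            kind = "attack"
        self.held[pitch] = velocity
        return self._record(pitch, velocity, time_ms, kind)

    def note_off(self, pitch, time_ms):
        if pitch not in self.held:
            return None
        was_sounding = next(reversed(self.held)) == pitch
        del self.held[pitch]
        if not self.held:
            self.last_release_ms = time_ms
            return self._record(pitch, 0, time_ms, "release")
        if was_sounding:
            # The sounding note was released while another is held:
            # treat it as a legato transition back to that note.
            target = next(reversed(self.held))
            return self._record(target, self.held[target], time_ms, "legato")
        return None  # a silent held note was lifted; nothing audible changes
```

For example, pressing 60, then 62 while 60 is held, then releasing 62 yields an attack, a legato to 62, and a legato back to 60.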
After the intention behind the notes has been ascertained by the Note Controller, the Voice Controller decides which samples are to start, stop, fade, etc. The Voice Controller is made up of many different partitions that tell it which samples to play for an attack, legato, staccato, release, etc., and how to start and stop the current voices to achieve that.
This part of the program is incredibly intricate yet remarkably powerful due to a clever way of dealing with Note IDs, and a part of the program I called the Scheduler. The Scheduler ensures that in instances where complex crossfades are going on (e.g. when playing fast) between articulation groups and pitch groups, there are never any stray notes acting out of place by fading in or fading out undesirably.
This is most relevant when using legato samples: I deemed it good practice to hold the legato sample for as long as possible. The longer you hold it, the fewer crossfades you’re likely to need, as you’re able to transition from legato sample to legato sample without having the sustain jammed in between (and the possible changes in timbre that can go with that).
The Voice Controller also has the ability to choose randomised (or potentially specifically chosen) samples for dynamic layers. Combined with the dynamics controller, this makes a powerful tool for a wide range of variability.
I’m aiming for dynamics to be multi-layered and then crossfaded to create a mix for the desired dynamic level input by the sequencer. This will obviously require the use of my dynamic time warping program to ensure that phasey/doubling artefacts are not present.
This part of the program relies on a metadata analysis of the loudness of the notes. The analysis is a fairly simple script that calculates the perceptual loudness of each sample, carried out in Java and exported to a format that Kontakt can read. Measuring the loudness of the samples gives the Kontakt engine the ability to automatically adjust the mix to match the desired dynamic.
If a target dynamic is below or above the range of samples available, the script artificially boosts or reduces the volume to match the target dynamic. It’s possible that a high shelf filter could be applied to modify the frequencies appropriately if it adds to the realism.
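As a rough illustration of how measured loudness can drive the mix, here is a small Python sketch. It rests on several assumptions that are not from the engine itself: the loudness metadata is treated as a level in dB, the layers are taken to be phase-aligned so their amplitudes simply add, and the function names are hypothetical.

```python
def db_to_gain(db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def mix_for_target(layer_db, target_db):
    """Return one linear gain per layer so the mix lands on target_db.

    layer_db holds the measured loudness of each dynamic layer (assumed dB).
    Inside the sampled range, the two neighbouring layers are crossfaded.
    Outside it, the nearest layer is boosted or attenuated instead.
    """
    if not layer_db:
        raise ValueError("need at least one dynamic layer")
    order = sorted(range(len(layer_db)), key=lambda i: layer_db[i])
    gains = [0.0] * len(layer_db)
    quietest, loudest = order[0], order[-1]

    if target_db <= layer_db[quietest]:
        # Target is below the sampled range: turn the quietest layer down.
        gains[quietest] = db_to_gain(target_db - layer_db[quietest])
        return gains
    if target_db >= layer_db[loudest]:
        # Target is above the sampled range: boost the loudest layer.
        gains[loudest] = db_to_gain(target_db - layer_db[loudest])
        return gains

    for lo, hi in zip(order, order[1:]):
        if layer_db[lo] <= target_db <= layer_db[hi]:
            a_lo, a_hi = db_to_gain(layer_db[lo]), db_to_gain(layer_db[hi])
            # Solve (1 - w) * a_lo + w * a_hi = target amplitude for w.
            w = (db_to_gain(target_db) - a_lo) / (a_hi - a_lo)
            gains[lo], gains[hi] = 1.0 - w, w
            return gains
    return gains  # not reached for a non-empty layer list
```

With layers measured at -30, -18 and -6 dB, a target of -24 dB crossfades the two quieter layers, while a target of 0 dB plays only the loudest layer with a 6 dB boost.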
The dynamics controller is currently undergoing further development, whereby dynamic legato samples (legato samples at different dynamics) are not to be mixed. Mixing legato samples sometimes leads to the phasey/doubling problem, as their harmonics and pitch fluctuate more. The new way of dealing with legato samples is for the engine to choose the legato sample that is closest to the target dynamic and artificially modify its volume to match the target dynamic.
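The “closest sample plus volume correction” approach for legato samples can be sketched in a few lines of Python. As before, this assumes the loudness metadata is a level in dB, and the function name is made up for the example.

```python
def pick_legato_layer(layer_db, target_db):
    """Choose the legato layer nearest the target dynamic.

    Returns (index, gain_db): the index of the chosen layer and the volume
    change in dB needed to bring it to the target. No layers are mixed.
    """
    if not layer_db:
        raise ValueError("need at least one legato layer")
    index = min(range(len(layer_db)), key=lambda i: abs(layer_db[i] - target_db))
    return index, target_db - layer_db[index]
```

For layers at -30, -18 and -6 dB, a target of -15 dB selects the middle layer and asks for a 3 dB boost.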
This needs some assessment of its suitability, as I’ll want to avoid a ‘binary’ change from one dynamic to another (for instance with a crescendo on a trill). It may be more suitable to use a compromise between the linear and binary fades: an ‘S’ curve (e.g. a sigmoid), so that mostly one sample or the other sounds, yet there is still some ability to fade continuously.
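One way such an ‘S’ curve could look is a logistic function rescaled so that it starts exactly at 0 and ends exactly at 1. The Python sketch below is only an illustration: the `steepness` parameter and both function names are assumptions, and `steepness` must be greater than zero.

```python
import math


def s_curve(x, steepness=10.0):
    """Map a fade position x in [0, 1] onto an S-shaped weight in [0, 1].

    Higher steepness gives a more 'binary' switch. Lower steepness
    approaches a linear fade. steepness must be greater than zero.
    """
    x = min(1.0, max(0.0, x))

    def logistic(v):
        return 1.0 / (1.0 + math.exp(-steepness * (v - 0.5)))

    lo, hi = logistic(0.0), logistic(1.0)
    # Rescale so the curve hits 0 and 1 exactly at the ends.
    return (logistic(x) - lo) / (hi - lo)


def legato_gains(position, steepness=10.0):
    """Return (gain_from, gain_to) for two samples at a fade position."""
    w = s_curve(position, steepness)
    return 1.0 - w, w
```

A quarter of the way through the fade, the weight of the incoming sample is still well below 0.25, so mostly the outgoing sample sounds, but there is no sudden jump anywhere along the curve.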
All this requires absolutely no manual adjustment of sample volume whatsoever – just scrupulous scientific measurement with automated code and maths! 😀
(However, improvements must be made to the measurement, as staccatos ‘sound’ louder than they actually are – a better model of perceptual loudness is needed.)
The vibrato controller, like the dynamics controller, has its roots in metadata analysis. Real samples of vibrato were analysed for their oscillations in pitch and then formatted for importing to Kontakt.
After my first success with this, I developed the metadata and engine to a much more advanced state: Pitch envelopes were filtered into vibrato and sub-vibrato frequencies in my metadata analysis, and therefore can be controlled separately in the engine. Furthermore, envelopes were split up into different types, being attack, sustain, and release.
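To illustrate the idea of splitting a pitch envelope into two bands, here is a deliberately simple Python sketch. It is not the filtering used in the actual analysis, which is not described here: it uses a centred moving average as the low-pass filter, and the function name, the use of cents, and the window length are assumptions.

```python
def split_pitch_envelope(cents, window):
    """Split a pitch envelope (in cents) into vibrato and sub-vibrato parts.

    The sub-vibrato part is a centred moving average over `window` points,
    ideally about one vibrato period long. The vibrato part is whatever
    remains, so the two parts always add back up to the original envelope.
    """
    half = window // 2
    slow = []
    for i in range(len(cents)):
        lo = max(0, i - half)
        hi = min(len(cents), i + half + 1)
        slow.append(sum(cents[lo:hi]) / (hi - lo))
    fast = [c - s for c, s in zip(cents, slow)]
    return fast, slow
```

Because the two bands sum to the original, the engine can scale them independently and recombine them without losing anything.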
When a new note is played, the attack envelope is played. Once finished, it triggers the sustain oscillations. The sustain oscillation array is a 2D vector of many oscillations where any of them can be randomly selected on the fly. Once the note release is triggered, a random release oscillation plays once while the sustain oscillation fades to neutral.
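The playback logic described above can be sketched as a small state machine in Python. This is an illustration under assumptions rather than the engine’s code: the class and method names, the per-tick model, the linear fade, and the `fade_steps` parameter are all hypothetical, and every oscillation is assumed to be a non-empty list of pitch offsets.

```python
import random


class VibratoVoice:
    """Hypothetical per-note vibrato player, advanced one tick at a time."""

    def __init__(self, attack, sustains, releases, fade_steps=8, rng=None):
        self.rng = rng or random.Random()
        self.sustains = sustains      # list of sustain oscillations
        self.releases = releases      # list of release oscillations
        self.fade_steps = fade_steps  # ticks for the sustain to fade out
        self.current = list(attack)   # the attack envelope plays first
        self.pos = 0
        self.release_env = None
        self.release_pos = 0
        self.fade = 1.0

    def _body(self):
        # When the current envelope ends, pick a random sustain oscillation.
        if self.pos >= len(self.current):
            self.current = self.rng.choice(self.sustains)
            self.pos = 0
        value = self.current[self.pos]
        self.pos += 1
        return value

    def release(self):
        """Start a random release oscillation (ignored if already started)."""
        if self.release_env is None:
            self.release_env = self.rng.choice(self.releases)
            self.release_pos = 0

    def next_offset(self):
        """Return the pitch offset for the next tick."""
        if self.release_env is None:
            return self._body()
        value = 0.0
        if self.fade > 0.0:
            # Fade the ongoing oscillation towards neutral.
            self.fade = max(0.0, self.fade - 1.0 / self.fade_steps)
            value += self._body() * self.fade
        if self.release_pos < len(self.release_env):
            # The release oscillation plays exactly once on top.
            value += self.release_env[self.release_pos]
            self.release_pos += 1
        return value
```

Once both the fade and the release oscillation have finished, `next_offset()` returns 0.0, i.e. the pitch sits at neutral.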
This system can perhaps be improved upon later if the oscillation array were to be redefined as a 3D (or more) tensor, using the dimensions to replicate real features such as vibrato speed, vibrato depth, or style.
The oscillations may also modify the target dynamic at the user’s discretion.
I’m not entirely convinced by this vibrato system just yet – it will require more testing before I can assess its success or suggest improvements.
After just two weeks, I have something I’m quite proud and fond of: the system is very stable, and is appropriately partitioned into powerful and optimised functions, allowing for expansion of capabilities as I add to them.
Development has been held up somewhat by a fatal flaw in my Dynamic Time Warping program – it may take some time to fix, or to research and develop a new way to handle phase alignment entirely.