As my piece has to do with silenced voices, I have been looking at ways to construct metaphors around the voice. Using its texture as an instrument by sampling is a fairly popular approach, but how about using it as information to control a different instrument?
Peter Ablinger is exploring this in his extensive Quadraturen (“Squarings”) series.
According to his website, the way this algorithm works is as follows:
‘1) The first step is always an acoustic photograph (“phonograph”).
This can be a recording of anything: speech, street noise, music.
(2) Time and frequency of the chosen “phonograph” are dissolved into a grid of small “squares” whose format may, for example, be 1 second (time) to 1 second (interval).
(3) The resulting grid is the score, which is then to be reproduced in different media:
on traditional instruments, computer controlled piano, or in white noise.
The reproduction of “phonographs” by instruments can be compared to photo-realist painting, or – what describes the technical aspect of the “Quadraturen” more precisely -with techniques in the graphic arts that use grids to transform photos into prints.’
https://ablinger.mur.at/docu11.html
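To make the grid idea more concrete, here is a minimal sketch in Python of how steps (1) to (3) could be approximated. It is not Ablinger's actual software. The function name `quantise_to_grid`, the 0.1 second cell length, the semitone-wide frequency bands and the mapping of energy to MIDI velocity are all my own assumptions, chosen only for illustration.

```python
import numpy as np

def quantise_to_grid(signal, sample_rate, cell_seconds=0.1, low_note=36, high_note=96):
    """Reduce a mono signal to a coarse grid: one row per time cell,
    one column per semitone, each value a MIDI-style velocity (0-127).
    All parameter defaults are illustrative assumptions."""
    cell_len = int(round(sample_rate * cell_seconds))
    n_cells = len(signal) // cell_len
    if n_cells == 0:
        raise ValueError("signal is shorter than one grid cell")

    notes = np.arange(low_note, high_note + 1)
    # Band edges sit a quarter tone below and above each MIDI note (A4 = note 69 = 440 Hz).
    edges = 440.0 * 2.0 ** ((np.arange(low_note, high_note + 2) - 69 - 0.5) / 12.0)
    freqs = np.fft.rfftfreq(cell_len, d=1.0 / sample_rate)
    band = np.digitize(freqs, edges) - 1          # semitone band index of every FFT bin
    valid = (band >= 0) & (band < len(notes))     # ignore bins outside the note range

    window = np.hanning(cell_len)
    grid = np.zeros((n_cells, len(notes)))
    for i in range(n_cells):
        frame = signal[i * cell_len:(i + 1) * cell_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Sum the spectral energy that falls inside each semitone "square".
        grid[i] = np.bincount(band[valid], weights=power[valid], minlength=len(notes))

    peak = grid.max()
    if peak > 0:
        grid = np.sqrt(grid / peak)               # energy -> amplitude, scaled to 0..1
    return notes, np.round(grid * 127).astype(int)

# Usage: a one-second test tone at 440 Hz stands in for the "phonograph".
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
notes, velocities = quantise_to_grid(tone, sr)
loudest_per_cell = notes[np.argmax(velocities, axis=1)]
```

Each row of `velocities` could then be sent to an instrument as a chord of notes with those velocities. With cells this short the lowest semitone bands are narrower than the frequency resolution, so some of them stay empty; longer cells trade timing for pitch detail.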
Luckily, somebody took the time to implement such an algorithm and make it available online for free at: https://www.ofoct.com/audio-converter/convert-wav-or-mp3-ogg-aac-wma-to-midi.html#google_vignette
It lets you convert any audio to MIDI, which you can then play through your DAW of choice with a suitable plug-in.
As expected, simpler tones with a short attack seem best suited to this.
What I find interesting are the possible implications of using such an auditory illusion. I recently became aware that the human brain deals with different types of sonic information in completely different ways, with distinct brain cells handling each of the following: the spoken voice, the singing voice and music.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6866591/
Listening to some examples, my observation is that the speech does not become obvious immediately; the first perception is of rather cacophonous music. Once you are aware of the words to listen for, the music disappears and the voice starts coming through.
In practice, there are a few difficulties. First of all, the audio must be recorded at a reasonable level, and I got better results when using chest voice (probably because it contains more resonances). Using a saturator plug-in on the audio also helps yield better results. Tiny experiment below.
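Separately from the experiment itself, the two preparation steps mentioned above (a reasonable level and some saturation) can be roughly sketched in Python for anyone working outside a DAW. This is only an approximation: the function names, the 0.9 target peak, the `drive` value and the use of `tanh` soft clipping are my assumptions, not a description of any particular saturator plug-in.

```python
import numpy as np

def normalise_peak(signal, target_peak=0.9):
    """Scale the signal so its loudest sample reaches target_peak."""
    peak = np.max(np.abs(signal))
    if peak == 0:
        return signal.copy()          # silence: nothing to scale
    return signal * (target_peak / peak)

def saturate(signal, drive=4.0):
    """Soft clipping with tanh: peaks are rounded off, quieter parts
    are pushed up, and the output stays within -1..1."""
    return np.tanh(drive * signal) / np.tanh(drive)

# Usage: a quiet 220 Hz tone stands in for a voice recorded at a low level.
sr = 8000
t = np.arange(sr) / sr
quiet = 0.1 * np.sin(2 * np.pi * 220.0 * t)
prepared = saturate(normalise_peak(quiet))
```

Higher `drive` values push the signal harder into the curve, which raises its average level and adds overtones; whether that helps the conversion is something to judge by ear.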