Look Who’s Talking

One of the topics I am considering in my audio paper is the relationship between the voice and technology: the desire to mimic the human voice, how it influences what we think of as synthetic or natural, and what we deem immediate or mediated.

We speak a lot about the digital voice, usually tying it to the arrival of the internet. But interest in replicating the human voice dates back much further.

I have used text-to-speech conversion many times in my experiments and I have previously written about the ‘talking piano’ effect, but I have never taken a deeper look into the origins of the digital voice.

Constructing a machine that could imitate human speech was a popular pursuit in the 18th century.

‘The mouth, the tongue, the teeth, the vocal cords – how is it that this simple panoply can produce a wide variety of specific sounds of such complexity and distinctiveness that no acoustic machine can emulate them?’ [1]

‘Euler himself entertained the fantasy of constructing a piano or an organ where each key would represent one of the sounds of speech, so that one could speak by pressing the succession of keys, like playing the piano.’ [1] This reminds me of the ‘talking piano’ audio illusion I mentioned earlier. (As a former physics student, I am always fascinated by how many physicists and mathematicians had an interest and involvement in music and the arts.)

The first speaking machine is thought to be Kempelen's Sprech-Maschine (it can still be seen today at the Deutsches Museum in Munich, still in working order). It came about as the result of a prize issued by the Royal Academy of Sciences in St. Petersburg in 1780. The challenge was to build a machine that could reproduce the vowels, and to explain their physical properties.

The machine mimics the human voice ‘technology’: it is composed of a wooden box connected to bagpipe ‘lungs’ and a rubber funnel ‘mouth’, and it has to be operated by hand.

The earliest electronic speech synthesizers came from Bell Laboratories. The VODER was introduced in 1939.

The first song sung by a synthesized voice was Daisy Bell, in 1961. [2] (Fun fact: it is also referenced in 2001: A Space Odyssey.)

Votrax – you did not, yes I did

Speak & Spell was the first widely available consumer product to use speech synthesis. It does, however, use pre-recorded words.

This makes me think of how common speaking and singing toys were when I was growing up, and the effect this might have had. They were possibly the first introduction we got to English-language songs and words. That gave those toys such a different aura, with the contrast between the environment they were placed in and the environment they sang of (or what we thought those places might have been).

A small detour, but possibly important in the context of my audio paper: my first singing toy was a teddy bear named Girl (pronounced as dgeerl). The word was on the little backpack it came with, so naturally I assumed it could only be her name. It sang what I now know to be the chorus to this song:

[1] A Voice and Nothing More

[2] Voice in the Media
