Sound - The Last Digital Frontier

I have been listening to the excellent musings of Mike Jones at Digital Basin, who has some very interesting things to say about games, cinema, machinima, and the digital age (addition to Machinifeed noted by Overman). I was especially struck by the three-part series on sound and space. Fascinating observations on how sound has transformed cinema over time and how important sound is to the filmgoing experience. That, in conjunction with the recent update of Microsoft's game content usage rules (see Hugh's post), specifically the parts regarding sounds, has led to this post.

This made me think of two questions: 1. why is sound so important, and 2. why is sound production still so damn analog?

The answer to the first appears to be that we are such visual creatures, with visual processing centers so powerful, that we can neatly fill in the gaps in both moving and static images. So powerful, in fact, that we tend to see things when there is nothing there (look up at the clouds in the sky, what do you see?). When it comes to sound, humans are not very good at filling in the gaps (maybe dogs are?). Perhaps our audio processing is so limited because visual processing dominates. I posted before about how you can see a sentence made of words with scrambled letters and still make sense of it:
Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, olny taht the frist and lsat ltteres are at the rghit pcleas. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by ilstef, but the wrod as a wlohe.
You can read that. That ability is basically what makes the internet, which is filled with loads of l33t speak, readable. But if someone spoke that sentence to you, it would not make a bit of sense.

As an aside, I'm often reminded of how badly sound is done in the YouTube age. I've stopped watching many clips because of bad sound, and stopped listening to podcasts because someone couldn't be bothered to buy a decent microphone.

I am not really interested in exploring how important sound is to cinema, or how the music, sounds, and dialog add to the moving image, as many others have covered those points. Suffice it to say that sound is important, and it has to be very close to perfect so that we are not ejected from the virtual by jarring differences between our expectations and what we actually hear. What really interests me is how sound production, in comparison to image production, still consists of capturing analog sounds and then editing them. Which brings me to the second point.

We can now produce all of the visual elements of a film in a computer (animation, machinima, etc.). Sound, however, is not quite there yet. Music of course has been synthesized for quite some time, although whether synthesized sound can ever measure up to a real orchestral score is debatable. But what about sound effects? It still seems you have to capture analog sound (either the real sound or a facsimile) to mimic the sound effects you would expect to hear in a film. And dialog? Real spoken words by real actors still seem the way to go. Why is that? Besides the Speak & Spell voices you sometimes hear, I know of no digitally synthesized voices that could replace the spoken words of a real actor. Am I wrong? Is there something out there I've missed? Could there ever be digitally produced dialog capable of an Academy Award-winning performance? Could the nuances of a tear-jerking line of dialog ever be replicated in ones and zeros? How soon before an entire normal film (not a silent visual montage) can be created entirely in a computer? How long before that virtual world springs solely from a director's imagination, where the characters, how they look, the words they speak, the sound effects that occur, and the swelling music that tells us what to feel emotionally are all created by one person using a CPU? Could a virtual actor ever digitally say with real conviction, "Here's looking at you, kid"?

We use machinima to capture the real-time rendering of virtual worlds using a game engine (or more generally, a rendering engine), and yet there is nothing similar for sound. There is no virtual audioscape where one could create the sound of a bird singing, a tank rolling across the plains, a busy train station in Europe, a newspaper office at crunch time, or anything like these.

Am I wrong? Is there something out there that I've missed, or something coming over the horizon that will do these things? Some software company that's exploiting this gaping black hole? Or are we forever doomed to drag actors into soundproof booths, to pay someone to clap two pieces of metal together in front of a microphone, or to shove a boom mic practically into someone's face so that the last bit of emotional subtext the lead actor reveals is captured for all to hear?

