Saturday, June 19, 2010

The wick, the internet, and Andy Warhol

I just finished reading Nicholas Carr’s “The Big Switch: Rewiring the World, from Edison to Google.” It is a book about technology, and about how certain types of technologies, initially intended for individual use, gradually become utilities. Electric power is a classic example of that phenomenon. After the electric power generator became cost-effective, factories and some households started to buy, own, and manage their own generators. But then someone, Edison’s personal secretary Samuel Insull, figured out that producing energy in a single place and distributing it for a fee would be much more efficient and cheaper for everyone. In other words, Insull created the “hosted” or “on demand” model for electric power: you pay for what you use. That was not the first example of this progression from individual use to utility. Before him, and before electric power was possible, Henry Burden built a giant water wheel somewhere in upstate New York in 1852 and used it to distribute mechanical power to the farms and factories in its neighborhood. And if you think about it, many other technologies followed the same path. Music: from individual gramophones to radio. Transportation: from individual coaches to public transportation. And finally the computer: from individual computers to the cloud, or what Carr calls the World Wide Computer.
Interestingly enough, computers started out mainly as a utility. Computers in the 1970s and 1980s were too expensive for individuals and small companies to own, so they were deployed as mainframes in centralized locations, and computing power was distributed to users for a fee metered on CPU usage. Unfortunately, back then, data sharing was quite difficult because of the modest data communication bandwidth available. So yes, you could have your data stored in some central location, but uploading or downloading it from a remote location could take a huge amount of time, unless you could walk or drive to the central computing facility and hand over your tapes or stacks of cards. That is one of the reasons personal computing became so popular. So we went back from a utility to individual deployment just because the infrastructure was not able to properly support distribution. But now it is a different story. Bandwidth has increased enormously during the past few years, and the idea of a centralized computer has suddenly become, once again, a viable and attractive alternative to personal computers. In just a couple of years the term “cloud computing” has become immensely popular, denoting a model where computing power (CPU cycles) is distributed from a virtually centralized location to everywhere in the world. It does not take much imagination to see a future, not far from now, where we won’t buy full-fledged computers for our homes or offices, but simple appliances that, once connected to the network, will be able to use the virtually infinite computing power and storage provided by the Amazons and Googles of the world. Keeping tens of thousands of song tracks, pictures, and movies stored on our home computers, as we have done during the past years (with the added complication of keeping them sorted, backed up, etc.), starts to feel a little outdated and unnecessary once you start using services like Pandora, Flickr, or Netflix.
But let’s get back to the thing I wanted to talk about: the wick. The epilogue of Carr’s book opens by calling the wick one of man’s greatest inventions, as well as one of the most modest. It allowed us to move from the primitive torch to the more civilized candle, which remained the dominant lighting technology for hundreds of years, until it was replaced first by the wickless gas lamp and then by Edison’s incandescent bulb. The bulb did bring huge benefits to society and industry but, in Carr’s own words, it also brought subtle and unexpected changes to the way people lived. The candle was a focal point for families. In the evening, families gathered around the flickering light of a candle to talk, tell stories, and be together. With the advent of electric light, the family started to disperse around the house, and each family member began to spend more of the evening in their own room or space.
Other technologies brought societal changes of the same order of magnitude as those brought by electric light. I remember that when I was a kid there was only one television in my house, as in most other people’s houses, and at that time in Italy we had only two TV channels. The whole family waited for the time after dinner to gather around the only TV and watch what was pretty much the only choice of show or movie, because the choice of channel was often obvious: a thriller on one channel and a documentary on shepherding on the other, the most popular quiz show of the decade on one channel and chamber music on the other. The whole family sat in front of the TV watching in silence, all together, every night. I even remember when we bought our first washing machine: on the first day we all sat in front of it in awe, waiting for the program to move from prewash to wash to rinse. That night the excitement of the new machine surpassed that of the show on TV. Then, with more TVs in the house and so many channels to choose from, that moment of togetherness disappeared, and everyone was in their own room after dinner, listening to music or watching their favorite show. And no, we don’t watch washing machines anymore…
The same thing happened with computers. At the beginning of my career, in the early 1980s, my wife and I lived in the same building as another couple we were friends with. We decided to share the cost and buy a Commodore 64 together. After dinner we would gather in one of the two apartments and play with our first home computer, which used a TV as its monitor and a cassette recorder as its only storage device, replaced a year later by a shiny new 5¼-inch floppy disk drive. Hours and hours together, sipping wine and playing Aztec Tomb, Pac-Man, and Mission 5 rescue. Then PCs became cheap, cheap enough for each of us to have our own, and we stopped getting together to play video games. Then the Internet and the Web came. I remember Mosaic, the first popular web browser. I compiled it and installed it first on my remote UNIX machine, and then on my home PC. The early days of the Web created such an intense sense of curiosity that it was not uncommon for the whole family to gather around the PC and browse. Then we got used to it. More PCs at home, wireless internet, everyone navigating wherever they felt like.
On the one hand, the personal computer and the internet contributed, more than any other technology, to the dispersion of the social nuclei. Browsing the internet, listening to streamed music, and watching online movies are personal affairs, not moments of tribal gathering. But paradoxically, on the other hand, the internet has helped people get closer. Who hasn’t found lost friends and reconnected with them on Facebook, LinkedIn, or Skype? Or just googled the name of an old buddy and found that he is now a famous writer, or has a webpage with his picture and email address? Our societal rules and rituals, and the way we connect with each other, are changing forever because of the web, the technology that has had one of the largest impacts on humanity as a whole. So, paradoxically, the same internet that pulled us apart also brings us together. Not in the same way as the wick, but in a way that is completely different and new.
While reading Carr’s discussion of how the internet gives everyone a unique opportunity to publish and share their writings, pictures, movies, etc., my first thought was of Andy Warhol’s famous quote: “In the future, everyone will be world-famous for 15 minutes.” I don’t know if Andy Warhol envisioned something as big as the web, but that is definitely what the web is bringing us: the potential to get our 15 minutes of fame.

Saturday, June 5, 2010

Speech Recognition--continued

Thanks to all who kindly commented, either privately or on this blog, on my response to Robert Fortner's piece on speech recognition. For completeness, I am reproducing here his comment and my reply to it.

On May 30th Robert Fortner said:

Hi, Roberto:
Thank you for reading and your impassioned comment.
I read your blog and you write "If you think that speech recognition technology, after 50 years or so of research, would bring us HAL 9000, you are right to think it is dead."
That's what I think!
You go on to say "that type of speech recognition was never alive, except in the dreams of science-fiction writers." I agree that SF writers were big purveyors of that dream, but I think a lot of other people believed in it too, maybe most people--and that's why the death of that dream has gone unrecognized. Nobody wants to talk about it. It's pretty shocking.
What do you mean computers aren't automatically (i.e. with a lot of work by smart people like you) going to progress to understanding language?
Hard to believe.

On May 30th Roberto Pieraccini said:

Hi Robert ... thanks for the response to my response to your blog. I started working in speech recognition research in 1981. Since then I have built speech recognizers, spoken language understanding systems, and finally those dialog systems on the phone that some people hate and techies call IVRs (now I don't build anything anymore because I am a manager :) ). But during all this time I never believed I would see a HAL-like computer in my lifetime. And I am sure the thousands of serious colleagues and researchers in speech technology around the world never believed that either. In the end we are engineers who build machines. And as we come to realize the inscrutable complexity and sophistication of human intelligence (and speech is one of its most evident manifestations), and the principles on which we base our machines, we soon understand that building something even remotely comparable to a human speaking to another human is beyond the realm of today's technology, and probably beyond the realm of the technology of the next few decades (but of course you never know ... we could not predict the Web 20 years ago ... could we?).
Speech recognition is a mechanical thing ... you get a digitized signal from a microphone, chop it into small pieces, compare the pieces to the models of speech sounds you previously stored in a computer's memory, and give each piece a "likelihood" of being part of each of those sounds. Pieces of sounds make sounds, sounds make words, words make sentences, and you keep scoring all the hypotheses in an orderly fashion based on statistical models of larger and larger entities (sounds, words, sentences), such as models of the probability of a sound following other sounds in a word, of a word following other words in a sentence, and so on. In the end you come up with a hypothesis of what was said. And by following the mathematical recipes prescribed by the engineers who worked that out, you get a correct hypothesis most of the time ... "most of the time" ... not always. If you do things right, that "most of the time" can become large ... but never 100%. There is never 100% in anything humans, or nature, make ... but sometimes you can get pretty damn close to it, and that's what we strive for as engineers.
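To make that "mechanical thing" a bit more concrete, here is a toy sketch in Python under several simplifying assumptions: a single made-up feature per piece (the average energy of the piece, where real recognizers use much richer spectral features), two hypothetical sound models ("sil" for silence and "ah" for a vowel) described by Gaussian distributions with invented parameters, and a hypothetical table of probabilities of one sound following another. The best hypothesis is found with a Viterbi search, the dynamic-programming technique commonly used with hidden Markov models; the word and sentence levels are left out.

```python
import math

def frames(signal, size):
    """Chop the digitized signal into consecutive, non-overlapping pieces."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def energy(frame):
    """Toy feature (an assumption): mean squared amplitude of one piece."""
    return sum(x * x for x in frame) / len(frame)

def log_gauss(x, mean, var):
    """Log-likelihood of feature x under a 1-D Gaussian sound model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical stored models: (mean, variance) of the energy for each sound.
SOUND_MODELS = {"sil": (0.01, 0.01), "ah": (0.5, 0.05)}

# Hypothetical statistical model: P(next sound | current sound).
TRANSITIONS = {
    "sil": {"sil": 0.8, "ah": 0.2},
    "ah": {"sil": 0.2, "ah": 0.8},
}

def viterbi(features, models, trans, start):
    """Score all hypotheses frame by frame, keeping the best one per sound."""
    # best[s] = (log score, label path) of the best hypothesis ending in s
    best = {s: (math.log(start[s]) + log_gauss(features[0], *models[s]), [s])
            for s in models}
    for x in features[1:]:
        new = {}
        for s in models:
            # Best predecessor p, including the cost of moving from p to s.
            score, path = max(
                (best[p][0] + math.log(trans[p][s]), best[p][1]) for p in models
            )
            new[s] = (score + log_gauss(x, *models[s]), path + [s])
        best = new
    return max(best.values())[1]

def collapse(path):
    """Consecutive pieces labeled with the same sound make one sound."""
    out = []
    for s in path:
        if not out or out[-1] != s:
            out.append(s)
    return out

# Made-up signal: quiet, then loud, then quiet again.
signal = [0.05, -0.05] * 4 + [0.7, -0.7] * 4 + [0.05, -0.05] * 4
feats = [energy(f) for f in frames(signal, 4)]
path = viterbi(feats, SOUND_MODELS, TRANSITIONS, {"sil": 0.5, "ah": 0.5})
print(collapse(path))  # ['sil', 'ah', 'sil']
```

No piece is labeled in isolation here: the label chosen for each frame depends on the accumulated scores of the hypotheses leading to it, which is a small-scale version of "scoring all the hypotheses in an orderly fashion."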
So, there is no human-like intelligence (God forbid HAL-like evil intelligence) in speech recognition. No intelligence in the traditional human sense ... (but ... what's intelligence anyway?). There is no knowledge of the world, no perception of the world, none of the experience that comes from having lived in and thought about the world for every minute of our conscious and unconscious life. Speech recognition is a machine that compares pieces of signal with models of them ... period. And doing that the "statistical" way works orders of magnitude better than doing it in a more "knowledge-based", inferential, reasoning way ... I mean doing it in an AI-ish manner. We tried that (the AI-ish, knowledge-based approach) very hard in the 1970s and 1980s, but it kept failing, while the "statistical" brute-force approach started to prevail and gain popularity from the early 1980s on. AI failed because it assumed you could put all the knowledge into a computer by creating rational models that explain the world ... and letting the computer reason about them. In the end it is the eternal struggle between rationalism and empiricism ... elegant rationalism (AI) lost the battle (some think the battle ... not the war) because stupid, brute-force, pragmatic empiricism (statistics) was cheaper and more effective ...
So, if you accept that, i.e. if you accept that speech recognition is a mechanical thing with no pretense of HAL-like "Can't do that, Dave" conversations, you start believing that even that dumb mechanical thing can be useful. For instance, instead of asking people to push buttons on a 12-key telephone keypad, you can ask them to say things. Instead of having them key in the first three letters of the movie they want to see, you can ask them to "say the name of the movie you wanna see" (do you remember the hilarious Seinfeld episode where Kramer pretended he was an IVR system?). And why not? If you are driving your car, you can probably use that mechanical thing to enter a new destination in your navigation system without fiddling with its touch screen. And maybe you can do the same with your iPhone or Android phone. Underlying all this is the belief that saying things is more natural and effective than pushing buttons on a keypad, at least in certain situations. And one thing leads to another, technology builds on technology ... creating more and more complex things that hopefully work better and better. These are the dreams of us engineers ... not the dream of HAL (although I have to say that the HAL dream probably attracted us to this field unconsciously). Why the disconnect between engineers' dreams and laypeople's dreams? Who knows? But, as I said, bad scientific press, bad media, movies, and bad marketing probably contributed to it, along with the collective unconscious of our species and its urge to build a machine that resembles us in all our manifestations (Pygmalion?).
I am not sure about your last question. What I meant is that computers *are* automatically going to progress in language understanding. But they are doing that by following "learning recipes" prescribed by the smart people out there and digesting oodles of data (which is more and more available, and computers are good at that). The learning recipes we have figured out until now have brought us this far. If we don't give up on teaching and fostering speech recognition and machine learning research, one day some smart kid from some famous or less famous university somewhere in the world will figure out a smarter "recipe" ... and maybe we will have a HAL-like speech recognizer ... or something closer to it ...