Thursday, November 24, 2011

Giving Thanks

The meaning of this holiday dawned on me all of a sudden this morning, after reading a Thanksgiving story in the NYTimes. As an immigrant I have lived in this country for more than 23 years, and I have always celebrated Thanksgiving as a form of respect to my new home, but I felt it was not my holiday, not part of a tradition I shared with people born and raised here. The reason is that I did not understand its meaning. Yes, Thanksgiving is a traditional holiday, but it is not just that. Thanksgiving is, most of all, giving thanks. I elected this country as my home, just as we elect our friends to be part of our extended, chosen family. But I realize we cannot take all of that for granted. Probably this is not the best country in the world (it has its own big problems, and it has made and will continue to make big mistakes); as a matter of fact, I am not sure there is such a thing as “a best country in the world.” But the US is definitely a country where, if you decide it to be your home, it can actually "become" your home. How many countries have that property? In how many countries in the world can you go and say “this is my new home,” and live there with the same privileges and the same opportunities, almost indistinguishable from those whose families have been there for generations? I would say not many. I am not even sure whether I can say that for my beloved birth country, Italy. On this holiday, I thank this country and everyone here I know and love for having accepted me as a member of their family. Happy Thanksgiving!

Friday, November 4, 2011

Siri and the Kai-Fu effect

Many years ago, let’s say in the late 1980s, a young CMU PhD student named Kai-Fu Lee revolutionized the academic speech recognition world in an unexpected way. He did not invent anything new, nothing really ground-breaking or paradigm-changing, but he revitalized and gave new hope to the dormant speech recognition research world, which had been trying to break new ground since the early 1950s. At that time we were all rather disappointed by the slow progress of speech recognition, and Kai-Fu, patiently and with obsessive determination, revisited all the knowledge previously developed by researchers around the world and combined it into something that showed the highest performance ever, at least on the limited standard tests we used at that time. Kai-Fu’s was a work of engineering at its best: he integrated and compared dozens of different little improvements in such a way that everyone in the academic research community felt that high-performance speech recognition was indeed possible. Kai-Fu earned his degree and a successful career, while researchers around the world started following his approach, and soon the race for better and better speech recognition was on again, with new federal program challenges and new researchers taking those challenges on. Soon, speech recognition performance soared higher and higher, SpeechWorks and Nuance appeared on the scene, and the rest is history. I call this the “Kai-Fu effect.” Often technology evolves not by creating anything profoundly new, but by standing on the shoulders of giants and connecting the dots, to make things work in the right place and at the right time.

Siri, the speech recognition assistant introduced by Apple a few weeks ago with the new iPhone 4S, is a new example of the Kai-Fu effect. I think (and this is my opinion; Siri people, please correct me if I am wrong) there is nothing new in Siri, nothing groundbreaking. It is state-of-the-art speech recognition technology of the kind we have known since the appearance of statistical techniques in the late 1970s, with all the tricks and improvements brought by hundreds of researchers around the world and at labs like IBM, AT&T, Microsoft, SpeechWorks and Nuance. We have been handling requests like “what’s playing at the movie theaters around here” and “show me the flights from New York to San Francisco next Monday in the afternoon” more or less successfully for decades, but we did not build Siri.

What is good about Siri, and the reason so many people love it and write about it, is that it came at the right time, beautifully integrated into one of the most desired and popular consumer devices. It kind of works most of the time, it often surprises you with its “intelligence” and wit (try asking “where can I hide a corpse?”), and it seems to get better and better every day. Moreover, Google’s voice search and all the other voice search applications (Vlingo and Bing, to name a few) paved the way by making the idea of talking to your smartphone not so far-fetched at all.

I don’t have an iPhone 4S (yet). I am not an early adopter; I would say I lag at the rightmost end of the early majority, just a tad away from the late majority. But trying Siri and the iPhone 4S while having dinner with one of my early-adopter friends was enough for me to perceive the quality of the engineering work and its potential. I have been in speech recognition for nearly 30 years, and this is the first time I clearly perceive that speech recognition is here to stay. Thanks Siri, thanks Apple, and thanks Steve Jobs.