New Video Technology Provides Realistic Speech Animation

Robin Rupli, Voice of America, May 27, 2002


A student thesis at the Massachusetts Institute of Technology (MIT) has resulted in groundbreaking technology that can make videos of people saying things they never said. This is good news for video editors and special effects technicians, who will be able to fix misspoken lines or improve the look of dubbed movies. But others are concerned that the technology raises ethical questions about falsifying moving images, a practice known as "video forgery."

You're listening to a computerized moving image of a woman singing in Japanese. What makes it unusual is that this woman can't speak Japanese and isn't really singing; her face has been digitally re-animated so that she only appears to. MIT student Tony Ezzat is responsible for the latest advance in what is known as "video speech animation technology."

He calls the woman "Mary 101." "The first step is we actually collect several minutes of somebody's video and then we process that. We figured out a way to collect images from that video and morph them together in high-dimensional morph space," he says. "So we can blend and combine some of these images to create trajectories of their mouth moving. Then we take this morph space and make it learn how to actually talk. Then once we do that, we can control it and make it say other things."

"The fire is allowed to burn itself out. . . . He tossed in thirty-one points and eleven assists. . . . Warmer temperatures are predicted in some areas. . . ."

Graduate student Tony Ezzat's project was approved by Professor Tomaso Poggio, of MIT's Department of Brain and Cognitive Sciences and its Artificial Intelligence Laboratory. "This is one of several projects for trying to make more intelligent machines, and so this is a project in trying to give a face to the computer and make a computer communicate better with people," he says.

Professor Poggio says their latest technology is still far from perfect. While the mouth of the image moves in a completely natural way, the rest of the face strangely conveys no emotion. Still, he foresees video speech animation as an effective educational tool: people could tune in to a "virtual teacher" on their home computers, learn foreign languages, or receive speech therapy. "And in this process the goal is really to understand better not only how to communicate with a machine and have machines communicate with us, but also how we communicate with each other," he says.

Some scientists warn that the ability to realistically manipulate videos of people, so that they appear to say words they never actually said, raises the potential for fraud and propaganda. But others, like Massachusetts scientist Matthew Brand of Mitsubishi Electric Research Laboratories, say the latest technology is only a refinement of video forgery that has been going on for years. "Look, video has been forgeable for about two decades now. If you had enough money and you really cared, there's no reason to believe that video forgery hasn't already appeared on the evening news somewhere in the last two decades. And even with the tapes of [Osama] Bin Laden, there's always speculation, particularly in the Arab world, when we put out the tapes, they claimed we forged it," he says. "They know already that Hollywood can do these things. Certainly if you live in any city you take a bus, you're very accustomed to looking at some subtly doctored image. There's no reason that the same sk

MIT scientists Tony Ezzat and Professor Tomaso Poggio are not credited with inventing video speech animation technology, but their latest developments are considered the best attempt so far. And most scientists and ethicists would agree that, in this day and age, not only should you "not believe everything you hear," but you should also be wary of believing everything you see.
