Supasorn Suwajanakorn [ 7:15 | Machine Learning ]
Look at these images and tell me which Obama here is real.
. . . to help families refinance their homes, to invest in things like high-tech manufacturing, clean energy and the infrastructure that creates good new jobs . . .
The answer is, “none of them.”
None of these is actually real.
Let me tell you how we got here. My inspiration for this work was a project meant to preserve our last chance for learning about the Holocaust from the survivors.
It’s called New Dimensions in Testimony and it allows you to have interactive conversations with a hologram of a real Holocaust survivor.
How did you survive the Holocaust?
How did I survive? I survived, I believe, because Providence watched over me.
It turns out these answers were pre-recorded in a studio. Yet the effect is astounding. You feel so connected to his story and to him as a person.
I think there’s something special about human interaction that makes it much more profound and personal than what books or lectures or movies could ever teach us.
I saw this and began to wonder: can we create a model like this for anyone? A model that looks, talks and acts just like them?
So I set out to see if this could be done and eventually came up with a new solution that builds a computer model of a person using nothing but existing photos and videos of that person.
If you can leverage this kind of passive information, just photos and videos that are out there — that’s the key to scaling to anyone.
By the way, here’s Richard Feynman who, in addition to being a Nobel Prize winner in physics, was also known as a legendary teacher.
Wouldn’t it be great if we could bring him back to give his lectures and inspire millions of kids? Perhaps not just in English but in any language?
Or if you could ask our grandparents for advice and hear those comforting words — even if they’re no longer with us.
Or maybe, using this tool, book authors — alive or not — could read aloud all of their books for anyone interested?
The creative possibilities here are endless and to me that’s very exciting.
Here’s how it’s working so far.
First we introduce a new technique that can reconstruct a highly detailed 3D face model from any image, without ever 3D scanning the person.
Here’s the same output model from different views.
This also works on videos, by running the same algorithm on each video frame and generating a moving 3D model.
Here’s the same output model from different angles.
Turns out this problem is very challenging.
But the key trick is that we’re going to analyze a large photo collection of the person beforehand.
For George W Bush we can just search on Google.
From that, we’re able to build an average model and iteratively refine the model to recover the expression and fine details like creases and wrinkles.
What’s fascinating about this is that the photo collection can come from your typical photos.
It doesn’t really matter what expression you’re making or where you took those photos.
What matters is that there are a lot of them.
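As a rough illustration of the "build an average, then iteratively refine" idea, here is a toy sketch. It is not the method from the talk: it assumes each photo has already been reduced to a short list of 2D landmark points (a hypothetical simplification), and the names `average_shape` and `refine` are invented for this example.

```python
# Toy sketch only, NOT the reconstruction method described in the talk.
# Assumption: each photo is reduced to a list of (x, y) landmark points.

def average_shape(shapes):
    """Point-wise mean of several landmark lists of equal length."""
    n = len(shapes)
    return [
        (sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
        for i in range(len(shapes[0]))
    ]

def refine(model, target, step=0.5, iterations=20):
    """Move the model a fraction of the way toward the target on each pass."""
    for _ in range(iterations):
        model = [
            (mx + step * (tx - mx), my + step * (ty - my))
            for (mx, my), (tx, ty) in zip(model, target)
        ]
    return model

# Three made-up "photos" of the same face with slightly different expressions.
photos = [
    [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.2), (2.0, 0.2), (1.0, 1.4)],
    [(0.0, -0.2), (2.0, -0.2), (1.0, 0.6)],
]
mean = average_shape(photos)       # the "average model" of the collection
fitted = refine(mean, photos[1])   # refined toward one specific photo
```

The more photos go into the average, the less any single expression dominates the starting point, which matches the remark that what matters is having a lot of them.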
We’re still missing color here, so next we develop a new blending technique that improves upon a simple averaging method and produces sharp facial textures and colors.
This can be done for any expression.
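To see why plain averaging tends to blur detail, and why a weighted blend can keep it sharper, consider this minimal sketch. It is only an assumption about the general idea, not the blending technique from the talk: images are tiny 1D rows of brightness values, and the weights are hypothetical scores for how well each photo matches the target expression.

```python
# Toy sketch only: compares plain averaging with a weighted blend.
# Assumption: each "image" is a 1D row of brightness values in [0, 1].

def simple_average(images):
    """Every photo counts equally, so misaligned edges smear together."""
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(len(images[0]))]

def weighted_blend(images, weights):
    """Photos with higher (hypothetical) relevance weights dominate."""
    total = sum(weights)
    return [
        sum(w * img[i] for img, w in zip(images, weights)) / total
        for i in range(len(images[0]))
    ]

# The same edge (a crease, say) appears at slightly different positions.
images = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
blurry = simple_average(images)
sharper = weighted_blend(images, weights=[8.0, 1.0, 1.0])
```

In `blurry` the edge turns into a gradual ramp, while `sharper` stays close to the first photo's crisp edge because that photo carries most of the weight.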
Now we have a controllable model of a person and the way it’s controlled now is by a sequence of static photos.
Notice how the wrinkles come and go depending on the expression.
We can also use the video to drive the model.
. . . but somehow we’ve managed to attract some more amazing people.
Here’s a fun demo.
What you see here are controllable models of people I built from their internet photos.
If you transfer the motion from the input video, we can actually drive the entire party.
It’s a very difficult bill to pass because there’s a lot of moving parts and the legislative process is . . . can be ugly.
Coming back a little bit, our ultimate goal, rather, is to capture their mannerisms, or the unique way each of these people talks and smiles.
To do that, can we actually teach the computer to imitate the way someone talks by only showing it video footage of the person?
What I did exactly was let a computer watch 14 hours of pure Barack Obama giving addresses.
Here’s what we can produce given only his audio.
The results are clear. America’s businesses have created 14.5 million new jobs over 75 straight months.
So what’s being synthesized here is only the mouth region and here’s how we do it.
Our pipeline uses a neural network to convert an input audio into these mouth points.
. . . we get it through our job or through Medicare or Medicaid . . .
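The audio-to-mouth-points step can be pictured with a very small sketch. Everything here is an assumption made for illustration: the talk does not specify the network, so this uses a tiny two-layer network with random, untrained weights, 13 audio features per frame, and 18 mouth points. All names and sizes are hypothetical.

```python
import math
import random

# Toy sketch only: an UNTRAINED network with made-up sizes, meant to show
# the shape of the mapping (audio features in, mouth points out).

def make_layer(n_in, n_out, rng):
    """Random weights; a real system would learn these from video footage."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, inputs, activation=None):
    """One fully connected layer: weighted sums, then optional activation."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) for row in layer]
    return [activation(o) for o in outputs] if activation else outputs

def audio_to_mouth_points(features, hidden, output):
    """Map one frame of audio features to a list of (x, y) mouth points."""
    h = forward(hidden, features, math.tanh)
    flat = forward(output, h)
    return list(zip(flat[0::2], flat[1::2]))

rng = random.Random(0)
N_FEATURES, N_HIDDEN, N_POINTS = 13, 8, 18   # arbitrary illustrative sizes
hidden = make_layer(N_FEATURES, N_HIDDEN, rng)
output = make_layer(N_HIDDEN, 2 * N_POINTS, rng)

frame_features = [rng.uniform(-1, 1) for _ in range(N_FEATURES)]
points = audio_to_mouth_points(frame_features, hidden, output)
```

With trained weights, running this once per audio frame would yield a sequence of mouth shapes that follows the speech.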
Then we synthesize the texture, enhance details in teeth, and blend it into the head and background from a source video.
Women can get free checkups and you can’t get charged more just for being a woman. Young people can stay on a parent’s plan until they turn 26. . .
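The final step, blending a synthesized mouth region into a frame from a source video, can be sketched as a soft-mask composite. This is a generic alpha blend under stated assumptions, not necessarily the compositing used in the talk: images are small grids of brightness values, and the mask values are invented for the example.

```python
# Toy sketch only: generic alpha blending with a feathered mask.
# Assumption: frame, patch and mask are same-sized grids of floats in [0, 1].

def blend(frame, patch, mask):
    """Per pixel: mask * patch + (1 - mask) * frame."""
    return [
        [m * p + (1.0 - m) * f for f, p, m in zip(frow, prow, mrow)]
        for frow, prow, mrow in zip(frame, patch, mask)
    ]

frame = [[0.0, 0.0, 0.0, 0.0, 0.0]]   # source video frame (dark)
patch = [[1.0, 1.0, 1.0, 1.0, 1.0]]   # synthesized mouth texture (bright)
mask = [[0.0, 0.5, 1.0, 0.5, 0.0]]    # 1 = use patch, 0 = keep frame

composite = blend(frame, patch, mask)
```

The intermediate mask values feather the boundary, so the inserted region fades into the surrounding face instead of leaving a hard seam.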
I think these results seem very realistic and intriguing but at the same time frightening — even to me.
Our goal is to build an accurate model of a person — not to misrepresent them. But one thing that concerns me is its potential for misuse.
People have been thinking about this problem for a long time — since the days when Photoshop first hit the market.
As a researcher, I’m also working on countermeasure technology.
I’m part of an ongoing effort at AI Foundation, which uses a combination of machine learning and human moderators to detect fake images and videos, fighting against my own work.
One of the tools we plan to release is called Reality Defender, a web browser plug-in that can flag potentially fake content automatically, right in the browser.
Despite all this though, fake videos could do a lot of damage even before anyone has a chance to verify.
So it’s very important that we make everyone aware of what’s currently possible, so we can have the right assumption and be critical about what we see.
There’s still a long way to go before we can truly model individual people and before we can ensure the safety of this technology.
But I’m excited and hopeful, because if we use it right and carefully, this tool can allow any individual’s positive impact on the world to be massively scaled and really help shape our future the way we want it to be.
FEATURED IMAGE CREDIT: Thomas Hawk
Supasorn just left Google Brain as a research resident.
If you haven’t seen Zeynep Tufekci’s relatively recent talk at Dartmouth’s Neukom Institute, you definitely won’t want to miss it.