Data2vec is part of a big trend in AI toward models that can learn to understand the world in more than one way. “It’s a clever idea,” says Ani Kembhavi at the Allen Institute for AI in Seattle, who works on vision and language. “It’s a promising advance when it comes to generalized systems for learning.”
An important caveat is that although the same learning algorithm can be used for different skills, it can only learn one skill at a time. Once it has learned to recognize images, it must start from scratch to learn to recognize speech. Giving an AI multiple skills at once is hard, but that’s something the Meta AI team wants to look at next.
The researchers were surprised to find that their approach actually performed better than existing techniques at recognizing images and speech, and performed as well as leading language models on text understanding.
Mark Zuckerberg is already dreaming up potential metaverse applications. “This will all eventually get built into AR glasses with an AI assistant,” he posted to Facebook today. “It could help you cook dinner, noticing if you miss an ingredient, prompting you to turn down the heat, or more complex tasks.”
For Auli, the main takeaway is that researchers should step out of their silos. “Hey, you don’t need to focus on one thing,” he says. “If you have a good idea, it might actually help across the board.”