Facebook’s next big AI project trains its machines on users’ public videos

Teaching AI systems to understand what’s happening in videos as completely as a human can is one of the hardest challenges – and biggest potential breakthroughs – in the world of machine learning. Today, Facebook announced a new initiative that it hopes will give it an edge in this consequential work: training its AI on Facebook users’ public videos.

Access to training data is one of AI’s biggest competitive advantages, and by collecting this resource from millions and millions of users, tech giants like Facebook, Google, and Amazon have been able to push ahead in various fields. And while Facebook has already trained machine vision models on billions of images collected from Instagram, it hasn’t announced projects of similar ambition for video understanding.

“By learning from global streams of publicly available video spanning nearly every country and hundreds of languages, our AI systems will not only improve accuracy, but also adapt to our rapidly changing world and recognize the nuances and visual cues across different cultures and regions,” the company said in a blog post. The project, titled Learning from Videos, is also part of “Facebook’s broader efforts to build machines that learn like humans.”

The resulting machine learning models will be used to build new content recommendation systems and moderation tools, Facebook says, but could do much more in the future. AI that can understand video content could give Facebook unprecedented insight into users’ lives, allowing it to analyze their hobbies and interests, brand and clothing preferences, and countless other personal details. Of course, Facebook already has access to this information through its current ad targeting operation, but being able to analyze video via AI would add an incredibly rich (and invasive) source of data to its stores.

Facebook remains vague about its future plans for AI models trained on user videos. The company told The Verge these models could be used for many purposes, from captioning videos to creating advanced search functions, but did not answer the question of whether or not they would be used to gather information for ad targeting. Similarly, when asked whether users had to consent to having their videos used to train Facebook’s AI or could opt out, the company responded only by noting that its Data Policy states that content uploaded by users may be used for “product research and development.” Facebook also did not respond to questions about exactly how many videos will be collected to train its AI systems or how access to that data by the company’s researchers will be overseen.

In its blog post announcing the project, however, the social network pointed to a speculative future use: using AI to recover “digital memories” captured by smart glasses.

Facebook plans to launch a pair of consumer smart glasses this year. Details on the device are vague, but it’s likely that these or future glasses will include built-in cameras to capture the wearer’s point of view. If AI systems can be trained to understand video content, it could allow users to search past recordings, much as many photo apps let people search for specific places, objects, or people. (This is information, incidentally, that has often been indexed by AI systems trained on user data.)

Facebook has released images showing prototype pairs of its augmented reality smart glasses.
Image: Facebook

As video recording with smart glasses “becomes the norm,” Facebook says, “people should be able to recall specific moments from their vast bank of digital memories as easily as they capture them.” It gives the example of a user searching with the phrase “Show me every time we sang happy birthday to Grandma” and receiving relevant clips in response. As the company notes, such searches would require AI systems to make connections between data types, teaching them “to match the phrase ‘happy birthday’ with cakes, candles, people singing various birthday songs, and more”. Just like humans, AI would need to understand rich concepts composed of different types of sensory input.
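To make that concrete, here is a minimal, hypothetical sketch of how such a phrase-to-clip search could work: both the text query and each video clip are mapped into a shared embedding space, and clips are ranked by cosine similarity. The encoders below are random stand-ins (Facebook has not published its actual models), and the clip names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in encoders: in a real system these would be trained neural
# networks that map text and video into the same embedding space.
# Random unit vectors are used here purely for illustration.
def encode_text(query: str, dim: int = 64) -> np.ndarray:
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

def encode_clip(clip_path: str, dim: int = 64) -> np.ndarray:
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

# Index the user's "digital memories" once...
clips = ["beach_2019.mp4", "birthday_grandma.mp4", "hiking_trip.mp4"]
index = np.stack([encode_clip(c) for c in clips])

# ...then rank clips against a natural-language query by cosine
# similarity (a dot product, since all vectors are unit length).
query_vec = encode_text("every time we sang happy birthday to Grandma")
scores = index @ query_vec

for clip, score in sorted(zip(clips, scores), key=lambda p: -p[1]):
    print(f"{score:+.3f}  {clip}")
```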

Looking further ahead, the combination of smart glasses and machine learning could enable so-called “worldscraping” – capturing granular data about the world by turning smart glasses wearers into roving CCTV cameras. As the practice was described in a report last year by The Guardian: “Every time someone browsed a supermarket, their smart glasses would record real-time data on prices, stock levels, and browsing habits; every time they opened a newspaper, their glasses would know which stories they read, which ads they looked at, and which celebrity beach photos their gaze lingered on.”

This is an extreme result and not an avenue of research that Facebook says it is currently exploring. But it illustrates the potential importance of coupling advanced AI video analytics with smart glasses – something the social network is apparently keen to do.

By comparison, the sole use of its new AI video analysis tools that Facebook is disclosing right now is relatively mundane. Alongside today’s Learning from Videos announcement, Facebook says it has rolled out a new content recommendation system based on its video work in Reels, its TikTok clone. “Popular videos often consist of the same music set to the same dance moves, but created and performed by different people,” Facebook explains. By analyzing the content of videos, Facebook’s AI can suggest similar clips to users, along the lines of the sketch below.
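That kind of “more like this” recommendation can be sketched as a nearest-neighbor lookup over clip embeddings. Again, this is a hypothetical illustration with random stand-in feature vectors, not Facebook’s actual system:

```python
import numpy as np

# Stand-in embeddings for five clips; a real system would derive these
# from each video's audio, visuals, and motion. Rows are unit vectors.
rng = np.random.default_rng(seed=1)
embeddings = rng.standard_normal((5, 64))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
clip_ids = ["reel_a", "reel_b", "reel_c", "reel_d", "reel_e"]

def recommend(watched: int, k: int = 2) -> list[str]:
    """Return the k clips most similar to the one the user just watched."""
    scores = embeddings @ embeddings[watched]
    scores[watched] = -np.inf  # never recommend the clip itself
    top = np.argsort(scores)[::-1][:k]
    return [clip_ids[i] for i in top]

print(recommend(watched=1))  # the two clips closest to "reel_b"
```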

Such content recommendation algorithms are not without potential problems, however. A recent report from MIT Technology Review highlighted how the social network’s focus on user growth and engagement has kept its artificial intelligence team from fully examining how algorithms can spread misinformation and encourage political polarization. As the Technology Review article puts it: “The [machine learning] models that maximize engagement also foster controversy, misinformation, and extremism.” That creates a conflict between the duties of Facebook’s AI ethics researchers and the company’s credo of maximizing growth.

Facebook isn’t the only big tech company pursuing advanced AI video analysis, nor the only one leveraging user data to do so. Google, for example, maintains a publicly available research dataset of 8 million curated and partially labeled YouTube videos in order “to help accelerate research in large-scale video understanding.” The search giant’s advertising operations could also benefit from artificial intelligence that understands the content of videos, even if the end result is simply to show more relevant ads on YouTube.

Facebook, however, believes it has a particular advantage over its competitors. Not only does it have plenty of training data, but it’s pushing more and more resources into an AI method known as self-supervised learning.

Usually, when AI models are trained on data, those inputs need to be labeled by humans: marking objects in images or transcribing audio recordings, for example. If you’ve ever solved a CAPTCHA by identifying fire hydrants or crosswalks, you’ve probably labeled data that helped train an AI. But self-supervised learning does away with the labels, speeding up the training process and, some researchers say, leading to deeper and more meaningful analysis as AI systems learn to connect the dots. Facebook is so optimistic about self-supervised learning that it has called it “the dark matter of intelligence.”
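A toy example makes the distinction clear. In the self-supervised setup below, no human writes labels; the “labels” are simply parts of the raw data hidden from the model and predicted from context (a deliberately simplified stand-in for how large video and language models are actually pretrained):

```python
from collections import Counter, defaultdict

# Raw, unlabeled data: just a snippet of text.
corpus = "happy birthday to you happy birthday dear grandma".split()

# The supervision signal is built automatically from the data itself:
# each word becomes the "label" for the word that precedes it.
pairs = list(zip(corpus[:-1], corpus[1:]))

# Train a trivial next-word model by counting co-occurrences.
model = defaultdict(Counter)
for prev, nxt in pairs:
    model[prev][nxt] += 1

# Structure is learned without a single human-written label:
print(model["happy"].most_common(1))  # -> [('birthday', 2)]
```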

The company says its future work on AI video analysis will focus on semi- and self-supervised learning methods, and that these techniques “have already improved our computer vision and speech recognition systems.” With such an abundance of video content available from Facebook’s 2.8 billion users, skipping the labeling stage of AI training certainly makes sense. And if the social network can teach its machine learning models to understand video seamlessly, who knows what they might learn?

James G. Williams