When tech companies created the facial recognition systems that are rapidly remaking government surveillance and chipping away at personal privacy, they may have gotten help from an unexpected source: your face.
Companies, universities and government labs have used millions of images gathered from a variety of online sources to develop the technology. Now researchers have built an online tool, Exposing.AI, that lets people search many of these image collections for their old photos.
The tool, which matches images from the online photo-sharing service Flickr, offers a window onto the vast amounts of data needed to build a wide variety of AI technologies, from facial recognition to online “chatbots.”
“People need to realize that some of their most intimate moments have been weaponized,” said one of its creators, Liz O’Sullivan, technology director at the Surveillance Technology Oversight Project, a privacy and civil rights group. She helped create Exposing.AI with Adam Harvey, a researcher and artist in Berlin.
Systems that use artificial intelligence do not magically become intelligent. They learn by pinpointing patterns in data generated by humans: photos, voice recordings, books, Wikipedia articles and all sorts of other material. The technology is getting better all the time, but it can learn human biases against women and minorities.
People may not know they are contributing to AI education. For some, this is a curiosity. For others, it is deeply unsettling. And it may be against the law. A 2008 law in Illinois, the Biometric Information Privacy Act, imposes financial penalties if the face scans of residents are used without their consent.
In 2006, Brett Gaylor, a documentary filmmaker from Victoria, British Columbia, uploaded his honeymoon photos to Flickr, a popular service at the time. Nearly 15 years later, using an early version of Exposing.AI provided by Mr. Harvey, he discovered that those hundreds of photos had made their way into multiple datasets that may have been used to train facial recognition systems around the world.
Flickr, which was bought and sold by many companies over the years and is now owned by the photo-sharing service SmugMug, allowed users to share their photos under a Creative Commons license. That license, common on the internet, meant others could use the photos with certain restrictions, though those restrictions may have been ignored. In 2014, Yahoo, which owned Flickr at the time, used many of these photos in a dataset meant to help with computer vision work.
Mr. Gaylor, 43, wondered how his photos could have bounced from place to place. Then he was told that the photos may have contributed to surveillance systems in the United States and other countries, and that one of these systems was used to track China’s Uighur population.
“My curiosity turned to horror,” he said.
How the honeymoon photos helped build surveillance systems in China is, in a sense, a story of unintended, or unforeseen, consequences.
Years ago, AI researchers at leading universities and tech companies began gathering digital photos from a wide variety of sources, including photo-sharing services, social networks, dating sites like OkCupid and even cameras installed on college campuses. They shared those photos with other organizations.
That was just the norm for researchers. They all needed data to feed into their new AI systems, so they shared what they had. It was usually legal.
One example was MegaFace, a dataset created by professors at the University of Washington in 2015. They built it without the knowledge or consent of the people whose images they folded into its enormous pool of photos. The professors posted it on the internet for others to download.
MegaFace has been downloaded more than 6,000 times by companies and government agencies around the world, according to a New York Times public records request. They included the U.S. defense contractor Northrop Grumman; In-Q-Tel, the investment arm of the Central Intelligence Agency; ByteDance, the parent company of the Chinese social media app TikTok; and the Chinese surveillance company Megvii.
Researchers built MegaFace for use in an academic competition meant to spur the development of facial recognition systems. It was not intended for commercial use. But only a small percentage of those who publicly downloaded MegaFace entered the competition.
“We are not in a position to discuss third-party projects,” said Victor Balta, a University of Washington spokesman. “MegaFace has been decommissioned, and MegaFace data is no longer being distributed.”
Some who downloaded the data have deployed facial recognition systems. Megvii was blacklisted by the Commerce Department last year after the Chinese government used its technology to monitor the country’s Uighur population.
The University of Washington took MegaFace offline in May, and other organizations have removed other datasets. But copies of these files could be anywhere, and they are likely to be feeding new research.
Ms. O’Sullivan and Mr. Harvey spent years trying to build a tool that could expose how all that data is being used. It was more difficult than they had anticipated.
They wanted to accept someone’s photo and, using facial recognition, instantly tell that person how many times his or her face was included in one of these datasets. But they worried that such a tool could be used in harmful ways, by stalkers or by companies and nation-states.
“The potential for harm seemed too great,” said Ms. O’Sullivan, who is also vice president of responsible AI at Arthur, a company in New York.
In the end, they were forced to limit how people could search the tool and what results it delivered. As it works today, the tool is not as effective as they would like. But the researchers worried that they could not expose the breadth of the problem without making it worse.
Exposing.AI does not itself use facial recognition. It pinpoints photos only if you already have a way of pointing to them online, with, say, an internet address. People can search only for photos that were posted to Flickr, and they need a Flickr username, tag or internet address that can identify those photos. (This provides the proper security and privacy protections, the researchers said.)
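To make that design concrete, here is a minimal sketch in Python, not Exposing.AI’s actual code, of what such a metadata-only lookup could look like: the dataset index, URLs and function names below are all hypothetical placeholders.

```python
# A minimal sketch, NOT Exposing.AI's actual code, of the metadata-only
# lookup described above: matching a Flickr photo URL or username against
# an index of URLs known to appear in a training set such as MegaFace.
# The index contents and all names here are hypothetical.

# Hypothetical index: dataset name -> Flickr photo URLs it is known to contain.
DATASET_INDEX: dict[str, set[str]] = {
    "MegaFace": {"https://www.flickr.com/photos/exampleuser/123456789/"},
}


def find_by_url(flickr_url: str) -> list[str]:
    """Return datasets whose index contains this exact Flickr URL.

    No face matching happens here: the lookup works only when the caller
    already knows the photo's address, as the article notes.
    """
    return [name for name, urls in DATASET_INDEX.items() if flickr_url in urls]


def find_by_username(username: str) -> list[str]:
    """Return datasets containing any photo under this Flickr username."""
    needle = f"/photos/{username}/"
    return [name for name, urls in DATASET_INDEX.items()
            if any(needle in url for url in urls)]


print(find_by_url("https://www.flickr.com/photos/exampleuser/123456789/"))
print(find_by_username("exampleuser"))  # both print ['MegaFace']
```

Because the index is keyed by addresses rather than faces, someone without a link, username or tag cannot trawl it for strangers, which is the trade-off the researchers describe.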
Although this limits the tool’s usefulness, it is still an eye-opener. Flickr images make up a significant portion of the facial recognition datasets that have been passed around the internet, including MegaFace.
It is not hard to find photos that people have a personal connection to. Simply by searching old emails for Flickr links, The Times found photos that, according to Exposing.AI, were used in MegaFace and other facial recognition datasets.
Some belonged to Parisa Tabriz, a well-known security researcher at Google. She did not respond to a request for comment.
Mr. Gaylor is particularly disturbed by what he discovered through the tool because he once believed that the free flow of information on the internet was mostly a positive thing. He used Flickr because it gave others the right to use his photos through the Creative Commons license.
“I am now living the consequences,” he said.
His hope, and the hope of Ms. O’Sullivan and Mr. Harvey, is that companies and governments will develop new norms, policies and laws that prevent the mass collection of personal data. He is making a documentary about the long, winding and sometimes disturbing path of his honeymoon photos to shine a light on the issue.
Mr. Harvey was adamant that something had to change. “We need to dismantle these as soon as possible, before they do more harm,” he said.