My Memories Are Just Meta’s Training Data Now

On Jun 21, 2024

In R. C. Sherriff’s novel The Hopkins Manuscript, readers are transported to a world 800 years after a cataclysmic event ended Western civilization. In pursuit of clues about a blank spot in their planet’s history, scientists belonging to a new world order discover diary entries in a swamp-infested wasteland formerly known as England. For the inhabitants of this new empire, it is only through this record of a retired school teacher’s humdrum rural life, his petty vanities and attempts to breed prize-winning chickens, that they begin to learn about 20th-century Britain.

If I were to teach futuristic beings about life on earth, I once believed I could produce a time capsule more profound than Sherriff’s small-minded protagonist, Edgar Hopkins. But scrolling through my decade-old Facebook posts this week, I was presented with the possibility that my legacy may be even more drab.

Earlier this month, Meta announced that my teenage status updates were exactly the kind of content it wants to pass on to future generations of artificial intelligence. From June 26, old public posts, holiday photos, and even the names of millions of Facebook and Instagram users around the world would effectively be treated as a time capsule of humanity and transformed into training data.

That means my mundane posts about university essay deadlines (“3 energy drinks down 1,000 words to go”) as well as unremarkable holiday snaps (one captures me slumped over my phone on a stationary ferry) are about to become part of that corpus. The fact that these memories are so dull, and also very personal, makes Meta’s interest more unsettling.

The company says it is only interested in content that is already public: private messages, posts shared exclusively with friends, and Instagram Stories are out of bounds. Despite that, AI is suddenly feasting on personal artifacts that have, for years, been gathering dust in unvisited corners of the internet. For those reading from outside Europe, the deed is already done. The deadline announced by Meta applied only to Europeans. The posts of American Facebook and Instagram users have been training Meta AI models since 2023, according to company spokesperson Matthew Pollard.

Meta is not the only company turning my online history into AI fodder. WIRED’s Reece Rogers recently discovered that Google’s AI search feature was copying his journalism. But finding out which personal remnants exactly are feeding future chatbots was not easy. Some sites I’ve contributed to over the years are hard to trace. Early social network Myspace was acquired by Time Inc. in 2016, which in turn was acquired by a company called Meredith Corporation two years later. When I asked Meredith about my old account, they replied that Myspace had since been spun off to an advertising firm, Viant Technology. An email to a company contact listed on its website was returned with a message that the address “couldn’t be found.”

Asking companies still in business about my old accounts was more straightforward. Blogging platform Tumblr, owned by WordPress owner Automattic, said unless I’d opted out, the public posts I made as a teenager will be shared with “a small network of content and research partners, including those that train AI models” per a February announcement. YahooMail, which I used for years, told me that a sample of old emails—which have apparently been “anonymized” and “aggregated”—are being “utilized” by an AI model internally to do things like summarize messages. Microsoft-owned LinkedIn also said my public posts were being used to train AI although some “personal” details included in those posts were excluded, according to a company spokesperson, who did not specify what those personal details were.

3 Advertising ai algorithms artificial intelligence as britain Business