In today's episode, I’m joined by David Stillwell, a professor of computational social science at the University of Cambridge, who uses big data to understand psychology.
I crossed paths with Professor Stillwell several years ago because of his groundbreaking work on the MyPersonality dataset. It consists of Facebook statuses and personality data from millions of study volunteers, which he made available to researchers. Dozens of researchers used it to understand how people behave online and what that reveals about other parts of their lives. His research was influential in tightening policy on how user data is managed.
My first first-author paper used this dataset to predict Big Five personality traits (among other things) from statuses. If that sounds familiar, it’s because the marketing firm Cambridge Analytica built models to do the same using a similar dataset. They claimed, falsely, that this provided an advantage in changing voters’ minds. Rather than read this as a marketing firm’s bluster, political journalists whipped it into an international scandal. The Guardian broke the story under the headline “‘I made Steve Bannon’s psychological warfare tool’: meet the data war whistleblower.” You can read their other coverage in the conveniently organized Cambridge Analytica Files, where they report that within two days of the news, “nearly $60bn was wiped off the Facebook market capitalisation.”
Compare that to how The Guardian treated Obama’s questionable use of Facebook data:
Obama, Facebook and the power of friendship: A unified computer database that gathers and refines information on millions of potential voters is at the forefront of campaign technology – and could be the key to an Obama win
There has been plenty of good reporting pushing back on the misinformation spread about Facebook, Cambridge Analytica, and the Trump campaign. This is a chance to listen to two researchers with a deep understanding of the claims that were made and whose research programs were significantly altered due to the media coverage.
MyPersonality Dataset: The dataset originated from a Facebook app created by Stillwell, which allowed users to take a personality test and share their results. This became a rich source of data for psychological research.
Connection to Cambridge Analytica: Stillwell clarifies that while Cambridge Analytica was inspired by his research, they developed their own models and data. He discusses the nuances of how they approached him and his eventual decision not to collaborate with them.
Predictive Power of Social Media Data: Stillwell and I discuss how well social media data predicts personality traits and other sensitive attributes. The conversation touches on the limitations of such predictions and the ethical concerns they raise.
Impact of the Cambridge Analytica Scandal: The scandal had a significant impact on public perception and on the direction of our research, shifting the focus away from social media data because it had become so controversial.
Psychology and Predictive Models: We explore the limitations of psychological models like the Big Five in predicting behavior, and how machine learning and computational models might offer more nuanced insights.
Future Directions in Psychological Research: The discussion turns to the potential of language as a rich data source for understanding personality and behavior, emphasizing the need for a more fine-grained approach that goes beyond traditional models.
Ethical and Practical Considerations: The conversation highlights the balance needed in utilizing big data for psychological insights, considering ethical implications, privacy concerns, and the real-world utility of such research.