Today nearly half of the world’s population is online, and while we know we need to increase access to the Internet, access alone is not enough. Information must be high-quality and available on an equitable basis because it has a major impact on the choices we make. To that end, major platforms are investing heavily in developing algorithms to identify possibly misleading or hateful content and are partnering with fact-checking organizations and content moderation firms for human assessment of content. While these efforts are noteworthy, major challenges still exist.
Successfully countering misinformation requires a combination of many techniques: user-interface design, a better understanding of the economics behind misinformation and the attention economy, machine learning, and human-led investigation. The most successful efforts in this space are multifaceted, involving practitioners, researchers, policymakers, and platforms working together at every step. We also need to examine the bigger picture, looking at the factors that lead to misinformation being created and spread in the first place.
Identifying What to Fact Check
On unencrypted platforms, algorithms can scan for possibly misleading content. These algorithms can examine the content itself, reactions to the content (e.g., stance detection), and how the content is spreading. Startups like AuCoDe also scour the open web for indicators like controversy. None of this is possible, however, on encrypted platforms like WhatsApp, LINE, Signal, and Telegram, where end-to-end encryption offers additional protection against eavesdropping by governments, advertisers, and adversaries but also makes it more difficult to detect misinformation. Instead, the primary way misinformation is identified on encrypted platforms is by users reporting it.
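To make one of those signals concrete, the sketch below estimates a reply’s stance toward a claim with an off-the-shelf zero-shot classifier. This is only an illustrative stand-in for the purpose-built stance models platforms train on labeled data; the claim, reply, and labels are invented for the example.

```python
# Illustrative sketch only: an off-the-shelf zero-shot classifier standing in
# for a purpose-built stance-detection model. The claim, reply, and labels
# are invented for this example.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")  # downloads a default NLI model

claim = "Drinking hot water every 15 minutes prevents COVID-19."
reply = "That is false. No amount of water flushes a virus out of your body."

# Frame stance detection as picking the most likely relation of the reply to the claim.
result = classifier(
    f"Claim: {claim} Reply: {reply}",
    candidate_labels=[
        "the reply supports the claim",
        "the reply disputes the claim",
        "the reply is unrelated to the claim",
    ],
)
print(result["labels"][0], round(result["scores"][0], 2))
```

In practice, stance models are trained on labeled reply data for far better accuracy; the zero-shot approach above is simply the cheapest way to see the idea in action.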
Since 2019, Meedan has worked with fact-checking organizations and WhatsApp to pioneer a new “tipline approach” to fact-checking on encrypted platforms. Meedan’s open-source software allows WhatsApp users to forward a suspicious message to a misinformation tipline operated by a fact-checking organization. If there is an existing fact-check, it is returned to the user immediately. If not, the message is entered into a collaborative workflow system for human-led analysis.
Tiplines depend on building a large audience of people who will spot potentially misleading content and forward it, but the process cannot be purely manual. Once content is submitted, we need algorithms to help filter, categorize, and group it. We use natural language processing and image processing to remove spam as well as to search fact-check databases and existing content to see whether the forwarded messages contain any claims that have already been fact-checked.
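As a rough sketch of how that matching step can work (not Meedan’s production pipeline; the model name and messages below are illustrative), multilingual sentence embeddings let a newly forwarded message be compared against a store of previously fact-checked claims even when the languages differ:

```python
# Illustrative sketch of matching a tipline submission against already
# fact-checked claims; not Meedan's production code. Model and texts are
# examples only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

fact_checked_claims = [
    "Drinking water every 15 minutes flushes out the coronavirus.",  # already debunked
    "5G towers spread COVID-19.",
]
submission = "Beber agua cada 15 minutos elimina el coronavirus del cuerpo."  # Spanish tip

claim_vecs = model.encode(fact_checked_claims, convert_to_tensor=True)
sub_vec = model.encode(submission, convert_to_tensor=True)

scores = util.cos_sim(sub_vec, claim_vecs)[0]
best = int(scores.argmax())
if float(scores[best]) > 0.7:  # threshold would need tuning in practice
    print("Likely match:", fact_checked_claims[best])
else:
    print("No existing fact-check found; route to human review.")
```

The similarity threshold is exactly the kind of parameter that has to be tuned per language and per tipline: set too low, users receive irrelevant fact-checks; set too high, repeated claims slip through to human review unnecessarily.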
Algorithm-Assisted Humans Help
While there are cases where fully automated fact-checking can work, and much attention is focused on fully algorithmic approaches, we’ve found that humans assisted by algorithms can be highly effective in fact-checking content, and this is what fact-checking organizations are doing in practice. Case in point: During the first five months of 2020, our fact-checking partners published nearly 6,000 fact-checks from content received on their tiplines.
Algorithms can help, too, by extracting metadata, searching the open web for past appearances or related content, summarizing search results, and providing related information from open databases such as WikiData or the COVID-19 resource curated by Meedan’s Digital Health Lab. It is also essential to understand how prominent a piece of misinformation is so that a fact-check doesn’t accidentally amplify fringe content and give it a larger audience than it otherwise would have.
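As one small example of pulling in related open data (a hedged sketch against Wikidata’s public search API; the query term is only an example), tooling can look up an entity mentioned in a claim and surface its structured description:

```python
# Minimal sketch: look up an entity on Wikidata via its public search API
# to surface related structured information alongside a claim being checked.
import requests

def search_wikidata(term, language="en"):
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": term,
            "language": language,
            "format": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("search", [])

for entity in search_wikidata("anosmia")[:3]:
    print(entity["id"], "-", entity.get("label"), "-", entity.get("description", ""))
```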
All of these tasks present novel research questions. Even searching for existing fact-checks or matching similar social media messages is a challenge given the short and informal nature of most social media messages, especially in languages beyond English. The expanding use of images and video requires expertise from computer vision, computational linguistics, and machine learning. Regional and cultural knowledge from the social sciences is required to understand local contexts and create solutions that work beyond English and the United States.
Dissemination of the Truth Matters
Debunking something is not a solution on its own, of course. Fact-checks need to be disseminated, and research shows fact-checks are often less “viral” than the original misinformation. Here again, the solution isn’t either human or algorithmic; it’s both. We need behavioral science and human-computer interaction research to understand how we can encourage people to share fact-checks more widely. Social theory already tells us that content from an acquaintance can carry greater weight than that from an unknown source and that receiving a debunk from multiple sources may help.
Once something is fact-checked, a warning or additional contextual information can be added to further shares of that content on unencrypted platforms. This isn’t the case for encrypted platforms, but researchers at the Federal University of Minas Gerais (UFMG) in Brazil and MIT say on-device fact-checking could make a large difference for encrypted platforms. On-device fact-checking works by storing a list of hashes (digital fingerprints) of known misleading content on each phone or device. Messages can then be checked against this list on the device itself without breaking the end-to-end encryption. In an analysis of WhatsApp messages shared in large public groups during the 2018 Brazilian and 2019 Indian elections, UFMG’s Julio C.S. Reis and his co-authors found that 40 to 80 percent of shares of misleading images occurred after the content had already been fact-checked.
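A minimal sketch of that on-device check follows. The normalization scheme and example messages are invented, and real deployments for images would typically use perceptual hashes so that near-duplicates still match:

```python
# Minimal sketch of on-device checking: hash an incoming message and test it
# against a locally stored list of hashes of known misleading content.
# Normalization and examples are invented; image content would typically use
# perceptual hashing so near-duplicates still match.
import hashlib

def normalized_hash(text: str) -> str:
    cleaned = " ".join(text.lower().split())  # crude normalization
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()

# Hashes shipped to the device, e.g., periodically updated from a fact-checker feed.
known_misleading = {
    normalized_hash("Drinking hot water every 15 minutes prevents COVID-19."),
}

incoming = "Drinking  hot water every 15 minutes   prevents COVID-19."
if normalized_hash(incoming) in known_misleading:
    print("Show a warning or link to the fact-check; the message never leaves the device.")
```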
While Reis and his colleagues focused on images, the same may hold for text misinformation. We see very similar textual statements being repeated across multiple tiplines in India, Africa, and Brazil: information about the supposed benefits of regularly drinking tea or water to prevent COVID-19 appeared hundreds of times across all three regions in our data. At the same time, Brazilian researchers analyzing fact-checks about the virus found misinformation unique to individual countries as well. Effective misinformation often builds on local contexts, cultures, and historical narratives, and thus countering it often requires regional expertise.
Can’t See the Forest for the Trees
Identifying potentially misleading content, fact-checking it, and disseminating corrections address the consequences of misinformation, but we must also seek to understand the underlying causes and the ways we might prevent misinformation in the first place.
In a yet-to-be-peer-reviewed field experiment, Gordon Pennycook of the University of Regina and colleagues from MIT and the University of Exeter Business School find that priming people to think about accuracy can increase the quality of the news they share on social media platforms. The authors write that “most people do not want to spread misinformation, but the social media context focuses their attention on factors other than truth and accuracy.” Work within human-computer interaction and the behavioral sciences also suggests that adding ‘friction’ to make people pause and think before re-sharing content may reduce the spread of misinformation.
Education, training, and media literacy are also critical. Hacks/Hackers, a co-founder with Meedan of the Credibility Coalition (CredCo), conducts training and events like MisinfoCon, and CredCo participants like The Propwatch Project focus on creating educational resources. Others in CredCo are thinking about the economics behind misinformation.
Academic interest in misinformation took off after the 2016 US presidential election, and although the occupant of 1600 Pennsylvania Avenue will soon change, the problem of misinformation will not disappear. It is essential that we continue to strive to provide quality information to all people.
There is no one solution to misinformation. Instead, we must combine our knowledge and skills across industries and disciplines. The solutions must also be grounded in local, regional knowledge and work in languages beyond English. Increasingly, misinformation is also multimodal, with falsehoods implied through gestures or other visual elements; approaches will need to incorporate knowledge from computer vision and speech processing as well as natural language processing.
Addressing these problems will require far more collaboration. We want to create secure data-sharing environments that can supercharge policy and machine learning research not only in the US but globally. We’re building the infrastructure to allow data to be shared securely so that academics can train machine learning models and conduct analyses with our partners’ data and consent while the data remains secure on our servers. We then want to allow our partners to easily use the results in their day-to-day work and to develop the insights and best practices necessary to increase access to high-quality information. This is a win–win: it opens new, real-world data to academic research and puts academic findings into practice.
Often misinformation is the result of missing information (so-called midinformation), where scientific consensus has not yet emerged. In other cases, there is a gap in terminology. Consider the simple example of searching for “olfactory dysfunction,” “anosmia,” or “loss of smell.” All three have very similar meanings, but the first page of Google search results shows only one result in common between the first two queries and no results in common with the third. While the results for all queries mentioned the connection with COVID-19, the results of the first two queries were mostly scientific publications intended for an academic audience. The popular press has filled the gap for the third query, discussing the scientific literature with a general audience in mind, but for many emerging topics there may be an unfilled gap between the terms used in scientific discourse and those used by everyday Americans, creating a data deficit. We need to identify such gaps and work to fill them with quality content, and qualitative research is an excellent way to understand how people really find information online.
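One way to spot such gaps at scale (a sketch with placeholder URLs; a real analysis would collect live search results for each phrasing) is to measure how little the result sets for different phrasings of the same concept overlap:

```python
# Sketch: measure how much two queries' result sets overlap using Jaccard
# similarity. URLs are placeholders; a real analysis would collect live results.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

results = {
    "anosmia": ["example.org/paper1", "example.org/paper2", "example.org/review"],
    "loss of smell": ["example.com/news1", "example.com/news2", "example.org/review"],
}

print(jaccard(results["anosmia"], results["loss of smell"]))  # low overlap
```

A consistently low overlap between lay and clinical phrasings of the same topic is one signal that a data deficit may be forming.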
Misinformation didn’t start in 2016, and it will not suddenly end. User-interface changes, changes in advertising and economic incentives, the development of machine learning and NLP algorithms, education, and human-led fact-checking will all contribute together to a brighter, more informed future. This is a problem where we need less isolation and more intermingling of ideas, and that is something we can be excited about in 2020.
Dr. Scott A. Hale is Director of Research at Meedan and a Senior Research Fellow at the Oxford Internet Institute, University of Oxford. Meedan is a global technology not-for-profit working to build a more equitable Internet in which all people, regardless of their languages, locations, or incomes, have the ability to effectively locate the most pertinent information, evaluate the quality and credibility of that information, and make informed decisions. Its flagship, open-source product is Check, an award-winning collaborative fact-checking and misinformation response workflow system that integrates with popular messaging platforms such as WhatsApp. Dr. Hale sets strategy and oversees research on widening access to quality information online and seeks to foster greater academic–industry collaboration through chairing multistakeholder groups, developing and releasing real-world datasets, and connecting academic and industry organizations. He helps lead the Credibility Coalition, a joint initiative of Meedan and Hacks/Hackers that brings together civil society, technology, research, and journalism organizations.