In search of equitable and inclusive data

Safiya Umoja Noble
Researcher and author Safiya Umoja Noble discusses bias in data at the Entrepreneurship Research Bootcamp at the Kauffman Foundation Conference Center in May 2019.

Leveling the playing field for people to be successful in education and entrepreneurship must include exposing inherent biases in data.

Big tech companies are increasingly under fire, with accusations of anti-competitive behavior, mishandling of personal data, and a hands-off attitude about the content published on their platforms. Perhaps equally insidious, according to one researcher, is the way that tech companies perpetuate societal biases.

Researcher and author Safiya Umoja Noble, co-director of the UCLA Center for Critical Internet Inquiry, has found that the algorithms that drive results on Google and other internet search engines reveal embedded negative biases against people who face demographic, socioeconomic, and geographic barriers – primarily women and people of color.

“People have access to more data than ever, and social, political, and economic inequality and injustice are rising with it,” Noble said. “Data from the past are organized from racist and sexist values. These biases are left out of the discussion, and we should pay attention to them.”

In her book Algorithms of Oppression, Noble shows how search engine algorithms misrepresent a variety of people, concepts, information, and knowledge. Noble spoke recently at the Kauffman Foundation to a group of researchers from across the country who seek to use inclusive and equitable methods to explore a range of entrepreneurship issues. Noble shed light on the inherent bias in search engines as an example of why it is so important to think carefully about where information comes from and how it is used.

“Search engine algorithms optimize content to make money, but people use them like public goods and view search results as facts,” said Noble. “Social media platforms are advertising platforms. People confuse their findings with fact, but the algorithms do not vet for accuracy or truth; they vet for ad dollars.”

Algorithms of Oppression by Safiya Noble analyses the ways in which technology and data perpetuate bias.

All data are susceptible to bias, including in marketing and scholarly research. “There are no ‘neutral’ data,” Noble said. “Data reflect the values of the people who enter the information. Technology isn’t a neutral tool either. We pay to use social media with our data use and activities.”

Noble said just being aware of bias in data is the beginning of the solution.

The Kauffman Foundation continually explores new, more inclusive methods and mechanisms for measuring entrepreneurial outcomes. “As researchers (and funders of research), there are so many new avenues to collect data, search for data, create data infrastructures – so it’s really important that the systems we build are inclusive and carefully thought through,” said Sameeksha Desai, Kauffman’s director of knowledge creation & research, Entrepreneurship. “For example, you can scrape websites and platforms, which means we need to think about the population using (or not) those platforms.”

Social and economic implications of biased data, whether used in an online search or a white paper, are everywhere today. Data play a significant role in policy and program decision-making. Propaganda to undermine research on the reality of climate change, misinformation to deter childhood vaccinations, and even disinformation to challenge empirical knowledge about the shape of the earth have been propagated on digital platforms to contradict longstanding scientific findings and experience. “In many ways, disinformation and misleading stories go viral because they are titillating and excessively shared, even by people who know these sensational stories are inaccurate,” Noble said.

The algorithms used to organize information in search engines today will be incorporated into artificial intelligence in the future, Noble said, opening the way to still greater challenges to truth, evidence-based facts, science, equity, and fairness because artificial intelligence can’t understand the kind of nuances used in hate speech and other propaganda. In addition, platforms such as Instagram, Facebook, and YouTube are growing rapidly to keep up with the volume of user-generated content being uploaded by the minute, making it impossible to vet it all with software.

“The remedy to this isn’t a better search engine, it’s more search engines and non-commercial search engines or public-interest search vetted by subject matter experts,” Noble said. “And we need to reveal the sources of information and invest in other forms of knowledge, such as long-form investigative journalism, public and academic libraries, accessible and affordable higher education, and other checks.”

Noble offered her recommendations to reduce and reveal bias in data:

  • Demand factual and socially just information as a pillar of democracy. Other countries require online platforms to comply with certain conditions to keep content on the internet.
  • Require digital literacy of policymakers. Policymakers’ digital literacy is not keeping up with that of the public, nor with rapid advances in technology. Lawmakers need to understand what they are governing and work closely with scholars and journalists who study the harms engendered by digital platforms and technologies.
  • Fund critical internet studies through universities, schools, libraries, and public media as democratic counterweights. These counterweights provide policymakers and the public with necessary checks by which they can evaluate the quality of data online.
  • Reduce technology over-development and its impact on privacy, people, and the planet. The more hardware and software we use, the more digital trash we generate, which negatively impacts the environment and contributes to worker harm.