
The future of entrepreneurship data – getting to know CrunchBase


Arnobio Morelix explores CrunchBase data and reports on co-hosted CrunchBase-Kauffman session in San Francisco.

Here at Kauffman we are actively thinking about ways in which we can better measure entrepreneurship activity and ecosystems.

In many ways, entrepreneurship data beyond the traditional public and private data sources is still in relative infancy, and researchers are still learning how to use things like social media, crowdsourced, and news-based data.

As one of our forays into exploring the future of entrepreneurship data we partnered with CrunchBase for a session in San Francisco last month.

Getting to know CrunchBase

Last month in San Francisco we held a session with our friends at CrunchBase.

The session covered insights from the CrunchBase team on how the data is assembled and how it can be used; Kauffman’s perspective on emerging datasets like CrunchBase, with thoughts on further exploration and funding; a discussion of the advantages and constraints of the data; and presentations by academic and industry users of the data.

About CrunchBase

CrunchBase is a leading platform to discover innovative companies and the people behind them. The CrunchBase Dataset is constantly expanding through contributions from their community of users, investment firms, and network of global partners. It now covers millions of users and businesses around the world. CrunchBase also has an open-access data license available to academic users.

About the session

Below is the agenda for the session, with slide decks and/or working papers shared below when possible.

Thoughts on the data

One of the main strengths of CrunchBase is, in my opinion, the fact that they have data on both the people (e.g., founders, employees, investors) and the companies. This lets data users examine relationships that are not easily observable elsewhere, such as the connections among different ecosystem players.

Like any dataset, it has limitations. One of the main limitations for academic research is that the information being reported is typically private, and we do not fully understand potential reporting biases. Usually, when that type of private information is made public, it is for strategic reasons of the parties involved (e.g., a startup wants to show traction).

Andy Wu, PhD student at Wharton, sums it up well:

“The primary challenge for using Crunchbase is that we don’t fully understand the extent of missing data and more broadly the limitations for crowdsourced data. I suspect that we are missing a huge amount of data on the smallest investment events that go undisclosed without press releases or without SEC Form D filings; to be fair, this is a huge problem with all datasets in entrepreneurial finance. Furthermore, since the data is continually being backfilled, there is an implicit selection bias towards the inclusion of the most successful firms that are easiest to find historical information about.

Regardless, Crunchbase is definitely something for all entrepreneurship researchers to keep an eye on.”

How to access CrunchBase data

Just yesterday CrunchBase launched a new way of accessing their data, and you can learn more about it here.

Summing it up

CrunchBase is a really exciting dataset for entrepreneurship researchers, even though we are still learning what its main strengths and constraints are.

If you are a researcher using the data and would like to share your thoughts on it or propose ways in which we can better understand and augment the data, I’d love it if you would let me know here.
