I often find out about new data sources through working papers or conference presentations. In this case, Ben Hallen at the University of Maryland and Rory McDonald have a working paper on super angel investors that uses a new database, CrunchBase, and Ben seemed enthusiastic about the data, so I thought I’d take a closer look. Incidentally, keep an eye out for the updated version of this paper, as it was really interesting. For now the paper is not posted online, and interested scholars should contact the authors directly.

CrunchBase, which advertises itself as the “free tech company database,” is a great concept and one that can only become more powerful as more users see it and use it. It’s essentially technology company data collected via wiki. Here were the overview stats as of 5/14/2010:

CrunchBase Stats
Companies - 39,866
People - 54,684
Financial Organizations - 4,705
Service Providers - 2,305
Funding Rounds - 14,944
Acquisitions - 2,996

While many researchers will have concerns about data gathered using a bottom-up process, I suspect the data is actually much more accurate than we would expect. This isn’t to say that the data should be taken at face value; even CrunchBase acknowledges the following on its FAQ page:

You do not know if the data is accurate. As multiple people edit CrunchBase profiles of companies, financial organizations and people, some mistakes might be added. Information might also be out of date. If you notice anything that needs changing you can go ahead and edit the page.

Most large data sets (even government data) contain a significant amount of error at the individual firm level, which, if random, washes out as the data is aggregated. The true test of CrunchBase as a research tool will be whether it closes the loop with researchers, so that researchers report corrections and receive updated data in return. In my experience, scholars are great at taking data, complaining about it, and spending tons of time cleaning it, but they rarely take the next step of showing data producers where the errors were or what could be improved. I hope for CrunchBase’s sake that this paradigm begins to change.
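
For readers who want to see the "washes out if random" point in numbers, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the firm values are synthetic (drawn from a lognormal distribution), the function name `simulate_reporting_error` and its parameters are hypothetical, and none of it uses actual CrunchBase data.

```python
import random
import statistics


def simulate_reporting_error(n_firms=10_000, noise_sd=0.3, bias=0.0, seed=0):
    """Compare firm-level error with error in the aggregate total.

    All inputs are synthetic and hypothetical (not CrunchBase data).
    noise_sd: standard deviation of random proportional reporting error.
    bias: systematic proportional error shared by every firm.
    """
    rng = random.Random(seed)

    # Hypothetical "true" firm values, e.g. funding raised.
    true_values = [rng.lognormvariate(0, 1) for _ in range(n_firms)]

    # Each reported value is off by a shared bias plus independent noise.
    reported = [v * (1 + bias + rng.gauss(0, noise_sd)) for v in true_values]

    # Average absolute proportional error across individual firms.
    firm_error = statistics.mean(
        abs(r - v) / v for r, v in zip(reported, true_values)
    )

    # Absolute proportional error in the summed total.
    total_true = sum(true_values)
    aggregate_error = abs(sum(reported) - total_true) / total_true

    return firm_error, aggregate_error


if __name__ == "__main__":
    for label, bias in [("random error only", 0.0), ("with -10% bias", -0.10)]:
        firm_err, agg_err = simulate_reporting_error(bias=bias)
        print(f"{label}: firm-level {firm_err:.1%}, aggregate {agg_err:.1%}")
```

With purely random error, the typical individual record is well off while the total is close to the truth. Once a systematic bias is added, the aggregate stays off by roughly the size of that bias, which is why the "if random" condition matters.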

In any case, for those looking at technology companies, CrunchBase definitely seems worth further exploration. I hope that those of you who have explored this data further than I have will offer comments on how well or poorly CrunchBase curates the data for longitudinal analysis.

E. J. Reedy

As a director in Research and Policy, E.J. Reedy oversees the Ewing Marion Kauffman Foundation’s research initiatives related to education, human capital development, and data.

Since joining the Kauffman Foundation in 2003, Reedy has been significantly involved in the coordination of the Foundation’s entrepreneurship and innovation data-related initiatives, including the Kauffman Firm Survey, for which he served as a principal investigator, and the Foundation’s multi-year series of symposiums on data, as well as many web-related projects and initiatives. He is a globally recognized expert in entrepreneurship and innovation measurement and has consulted for a variety of agencies.