Response rates to traditional surveys are declining. Polls relying on calling landline phones are becoming less and less representative. And even though this type of data is great for research, it is getting harder to collect it the traditional way.
On the other hand, data generated online is becoming more ubiquitous and easier to access. Most of us leave a long data trail every day in our online activities, from professional trajectories on LinkedIn to political preferences on Twitter. Yet researchers do not yet know how to interpret these data, and it is still unclear how reliable or insightful they can actually be.
To help bridge the gap between web data ubiquity and actual use in research, Facebook hosted a data science conference for researchers in August at its headquarters in Silicon Valley. I attended to present Kauffman research I co-authored with colleagues Yas Motoyama, Jared Konczal, and Jordan Bell-Masterson (more info about the research in the sidebar). I took home some interesting lessons from the conference and wanted to share them.
These are all words I have used in the last week alone to refer to people. And it is scary.
The walls of Facebook's offices are more or less covered with posters like the one above, with various messages to their team. This specific message hit home. For my work at the Kauffman Foundation, I often sift through piles of data trying to figure out trends or insights. But it is easy to forget that the data represent real people.
If you are curious, you can see more Facebook posters and other office pictures here. The ones security will let you take pictures of, anyways. They are understandably very careful about what you can and cannot do at their offices.
The insight: People, not data points
The Facebook Data Science Team shared the tools they use in their own analyses. The majority of them are open-source, and tools mentioned include:
The insight: The data scientist needs an evolving toolkit. If you want to see more about it, check out this deck put together by Software Carpentry.
When I start working with a new dataset, I sometimes get frustrated with how much time I have to spend cleaning and munging the data before I can produce anything really good with it.
This frustration carried a sort of hope, however. I told myself that, in the future, once I knew all the tools and was really good at data science, I would get done so quickly with my data cleaning that it would take almost no time at all.
The talks with the Facebook Data Science team shattered my naivety. Most of the data scientists there, arguably some of the best Silicon Valley can create, highlighted how much time they spend cleaning and munging data. Many said they spend around 80 percent of their time doing just that.
The insight: Data science requires a lot of schlep, and the only way to do it is to get your hands dirty.
A big part of the conference was dedicated to academics presenting how they are using social media or digitally collected data in their research. Since the Facebook conference was the day before the American Sociological Association's annual meeting in San Francisco, most attendees were sociologists.
Here are 3 of the projects presented:
The insight: Big data is increasingly making its way into social science research, but there remains a lot of room for innovation.
Facebook has so many users that simply plotting latitude and longitude for them in a blank space gives you a visualization that very closely resembles an actual map of the world.
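To get a feel for why that works, here is a minimal sketch in Python that drops latitude/longitude pairs onto a blank text canvas. It is only an illustration and not how Facebook produced its visualization: the function names, the grid size, and the handful of approximate city coordinates are all assumptions made up for this example. With six points you just get six marks, but with millions of them the coastlines start to draw themselves.

```python
def to_cell(lat, lon, rows, cols):
    """Map a latitude/longitude pair onto a (row, col) cell of a blank grid."""
    # Latitude runs from 90 (top) to -90 (bottom); longitude from -180 to 180.
    row = int((90.0 - lat) / 180.0 * rows)
    col = int((lon + 180.0) / 360.0 * cols)
    # Clamp the edges, since lat = -90 or lon = 180 would fall off the grid.
    return min(max(row, 0), rows - 1), min(max(col, 0), cols - 1)


def render(points, rows=18, cols=72):
    """Return a text canvas with '#' wherever at least one point lands."""
    grid = [[" "] * cols for _ in range(rows)]
    for lat, lon in points:
        r, c = to_cell(lat, lon, rows, cols)
        grid[r][c] = "#"
    return "\n".join("".join(line) for line in grid)


# Hypothetical sample: approximate coordinates of a few large cities.
SAMPLE_POINTS = [
    (40.7, -74.0),   # New York
    (51.5, -0.1),    # London
    (-23.5, -46.6),  # Sao Paulo
    (35.7, 139.7),   # Tokyo
    (-33.9, 151.2),  # Sydney
    (28.6, 77.2),    # Delhi
]

if __name__ == "__main__":
    print(render(SAMPLE_POINTS))
```

Nothing in the code knows about continents or borders. Any shape that shows up comes entirely from where the points are.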
See Arnobio's slide deck from the Facebook event.
Read the full research report.
Read commentary from Forbes, Business Insider and Venture Beat.
4 Data Science Insights from Facebook Headquarters
Arnobio Morelix is a senior research analyst and program officer in Research and Policy at the Ewing Marion Kauffman Foundation, where he is a principal investigator on the Kauffman Index of Entrepreneurship, the first and largest index tracking entrepreneurship across city, state, and national levels. For over a decade, the Kauffman Index has been a trusted source of entrepreneurship indicators in the United States—referenced in the policy world by institutions like the White House Office of the President of the United States, the Small Business Administration, and by U.S. Embassies and Consulates in several countries. Morelix also is an editor of Kauffman’s entrepreneurship research blog, Growthology.org.