11/16/2011 10:03:36 AM
We can't understand our economy without better firm-level information systems. That's my essential summary of a new report out today, "Assessing competitiveness: how firm-level data can help
," from Bruegel, a European think tank. It's a well-crafted and important report, making the case to policymakers for why back-end data systems are important investments. As the report shows, when we worry only about averages and don't look more specifically at elements of the distributions, and at how those are changing or could be affected by particular policies, we hurt our chances of driving competitiveness and economic growth.
In the appendix to the report the authors reference some of the international experiences related to firm-level data sets. I think their summary there was restrained in its criticisms. Indeed, I recently sat through a meeting of leading experts on measuring firm-level innovation and growth at the OECD and was appalled at some of the compromises that were going to have to be made in order to fulfill a European Commission mandate for immediate data. It was clear why the compromises had to be made: national statistical offices have not received the investments they need to develop their infrastructures, and, more importantly, they are only just beginning to realize they need to find ways to encourage firm-level research with their data. However, even with the signs of change I've seen afoot in the determination of these offices, they have no legal, political, or practical systems in place to meet this challenge.
Indeed, at a time when most countries are undergoing significant public sector cuts, I worry that the clear needs here will not be met with dollars. I've seen many bodies, including the OECD, which I had always thought of as the true champion of international analyses using government data, moving toward reports that rely more heavily on private data sources instead of government data. While private sources have their place, they cannot be a replacement for official data. That said, if national statistical offices don't get behind some of the important issues Bruegel outlined here, I fear that many nations may become more and more satisfied with substandard official statistics. Indeed, some of the current crises in Europe have been driven by a lack of regulatory mechanisms on national statistics (see articles on Greece).
2/9/2011 8:21:53 AM
For those of you who are not experts in data dissemination but want a crash course, start with tomorrow's event in Washington, DC. "Responsible Data Sharing in the 21st Century"
will be hosted by NORC at the University of Chicago and the National Institute of Standards and Technology. Sadly, I can't attend in person, but am happy to report my colleague Alicia Robb will be presenting. She'll be discussing our private sector experience in disseminating the Kauffman Firm Survey
but most of the speakers are from government and will be discussing their various experiences.
I really hope people are coming to learn, because the truth is that, as a community of data stewards, we still have many more failures in disseminating our data than successes. I don't mean failure in the sense of a data breach; rather, the science behind what scholars and communities need in addition to data access is still in its infancy. With the Kauffman Firm Survey we've tried to be a real-time laboratory for these activities, testing new modes of dissemination, programs to support outreach, and coaching and training. So we sincerely hope others can learn from our successes and failures in data dissemination, and we look forward to hearing about similar experiences.
2/3/2011 8:27:30 AM
With unemployment rates stubbornly high and the global economy increasingly competitive, the United States needs to better understand businesses, policies to support businesses, and, ultimately, how to spur job creation. Jobs don’t just appear or disappear; they are created (and destroyed) by businesses that are reacting to market conditions and opportunities. While our national statistical system is increasing its capacity to produce statistics on these dynamic processes, policymakers could better target job creation programs if the statistical system collected more data about how businesses finance operations and investment in innovation, especially at the regional/local level. Further, to bolster the value of data currently produced, we need to nourish active data user communities to advance the substantive scientific understanding of job creation policies and educate policymakers about the importance and utility of the data.
Read more on the AmStat website
9/8/2010 9:00:00 AM
Last fall I participated in a unique workshop at Yale put on by one of our Kauffman legal fellows, Victoria Stodden
. The purpose of the workshop was to discuss best practices and issues in data and code sharing for creating replicable research. While the workshop was a bit more in the computational science space than I am fully comfortable with, I found the conversation incredible and the goals of the effort beyond compelling. What has resulted is a Data and Code Sharing Declaration (just published in IEEE's Computing in Science and Engineering
). This is a document that should be taken up for discussion at foundation and other funder events, in policy circles, and within scientific academia, as it lays out early, clear recommendations for actions each group can take to further data sharing and replication. It is a document that data curators, journal editors, and all scientists should be discussing.
4/27/2010 9:00:00 AM
When working with businesses, government regulators and statistical agencies are well aware that they must use the utmost care with the data that businesses report to them. Beyond basic promises to survey respondents or other legal issues, much of this concern has to do with encouraging responsiveness and honesty when the government collects data from businesses. Yet I believe there are good reasons to think that past some grace period, say twenty years (maybe longer for legal reasons), the actual risk associated with fully disclosing data reported by businesses diminishes. Businesses change. The economy changes. Each year, the data depreciate in private potential value. I would argue the public value of the data depreciates quickly at first but could increase substantially if the data eventually became public record.
When I awoke this morning I wasn’t thinking about this topic, but then I saw the summary report of a meeting the National Academies held examining the data needed to avoid systemic financial risk
. One large strand of debate in the proceedings concerns the need for expanded data access by regulators and what data should actually be public. Current rules and regulations don’t allow different regulatory agencies to see data about specific companies across the spectrum of collection mechanisms. This is not a situation unique to financial regulation.
While I am not an expert on systemic risk, and very little of the event spoke directly to entrepreneurship measurement, I was struck by how often I’d come across the themes laid out in the report, in one form or another:
- Business data are never disclosable.
- Masking of the microdata (even from government regulators) diminishes the actual utility of the data.
- Data are collected for immediate regulatory and reporting purposes.
I don’t have immediate answers here but I am inspired in this area by the work of David Kirsch at the University of Maryland
to believe that more is possible. David is an economic historian by training and an entrepreneurship scholar by our good luck. What David seems to recognize, more than anyone I have ever met, is the potential threat to future academic research if the current laws and regulations surrounding business data remain intact. Specifically, David recognizes the many incentives businesses have today to destroy most records concerning the company rather than archive things for posterity and research. While archiving data and eventually making government-collected data public are two very different things, to me these concepts point to a larger need for proactive legal and curatorial management of business data for future generations of study.
While private databases multiply in their availability, there will always be a need for additional historic detail about businesses that can only come from honest answers to government officials or rich archival data from firms.
11/13/2009 9:18:05 AM
Looks like I will be heading to Connecticut on Saturday, November 21, for a one-day event at the Yale Law School on Data and Code Sharing in Computational Science. This is the first event I have ever been to that is being organized completely by wiki, which means that the agenda, attendee list, and other logistics are all password protected. So, to give a sense of the event, I pulled down a PDF of the agenda
(realizing that it is being constantly edited). It is part of something called the Information Society Project
. We have worked for some time to try to make research data more accessible, so the particular focus here on making data available to encourage replicability will be of great interest. I've pulled down the "resources and readings
" page as well, since it is the most authoritative list of articles, blogs, and important background material on data sharing that I have ever seen.
11/11/2009 9:32:22 AM
Friday, I was at an advisory committee meeting of the Statistics of Income (SOI) Division at the Internal Revenue Service (IRS)
. I want to highlight one presentation and discussion from SOI staffer Nick Greenia in particular, who continues to look very thoughtfully at the division's data dissemination activities. In his comments, I see a recognition of the value of engaging researchers to increase both the quality and the understanding of their data. The IRS is proceeding slowly with a researcher-engagement agenda, but indicated at the meeting that a new call for research proposals would come out in the next few months. When it does, I will post a link.
Additionally, Mr. Greenia outlined a theme that I think most federal agencies are currently wrangling with: the increasing risk that public-use microdata files pose in a world of ever greater data availability. For a definition of a public-use microdata file, I turned to Statistics Canada, which has a definition online
as, "Microdata files that have been carefully anonymized (i.e., all identifying information has been removed) and scrutinized to ensure that no risk of breach of individual privacy or confidentiality exists." The concern is that as people make more information available about themselves online the level to which public-use data files must be stripped of content to keep them anonymous makes them of little relevance for most research. So what does Mr. Greenia propose? He doesn't come down in favor of any one option but highlights many of the advantages and disadvantages of the emerging solutions:
- Synthetic public-use data files
- Virtual data enclaves
- Data research centers
Mr. Greenia has been nice enough to consent to my posting of a PDF of his presentation
so you can read his thoughts directly.
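To make the trade-off concrete, here is a minimal sketch, in Python with entirely invented records, of two common disclosure-limitation steps: top-coding a sensitive numeric field and suppressing rare categories. The thresholds and field names are my illustrative assumptions, not anything from Mr. Greenia's presentation, but they show how quickly the most analytically interesting detail (the outliers) disappears.

```python
# Toy microdata records; every value below is invented for illustration.
records = [
    {"industry": "software", "employees": 12,  "revenue": 1_200_000},
    {"industry": "software", "employees": 45,  "revenue": 9_800_000},
    {"industry": "biotech",  "employees": 300, "revenue": 85_000_000},
    {"industry": "mining",   "employees": 7,   "revenue": 2_100_000},
]

REVENUE_CAP = 10_000_000   # top-code threshold (illustrative)
MIN_CELL = 2               # suppress categories with fewer than 2 records

def anonymize(rows):
    """Apply top-coding and small-cell suppression to a list of records."""
    counts = {}
    for r in rows:
        counts[r["industry"]] = counts.get(r["industry"], 0) + 1
    out = []
    for r in rows:
        out.append({
            # Rare industries are collapsed into an "other" bucket.
            "industry": r["industry"] if counts[r["industry"]] >= MIN_CELL else "other",
            "employees": r["employees"],
            # Large revenues are capped, hiding exactly the outlier firms
            # that are most re-identifiable -- and most interesting to study.
            "revenue": min(r["revenue"], REVENUE_CAP),
        })
    return out

masked = anonymize(records)
```

After masking, the biotech firm's industry and true revenue are gone from the file; a researcher studying high-growth outliers would find nothing left to study, which is precisely the utility problem the solutions listed above try to escape.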
Additionally, I thought I'd highlight that SOI released some updated tables on business financing
last week. These are aggregate tables, not the type of files I highlighted here, but still of interest for some research purposes.
10/22/2009 9:56:09 AM
I am really excited to be participating in a November 18, 2009, workshop that NORC is putting on titled "Assessing the Results of Microdata Access
" in Washington, DC. It should be an interesting session in that it will have several different data producers discussing aspects of their strategies for making data available for research. More information is available on the NORC Data Enclave website
9/21/2009 12:07:35 PM
Today President Obama delivered a somewhat high-profile speech on innovation
in New York. This coincided with his National Economic Council's release of a new white paper on innovation
. Both highlight the key role that they believe entrepreneurship and innovation can play in leading future economic growth. Of particular interest to the audience on this blog is the following paragraph:
- Stimulate entrepreneurship through increased access to government data. The Administration launched Data.gov, a one-stop shop for free access to data generated across all Federal agencies. By empowering the American people to find, use, and repackage data, Data.gov will give rise to new businesses (like the GPS and genomics industries that grew from increased access to public information) and empower entrepreneurs to evaluate opportunities.
I find this a really interesting perspective, but am skeptical that most readers of the paper will agree on first pass. Entrepreneurship through data access? Entrepreneurship through data access! Yes! Government data can open up new industries, but government data is also vital to all commerce: new firms, existing firms, domestic or international. From the decennial census to other surveys on commerce and technology, more often than not it is government data that drives the models feeding private business forecasts. Yes, government data are often not detailed enough or timely enough for many private sector needs, so data are imputed, assumptions are made, or other trade-offs get considered. And while there is an increasing array of private sector data vendors on different topics, many of these sources could not exist without their government statistical counterparts. This issue is at the heart of my own passion for the subject of data and data availability. I hope the White House focus brings more light to the topic.
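To give a flavor of what "evaluating opportunities" through Data.gov might look like in practice: the catalog today exposes a CKAN-style search API, and an entrepreneur could script queries against it. The endpoint path and parameters below are my assumptions about that catalog API, not anything from the white paper; this sketch only builds the query URL, since a real client would then fetch it and parse the JSON response.

```python
from urllib.parse import urlencode

# Assumed CKAN-style search endpoint on the Data.gov catalog;
# treat the host and path as illustrative, not authoritative.
BASE = "https://catalog.data.gov/api/3/action/package_search"

def search_url(query, rows=5):
    """Build a catalog search URL for datasets matching `query`."""
    return BASE + "?" + urlencode({"q": query, "rows": rows})

url = search_url("business dynamics")
# A real client would fetch `url` (e.g., with urllib.request) and read
# the JSON "result" -> "results" list of matching dataset descriptions.
```

The point of the sketch is simply that free, machine-readable government data lowers the cost of this kind of exploration to a few lines of code, which is the mechanism by which the white paper expects new businesses to emerge.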
9/17/2009 9:16:19 AM
Nature has a disheartening article
published last week on the success of different projects attempting to encourage data sharing among academics. Unfortunately, my experience has been entirely too in line with what the authors found: data sharing is a discipline-specific beast. Within Management, probably the discipline most associated with entrepreneurship scholarship (with Economics a close second), data sharing is not common. In fact, the discipline for the most part encourages studies based on proprietary data that can never be replicated or accessed. It is incredibly frustrating. In Economics, I would assess things as slightly better, but not by much.
I have had numerous conversations over the years with people who are interested in changing these discipline-specific norms. Unfortunately, these conversations often don't go very far. Personally, I think the only hope of creating holistic change is to start at the discipline level, and it would need to come from the top down rather than the bottom up. If a coalition of the top journals in a given discipline came together and adopted new rules on data disclosure and sharing that were standard and implemented uniformly, everything could change. As it stands, I have never seen anything like that happen. Within Economics, many of the top journals require data accessibility for publication in theory, but the actual implementation of these rules is spotty and hasn't spread to all publications.
With such a downbeat take on the subject, I should highlight a couple of cases that are exceptions to the rule:
- We have a major innovation survey hitting the field this spring through Duke and Georgia Tech. The principals on that project, with a little prodding, have created a user research consortium, including some emerging scholars, that will use NORC's Data Enclave to give a geographically diverse community of scholars access to the data collected.
- Rob Wiltbank and Rob Fairlie have both been very generous in making data they created for different Kauffman projects available. The Angel Investor Performance Project and the Kauffman Index of Entrepreneurial Activity both allow for micro-data-based research.
I am not including a discussion of public-use data sets here only because such data sets have an added layer of complexity, although many of the issues identified here and in the Nature
article are also applicable to public-use data files.
Are you passionate about this topic, or have an idea? Let me know.