Call for Proposals
Kauffman Firm Survey Data Extension – Data Matching
Deadline: January 15, 2012, 5 p.m. CST
As the Kauffman Foundation nears the completion of its eight-year panel study on new firms in the United States, the Kauffman Firm Survey (KFS), we are seeking interested scholars who would like to extend the core survey data in ways that do not increase the burden on survey respondents.1 While more than 6,000 variables are included in the confidential version of the KFS microdata, we recognize that there are additional opportunities for research that become available by incorporating new sources of data to leverage the existing KFS survey information. Through this grant program we hope to accomplish the following:
- Expand the community of experts involved with the KFS to include scholars with expertise in natural language processing, web scraping, and related approaches;
- Create reusable infrastructure;2 and
- Use the prototype/demo infrastructure to evaluate the effectiveness of the approach—missingness, accuracy, utility—to expand our understanding of different approaches for matching to existing data sets.
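As a purely illustrative sketch of how the missingness and accuracy of a matching approach might be measured, the Python below links firm names to an external list and scores the result against a small hand-verified sample. The function names, the legal-suffix list, and the 0.85 similarity threshold are assumptions made for this example; they are not part of the KFS, its data files, or any Foundation requirement.

```python
from difflib import SequenceMatcher

# Illustrative only: a real project would use a much fuller suffix list.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in LEGAL_SUFFIXES)


def best_match(name, candidates, threshold=0.85):
    """Return the most similar candidate name, or None if none reaches the threshold."""
    target = normalize(name)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None


def evaluate(matches, truth):
    """Summarize a match run.

    matches: firm id -> matched external id, or None when no match was found.
    truth:   hand-verified firm id -> correct external id (a small audit sample).
    """
    total = len(matches)
    missing = sum(1 for value in matches.values() if value is None)
    # Accuracy is computed only over audited firms that received a match.
    checked = [fid for fid in truth if matches.get(fid) is not None]
    correct = sum(1 for fid in checked if matches[fid] == truth[fid])
    return {
        "missingness": missing / total if total else 0.0,
        "accuracy": correct / len(checked) if checked else None,
    }
```

Utility, the third criterion, depends on the research question and is not captured by a summary statistic of this kind.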
Some potential areas of extension:
- Website content and activities. Business websites contain information on operations and strategy. While the KFS survey asked about the presence of a business website and issues such as annual product turnover due to sales online, we suspect research in the area of online business activities might be augmented by variables that are extracted or summarized from business website content over time.3
- Patent content. Finding ways to automatically or semi-automatically match in additional information from public patent documentation could enhance our knowledge of the patents owned, licensed in, and licensed out, along with other types of intellectual property and intangible asset investments beyond the current numerical categories captured in the survey.
- Other. While we have identified some areas of likely interest in extending the data, we are open to other extensions not identified here.
Multiple projects are likely to be funded with individual project budgets up to $50,000 being preferred. The Foundation has a particular interest in projects that could be aligned to utilize funding from the Foundation as well as matching funding from other sources. For example, some currently open solicitations from the National Science Foundation appear to be in areas of possible overlapping interest, and scholars expressing interest are encouraged to explore and speak to their interest in these opportunities (e.g., http://www.nsf.gov/pubs/2011/nsf11584/nsf11584.htm and http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=501084). For scholars seeking to submit proposals to other funders for consideration, the Foundation will consider accelerated requests for conditional funding.
Confidentiality of the KFS respondents is of utmost importance to the Foundation, as is the actual use of the KFS data in meaningful research. On the current research files for the KFS, all data have been anonymized. Under this call, the Foundation is considering proposals for research that would require additional firm-specific identifying information in order to be carried out. Such research will only be carried out after clearing reviews at Mathematica Policy Research, the survey research firm collecting and curating the KFS, and the Kauffman Foundation. The work will require researchers to adhere to strict data protection requirements. Additionally, wherever possible, the Foundation will likely require that the matching occur inside its secure, remote NORC Data Enclave environment to ensure the security of the data while allowing for desktop matching opportunities.4 Any matched data in the project would ultimately need to be anonymized and become available for use in the KFS portion of the secure NORC Data Enclave environment or through the Census Research Data Centers following an agreed-upon period of time.5
Proposals are sought from academic, commercial, and governmental parties who have, or can develop, tools that they believe could be the basis for a fruitful extension of the KFS data. Two types of proposals will be considered:
- Funding. Proposals seeking partial or full grant funding from the Foundation to perform the proposed activities will be considered only from scholars located in the United States, should limit funding requests to $50,000, and should not include indirect costs.6
- No funding. In instances where no funding is required from the Foundation to perform the proposed tasks because of existing research support, proposals will be considered from U.S. and foreign organizations.
Applications should include:
- a description of the data and/or variables that the interested parties can extract and would like to match into the KFS sample,
- a high-level overview of: 1) the tools that would accomplish the data extraction; 2) the sources that would supply the extracted data/variables; and 3) the matching processes that would ensure correct integration/linking into existing KFS records,
- a statement of benefits, i.e., why the matched data would provide additional research insights beyond what is already included in the KFS data,7
- a line-item budget, and
- background as to expertise of the research team as it specifically relates to the proposed tools, sources, and processes.
It is anticipated that funding would be awarded in spring 2012 with most of the proposed data augmentation activities occurring in 2012/13.
All applications and questions should be sent electronically to E.J. Reedy at firstname.lastname@example.org.
1 A full bibliography of research is available at http://www.kauffman.org/kfs/KFSWiki/Related-Research.aspx
2 This infrastructure can be in final or prototype form, preferably using open source components to the greatest extent possible, and should support the data scraping and matching relevant to the KFS.
3 Additional details on data available in the KFS for online activities will be forthcoming in “Casting a Wide Net: Online Activities of Small and New Businesses in the United States,” to be published October 18 and available by request before then.
7 While most KFS questions have remained constant over time, there have been some additions and modifications over the years. As the simplest means of exploring this, we suggest reviewing the baseline (2004) and 2008 questionnaires: http://www.kauffman.org/kfs/About-the-KFS.aspx.
FAQs - Data Extension RFP
Questions about the RFP that appear of general interest will be answered here.
Q1) You indicate a preference for using the NORC Data Enclave as a place for doing potential work. Do we need to allocate $5,200 of the budget to cover the cost of a NORC seat for this project?
A1) No, please do not include the cost of a NORC seat in your budget. All approved projects will receive a Kauffman-sponsored seat in the NORC Data Enclave at no direct cost to the project.
Q2) Do you have website URLs for businesses or just indicators of whether the business does/does not have a website?
A2) We collected website URLs for businesses over time that would potentially be available under confidentiality agreement for researchers successful in this RFP. Otherwise, in October 2011 the KFS data files were amended to include a yes/no indicator of business website.