Improving Industry Classifications

A new paper out from the National Bureau of Economic Research - “Dynamic Text-Based Industry Classifications and Endogenous Product Differentiation” by Gordon M. Phillips and Gerard Hoberg - discusses the power that large-scale text analysis can provide in examining industry classifications and other traditionally nebulous areas of differentiation among firms and markets.

Although it is convenient to use existing industry classifications such as SIC or NAICS for research purposes, these measures have limitations. Neither adjusts significantly over time as product markets evolve, and innovations can create new product markets that do not exist in fixed classifications. In the late 1990s, for example, hundreds of new technology and web-based firms were grouped into a large and nondescript SIC-based “business services” industry. More generally, fixed classifications like SIC and NAICS have at least four shortcomings: they only rarely reclassify firms into different industries as firm product offerings change; they do not allow product markets themselves to evolve over time; they do not allow for the possibility that two firms that are rivals to a third firm might not directly compete against each other; and they do not allow continuous measures of within-industry similarity to be computed.
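The basic ingredient behind such a classification is a continuous, text-based similarity measure between pairs of firms. Hoberg and Phillips build theirs from the product-description sections of 10-K filings; as a rough illustration of the idea (not their exact procedure, and using hypothetical firm descriptions), a minimal Python sketch might look like this:

```python
# Minimal sketch of a continuous, text-based firm-similarity measure.
# Illustration only: the descriptions below are hypothetical stand-ins
# for the product sections of real 10-K filings, and Hoberg and Phillips
# use their own vocabulary construction and similarity thresholds.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = {
    "FirmA": "online retail platform selling books, media, and consumer electronics",
    "FirmB": "e-commerce marketplace for consumer electronics and digital media",
    "FirmC": "regional chain of grocery stores and retail pharmacies",
}

firms = list(descriptions)
vectors = TfidfVectorizer(stop_words="english").fit_transform(
    descriptions[f] for f in firms
)

# Pairwise cosine similarity: a continuous measure of product-market overlap
# rather than an all-or-nothing industry assignment.
sim = cosine_similarity(vectors)
for i in range(len(firms)):
    for j in range(i + 1, len(firms)):
        print(f"{firms[i]} ~ {firms[j]}: {sim[i, j]:.2f}")
```

Because the similarity matrix can be recomputed on each year's filings, both a firm's position and the boundaries of the resulting industry groupings can evolve over time, which is exactly what fixed schemes like SIC and NAICS cannot do.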

This is a timely publication, as the Office of Management and Budget (OMB) is in the final stages of seeking approval (and feedback) on the 2012 revisions to the North American Industry Classification System. While considerable effort goes into updating these industry classifications, I unfortunately do not believe that government officials are yet taking advantage of methods like those described in this paper, which mine existing data to look for discontinuities in how industries are defined, to detect when firms change industries, or to track other aspects of industrial organization.

Now, the prospect of the government performing large-scale text analysis like this might scare some, but in my mind there are groups, like the Center for Economic Studies at the Census Bureau or the Statistics of Income Division at the Internal Revenue Service, that could do this responsibly if given the mandate, funding, and some lead time. These places house large quantities of text data, maintain separate research functions, and, most importantly, maintain processes for soliciting outside researcher proposals for cutting-edge research that would benefit the agencies through improved data products. I have never heard staff at either of these locations discuss the NAICS redesign as a high priority, but perhaps if OMB were using its coordinating powers and discretionary funding with more force, that could change.

Identifying new industry clusters, and other big changes in industrial organization, faster and more accurately remains a key deficiency of the current national statistical system. The U.S. regions that are on the front line of economic development rely too much on private data to understand change in their economies, because the federal statistical system has too often missed the data needs of this diffuse customer base. Coincidentally, the Council for Community and Economic Research annual conference starts today in Washington, DC. This is the most organized group of individuals advocating for improved regional economic statistics in the United States.

I should note that while there is great potential power in the methods employed by Phillips and Hoberg, the authors also warn that firms could game these measures: if firms believed the text they disclose was being used to shape government policy, they could manipulate it to their advantage. “We also note that while our new measures are interesting for research or scientific purposes, they would not be good for policy and antitrust purposes as they could be manipulated by firms fairly easily if firms believed they were being used by policy makers.” I think these methods would be best added to an existing review process rather than treated as a substitute for one; used that way, the ability to game the system would be reduced.

E. J. Reedy

As a director in Research and Policy, E.J. Reedy oversees the Ewing Marion Kauffman Foundation’s research initiatives related to education, human capital development, and data.

Since joining the Kauffman Foundation in 2003, Reedy has been significantly involved in the coordination of the Foundation’s entrepreneurship and innovation data-related initiatives, including the Kauffman Firm Survey, for which he served as a principal investigator, and the Foundation’s multi-year series of symposiums on data, as well as many web-related projects and initiatives. He is a globally recognized expert in entrepreneurship and innovation measurement and has consulted for a variety of agencies.