Big Data and National Statistical Agencies

This is part three of a four-part blog series where Kauffman Foundation Research Director E.J. Reedy interviews Ron Jarmin, Assistant Director of Research and Methodology at the U.S. Census Bureau, to learn more about how Census is continuing to create new, innovative data resources and about some of its work as a partner with the newly created Institute for Research on Innovation and Science at the University of Michigan. The next part of this interview will be released on Kauffman’s Growthology blog. 

Read part one on Institute for Research on Innovation and Science.

Read part two on Finding More Timely Economic Statistics. 

Part 3: Big Data and National Statistical Agencies

Q (Reedy):            Now, Ron, I haven’t heard you say the words “big data.” Is that purposeful?

A (Jarmin):             Well, some of the things that we do people would call big data. I’m not sure big data is the right term … I think, like many others, we’re not really sure what big data means. The modern world gives us an opportunity to get much more data, much more quickly, and then to produce a much wider array of value-added products for our data users. So I think we’re just trying to take advantage of that, especially where it makes sense for a statistical agency like the Census Bureau to play that role.

One thing we’re looking at is how this landscape is changing. We’re going to move away from the model where the stat agencies are the soup-to-nuts providers of statistical information about the economy and population. If you go back decades, a place like the Census Bureau would be approached about producing data on X and they would go out and define the population from which they wanted to collect information on X. They would take a sample, they’d weight up the results, and they’d publish tabular data with the information. Nobody else was really involved. It was the Census Bureau from beginning to end.
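As a rough illustration of the traditional workflow described above (take a sample, weight up the results, publish a table), here is a minimal Python sketch. It is not the Census Bureau's actual procedure: the function name `weighted_totals`, the field names, and every number in the sample are hypothetical. It assumes each sampled unit carries a weight equal to the number of population units it represents.

```python
from collections import defaultdict

def weighted_totals(responses):
    """Estimate population totals per group from a weighted sample.

    Each response carries a weight: the number of population units
    that the sampled unit is assumed to represent.
    """
    totals = defaultdict(float)
    for r in responses:
        totals[r["group"]] += r["weight"] * r["value"]
    return dict(totals)

# Hypothetical sample: three firms with made-up employment counts.
sample = [
    {"group": "retail", "value": 10, "weight": 50.0},
    {"group": "retail", "value": 20, "weight": 25.0},
    {"group": "manufacturing", "value": 100, "weight": 4.0},
]

# "Publish" the weighted estimates as a simple table.
table = weighted_totals(sample)
for group, total in sorted(table.items()):
    print(f"{group}: {total:.0f}")
```

Real survey estimation also involves sample design, nonresponse adjustment, and variance estimation, none of which this sketch attempts.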

I think we’re moving more toward a world where different parts of the economy or the population will contribute substantially to the production of some of the data before it comes to us. We will get it in a partially processed form and we will process it further. An important thing that a stat agency will do is lend credibility to the estimates, a level of trustworthiness that private sector organizations perhaps might not be able to provide. We can provide a reliable methodology and transparent processes. Then we can push data out in ways that allow it to be further massaged and visualized and interpreted by other organizations, whether that be in the media or think tanks or academia, etc., so that there’s more of a value chain in the production of statistical information that the public would eventually consume. Where we play an important role is inserting solid methodology and transparent processes in the chain.

Q:            What is the difference, in your mind, between a one-time engagement on a particular topic and something that’s more of an ongoing data series?

A:             I want to stress that we don’t see the world going away from surveys completely. But for things that are the regular sort of economic measurement activities that occur on an ongoing basis, we would very much like to move those into an automated type of production. But there are things that you’re not going to be able to learn from those systems, like the regular bookkeeping systems of companies or other organizations. An example would be our survey on management practices. That kind of information is something that you actually have to ask somebody. You need a human to tell you what the answer is. So we see those sorts of things as still being where we do survey-type work. For one-time things, I think that’s probably where we would still rely on our expertise on survey methodology to collect the information. But always with the vision that it would be integrated with our richer ongoing infrastructure so that you don’t have to collect everything you need to know on a survey, only the things that you can’t get from other sources. So, we used to say we augmented our survey data with administrative records. I think in the future we’ll be saying, we augment our administrative records with survey data.

Q:            The change in the organization to more timely data collection and release is obviously a big process and a fairly big initiative for change within any sort of organization. How have you been taking into account internal staff proficiencies and capital as you undertake to make these types of expansions?

A:             We have engaged in offering some training that actually has been very closely related to the work we’ve been doing on the IRIS-IMI project in the sense that we’ve used the project to provide a ready-made in-class example for our staff to work through as they are receiving training and professional development. Obviously, I think we’re changing in some ways how we recruit folks to get more people with these sorts of skills in the building.

This is going to be an evolutionary change as opposed to changing how we measure the economy overnight. We’re going to use examples like what we’re doing with the IRIS-IMI project to prove out various aspects of this slowly, like ice crystals growing on a pond. Building on these early successes, we will then proceed to increase the amount of the economy that we measure with new, modern methods. But this is something that we have to be somewhat careful about because when people have been using our data for many decades, they’re going to want to have reliability and comparability. So we don’t really have the option of turning off the old way and moving to the new way. I don’t think we could anyway. We’ll need time to test new procedures, train staff and acclimate users to new products. But this is a huge challenge in an organization like the Census Bureau, and I think there are various things we’re doing within the research area to ready us for that. There are things going on in the production units and with the most senior management in the Bureau to enable this sort of thing to happen.

Q:            Do you have a name for this training class you’ve developed?

A:             We are calling it the University of Chicago/Census Bureau Big Data class.

Q:            And what are the elements of that class that you feel have been particularly helpful?

A:             I think the students learned a number of big data techniques like machine learning, text analysis, new software and skills, etc. But I think, for a lot of what we’ve done so far, some of the text analysis tools that they’ve been using to parse this rich glob of administrative data we’ve been getting from partner universities involved in the Institute for Research on Innovation and Science, and to make it more manageable, have probably been some of the more useful. So we’re moving away from a world where, as a survey organization, you collect very structured information. We’re moving to a world where we’re going to encounter much more unstructured information. And so the ability of our people to take that unstructured information and turn it into statistical information is going to be key. I think that’s been one of the things that initially we didn’t quite expect was going to be a big deal. But it’s been turning out to be a really important aspect of this.

Q:            One of the elements I’ve been impressed with in the structure here is that, first off, you’re taking people from across the organization to be a part of this class. It’s not just the people that have immediate interest in this particular production of data. And, two, this is not a one-off class. Could you speak a little bit to some of the elements of the design that have been helpful to making this a group that’s really looking at the broader concepts of innovation?

A:             What we’ve tried to do with the big data classes is have a core set of skills and some knowledge transferred to the students. But we very much put it in their heads that we want them working on projects after they’re done with the class so their skills stay sharp. And we’ve involved several of the students in some of our more innovative, transformative projects like the IRIS-IMI project. But we also ask them, as they go back to some of the more bread-and-butter stuff that the Bureau does, to apply some of the tools they’ve gained in that environment as well. And I think we’ve seen some of that going on.

The class has been opened up to some partner agencies around town. So we’ve had folks from the Bureau of Labor Statistics, Bureau of Economic Analysis, Federal Reserve Board, and the U.S. Patent and Trademark Office all involved in the class, as well. So I think you see the beginning of a sort of subgroup within the statistical infrastructure here in D.C. trying to move out on some of these things. It’s still going to be hard within all of these organizations to devote enough resources to innovate while maintaining the production of the stuff that they currently do. But I think this is important and we’re going to have to find a way to get that done because I don’t think we can maintain the status quo and be able to produce the kind of information that the economy is going to need going forward.



E. J. Reedy

As a director in Research and Policy, E.J. Reedy oversees the Ewing Marion Kauffman Foundation’s research initiatives related to education, human capital development, and data.

Since joining the Kauffman Foundation in 2003, Reedy has been significantly involved in the coordination of the Foundation’s entrepreneurship and innovation data-related initiatives, including the Kauffman Firm Survey, for which he served as a principal investigator, and the Foundation’s multi-year series of symposiums on data, as well as many web-related projects and initiatives. He is a globally recognized expert in entrepreneurship and innovation measurement and has consulted for a variety of agencies.