11/11/2009 9:32:22 AM By E.J. Reedy
On Friday, I attended an advisory committee meeting of the Statistics of Income (SOI) Division at the Internal Revenue Service (IRS). I wanted to highlight one presentation and discussion from SOI, and staffer Nick Greenia in particular, who continues to think carefully about the division's data dissemination activities. In their comments, I see a recognition that engaging researchers can improve both the quality of the data and the understanding of it. The IRS is proceeding slowly with its researcher-engagement agenda, but indicated at the meeting that a new call for research proposals will come out in the next few months. When it does, I will post a link.
Additionally, Mr. Greenia outlined a theme that I think most federal agencies are currently wrestling with - the growing risk that public-use microdata files pose in a world of increasing data availability. For a definition of a public-use microdata file, I turned to Statistics Canada, which defines them online as "Microdata files that have been carefully anonymized (i.e., all identifying information has been removed) and scrutinized to ensure that no risk of breach of individual privacy or confidentiality exists." The concern is that as people make more information about themselves available online, public-use data files must be stripped of so much content to remain anonymous that they are of little relevance for most research. So what does Mr. Greenia propose? He doesn't come down in favor of any one option but highlights many of the advantages and disadvantages of the emerging solutions:
- Synthetic public-use data files
- Virtual data enclaves
- Data research centers
Mr. Greenia has kindly agreed to let me post a PDF of his presentation, so you can read his thoughts directly.
I'd also like to note that SOI released some updated tables on business financing last week. These are aggregate tables, not the kind of microdata files discussed above, but they are still of interest for some research purposes.