Longitudinal and Survival Analysis of Business Data: A Two-Day Training Using the Kauffman Firm Survey

Using the Kauffman Firm Survey (KFS), a panel data set with six years of data on businesses that began operations in the United States in 2004, this short course provides a small-scale setting for exploring research techniques in the longitudinal analysis of business data.

Longitudinal data offer many opportunities and potential pitfalls. With such data, researchers can examine predictor and response variables at two or more points in time. These data have two major attractions: the ability to control for unobservables and the ability to determine causal ordering. However, problems can arise if longitudinal data are not used properly. Repeated observations on the same firm are typically correlated, which violates the usual assumption that observations are independent. In this training we examine methods for dealing with this dependence: robust standard errors, generalized estimating equations, random effects models, and fixed effects models. We also examine methods for quantitative outcomes, categorical outcomes, and count data outcomes, as well as survival analysis.
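To make "controlling for unobservables" concrete, here is a minimal sketch in Python (the workshop itself used Stata, so this is only an illustration, not the course material). It builds a toy panel of three hypothetical firms whose unobserved, time-constant characteristic is correlated with the predictor, then compares a pooled regression slope with a fixed effects ("within") slope. The firm labels, the variable names, and all numbers are assumptions invented for the example; they are not KFS data.

```python
# Toy panel: three hypothetical firms observed for three years each.
# All values are invented for illustration; they are not KFS data.
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


firm = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Unobserved firm-level effect, constant over time and correlated with x.
unobserved = {"A": 0.0, "B": 10.0, "C": 20.0}
true_slope = 2.0
y = [unobserved[f] + true_slope * xi for f, xi in zip(firm, x)]

# Pooled regression ignores the firm effect and picks up its correlation with x.
pooled_slope = ols_slope(x, y)

# Fixed effects: demean x and y within each firm, then regress.
within_slope = ols_slope(demean_by_group(x, firm), demean_by_group(y, firm))

print(f"pooled slope: {pooled_slope:.2f}")
print(f"within slope: {within_slope:.2f}")
```

In this noise-free toy data the pooled slope comes out at 5, well above the true value of 2, because the omitted firm effect rises with the predictor. The within slope recovers 2 because demeaning removes anything that is constant within a firm. Real panels contain noise and need appropriate standard errors, which this sketch does not attempt.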

Who should watch the workshop?

This workshop is for researchers who have a basic statistical background and want to learn better methods for analyzing longitudinal business data. The primary audience is researchers who currently use the KFS for analysis or who are interested in possible research using the KFS. Viewers should have a good working knowledge of the principles and practice of econometrics and statistics.

Technical Requirements

This workshop used Stata (release 11) for the empirical examples.

Longitudinal and Survival Analysis of Business Data: A Two-Day Training Using the Kauffman Firm Survey, Recorded August 1-2, 2012 in Boston, Massachusetts


Course Outline

Day One
  1. Opportunities and challenges of panel data
    • Data requirements
    • Controlling for unobservables
    • Determining causal order
    • Problem of dependence
  2. Linear models
    • Robust standard errors
    • Generalized estimating equations
    • Random effects models
    • Fixed effects models
    • Hybrid models
  3. Logistic regression models
    • Robust standard errors
    • Generalized estimating equations
    • Subject-specific vs. population-averaged methods
    • Random effects models
    • Fixed effects models
    • Hybrid models
  4. Count data models
    • Poisson models
    • Negative binomial models
    • Fixed and random effects
  5. Linear structural equation models
    • Reciprocal causation with lagged effects

Day Two
  1. Fundamentals of Survival Analysis
    • Problems with conventional methods
    • Types of censoring
    • Kaplan-Meier estimation
    • Proportional hazards models
    • Partial likelihood estimation
    • Interpretation of parameters
    • Competing risks
    • Time-dependent covariates
    • Discrete-time analysis
    • Heterogeneity and time dependence
    • R-squared
  2. KFS Data: Specifics
    • Missing data and imputation
    • Summary of KFS research


The following sources provide the foundation for the course:

  • Fixed Effects Regression Methods for Longitudinal Data Using Stata by Paul Allison
  • An Introduction to Survival Analysis Using Stata by Cleves, Gould, Gutierrez, and Marchenko
  • Course Packs on Longitudinal Analysis and Survival Analysis using Stata by Paul Allison


Alicia Robb was a senior research fellow at the Ewing Marion Kauffman Foundation and was the principal investigator on the Kauffman Firm Survey. Previously, she was an economist at the Division of Research and Statistics, Board of Governors of the Federal Reserve System and an economist at the Office of Economic Research at the U.S. Small Business Administration. A leading expert in small business data in the U.S., she has a Ph.D. in Economics from the University of North Carolina, Chapel Hill with a specialization in econometrics.