Earnings Dynamics Tutorial - Part 2: AR(1) Process Simulation and Estimation
Overview
In this tutorial, we'll build a complete pipeline for simulating and estimating earnings dynamics using an AR(1) process. You'll learn to generate synthetic data, estimate parameters both manually and using libraries, and conduct Monte Carlo analysis.
Prerequisites
- Complete Part 1 of this tutorial series
- Have your Python environment set up with pandas, numpy, matplotlib, and scipy
Step 1: Reuse Your Python Environment
Make sure your environment from Part 1 is activated and all necessary packages are available. You should have:
Step 2: AR(1) Data Simulation
We'll create a function to simulate earnings data following an AR(1) process:
Your Task: Implement the following function signature:
def simulate_ar1_earnings(n_individuals, n_periods, rho, sigma_shock, sigma_intercept, seed=None):
"""
Simulate earnings data following AR(1) process: y_it = alpha_i + rho * y_{i,t-1} + epsilon_it
Parameters:
-----------
n_individuals : int
Number of individuals
n_periods : int
Number of time periods
rho : float
Autoregressive parameter
sigma_shock : float
Standard deviation of transitory shocks
sigma_intercept : float
Standard deviation of individual-specific intercepts
seed : int, optional
Random seed for reproducibility
Returns:
--------
pd.DataFrame
DataFrame with columns: ['individual_id', 'period', 'earnings', 'shock', 'intercept']
"""
# YOUR CODE HERE
pass
Hints: - Start each individual's earnings from their intercept plus a shock - Use numpy's random number generators - Remember that AR(1) means each period depends on the previous period - Return a long-format DataFrame (one row per individual-period)
Step 3: Manual AR(1) Parameter Estimation
Now we'll estimate the autoregressive parameter ρ using a simple regression approach.
Your Task: Implement the following function:
def estimate_rho_manual(df):
"""
Manually estimate rho parameter using least squares regression.
Parameters:
-----------
df : pd.DataFrame
DataFrame with earnings data (from simulate_ar1_earnings)
Returns:
--------
dict
Dictionary containing 'rho_estimate', 'intercept_estimate', 'residuals'
"""
# YOUR CODE HERE
# Steps:
# 1. Create lagged earnings variable
# 2. Drop first period (no lag available)
# 3. Set up X matrix and y vector
# 4. Solve normal equations: beta = (X'X)^(-1) X'y
# 5. Calculate residuals
pass
Key Steps: - Create a lagged earnings variable within each individual - Remove observations where lag is not available - Build the design matrix X with intercept and lagged earnings - Use matrix algebra to solve the normal equations
Step 4: Library-Based Estimation
Your Task: Implement using a statistical library:
def estimate_rho_library(df):
"""
Estimate rho using a statistical library (e.g., statsmodels or sklearn).
Parameters:
-----------
df : pd.DataFrame
DataFrame with earnings data
Returns:
--------
dict
Dictionary containing estimation results including standard errors
"""
# YOUR CODE HERE
pass
Step 5: Apply Estimators to Simulated Data
Your Task: Test your estimators:
# Generate test data
np.random.seed(42)
true_params = {'rho': 0.7, 'sigma_shock': 0.3, 'sigma_intercept': 0.5}
sim_data = simulate_ar1_earnings(
n_individuals=1000,
n_periods=10,
**true_params
)
# Apply both estimators
manual_results = estimate_rho_manual(sim_data)
library_results = estimate_rho_library(sim_data)
# Compare results
print(f"True rho: {true_params['rho']}")
print(f"Manual estimate: {manual_results['rho_estimate']:.4f}")
print(f"Library estimate: {library_results['rho_estimate']:.4f}")
Step 6: Apply to Real Data
Your Task: Load your real earnings data and apply the same estimation procedure:
def analyze_real_data(data_path):
"""
Apply AR(1) estimation to real earnings data.
Parameters:
-----------
data_path : str
Path to real earnings data
Returns:
--------
dict
Estimation results for real data
"""
# YOUR CODE HERE
pass
Step 7: Variance/Autocovariance Analysis
Your Task: Compare model fit by examining second moments:
def compare_moments(real_data, simulated_data, max_lag=5):
"""
Compare variance and autocovariances between real and simulated data.
Parameters:
-----------
real_data : pd.DataFrame
Real earnings data
simulated_data : pd.DataFrame
Simulated earnings data
max_lag : int
Maximum lag for autocovariance calculation
Returns:
--------
pd.DataFrame
Comparison of moments
"""
# YOUR CODE HERE
# Calculate:
# - Variance of earnings growth
# - Autocovariances at different lags
# - Create comparison table
pass
Step 8: Monte Carlo Analysis
Finally, we'll assess the finite-sample properties of our estimator through simulation.
Your Task: Implement the Monte Carlo study:
def monte_carlo_study(true_rho, n_simulations=200, n_individuals=500, n_periods=8,
sigma_shock=0.3, sigma_intercept=0.5):
"""
Conduct Monte Carlo study of rho estimation.
Parameters:
-----------
true_rho : float
True value of autoregressive parameter
n_simulations : int
Number of Monte Carlo replications
Other parameters as in simulate_ar1_earnings
Returns:
--------
dict
Dictionary with results including estimates array and summary statistics
"""
# YOUR CODE HERE
# For each simulation:
# 1. Generate new data
# 2. Estimate rho
# 3. Store estimate
# Calculate bias, standard error, coverage, etc.
pass
def plot_monte_carlo_results(mc_results, true_rho):
"""
Plot distribution of Monte Carlo estimates.
Parameters:
-----------
mc_results : dict
Results from monte_carlo_study
true_rho : float
True parameter value
"""
# YOUR CODE HERE
# Create histogram of estimates
# Add vertical line for true value
# Add summary statistics as text
pass
Part B: Measurement Error and Non-Normal Distributions
Now we'll extend our analysis to include measurement error and explore non-normal distributions with higher kurtosis.
Step 9: Adding Measurement Error
Your Task: Modify the simulation to include measurement error:
def simulate_ar1_with_measurement_error(n_individuals, n_periods, rho, sigma_shock,
sigma_intercept, sigma_measurement, seed=None):
"""
Simulate earnings data with measurement error.
True model: y*_it = alpha_i + rho * y*_{i,t-1} + epsilon_it
Observed: y_it = y*_it + u_it, where u_it ~ N(0, sigma_measurement^2)
Parameters:
-----------
sigma_measurement : float
Standard deviation of measurement error
Other parameters same as simulate_ar1_earnings
Returns:
--------
pd.DataFrame
DataFrame with columns: ['individual_id', 'period', 'earnings_true', 'earnings_observed',
'shock', 'measurement_error', 'intercept']
"""
# YOUR CODE HERE
pass
Step 10: Compare Autocovariances with Measurement Error
Your Task: Analyze how measurement error affects the autocovariance structure:
def compare_autocovariances_measurement_error(sigma_measurement_values, true_rho=0.7,
n_individuals=1000, n_periods=10):
"""
Compare autocovariances for different levels of measurement error.
Parameters:
-----------
sigma_measurement_values : list
List of measurement error standard deviations to test
Other parameters for simulation
Returns:
--------
pd.DataFrame
Table comparing autocovariances across measurement error levels
"""
# YOUR CODE HERE
pass
Step 11: Estimation Bias from Measurement Error
Your Task: Run Monte Carlo to show estimation bias:
def monte_carlo_measurement_error(true_rho, sigma_measurement_values, n_simulations=200):
"""
Monte Carlo study showing bias from measurement error.
Parameters:
-----------
true_rho : float
True autoregressive parameter
sigma_measurement_values : list
Different levels of measurement error to test
n_simulations : int
Number of Monte Carlo replications
Returns:
--------
dict
Results for each measurement error level
"""
# YOUR CODE HERE
pass
def plot_measurement_error_bias(mc_results, true_rho):
"""
Plot showing bias as function of measurement error.
Parameters:
-----------
mc_results : dict
Results from monte_carlo_measurement_error
true_rho : float
True parameter value
"""
# YOUR CODE HERE
# Create subplot with:
# 1. Histogram of estimates for each measurement error level
# 2. Bias vs measurement error plot
pass
Step 12: Non-Normal Distributions with Kurtosis
Your Task: Implement simulation with heavy-tailed shocks:
def simulate_ar1_heavy_tails(n_individuals, n_periods, rho, sigma_shock, sigma_intercept,
distribution='t', df_param=3, seed=None):
"""
Simulate AR(1) earnings with non-normal shock distributions.
Parameters:
-----------
distribution : str
Type of distribution ('normal', 't', 'laplace')
df_param : float
Degrees of freedom parameter for t-distribution
Other parameters same as before
Returns:
--------
pd.DataFrame
Simulated data with specified shock distribution
"""
# YOUR CODE HERE
pass
def compare_distributions_monte_carlo(true_rho, distributions_to_test, n_simulations=200):
"""
Compare estimator performance across different shock distributions.
Parameters:
-----------
true_rho : float
True autoregressive parameter
distributions_to_test : list
List of distribution specifications to test
n_simulations : int
Number of Monte Carlo replications
Returns:
--------
dict
Results comparing estimator performance across distributions
"""
# YOUR CODE HERE
pass
Exercise Workflow
- Start with simulation: Get your
simulate_ar1_earningsfunction working with small test cases - Manual estimation: Implement the regression by hand, verify with simple examples
- Library estimation: Compare your manual results with library functions
- Validation: Test both estimators on data with known parameters
- Real data analysis: Apply to actual earnings data
- Moment comparison: Check how well your model matches data properties
- Monte Carlo: Assess estimator properties through repeated sampling
- Measurement error: Understand how measurement error biases estimates
- Heavy tails: Explore robustness to distributional assumptions
Tips for Success
- Start with small datasets to debug your code
- Print intermediate results to verify calculations
- Use assertions to check your data transformations
- Plot your simulated data to ensure it looks reasonable
- Compare manual and library estimates as a sanity check
Questions for Reflection
- How does the bias of your estimator change with sample size?
- What happens to estimation accuracy as ρ approaches 1?
- How sensitive are your results to the choice of σ_shock and σ_intercept?
- Do the second moments of your simulated data match the real data well?
- How does measurement error affect the autocovariance structure?
- Why does measurement error create downward bias in ρ estimation?
- How robust is the estimator to heavy-tailed shock distributions?
Getting Help
If you get stuck on any function implementation, you can ask an AI assistant for help with the specific function signature. Focus on understanding the logic rather than just copying code.
Remember: The goal is to understand the relationship between theory, simulation, and estimation in earnings dynamics modeling!