Earnings Dynamics Tutorial - Part 2: AR(1) Process Simulation and Estimation

Overview

In this tutorial, we'll build a complete pipeline for simulating and estimating earnings dynamics using an AR(1) process. You'll learn to generate synthetic data, estimate parameters both manually and using libraries, and conduct Monte Carlo analysis.

Prerequisites

Complete Part 1 of this tutorial series
Have your Python environment set up with pandas, numpy, matplotlib, and scipy

Step 1: Reuse Your Python Environment

Make sure your environment from Part 1 is activated and all necessary packages are available. You should have:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

Step 2: AR(1) Data Simulation

We'll create a function to simulate earnings data following an AR(1) process:

Your Task: Implement the following function signature:

def simulate_ar1_earnings(n_individuals, n_periods, rho, sigma_shock, sigma_intercept, seed=None):
    """
    Simulate earnings data following AR(1) process: y_it = alpha_i + rho * y_{i,t-1} + epsilon_it

    Parameters:
    -----------
    n_individuals : int
        Number of individuals
    n_periods : int
        Number of time periods
    rho : float
        Autoregressive parameter
    sigma_shock : float
        Standard deviation of transitory shocks
    sigma_intercept : float
        Standard deviation of individual-specific intercepts
    seed : int, optional
        Random seed for reproducibility

    Returns:
    --------
    pd.DataFrame
        DataFrame with columns: ['individual_id', 'period', 'earnings', 'shock', 'intercept']
    """
    # YOUR CODE HERE
    pass

Hints: - Start each individual's earnings from their intercept plus a shock - Use numpy's random number generators - Remember that AR(1) means each period depends on the previous period - Return a long-format DataFrame (one row per individual-period)

Step 3: Manual AR(1) Parameter Estimation

Now we'll estimate the autoregressive parameter ρ using a simple regression approach.

Your Task: Implement the following function:

def estimate_rho_manual(df):
    """
    Manually estimate rho parameter using least squares regression.

    Parameters:
    -----------
    df : pd.DataFrame
        DataFrame with earnings data (from simulate_ar1_earnings)

    Returns:
    --------
    dict
        Dictionary containing 'rho_estimate', 'intercept_estimate', 'residuals'
    """
    # YOUR CODE HERE
    # Steps:
    # 1. Create lagged earnings variable
    # 2. Drop first period (no lag available)
    # 3. Set up X matrix and y vector
    # 4. Solve normal equations: beta = (X'X)^(-1) X'y
    # 5. Calculate residuals
    pass

Key Steps: - Create a lagged earnings variable within each individual - Remove observations where lag is not available - Build the design matrix X with intercept and lagged earnings - Use matrix algebra to solve the normal equations

Step 4: Library-Based Estimation

Your Task: Implement using a statistical library:

def estimate_rho_library(df):
    """
    Estimate rho using a statistical library (e.g., statsmodels or sklearn).

    Parameters:
    -----------
    df : pd.DataFrame
        DataFrame with earnings data

    Returns:
    --------
    dict
        Dictionary containing estimation results including standard errors
    """
    # YOUR CODE HERE
    pass

Step 5: Apply Estimators to Simulated Data

Your Task: Test your estimators:

# Generate test data
np.random.seed(42)
true_params = {'rho': 0.7, 'sigma_shock': 0.3, 'sigma_intercept': 0.5}
sim_data = simulate_ar1_earnings(
    n_individuals=1000, 
    n_periods=10, 
    **true_params
)

# Apply both estimators
manual_results = estimate_rho_manual(sim_data)
library_results = estimate_rho_library(sim_data)

# Compare results
print(f"True rho: {true_params['rho']}")
print(f"Manual estimate: {manual_results['rho_estimate']:.4f}")
print(f"Library estimate: {library_results['rho_estimate']:.4f}")

Step 6: Apply to Real Data

Your Task: Load your real earnings data and apply the same estimation procedure:

def analyze_real_data(data_path):
    """
    Apply AR(1) estimation to real earnings data.

    Parameters:
    -----------
    data_path : str
        Path to real earnings data

    Returns:
    --------
    dict
        Estimation results for real data
    """
    # YOUR CODE HERE
    pass

Step 7: Variance/Autocovariance Analysis

Your Task: Compare model fit by examining second moments:

def compare_moments(real_data, simulated_data, max_lag=5):
    """
    Compare variance and autocovariances between real and simulated data.

    Parameters:
    -----------
    real_data : pd.DataFrame
        Real earnings data
    simulated_data : pd.DataFrame  
        Simulated earnings data
    max_lag : int
        Maximum lag for autocovariance calculation

    Returns:
    --------
    pd.DataFrame
        Comparison of moments
    """
    # YOUR CODE HERE
    # Calculate:
    # - Variance of earnings growth
    # - Autocovariances at different lags
    # - Create comparison table
    pass

Step 8: Monte Carlo Analysis

Finally, we'll assess the finite-sample properties of our estimator through simulation.

Your Task: Implement the Monte Carlo study:

def monte_carlo_study(true_rho, n_simulations=200, n_individuals=500, n_periods=8, 
                     sigma_shock=0.3, sigma_intercept=0.5):
    """
    Conduct Monte Carlo study of rho estimation.

    Parameters:
    -----------
    true_rho : float
        True value of autoregressive parameter
    n_simulations : int
        Number of Monte Carlo replications
    Other parameters as in simulate_ar1_earnings

    Returns:
    --------
    dict
        Dictionary with results including estimates array and summary statistics
    """
    # YOUR CODE HERE
    # For each simulation:
    # 1. Generate new data
    # 2. Estimate rho
    # 3. Store estimate
    # Calculate bias, standard error, coverage, etc.
    pass

def plot_monte_carlo_results(mc_results, true_rho):
    """
    Plot distribution of Monte Carlo estimates.

    Parameters:
    -----------
    mc_results : dict
        Results from monte_carlo_study
    true_rho : float
        True parameter value
    """
    # YOUR CODE HERE
    # Create histogram of estimates
    # Add vertical line for true value
    # Add summary statistics as text
    pass

Part B: Measurement Error and Non-Normal Distributions

Now we'll extend our analysis to include measurement error and explore non-normal distributions with higher kurtosis.

Step 9: Adding Measurement Error

Your Task: Modify the simulation to include measurement error:

def simulate_ar1_with_measurement_error(n_individuals, n_periods, rho, sigma_shock, 
                                      sigma_intercept, sigma_measurement, seed=None):
    """
    Simulate earnings data with measurement error.
    True model: y*_it = alpha_i + rho * y*_{i,t-1} + epsilon_it
    Observed: y_it = y*_it + u_it, where u_it ~ N(0, sigma_measurement^2)

    Parameters:
    -----------
    sigma_measurement : float
        Standard deviation of measurement error
    Other parameters same as simulate_ar1_earnings

    Returns:
    --------
    pd.DataFrame
        DataFrame with columns: ['individual_id', 'period', 'earnings_true', 'earnings_observed', 
                                'shock', 'measurement_error', 'intercept']
    """
    # YOUR CODE HERE
    pass

Step 10: Compare Autocovariances with Measurement Error

Your Task: Analyze how measurement error affects the autocovariance structure:

def compare_autocovariances_measurement_error(sigma_measurement_values, true_rho=0.7, 
                                            n_individuals=1000, n_periods=10):
    """
    Compare autocovariances for different levels of measurement error.

    Parameters:
    -----------
    sigma_measurement_values : list
        List of measurement error standard deviations to test
    Other parameters for simulation

    Returns:
    --------
    pd.DataFrame
        Table comparing autocovariances across measurement error levels
    """
    # YOUR CODE HERE
    pass

Step 11: Estimation Bias from Measurement Error

Your Task: Run Monte Carlo to show estimation bias:

def monte_carlo_measurement_error(true_rho, sigma_measurement_values, n_simulations=200):
    """
    Monte Carlo study showing bias from measurement error.

    Parameters:
    -----------
    true_rho : float
        True autoregressive parameter
    sigma_measurement_values : list
        Different levels of measurement error to test
    n_simulations : int
        Number of Monte Carlo replications

    Returns:
    --------
    dict
        Results for each measurement error level
    """
    # YOUR CODE HERE
    pass

def plot_measurement_error_bias(mc_results, true_rho):
    """
    Plot showing bias as function of measurement error.

    Parameters:
    -----------
    mc_results : dict
        Results from monte_carlo_measurement_error
    true_rho : float
        True parameter value
    """
    # YOUR CODE HERE
    # Create subplot with:
    # 1. Histogram of estimates for each measurement error level
    # 2. Bias vs measurement error plot
    pass

Step 12: Non-Normal Distributions with Kurtosis

Your Task: Implement simulation with heavy-tailed shocks:

def simulate_ar1_heavy_tails(n_individuals, n_periods, rho, sigma_shock, sigma_intercept, 
                           distribution='t', df_param=3, seed=None):
    """
    Simulate AR(1) earnings with non-normal shock distributions.

    Parameters:
    -----------
    distribution : str
        Type of distribution ('normal', 't', 'laplace')
    df_param : float
        Degrees of freedom parameter for t-distribution
    Other parameters same as before

    Returns:
    --------
    pd.DataFrame
        Simulated data with specified shock distribution
    """
    # YOUR CODE HERE
    pass

def compare_distributions_monte_carlo(true_rho, distributions_to_test, n_simulations=200):
    """
    Compare estimator performance across different shock distributions.

    Parameters:
    -----------
    true_rho : float
        True autoregressive parameter  
    distributions_to_test : list
        List of distribution specifications to test
    n_simulations : int
        Number of Monte Carlo replications

    Returns:
    --------
    dict
        Results comparing estimator performance across distributions
    """
    # YOUR CODE HERE
    pass

Exercise Workflow

Start with simulation: Get your simulate_ar1_earnings function working with small test cases
Manual estimation: Implement the regression by hand, verify with simple examples
Library estimation: Compare your manual results with library functions
Validation: Test both estimators on data with known parameters
Real data analysis: Apply to actual earnings data
Moment comparison: Check how well your model matches data properties
Monte Carlo: Assess estimator properties through repeated sampling
Measurement error: Understand how measurement error biases estimates
Heavy tails: Explore robustness to distributional assumptions

Tips for Success

Start with small datasets to debug your code
Print intermediate results to verify calculations
Use assertions to check your data transformations
Plot your simulated data to ensure it looks reasonable
Compare manual and library estimates as a sanity check

Questions for Reflection

How does the bias of your estimator change with sample size?
What happens to estimation accuracy as ρ approaches 1?
How sensitive are your results to the choice of σ_shock and σ_intercept?
Do the second moments of your simulated data match the real data well?
How does measurement error affect the autocovariance structure?
Why does measurement error create downward bias in ρ estimation?
How robust is the estimator to heavy-tailed shock distributions?

Getting Help

If you get stuck on any function implementation, you can ask an AI assistant for help with the specific function signature. Focus on understanding the logic rather than just copying code.

Remember: The goal is to understand the relationship between theory, simulation, and estimation in earnings dynamics modeling!