Sorkin example

[1]:
# Add PyTwoWay to system path (do not run this)
# import sys
# sys.path.append('../../..')

Import the PyTwoWay package

Make sure to install it using pip install pytwoway.

[2]:
from pandas import Series
import pytwoway as tw
import bipartitepandas as bpd

First, check out parameter options

Do this by running:

  • Cleaning - bpd.clean_params().describe_all()

  • Simulating - bpd.sim_params().describe_all()

Alternatively, run x_params().keys() (where x is clean or sim) to view all the keys for a parameter dictionary, then x_params().describe(key) to get a description for a single key.

Second, set parameter choices

Note

The Sorkin estimator requires a strongly connected set of firms, so we set connectedness='strongly_connected' in clean_params.
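To make "strongly connected" concrete: a set of firms is strongly connected if, following worker moves in their actual direction, every firm in the set can reach every other firm. The sketch below is a plain-Python illustration of that definition on made-up moves; it is not how BipartitePandas computes the set, and the function name largest_strongly_connected_firms is hypothetical.

```python
from collections import defaultdict


def largest_strongly_connected_firms(moves):
    """Return the largest set of firms in which every firm can reach every
    other firm by following directed worker moves (origin -> destination)."""
    graph = defaultdict(set)    # origin -> destinations
    reverse = defaultdict(set)  # destination -> origins
    firms = set()
    for origin, dest in moves:
        graph[origin].add(dest)
        reverse[dest].add(origin)
        firms.update((origin, dest))

    def reachable(start, edges):
        # Depth-first search over the given edge map
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    best = set()
    unassigned = set(firms)
    while unassigned:
        firm = next(iter(unassigned))
        # Firms that `firm` can reach AND that can reach `firm`
        component = reachable(firm, graph) & reachable(firm, reverse)
        unassigned -= component
        if len(component) > len(best):
            best = component
    return best


# Made-up moves: firms 0, 1, 2 form a cycle; firm 3 only receives workers
example_moves = [(0, 1), (1, 2), (2, 0), (2, 3)]
largest = largest_strongly_connected_firms(example_moves)
```

Here firm 3 is dropped because no worker ever leaves it, so it cannot reach the other firms. A merely (weakly) connected set would have kept it.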

Note

We set copy=False in clean_params to avoid unnecessary copies (although this may modify the original dataframe).

[3]:
# Cleaning
clean_params = bpd.clean_params(
    {
        'connectedness': 'strongly_connected',
        'drop_single_stayers': True,
        'drop_returns': 'returners',
        'copy': False
    }
)
# Simulating
sim_params = bpd.sim_params(
    {
        'n_workers': 1000,
        'firm_size': 5,
        'alpha_sig': 2, 'w_sig': 2,
        'c_sort': 1.5, 'c_netw': 1.5,
        'p_move': 0.1
    }
)

Third, extract data (we simulate for the example)

BipartitePandas contains the class SimBipartite, which we use here to simulate a bipartite network. If you have your own data, you can import it during this step. Load it as a Pandas DataFrame and then convert it into a BipartitePandas DataFrame in the next step.

[4]:
sim_data = bpd.SimBipartite(sim_params).simulate()

Fourth, prepare data

This is exactly how you should prepare real data prior to running the Sorkin estimator.

  • First, we convert the data into a BipartitePandas DataFrame

  • Second, we clean the data (e.g. drop NaN observations, make sure firm and worker ids are contiguous, construct the strongly connected set, etc.)

  • Third, we collapse the data at the worker-firm spell level (take mean wage over the spell)

  • Fourth, we convert the data into event study format

Further details on BipartitePandas can be found in the package documentation.

[5]:
# Convert into BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(sim_data)
# Clean
bdf = bdf.clean(clean_params)
# Collapse
bdf = bdf.collapse(is_sorted=True, copy=False)
# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how='returners')
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how='strongly_connected')
making 'i' ids contiguous
making 'j' ids contiguous
sorting columns
resetting index
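As an illustration of what collapsing at the worker-firm spell level means, the sketch below groups consecutive observations of the same worker at the same firm and takes the mean wage over each spell. It uses only the standard library and made-up records; the column names i, j, and y follow the BipartitePandas convention, while t_start, t_end, and the function collapse_spells are hypothetical and not part of the package.

```python
from itertools import groupby
from statistics import mean

# Made-up (worker, firm, year, wage) records, sorted by worker and then year
records = [
    (0, 10, 2000, 1.0),
    (0, 10, 2001, 3.0),
    (0, 11, 2002, 4.0),
    (1, 11, 2000, 2.0),
    (1, 11, 2001, 2.5),
]


def collapse_spells(rows):
    """Collapse consecutive rows for the same (worker, firm) pair into one
    spell whose wage is the mean wage over the spell.

    Assumes rows are sorted by worker and year, and that workers never
    return to a previous firm (as after cleaning with 'drop_returns')."""
    spells = []
    for (worker, firm), group in groupby(rows, key=lambda r: (r[0], r[1])):
        group = list(group)
        spells.append({
            'i': worker,
            'j': firm,
            't_start': group[0][2],   # first year of the spell
            't_end': group[-1][2],    # last year of the spell
            'y': mean(r[3] for r in group),
        })
    return spells


spells = collapse_spells(records)
```

Worker 0 ends up with two spells (firm 10, then firm 11) and worker 1 with a single spell at firm 11.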

Fifth, initialize and run the estimator

[6]:
# Initialize Sorkin estimator
sorkin_estimator = tw.SorkinEstimator()
# Fit Sorkin estimator
sorkin_estimator.fit(bdf)

Finally, investigate the results

Estimated firm values are stored in the estimator's attribute .V_EE.

[7]:
display(Series(sorkin_estimator.V_EE))
0     -5.642380
1     -6.047845
2     -4.543767
3     -5.642380
4     -4.949233
         ...
102   -5.354698
103   -5.642380
104   -6.335527
105   -5.354698
106   -4.949233
Length: 107, dtype: float64
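For intuition about what firm values measure, the sketch below solves a revealed-preference fixed point on a made-up three-firm flow matrix. It assumes the form used in Sorkin (2018), where a firm's exponentiated value times its number of exits equals the value-weighted sum of its hires. This is an illustration under that assumption, not the PyTwoWay implementation; the function firm_values, the mean-zero normalization, and the iteration scheme are all choices made for this example.

```python
import math


def firm_values(flows, n_iter=1000):
    """Solve exp(V_j) * exits_j = sum_k flows[j][k] * exp(V_k) for V.

    flows[j][k] is the number of workers who moved from firm k to firm j.
    Assumes the firms are strongly connected, so every firm has at least
    one exit and the solution is unique up to an additive constant.
    Returns log values normalized to have mean zero."""
    n = len(flows)
    # Number of workers leaving each firm k (column sums)
    exits = [sum(flows[j][k] for j in range(n)) for k in range(n)]
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        # Lazy power iteration on the column-stochastic matrix
        # P[j][k] = flows[j][k] / exits[k]; the 0.5 weights avoid cycling
        new = [
            0.5 * pi[j]
            + 0.5 * sum(flows[j][k] / exits[k] * pi[k] for k in range(n))
            for j in range(n)
        ]
        total = sum(new)
        pi = [x / total for x in new]
    # The stationary vector pi satisfies P pi = pi, so exp(V_k) = pi_k / exits_k
    log_values = [math.log(pi[k] / exits[k]) for k in range(n)]
    mean_value = sum(log_values) / n
    return [v - mean_value for v in log_values]


# Made-up flows: firm 0 hires many workers from firms 1 and 2 and loses few
example_flows = [
    [0, 3, 2],
    [1, 0, 1],
    [1, 1, 0],
]
values = firm_values(example_flows)
```

In this example firm 0 receives the highest value because workers tend to move toward it. The values are only identified up to an additive constant, which is one reason to compare differences across firms rather than levels.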