Sorkin example

[1]:

# Add PyTwoWay to system path (do not run this)
# import sys
# sys.path.append('../../..')

Import the PyTwoWay package

Make sure to install it using pip install pytwoway.

[2]:

from pandas import Series
import pytwoway as tw
import bipartitepandas as bpd

First, check out parameter options

Do this by running:

Cleaning - bpd.clean_params().describe_all()
Simulating - bpd.sim_params().describe_all()

Alternatively, run x_params().keys() (where x_params stands for clean_params or sim_params) to view all the keys for a parameter dictionary, then x_params().describe(key) to get a description of a single key.

Second, set parameter choices

Note

The Sorkin estimator requires a strongly connected set of firms, so we set connectedness='strongly_connected' in clean_params.
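
To see what this requirement means, treat firms as the nodes of a directed graph, with an edge from firm j to firm k whenever some worker moves from j to k. A set of firms is strongly connected when every firm can reach every other firm along these directed edges. The sketch below is a minimal pure-Python illustration (Kosaraju's algorithm) of finding the largest strongly connected set from a list of moves. The function name largest_strongly_connected and the list-of-moves input are hypothetical, and this is not how BipartitePandas computes the set internally.

```python
from collections import defaultdict


def largest_strongly_connected(moves):
    """Return the largest strongly connected set of firms.

    `moves` is an iterable of (origin_firm, destination_firm) pairs,
    one per worker move (hypothetical input format for this sketch).
    """
    graph = defaultdict(set)    # firm -> firms that workers move to
    reverse = defaultdict(set)  # firm -> firms that workers arrive from
    firms = set()
    for j, k in moves:
        graph[j].add(k)
        reverse[k].add(j)
        firms.update((j, k))

    # Pass 1: iterative depth-first search on the forward graph,
    # recording the order in which firms finish.
    order, visited = [], set()
    for start in firms:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in visited:
                    visited.add(child)
                    stack.append((child, iter(graph[child])))
                    break
            else:
                # Every child has been explored, so this firm is finished.
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in reverse finishing order;
    # each search tree is one strongly connected component.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for parent in reverse[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    stack.append(parent)
        components.append(component)

    return max(components, key=len) if components else set()


print(largest_strongly_connected([("A", "B"), ("B", "A"), ("B", "C")]))
```

Here firms A and B exchange workers in both directions, while no worker ever leaves firm C, so C cannot reach A or B and falls outside the strongly connected set, which is {"A", "B"}.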

Note

We set copy=False in clean_params to avoid unnecessary copies (although this may modify the original DataFrame).

[3]:

# Cleaning
clean_params = bpd.clean_params(
    {
        'connectedness': 'strongly_connected',
        'drop_single_stayers': True,
        'drop_returns': 'returners',
        'copy': False
    }
)
# Simulating
sim_params = bpd.sim_params(
    {
        'n_workers': 1000,
        'firm_size': 5,
        'alpha_sig': 2, 'w_sig': 2,
        'c_sort': 1.5, 'c_netw': 1.5,
        'p_move': 0.1
    }
)
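
The 'drop_returns': 'returners' option drops workers who leave a firm and later return to it (the cleaning log further down reports this step as how='returners'). As a rough illustration only, the hypothetical helper is_returner below flags such workers from a time-ordered list of employers per worker; the input format is an assumption, and this is not BipartitePandas code.

```python
def is_returner(firm_history):
    """Return True if the worker leaves a firm and later comes back to it.

    `firm_history` lists the worker's employer in each period, in time
    order (hypothetical input format for this sketch).
    """
    seen = set()
    previous = None
    for firm in firm_history:
        if firm != previous:
            # A new spell starts here; a returner is a worker who starts
            # a spell at a firm they were already employed at earlier.
            if firm in seen:
                return True
            seen.add(firm)
        previous = firm
    return False


histories = {
    "w1": [1, 1, 2, 2],  # moves once: kept
    "w2": [1, 2, 1],     # leaves firm 1, then returns: dropped
    "w3": [3, 3, 3],     # stays at one firm: kept
}
kept = {w: h for w, h in histories.items() if not is_returner(h)}
print(sorted(kept))  # ['w1', 'w3']
```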

Third, extract data (we simulate for the example)

BipartitePandas contains the class SimBipartite, which we use here to simulate a bipartite network. If you have your own data, you can import it during this step: load it as a Pandas DataFrame, then convert it into a BipartitePandas DataFrame in the next step.

[4]:

sim_data = bpd.SimBipartite(sim_params).simulate()

Fourth, prepare data

This is exactly how you should prepare real data prior to running the Sorkin estimator.

First, we convert the data into a BipartitePandas DataFrame
Second, we clean the data (e.g. drop NaN observations, make sure firm and worker ids are contiguous, construct the strongly connected set, etc.)
Third, we collapse the data at the worker-firm spell level (take mean wage over the spell)
Fourth, we convert the data into event study format
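
The collapse step (the third item above) can be pictured with a small pure-Python sketch: consecutive observations of the same worker at the same firm form one spell, and the spell's wage is the mean over its observations. The function name collapse_spells and the (worker, firm, wage) tuple layout are assumptions made for illustration; they are not the BipartitePandas API.

```python
from itertools import groupby
from statistics import mean


def collapse_spells(rows):
    """Collapse (worker, firm, wage) rows into worker-firm spells.

    Rows must already be sorted by worker and then by period.
    Returns (worker, firm, mean_wage, n_periods) tuples.
    """
    spells = []
    # groupby merges only *consecutive* rows with the same key, so a
    # worker who later returned to a firm would start a new spell there.
    for (worker, firm), group in groupby(rows, key=lambda r: (r[0], r[1])):
        wages = [wage for _, _, wage in group]
        spells.append((worker, firm, mean(wages), len(wages)))
    return spells


rows = [
    (0, "A", 10.0), (0, "A", 12.0), (0, "B", 15.0),
    (1, "B", 8.0), (1, "B", 9.0),
]
spells = collapse_spells(rows)
# spells == [(0, "A", 11.0, 2), (0, "B", 15.0, 1), (1, "B", 8.5, 2)]
```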

Further details on BipartitePandas can be found in the package documentation.

[5]:

# Convert into BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(sim_data)
# Clean
bdf = bdf.clean(clean_params)
# Collapse
bdf = bdf.collapse(is_sorted=True, copy=False)
# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)

checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how='returners')
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how='strongly_connected')
making 'i' ids contiguous
making 'j' ids contiguous
sorting columns
resetting index
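
The last call in the cell above, to_eventstudy, reshapes the collapsed spells so that each row pairs a worker's spell with the one that follows it. Below is a minimal pure-Python sketch of that reshaping, assuming spells arrive as (worker, firm, wage) tuples sorted by worker and time. The field names i, j1, j2, y1, y2 are illustrative, and keeping single-spell workers as one row with identical halves is an assumption of this sketch rather than a description of the library's output.

```python
from itertools import groupby


def to_event_study(spells):
    """Pair each spell with the same worker's next spell.

    `spells` are (worker, firm, wage) tuples sorted by worker and time.
    Workers with a single spell are kept as one row whose "before" and
    "after" halves are identical (an assumption of this sketch).
    """
    events = []
    for worker, group in groupby(spells, key=lambda s: s[0]):
        worker_spells = list(group)
        if len(worker_spells) == 1:
            _, firm, wage = worker_spells[0]
            events.append({"i": worker, "j1": firm, "j2": firm, "y1": wage, "y2": wage})
            continue
        # Consecutive pairs: (spell 1, spell 2), (spell 2, spell 3), ...
        for (_, j1, y1), (_, j2, y2) in zip(worker_spells, worker_spells[1:]):
            events.append({"i": worker, "j1": j1, "j2": j2, "y1": y1, "y2": y2})
    return events


events = to_event_study([(0, "A", 11.0), (0, "B", 15.0), (1, "B", 8.5)])
```

Rows with j1 != j2 are the firm-to-firm moves; these are the flows a revealed-preference estimator such as Sorkin's works with.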

Fifth, initialize and run the estimator

[6]:

# Initialize Sorkin estimator
sorkin_estimator = tw.SorkinEstimator()
# Fit Sorkin estimator
sorkin_estimator.fit(bdf)
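
For intuition about what fit computes, here is a minimal sketch of the revealed-preference fixed point from Sorkin (2018), "Ranking Firms Using Revealed Preference". Let M[j][k] count employer-to-employer moves from firm k to firm j. Firm values V satisfy exp(V_j) * (sum over k of M[k][j]) = sum over k of M[j][k] * exp(V_k): a firm's value times the number of workers who leave it equals its inflows weighted by the value of the firms those workers came from. The equation only pins V down up to an additive constant, and strong connectivity guarantees a unique positive solution up to that constant. This sketch solves it by damped power iteration in pure Python; the function name sorkin_values, the nested-list matrix layout and the normalization (largest value set to 0) are assumptions, and it is not PyTwoWay's implementation, whose normalization may differ.

```python
import math


def sorkin_values(M, tol=1e-12, max_iter=10_000):
    """Return log firm values V, normalized so that max(V) == 0.

    M[j][k] = number of employer-to-employer moves from firm k to firm j.
    Assumes the firms form a strongly connected set, so every firm has at
    least one exit and the positive solution is unique up to scale.
    """
    n = len(M)
    # Exits from firm j: all moves whose origin is j (column sums of M).
    exits = [sum(M[k][j] for k in range(n)) for j in range(n)]
    v = [1.0] * n  # current guess for exp(V)
    for _ in range(max_iter):
        # One step of x -> S^{-1} M x, where S = diag(exits) ...
        step = [sum(M[j][k] * v[k] for k in range(n)) / exits[j] for j in range(n)]
        # ... averaged with x: same positive eigenvector, but the iteration
        # cannot oscillate when moves follow a cycle (periodic matrix).
        new = [(a + b) / 2 for a, b in zip(v, step)]
        scale = max(new)
        new = [x / scale for x in new]
        converged = max(abs(a - b) for a, b in zip(new, v)) < tol
        v = new
        if converged:
            break
    return [math.log(x) for x in v]


# Two moves from firm 1 to firm 0, one move from firm 0 to firm 1.
V = sorkin_values([[0, 2], [1, 0]])
```

Twice as many workers move from firm 1 to firm 0 as the reverse, so the sketch returns V[0] - V[1] = log 2: workers reveal that they prefer firm 0.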

Finally, investigate the results

Estimated firm values are stored in the fitted estimator's attribute .V_EE.

[7]:

display(Series(sorkin_estimator.V_EE))

0     -5.642380
1     -6.047845
2     -4.543767
3     -5.642380
4     -4.949233
         ...
102   -5.354698
103   -5.642380
104   -6.335527
105   -5.354698
106   -4.949233
Length: 107, dtype: float64
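
Because firm values are only identified up to an additive constant, differences between firms are what carry meaning: a difference of d in V_EE corresponds to a ratio of exp(d) in revealed-preference value. The short pure-Python sketch below ranks firms and computes one such ratio, using the first five values displayed above (firms 0 to 4) copied by hand; in practice you would work with sorkin_estimator.V_EE directly.

```python
import math

# First five estimates displayed above (firms 0-4), copied by hand.
v_ee = [-5.642380, -6.047845, -4.543767, -5.642380, -4.949233]

# Rank firms from most to least preferred; Python's sort is stable,
# so tied firms keep their original order.
ranking = sorted(range(len(v_ee)), key=lambda j: v_ee[j], reverse=True)

# Ratio of revealed-preference values between firm 2 and firm 0.
ratio_2_vs_0 = math.exp(v_ee[2] - v_ee[0])
print(ranking, round(ratio_2_vs_0, 3))  # [2, 4, 0, 3, 1] 3.0
```

Firm 2 ranks highest, with exp(V_EE) about three times that of firm 0, and firms 0 and 3 are tied.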