Sorkin example
[1]:
# Add PyTwoWay to system path (do not run this)
# import sys
# sys.path.append('../../..')
Import the PyTwoWay package
Make sure to install it using pip install pytwoway.
[2]:
from pandas import Series
import pytwoway as tw
import bipartitepandas as bpd
First, check out parameter options
Do this by running:
Cleaning - bpd.clean_params().describe_all()
Simulating - bpd.sim_params().describe_all()
Alternatively, run x_params().keys() to view all the keys for a parameter dictionary (where x_params stands for clean_params or sim_params), then x_params().describe(key) to get a description for a single key.
Second, set parameter choices
Note
The Sorkin estimator requires a strongly connected set of firms, so we set connectedness='strongly_connected' in clean_params.
Note
We set copy=False in clean_params to avoid unnecessary copies (although this may modify the original dataframe).
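To make the strongly connected requirement concrete, here is a minimal pure-Python sketch. It is not the BipartitePandas implementation. It assumes worker moves are given as (origin firm, destination firm) pairs, and the function names reachable and largest_strongly_connected_set are hypothetical. A set of firms is strongly connected when every firm can be reached from every other firm by following worker moves in their direction.

```python
from collections import defaultdict


def reachable(graph, start):
    """Return the set of nodes reachable from start along directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def largest_strongly_connected_set(moves):
    """Return the largest set of firms that are mutually reachable.

    moves is an iterable of (origin_firm, destination_firm) pairs
    (an illustrative input format, not the library's).
    """
    forward = defaultdict(set)   # origin -> destinations
    backward = defaultdict(set)  # destination -> origins
    firms = set()
    for origin, dest in moves:
        forward[origin].add(dest)
        backward[dest].add(origin)
        firms.update((origin, dest))

    best = set()
    assigned = set()
    for firm in sorted(firms):
        if firm in assigned:
            continue
        # Firms that this firm can reach AND that can reach it back
        component = reachable(forward, firm) & reachable(backward, firm)
        assigned |= component
        if len(component) > len(best):
            best = component
    return best


# Firms 0, 1 and 2 form a cycle of moves; firm 3 only receives workers
example_moves = [(0, 1), (1, 2), (2, 0), (2, 3)]
example_set = largest_strongly_connected_set(example_moves)
```

In this example firm 3 is excluded: workers move into it, but none move out, so it cannot be compared with the other firms in both directions.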
[3]:
# Cleaning
clean_params = bpd.clean_params(
    {
        'connectedness': 'strongly_connected',
        'drop_single_stayers': True,
        'drop_returns': 'returners',
        'copy': False
    }
)
# Simulating
sim_params = bpd.sim_params(
    {
        'n_workers': 1000,
        'firm_size': 5,
        'alpha_sig': 2, 'w_sig': 2,
        'c_sort': 1.5, 'c_netw': 1.5,
        'p_move': 0.1
    }
)
Third, extract data (we simulate for the example)
BipartitePandas contains the class SimBipartite, which we use here to simulate a bipartite network. If you have your own data, you can import it during this step. Load it as a Pandas DataFrame and then convert it into a BipartitePandas DataFrame in the next step.
[4]:
sim_data = bpd.SimBipartite(sim_params).simulate()
Fourth, prepare data
This is exactly how you should prepare real data prior to running the Sorkin estimator.
First, we convert the data into a BipartitePandas DataFrame
Second, we clean the data (e.g. drop NaN observations, make sure firm and worker ids are contiguous, construct the strongly connected set, etc.)
Third, we collapse the data at the worker-firm spell level (take mean wage over the spell)
Fourth, we convert the data into event study format
Further details on BipartitePandas can be found in the package documentation.
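To illustrate the collapse step described above, here is a minimal pure-Python sketch. It is not how BipartitePandas implements collapse(). It assumes each observation is a dict with worker id i, firm id j, wage y and period t; the function name collapse_spells and the output keys t1 and t2 are hypothetical. Consecutive observations of the same worker at the same firm are merged into one spell whose wage is the mean over the spell.

```python
from itertools import groupby
from statistics import mean


def collapse_spells(rows):
    """Collapse worker-period rows into worker-firm spells with mean wage."""
    # Sort by worker, then period, so each spell is a consecutive run
    rows = sorted(rows, key=lambda r: (r["i"], r["t"]))
    spells = []
    # groupby merges only consecutive rows sharing the same (worker, firm)
    for (i, j), group in groupby(rows, key=lambda r: (r["i"], r["j"])):
        group = list(group)
        spells.append(
            {
                "i": i,
                "j": j,
                "y": mean(r["y"] for r in group),  # mean wage over the spell
                "t1": group[0]["t"],               # first period of the spell
                "t2": group[-1]["t"],              # last period of the spell
            }
        )
    return spells


# Worker 0 spends periods 0-1 at firm 0, then moves to firm 1 in period 2
example_rows = [
    {"i": 0, "j": 0, "y": 1.0, "t": 0},
    {"i": 0, "j": 0, "y": 3.0, "t": 1},
    {"i": 0, "j": 1, "y": 5.0, "t": 2},
]
example_spells = collapse_spells(example_rows)
```

Here the three observations collapse into two spells: one at firm 0 with mean wage 2.0, and one at firm 1 with wage 5.0.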
[5]:
# Convert into BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(sim_data)
# Clean
bdf = bdf.clean(clean_params)
# Collapse
bdf = bdf.collapse(is_sorted=True, copy=False)
# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how='returners')
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how='strongly_connected')
making 'i' ids contiguous
making 'j' ids contiguous
sorting columns
resetting index
Fifth, initialize and run the estimator
[6]:
# Initialize Sorkin estimator
sorkin_estimator = tw.SorkinEstimator()
# Fit Sorkin estimator
sorkin_estimator.fit(bdf)
Finally, investigate the results
Estimated firm values are stored in the class attribute .V_EE.
[7]:
display(Series(sorkin_estimator.V_EE))
0 -5.642380
1 -6.047845
2 -4.543767
3 -5.642380
4 -4.949233
...
102 -5.354698
103 -5.642380
104 -6.335527
105 -5.354698
106 -4.949233
Length: 107, dtype: float64