CRE example
[1]:
# Add PyTwoWay to system path (do not run this)
# import sys
# sys.path.append('../../..')
Import the PyTwoWay package
Make sure to install it using pip install pytwoway.
[2]:
import pytwoway as tw
import bipartitepandas as bpd
First, check out parameter options
Do this by running:
CRE: tw.cre_params().describe_all()
Clustering: bpd.cluster_params().describe_all()
Cleaning: bpd.clean_params().describe_all()
Simulating: bpd.sim_params().describe_all()
Alternatively, run x_params().keys() to view all the keys of a parameter dictionary, then run x_params().describe(key) to get a description of a single key (here x_params stands for any of the parameter constructors above).
Second, set parameter choices
Note
We set copy=False in clean_params to avoid unnecessary copies (although this may modify the original dataframe).
[3]:
# CRE
cre_params = tw.cre_params()
## Clustering ##
# Use firm-level cdfs of income as our measure
measures = bpd.measures.CDFs()
# Group using k-means
grouping = bpd.grouping.KMeans()
# General clustering
cluster_params = bpd.cluster_params(
{
'measures': measures,
'grouping': grouping
}
)
# Cleaning
clean_params = bpd.clean_params(
{
'connectedness': 'leave_out_spell',
'collapse_at_connectedness_measure': True,
'drop_single_stayers': True,
'drop_returns': 'returners',
'copy': False
}
)
# Simulating
sim_params = bpd.sim_params(
{
'n_workers': 1000,
'firm_size': 5,
'alpha_sig': 2, 'w_sig': 2,
'c_sort': 1.5, 'c_netw': 1.5,
'p_move': 0.1
}
)
Third, extract data (we simulate for the example)
BipartitePandas contains the class SimBipartite, which we use here to simulate a bipartite network. If you have your own data, import it during this step instead: load it as a pandas DataFrame, then convert it into a BipartitePandas DataFrame in the next step.
[4]:
sim_data = bpd.SimBipartite(sim_params).simulate()
Fourth, prepare data
This is exactly how you should prepare real data prior to running the CRE estimator.
First, we convert the data into a BipartitePandas DataFrame.
Second, we clean the data (e.g. drop NaN observations, make sure firm and worker ids are contiguous, construct the leave-one-out connected set, etc.). Because we set collapse_at_connectedness_measure=True, this also collapses the data at the worker-firm spell level, taking the mean wage over each spell.
Third, we cluster firms by their wage distributions to generate firm classes (columns g1 and g2). Alternatively, manually set the columns g1 and g2 to pre-estimated clusters (but make sure to add them correctly!).
Fourth, we convert the data into cross-section format (by way of event study format).
Further details on BipartitePandas can be found in the package documentation.
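The clustering step configured earlier (bpd.measures.CDFs() together with bpd.grouping.KMeans()) describes each firm by its wage distribution and then groups similar firms with k-means. The standard-library sketch below illustrates that idea under stated assumptions. It evaluates each firm's empirical wage CDF on a shared grid of cut points taken from quantiles of the pooled wages, then runs plain Lloyd's k-means on the resulting vectors, seeding the centroids with the first k firms. The grid construction, the initialization, and the helper names are assumptions made for illustration; BipartitePandas's own measures and grouping may differ in the details.

```python
from bisect import bisect_right
from statistics import quantiles


def firm_cdfs(wages_by_firm, resolution=4):
    """Evaluate each firm's empirical wage CDF on a common grid of cut points.

    Assumption for this sketch: the grid comes from quantiles of the pooled
    wage distribution, giving resolution - 1 interior cut points.
    """
    pooled = sorted(w for wages in wages_by_firm.values() for w in wages)
    grid = quantiles(pooled, n=resolution)
    cdfs = {}
    for firm, wages in wages_by_firm.items():
        wages = sorted(wages)
        # share of this firm's wages at or below each cut point
        cdfs[firm] = [bisect_right(wages, c) / len(wages) for c in grid]
    return cdfs


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


wages_by_firm = {
    "A": [1.0, 1.2, 1.1],
    "B": [1.05, 0.9, 1.15],
    "C": [3.0, 3.2, 2.9],
    "D": [3.1, 2.8, 3.3],
}
cdfs = firm_cdfs(wages_by_firm)
firms = list(cdfs)
labels = kmeans([cdfs[f] for f in firms], k=2)
# low-wage firms A and B share one label, high-wage firms C and D the other
print(dict(zip(firms, labels)))
```

Comparing CDFs rather than a single mean wage lets firms with similar averages but different wage dispersion end up in different classes.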
Note
Leave-one-out connectedness is not preserved when data is collapsed at the spell/match level. Therefore, if you set collapse_at_connectedness_measure=False, you must first clean the data WITHOUT taking the leave-one-out set, then collapse it at the spell/match level, and only then compute the largest leave-one-out connected set.
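For intuition about what a leave-one-out connected set is, consider a simplified view. Treat firms as nodes and each worker move between two firms as an edge. A leave-one-out connected set is then a connected set of firms that stays connected when any single edge is removed. One way to find the largest such set is to delete every bridge (an edge whose removal disconnects the graph) and keep the largest remaining component. The sketch below implements that simplified version with an iterative bridge search. The graph construction, the edge definition, and the function name are assumptions for illustration only; this is not necessarily how BipartitePandas computes its leave-out sets.

```python
from collections import defaultdict


def largest_leave_one_out_set(moves):
    """Return the largest set of firms that stays connected after removing any one move.

    `moves` is a list of (firm_a, firm_b) pairs; each pair is one edge, so
    repeated pairs form parallel edges (and are therefore never bridges).
    """
    adj = defaultdict(list)  # firm -> list of (neighbor, edge_id)
    for eid, (a, b) in enumerate(moves):
        adj[a].append((b, eid))
        adj[b].append((a, eid))

    # Iterative DFS computing discovery times and low-links to find bridges
    disc, low, bridges, timer = {}, {}, set(), 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        stack = [(root, -1, iter(adj[root]))]  # (node, edge used to reach it, neighbor iterator)
        while stack:
            node, parent_edge, neighbors = stack[-1]
            advanced = False
            for nbr, eid in neighbors:
                if eid == parent_edge:
                    continue  # skip only the edge we came in on, not parallel edges
                if nbr in disc:
                    low[node] = min(low[node], disc[nbr])
                else:
                    disc[nbr] = low[nbr] = timer
                    timer += 1
                    stack.append((nbr, eid, iter(adj[nbr])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                if stack:
                    parent = stack[-1][0]
                    low[parent] = min(low[parent], low[node])
                    if low[node] > disc[parent]:
                        bridges.add(parent_edge)  # no back edge spans this edge

    # Drop bridges, then keep the largest connected component of what remains
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, todo = set(), [start]
        seen.add(start)
        while todo:
            node = todo.pop()
            component.add(node)
            for nbr, eid in adj[node]:
                if eid not in bridges and nbr not in seen:
                    seen.add(nbr)
                    todo.append(nbr)
        if len(component) > len(best):
            best = component
    return best


# A triangle of firms {1, 2, 3} joined by a single move to a cycle {4, 5, 6, 7}
moves = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 7), (7, 4)]
print(largest_leave_one_out_set(moves))
```

The single move between firms 3 and 4 is a bridge. Dropping it leaves two components, and the larger one, {4, 5, 6, 7}, is returned.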
[5]:
# Convert into BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(sim_data)
# Clean and collapse
bdf = bdf.clean(clean_params)
# Cluster
bdf = bdf.cluster(cluster_params)
# Convert to cross-section format
bdf_cs = bdf.to_eventstudy(is_sorted=True, copy=False).get_cs(copy=False)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how='returners')
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how=None)
sorting columns
resetting index
checking required columns and datatypes
sorting rows
generating 'm' column
computing largest connected set (how='leave_out_observation')
making 'i' ids contiguous
making 'j' ids contiguous
sorting columns
resetting index
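Two of the cleaning steps reported in the log above can be sketched in plain Python: dropping workers who leave a firm and later return to it (how='returners'), and making the 'i' and 'j' ids contiguous. The sketch assumes long-format rows of (i, j, t, y) sorted by worker and then period. The function names are hypothetical, and assigning new ids in order of first appearance is a choice made for this illustration rather than a description of the BipartitePandas code.

```python
from itertools import groupby


def drop_returners(rows):
    """Drop every row of any worker who leaves a firm and later returns to it.

    Assumes rows are (i, j, t, y) tuples sorted by worker i, then period t.
    """
    kept = []
    for _, worker_rows in groupby(rows, key=lambda r: r[0]):
        worker_rows = list(worker_rows)
        # collapse consecutive rows at the same firm into one entry per firm visit
        spells = [j for j, _ in groupby(r[1] for r in worker_rows)]
        # a returner visits some firm more than once, so spells contain duplicates
        if len(spells) == len(set(spells)):
            kept.extend(worker_rows)
    return kept


def make_contiguous(rows, col):
    """Relabel column `col` so its ids run 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    out = []
    for row in rows:
        new_id = mapping.setdefault(row[col], len(mapping))
        out.append(row[:col] + (new_id,) + row[col + 1:])
    return out


rows = [
    (10, 7, 1, 1.0), (10, 9, 2, 1.5), (10, 7, 3, 1.2),  # worker 10 returns to firm 7
    (42, 9, 1, 2.0), (42, 9, 2, 2.1), (42, 3, 3, 2.4),
]
rows = drop_returners(rows)
rows = make_contiguous(make_contiguous(rows, 0), 1)  # relabel 'i', then 'j'
print(rows)  # [(0, 0, 1, 2.0), (0, 0, 2, 2.1), (0, 1, 3, 2.4)]
```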
Fifth, initialize and run the estimator
[6]:
# Initialize CRE estimator
cre_estimator = tw.CREEstimator(bdf_cs, cre_params)
# Fit CRE estimator
cre_estimator.fit()
Finally, investigate the results
[7]:
cre_estimator.summary
[7]:
{'var_y': 6.700106866872871,
'var_bw': 1.5696696168532582,
'cov_bw': 1.1955111878642923,
'var_tot': 1.442983307938817,
'cov_tot': 1.1647823062712872}