Card-Heining-Kline event study example

[1]:
# Add PyTwoWay to system path (do not run this)
# import sys
# sys.path.append('../../..')

Import the PyTwoWay package

Make sure to install it using pip install pytwoway.

[2]:
import pytwoway as tw
import bipartitepandas as bpd

Get your data ready

For this notebook, we simulate data.

[3]:
df = bpd.SimBipartite().simulate()
display(df)
i j y t l k alpha psi
0 0 81 0.283938 0 2 4 0.000000 -0.114185
1 0 81 -1.613504 1 2 4 0.000000 -0.114185
2 0 175 1.057488 2 2 8 0.000000 0.908458
3 0 45 0.894313 3 2 2 0.000000 -0.604585
4 0 90 1.122351 4 2 4 0.000000 -0.114185
... ... ... ... ... ... ... ... ...
49995 9999 2 -2.978308 0 0 0 -0.967422 -1.335178
49996 9999 2 -2.532244 1 0 0 -0.967422 -1.335178
49997 9999 14 -2.428863 2 0 0 -0.967422 -1.335178
49998 9999 14 -0.928106 3 0 0 -0.967422 -1.335178
49999 9999 21 -3.030404 4 0 1 -0.967422 -0.908458

50000 rows × 8 columns

Prepare data

This is exactly how you should prepare real data prior to generating the CHK plot.

  • First, we convert the data into a BipartitePandas DataFrame

  • Second, we clean the data (e.g. drop NaN observations, make sure firm and worker ids are contiguous, etc.)

  • Third, we cluster firms by the quartile of their mean income, to generate firm classes (columns g1 and g2). Alternatively, manually set the columns g1 and g2 to pre-estimated clusters (but make sure to add them correctly!).

  • Fourth, we convert the data into extended event study format

Further details on BipartitePandas can be found in the package documentation, available here.

Note

In general, the CHK event study is generated by clustering the data using quartiles of firm-level mean income.

[4]:
measures = bpd.measures.Moments(measures='mean')
grouping = bpd.grouping.Quantiles(n_quantiles=4)
cluster_params = bpd.cluster_params(
    {
        'measures': measures,
        'grouping': grouping
    }
)

bdf = bpd.BipartiteDataFrame(
    i=df['i'], j=df['j'], y=df['y'], t=df['t']
) \
    .clean() \
    .cluster(cluster_params) \
    .to_extendedeventstudy(transition_col='j', periods_pre=2, periods_post=2)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how=False)
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how=None)
sorting columns
resetting index

Creating CHK event study plot

Once the data is cleaned, clustered, and in extended event study form, we can generate the event study plots.

[5]:
tw.diagnostics.plot_extendedeventstudy(bdf, periods_pre=2, periods_post=2)
../_images/notebooks_chk_example_9_0.png

Warning

Be careful not to include too many clusters!

[6]:
measures = bpd.measures.Moments(measures='mean')
grouping = bpd.grouping.Quantiles(n_quantiles=7)
cluster_params = bpd.cluster_params(
    {
        'measures': measures,
        'grouping': grouping
    }
)

bdf = bpd.BipartiteDataFrame(
    i=df['i'], j=df['j'], y=df['y'], t=df['t']
) \
    .clean() \
    .cluster(cluster_params) \
    .to_extendedeventstudy(transition_col='j', periods_pre=2, periods_post=2)

tw.diagnostics.plot_extendedeventstudy(bdf, periods_pre=2, periods_post=2)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how=False)
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how=None)
sorting columns
resetting index
../_images/notebooks_chk_example_11_1.png