Formats

Import the BipartitePandas package

Make sure to install it using pip install bipartitepandas.

[1]:
import bipartitepandas as bpd

Get your data ready

For this notebook, we simulate data.

[2]:
df = bpd.SimBipartite().simulate()
display(df)
          i    j          y    t    l    k     alpha        psi
0         0   31  -0.066354    0    2    1  0.000000  -0.908458
1         0   24  -1.319229    1    2    1  0.000000  -0.908458
2         0   24  -2.141775    2    2    1  0.000000  -0.908458
3         0   24  -1.831186    3    2    1  0.000000  -0.908458
4         0   24   0.086822    4    2    1  0.000000  -0.908458
...     ...  ...        ...  ...  ...  ...       ...        ...
49995  9999  188   2.877201    0    4    9  0.967422   1.335178
49996  9999  126   1.415829    1    4    6  0.967422   0.348756
49997  9999  126   0.772108    2    4    6  0.967422   0.348756
49998  9999  176   2.263156    3    4    8  0.967422   0.908458
49999  9999  176   1.796405    4    4    8  0.967422   0.908458

50000 rows × 8 columns

Columns

BipartitePandas includes seven pre-defined general columns:

Required

  • i: worker id (any type)

  • j: firm id (any type)

  • y: income (float or int)

Optional

  • t: time (int)

  • g: firm type (any type)

  • w: weight (float or int)

  • m: move indicator (int)
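The required/optional split above can be checked before constructing a dataframe. The sketch below is plain Python and does not call BipartitePandas; the helper name `check_general_columns` is hypothetical, and it only knows the seven general column names listed above (without any format-specific suffixes).

```python
# Required and optional general columns, as listed above.
REQUIRED_COLUMNS = ('i', 'j', 'y')
OPTIONAL_COLUMNS = ('t', 'g', 'w', 'm')


def check_general_columns(columns):
    """Sort column names into missing-required, optional-present, and other.

    Hypothetical helper for illustration; not part of BipartitePandas.
    """
    columns = list(columns)
    known = set(REQUIRED_COLUMNS) | set(OPTIONAL_COLUMNS)
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    optional = [c for c in OPTIONAL_COLUMNS if c in columns]
    other = [c for c in columns if c not in known]
    return missing, optional, other


# The simulated data above has columns i, j, y, t, l, k, alpha, psi.
missing, optional, other = check_general_columns(
    ['i', 'j', 'y', 't', 'l', 'k', 'alpha', 'psi']
)
print(missing)   # required columns that are absent
print(optional)  # optional general columns that are present
print(other)     # columns that are not general columns
```

With a pandas dataframe, `df.columns` can be passed in directly, since iterating over it yields the column names.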

Formats

BipartitePandas includes six formats:

  • Long - each row gives a single observation

  • Collapsed Long - like Long, but employment spells at the same firm, or entire worker-firm matches, are collapsed into a single observation (these will differ if there are workers who leave, then return to, a particular firm)

  • Event Study - each row gives two consecutive observations

  • Collapsed Event Study - like Event Study, but employment spells at the same firm, or entire worker-firm matches, are collapsed into a single observation (these will differ if there are workers who leave, then return to, a particular firm)

  • Extended Event Study - each row gives arbitrarily many consecutive observations

  • Collapsed Extended Event Study - like Extended Event Study, but employment spells at the same firm, or entire worker-firm matches, are collapsed into a single observation (these will differ if there are workers who leave, then return to, a particular firm)

These formats divide general columns differently:

  • Long - i, j, y, t, g, w, m

  • Collapsed Long - i, j, y, t1, t2, g, w, m

  • Event Study - i, j1, j2, y1, y2, t1, t2, g1, g2, w1, w2, m

  • Collapsed Event Study - i, j1, j2, y1, y2, t11, t12, t21, t22, g1, g2, w1, w2, m

  • Extended Event Study - i, j1, …, jp, y1, …, yp, t1, …, tp, g1, …, gp, w1, …, wp, m

  • Collapsed Extended Event Study - i, j1, …, jp, y1, …, yp, t11, t12, …, tp1, tp2, g1, …, gp, w1, …, wp, m
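The naming pattern in this list is regular enough to write down as a rule: `i` and `m` are never split, every other general column gets one copy per period, and collapsed formats store `t` as a start and an end. The sketch below encodes that rule in plain Python. The helper `format_columns` is hypothetical and only reproduces the names listed above; it does not call the library.

```python
# Columns that keep a single copy in every layout listed above.
UNSPLIT = ('i', 'm')


def format_columns(general, collapsed=False, periods=1):
    """Return the column names a format uses for the given general columns.

    periods=1 gives Long / Collapsed Long, periods=2 gives Event Study /
    Collapsed Event Study, and larger values give the Extended layouts.
    Hypothetical helper for illustration; not part of BipartitePandas.
    """
    if periods == 1:
        suffixes = ['']
    else:
        suffixes = [str(p) for p in range(1, periods + 1)]
    names = []
    for col in general:
        if col in UNSPLIT:
            names.append(col)
        elif col == 't' and collapsed:
            # Collapsed formats store the start and the end of each spell.
            for s in suffixes:
                names += [f't{s}1', f't{s}2']
        else:
            names += [f'{col}{s}' for s in suffixes]
    return names


general = ['i', 'j', 'y', 't', 'g', 'w', 'm']
print(format_columns(general))                             # Long
print(format_columns(general, collapsed=True))             # Collapsed Long
print(format_columns(general, periods=2))                  # Event Study
print(format_columns(general, collapsed=True, periods=2))  # Collapsed Event Study
```

Column names alone do not distinguish Event Study from a two-period Extended Event Study: both use the `periods=2` names, and they differ only in how rows are built.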

Note

Event Study and Extended Event Study differ even if Extended Event Study has 2 periods. This is because Event Study treats each stayer observation as a new event study, while Extended Event Study treats stayers the same as movers: event studies are based on consecutive observations.

In addition to the fact that stayers are treated differently for non-collapsed data, Collapsed Event Study will contain stayers, but Collapsed Extended Event Study will not.

Constructing DataFrames

Our simulated data is in Long format. How do we construct a Long dataframe?

[3]:
bdf_long = bpd.BipartiteDataFrame(
    i=df['i'], j=df['j'], y=df['y'], t=df['t']
)
display(bdf_long)
          i    j          y    t
0         0   31  -0.066354    0
1         0   24  -1.319229    1
2         0   24  -2.141775    2
3         0   24  -1.831186    3
4         0   24   0.086822    4
...     ...  ...        ...  ...
49995  9999  188   2.877201    0
49996  9999  126   1.415829    1
49997  9999  126   0.772108    2
49998  9999  176   2.263156    3
49999  9999  176   1.796405    4

50000 rows × 4 columns

Is this really in Long format? Let’s check the datatype:

[4]:
type(bdf_long)
[4]:
bipartitepandas.bipartitelong.BipartiteLong

This method works to construct any format! Just make sure not to mix up columns between formats.

Converting between formats

Converting between formats is meant to be easy. Methods exist to go from:

  • Long to Collapsed Long (.collapse())

  • Long to Event Study (.to_eventstudy())

  • Long to Extended Event Study (.to_extendedeventstudy())

  • Collapsed Long to Long (.uncollapse())

  • Collapsed Long to Collapsed Event Study (.to_eventstudy())

  • Collapsed Long to Collapsed Extended Event Study (.to_extendedeventstudy())

  • Event Study to Long (.to_long())

  • Collapsed Event Study to Collapsed Long (.to_long())
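Not every pair of formats has a direct method, so some conversions have to be chained. As a rough planning aid, the sketch below treats the list above as a graph and searches for the shortest chain of methods. It is plain Python, the `CONVERSIONS` table contains only the methods listed above (the library may offer others), and `conversion_chain` is a hypothetical helper.

```python
from collections import deque

# Direct conversions listed above: (source format, target format) -> method.
CONVERSIONS = {
    ('Long', 'Collapsed Long'): '.collapse()',
    ('Long', 'Event Study'): '.to_eventstudy()',
    ('Long', 'Extended Event Study'): '.to_extendedeventstudy()',
    ('Collapsed Long', 'Long'): '.uncollapse()',
    ('Collapsed Long', 'Collapsed Event Study'): '.to_eventstudy()',
    ('Collapsed Long', 'Collapsed Extended Event Study'):
        '.to_extendedeventstudy()',
    ('Event Study', 'Long'): '.to_long()',
    ('Collapsed Event Study', 'Collapsed Long'): '.to_long()',
}


def conversion_chain(source, target):
    """Return the shortest list of methods from source to target, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target:
            return chain
        for (src, dst), method in CONVERSIONS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [method]))
    return None


print(conversion_chain('Event Study', 'Collapsed Event Study'))
# -> ['.to_long()', '.collapse()', '.to_eventstudy()']
print(conversion_chain('Extended Event Study', 'Long'))
# -> None (no method for this direction appears in the list above)
```

The chain gives method names only; arguments such as the collapse level still have to be chosen for each call.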

Let’s experiment with these and see what happens. Before we start, we just need to clean our data to make sure the conversions work properly (notice the new m column).

[5]:
bdf_long = bdf_long.clean()
display(bdf_long)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how=False)
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how=None)
sorting columns
resetting index
          i    j          y    t    m
0         0   31  -0.066354    0    1
1         0   24  -1.319229    1    1
2         0   24  -2.141775    2    0
3         0   24  -1.831186    3    0
4         0   24   0.086822    4    0
...     ...  ...        ...  ...  ...
49995  9999  188   2.877201    0    1
49996  9999  126   1.415829    1    1
49997  9999  126   0.772108    2    1
49998  9999  176   2.263156    3    1
49999  9999  176   1.796405    4    0

50000 rows × 5 columns

Long to Collapsed Long

Notice that:

  • We specify level='spell' to collapse employment spells at the same firm into single observations

  • t splits into t1 and t2, which indicate the start and the end of the spell, respectively

  • w is new - it gives the number of observations in the spell

[6]:
bdf_collapsedlong = bdf_long.collapse(level='spell')
display(bdf_collapsedlong)
          i    j          y   t1   t2    w    m
0         0   31  -0.066354    0    0    1    1
1         0   24  -1.301342    1    4    4    1
2         1  140   0.237748    0    1    2    1
3         1   81  -1.262375    2    2    1    2
4         1    9  -1.098072    3    4    2    1
...     ...  ...        ...  ...  ...  ...  ...
29684  9998  139   0.361819    0    3    4    1
29685  9998   30  -0.136980    4    4    1    1
29686  9999  188   2.877201    0    0    1    1
29687  9999  126   1.093969    1    2    2    2
29688  9999  176   2.029781    3    4    2    1

29689 rows × 7 columns
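To make the spell collapse concrete, here is a minimal plain-Python sketch of the idea, run on worker 0 from the output above. It is not the library's implementation. The helper `collapse_spells` is hypothetical, it does not compute the m column, and it assumes income is averaged within a spell. That assumption matches worker 0 above, where (-1.319229 - 2.141775 - 1.831186 + 0.086822) / 4 ≈ -1.301342, but the library may support other aggregations.

```python
from itertools import groupby


def collapse_spells(rows):
    """Collapse consecutive observations of one worker at one firm.

    rows: list of dicts with keys i, j, y, t (Long format).
    Returns a list of dicts with keys i, j, y, t1, t2, w.
    Hypothetical helper for illustration; not part of BipartitePandas.
    """
    rows = sorted(rows, key=lambda r: (r['i'], r['t']))
    collapsed = []
    # groupby only merges adjacent rows, so a worker who leaves a firm and
    # later returns to it produces two separate spells.
    for (i, j), spell in groupby(rows, key=lambda r: (r['i'], r['j'])):
        spell = list(spell)
        collapsed.append({
            'i': i,
            'j': j,
            'y': sum(r['y'] for r in spell) / len(spell),  # assumed: mean
            't1': spell[0]['t'],   # first period of the spell
            't2': spell[-1]['t'],  # last period of the spell
            'w': len(spell),       # number of observations in the spell
        })
    return collapsed


# Worker 0, copied from the cleaned Long data above.
worker_0 = [
    {'i': 0, 'j': 31, 'y': -0.066354, 't': 0},
    {'i': 0, 'j': 24, 'y': -1.319229, 't': 1},
    {'i': 0, 'j': 24, 'y': -2.141775, 't': 2},
    {'i': 0, 'j': 24, 'y': -1.831186, 't': 3},
    {'i': 0, 'j': 24, 'y': 0.086822, 't': 4},
]
spells = collapse_spells(worker_0)
for row in spells:
    print(row)
```

The five observations collapse to two rows: one spell at firm 31 and one at firm 24 spanning periods 1 to 4 with w = 4.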

Long to Event Study

Notice that:

  • j splits into j1 and j2, which indicate the first and second firm id in the event study, respectively

  • y splits into y1 and y2, which indicate the first and second income in the event study, respectively

  • t splits into t1 and t2, which indicate the first and second period in the event study, respectively

Note

For stayers (individuals who stay at the same firm for all their observations), each row in the event study represents a single observation, since they never move firms.

[7]:
bdf_eventstudy = bdf_long.to_eventstudy()
display(bdf_eventstudy)
          i   j1   j2         y1         y2   t1   t2    m
0         0   31   24  -0.066354  -1.319229    0    1    1
1         0   24   24  -1.319229  -2.141775    1    2    0
2         0   24   24  -2.141775  -1.831186    2    3    0
3         0   24   24  -1.831186   0.086822    3    4    0
4         1  140  140   1.157531  -0.682035    0    1    0
...     ...  ...  ...        ...        ...  ...  ...  ...
40637  9998  139   30  -0.256372  -0.136980    3    4    1
40638  9999  188  126   2.877201   1.415829    0    1    1
40639  9999  126  126   1.415829   0.772108    1    2    0
40640  9999  126  176   0.772108   2.263156    2    3    1
40641  9999  176  176   2.263156   1.796405    3    4    0

40642 rows × 8 columns
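For movers, the rows above are pairs of consecutive observations for the same worker. The plain-Python sketch below reproduces only that pairing, using worker 0 from the Long data. The helper `to_event_study_rows` is hypothetical, it does not compute m, and it does not apply the separate treatment of stayers described in the note above.

```python
def to_event_study_rows(rows):
    """Pair each observation with the same worker's next observation.

    rows: list of dicts with keys i, j, y, t (Long format).
    Returns a list of dicts with keys i, j1, j2, y1, y2, t1, t2.
    Hypothetical helper for illustration; not part of BipartitePandas.
    """
    rows = sorted(rows, key=lambda r: (r['i'], r['t']))
    events = []
    for first, second in zip(rows, rows[1:]):
        if first['i'] != second['i']:
            continue  # never pair observations from different workers
        events.append({
            'i': first['i'],
            'j1': first['j'], 'j2': second['j'],
            'y1': first['y'], 'y2': second['y'],
            't1': first['t'], 't2': second['t'],
        })
    return events


# Worker 0 from the Long data above, plus the first row of worker 1 to
# show that pairs never cross workers.
long_rows = [
    {'i': 0, 'j': 31, 'y': -0.066354, 't': 0},
    {'i': 0, 'j': 24, 'y': -1.319229, 't': 1},
    {'i': 0, 'j': 24, 'y': -2.141775, 't': 2},
    {'i': 0, 'j': 24, 'y': -1.831186, 't': 3},
    {'i': 0, 'j': 24, 'y': 0.086822, 't': 4},
    {'i': 1, 'j': 140, 'y': 1.157531, 't': 0},
]
events = to_event_study_rows(long_rows)
for row in events:
    print(row)
```

Five observations for worker 0 give four rows, matching rows 0 to 3 of the output above apart from the m column.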

Collapsed Long to Collapsed Event Study

Notice that:

  • j splits into j1 and j2, which indicate the first and second firm id in the event study, respectively

  • y splits into y1 and y2, which indicate the first and second income in the event study, respectively

  • t1 splits into t11 and t12, which indicate the start and the end of the spell for the first observation in the event study, respectively

  • t2 splits into t21 and t22, which indicate the start and the end of the spell for the second observation in the event study, respectively

  • w splits into w1 and w2, which indicate the number of observations in the first and second spell in the event study, respectively

[8]:
bdf_collapsedeventstudy = bdf_collapsedlong.to_eventstudy()
display(bdf_collapsedeventstudy)
          i   j1   j2         y1         y2  t11  t12  t21  t22   w1   w2    m
0         0   31   24  -0.066354  -1.301342    0    0    1    4    1    4    1
1         1  140   81   0.237748  -1.262375    0    1    2    2    2    1    1
2         1   81    9  -1.262375  -1.098072    2    2    3    4    1    2    1
3         2  126  131   0.538577   0.628338    0    0    1    2    1    2    1
4         2  131   33   0.628338  -0.646154    1    2    3    4    2    2    1
...     ...  ...  ...        ...        ...  ...  ...  ...  ...  ...  ...  ...
20326  9996  105  156   0.493585   0.097240    0    1    2    4    2    3    1
20327  9997  147   98  -0.331835  -1.363738    0    1    2    4    2    3    1
20328  9998  139   30   0.361819  -0.136980    0    3    4    4    4    1    1
20329  9999  188  126   2.877201   1.093969    0    0    1    2    1    2    1
20330  9999  126  176   1.093969   2.029781    1    2    3    4    2    2    1

20331 rows × 12 columns

We showed only some of these conversions (Long to Collapsed Long, Long to Event Study, and Collapsed Long to Collapsed Event Study), so feel free to experiment and see what happens when you convert in other directions!

Initializing from different formats

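If your data is saved in a format other than Long, it’s still simple to construct a BipartiteDataFrame: pass that format’s columns to the same constructor, and it returns an object of the matching class.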

Initializing from Collapsed Long format

[9]:
i = bdf_collapsedlong['i']
j = bdf_collapsedlong['j']
y = bdf_collapsedlong['y']
t1 = bdf_collapsedlong['t1']
t2 = bdf_collapsedlong['t2']
bdf_collapsedlong = bpd.BipartiteDataFrame(
    i=i, j=j, y=y, t1=t1, t2=t2
)
display(bdf_collapsedlong)
          i    j          y   t1   t2
0         0   31  -0.066354    0    0
1         0   24  -1.301342    1    4
2         1  140   0.237748    0    1
3         1   81  -1.262375    2    2
4         1    9  -1.098072    3    4
...     ...  ...        ...  ...  ...
29684  9998  139   0.361819    0    3
29685  9998   30  -0.136980    4    4
29686  9999  188   2.877201    0    0
29687  9999  126   1.093969    1    2
29688  9999  176   2.029781    3    4

29689 rows × 5 columns

Let’s check the datatype:

[10]:
type(bdf_collapsedlong)
[10]:
bipartitepandas.bipartitelongcollapsed.BipartiteLongCollapsed

Initializing from Event Study format

[11]:
i = bdf_eventstudy['i']
j1 = bdf_eventstudy['j1']
j2 = bdf_eventstudy['j2']
y1 = bdf_eventstudy['y1']
y2 = bdf_eventstudy['y2']
t1 = bdf_eventstudy['t1']
t2 = bdf_eventstudy['t2']
bdf_eventstudy = bpd.BipartiteDataFrame(
    i=i, j1=j1, j2=j2, y1=y1, y2=y2, t1=t1, t2=t2
)
display(bdf_eventstudy)
          i   j1   j2         y1         y2   t1   t2
0         0   31   24  -0.066354  -1.319229    0    1
1         0   24   24  -1.319229  -2.141775    1    2
2         0   24   24  -2.141775  -1.831186    2    3
3         0   24   24  -1.831186   0.086822    3    4
4         1  140  140   1.157531  -0.682035    0    1
...     ...  ...  ...        ...        ...  ...  ...
40637  9998  139   30  -0.256372  -0.136980    3    4
40638  9999  188  126   2.877201   1.415829    0    1
40639  9999  126  126   1.415829   0.772108    1    2
40640  9999  126  176   0.772108   2.263156    2    3
40641  9999  176  176   2.263156   1.796405    3    4

40642 rows × 7 columns

Let’s check the datatype:

[12]:
type(bdf_eventstudy)
[12]:
bipartitepandas.bipartiteeventstudy.BipartiteEventStudy

Initializing from Collapsed Event Study format

[13]:
i = bdf_collapsedeventstudy['i']
j1 = bdf_collapsedeventstudy['j1']
j2 = bdf_collapsedeventstudy['j2']
y1 = bdf_collapsedeventstudy['y1']
y2 = bdf_collapsedeventstudy['y2']
t11 = bdf_collapsedeventstudy['t11']
t12 = bdf_collapsedeventstudy['t12']
t21 = bdf_collapsedeventstudy['t21']
t22 = bdf_collapsedeventstudy['t22']
bdf_collapsedeventstudy = bpd.BipartiteDataFrame(
    i=i, j1=j1, j2=j2, y1=y1, y2=y2,
    t11=t11, t12=t12, t21=t21, t22=t22
)
display(bdf_collapsedeventstudy)
          i   j1   j2         y1         y2  t11  t12  t21  t22
0         0   31   24  -0.066354  -1.301342    0    0    1    4
1         1  140   81   0.237748  -1.262375    0    1    2    2
2         1   81    9  -1.262375  -1.098072    2    2    3    4
3         2  126  131   0.538577   0.628338    0    0    1    2
4         2  131   33   0.628338  -0.646154    1    2    3    4
...     ...  ...  ...        ...        ...  ...  ...  ...  ...
20326  9996  105  156   0.493585   0.097240    0    1    2    4
20327  9997  147   98  -0.331835  -1.363738    0    1    2    4
20328  9998  139   30   0.361819  -0.136980    0    3    4    4
20329  9999  188  126   2.877201   1.093969    0    0    1    2
20330  9999  126  176   1.093969   2.029781    1    2    3    4

20331 rows × 9 columns

Let’s check the datatype:

[14]:
type(bdf_collapsedeventstudy)
[14]:
bipartitepandas.bipartiteeventstudycollapsed.BipartiteEventStudyCollapsed