Formats
Import the BipartitePandas package
Make sure to install it first using pip install bipartitepandas.
[1]:
import bipartitepandas as bpd
Get your data ready
For this notebook, we simulate data.
[2]:
df = bpd.SimBipartite().simulate()
display(df)
 | i | j | y | t | l | k | alpha | psi |
---|---|---|---|---|---|---|---|---|
0 | 0 | 31 | -0.066354 | 0 | 2 | 1 | 0.000000 | -0.908458 |
1 | 0 | 24 | -1.319229 | 1 | 2 | 1 | 0.000000 | -0.908458 |
2 | 0 | 24 | -2.141775 | 2 | 2 | 1 | 0.000000 | -0.908458 |
3 | 0 | 24 | -1.831186 | 3 | 2 | 1 | 0.000000 | -0.908458 |
4 | 0 | 24 | 0.086822 | 4 | 2 | 1 | 0.000000 | -0.908458 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
49995 | 9999 | 188 | 2.877201 | 0 | 4 | 9 | 0.967422 | 1.335178 |
49996 | 9999 | 126 | 1.415829 | 1 | 4 | 6 | 0.967422 | 0.348756 |
49997 | 9999 | 126 | 0.772108 | 2 | 4 | 6 | 0.967422 | 0.348756 |
49998 | 9999 | 176 | 2.263156 | 3 | 4 | 8 | 0.967422 | 0.908458 |
49999 | 9999 | 176 | 1.796405 | 4 | 4 | 8 | 0.967422 | 0.908458 |
50000 rows × 8 columns
Columns
BipartitePandas includes seven pre-defined general columns:
Required
i: worker id (any type)
j: firm id (any type)
y: income (float or int)
Optional
t: time (int)
g: firm type (any type)
w: weight (float or int)
m: move indicator (int)
Formats
BipartitePandas includes six formats:
Long - each row gives a single observation
Collapsed Long - like Long, but employment spells at the same firm, or entire worker-firm matches, are collapsed into a single observation (these will differ if there are workers who leave, then return to, a particular firm)
Event Study - each row gives two consecutive observations
Collapsed Event Study - like Event Study, but employment spells at the same firm, or entire worker-firm matches, are collapsed into a single observation (these will differ if there are workers who leave, then return to, a particular firm)
Extended Event Study - each row gives arbitrarily many consecutive observations
Collapsed Extended Event Study - like Extended Event Study, but employment spells at the same firm, or entire worker-firm matches, are collapsed into a single observation (these will differ if there are workers who leave, then return to, a particular firm)
These formats divide general columns differently:
Long - i, j, y, t, g, w, m
Collapsed Long - i, j, y, t1, t2, g, w, m
Event Study - i, j1, j2, y1, y2, t1, t2, g1, g2, w1, w2, m
Collapsed Event Study - i, j1, j2, y1, y2, t11, t12, t21, t22, g1, g2, w1, w2, m
Extended Event Study - i, j1, …, jp, y1, …, yp, t1, …, tp, g1, …, gp, w1, …, wp, m
Collapsed Extended Event Study - i, j1, …, jp, y1, …, yp, t11, t12, …, tp1, tp2, g1, …, gp, w1, …, wp, m
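The naming pattern in the layouts listed above can be generated mechanically. The following is a minimal sketch in plain Python; `format_columns` is a hypothetical helper written for illustration and is not part of BipartitePandas.

```python
def format_columns(periods=1, collapsed=False):
    """Return the general column names for a format.

    periods=1 gives the Long family; periods>=2 gives the event-study family.
    Hypothetical helper for illustration; not a BipartitePandas function.
    """
    if periods == 1:
        # Collapsed Long replaces t with the start (t1) and end (t2) of a spell.
        time_cols = ["t1", "t2"] if collapsed else ["t"]
        return ["i", "j", "y", *time_cols, "g", "w", "m"]

    cols = ["i"]
    for general in ("j", "y", "t", "g", "w"):
        for p in range(1, periods + 1):
            if general == "t" and collapsed:
                # Collapsed formats keep a start and an end time for each period.
                cols += [f"t{p}1", f"t{p}2"]
            else:
                cols.append(f"{general}{p}")
    cols.append("m")  # the move indicator is never split
    return cols


print(format_columns(periods=2, collapsed=True))
```

Note that `format_columns(periods=2)` yields the same names for Event Study and for a two-period Extended Event Study; the two formats differ in how rows are built, not in how columns are named.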
Note
Event Study and Extended Event Study differ even if Extended Event Study has 2 periods. This is because Event Study treats each stayer observation as a new event study, while Extended Event Study treats stayers the same as movers: event studies are based on consecutive observations.
In addition to the fact that stayers are treated differently for non-collapsed data, Collapsed Event Study will contain stayers, but Collapsed Extended Event Study will not.
Constructing DataFrames
Our simulated data is in Long format. How do we construct a Long dataframe?
[3]:
bdf_long = bpd.BipartiteDataFrame(
    i=df['i'], j=df['j'], y=df['y'], t=df['t']
)
display(bdf_long)
 | i | j | y | t |
---|---|---|---|---|
0 | 0 | 31 | -0.066354 | 0 |
1 | 0 | 24 | -1.319229 | 1 |
2 | 0 | 24 | -2.141775 | 2 |
3 | 0 | 24 | -1.831186 | 3 |
4 | 0 | 24 | 0.086822 | 4 |
... | ... | ... | ... | ... |
49995 | 9999 | 188 | 2.877201 | 0 |
49996 | 9999 | 126 | 1.415829 | 1 |
49997 | 9999 | 126 | 0.772108 | 2 |
49998 | 9999 | 176 | 2.263156 | 3 |
49999 | 9999 | 176 | 1.796405 | 4 |
50000 rows × 4 columns
Are we sure this is long? Let’s check the datatype:
[4]:
type(bdf_long)
[4]:
bipartitepandas.bipartitelong.BipartiteLong
This method works to construct any format! Just make sure not to mix up columns between formats.
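To see why mixing columns is a problem, it helps to think of the column names as the only signal of which format you intend. The sketch below is a guess at that idea in plain Python, covering only four of the six formats; `infer_format` is a hypothetical helper, and it does not reproduce the library's actual logic.

```python
def infer_format(columns):
    """Guess a format name from column names (hypothetical helper)."""
    cols = set(columns)
    has_long_ids = "j" in cols
    has_event_ids = bool(cols & {"j1", "j2"})

    if has_long_ids and has_event_ids:
        raise ValueError("mixed formats: 'j' together with 'j1'/'j2'")
    if not (has_long_ids or has_event_ids):
        raise ValueError("no firm id column found")

    if has_long_ids:
        if "t" in cols and cols & {"t1", "t2"}:
            raise ValueError("mixed formats: 't' together with 't1'/'t2'")
        # Assumption: without spell start/end columns, treat the data as Long.
        return "Collapsed Long" if {"t1", "t2"} <= cols else "Long"

    collapsed_times = {"t11", "t12", "t21", "t22"}
    if cols & {"t1", "t2"} and cols & collapsed_times:
        raise ValueError("mixed formats: 't1'/'t2' together with 't11'..'t22'")
    return "Collapsed Event Study" if collapsed_times <= cols else "Event Study"


print(infer_format(["i", "j", "y", "t"]))
print(infer_format(["i", "j1", "j2", "y1", "y2", "t11", "t12", "t21", "t22"]))
```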
Converting between formats
Converting between formats is meant to be easy. Methods exist to go from:
Long to Collapsed Long (.collapse())
Long to Event Study (.to_eventstudy())
Long to Extended Event Study (.to_extendedeventstudy())
Collapsed Long to Long (.uncollapse())
Collapsed Long to Collapsed Event Study (.to_eventstudy())
Collapsed Long to Collapsed Extended Event Study (.to_extendedeventstudy())
Event Study to Long (.to_long())
Collapsed Event Study to Collapsed Long (.to_long())
Let’s experiment with these and see what happens. Before we start, we need to clean our data so that the conversions work properly (notice the new m column).
[5]:
bdf_long = bdf_long.clean()
display(bdf_long)
checking required columns and datatypes
sorting rows
dropping NaN observations
generating 'm' column
keeping highest paying job for i-t (worker-year) duplicates (how='max')
dropping workers who leave a firm then return to it (how=False)
making 'i' ids contiguous
making 'j' ids contiguous
computing largest connected set (how=None)
sorting columns
resetting index
 | i | j | y | t | m |
---|---|---|---|---|---|
0 | 0 | 31 | -0.066354 | 0 | 1 |
1 | 0 | 24 | -1.319229 | 1 | 1 |
2 | 0 | 24 | -2.141775 | 2 | 0 |
3 | 0 | 24 | -1.831186 | 3 | 0 |
4 | 0 | 24 | 0.086822 | 4 | 0 |
... | ... | ... | ... | ... | ... |
49995 | 9999 | 188 | 2.877201 | 0 | 1 |
49996 | 9999 | 126 | 1.415829 | 1 | 1 |
49997 | 9999 | 126 | 0.772108 | 2 | 1 |
49998 | 9999 | 176 | 2.263156 | 3 | 1 |
49999 | 9999 | 176 | 1.796405 | 4 | 0 |
50000 rows × 5 columns
Long to Collapsed Long
Notice that:
We specify level='spell' to collapse employment spells at the same firm into single observations
t splits into t1 and t2, which indicate the start and the end of the spell, respectively
w is new - it gives the number of observations in the spell
[6]:
bdf_collapsedlong = bdf_long.collapse(level='spell')
display(bdf_collapsedlong)
 | i | j | y | t1 | t2 | w | m |
---|---|---|---|---|---|---|---|
0 | 0 | 31 | -0.066354 | 0 | 0 | 1 | 1 |
1 | 0 | 24 | -1.301342 | 1 | 4 | 4 | 1 |
2 | 1 | 140 | 0.237748 | 0 | 1 | 2 | 1 |
3 | 1 | 81 | -1.262375 | 2 | 2 | 1 | 2 |
4 | 1 | 9 | -1.098072 | 3 | 4 | 2 | 1 |
... | ... | ... | ... | ... | ... | ... | ... |
29684 | 9998 | 139 | 0.361819 | 0 | 3 | 4 | 1 |
29685 | 9998 | 30 | -0.136980 | 4 | 4 | 1 | 1 |
29686 | 9999 | 188 | 2.877201 | 0 | 0 | 1 | 1 |
29687 | 9999 | 126 | 1.093969 | 1 | 2 | 2 | 2 |
29688 | 9999 | 176 | 2.029781 | 3 | 4 | 2 | 1 |
29689 rows × 7 columns
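To make the spell collapse concrete, here is a minimal sketch in plain Python. It is not BipartitePandas's implementation, and `collapse_spells` is a hypothetical name. It averages y within each spell, which is consistent with the output above: the four observations of worker 0 at firm 24 average to -1.301342.

```python
from itertools import groupby
from statistics import mean


def collapse_spells(rows):
    """Collapse consecutive same-firm observations of each worker.

    rows: iterable of (i, j, y, t) tuples.
    Returns a list of (i, j, y, t1, t2, w) tuples, where y is the spell mean
    and w is the number of observations in the spell.
    """
    # Sort by worker, then time, so that each spell is a consecutive run.
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))
    spells = []
    # groupby only merges adjacent rows, so a worker who leaves a firm and
    # later returns to it produces two separate spells.
    for (i, j), run in groupby(ordered, key=lambda r: (r[0], r[1])):
        run = list(run)
        y_mean = mean(r[2] for r in run)
        spells.append((i, j, y_mean, run[0][3], run[-1][3], len(run)))
    return spells


# Worker 0 from the Long data above.
rows = [
    (0, 31, -0.066354, 0),
    (0, 24, -1.319229, 1),
    (0, 24, -2.141775, 2),
    (0, 24, -1.831186, 3),
    (0, 24, 0.086822, 4),
]
spells = collapse_spells(rows)
for spell in spells:
    print(spell)
```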
Long to Event Study
Notice that:
j splits into j1 and j2, which indicate the first and second firm id in the event study, respectively
y splits into y1 and y2, which indicate the first and second income in the event study, respectively
t splits into t1 and t2, which indicate the first and second period in the event study, respectively
Note
For stayers (individuals who stay at the same firm for all their observations), each row in the event study represents a single observation, since they never move firms.
[7]:
bdf_eventstudy = bdf_long.to_eventstudy()
display(bdf_eventstudy)
 | i | j1 | j2 | y1 | y2 | t1 | t2 | m |
---|---|---|---|---|---|---|---|---|
0 | 0 | 31 | 24 | -0.066354 | -1.319229 | 0 | 1 | 1 |
1 | 0 | 24 | 24 | -1.319229 | -2.141775 | 1 | 2 | 0 |
2 | 0 | 24 | 24 | -2.141775 | -1.831186 | 2 | 3 | 0 |
3 | 0 | 24 | 24 | -1.831186 | 0.086822 | 3 | 4 | 0 |
4 | 1 | 140 | 140 | 1.157531 | -0.682035 | 0 | 1 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
40637 | 9998 | 139 | 30 | -0.256372 | -0.136980 | 3 | 4 | 1 |
40638 | 9999 | 188 | 126 | 2.877201 | 1.415829 | 0 | 1 | 1 |
40639 | 9999 | 126 | 126 | 1.415829 | 0.772108 | 1 | 2 | 0 |
40640 | 9999 | 126 | 176 | 0.772108 | 2.263156 | 2 | 3 | 1 |
40641 | 9999 | 176 | 176 | 2.263156 | 1.796405 | 3 | 4 | 0 |
40642 rows × 8 columns
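The pairing of consecutive observations can be sketched in plain Python as follows. This is an illustration, not the library's code: `to_event_study` is a hypothetical name, and the layout of stayer rows (the single observation repeated in both halves) is an assumption. If every worker has five observations, movers contribute four rows and stayers five under this layout, which would put the number of stayers behind the 40642 rows above at 642.

```python
from itertools import groupby


def to_event_study(rows):
    """Pair consecutive observations of each worker.

    rows: iterable of (i, j, y, t) tuples.
    Returns a list of (i, j1, j2, y1, y2, t1, t2, m) tuples.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))
    events = []
    for i, group in groupby(ordered, key=lambda r: r[0]):
        obs = list(group)
        firms = {j for _, j, _, _ in obs}
        if len(firms) == 1:
            # Stayer: each observation becomes its own row.
            # Assumption: the observation fills both halves of the row.
            for _, j, y, t in obs:
                events.append((i, j, j, y, y, t, t, 0))
        else:
            # Mover: one row per pair of consecutive observations.
            for (_, j1, y1, t1), (_, j2, y2, t2) in zip(obs, obs[1:]):
                events.append((i, j1, j2, y1, y2, t1, t2, int(j1 != j2)))
    return events


rows = [
    # Worker 0 from the Long data above (a mover).
    (0, 31, -0.066354, 0),
    (0, 24, -1.319229, 1),
    (0, 24, -2.141775, 2),
    (0, 24, -1.831186, 3),
    (0, 24, 0.086822, 4),
    # A made-up stayer, for illustration only.
    (1, 7, 0.5, 0),
    (1, 7, 0.6, 1),
]
events = to_event_study(rows)
for event in events:
    print(event)
```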
Collapsed Long to Collapsed Event Study
Notice that:
j splits into j1 and j2, which indicate the first and second firm id in the event study, respectively
y splits into y1 and y2, which indicate the first and second income in the event study, respectively
t1 splits into t11 and t12, which indicate the start and the end of the spell for the first observation in the event study, respectively
t2 splits into t21 and t22, which indicate the start and the end of the spell for the second observation in the event study, respectively
w splits into w1 and w2, which indicate the number of observations in the first and second spell in the event study, respectively
[8]:
bdf_collapsedeventstudy = bdf_collapsedlong.to_eventstudy()
display(bdf_collapsedeventstudy)
 | i | j1 | j2 | y1 | y2 | t11 | t12 | t21 | t22 | w1 | w2 | m |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 31 | 24 | -0.066354 | -1.301342 | 0 | 0 | 1 | 4 | 1 | 4 | 1 |
1 | 1 | 140 | 81 | 0.237748 | -1.262375 | 0 | 1 | 2 | 2 | 2 | 1 | 1 |
2 | 1 | 81 | 9 | -1.262375 | -1.098072 | 2 | 2 | 3 | 4 | 1 | 2 | 1 |
3 | 2 | 126 | 131 | 0.538577 | 0.628338 | 0 | 0 | 1 | 2 | 1 | 2 | 1 |
4 | 2 | 131 | 33 | 0.628338 | -0.646154 | 1 | 2 | 3 | 4 | 2 | 2 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20326 | 9996 | 105 | 156 | 0.493585 | 0.097240 | 0 | 1 | 2 | 4 | 2 | 3 | 1 |
20327 | 9997 | 147 | 98 | -0.331835 | -1.363738 | 0 | 1 | 2 | 4 | 2 | 3 | 1 |
20328 | 9998 | 139 | 30 | 0.361819 | -0.136980 | 0 | 3 | 4 | 4 | 4 | 1 | 1 |
20329 | 9999 | 188 | 126 | 2.877201 | 1.093969 | 0 | 0 | 1 | 2 | 1 | 2 | 1 |
20330 | 9999 | 126 | 176 | 1.093969 | 2.029781 | 1 | 2 | 3 | 4 | 2 | 2 | 1 |
20331 rows × 12 columns
We showed only some of the available conversions, but feel free to experiment and see what happens when you convert in other directions!
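As one example of a reverse direction, going from Event Study back to Long amounts to unpacking both halves of every row and dropping the duplicates. The sketch below shows that idea in plain Python; `event_study_to_long` is a hypothetical name, and this is not how .to_long() is necessarily implemented.

```python
def event_study_to_long(events):
    """Recover one observation per worker and period.

    events: iterable of (i, j1, j2, y1, y2, t1, t2) tuples.
    Returns a list of (i, j, y, t) tuples sorted by worker and time.
    """
    observations = {}
    for i, j1, j2, y1, y2, t1, t2 in events:
        # Consecutive rows share an observation, so key on (worker, period)
        # to keep each observation only once.
        observations[(i, t1)] = (i, j1, y1, t1)
        observations[(i, t2)] = (i, j2, y2, t2)
    return [observations[key] for key in sorted(observations)]


# Worker 0 from the Event Study data above.
events = [
    (0, 31, 24, -0.066354, -1.319229, 0, 1),
    (0, 24, 24, -1.319229, -2.141775, 1, 2),
    (0, 24, 24, -2.141775, -1.831186, 2, 3),
    (0, 24, 24, -1.831186, 0.086822, 3, 4),
]
long_rows = event_study_to_long(events)
for row in long_rows:
    print(row)
```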
Initializing from different formats
If your data is saved in a format other than Long, it’s simple to construct a BipartiteDataFrame.
Initializing from Collapsed Long format
[9]:
i = bdf_collapsedlong['i']
j = bdf_collapsedlong['j']
y = bdf_collapsedlong['y']
t1 = bdf_collapsedlong['t1']
t2 = bdf_collapsedlong['t2']
bdf_collapsedlong = bpd.BipartiteDataFrame(
    i=i, j=j, y=y, t1=t1, t2=t2
)
display(bdf_collapsedlong)
 | i | j | y | t1 | t2 |
---|---|---|---|---|---|
0 | 0 | 31 | -0.066354 | 0 | 0 |
1 | 0 | 24 | -1.301342 | 1 | 4 |
2 | 1 | 140 | 0.237748 | 0 | 1 |
3 | 1 | 81 | -1.262375 | 2 | 2 |
4 | 1 | 9 | -1.098072 | 3 | 4 |
... | ... | ... | ... | ... | ... |
29684 | 9998 | 139 | 0.361819 | 0 | 3 |
29685 | 9998 | 30 | -0.136980 | 4 | 4 |
29686 | 9999 | 188 | 2.877201 | 0 | 0 |
29687 | 9999 | 126 | 1.093969 | 1 | 2 |
29688 | 9999 | 176 | 2.029781 | 3 | 4 |
29689 rows × 5 columns
Let’s check the datatype:
[10]:
type(bdf_collapsedlong)
[10]:
bipartitepandas.bipartitelongcollapsed.BipartiteLongCollapsed
Initializing from Event Study format
[11]:
i = bdf_eventstudy['i']
j1 = bdf_eventstudy['j1']
j2 = bdf_eventstudy['j2']
y1 = bdf_eventstudy['y1']
y2 = bdf_eventstudy['y2']
t1 = bdf_eventstudy['t1']
t2 = bdf_eventstudy['t2']
bdf_eventstudy = bpd.BipartiteDataFrame(
    i=i, j1=j1, j2=j2, y1=y1, y2=y2, t1=t1, t2=t2
)
display(bdf_eventstudy)
 | i | j1 | j2 | y1 | y2 | t1 | t2 |
---|---|---|---|---|---|---|---|
0 | 0 | 31 | 24 | -0.066354 | -1.319229 | 0 | 1 |
1 | 0 | 24 | 24 | -1.319229 | -2.141775 | 1 | 2 |
2 | 0 | 24 | 24 | -2.141775 | -1.831186 | 2 | 3 |
3 | 0 | 24 | 24 | -1.831186 | 0.086822 | 3 | 4 |
4 | 1 | 140 | 140 | 1.157531 | -0.682035 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... |
40637 | 9998 | 139 | 30 | -0.256372 | -0.136980 | 3 | 4 |
40638 | 9999 | 188 | 126 | 2.877201 | 1.415829 | 0 | 1 |
40639 | 9999 | 126 | 126 | 1.415829 | 0.772108 | 1 | 2 |
40640 | 9999 | 126 | 176 | 0.772108 | 2.263156 | 2 | 3 |
40641 | 9999 | 176 | 176 | 2.263156 | 1.796405 | 3 | 4 |
40642 rows × 7 columns
Let’s check the datatype:
[12]:
type(bdf_eventstudy)
[12]:
bipartitepandas.bipartiteeventstudy.BipartiteEventStudy
Initializing from Collapsed Event Study format
[13]:
i = bdf_collapsedeventstudy['i']
j1 = bdf_collapsedeventstudy['j1']
j2 = bdf_collapsedeventstudy['j2']
y1 = bdf_collapsedeventstudy['y1']
y2 = bdf_collapsedeventstudy['y2']
t11 = bdf_collapsedeventstudy['t11']
t12 = bdf_collapsedeventstudy['t12']
t21 = bdf_collapsedeventstudy['t21']
t22 = bdf_collapsedeventstudy['t22']
bdf_collapsedeventstudy = bpd.BipartiteDataFrame(
    i=i, j1=j1, j2=j2, y1=y1, y2=y2,
    t11=t11, t12=t12, t21=t21, t22=t22
)
display(bdf_collapsedeventstudy)
 | i | j1 | j2 | y1 | y2 | t11 | t12 | t21 | t22 |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 31 | 24 | -0.066354 | -1.301342 | 0 | 0 | 1 | 4 |
1 | 1 | 140 | 81 | 0.237748 | -1.262375 | 0 | 1 | 2 | 2 |
2 | 1 | 81 | 9 | -1.262375 | -1.098072 | 2 | 2 | 3 | 4 |
3 | 2 | 126 | 131 | 0.538577 | 0.628338 | 0 | 0 | 1 | 2 |
4 | 2 | 131 | 33 | 0.628338 | -0.646154 | 1 | 2 | 3 | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20326 | 9996 | 105 | 156 | 0.493585 | 0.097240 | 0 | 1 | 2 | 4 |
20327 | 9997 | 147 | 98 | -0.331835 | -1.363738 | 0 | 1 | 2 | 4 |
20328 | 9998 | 139 | 30 | 0.361819 | -0.136980 | 0 | 3 | 4 | 4 |
20329 | 9999 | 188 | 126 | 2.877201 | 1.093969 | 0 | 0 | 1 | 2 |
20330 | 9999 | 126 | 176 | 1.093969 | 2.029781 | 1 | 2 | 3 | 4 |
20331 rows × 9 columns
Let’s check the datatype:
[14]:
type(bdf_collapsedeventstudy)
[14]:
bipartitepandas.bipartiteeventstudycollapsed.BipartiteEventStudyCollapsed