BipartiteEventStudyBase class

class bipartitepandas.bipartiteeventstudybase.BipartiteEventStudyBase(*args, col_reference_dict=None, **kwargs)

Bases: BipartiteBase

Base class for BipartiteEventStudy and BipartiteEventStudyCollapsed, which give bipartite networks of firms and workers in event study and collapsed event study form, respectively. Contains generalized methods. Inherits from BipartiteBase.

Parameters

*args – arguments for BipartiteBase
col_reference_dict (dict or None) – dictionary linking each general column name to the column(s) associated with it, e.g. {'i': 'i', 'j': ['j1', 'j2']}; None is equivalent to {}
**kwargs – keyword arguments for BipartiteBase
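As a rough illustration of what col_reference_dict encodes, the sketch below expands general column names into the concrete columns they refer to. The helper expand_columns is hypothetical and not part of bipartitepandas, and falling back to the name itself when the dictionary has no entry is an assumption made for this example.

```python
# Hypothetical helper (not part of bipartitepandas): expand general column
# names into the concrete columns they refer to.
def expand_columns(col_reference_dict, general_names):
    concrete = []
    for name in general_names:
        # Assumption: a name with no entry refers to a column of the same name
        ref = col_reference_dict.get(name, name)
        # A general name may map to one column (str) or several (list)
        if isinstance(ref, str):
            concrete.append(ref)
        else:
            concrete.extend(ref)
    return concrete


col_reference_dict = {'i': 'i', 'j': ['j1', 'j2']}
columns = expand_columns(col_reference_dict, ['i', 'j'])
print(columns)
```

Here the general name 'j' stands for the pair of event study columns 'j1' and 'j2', while 'i' refers to a single column.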

clean(params=None)

Clean data so that there are no NaN or duplicate observations, observations where workers leave a firm and later return to it are removed, firms are connected by movers, and categorical ids are contiguous.

Parameters: params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().
Returns: dataframe with cleaned data
Return type: (BipartiteEventStudyBase)

construct_artificial_time(time_per_worker=False, is_sorted=False, copy=True)

Construct artificial time columns to enable conversion to (collapsed) long format. Only adds columns if time columns are not already included.

Parameters

time_per_worker (bool) – if True, set time independently for each worker (note that this is significantly more computationally costly)
is_sorted (bool) – set to True if dataframe is already sorted by i (this avoids a sort inside a groupby if time_per_worker=True, but this groupby will not sort the returned dataframe)
copy (bool) – if False, avoid copy

Returns

dataframe with artificial time columns

Return type

(BipartiteEventStudyBase)

diagnostic(): Run diagnostic and print diagnostic report.

drop_ids(id_col, drop_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Drop ids belonging to a given set of ids.

Parameters

id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)
drop_ids_list (list) – ids to drop
drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and using BipartiteEventStudyCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy

Returns

dataframe with ids outside the given set

Return type

(BipartiteEventStudyBase)

gen_m(force=False, copy=True)

Generate m column for data (m == 0 if stayer, m == 1 if mover).

Parameters

force (bool) – if True, reset ‘m’ column even if it exists
copy (bool) – if False, avoid copy

Returns

dataframe with m column

Return type

(BipartiteEventStudyBase)
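The stayer/mover coding can be sketched in plain Python. This is not the library's implementation: it assumes a row counts as a move when its two firm columns ('j1' and 'j2') differ, and the function add_m and the toy rows are hypothetical.

```python
# Toy event study rows: worker id (i), firm in the first period (j1) and
# firm in the second period (j2).
rows = [
    {'i': 0, 'j1': 'A', 'j2': 'A'},  # same firm in both periods
    {'i': 1, 'j1': 'A', 'j2': 'B'},  # changes firm
    {'i': 2, 'j1': 'B', 'j2': 'C', 'm': 0},  # already has a (stale) m value
]


def add_m(rows, force=False):
    """Return copies of rows with an 'm' entry (assumed: 1 if j1 != j2, else 0)."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        # Only overwrite an existing 'm' value when force=True
        if force or 'm' not in new_row:
            new_row['m'] = int(new_row['j1'] != new_row['j2'])
        out.append(new_row)
    return out


kept_m = [row['m'] for row in add_m(rows)]
forced_m = [row['m'] for row in add_m(rows, force=True)]
print(kept_m, forced_m)
```

With force=False the existing value on the third row is preserved; with force=True it is recomputed.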

get_cs(copy=True)

Return (collapsed) event study data reformatted into cross section data.

Parameters: copy (bool) – if False, avoid copy
Returns: cross section data
Return type: (Pandas DataFrame)

keep_ids(id_col, keep_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Only keep ids belonging to a given set of ids.

Parameters

id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)
keep_ids_list (list) – ids to keep
drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and using BipartiteEventStudyCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy

Returns

dataframe with ids in the given set

Return type

(BipartiteEventStudyBase)

keep_rows(rows, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Only keep particular rows.

Parameters

rows (list) – rows to keep
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy

Returns

dataframe with given rows

Return type

(BipartiteEventStudyBase)

min_joint_obs_frame(threshold_1=2, threshold_2=2, id_col_1='j', id_col_2='i', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe where column 1 ids have at least threshold_1 many observations and column 2 ids have at least threshold_2 many observations.

Parameters

threshold_1 (int) – minimum number of observations required to keep an id from column 1
threshold_2 (int) – minimum number of observations required to keep an id from column 2
id_col_1 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
id_col_2 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – used for event study format. If False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – used for event study format. If False, avoid copy.

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteEventStudyBase)

min_movers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Return dataframe where all firms have at least threshold many movers. This method iterates until no further firms are dropped, because dropping firms that don't meet the threshold may lower the number of movers at other firms.

Parameters

threshold (int) – minimum number of movers required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy.

Returns

dataframe of firms with sufficiently many movers

Return type

(BipartiteEventStudyBase)
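The need to repeat the filter can be seen in a small pure-Python sketch. It is an illustration under stated assumptions rather than the library's code: rows are (worker, j1, j2) tuples, a mover is counted at both the origin and the destination firm, and the function names are hypothetical.

```python
from collections import defaultdict


def movers_per_firm(rows):
    """Map each firm to the set of distinct workers who move into or out of it."""
    movers = defaultdict(set)
    for i, j1, j2 in rows:
        if j1 != j2:
            movers[j1].add(i)
            movers[j2].add(i)
    return movers


def min_movers_rows(rows, threshold):
    """Repeatedly drop rows touching firms with fewer than threshold movers."""
    rows = list(rows)
    while True:
        movers = movers_per_firm(rows)
        kept = [
            row for row in rows
            if all(len(movers[j]) >= threshold for j in (row[1], row[2]))
        ]
        # Stop once a full pass drops nothing
        if len(kept) == len(rows):
            return kept
        rows = kept


rows = [(0, 'A', 'B'), (1, 'A', 'B'), (2, 'B', 'C'), (3, 'C', 'D')]
kept = min_movers_rows(rows, threshold=2)
print(kept)
```

In this toy data, firm D has one mover and is dropped in the first pass. That leaves firm C with a single mover, so it is dropped in the second pass, which a one-shot filter would miss.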

min_moves_firms(threshold=2, is_sorted=False, copy=True)

List firms with at least threshold many moves. Note that a single mover can have multiple moves at the same firm.

Parameters

threshold (int) – minimum number of moves required to keep a firm
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

firms with sufficiently many moves

Return type

(NumPy Array)
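To illustrate the difference between moves and movers, the sketch below counts moves per firm in plain Python. It assumes rows are (worker, j1, j2) tuples and that each move counts once at the origin firm and once at the destination firm; the function name is hypothetical, and it returns a sorted list rather than a NumPy array.

```python
from collections import Counter


def firms_with_min_moves(rows, threshold=2):
    """Return firms involved in at least threshold moves (hypothetical helper)."""
    moves = Counter()
    for _, j1, j2 in rows:
        if j1 != j2:
            # Assumption: a move counts at both the origin and the destination
            moves[j1] += 1
            moves[j2] += 1
    return sorted(j for j, n in moves.items() if n >= threshold)


rows = [
    (0, 'A', 'B'),  # worker 0 leaves firm A
    (0, 'B', 'A'),  # worker 0 returns to firm A: a second move at A
    (1, 'C', 'C'),  # no move
    (2, 'B', 'D'),
]
firms = firms_with_min_moves(rows, threshold=2)
print(firms)
```

Firm A meets the threshold of two moves even though both moves belong to the same worker.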

min_moves_frame(threshold=2, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Return dataframe where all firms have at least threshold many moves. Note that a single worker can have multiple moves at the same firm. This method iterates until no further firms are dropped, because dropping firms that don't meet the threshold may lower the number of moves at other firms.

Parameters

threshold (int) – minimum number of moves required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many moves

Return type

(BipartiteEventStudyBase)

min_obs_frame(threshold=2, id_col='j', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe of column ids with at least threshold many observations.

Parameters

threshold (int) – minimum number of observations required to keep an id
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteEventStudyBase)

min_obs_ids(threshold=2, id_col='j', is_sorted=False, copy=True)

List column ids with at least threshold many observations.

Parameters

threshold (int) – minimum number of observations required to keep an id
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

ids with sufficiently many observations

Return type

(NumPy Array)

min_workers_firms(threshold=2, is_sorted=False, copy=True)

List firms with at least threshold many workers.

Parameters

threshold (int) – minimum number of workers required to keep a firm
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

firms with sufficiently many workers

Return type

(NumPy Array)

min_workers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe of firms with at least threshold many workers.

Parameters

threshold (int) – minimum number of workers required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many workers

Return type

(BipartiteEventStudyBase)

to_long(is_clean=True, drop_no_split_columns=True, is_sorted=False, copy=True)

Return (collapsed) event study data reformatted into (collapsed) long form.

Parameters

is_clean (bool) – if True, data is already clean (this ensures that observations that are in two consecutive event studies appear only once, e.g. the event study A -> B, B -> C turns into A -> B -> C; otherwise, it will become A -> B -> B -> C). Set to False if duplicates will be handled manually.
drop_no_split_columns (bool) – if True, drop columns marked by self.col_long_es_dict as None (i.e. columns that should be dropped); if False, these columns will not be dropped
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

long format dataframe generated from event study data

Return type

(BipartiteLongBase)
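The A -> B, B -> C example described for is_clean can be sketched in plain Python. This is a simplified illustration, not the library's conversion: it assumes rows are (worker, j1, j2) tuples containing only moves, sorted by worker and in chronological order, and the function name is hypothetical.

```python
def event_study_to_long(rows, is_clean=True):
    """Unstack (worker, j1, j2) rows into (worker, firm) observations."""
    long_rows = []
    prev_worker = None
    for i, j1, j2 in rows:
        # For consecutive rows of the same worker, the first observation
        # repeats the second observation of the previous row.
        repeated = is_clean and i == prev_worker
        if not repeated:
            long_rows.append((i, j1))
        long_rows.append((i, j2))
        prev_worker = i
    return long_rows


rows = [(0, 'A', 'B'), (0, 'B', 'C'), (1, 'D', 'E')]
deduplicated = event_study_to_long(rows, is_clean=True)
duplicated = event_study_to_long(rows, is_clean=False)
print(deduplicated)
print(duplicated)
```

With is_clean=True, worker 0 yields A, B, C; with is_clean=False, the shared observation at firm B appears twice and is left for manual handling.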