BipartiteEventStudyBase class
- class bipartitepandas.bipartiteeventstudybase.BipartiteEventStudyBase(*args, col_reference_dict=None, **kwargs)
Bases: BipartiteBase
Base class for BipartiteEventStudy and BipartiteEventStudyCollapsed, which give bipartite networks of firms and workers in event study and collapsed event study form, respectively. Contains generalized methods. Inherits from BipartiteBase.
- Parameters
*args – arguments for BipartiteBase
col_reference_dict (dict or None) – maps each general column name to the column(s) associated with it, e.g. {'i': 'i', 'j': ['j1', 'j2']}; None is equivalent to {}
**kwargs – keyword arguments for BipartiteBase
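To make the col_reference_dict convention concrete, the sketch below expands general column names into the concrete columns they stand for. The helper expand_columns is hypothetical and is not part of bipartitepandas; only the shape of the dictionary is taken from the parameter description.

```python
# Hypothetical helper (not part of bipartitepandas): expand general column
# names into the concrete columns they refer to.
col_reference_dict = {'i': 'i', 'j': ['j1', 'j2']}

def expand_columns(general_cols, reference_dict):
    """Return the concrete column names for a list of general column names."""
    expanded = []
    for col in general_cols:
        # Fall back to the name itself when it has no entry in the dictionary
        target = reference_dict.get(col, col)
        if isinstance(target, str):
            expanded.append(target)
        else:
            expanded.extend(target)
    return expanded

concrete = expand_columns(['i', 'j'], col_reference_dict)
print(concrete)  # ['i', 'j1', 'j2']
```

This is why several methods below ask for the general name 'j' rather than 'j1' or 'j2'.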
- clean(params=None)
Clean data so that there are no NaN or duplicate observations, observations where a worker leaves a firm and later returns to it are removed, firms are connected by movers, and categorical ids are contiguous.
- Parameters
params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().
- Returns
dataframe with cleaned data
- Return type
- construct_artificial_time(time_per_worker=False, is_sorted=False, copy=True)
Construct artificial time columns to enable conversion to (collapsed) long format. Columns are added only if time columns are not already included.
- Parameters
time_per_worker (bool) – if True, set time independently for each worker (note that this is significantly more computationally costly)
is_sorted (bool) – set to True if dataframe is already sorted by i (this avoids a sort inside a groupby if time_per_worker=True, but this groupby will not sort the returned dataframe)
copy (bool) – if False, avoid copy
- Returns
dataframe with artificial time columns
- Return type
- diagnostic()
Run diagnostic and print diagnostic report.
- drop_ids(id_col, drop_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)
Drop ids belonging to a given set of ids.
- Parameters
id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)
drop_ids_list (list) – ids to drop
drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and using BipartiteEventStudyCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy
- Returns
dataframe with ids outside the given set
- Return type
- gen_m(force=False, copy=True)
Generate m column for data (m == 0 if stayer, m == 1 if mover).
- Parameters
force (bool) – if True, reset ‘m’ column even if it exists
copy (bool) – if False, avoid copy
- Returns
dataframe with m column
- Return type
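A minimal sketch of the stayer/mover rule described above, in plain Python rather than with the library. It assumes each event study row is classified on its own by comparing j1 and j2; the function add_m and the row layout are illustrative and are not the library's implementation.

```python
# Each record is one event study row: worker i observed at firm j1, then at firm j2.
rows = [
    {'i': 0, 'j1': 0, 'j2': 0},  # same firm in both periods
    {'i': 1, 'j1': 0, 'j2': 1},  # firm changes between periods
    {'i': 2, 'j1': 2, 'j2': 2},
]

def add_m(rows, force=False):
    """Return copies of the rows with m == 1 for movers and m == 0 for stayers."""
    out = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are left untouched
        # Assumption: an existing 'm' value is kept unless force is True
        if force or 'm' not in row:
            row['m'] = int(row['j1'] != row['j2'])
        out.append(row)
    return out

with_m = add_m(rows)
```

Here force plays the same role as in the signature above: an existing m value is only overwritten when force=True.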
- get_cs(copy=True)
Return (collapsed) event study data reformatted into cross section data.
- Parameters
copy (bool) – if False, avoid copy
- Returns
cross section data
- Return type
(Pandas DataFrame)
- keep_ids(id_col, keep_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)
Only keep ids belonging to a given set of ids.
- Parameters
id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)
keep_ids_list (list) – ids to keep
drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and using BipartiteEventStudyCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy
- Returns
dataframe with ids in the given set
- Return type
- keep_rows(rows, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)
Only keep particular rows.
- Parameters
rows (list) – rows to keep
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy
- Returns
dataframe with given rows
- Return type
- min_joint_obs_frame(threshold_1=2, threshold_2=2, id_col_1='j', id_col_2='i', drop_returns_to_stays=False, is_sorted=False, copy=True)
Return dataframe where column 1 ids have at least threshold_1 many observations and column 2 ids have at least threshold_2 many observations.
- Parameters
threshold_1 (int) – minimum number of observations required to keep an id from column 1
threshold_2 (int) – minimum number of observations required to keep an id from column 2
id_col_1 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
id_col_2 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – used for event study format. If False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – used for event study format. If False, avoid copy.
- Returns
dataframe of ids with sufficiently many observations
- Return type
- min_movers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)
Return dataframe where all firms have at least threshold many movers. This method employs loops, as dropping firms that don’t meet the threshold may lower the number of movers at other firms.
- Parameters
threshold (int) – minimum number of movers required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy
- Returns
dataframe of firms with sufficiently many movers
- Return type
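The need for a loop can be seen in a small plain-Python sketch. It assumes rows are (worker, firm before, firm after) tuples and that any row touching a below-threshold firm is dropped outright, which is a simplification; the function names are hypothetical and this is not the library's implementation.

```python
from collections import defaultdict

def movers_per_firm(rows):
    """Map each firm to the set of workers who move into or out of it."""
    movers = defaultdict(set)
    for i, j1, j2 in rows:
        if j1 != j2:
            movers[j1].add(i)
            movers[j2].add(i)
    return movers

def min_movers_rows(rows, threshold):
    """Repeatedly drop rows touching firms with fewer than `threshold` movers."""
    rows = list(rows)
    while True:
        movers = movers_per_firm(rows)
        kept = [
            (i, j1, j2) for i, j1, j2 in rows
            if len(movers[j1]) >= threshold and len(movers[j2]) >= threshold
        ]
        if len(kept) == len(rows):
            # Nothing was dropped in this pass, so every remaining firm qualifies
            return kept
        rows = kept

rows = [(0, 'A', 'B'), (1, 'A', 'B'), (2, 'B', 'C'), (3, 'C', 'D')]
result = min_movers_rows(rows, threshold=2)
print(result)  # [(0, 'A', 'B'), (1, 'A', 'B')]
```

In the first pass only firm D falls short. Removing worker 3's row then leaves firm C with a single mover, so a second pass is needed before the result is stable.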
- min_moves_firms(threshold=2, is_sorted=False, copy=True)
List firms with at least threshold many moves. Note that a single mover can have multiple moves at the same firm.
- Parameters
threshold (int) – minimum number of moves required to keep a firm
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
firms with sufficiently many moves
- Return type
(NumPy Array)
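As a rough illustration of how one worker can contribute several moves to the same firm, the sketch below counts moves from (worker, firm before, firm after) tuples. It assumes a move counts once for the origin firm and once for the destination firm; that counting rule and the function name are assumptions, not taken from the library.

```python
from collections import Counter

def firms_with_min_moves(rows, threshold=2):
    """List firms that appear in at least `threshold` moves."""
    moves = Counter()
    for i, j1, j2 in rows:
        if j1 != j2:
            # Assumption: a move counts for both the origin and the destination
            moves[j1] += 1
            moves[j2] += 1
    return sorted(firm for firm, n in moves.items() if n >= threshold)

# Worker 0 moves A -> B and then B -> C, so firm B sees two moves by one worker
rows = [(0, 'A', 'B'), (0, 'B', 'C'), (1, 'C', 'C')]
firms = firms_with_min_moves(rows, threshold=2)
print(firms)  # ['B']
```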
- min_moves_frame(threshold=2, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)
Return dataframe where all firms have at least threshold many moves. Note that a single worker can have multiple moves at the same firm. This method employs loops, as dropping firms that don’t meet the threshold may lower the number of moves at other firms.
- Parameters
threshold (int) – minimum number of moves required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – not used for event study format
copy (bool) – if False, avoid copy
- Returns
dataframe of firms with sufficiently many moves
- Return type
- min_obs_frame(threshold=2, id_col='j', drop_returns_to_stays=False, is_sorted=False, copy=True)
Return dataframe of column ids with at least threshold many observations.
- Parameters
threshold (int) – minimum number of observations required to keep an id
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
dataframe of ids with sufficiently many observations
- Return type
- min_obs_ids(threshold=2, id_col='j', is_sorted=False, copy=True)
List column ids with at least threshold many observations.
- Parameters
threshold (int) – minimum number of observations required to keep an id
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
ids with sufficiently many observations
- Return type
(NumPy Array)
- min_workers_firms(threshold=2, is_sorted=False, copy=True)
List firms with at least threshold many workers.
- Parameters
threshold (int) – minimum number of workers required to keep a firm
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
firms with sufficiently many workers
- Return type
(NumPy Array)
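Counting workers differs from counting moves in that each worker is counted at most once per firm. The plain-Python sketch below assumes rows are (worker, firm before, firm after) tuples and that a worker belongs to a firm if it appears in either firm column; the function name is hypothetical.

```python
from collections import defaultdict

def firms_with_min_workers(rows, threshold=2):
    """List firms observed with at least `threshold` distinct workers."""
    workers = defaultdict(set)
    for i, j1, j2 in rows:
        # Sets deduplicate, so a worker is counted once per firm
        workers[j1].add(i)
        workers[j2].add(i)
    return sorted(firm for firm, ids in workers.items() if len(ids) >= threshold)

rows = [(0, 'A', 'B'), (1, 'A', 'A'), (2, 'B', 'C')]
firms = firms_with_min_workers(rows, threshold=2)
print(firms)  # ['A', 'B']
```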
- min_workers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, copy=True)
Return dataframe of firms with at least threshold many workers.
- Parameters
threshold (int) – minimum number of workers required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
dataframe of firms with sufficiently many workers
- Return type
- to_long(is_clean=True, drop_no_split_columns=True, is_sorted=False, copy=True)
Return (collapsed) event study data reformatted into (collapsed) long form.
- Parameters
is_clean (bool) – if True, treat the data as already clean, so that an observation shared by two consecutive event studies appears only once (e.g. the event studies A -> B, B -> C become A -> B -> C rather than A -> B -> B -> C). Set to False if duplicates will be handled manually.
drop_no_split_columns (bool) – if True, columns marked by self.col_long_es_dict as None (i.e. they should be dropped) will be dropped
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
long format dataframe generated from event study data
- Return type
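The A -> B, B -> C example for is_clean can be sketched in plain Python. This version tracks only worker and firm and treats a repeated (worker, firm) pair as the shared observation, which is a simplifying assumption; the function event_study_to_long is illustrative and is not the library's implementation.

```python
def event_study_to_long(rows, is_clean=True):
    """Flatten (worker, firm_before, firm_after) rows, sorted by worker and time."""
    long_rows = []
    prev = None  # last (worker, firm) observation emitted
    for i, j1, j2 in rows:
        for obs in ((i, j1), (i, j2)):
            # With is_clean=True, skip an observation that repeats the previous one
            if is_clean and obs == prev:
                continue
            long_rows.append(obs)
            prev = obs
    return long_rows

rows = [(0, 'A', 'B'), (0, 'B', 'C')]
deduplicated = event_study_to_long(rows, is_clean=True)
print(deduplicated)  # [(0, 'A'), (0, 'B'), (0, 'C')]
duplicated = event_study_to_long(rows, is_clean=False)
print(duplicated)  # [(0, 'A'), (0, 'B'), (0, 'B'), (0, 'C')]
```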