BipartiteEventStudyBase class

class bipartitepandas.bipartiteeventstudybase.BipartiteEventStudyBase(*args, col_reference_dict=None, **kwargs)

Bases: BipartiteBase

Base class for BipartiteEventStudy and BipartiteEventStudyCollapsed, which give bipartite networks of firms and workers in event study and collapsed event study form, respectively. Contains generalized methods. Inherits from BipartiteBase.

Parameters
  • *args – arguments for BipartiteBase

  • col_reference_dict (dict or None) – maps each general column name to the column(s) associated with it, e.g. {'i': 'i', 'j': ['j1', 'j2']}; None is equivalent to {}

  • **kwargs – keyword arguments for BipartiteBase
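As an illustration of the general-to-specific column mapping, the following plain-Python sketch expands general column names into the concrete columns they cover. The helper expand_columns is hypothetical and not part of bipartitepandas; it only mirrors the dictionary shape shown in the col_reference_dict description.

```python
def expand_columns(col_reference_dict, general_cols):
    """Hypothetical helper: list the concrete columns behind general column names."""
    concrete = []
    for col in general_cols:
        # Fall back to the name itself if the general column is not in the mapping
        subcols = col_reference_dict.get(col, col)
        if isinstance(subcols, str):
            subcols = [subcols]
        concrete.extend(subcols)
    return concrete


col_reference_dict = {'i': 'i', 'j': ['j1', 'j2']}
expanded = expand_columns(col_reference_dict, ['i', 'j'])
print(expanded)  # ['i', 'j1', 'j2']
```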

clean(params=None)

Clean data so that there are no NaN or duplicate observations, observations where a worker leaves a firm and later returns to it are removed, firms are connected by movers, and categorical ids are contiguous.

Parameters

params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().

Returns

dataframe with cleaned data

Return type

(BipartiteEventStudyBase)
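To make "contiguous ids" concrete, here is a minimal plain-Python sketch that relabels arbitrary ids as 0, 1, 2, ... in order of first appearance. The function make_contiguous is hypothetical and the ordering rule is an assumption; the relabeling that clean() actually performs may differ.

```python
def make_contiguous(ids):
    """Hypothetical helper: relabel ids as 0, 1, 2, ... by first appearance."""
    mapping = {}
    for value in ids:
        # Assign the next unused integer the first time an id is seen
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in ids], mapping


firm_ids = [10, 42, 10, 7]
new_ids, mapping = make_contiguous(firm_ids)
print(new_ids)  # [0, 1, 0, 2]
```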

construct_artificial_time(time_per_worker=False, is_sorted=False, copy=True)

Construct artificial time columns to enable conversion to (collapsed) long format. Only adds columns if time columns not already included.

Parameters
  • time_per_worker (bool) – if True, set time independently for each worker (note that this is significantly more computationally costly)

  • is_sorted (bool) – set to True if dataframe is already sorted by i (this avoids a sort inside a groupby if time_per_worker=True, but this groupby will not sort the returned dataframe)

  • copy (bool) – if False, avoid copy

Returns

dataframe with artificial time columns

Return type

(BipartiteEventStudyBase)

diagnostic()

Run diagnostic and print diagnostic report.

drop_ids(id_col, drop_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Drop ids belonging to a given set of ids.

Parameters
  • id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)

  • drop_ids_list (list) – ids to drop

  • drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and using BipartiteEventStudyCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – not used for event study format

  • copy (bool) – if False, avoid copy

Returns

dataframe with ids outside the given set

Return type

(BipartiteEventStudyBase)

gen_m(force=False, copy=True)

Generate m column for data (m == 0 if stayer, m == 1 if mover).

Parameters
  • force (bool) – if True, reset ‘m’ column even if it exists

  • copy (bool) – if False, avoid copy

Returns

dataframe with m column

Return type

(BipartiteEventStudyBase)
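The stayer/mover flag can be illustrated on toy event study rows. This plain-Python sketch assumes each row stores the origin and destination firm in j1 and j2 and that a row is flagged as a mover row when the two differ; the function name and row layout are illustrative, and the library's own computation may differ.

```python
def flag_movers(rows):
    """Return copies of the rows with an 'm' key: 1 if the firm changes, else 0."""
    return [{**row, 'm': int(row['j1'] != row['j2'])} for row in rows]


rows = [
    {'i': 0, 'j1': 'A', 'j2': 'B'},  # worker 0 moves from firm A to firm B
    {'i': 1, 'j1': 'C', 'j2': 'C'},  # worker 1 stays at firm C
]
flagged = flag_movers(rows)
print([row['m'] for row in flagged])  # [1, 0]
```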

get_cs(copy=True)

Return (collapsed) event study data reformatted into cross section data.

Parameters

copy (bool) – if False, avoid copy

Returns

cross section data

Return type

(Pandas DataFrame)

keep_ids(id_col, keep_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Only keep ids belonging to a given set of ids.

Parameters
  • id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)

  • keep_ids_list (list) – ids to keep

  • drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and using BipartiteEventStudyCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – not used for event study format

  • copy (bool) – if False, avoid copy

Returns

dataframe with ids in the given set

Return type

(BipartiteEventStudyBase)

keep_rows(rows, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Only keep particular rows.

Parameters
  • rows (list) – rows to keep

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – not used for event study format

  • copy (bool) – if False, avoid copy

Returns

dataframe with given rows

Return type

(BipartiteEventStudyBase)

min_joint_obs_frame(threshold_1=2, threshold_2=2, id_col_1='j', id_col_2='i', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe where column 1 ids have at least threshold_1 many observations and column 2 ids have at least threshold_2 many observations.

Parameters
  • threshold_1 (int) – minimum number of observations required to keep an id from column 1

  • threshold_2 (int) – minimum number of observations required to keep an id from column 2

  • id_col_1 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • id_col_2 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)

  • is_sorted (bool) – used for event study format. If False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – used for event study format. If False, avoid copy.

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteEventStudyBase)

min_movers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Return dataframe where all firms have at least threshold many movers. This method employs loops, as dropping firms that don’t meet the threshold may lower the number of movers at other firms.

Parameters
  • threshold (int) – minimum number of movers required to keep a firm

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – not used for event study format

  • copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many movers

Return type

(BipartiteEventStudyBase)
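The reason a loop is needed can be seen in a small plain-Python sketch. It is not the library's implementation: the function min_movers_rows is hypothetical, rows are plain dicts with keys i, j1 and j2, and a mover is assumed to count toward both the origin and the destination firm.

```python
def min_movers_rows(rows, threshold):
    """Iteratively keep rows whose firms all have at least `threshold` movers."""
    rows = list(rows)
    while True:
        # Collect the set of distinct movers seen at each firm
        movers = {}
        for row in rows:
            if row['j1'] != row['j2']:
                movers.setdefault(row['j1'], set()).add(row['i'])
                movers.setdefault(row['j2'], set()).add(row['i'])
        keep = {firm for firm, workers in movers.items() if len(workers) >= threshold}
        kept = [row for row in rows if row['j1'] in keep and row['j2'] in keep]
        # Stop once a pass drops nothing; otherwise recount on the smaller data
        if len(kept) == len(rows):
            return kept
        rows = kept


rows = [
    {'i': 0, 'j1': 'A', 'j2': 'B'},
    {'i': 1, 'j1': 'A', 'j2': 'B'},
    {'i': 2, 'j1': 'B', 'j2': 'C'},
    {'i': 3, 'j1': 'C', 'j2': 'D'},
    {'i': 4, 'j1': 'A', 'j2': 'A'},  # stayer at firm A
]
result = min_movers_rows(rows, threshold=2)
print([row['i'] for row in result])  # [0, 1, 4]
```

In this toy data, firm D fails the threshold on the first pass. Dropping worker 3's row then leaves firm C with a single mover, so C is dropped on the second pass, which is the cascade the method description refers to.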

min_moves_firms(threshold=2, is_sorted=False, copy=True)

List firms with at least threshold many moves. Note that a single mover can have multiple moves at the same firm.

Parameters
  • threshold (int) – minimum number of moves required to keep a firm

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

firms with sufficiently many moves

Return type

(NumPy Array)
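The difference between moves and movers can be sketched in plain Python. This is an illustration under stated assumptions, not the library's counting rule: the function count_moves_and_movers is hypothetical, and each firm-changing row is assumed to count as one move for both the origin and the destination firm.

```python
from collections import Counter


def count_moves_and_movers(rows):
    """Count moves and distinct movers per firm from (i, j1, j2) rows."""
    moves = Counter()
    movers = {}
    for row in rows:
        if row['j1'] != row['j2']:
            for firm in (row['j1'], row['j2']):
                moves[firm] += 1
                movers.setdefault(firm, set()).add(row['i'])
    return moves, {firm: len(workers) for firm, workers in movers.items()}


rows = [
    {'i': 0, 'j1': 'A', 'j2': 'B'},  # worker 0 arrives at firm B
    {'i': 0, 'j1': 'B', 'j2': 'C'},  # worker 0 leaves firm B
]
moves, movers = count_moves_and_movers(rows)
print(moves['B'], movers['B'])  # 2 1
```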

min_moves_frame(threshold=2, drop_returns_to_stays=False, is_sorted=False, reset_index=False, copy=True)

Return dataframe where all firms have at least threshold many moves. Note that a single worker can have multiple moves at the same firm. This method employs loops, as dropping firms that don’t meet the threshold may lower the number of moves at other firms.

Parameters
  • threshold (int) – minimum number of moves required to keep a firm

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – not used for event study format

  • copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many moves

Return type

(BipartiteEventStudyBase)

min_obs_frame(threshold=2, id_col='j', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe of column ids with at least threshold many observations.

Parameters
  • threshold (int) – minimum number of observations required to keep an id

  • id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteEventStudyBase)

min_obs_ids(threshold=2, id_col='j', is_sorted=False, copy=True)

List column ids with at least threshold many observations.

Parameters
  • threshold (int) – minimum number of observations required to keep an id

  • id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

ids with sufficiently many observations

Return type

(NumPy Array)

min_workers_firms(threshold=2, is_sorted=False, copy=True)

List firms with at least threshold many workers.

Parameters
  • threshold (int) – minimum number of workers required to keep a firm

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

firms with sufficiently many workers

Return type

(NumPy Array)

min_workers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe of firms with at least threshold many workers.

Parameters
  • threshold (int) – minimum number of workers required to keep a firm

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many workers

Return type

(BipartiteEventStudyBase)

to_long(is_clean=True, drop_no_split_columns=True, is_sorted=False, copy=True)

Return (collapsed) event study data reformatted into (collapsed) long form.

Parameters
  • is_clean (bool) – if True, data is already clean (this ensures that observations that are in two consecutive event studies appear only once, e.g. the event study A -> B, B -> C turns into A -> B -> C; otherwise, it will become A -> B -> B -> C). Set to False if duplicates will be handled manually.

  • drop_no_split_columns (bool) – if False, columns marked by self.col_long_es_dict as None (i.e. they should be dropped) will not be dropped

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

long format dataframe generated from event study data

Return type

(BipartiteLongBase)
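The effect of is_clean on consecutive event studies can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the function event_study_to_long is hypothetical, time and outcome columns are omitted, and rows are assumed to be sorted by worker.

```python
def event_study_to_long(rows, is_clean=True):
    """Unstack (i, j1, j2) rows into a list of (i, j) observations."""
    long_rows = []
    for row in rows:
        for firm in (row['j1'], row['j2']):
            obs = (row['i'], firm)
            # With is_clean=True, skip an observation that repeats the previous
            # one (the shared endpoint of two consecutive event studies)
            if is_clean and long_rows and long_rows[-1] == obs:
                continue
            long_rows.append(obs)
    return long_rows


rows = [
    {'i': 0, 'j1': 'A', 'j2': 'B'},
    {'i': 0, 'j1': 'B', 'j2': 'C'},
]
deduplicated = event_study_to_long(rows, is_clean=True)
duplicated = event_study_to_long(rows, is_clean=False)
print([firm for _, firm in deduplicated])  # ['A', 'B', 'C']
print([firm for _, firm in duplicated])    # ['A', 'B', 'B', 'C']
```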