BipartiteLongBase class

class bipartitepandas.bipartitelongbase.BipartiteLongBase(*args, col_reference_dict=None, **kwargs)

Bases: BipartiteBase

Base class for BipartiteLong and BipartiteLongCollapsed, which represent a bipartite network of firms and workers in long form and collapsed long form, respectively. Contains the methods shared by both formats.

Parameters
  • *args – arguments for BipartiteBase

  • col_reference_dict (dict or None) – clarifies which columns are associated with a general column name, e.g. {'i': 'i', 'j': ['j1', 'j2']}; None is equivalent to {}

  • **kwargs – keyword arguments for BipartiteBase

clean(params=None)

Clean data so that there are no NaN or duplicate observations, observations where a worker leaves a firm and later returns to it are removed, firms are connected by movers, and categorical ids are contiguous.

Parameters

params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().

Returns

dataframe with cleaned data

Return type

(BipartiteLongBase)
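As an illustration of the last cleaning step only, the sketch below relabels arbitrary ids as contiguous integers starting at 0. It is not the library's implementation: the helper name make_contiguous and the choice to number ids in order of first appearance are assumptions made for this example.

```python
def make_contiguous(ids):
    """Relabel ids as 0, 1, 2, ... in order of first appearance (hypothetical helper)."""
    mapping = {}
    for x in ids:
        # A new id gets the next unused integer; a known id keeps its label.
        mapping.setdefault(x, len(mapping))
    return [mapping[x] for x in ids], mapping


firm_ids = [10, 42, 10, 7]
contiguous_ids, mapping = make_contiguous(firm_ids)
```

Keeping the mapping around makes it possible to translate results back to the original ids.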

construct_artificial_time(time_per_worker=False, is_sorted=False, copy=True)

Construct artificial time column(s) to enable conversion to (collapsed) event study format. Only adds column(s) if time column(s) not already included.

Parameters
  • time_per_worker (bool) – if True, set time independently for each worker (note that this is significantly more computationally costly)

  • is_sorted (bool) – set to True if dataframe is already sorted by i. If time_per_worker=True, this avoids a sort inside the groupby; note that the groupby does not sort the returned dataframe.

  • copy (bool) – if False, avoid copy

Returns

dataframe with artificial time column(s)

Return type

(BipartiteLongBase)
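The difference between a shared clock and a per-worker clock can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper name artificial_time and the assumption that periods follow row order are introduced here for the example.

```python
def artificial_time(worker_ids, time_per_worker=False):
    """Return one artificial period per observation (hypothetical helper).

    worker_ids: worker id of each observation, in row order.
    """
    if not time_per_worker:
        # One shared clock: observation k simply gets period k.
        return list(range(len(worker_ids)))
    # Independent clocks: each worker's observations are numbered 0, 1, 2, ...
    next_period = {}
    periods = []
    for i in worker_ids:
        t = next_period.get(i, 0)
        periods.append(t)
        next_period[i] = t + 1
    return periods


workers = [0, 0, 1, 1, 1]
shared = artificial_time(workers)
per_worker = artificial_time(workers, time_per_worker=True)
```

The per-worker variant needs bookkeeping for every worker, which is in line with the note that time_per_worker=True is more costly.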

drop_ids(id_col, drop_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Drop ids belonging to a given set of ids.

Parameters
  • id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)

  • drop_ids_list (list) – ids to drop

  • drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and the data is in BipartiteLongCollapsed format. If True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – if True, reset index at end

  • copy (bool) – if False, avoid copy

Returns

dataframe with ids outside the given set

Return type

(BipartiteLongBase)

gen_m(force=False, copy=True)

Generate m column for data (m == 0 if stayer, m == 1 or 2 if mover).

Parameters
  • force (bool) – if True, reset ‘m’ column even if it exists

  • copy (bool) – if False, avoid copy

Returns

dataframe with m column

Return type

(BipartiteLongBase)
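One reading that is consistent with the values 0, 1 and 2 is that m counts, for each observation, how many of the same worker's neighbouring observations (previous and next) are at a different firm. The sketch below implements that reading; it is an assumption made for illustration, and gen_m_sketch is a hypothetical helper, not the library's method.

```python
def gen_m_sketch(rows):
    """rows: list of (worker, firm) tuples sorted by worker, then time."""
    m = []
    for k, (i, j) in enumerate(rows):
        moves = 0
        # Previous observation of the same worker at a different firm?
        if k > 0 and rows[k - 1][0] == i and rows[k - 1][1] != j:
            moves += 1
        # Next observation of the same worker at a different firm?
        if k + 1 < len(rows) and rows[k + 1][0] == i and rows[k + 1][1] != j:
            moves += 1
        m.append(moves)
    return m


rows = [(0, "a"), (0, "a"), (1, "a"), (1, "b"), (1, "a")]
m = gen_m_sketch(rows)
```

Under this reading, worker 0 never changes firm and gets m == 0 throughout, while worker 1's middle observation has a move on both sides and gets m == 2.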

keep_ids(id_col, keep_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Only keep ids belonging to a given set of ids.

Parameters
  • id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)

  • keep_ids_list (list) – ids to keep

  • drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and the data is in BipartiteLongCollapsed format. If True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – if True, reset index at end

  • copy (bool) – if False, avoid copy

Returns

dataframe with ids in the given set

Return type

(BipartiteLongBase)
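keep_ids and drop_ids are complements of each other: for the same id list, every row ends up in exactly one of the two results. The sketch below shows only this filtering logic on a list of dicts. The helper names are hypothetical, and the recollapsing, sorting and index handling that the real methods perform are deliberately left out.

```python
def keep_ids_sketch(rows, id_col, keep_ids_list):
    """Keep rows whose id in id_col is in keep_ids_list (hypothetical helper)."""
    keep = set(keep_ids_list)
    return [row for row in rows if row[id_col] in keep]


def drop_ids_sketch(rows, id_col, drop_ids_list):
    """Keep rows whose id in id_col is NOT in drop_ids_list (hypothetical helper)."""
    drop = set(drop_ids_list)
    return [row for row in rows if row[id_col] not in drop]


rows = [
    {"i": 0, "j": 10},
    {"i": 0, "j": 11},
    {"i": 1, "j": 10},
    {"i": 2, "j": 12},
]
after_keep = keep_ids_sketch(rows, "j", [10, 12])
after_drop = drop_ids_sketch(rows, "j", [10, 12])
```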

keep_rows(rows_list, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Only keep particular rows.

Parameters
  • rows_list (list) – rows to keep

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – if True, reset index at end

  • copy (bool) – if False, avoid copy

Returns

dataframe with given rows

Return type

(BipartiteLongBase)

min_joint_obs_frame(threshold_1=2, threshold_2=2, id_col_1='j', id_col_2='i', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe where column 1 ids have at least threshold_1 many observations and column 2 ids have at least threshold_2 many observations.

Parameters
  • threshold_1 (int) – minimum number of observations required to keep an id from column 1

  • threshold_2 (int) – minimum number of observations required to keep an id from column 2

  • id_col_1 (str) – first column of ids to check (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • id_col_2 (str) – second column of ids to check (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – used for event study format. If False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – used for event study format. If False, avoid copy.

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteLongBase)

min_movers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Return dataframe where all firms have at least threshold many movers. This method iterates until no more firms are dropped, because dropping firms that don’t meet the threshold may lower the number of movers at other firms.

Parameters
  • threshold (int) – minimum number of movers required to keep a firm

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – if True, reset index at end

  • copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many movers

Return type

(BipartiteLongBase)
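The reason a loop is needed can be seen in a small stdlib-only sketch. It is not the library's implementation: min_movers_sketch is a hypothetical helper, and it assumes that a mover is a worker observed at more than one firm and that a firm's mover count is the number of distinct movers observed there.

```python
from collections import defaultdict


def min_movers_sketch(rows, threshold):
    """rows: list of (worker, firm). Keep only firms with >= threshold movers."""
    rows = list(rows)
    while True:
        # A worker is a mover if observed at more than one remaining firm.
        firms_of = defaultdict(set)
        for i, j in rows:
            firms_of[i].add(j)
        movers = {i for i, firms in firms_of.items() if len(firms) > 1}
        # Count distinct movers at each firm.
        movers_at = defaultdict(set)
        for i, j in rows:
            if i in movers:
                movers_at[j].add(i)
        keep = {j for j, ws in movers_at.items() if len(ws) >= threshold}
        kept_rows = [(i, j) for i, j in rows if j in keep]
        if len(kept_rows) == len(rows):
            # Nothing was dropped in this pass, so the result is stable.
            return kept_rows
        rows = kept_rows


rows = [
    (0, "a"), (0, "b"),
    (1, "a"), (1, "b"),
    (2, "b"), (2, "c"),
    (3, "c"), (3, "d"),
]
result = min_movers_sketch(rows, threshold=2)
remaining_firms = sorted({j for _, j in result})
```

In this data, firm d fails the threshold in the first pass. Dropping it turns worker 3 into a stayer, which leaves firm c with a single mover, so c is dropped in the second pass. A single-pass filter would have kept c.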

min_moves_firms(threshold=2)

List firms with at least threshold many moves. Note that a single mover can have multiple moves at the same firm.

Parameters

threshold (int) – minimum number of moves required to keep a firm

Returns

firms with sufficiently many moves

Return type

(NumPy Array)
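The note about a single mover contributing several moves at one firm can be made concrete with a small sketch. It assumes that each change of firm between two consecutive observations of a worker counts as one move for both the origin and the destination firm. That counting rule and the helper names are assumptions made for illustration, not the library's definition.

```python
from collections import Counter


def count_moves(rows):
    """rows: list of (worker, firm) sorted by worker, then time."""
    moves = Counter()
    for (i_prev, j_prev), (i_next, j_next) in zip(rows, rows[1:]):
        # Only consecutive observations of the same worker can form a move.
        if i_prev == i_next and j_prev != j_next:
            moves[j_prev] += 1
            moves[j_next] += 1
    return moves


def min_moves_firms_sketch(rows, threshold=2):
    """Firms with at least `threshold` moves (hypothetical helper)."""
    return sorted(j for j, n in count_moves(rows).items() if n >= threshold)


rows = [(0, "a"), (0, "b"), (0, "a"), (1, "b"), (1, "c")]
moves = count_moves(rows)
firms = min_moves_firms_sketch(rows, threshold=2)
```

Worker 0 leaves firm a and later returns to it, so firm a reaches two moves with only one mover.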

min_moves_frame(threshold=2, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Return dataframe where all firms have at least threshold many moves. Note that a single worker can have multiple moves at the same firm. This method iterates until no more firms are dropped, because dropping firms that don’t meet the threshold may lower the number of moves at other firms.

Parameters
  • threshold (int) – minimum number of moves required to keep a firm

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • reset_index (bool) – if True, reset index at end

  • copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many moves

Return type

(BipartiteLongBase)

min_obs_frame(threshold=2, id_col='j', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe restricted to the ids in id_col that have at least threshold many observations.

Parameters
  • threshold (int) – minimum number of observations required to keep an id

  • id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteLongBase)

min_obs_ids(threshold=2, id_col='j', is_sorted=False, copy=True)

List the ids in id_col that have at least threshold many observations.

Parameters
  • threshold (int) – minimum number of observations required to keep an id

  • id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.

  • is_sorted (bool) – not used for long format

  • copy (bool) – not used for long format

Returns

ids with sufficiently many observations

Return type

(NumPy Array)

min_workers_firms(threshold=15, is_sorted=False, copy=True)

List firms with at least threshold many workers.

Parameters
  • threshold (int) – minimum number of workers required to keep a firm

  • is_sorted (bool) – not used for long format

  • copy (bool) – not used for long format

Returns

firms with sufficiently many workers

Return type

(NumPy Array)
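Counting workers is not the same as counting observations, which is what min_obs_ids does: a firm where one worker is observed three times has three observations but only one worker. The sketch below contrasts the two counts in plain Python. The helper names are hypothetical and the code only illustrates the distinction, not the library's implementation.

```python
from collections import Counter, defaultdict


def obs_per_firm(rows):
    """Number of observations at each firm; rows are (worker, firm) tuples."""
    return Counter(j for _, j in rows)


def workers_per_firm(rows):
    """Number of distinct workers at each firm."""
    workers = defaultdict(set)
    for i, j in rows:
        workers[j].add(i)
    return {j: len(ws) for j, ws in workers.items()}


def at_least(counts, threshold):
    """Ids whose count meets the threshold, in sorted order."""
    return sorted(k for k, n in counts.items() if n >= threshold)


rows = [(0, "a"), (0, "a"), (0, "a"), (1, "b"), (2, "b")]
firms_by_obs = at_least(obs_per_firm(rows), threshold=2)
firms_by_workers = at_least(workers_per_firm(rows), threshold=2)
```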

min_workers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe of firms with at least threshold many workers.

Parameters
  • threshold (int) – minimum number of workers required to keep a firm

  • drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that would need to be recollapsed instead of collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.

  • is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many workers

Return type

(BipartiteLongBase)

to_eventstudy(move_to_worker=False, is_sorted=False, copy=True)

Return (collapsed) long form data reformatted into (collapsed) event study data.

Parameters
  • move_to_worker (bool) – if True, each move is treated as a new worker

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

event study dataframe

Return type

(BipartiteEventStudyBase)
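The basic reshaping idea can be sketched as pairing each observation of a worker with that worker's next observation, so that one row carries both firms as j1 and j2. This is a simplified illustration under stated assumptions: to_eventstudy_sketch is a hypothetical helper, a worker with a single observation produces no row here, and move_to_worker is not modelled. How the library treats those cases is not described in this section.

```python
def to_eventstudy_sketch(rows):
    """rows: list of (worker, firm) sorted by worker, then time."""
    by_worker = {}
    for i, j in rows:
        by_worker.setdefault(i, []).append(j)
    events = []
    for i, firms in by_worker.items():
        # Pair every observation with the worker's next observation.
        for j1, j2 in zip(firms, firms[1:]):
            events.append({"i": i, "j1": j1, "j2": j2})
    return events


rows = [(0, "a"), (0, "b"), (0, "c"), (1, "a"), (1, "a")]
events = to_eventstudy_sketch(rows)
```

The pairing relies on the input being sorted by worker and time, which is why the real method sorts unless is_sorted=True.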

to_extendedeventstudy(periods_pre=2, periods_post=2, stable_pre=None, stable_post=None, transition_col=None, move_to_worker=True, is_sorted=False, copy=True)

Return (collapsed) long form data reformatted into (collapsed) extended event study data.

Parameters
  • periods_pre (int) – number of periods each event study will include before a transition (if a transition is not specified, any new observation is considered a transition)

  • periods_post (int) – number of periods each event study will include after a transition (if a transition is not specified, any new observation is considered a transition)

  • stable_pre (str or list of str or None) – column name or list of column names, where each event study should be kept only if the values in all listed columns are constant before the transition; None is equivalent to []

  • stable_post (str or list of str or None) – column name or list of column names, where each event study should be kept only if the values in all listed columns are constant after the transition; None is equivalent to []

  • transition_col (str or None) – column to use to define a transition; if None, any new observation is considered a transition

  • move_to_worker (bool) – if True, each move is treated as a new worker

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

extended event study dataframe

Return type

(BipartiteExtendedEventStudyBase)
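The roles of periods_pre, periods_post and the stability filters can be sketched for a single worker's sequence of observations. The sketch makes several assumptions that this section does not confirm: the transition observation is taken to be the first of the post-transition periods, windows that do not fit inside the worker's history are skipped, and the helper names are hypothetical.

```python
def extended_windows(values, periods_pre=2, periods_post=2, transitions=None):
    """Split one worker's time-ordered values into (pre, post) windows.

    transitions: indices that count as transitions; None means every index,
    mirroring "any new observation is considered a transition".
    """
    if transitions is None:
        transitions = range(len(values))
    windows = []
    for k in transitions:
        start, stop = k - periods_pre, k + periods_post
        # Keep the window only if it lies fully inside the worker's history.
        if start >= 0 and stop <= len(values):
            windows.append((values[start:k], values[k:stop]))
    return windows


def is_stable(segment):
    """True if all values in the segment are identical."""
    return len(set(segment)) <= 1


firms = ["a", "a", "b", "b", "c"]
all_windows = extended_windows(firms)
stable_windows = [
    (pre, post) for pre, post in all_windows if is_stable(pre) and is_stable(post)
]
```

Here the stability check plays the role of stable_pre and stable_post applied to the firm column: only the window around the move from a to b has constant values on both sides.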