BipartiteLongBase class

class bipartitepandas.bipartitelongbase.BipartiteLongBase(*args, col_reference_dict=None, **kwargs)

Bases: BipartiteBase

Base class for BipartiteLong and BipartiteLongCollapsed, which represent a bipartite network of firms and workers in long and collapsed long form, respectively. Contains methods shared by both formats. Inherits from BipartiteBase.

Parameters

*args – arguments for BipartiteBase
col_reference_dict (dict or None) – specifies which columns are associated with each general column name, e.g. {‘i’: ‘i’, ‘j’: [‘j1’, ‘j2’]}; None is equivalent to {}
**kwargs – keyword arguments for BipartiteBase

clean(params=None)

Clean data to make sure that there are no NaN or duplicate observations, that observations where workers leave a firm and then return to it are removed, that firms are connected by movers, and that categorical ids are contiguous.

Parameters: params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().
Returns: dataframe with cleaned data
Return type: (BipartiteLongBase)
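The following pure-Python sketch illustrates three of the cleaning steps listed above (dropping NaN observations, dropping duplicates, and relabeling ids contiguously) on a list of (i, j, y, t) tuples. It is not the library's implementation: the tuple layout and the helper names make_contiguous and clean_sketch are hypothetical, and the sketch skips the handling of returns and the connectedness check.

```python
import math


def make_contiguous(values):
    """Map arbitrary ids to 0..n-1 in order of first appearance (hypothetical helper)."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values]


def clean_sketch(rows):
    """rows: list of (i, j, y, t) tuples. Returns a cleaned list of tuples."""
    # Drop observations that contain a NaN value.
    rows = [r for r in rows if not any(isinstance(v, float) and math.isnan(v) for v in r)]
    # Drop exact duplicate observations, keeping the first occurrence.
    seen, deduped = set(), []
    for r in rows:
        if r not in seen:
            seen.add(r)
            deduped.append(r)
    # Relabel worker ids (position 0) and firm ids (position 1) contiguously.
    new_i = make_contiguous([r[0] for r in deduped])
    new_j = make_contiguous([r[1] for r in deduped])
    return [(a, b) + r[2:] for a, b, r in zip(new_i, new_j, deduped)]
```

Ids are relabeled in order of first appearance here; the library may assign contiguous ids in a different order.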

construct_artificial_time(time_per_worker=False, is_sorted=False, copy=True)

Construct artificial time column(s) to enable conversion to (collapsed) event study format. Only adds column(s) if time column(s) are not already included.

Parameters

time_per_worker (bool) – if True, set time independently for each worker (note that this is significantly more computationally costly)
is_sorted (bool) – set to True if dataframe is already sorted by i (if time_per_worker=True, this avoids a sort inside a groupby, but the groupby will then not sort the returned dataframe)
copy (bool) – if False, avoid copy

Returns

dataframe with artificial time column(s)

Return type

(BipartiteLongBase)
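A minimal sketch of the two time_per_worker modes, assuming that the artificial time is simply an observation counter: one counter over all rows when time_per_worker=False, and a separate counter per worker when time_per_worker=True. The function artificial_time is hypothetical and works on a plain list of worker ids rather than on a BipartiteLongBase.

```python
def artificial_time(worker_ids, time_per_worker=False):
    """Return an artificial period for each row (hypothetical sketch)."""
    if not time_per_worker:
        # One global counter: period = row position.
        return list(range(len(worker_ids)))
    # Separate counter per worker, following the existing row order.
    counts = {}
    periods = []
    for i in worker_ids:
        periods.append(counts.get(i, 0))
        counts[i] = counts.get(i, 0) + 1
    return periods
```

In this sketch the per-worker counter does not require rows to be sorted by i; within each worker it follows the existing row order, which is why the per-worker mode is more work than a single global counter.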

drop_ids(id_col, drop_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Drop ids belonging to a given set of ids.

Parameters

id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)
drop_ids_list (list) – ids to drop
drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and the data is in BipartiteLongCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).
is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is in collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – if True, reset index at end
copy (bool) – if False, avoid copy

Returns

dataframe with ids outside the given set

Return type

(BipartiteLongBase)
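In collapsed long format, dropping a firm can leave two adjacent spells of the same worker at the same firm (for example, a worker at firms A, B, A who loses firm B). The sketch below shows recollapsing with drop_returns_to_stays=False, under the assumption that recollapsing merges such adjacent spells by summing weights w and taking the weighted mean of y. The spell layout (i, j, y, w) and the function name are hypothetical; with drop_returns_to_stays=True the library would instead drop the observations that need to be recollapsed, which this sketch does not cover.

```python
def drop_firms_and_recollapse(spells, drop_firms):
    """spells: (i, j, y, w) tuples sorted by worker, then time (hypothetical layout)."""
    kept = [s for s in spells if s[1] not in drop_firms]
    out = []
    for i, j, y, w in kept:
        if out and out[-1][0] == i and out[-1][1] == j:
            # Same worker at the same firm as the previous surviving spell: merge.
            _, _, y_prev, w_prev = out[-1]
            w_new = w_prev + w
            out[-1] = (i, j, (y_prev * w_prev + y * w) / w_new, w_new)
        else:
            out.append((i, j, y, w))
    return out
```

For example, dropping firm B from [(0, "A", 1.0, 2), (0, "B", 5.0, 1), (0, "A", 4.0, 1), (1, "B", 3.0, 3)] leaves worker 0 with a single recollapsed spell (0, "A", 2.0, 3) and removes worker 1 entirely.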

gen_m(force=False, copy=True)

Generate m column for data (m == 0 if stayer, m == 1 or 2 if mover).

Parameters

force (bool) – if True, reset ‘m’ column even if it exists
copy (bool) – if False, avoid copy

Returns

dataframe with m column

Return type

(BipartiteLongBase)
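One way to compute m that is consistent with the 0/1/2 range described above, assuming rows are sorted by worker and then time: m counts how many of a row's neighbors (the previous and the next observation of the same worker) are at a different firm. This rule is an assumption used for illustration, and gen_m_sketch is a hypothetical name.

```python
def gen_m_sketch(worker_ids, firm_ids):
    """Return m for each row: 0 for stayers, 1 or 2 for movers (assumed rule)."""
    n = len(worker_ids)
    m = []
    for k in range(n):
        same_prev = k > 0 and worker_ids[k - 1] == worker_ids[k]
        same_next = k < n - 1 and worker_ids[k + 1] == worker_ids[k]
        # A move into this row's firm, and a move out of it, each add 1.
        moved_in = same_prev and firm_ids[k - 1] != firm_ids[k]
        moved_out = same_next and firm_ids[k + 1] != firm_ids[k]
        m.append(int(moved_in) + int(moved_out))
    return m
```

With workers [0, 0, 0, 1, 1] at firms [A, B, C, D, D], the middle observation of worker 0 gets m == 2 (moved in and out), its first and last observations get m == 1, and worker 1, a stayer, gets m == 0 throughout.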

keep_ids(id_col, keep_ids_list, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Only keep ids belonging to a given set of ids.

Parameters

id_col (str) – column of ids to consider (‘i’, ‘j’, or ‘g’)
keep_ids_list (list) – ids to keep
drop_returns_to_stays (bool) – used only if id_col is ‘j’ or ‘g’ and the data is in BipartiteLongCollapsed format. If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer).
is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is in collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – if True, reset index at end
copy (bool) – if False, avoid copy

Returns

dataframe with ids in the given set

Return type

(BipartiteLongBase)
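For plain (non-collapsed) long data, keeping a set of ids is the complement of dropping every other id. The hypothetical helpers below work on tuples and take a column position col instead of a column name such as ‘j’; they omit the recollapsing step that collapsed data would require.

```python
def keep_ids_sketch(rows, col, ids):
    """Keep rows whose value at position col is in ids (hypothetical helper)."""
    ids = set(ids)
    return [r for r in rows if r[col] in ids]


def drop_ids_sketch(rows, col, ids):
    """Drop rows whose value at position col is in ids (hypothetical helper)."""
    ids = set(ids)
    return [r for r in rows if r[col] not in ids]
```

With rows [(0, "A"), (0, "B"), (1, "A"), (2, "C")], keeping firms {"A", "C"} gives the same result as dropping firm {"B"}.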

keep_rows(rows_list, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Only keep particular rows.

Parameters

rows_list (list) – rows to keep
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is in collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – if True, reset index at end
copy (bool) – if False, avoid copy

Returns

dataframe with given rows

Return type

(BipartiteLongBase)

min_joint_obs_frame(threshold_1=2, threshold_2=2, id_col_1='j', id_col_2='i', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe where column 1 ids have at least threshold_1 many observations and column 2 ids have at least threshold_2 many observations.

Parameters

threshold_1 (int) – minimum number of observations required to keep an id from column 1
threshold_2 (int) – minimum number of observations required to keep an id from column 2
id_col_1 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
id_col_2 (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – used for event study format. If False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – used for event study format. If False, avoid copy.

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteLongBase)
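A single-pass sketch of the joint threshold on plain (i, j) tuples: count observations per id in each column, then keep rows whose ids meet both thresholds. The defaults col_1=1 (firm) and col_2=0 (worker) mirror id_col_1=‘j’ and id_col_2=‘i’; the function name and tuple layout are hypothetical.

```python
from collections import Counter


def min_joint_obs_rows(rows, threshold_1=2, threshold_2=2, col_1=1, col_2=0):
    """Keep rows whose col_1 id and col_2 id both have enough observations."""
    counts_1 = Counter(r[col_1] for r in rows)
    counts_2 = Counter(r[col_2] for r in rows)
    return [r for r in rows
            if counts_1[r[col_1]] >= threshold_1 and counts_2[r[col_2]] >= threshold_2]
```

With rows [(0, "A"), (0, "B"), (1, "A"), (1, "A"), (2, "B")], worker 2 is dropped, after which firm B is left with only one observation, below threshold_1 = 2. The description above does not say whether the method repeats the filter until both thresholds hold jointly, so this sketch covers only the one-pass case.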

min_movers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Return dataframe where all firms have at least threshold many movers. This method employs loops, as dropping firms that don’t meet the threshold may lower the number of movers at other firms.

Parameters

threshold (int) – minimum number of movers required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is in collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – if True, reset index at end
copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many movers

Return type

(BipartiteLongBase)
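The loop described above can be sketched as a fixed-point iteration on plain (i, j) tuples: count movers per firm, drop firms below the threshold, and repeat until nothing changes. This assumes a mover is a worker observed at two or more distinct firms; the function names are hypothetical and this is not the library's code.

```python
def movers_per_firm(rows):
    """Count distinct movers (workers seen at 2+ firms) at each firm."""
    firms_by_worker = {}
    for i, j in rows:
        firms_by_worker.setdefault(i, set()).add(j)
    counts = {}
    for firms in firms_by_worker.values():
        if len(firms) > 1:  # this worker is a mover
            for j in firms:
                counts[j] = counts.get(j, 0) + 1
    return counts


def min_movers_rows(rows, threshold=15):
    """Repeatedly drop firms with fewer than threshold movers until stable."""
    while True:
        counts = movers_per_firm(rows)
        firms = {j for _, j in rows}
        keep = {j for j in firms if counts.get(j, 0) >= threshold}
        new_rows = [r for r in rows if r[1] in keep]
        if len(new_rows) == len(rows):  # nothing dropped: fixed point reached
            return new_rows
        rows = new_rows
```

With threshold=2 and rows [(0, "A"), (0, "B"), (1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "C"), (3, "D")], the first pass drops only firm D. That turns worker 3 into a stayer, so firm C falls to one mover and is dropped on the second pass, which in turn makes worker 2 a stayer at B. A single pass would have kept firm C.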

min_moves_firms(threshold=2)

List firms with at least threshold many moves. Note that a single mover can have multiple moves at the same firm.

Parameters: threshold (int) – minimum number of moves required to keep a firm
Returns: firms with sufficiently many moves
Return type: (NumPy Array)
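A sketch of move counting, assuming rows are sorted by worker and then time, and that a move is a pair of consecutive observations of the same worker at different firms, counted once at the origin firm and once at the destination firm. The function name is hypothetical, and it returns a sorted Python list rather than a NumPy array.

```python
def min_moves_firms_sketch(worker_ids, firm_ids, threshold=2):
    """List firms with at least threshold moves (assumed definition of a move)."""
    moves = {}
    for k in range(1, len(worker_ids)):
        same_worker = worker_ids[k] == worker_ids[k - 1]
        if same_worker and firm_ids[k] != firm_ids[k - 1]:
            # Count the move at both the origin and the destination firm.
            for j in (firm_ids[k - 1], firm_ids[k]):
                moves[j] = moves.get(j, 0) + 1
    return sorted(j for j, c in moves.items() if c >= threshold)
```

With workers [0, 0, 0, 1, 1] at firms [A, B, A, B, C], worker 0 alone contributes two moves at firm A (A to B, then B to A), which is the case the note above warns about.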

min_moves_frame(threshold=2, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Return dataframe where all firms have at least threshold many moves. Note that a single worker can have multiple moves at the same firm. This method employs loops, as dropping firms that don’t meet the threshold may lower the number of moves at other firms.

Parameters

threshold (int) – minimum number of moves required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is in collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – if True, reset index at end
copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many moves

Return type

(BipartiteLongBase)

min_obs_frame(threshold=2, id_col='j', drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe of column ids with at least threshold many observations.

Parameters

threshold (int) – minimum number of observations required to keep an id
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is in collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

dataframe of ids with sufficiently many observations

Return type

(BipartiteLongBase)

min_obs_ids(threshold=2, id_col='j', is_sorted=False, copy=True)

List column ids with at least threshold many observations.

Parameters

threshold (int) – minimum number of observations required to keep an id
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
is_sorted (bool) – not used for long format
copy (bool) – not used for long format

Returns

ids with sufficiently many observations

Return type

(NumPy Array)
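In long format each row is one observation, so listing ids with enough observations reduces to counting how often each id appears. A minimal sketch using collections.Counter; the function name is hypothetical, and it returns a sorted Python list rather than a NumPy array.

```python
from collections import Counter


def min_obs_ids_sketch(ids, threshold=2):
    """List ids that appear at least threshold times."""
    counts = Counter(ids)
    return sorted(k for k, c in counts.items() if c >= threshold)
```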

min_workers_firms(threshold=15, is_sorted=False, copy=True)

List firms with at least threshold many workers.

Parameters

threshold (int) – minimum number of workers required to keep a firm
is_sorted (bool) – not used for long format
copy (bool) – not used for long format

Returns

firms with sufficiently many workers

Return type

(NumPy Array)
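Counting workers differs from counting observations: a worker observed many times at one firm counts once. A minimal sketch on (i, j) tuples that collects the distinct workers at each firm; the function name is hypothetical, and it returns a sorted Python list rather than a NumPy array.

```python
def min_workers_firms_sketch(rows, threshold=15):
    """List firms with at least threshold distinct workers."""
    workers_at = {}
    for i, j in rows:
        workers_at.setdefault(j, set()).add(i)
    return sorted(j for j, ws in workers_at.items() if len(ws) >= threshold)
```

With rows [(0, "A"), (0, "A"), (0, "A"), (1, "B"), (2, "B")] and threshold=2, only firm B qualifies: firm A has three observations but a single worker.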

min_workers_frame(threshold=15, drop_returns_to_stays=False, is_sorted=False, copy=True)

Return dataframe of firms with at least threshold many workers.

Parameters

threshold (int) – minimum number of workers required to keep a firm
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing them (this is for computational efficiency when recollapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe may be sorted by i (and t, if included) if data is in collapsed long format. Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

dataframe of firms with sufficiently many workers

Return type

(BipartiteLongBase)

to_eventstudy(move_to_worker=False, is_sorted=False, copy=True)

Return (collapsed) long form data reformatted into (collapsed) event study data.

Parameters

move_to_worker (bool) – if True, each move is treated as a new worker
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

event study dataframe

Return type

(BipartiteEventStudyBase)
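A rough sketch of the long-to-event-study reshaping, under these assumptions: rows are (i, j, y) tuples sorted by worker and then time; each pair of consecutive observations of a worker becomes one (i, j1, j2, y1, y2) row; a worker with a single observation becomes one row with j1 == j2; and move_to_worker=True simply gives every output row its own worker id. The library's exact handling of stayers may differ, and the function name is hypothetical.

```python
def to_eventstudy_sketch(rows, move_to_worker=False):
    """rows: (i, j, y) tuples sorted by worker, then time (hypothetical layout)."""
    by_worker = {}
    for i, j, y in rows:
        by_worker.setdefault(i, []).append((j, y))
    out = []
    for i, obs in by_worker.items():
        if len(obs) > 1:
            pairs = list(zip(obs, obs[1:]))  # consecutive observations
        else:
            pairs = [(obs[0], obs[0])]  # single observation: pair with itself
        for (j1, y1), (j2, y2) in pairs:
            out.append((i, j1, j2, y1, y2))
    if move_to_worker:
        # Assumed behavior: each output row is treated as a separate worker.
        out = [(new_i,) + r[1:] for new_i, r in enumerate(out)]
    return out
```

With rows [(0, "A", 1.0), (0, "B", 2.0), (0, "C", 3.0), (1, "D", 4.0)], worker 0 yields two event study rows (A to B, B to C) and worker 1 yields one row with j1 == j2 == "D".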

to_extendedeventstudy(periods_pre=2, periods_post=2, stable_pre=None, stable_post=None, transition_col=None, move_to_worker=True, is_sorted=False, copy=True)

Return (collapsed) long form data reformatted into (collapsed) extended event study data.

Parameters

periods_pre (int) – number of periods each event study will include before a transition (if a transition is not specified, any new observation is considered a transition)
periods_post (int) – number of periods each event study will include after a transition (if a transition is not specified, any new observation is considered a transition)
stable_pre (str or list of str or None) – column name or list of column names, where each event study should be kept only if the values in all listed columns are constant before the transition; None is equivalent to []
stable_post (str or list of str or None) – column name or list of column names, where each event study should be kept only if the values in all listed columns are constant after the transition; None is equivalent to []
transition_col (str or None) – column to use to define a transition; if None, any new observation is considered a transition
move_to_worker (bool) – if True, each move is treated as a new worker
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

extended event study dataframe

Return type

(BipartiteExtendedEventStudyBase)
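A sketch of how periods_pre, periods_post, and the stability conditions might select event windows for a single worker. Assumptions: the input is the worker's firm ids in time order; a transition is a change of firm (the case where transition_col points at the firm column); the pre window is the periods_pre observations before the transition and the post window starts at the transition; and the stable_pre/stable_post column lists are simplified to booleans applied to the firm column. The function and its parameter names are hypothetical.

```python
def extended_event_studies(firms, periods_pre=2, periods_post=2,
                           require_stable_pre=True, require_stable_post=True):
    """Return (pre, post) windows around each firm change in one worker's history."""
    out = []
    for k in range(max(periods_pre, 1), len(firms) - periods_post + 1):
        if firms[k] == firms[k - 1]:
            continue  # no transition at position k
        pre = firms[k - periods_pre:k]
        post = firms[k:k + periods_post]
        if require_stable_pre and len(set(pre)) > 1:
            continue  # firm not constant before the transition
        if require_stable_post and len(set(post)) > 1:
            continue  # firm not constant after the transition
        out.append((pre, post))
    return out
```

For the history ["A", "B", "C", "C"], the only transition with two full periods on each side is B to C, and it is discarded when the pre window must be stable because it spans firms A and B.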