BipartiteLong class

class bipartitepandas.bipartitelong.BipartiteLong(*args, col_reference_dict=None, col_collapse_dict=None, **kwargs)

Bases: BipartiteLongBase

Class for bipartite networks of firms and workers in long form. Inherits from BipartiteLongBase.

Parameters

*args – arguments for BipartiteLongBase
col_reference_dict (dict or None) – clarify which columns are associated with a general column name, e.g. {'i': 'i', 'j': ['j1', 'j2']}; None is equivalent to {}
col_collapse_dict (dict or None) – how to collapse column (None indicates the column should be dropped), e.g. {'y': 'mean'}; None is equivalent to {}
**kwargs – keyword arguments for BipartiteLongBase

clean(params=None)

Clean data so that there are no NaN or duplicate observations, observations where a worker leaves a firm and later returns to it are removed, firms are connected by movers, and categorical ids are contiguous.

Parameters: params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().
Returns: dataframe with cleaned data
Return type: (BipartiteLongBase)
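
Some of the cleaning steps listed above can be illustrated with plain pandas. This is a simplified sketch, not the library's implementation: it covers only dropping NaN rows, dropping duplicate worker-period observations, and making ids contiguous, and it omits the returner and connectedness steps. The toy data and the choice to keep the first duplicate are assumptions made for illustration.

```python
import pandas as pd

# Toy long-format data with one duplicate (i=10, t=1) and one missing firm id.
df = pd.DataFrame({
    "i": [10, 10, 10, 25, 25, 40],
    "j": [100, 100, 300, 300, None, 700],
    "y": [1.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "t": [1, 1, 2, 1, 2, 1],
})

# Drop rows with NaN, then keep one observation per worker-period
# (keeping the first is an arbitrary choice for this sketch).
cleaned = df.dropna().drop_duplicates(subset=["i", "t"], keep="first").copy()

# Make worker and firm ids contiguous (0, 1, 2, ...) in order of appearance.
for col in ["i", "j"]:
    codes, _ = pd.factorize(cleaned[col])
    cleaned[col] = codes

cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```

In this toy data the last worker's firm is not linked to the other firms by any mover, so a full cleaning pass that enforces connectedness would need to handle it separately.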

collapse(level='spell', is_sorted=False, copy=True)

Collapse long data at the worker-firm spell/match level (so each spell/match for a particular worker at a particular firm becomes one observation).

Parameters

level (str) – if 'spell', collapse at the worker-firm spell level; if 'match', collapse at the worker-firm match level ('spell' and 'match' will differ if a worker leaves then returns to a firm)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

collapsed long data generated by collapsing long data at the worker-firm spell or match level, depending on level

Return type

(BipartiteLongCollapsed)
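
The difference between the spell and match levels can be seen in a small pandas sketch. This is an illustration of the idea under stated assumptions, not the library's code: the toy data, the use of the mean for y, and the output column names t1 and t2 (first and last period) are all chosen for the example.

```python
import pandas as pd

# Worker 0 works at firm 0, moves to firm 1, then returns to firm 0.
df = pd.DataFrame({
    "i": [0, 0, 0, 0, 1, 1],
    "j": [0, 0, 1, 0, 2, 2],
    "y": [1.0, 3.0, 5.0, 7.0, 2.0, 4.0],
    "t": [1, 2, 3, 4, 1, 2],
}).sort_values(["i", "t"], ignore_index=True)

# Spell level: a new spell starts whenever the worker or the firm
# changes between consecutive (sorted) rows.
new_spell = (df["i"] != df["i"].shift()) | (df["j"] != df["j"].shift())
df["spell_id"] = new_spell.cumsum()

spell_level = (
    df.groupby("spell_id", sort=True)
    .agg(i=("i", "first"), j=("j", "first"), y=("y", "mean"),
         t1=("t", "min"), t2=("t", "max"))
    .reset_index(drop=True)
)

# Match level: all observations of the same worker-firm pair are pooled,
# even if the worker left the firm and later returned.
match_level = (
    df.groupby(["i", "j"], sort=True)
    .agg(y=("y", "mean"), t1=("t", "min"), t2=("t", "max"))
    .reset_index()
)

print(spell_level)
print(match_level)
```

Worker 0's two stints at firm 0 stay separate at the spell level (four rows in total) but are merged into a single row at the match level (three rows in total).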

fill_missing_periods(fill_dict=None, is_sorted=False, copy=True)

Return Pandas dataframe of long format data with missing periods filled in as unemployed. By default j is filled in as -1, and y and m are filled in as pd.NA, but these values can be specified.

Parameters

fill_dict (dict or None) – dictionary linking general column to value to fill in for missing rows. None is equivalent to {}. Set value to 'prev' to set to previous value that appeared in the dataframe (cannot use 'next' because this method iterates forward over the dataframe). Can set value for any column except i. Any column not listed will default to pd.NA, except 'j' will always default to -1 unless overridden.
is_sorted (bool) – if False, dataframe will be sorted by i and t. Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

dataframe with missing periods filled in as unemployed

Return type

(Pandas DataFrame)
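
The default filling behaviour can be approximated in plain pandas as follows. This is a minimal sketch, not the library's implementation: fill_gaps is a hypothetical helper, it assumes integer periods with at most one observation per worker and period, and it leaves y as NaN where the method described above would use pd.NA.

```python
import pandas as pd

# Worker 0 is observed in periods 1 and 4 only; worker 1 has no gaps.
df = pd.DataFrame({
    "i": [0, 0, 1, 1],
    "j": [5, 6, 7, 7],
    "y": [1.0, 2.0, 3.0, 4.0],
    "t": [1, 4, 2, 3],
})

def fill_gaps(data, j_fill=-1):
    """Hypothetical helper: insert a row for every period missing between a
    worker's first and last observed period, marking it as unemployed."""
    pieces = []
    for worker, grp in data.groupby("i", sort=True):
        grp = grp.sort_values("t").set_index("t")
        # Every period between the worker's first and last observation.
        all_periods = pd.RangeIndex(
            int(grp.index.min()), int(grp.index.max()) + 1, name="t"
        )
        grp = grp.reindex(all_periods)  # new rows are NaN in every column
        grp["i"] = worker
        grp["j"] = grp["j"].fillna(j_fill).astype("int64")
        pieces.append(grp.reset_index())
    return pd.concat(pieces, ignore_index=True)[["i", "j", "y", "t"]]

filled = fill_gaps(df)
print(filled)
```

Worker 0 gains two rows, for periods 2 and 3, with j equal to -1 and y missing; worker 1 is unchanged.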

get_worker_m(is_sorted=False)

Get NumPy array indicating whether the worker associated with each observation is a mover.

Parameters: is_sorted (bool) – if False, dataframe will be sorted by i in a groupby (but self will not be sorted). Set is_sorted to True if dataframe is already sorted.
Returns: indicates whether the worker associated with each observation is a mover
Return type: (NumPy Array)
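
A per-observation mover indicator of this kind can be sketched with a pandas groupby. The definition used here, that a worker is a mover if they are observed at more than one distinct firm, is an assumption for illustration; the library may derive the indicator differently (for example, from an existing m column).

```python
import pandas as pd

# Worker 0 appears at two firms; workers 1 and 2 each appear at one firm.
df = pd.DataFrame({
    "i": [0, 0, 1, 1, 2],
    "j": [0, 1, 2, 2, 3],
})

# Count distinct firms per worker and broadcast the result back to every
# observation of that worker, then convert to a NumPy boolean array.
worker_m = (df.groupby("i")["j"].transform("nunique") > 1).to_numpy()

print(worker_m)
```

The result has one entry per row of the dataframe, so it can be used directly as a boolean mask, for example df[worker_m] to keep only observations belonging to movers.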