BipartiteLong class
- class bipartitepandas.bipartitelong.BipartiteLong(*args, col_reference_dict=None, col_collapse_dict=None, **kwargs)
Bases: BipartiteLongBase
Class for bipartite networks of firms and workers in long form. Inherits from BipartiteLongBase.
- Parameters
*args – arguments for BipartiteLongBase
col_reference_dict (dict or None) – clarify which columns are associated with a general column name, e.g. {'i': 'i', 'j': ['j1', 'j2']}; None is equivalent to {}
col_collapse_dict (dict or None) – how to collapse each column (None indicates the column should be dropped), e.g. {'y': 'mean'}; None is equivalent to {}
**kwargs – keyword arguments for BipartiteLongBase
- clean(params=None)
Clean data so that there are no NaN or duplicate observations, observations where a worker leaves a firm and then returns to it are removed, all firms are connected by movers, and categorical ids are contiguous.
- Parameters
params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().
- Returns
dataframe with cleaned data
- Return type
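The "firms are connected by movers" condition can be illustrated with a small union-find sketch. This is plain Python, not the library's implementation. The helper name largest_connected_firms, the toy (i, j) rows, and the choice to keep the component with the most observations are all assumptions for illustration; the connectedness options actually available are described by bpd.clean_params().describe_all().

```python
from collections import Counter

def largest_connected_firms(rows):
    """rows: (i, j) pairs. Return the firms in the mover-connected
    component that holds the most observations (an assumed criterion)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_firm = {}
    for i, j in rows:
        find(j)  # register the firm
        if i in first_firm:
            # A worker seen at two firms links those firms.
            parent[find(j)] = find(first_firm[i])
        else:
            first_firm[i] = j

    sizes = Counter(find(j) for _, j in rows)  # observations per component
    root = sizes.most_common(1)[0][0]
    return {j for j in parent if find(j) == root}

# Hypothetical toy data: worker 0 moves 0 -> 1, worker 1 moves 1 -> 2,
# worker 2 stays at firm 3, so firm 3 is not connected to the others.
rows = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3), (2, 3)]
connected_firms = largest_connected_firms(rows)
kept_rows = [r for r in rows if r[1] in connected_firms]
```

Here firm 3 is dropped because no mover links it to firms 0, 1, and 2.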
- collapse(level='spell', is_sorted=False, copy=True)
Collapse long data at the worker-firm spell/match level (so each spell/match for a particular worker at a particular firm becomes one observation).
- Parameters
level (str) – if 'spell', collapse at the worker-firm spell level; if 'match', collapse at the worker-firm match level ('spell' and 'match' will differ if a worker leaves then returns to a firm)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
collapsed long data generated by collapsing long data at the worker-firm spell or match level, depending on level
- Return type
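The difference between the 'spell' and 'match' levels can be sketched in plain Python. This is an illustration of the concept only, not the library's code. The helper collapse_rows, the output fields t_start and t_end, and the use of the mean for y (in the spirit of col_collapse_dict={'y': 'mean'}) are assumptions.

```python
from itertools import groupby

# Hypothetical toy data: (i, j, y, t) rows, already sorted by i and t.
rows = [
    (0, 0, 1.0, 1),
    (0, 0, 3.0, 2),
    (0, 1, 5.0, 3),
    (0, 0, 7.0, 4),  # worker 0 returns to firm 0
    (1, 2, 2.0, 1),
    (1, 2, 4.0, 2),
]

def collapse_rows(rows, level="spell"):
    """Collapse to one (i, j, mean_y, t_start, t_end) row per group."""
    if level == "spell":
        # Consecutive rows with the same (i, j) form one spell.
        groups = [list(g) for _, g in groupby(rows, key=lambda r: (r[0], r[1]))]
    elif level == "match":
        # All rows with the same (i, j) form one match, consecutive or not.
        by_pair = {}
        for r in rows:
            by_pair.setdefault((r[0], r[1]), []).append(r)
        groups = list(by_pair.values())
    else:
        raise ValueError("level must be 'spell' or 'match'")
    return [
        (g[0][0], g[0][1], sum(r[2] for r in g) / len(g), g[0][3], g[-1][3])
        for g in groups
    ]

spells = collapse_rows(rows, "spell")
matches = collapse_rows(rows, "match")
```

Worker 0's return to firm 0 yields two separate spells at that firm but a single match, so the toy data collapses to four spells and three matches.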
- fill_missing_periods(fill_dict=None, is_sorted=False, copy=True)
Return Pandas dataframe of long format data with missing periods filled in as unemployed. By default j is filled in as -1, and y and m are filled in as pd.NA, but these values can be specified.
- Parameters
fill_dict (dict or None) – dictionary linking general column to value to fill in for missing rows. None is equivalent to {}. Set value to 'prev' to set to previous value that appeared in the dataframe (cannot use 'next' because this method iterates forward over the dataframe). Can set value for any column except i. Any column not listed will default to pd.NA, except 'j' will always default to -1 unless overridden.
is_sorted (bool) – if False, dataframe will be sorted by i and t. Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
dataframe with missing periods filled in as unemployed
- Return type
(Pandas DataFrame)
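The forward pass described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation. The helper fill_missing and the toy rows are hypothetical, and None stands in for pd.NA.

```python
def fill_missing(rows, fill_j=-1, fill_y=None):
    """rows: (i, j, y, t) tuples sorted by i then t.
    Insert one filler row for every period skipped within a worker."""
    filled = []
    prev = None
    for row in rows:
        i, j, y, t = row
        if prev is not None and prev[0] == i:
            # Same worker as the previous row: fill any gap in t.
            for gap_t in range(prev[3] + 1, t):
                filled.append((i, fill_j, fill_y, gap_t))
        filled.append(row)
        prev = row
    return filled

# Hypothetical toy data: worker 0 is unobserved in periods 2 and 3.
rows = [(0, 0, 1.0, 1), (0, 1, 2.0, 4), (1, 2, 3.0, 2), (1, 2, 3.5, 3)]
filled = fill_missing(rows)
```

Because the pass only looks backward at the previous row, a 'prev'-style fill is straightforward to add while a 'next'-style fill is not, which matches the restriction stated for fill_dict.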
- get_worker_m(is_sorted=False)
Get NumPy array indicating whether the worker associated with each observation is a mover.
- Parameters
is_sorted (bool) – if False, dataframe will be sorted by i in a groupby (but self will not be sorted). Set is_sorted to True if dataframe is already sorted.
- Returns
indicates whether the worker associated with each observation is a mover
- Return type
(NumPy Array)
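The per-observation mover flag can be sketched as follows. The sketch assumes a mover is a worker observed at more than one firm. It uses plain Python lists rather than NumPy arrays, and the helper worker_m is hypothetical, not the library's code.

```python
def worker_m(rows):
    """rows: (i, j) pairs. Return one bool per row: True if that row's
    worker is observed at more than one firm (assumed mover definition)."""
    firms_by_worker = {}
    for i, j in rows:
        firms_by_worker.setdefault(i, set()).add(j)
    return [len(firms_by_worker[i]) > 1 for i, _ in rows]

# Hypothetical toy data: worker 0 moves from firm 0 to firm 1; worker 1 stays.
rows = [(0, 0), (0, 1), (1, 2), (1, 2)]
flags = worker_m(rows)
```

The flag is set at the worker level and broadcast to every one of that worker's observations, so both of worker 0's rows are marked as belonging to a mover.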