BipartiteLong class

class bipartitepandas.bipartitelong.BipartiteLong(*args, col_reference_dict=None, col_collapse_dict=None, **kwargs)

Bases: BipartiteLongBase

Class for bipartite networks of firms and workers in long form. Inherits from BipartiteLongBase.

Parameters
  • *args – arguments for BipartiteLongBase

  • col_reference_dict (dict or None) – clarify which columns are associated with a general column name, e.g. {'i': 'i', 'j': ['j1', 'j2']}; None is equivalent to {}

  • col_collapse_dict (dict or None) – how to collapse each column (None indicates the column should be dropped), e.g. {'y': 'mean'}; None is equivalent to {}

  • **kwargs – keyword arguments for BipartiteLongBase

clean(params=None)

Clean data so that there are no NaN or duplicate observations, observations where a worker leaves a firm and later returns to it are removed, firms are connected by movers, and categorical ids are contiguous.

Parameters

params (ParamsDict or None) – dictionary of parameters for cleaning. Run bpd.clean_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.clean_params().

Returns

dataframe with cleaned data

Return type

(BipartiteLongBase)

collapse(level='spell', is_sorted=False, copy=True)

Collapse long data at the worker-firm spell/match level (so each spell/match for a particular worker at a particular firm becomes one observation).

Parameters
  • level (str) – if 'spell', collapse at the worker-firm spell level; if 'match', collapse at the worker-firm match level ('spell' and 'match' differ if a worker leaves a firm and then returns to it)

  • is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

collapsed long data, with one observation per worker-firm spell or match, depending on level

Return type

(BipartiteLongCollapsed)
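
The difference between 'spell' and 'match' can be illustrated with a minimal plain-Python sketch. This is not the library's implementation: the row layout (i, j, y, t), the helper name collapse_rows, the output keys t1 and t2, and the choice to average y are all assumptions made for illustration.

```python
from itertools import groupby
from statistics import mean

# Hypothetical long-format rows: (i, j, y, t) = worker, firm, outcome, period.
rows = [
    (0, 0, 1.0, 1),
    (0, 0, 3.0, 2),
    (0, 1, 5.0, 3),
    (0, 0, 7.0, 4),  # worker 0 returns to firm 0
    (1, 2, 2.0, 1),
    (1, 2, 4.0, 2),
]


def collapse_rows(rows, level="spell"):
    """Collapse (i, j, y, t) rows to one row per spell or per match."""
    if level not in ("spell", "match"):
        raise ValueError("level must be 'spell' or 'match'")
    # Sort by worker, then period.
    rows = sorted(rows, key=lambda r: (r[0], r[3]))
    if level == "match":
        # A match pools every period a worker spends at a firm, so
        # bring all rows for the same (i, j) pair next to each other.
        rows = sorted(rows, key=lambda r: (r[0], r[1]))
    collapsed = []
    # groupby merges only consecutive rows with equal keys; for 'spell'
    # that is exactly one uninterrupted stay at a firm.
    for (i, j), group in groupby(rows, key=lambda r: (r[0], r[1])):
        group = list(group)
        collapsed.append({
            "i": i,
            "j": j,
            "y": mean(r[2] for r in group),  # assumed aggregation
            "t1": min(r[3] for r in group),  # first period covered
            "t2": max(r[3] for r in group),  # last period covered
        })
    return collapsed


spells = collapse_rows(rows, level="spell")
matches = collapse_rows(rows, level="match")
```

In this sketch worker 0 produces three spells (firm 0, firm 1, firm 0 again) but only two matches, because the two stays at firm 0 are pooled into a single match.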

fill_missing_periods(fill_dict=None, is_sorted=False, copy=True)

Return Pandas dataframe of long format data with missing periods filled in as unemployed. By default j is filled in as -1, and y and m are filled in as pd.NA, but these values can be specified.

Parameters
  • fill_dict (dict or None) – dictionary linking each general column to the value to fill in for missing rows. None is equivalent to {}. Set a value to 'prev' to use the previous value that appeared in the dataframe (cannot use 'next' because this method iterates forward over the dataframe). A value can be set for any column except i. Any column not listed defaults to pd.NA, except 'j', which defaults to -1 unless overridden.

  • is_sorted (bool) – if False, dataframe will be sorted by i and t. Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.

  • copy (bool) – if False, avoid copy

Returns

dataframe with missing periods filled in as unemployed

Return type

(Pandas DataFrame)
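
The filling rule described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: rows are dictionaries, the helper fill_missing is hypothetical, None stands in for pd.NA, and periods are assumed to be consecutive integers.

```python
def fill_missing(rows, fill_dict=None):
    """Insert a row for every period missing between a worker's observations."""
    fill_dict = {} if fill_dict is None else fill_dict
    # Sort by worker, then period, so gaps show up between neighbours.
    rows = sorted(rows, key=lambda r: (r["i"], r["t"]))
    filled = []
    prev = None  # last observed (non-filled) row
    for row in rows:
        if prev is not None and prev["i"] == row["i"]:
            # Iterate forward over the periods skipped by this worker.
            for t in range(prev["t"] + 1, row["t"]):
                gap = {"i": row["i"], "t": t}
                for col in row:
                    if col in ("i", "t"):
                        continue
                    # 'j' defaults to -1; every other column to None.
                    default = -1 if col == "j" else None
                    value = fill_dict.get(col, default)
                    # 'prev' copies the last value observed for the worker.
                    gap[col] = prev[col] if value == "prev" else value
                filled.append(gap)
        filled.append(row)
        prev = row
    return filled


rows = [
    {"i": 0, "j": 0, "y": 1.0, "t": 1},
    {"i": 0, "j": 1, "y": 2.0, "t": 4},  # periods 2 and 3 are missing
    {"i": 1, "j": 2, "y": 3.0, "t": 2},
    {"i": 1, "j": 2, "y": 4.0, "t": 3},
]

default_fill = fill_missing(rows)
carry_forward = fill_missing(rows, {"y": "prev"})
```

With the default settings, the two inserted rows for worker 0 have j equal to -1 and y equal to None; with {"y": "prev"}, they carry forward y from period 1.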

get_worker_m(is_sorted=False)

Get NumPy array indicating whether the worker associated with each observation is a mover.

Parameters

is_sorted (bool) – if False, dataframe will be sorted by i in a groupby (but self will not be sorted). Set is_sorted to True if dataframe is already sorted.

Returns

indicates whether the worker associated with each observation is a mover

Return type

(NumPy Array)
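
As a rough sketch of what such an indicator contains, the plain-Python example below flags every observation whose worker appears at more than one firm. The definition of a mover used here, the helper name worker_m, and the use of a list in place of a NumPy array are assumptions for illustration, not the library's implementation.

```python
def worker_m(i_col, j_col):
    """Return one flag per observation: True if that worker is a mover."""
    # Collect the set of firms each worker is observed at.
    firms_by_worker = {}
    for i, j in zip(i_col, j_col):
        firms_by_worker.setdefault(i, set()).add(j)
    # Assumed definition: a mover is observed at more than one firm.
    return [len(firms_by_worker[i]) > 1 for i in i_col]


i_col = [0, 0, 1, 1, 2]  # worker ids, one per observation
j_col = [0, 1, 2, 2, 3]  # firm ids, one per observation
flags = worker_m(i_col, j_col)
```

The result has the same length as the data, so both observations of worker 0 are flagged even though only one of them follows a move.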