BipartiteBase class

class bipartitepandas.bipartitebase.BipartiteBase(*args, columns_req=None, columns_opt=None, columns_contig=None, col_reference_dict=None, col_dtype_dict=None, col_collapse_dict=None, col_long_es_dict=None, track_id_changes=False, log=False, **kwargs)

Bases: DataFrame

Base class for BipartitePandas, where BipartitePandas gives a bipartite network of firms and workers. Contains generalized methods. Inherits from DataFrame.

Parameters

*args – arguments for Pandas DataFrame
columns_req (list or None) – required columns (only put general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’; then put the joint columns in col_reference_dict); None is equivalent to []
columns_opt (list or None) – optional columns (only put general column names for joint columns, e.g. put ‘g’ instead of ‘g1’, ‘g2’; then put the joint columns in col_reference_dict); None is equivalent to []
columns_contig (dict or None) – columns of categorical ids linked to boolean of whether those ids are contiguous, or None if column(s) not included, e.g. {‘i’: False, ‘j’: False, ‘g’: None} (only put general column names for joint columns); None is equivalent to {}
col_reference_dict (dict or None) – clarify which joint columns are associated with a general column name, e.g. {‘i’: ‘i’, ‘j’: [‘j1’, ‘j2’]}; None is equivalent to {}
col_dtype_dict (dict or None) – link column to datatype, e.g. {‘m’: ‘int’}; None is equivalent to {}
col_collapse_dict (dict or None) – how to collapse column (None indicates the column should be dropped), e.g. {‘y’: ‘mean’}; None is equivalent to {}
col_long_es_dict (dict or None) – whether each column should split into two when converting from long to event study (None indicates the column should be dropped), e.g. {‘y’: True, ‘m’: None}; None is equivalent to {}
track_id_changes (bool) – if True, create dictionary of Pandas dataframes linking original categorical id values to updated contiguous id values
log (bool) – if True, will create log file(s)
**kwargs – keyword arguments for Pandas DataFrame
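
To illustrate what "contiguous" ids and track_id_changes refer to, here is a minimal sketch in plain pandas. It is not the library's implementation; the helper name make_contiguous and the column names of the mapping table are assumptions made for this example.

```python
import pandas as pd

def make_contiguous(df, col):
    """Hypothetical helper: relabel ids in `col` as 0..n-1 and return the mapping."""
    # factorize assigns codes 0, 1, 2, ... in order of first appearance
    codes, uniques = pd.factorize(df[col])
    out = df.copy()
    out[col] = codes
    # Table linking each original id to its new contiguous id
    id_map = pd.DataFrame({
        'original_ids': list(uniques),
        'adjusted_ids': range(len(uniques)),
    })
    return out, id_map

df = pd.DataFrame({'i': [0, 0, 1, 2], 'j': [10, 40, 40, 25], 'y': [1.0, 1.5, 0.7, 2.1]})
df_contig, j_map = make_contiguous(df, 'j')
```

In this sketch the firm ids 10, 40, 25 become 0, 1, 2, and j_map keeps the link back to the original values, which is the kind of information track_id_changes is described as storing.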

add_column(col_name, col_data=None, col_reference=None, is_categorical=False, dtype='any', how_collapse='first', long_es_split=True, copy=True)

Safe method for adding custom columns. Columns added with this method will be compatible with conversions between long, collapsed long, event study, and collapsed event study formats.

Parameters

col_name (str) – general column name
col_data (NumPy Array or Pandas Series or list of (NumPy Array or Pandas Series) or None) – data for column, or list of data for columns; set to None if columns already added to dataframe via column assignment
col_reference (str or list of str) – if column has multiple subcolumns (e.g. firm ids are associated with the columns [‘j1’, ‘j2’]) this must be specified; otherwise, None will automatically default to the column name (plus a column number, if more than one column is listed) (e.g. firm ids are associated with the column ‘j’ if one column is included, or [‘j1’, ‘j2’] if two columns are included)
is_categorical (bool) – if True, column is categorical
dtype (str) – column datatype, must be one of ‘int’, ‘float’, ‘any’, or ‘categorical’
how_collapse (function or str or None) – how to collapse data at the worker-firm spell level, must be a valid input for Pandas groupby; if None, column will be dropped during collapse/uncollapse
long_es_split (bool or None) – if True, column should split into two when converting from long to event study; if None, column will be dropped when converting between (collapsed) long and (collapsed) event study
copy (bool) – if False, avoid copy

Returns

dataframe with new column(s)

Return type

(BipartiteBase)
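
The how_collapse parameter is described as a valid input for Pandas groupby. The sketch below shows, in plain pandas, what collapsing a custom column at the spell level could look like. It is a simplified illustration rather than the library's code: each (i, j) pair is treated as a single spell, and the column name tenure is hypothetical.

```python
import pandas as pd

# Long-format data: worker i, firm j, period t, income y, plus a custom column
df = pd.DataFrame({
    'i': [0, 0, 0, 1, 1],
    'j': [5, 5, 7, 7, 7],
    't': [1, 2, 3, 1, 2],
    'y': [1.0, 3.0, 2.0, 4.0, 6.0],
    'tenure': [1, 2, 1, 1, 2],
})

# Assumed collapse rules, analogous to how_collapse: mean of y, max of tenure
how_collapse = {'y': 'mean', 'tenure': 'max'}

# Simplification: one spell per (i, j) pair, i.e. no worker returns to a firm
collapsed = df.groupby(['i', 'j'], as_index=False).agg(how_collapse)
```

A column whose rule is None would simply be left out of the aggregation dictionary, which matches the description that such columns are dropped during collapse.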

cluster(params=None, rng=None)

Cluster data and assign a new column giving the cluster for each firm.

Parameters

params (ParamsDict or None) – dictionary of parameters for clustering. Run bpd.cluster_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.cluster_params().
rng (np.random.Generator or None) – NumPy random number generator; None is equivalent to np.random.default_rng(None)

Returns

if silhouette=False, return dataframe with clusters; if silhouette=True, return tuple where first element is dataframe with clusters and second element is NumPy Array of each firm’s silhouette score

Return type

(BipartiteBase or tuple of (BipartiteBase, NumPy Array))

copy(deep=True)

Return copy of self.

Parameters: deep (bool) – make a deep copy, including a copy of the data and the indices. If False, neither the indices nor the data are copied.
Returns: copy of dataframe
Return type: (BipartiteBase)

diagnostic(): Run diagnostic and print diagnostic report.

drop(labels, axis=0, inplace=False, allow_optional=False, allow_required=False, **kwargs)

Drop labels along axis.

Parameters

labels (int or str, optionally as a list) – row(s) or column(s) to drop. For columns, use general column names for joint columns, e.g. put ‘g’ instead of ‘g1’, ‘g2’. Only user-added columns may be dropped, unless allow_optional or allow_required is set to True.
axis (int or str) – whether to drop labels from the ‘index’ (0) or ‘columns’ (1)
inplace (bool) – if True, modify in-place
allow_optional (bool) – if True, allow optional columns to be dropped
allow_required (bool) – if True, allow required columns to be dropped
**kwargs – keyword arguments for Pandas drop

Returns

dataframe without dropped labels

Return type

(BipartiteBase)
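
The allow_optional and allow_required flags can be thought of as guards around a regular Pandas drop. The sketch below is one possible way to express that idea. The sets of required and optional columns, the helper name guarded_drop, and the choice to raise an error are all assumptions; the library may respond differently (for example, by warning and skipping the column).

```python
import pandas as pd

REQUIRED = {'i', 'j', 'y'}  # assumed required columns for this sketch
OPTIONAL = {'t', 'g'}       # assumed optional columns for this sketch

def guarded_drop(df, labels, allow_optional=False, allow_required=False):
    """Hypothetical helper: drop columns, protecting required/optional ones."""
    labels = [labels] if isinstance(labels, str) else list(labels)
    for label in labels:
        if label in REQUIRED and not allow_required:
            raise ValueError(f'{label!r} is required; set allow_required=True to drop it')
        if label in OPTIONAL and not allow_optional:
            raise ValueError(f'{label!r} is optional; set allow_optional=True to drop it')
    return df.drop(columns=labels)

df = pd.DataFrame({'i': [0], 'j': [1], 'y': [2.0], 't': [1], 'extra': [9]})
without_extra = guarded_drop(df, 'extra')               # user-added column: allowed
without_t = guarded_drop(df, 't', allow_optional=True)  # optional column: needs the flag
```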

drop_rows(rows, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)

Drop particular rows.

Parameters

rows (list) – rows to drop
drop_returns_to_stays (bool) – If True, when recollapsing collapsed data, drop observations that need to be recollapsed instead of collapsing (this is for computational efficiency when re-collapsing data for leave-one-out connected components, where intermediate observations can be dropped, causing a worker who returns to a firm to become a stayer)
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe is not guaranteed to be sorted if original dataframe is not sorted for long and collapsed long formats, but is guaranteed to be sorted for event study and collapsed event study formats. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – if True, reset index at end
copy (bool) – if False, avoid copy

Returns

dataframe with given rows dropped

Return type

(BipartiteBase)

get_column_properties(col_name)

Return dictionary linking properties to their value for a particular column.

Parameters: col_name (str) – general column name whose properties will be returned
Returns: dictionary linking properties to their value for a particular column (‘general_column’: general column name; ‘subcolumns’: subcolumns linked to general column; ‘dtype’: column datatype; ‘is_categorical’: column is categorical; ‘how_collapse’: how to collapse at the worker-firm spell level (None if dropped during collapse); ‘long_es_split’: whether column should split into two columns when converting between long and event study formats (None if dropped during conversion))
Return type: (dict)

log(message, level='info')

Log a message at the specified level.

Parameters

message (str) – message to log
level (str) – logger level. Options, in increasing severity, are ‘debug’, ‘info’, ‘warning’, ‘error’, and ‘critical’.
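
The level names listed above match the methods of Python's standard logging module. A minimal sketch of dispatching a message by level name, assuming a logger called bipartite_sketch (a made-up name) and not reflecting how the library configures its own logger:

```python
import logging

logger = logging.getLogger('bipartite_sketch')  # hypothetical logger name

LEVELS = ('debug', 'info', 'warning', 'error', 'critical')

def log(message, level='info'):
    """Send `message` to the logger method that matches `level`."""
    if level not in LEVELS:
        raise ValueError(f'level must be one of {LEVELS}, got {level!r}')
    # e.g. level='warning' calls logger.warning(message)
    getattr(logger, level)(message)

log('data loaded')
log('id column is not contiguous', level='warning')
```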

log_on(on=True)

Toggle logger on or off.

Parameters: on (bool) – if True, turn logger on; if False, turn logger off

merge(*args, **kwargs)

Merge two BipartiteBase objects.

Parameters

*args – arguments for Pandas merge
**kwargs – keyword arguments for Pandas merge

Returns

merged dataframe

Return type

(BipartiteBase)

min_movers_firms(threshold=15, is_sorted=False, copy=True)

List firms that have at least threshold movers.

Parameters

threshold (int) – minimum number of movers required to keep a firm
is_sorted (bool) – used for event study format. If False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – used for event study format. If False, avoid copy.

Returns

firms with sufficiently many movers

Return type

(NumPy Array)
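
As a rough illustration of the idea, the following plain-pandas sketch counts movers per firm on long-format data and keeps the firms that meet the threshold. It assumes a mover is a worker observed at more than one firm; it is not the library's implementation and ignores the event study format.

```python
import pandas as pd

df = pd.DataFrame({
    'i': [0, 0, 1, 1, 2, 2, 3],
    'j': [0, 1, 0, 1, 1, 1, 2],
})
threshold = 2

# Assumption: a mover is a worker observed at more than one firm
n_firms_per_worker = df.groupby('i')['j'].transform('nunique')
movers = df[n_firms_per_worker > 1]

# Count distinct movers at each firm and keep firms meeting the threshold
movers_per_firm = movers.groupby('j')['i'].nunique()
valid_firms = movers_per_firm[movers_per_firm >= threshold].index.to_numpy()
```

Here workers 0 and 1 are movers, so firms 0 and 1 each have two movers, while firm 2 has none.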

n_clusters()

Get the number of unique clusters.

Returns: number of unique clusters if cluster column included; None otherwise
Return type: (int or None)

n_firms()

Get the number of unique firms.

Returns: number of unique firms
Return type: (int)

n_unique_ids(id_col)

Number of unique ids in column.

Parameters: id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
Returns: number of unique ids if column included; None otherwise
Return type: (int or None)
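
For joint columns, counting unique ids means pooling the subcolumns first. A minimal sketch of that idea, assuming a mapping like col_reference_dict and a hypothetical helper named pooled_unique_ids (not part of the library):

```python
import numpy as np
import pandas as pd

# Event-study-style data, where firm ids are split across 'j1' and 'j2'
df = pd.DataFrame({'i': [0, 1, 2], 'j1': [3, 3, 8], 'j2': [5, 3, 9]})

# Assumed mapping from general column name to its subcolumns
col_reference_dict = {'i': ['i'], 'j': ['j1', 'j2']}

def pooled_unique_ids(df, id_col):
    """Unique ids pooled over all subcolumns of `id_col`, or None if absent."""
    subcols = [c for c in col_reference_dict.get(id_col, []) if c in df.columns]
    if not subcols:
        return None
    # np.unique flattens the 2D array and returns sorted unique values
    return np.unique(df[subcols].to_numpy())

firm_ids = pooled_unique_ids(df, 'j')
n_firm_ids = len(firm_ids)
```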

n_workers()

Get the number of unique workers.

Returns: number of unique workers
Return type: (int)

original_ids(copy=True)

Return self merged with original column ids.

Parameters: copy (bool) – if False, avoid copy
Returns: copy of dataframe merged with original column ids, or None if id_reference_dict is empty
Return type: (BipartiteBase or None)

print_column_properties(col_name)

Print properties associated with a particular column.

Parameters: col_name (str) – general column name whose properties will be printed

rename(rename_dict, axis=0, inplace=False, allow_optional=False, allow_required=False, **kwargs)

Rename a column.

Parameters

rename_dict (dict) – key is current label, value is new label. When renaming columns, use general column names for joint columns, e.g. put ‘g’ instead of ‘g1’, ‘g2’.
axis (int or str) – whether to rename labels in the ‘index’ (0) or ‘columns’ (1)
inplace (bool) – if True, modify in-place
allow_optional (bool) – if True, allow optional columns to be renamed
allow_required (bool) – if True, allow required columns to be renamed
**kwargs – keyword arguments for Pandas rename

Returns

dataframe with renamed labels

Return type

(BipartiteBase)

set_column_properties(col_name, is_categorical=False, dtype='any', how_collapse='first', long_es_split=True, copy=True)

Safe method for setting the properties of pre-existing custom columns.

Parameters

col_name (str) – general column name
is_categorical (bool) – if True, column is categorical
dtype (str) – column datatype, must be one of ‘int’, ‘float’, ‘any’, or ‘categorical’
how_collapse (function or str or None) – how to collapse data at the worker-firm spell level, must be a valid input for Pandas groupby; if None, column will be dropped during collapse/uncollapse
long_es_split (bool or None) – if True, column should split into two when converting from long to event study; if None, column will be dropped when converting between (collapsed) long and (collapsed) event study
copy (bool) – if False, avoid copy

Returns

dataframe with updated column properties

Return type

(BipartiteBase)
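
To make the long_es_split property more concrete, the sketch below converts a small long-format frame into an event-study-like layout in plain pandas. It is a simplified illustration, not the library's conversion: each observation is paired with the worker's next observation, False is assumed to mean "keep a single column", and the column names region and tmp are hypothetical.

```python
import pandas as pd

long = pd.DataFrame({
    'i': [0, 0, 1, 1],
    'j': [5, 7, 6, 6],
    'y': [1.0, 2.0, 3.0, 3.5],
    'region': ['a', 'a', 'b', 'b'],
    'tmp': [0, 0, 0, 0],
})

# Assumed rules: True = split in two, False = keep one column, None = drop
long_es_split = {'j': True, 'y': True, 'region': False, 'tmp': None}

# Pair each observation with the same worker's next observation
nxt = long.groupby('i').shift(-1)

es = pd.DataFrame({'i': long['i']})
for col, split in long_es_split.items():
    if split is None:
        continue  # column is dropped during conversion
    if split:
        es[col + '1'] = long[col]  # value in the first period
        es[col + '2'] = nxt[col]   # value in the following period
    else:
        es[col] = long[col]

# Keep only rows that have a following observation
es = es[nxt['j'].notna()].reset_index(drop=True)
```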

sort_cols(copy=True)

Sort frame columns (not in-place).

Parameters: copy (bool) – if False, avoid copy
Returns: dataframe with sorted columns
Return type: (BipartiteBase)

sort_rows(j_if_no_t=True, is_sorted=False, copy=True)

Sort rows by i and t (or by i and j, if there is no t column and j_if_no_t is True).

Parameters

j_if_no_t (bool) – if no time column, sort on i and j columns instead
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy

Returns

dataframe with rows sorted

Return type

(BipartiteBase)
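
The choice of sort keys described above can be sketched in plain pandas as follows. The helper name sort_rows_sketch is hypothetical, and the sketch covers only the key selection, not the is_sorted or copy behaviour.

```python
import pandas as pd

def sort_rows_sketch(df, j_if_no_t=True):
    """Hypothetical helper: sort by i, then t if present, else optionally j."""
    sort_cols = ['i']
    if 't' in df.columns:
        sort_cols.append('t')
    elif j_if_no_t and 'j' in df.columns:
        sort_cols.append('j')
    return df.sort_values(sort_cols)

df = pd.DataFrame({'i': [1, 0, 0], 'j': [4, 9, 2], 't': [1, 2, 1]})
sorted_with_t = sort_rows_sketch(df)                       # sorted by i, t
sorted_without_t = sort_rows_sketch(df.drop(columns='t'))  # sorted by i, j
```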

summary(): Print summary statistics. This uses class attributes. To run a diagnostic to verify these values, run .diagnostic().

unique_ids(id_col)

Unique ids in column.

Parameters: id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’
Returns: unique ids if column included; None otherwise
Return type: (NumPy Array or None)