BipartiteBase class
- class bipartitepandas.bipartitebase.BipartiteBase(*args, columns_req=None, columns_opt=None, columns_contig=None, col_reference_dict=None, col_dtype_dict=None, col_collapse_dict=None, col_long_es_dict=None, track_id_changes=False, log=False, **kwargs)
Bases:
DataFrame
Base class for BipartitePandas, where BipartitePandas gives a bipartite network of firms and workers. Contains generalized methods. Inherits from DataFrame.
- Parameters
*args – arguments for Pandas DataFrame
columns_req (list or None) – required columns (only put general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’; then put the joint columns in col_reference_dict); None is equivalent to []
columns_opt (list or None) – optional columns (only put general column names for joint columns, e.g. put ‘g’ instead of ‘g1’, ‘g2’; then put the joint columns in col_reference_dict); None is equivalent to []
columns_contig (dict or None) – columns of categorical ids linked to boolean of whether those ids are contiguous, or None if column(s) not included, e.g. {‘i’: False, ‘j’: False, ‘g’: None} (only put general column names for joint columns); None is equivalent to {}
col_reference_dict (dict or None) – clarify which joint columns are associated with a general column name, e.g. {‘i’: ‘i’, ‘j’: [‘j1’, ‘j2’]}; None is equivalent to {}
col_dtype_dict (dict or None) – link column to datatype, e.g. {‘m’: ‘int’}; None is equivalent to {}
col_collapse_dict (dict or None) – how to collapse column (None indicates the column should be dropped), e.g. {‘y’: ‘mean’}; None is equivalent to {}
col_long_es_dict (dict or None) – whether each column should split into two when converting from long to event study (None indicates the column should be dropped), e.g. {‘y’: True, ‘m’: None}; None is equivalent to {}
track_id_changes (bool) – if True, create dictionary of Pandas dataframes linking original categorical id values to updated contiguous id values
log (bool) – if True, will create log file(s)
**kwargs – keyword arguments for Pandas DataFrame
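The link between general column names and joint columns can be illustrated with a short sketch. It assumes only the dictionary shape shown for col_reference_dict above; the helper expand_columns is hypothetical and is not part of bipartitepandas.

```python
# Hypothetical helper (not part of bipartitepandas): expand general column
# names into the subcolumns listed in a col_reference_dict-style mapping.
def expand_columns(general_names, col_reference_dict):
    expanded = []
    for name in general_names:
        # Fall back to the general name itself if no reference is given
        subcols = col_reference_dict.get(name, name)
        if isinstance(subcols, str):
            subcols = [subcols]
        expanded.extend(subcols)
    return expanded


col_reference_dict = {'i': 'i', 'j': ['j1', 'j2']}
subcolumns = expand_columns(['i', 'j'], col_reference_dict)
print(subcolumns)
```

This is why columns_req and columns_opt take only general names: the subcolumns are looked up through the reference dictionary.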
- add_column(col_name, col_data=None, col_reference=None, is_categorical=False, dtype='any', how_collapse='first', long_es_split=True, copy=True)
Safe method for adding custom columns. Columns added with this method will be compatible with conversions between long, collapsed long, event study, and collapsed event study formats.
- Parameters
col_name (str) – general column name
col_data (NumPy Array or Pandas Series or list of (NumPy Array or Pandas Series) or None) – data for column, or list of data for columns; set to None if columns already added to dataframe via column assignment
col_reference (str or list of str) – if column has multiple subcolumns (e.g. firm ids are associated with the columns [‘j1’, ‘j2’]) this must be specified; otherwise, None will automatically default to the column name (plus a column number, if more than one column is listed) (e.g. firm ids are associated with the column ‘j’ if one column is included, or [‘j1’, ‘j2’] if two columns are included)
is_categorical (bool) – if True, column is categorical
dtype (str) – column datatype, must be one of ‘int’, ‘float’, ‘any’, or ‘categorical’
how_collapse (function or str or None) – how to collapse data at the worker-firm spell level, must be a valid input for Pandas groupby; if None, column will be dropped during collapse/uncollapse
long_es_split (bool or None) – if True, column should split into two when converting from long to event study; if None, column will be dropped when converting between (collapsed) long and (collapsed) event study
copy (bool) – if False, avoid copy
- Returns
dataframe with new column(s)
- Return type
(BipartiteBase)
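To illustrate what a how_collapse value does, the sketch below collapses a small long-format table by hand with plain pandas. It is not the library's implementation. The column names follow the conventions above ('i' for workers, 'j' for firms, 'y' plus a custom column 'c'), and it assumes each worker-firm pair forms a single spell, i.e. no worker returns to an earlier firm.

```python
import pandas as pd

long_df = pd.DataFrame({
    'i': [0, 0, 0, 1, 1],
    'j': [5, 5, 6, 7, 7],
    'y': [1.0, 3.0, 2.0, 4.0, 6.0],
    'c': ['a', 'b', 'c', 'd', 'e'],
})

# One aggregation rule per column, in the spirit of how_collapse
how_collapse = {'y': 'mean', 'c': 'first'}

# Group at the worker-firm level and apply each column's rule
collapsed = long_df.groupby(['i', 'j'], as_index=False).agg(how_collapse)
print(collapsed)
```

Any value accepted by a pandas groupby aggregation (a string such as 'mean' or a function) can play the role of the rule.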
- cluster(params=None, rng=None)
Cluster data and assign a new column giving the cluster for each firm.
- Parameters
params (ParamsDict or None) – dictionary of parameters for clustering. Run bpd.cluster_params().describe_all() for descriptions of all valid parameters. None is equivalent to bpd.cluster_params().
rng (np.random.Generator or None) – NumPy random number generator; None is equivalent to np.random.default_rng(None)
- Returns
if silhouette=False, return dataframe with clusters; if silhouette=True, return tuple where first element is dataframe with clusters and second element is NumPy Array of each firm’s silhouette score
- Return type
(BipartiteBase or tuple of (BipartiteBase, NumPy Array))
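The role of the rng argument can be seen with NumPy alone. The sketch below is a minimal illustration, with an arbitrary seed (1234) chosen for the example: two generators built from the same seed produce the same draws, so a seeded generator passed as rng should make a randomized step repeatable.

```python
import numpy as np

# Seeded generators give identical draws, so results that depend on rng repeat
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)
draws_a = rng_a.integers(0, 10, size=5)
draws_b = rng_b.integers(0, 10, size=5)
same = bool(np.array_equal(draws_a, draws_b))
print(same)

# default_rng(None) instead pulls fresh entropy from the operating system
rng_unseeded = np.random.default_rng(None)
```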
- copy(deep=True)
Return copy of self.
- Parameters
deep (bool) – make a deep copy, including a copy of the data and the indices. If False, neither the indices nor the data are copied.
- Returns
copy of dataframe
- Return type
(BipartiteBase)
- diagnostic()
Run diagnostic and print diagnostic report.
- drop(labels, axis=0, inplace=False, allow_optional=False, allow_required=False, **kwargs)
Drop labels along axis.
- Parameters
labels (int or str, optionally as a list) – row(s) or column(s) to drop. For columns, use general column names for joint columns, e.g. put ‘g’ instead of ‘g1’, ‘g2’. Only user-added columns may be dropped, unless allow_optional or allow_required is set to True.
axis (int or str) – whether to drop labels from the ‘index’ (0) or ‘columns’ (1)
inplace (bool) – if True, modify in-place
allow_optional (bool) – if True, allow optional columns to be dropped
allow_required (bool) – if True, allow required columns to be dropped
**kwargs – keyword arguments for Pandas drop
- Returns
dataframe without dropped labels
- Return type
(BipartiteBase)
- drop_rows(rows, drop_returns_to_stays=False, is_sorted=False, reset_index=True, copy=True)
Drop particular rows.
- Parameters
rows (list) – rows to drop
drop_returns_to_stays (bool) – if True, when recollapsing collapsed data, drop observations that would need to be recollapsed rather than collapsing them. This saves computation when recollapsing data for leave-one-out connected components, where dropping intermediate observations can turn a worker who returns to a firm into a stayer.
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). For long and collapsed long formats, the returned dataframe is not guaranteed to be sorted if the original dataframe is not sorted; for event study and collapsed event study formats, it is guaranteed to be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
reset_index (bool) – if True, reset index at end
copy (bool) – if False, avoid copy
- Returns
dataframe with given rows dropped
- Return type
(BipartiteBase)
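The situation that drop_returns_to_stays addresses can be shown with a toy worker history. The sketch below uses only the standard library; the spells helper and the firm ids are hypothetical and do not reflect how bipartitepandas stores or collapses data.

```python
from itertools import groupby

# Firm ids for one worker, in time order: the worker leaves firm 1, then returns
firm_history = [1, 1, 2, 1]


def spells(firms):
    # Collapse consecutive observations at the same firm into one spell
    return [firm for firm, _ in groupby(firms)]


before = spells(firm_history)               # three spells: a mover
kept = [f for f in firm_history if f != 2]  # drop the intermediate observation
after = spells(kept)                        # one spell: now a stayer
print(before, after)
```

Once the observation at firm 2 is dropped, the two spells at firm 1 become adjacent and would have to be recollapsed into one, which is the case the parameter lets you skip.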
- get_column_properties(col_name)
Return dictionary linking properties to their value for a particular column.
- Parameters
col_name (str) – general column name whose properties will be returned
- Returns
dictionary linking properties to their value for a particular column (‘general_column’: general column name; ‘subcolumns’: subcolumns linked to general column; ‘dtype’: column datatype; ‘is_categorical’: column is categorical; ‘how_collapse’: how to collapse at the worker-firm spell level (None if dropped during collapse); ‘long_es_split’: whether column should split into two columns when converting between long and event study formats (None if dropped during conversion))
- Return type
(dict)
- log(message, level='info')
Log a message at the specified level.
- Parameters
message (str) – message to log
level (str) – logger level. Options, in increasing severity, are ‘debug’, ‘info’, ‘warning’, ‘error’, and ‘critical’.
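A level-name dispatch of this kind can be sketched with the standard logging module. This is an assumption about the general pattern, not the library's code; the logger name 'bipartite_sketch' and the module-level log function are hypothetical.

```python
import logging

logger = logging.getLogger('bipartite_sketch')  # hypothetical logger name
logger.setLevel(logging.DEBUG)

# Level names in increasing severity
LEVELS = ('debug', 'info', 'warning', 'error', 'critical')


def log(message, level='info'):
    # Dispatch to logger.debug, logger.info, ... based on the level name
    if level not in LEVELS:
        raise ValueError(f'unknown level: {level!r}')
    getattr(logger, level)(message)


log('data loaded')
log('missing time column', level='warning')
```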
- log_on(on=True)
Toggle logger on or off.
- Parameters
on (bool) – if True, turn logger on; if False, turn logger off
- merge(*args, **kwargs)
Merge two BipartiteBase objects.
- Parameters
*args – arguments for Pandas merge
**kwargs – keyword arguments for Pandas merge
- Returns
merged dataframe
- Return type
(BipartiteBase)
- min_movers_firms(threshold=15, is_sorted=False, copy=True)
List firms with at least threshold many movers.
- Parameters
threshold (int) – minimum number of movers required to keep a firm
is_sorted (bool) – used for event study format. If False, dataframe will be sorted by i (and t, if included). Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – used for event study format. If False, avoid copy.
- Returns
firms with sufficiently many movers
- Return type
(NumPy Array)
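The idea behind this method can be approximated in plain pandas. The sketch below is not the library's implementation: it assumes long-format columns 'i' (worker) and 'j' (firm), defines a mover as a worker observed at more than one firm, and uses a small threshold so the toy data produces a non-trivial result.

```python
import pandas as pd

df = pd.DataFrame({
    'i': [0, 0, 1, 1, 2, 2, 3],
    'j': [10, 11, 10, 11, 10, 12, 10],
})
threshold = 2

# A mover is a worker observed at more than one firm
n_firms_per_worker = df.groupby('i')['j'].transform('nunique')
movers = df[n_firms_per_worker > 1]

# Count distinct movers at each firm, then keep firms meeting the threshold
movers_per_firm = movers.groupby('j')['i'].nunique()
valid_firms = movers_per_firm[movers_per_firm >= threshold].index.to_numpy()
print(valid_firms)
```

Here firm 10 has three movers, firm 11 has two, and firm 12 has one, so only firms 10 and 11 pass.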
- n_clusters()
Get the number of unique clusters.
- Returns
number of unique clusters if cluster column included; None otherwise
- Return type
(int or None)
- n_firms()
Get the number of unique firms.
- Returns
number of unique firms
- Return type
(int)
- n_unique_ids(id_col)
Number of unique ids in column.
- Parameters
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’.
- Returns
number of unique ids if column included; None otherwise
- Return type
(int or None)
- n_workers()
Get the number of unique workers.
- Returns
number of unique workers
- Return type
(int)
- original_ids(copy=True)
Return self merged with original column ids.
- Parameters
copy (bool) – if False, avoid copy
- Returns
copy of dataframe merged with original column ids, or None if id_reference_dict is empty
- Return type
(BipartiteBase or None)
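The mapping between original and contiguous ids can be sketched with pandas.factorize. This only illustrates the concept behind track_id_changes and original_ids; the lookup-table column names ('original_ids', 'adjusted_ids') are hypothetical, and the library's internal id_reference_dict may be organized differently.

```python
import pandas as pd

df = pd.DataFrame({'j': [104, 7, 104, 55]})

# factorize assigns contiguous codes 0, 1, 2, ... in order of first appearance
codes, originals = pd.factorize(df['j'])
df['j'] = codes

# Lookup table linking original ids to the new contiguous ids
id_map = pd.DataFrame({
    'original_ids': originals,
    'adjusted_ids': range(len(originals)),
})

# Merging on the contiguous ids recovers the original values
recovered = df.merge(id_map, left_on='j', right_on='adjusted_ids', how='left')
print(recovered)
```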
- print_column_properties(col_name)
Print properties associated with a particular column.
- Parameters
col_name (str) – general column name whose properties will be printed
- rename(rename_dict, axis=0, inplace=False, allow_optional=False, allow_required=False, **kwargs)
Rename labels along axis.
- Parameters
rename_dict (dict) – key is current label, value is new label. When renaming columns, use general column names for joint columns, e.g. put ‘g’ instead of ‘g1’, ‘g2’.
axis (int or str) – whether to rename labels from the ‘index’ (0) or ‘columns’ (1)
inplace (bool) – if True, modify in-place
allow_optional (bool) – if True, allow optional columns to be renamed
allow_required (bool) – if True, allow required columns to be renamed
**kwargs – keyword arguments for Pandas rename
- Returns
dataframe with renamed labels
- Return type
(BipartiteBase)
- set_column_properties(col_name, is_categorical=False, dtype='any', how_collapse='first', long_es_split=True, copy=True)
Safe method for setting the properties of pre-existing custom columns.
- Parameters
col_name (str) – general column name
is_categorical (bool) – if True, column is categorical
dtype (str) – column datatype, must be one of ‘int’, ‘float’, ‘any’, or ‘categorical’
how_collapse (function or str or None) – how to collapse data at the worker-firm spell level, must be a valid input for Pandas groupby; if None, column will be dropped during collapse/uncollapse
long_es_split (bool or None) – if True, column should split into two when converting from long to event study; if None, column will be dropped when converting between (collapsed) long and (collapsed) event study
copy (bool) – if False, avoid copy
- Returns
dataframe with updated column properties
- Return type
(BipartiteBase)
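What it means for a column to split into two under long_es_split=True can be sketched with plain pandas. This is a simplified picture of an event-study layout, assuming columns 'i', 'j', 'y', and 't' and pairing each observation with the worker's next one; the library's actual conversion may treat stayers and movers differently.

```python
import pandas as pd

long_df = pd.DataFrame({
    'i': [0, 0, 1, 1],
    'j': [10, 11, 12, 12],
    'y': [1.0, 2.0, 3.0, 4.0],
    't': [1, 2, 1, 2],
}).sort_values(['i', 't'])

# Next observation for the same worker (NaN on each worker's last row)
nxt = long_df.groupby('i')[['j', 'y', 't']].shift(-1)

# Each split column 'x' becomes 'x1' (current) and 'x2' (next observation)
es_df = pd.DataFrame({'i': long_df['i']})
for col in ['j', 'y', 't']:
    es_df[col + '1'] = long_df[col]
    es_df[col + '2'] = nxt[col]

# Drop rows with no following observation and restore integer dtypes
es_df = es_df.dropna().astype({'j2': int, 't2': int}).reset_index(drop=True)
print(es_df)
```

A column set to not split would instead appear once per row, and a column set to None would be absent from the converted table.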
- sort_cols(copy=True)
Sort frame columns (not in-place).
- Parameters
copy (bool) – if False, avoid copy
- Returns
dataframe with sorted columns
- Return type
(BipartiteBase)
- sort_rows(j_if_no_t=True, is_sorted=False, copy=True)
Sort rows by i and t.
- Parameters
j_if_no_t (bool) – if no time column, sort on i and j columns instead
is_sorted (bool) – if False, dataframe will be sorted by i (and t, if included). Returned dataframe will be sorted. Sorting may alter original dataframe if copy is set to False. Set is_sorted to True if dataframe is already sorted.
copy (bool) – if False, avoid copy
- Returns
dataframe with rows sorted
- Return type
(BipartiteBase)
- summary()
Print summary statistics. This uses class attributes. To run a diagnostic to verify these values, run .diagnostic().
- unique_ids(id_col)
Unique ids in column.
- Parameters
id_col (str) – column to check ids (‘i’, ‘j’, or ‘g’). Use general column names for joint columns, e.g. put ‘j’ instead of ‘j1’, ‘j2’
- Returns
unique ids if column included; None otherwise
- Return type
(NumPy Array or None)
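For joint columns, counting or listing unique ids means pooling every subcolumn first. The sketch below shows that idea with NumPy and pandas on a toy event-study table; the mapping mimics the col_reference_dict shape described for the constructor, and np.unique returns sorted values, which the library's own method is not confirmed to do.

```python
import numpy as np
import pandas as pd

es_df = pd.DataFrame({'j1': [10, 12, 10], 'j2': [11, 12, 13]})

# Hypothetical mapping in the style of col_reference_dict
col_reference_dict = {'j': ['j1', 'j2']}

# Pool every subcolumn of 'j', then take the unique values
pooled = es_df[col_reference_dict['j']].to_numpy().ravel()
unique_j = np.unique(pooled)  # sorted unique ids across j1 and j2
n_unique_j = len(unique_j)
print(unique_j, n_unique_j)
```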