Measures module

Classes for computing cluster measures. Note: measures are implemented as classes rather than nested functions because nested functions cannot be pickled (source: https://stackoverflow.com/a/12022055/17333120).
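The pickling constraint can be seen in a minimal sketch that uses only the standard library. The names make_nested_measure and MeanMeasure are hypothetical and are not part of bipartitepandas; they only illustrate why a class with its configuration stored on the instance survives a pickle round trip while a closure does not.

```python
import pickle


def make_nested_measure(outcome_col):
    """Hypothetical factory returning a nested function (a closure)."""
    def measure(rows):
        return sum(r[outcome_col] for r in rows) / len(rows)
    return measure


class MeanMeasure:
    """Hypothetical class-based equivalent; configuration lives on the instance."""

    def __init__(self, outcome_col='y'):
        self.outcome_col = outcome_col

    def __call__(self, rows):
        return sum(r[self.outcome_col] for r in rows) / len(rows)


rows = [{'y': 1.0}, {'y': 3.0}]

# A nested function is pickled by qualified name, and a local name
# cannot be looked up again, so pickling fails.
try:
    pickle.dumps(make_nested_measure('y'))
    nested_pickled = True
except (pickle.PicklingError, AttributeError):
    nested_pickled = False

# An instance of a module-level class round-trips without trouble.
restored = pickle.loads(pickle.dumps(MeanMeasure('y')))
restored_value = restored(rows)
```

This matters whenever a measure object has to be sent to another process, since such transfers typically rely on pickle.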

class bipartitepandas.measures.measures.CDFs(cdf_resolution=10, measure='quantile_all', outcome_col='y')

Bases: object

Generate CDFs of compensation for firms. Used for clustering.

Parameters
  • cdf_resolution (int) – number of values used to approximate each CDF

  • measure (str) – how to compute the CDFs (‘quantile_all’ computes quantiles from the entire dataset, so firm-level values lie between 0 and 1; ‘quantile_firm’ computes quantiles within each firm, so values are compensations)

  • outcome_col (str) – name of the outcome column to use
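The difference between the two measure options can be sketched in plain Python. This is one plausible reading of the parameter descriptions above, not the library's implementation: the function names firm_cdfs and lower_quantile are hypothetical, the evenly spaced quantile grid (1/cdf_resolution, ..., 1) is an assumption, and quantiles are taken without interpolation for simplicity.

```python
import math


def lower_quantile(sorted_vals, q):
    """Empirical quantile without interpolation (assumed convention)."""
    idx = max(0, math.ceil(q * len(sorted_vals)) - 1)
    return sorted_vals[idx]


def firm_cdfs(outcomes_by_firm, cdf_resolution=10, measure='quantile_all'):
    """Hypothetical helper: map each firm to cdf_resolution values."""
    # Assumed grid: cdf_resolution evenly spaced quantile levels ending at 1
    qs = [(k + 1) / cdf_resolution for k in range(cdf_resolution)]
    if measure == 'quantile_all':
        # Thresholds come from the pooled data; each firm's value is the
        # share of its observations at or below the threshold (in [0, 1]).
        pooled = sorted(v for vals in outcomes_by_firm.values() for v in vals)
        thresholds = [lower_quantile(pooled, q) for q in qs]
        return {
            firm: [sum(v <= t for v in vals) / len(vals) for t in thresholds]
            for firm, vals in outcomes_by_firm.items()
        }
    if measure == 'quantile_firm':
        # Quantiles are taken within each firm, so values are compensations.
        return {
            firm: [lower_quantile(sorted(vals), q) for q in qs]
            for firm, vals in outcomes_by_firm.items()
        }
    raise ValueError(f"unknown measure: {measure!r}")


data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
cdfs_all = firm_cdfs(data, cdf_resolution=4, measure='quantile_all')
cdfs_firm = firm_cdfs(data, cdf_resolution=4, measure='quantile_firm')
# cdfs_all  -> {'A': [0.5, 1.0, 1.0, 1.0], 'B': [0.0, 0.0, 0.5, 1.0]}
# cdfs_firm -> {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
```

Under this reading, ‘quantile_all’ places every firm on a common scale, while ‘quantile_firm’ keeps values in the units of the outcome column.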

class bipartitepandas.measures.measures.Moments(measures='mean', outcome_col='y')

Bases: object

Generate compensation moments for firms. Used for clustering.

Parameters
  • measures (str or list of str) – which moments to compute, given as a single name or a list of names (‘mean’ for average income within each firm; ‘var’ for variance of income within each firm; ‘max’ for max income within each firm; ‘min’ for min income within each firm)

  • outcome_col (str) – name of the outcome column to use
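As a rough illustration of the four moments, the sketch below computes them per firm with the standard library. The helper firm_moments is hypothetical and is not the library's code; in particular, the variance convention is not stated above, so the sample variance used here is an assumption.

```python
import statistics

# Assumed mapping from measure names to functions; 'var' uses the
# sample variance (n - 1 denominator), which is an assumption.
MOMENT_FUNCS = {
    'mean': statistics.mean,
    'var': statistics.variance,
    'max': max,
    'min': min,
}


def firm_moments(outcomes_by_firm, measures='mean'):
    """Hypothetical helper: one value per requested moment, per firm."""
    # Accept a single name or a list of names, as the parameter allows
    if isinstance(measures, str):
        measures = [measures]
    return {
        firm: [MOMENT_FUNCS[m](vals) for m in measures]
        for firm, vals in outcomes_by_firm.items()
    }


data = {'A': [1.0, 2.0, 3.0, 4.0], 'B': [5.0, 5.0, 7.0]}
single = firm_moments(data, 'mean')
several = firm_moments(data, ['mean', 'var', 'max', 'min'])
```

Passing a list yields one column per moment for each firm, which is the shape a clustering step would typically consume.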