# PyTwoWay

PyTwoWay is the Python package associated with the following paper:

“How Much Should We Trust Estimates of Firm Effects and Worker Sorting?” by Stéphane Bonhomme, Kerstin Holzheu, Thibaut Lamadon, Elena Manresa, Magne Mogstad, and Bradley Setzler. No. w27368. National Bureau of Economic Research, 2020.

The package implements a series of estimators for models with two-sided heterogeneity:

- two-way fixed effects estimator as proposed by Abowd, Kramarz, and Margolis
- homoskedastic bias correction as in Andrews, et al.
- heteroskedastic bias correction as in Kline, Saggio, and Sølvsten
- group fixed estimator as in Bonhomme, Lamadon, and Manresa
- group correlated random effect as presented in the main paper
- fixed-point revealed preference estimator as in Sorkin
- estimator as in Borovičková and Shimer for a modified definition of sorting
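
For orientation, the first estimator in the list is usually built on the two-way fixed effects wage equation below. The notation is the conventional one from the literature and is an assumption of this sketch, not notation taken from the package: $y_{it}$ is the log wage of worker $i$ in period $t$, $\alpha_i$ is the worker effect, $\psi_{j(i,t)}$ is the effect of the firm employing worker $i$ in period $t$, and $\varepsilon_{it}$ is a residual assumed uncorrelated with both effects.

```latex
y_{it} = \alpha_i + \psi_{j(i,t)} + \varepsilon_{it}

\operatorname{Var}(y_{it})
  = \operatorname{Var}(\alpha_i)
  + \operatorname{Var}(\psi_{j(i,t)})
  + 2\,\operatorname{Cov}(\alpha_i, \psi_{j(i,t)})
  + \operatorname{Var}(\varepsilon_{it})
```

The bias corrections in the list target the plug-in estimates of the firm-effect variance and the worker-firm covariance in this decomposition, which are biased when few workers move between firms.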

If you want to give it a try, you can start an example notebook for each of the FE, CRE, BLM, Sorkin, and Borovičková-Shimer estimators. These are fully interactive notebooks with simple examples that simulate data and run the estimators.

The package provides a Python interface. Installation is handled by pip (Conda support is TBD). The source of the package is available on GitHub at PyTwoWay. The online documentation is hosted here.

The code is relatively efficient. The benchmark below compares PyTwoWay’s speed with that of LeaveOutTwoWay, a MATLAB package for estimating AKM and its bias corrections; on the simulated data, six of the eight PyTwoWay solvers finish in less total time.

# Quick Start

To install via pip, from the command line run:

```
pip install pytwoway
```

To make sure you are running the most up-to-date version of PyTwoWay, from the command line run:

```
pip install --upgrade pytwoway
```
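
To confirm which version ended up installed, one option is to query the package metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PyTwoWay.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed PyTwoWay version, or None if the package is missing
print(installed_version("pytwoway"))
```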

Please DO NOT download the Conda version of the package, as it is outdated!

# Help with Running the Package

Please check out the documentation for detailed examples of how to use PyTwoWay. If you have a question that the documentation doesn’t answer, please also check the past Issues to see if someone else has already asked this question and an answer has been provided. If you still can’t find an answer, please open a new Issue and we will try to answer as quickly as possible.

# Benchmarking

Data is simulated from BipartitePandas using the following code:

```
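import numpy as np
import bipartitepandas as bpd
# Simulation settings: 500,000 workers, firm size 10, move probability 0.05
sim_params = bpd.sim_params({'n_workers': 500000, 'firm_size': 10, 'p_move': 0.05})
# Seed the random number generator so the simulated data is reproducible
rng = np.random.default_rng(1234)
# Simulate the bipartite worker-firm data
sim_data = bpd.SimBipartite(sim_params).simulate(rng)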
```

Models are then estimated on this data using the PyTwoWay class FEEstimator and the MATLAB package LeaveOutTwoWay. For estimation with PyTwoWay, all solvers other than AMG use the incomplete Cholesky decomposition as a preconditioner.
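
To make concrete what a two-way fixed effects fit computes, here is a toy sketch in pure Python. It is not how FEEstimator is implemented: it uses simple alternating averaging on a tiny noise-free panel, and the function name `fit_two_way_fe`, the data, and the normalization (firm 0 as reference) are all assumptions made for illustration.

```python
def fit_two_way_fe(obs, n_workers, n_firms, n_iter=200):
    """Fit y = alpha[i] + psi[j] by alternating averaging.

    obs is a list of (worker, firm, y) tuples. Every worker and firm is
    assumed to appear at least once, and the worker-firm graph is assumed
    to be connected so that firm effects are identified up to a constant.
    """
    alpha = [0.0] * n_workers
    psi = [0.0] * n_firms
    for _ in range(n_iter):
        # Worker effects: average of (y - psi[j]) over each worker's observations
        sums, counts = [0.0] * n_workers, [0] * n_workers
        for i, j, y in obs:
            sums[i] += y - psi[j]
            counts[i] += 1
        alpha = [s / c for s, c in zip(sums, counts)]
        # Firm effects: average of (y - alpha[i]) over each firm's observations
        sums, counts = [0.0] * n_firms, [0] * n_firms
        for i, j, y in obs:
            sums[j] += y - alpha[i]
            counts[j] += 1
        psi = [s / c for s, c in zip(sums, counts)]
    # Normalize so that firm 0 is the reference firm (psi[0] = 0)
    shift = psi[0]
    psi = [p - shift for p in psi]
    alpha = [a + shift for a in alpha]
    return alpha, psi


# Toy panel: 6 workers, 3 firms, 2 periods each; workers 0, 1, and 3 are movers
true_alpha = [1.0, 0.5, -0.5, 0.0, 2.0, -1.0]
true_psi = [0.0, 1.0, -0.5]
matches = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 0),
           (3, 2), (3, 0), (4, 1), (4, 1), (5, 2), (5, 2)]
obs = [(i, j, true_alpha[i] + true_psi[j]) for i, j in matches]

alpha_hat, psi_hat = fit_two_way_fe(obs, n_workers=6, n_firms=3)
print([round(p, 6) for p in psi_hat])
```

Because the toy data has no noise and the movers connect all three firms, the estimates match the true effects once firm 0 is fixed as the reference. On data of the size used in this benchmark, this kind of naive iteration is not practical, which is why sparse iterative solvers and preconditioners matter.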

Benchmarks were run on a 2021 14-inch MacBook Pro with 16 GB of RAM and an 8-core Apple M1 Pro processor.

Some summary statistics about the largest leave-one-match-out set:

| Package | #obs | #firms | #movers |
|---|---|---|---|
| KSS | 2,255,370 | 44,510 | 88,542 |
| PyTwoWay | 2,269,665 | 44,601 | 89,098 |

Run time:

| Solver | Cleaning | Estimation | Total |
|---|---|---|---|
| KSS | N/A | N/A | 55.2s |
| PYTW-AMG | 4.0s | 3m2s | 3m6s |
| PYTW-BICG | 4.0s | 20.4s | 24.4s |
| PYTW-BICGSTAB | 4.0s | 21.9s | 25.9s |
| PYTW-CG | 4.0s | 19.6s | 23.6s |
| PYTW-CGS | 4.0s | 20.6s | 24.6s |
| PYTW-GMRES | 4.0s | 32.9s | 36.9s |
| PYTW-MINRES | 4.0s | 10.7s | 14.7s |
| PYTW-QMR | 4.0s | 3m53s | 3m57s |
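
The total run times above can be turned into speed ratios relative to KSS. The short script below is only arithmetic on the reported totals; the helper name `to_seconds` is hypothetical.

```python
import re


def to_seconds(text):
    """Convert a run time such as '3m6s' or '24.4s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized run time: {text!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes or 0) + float(seconds)


# Total run times copied from the benchmark table
totals = {
    "KSS": "55.2s",
    "PYTW-AMG": "3m6s",
    "PYTW-BICG": "24.4s",
    "PYTW-BICGSTAB": "25.9s",
    "PYTW-CG": "23.6s",
    "PYTW-CGS": "24.6s",
    "PYTW-GMRES": "36.9s",
    "PYTW-MINRES": "14.7s",
    "PYTW-QMR": "3m57s",
}

# Ratio above 1 means faster than KSS; below 1 means slower
baseline = to_seconds(totals["KSS"])
speedups = {name: baseline / to_seconds(t) for name, t in totals.items()}

for name, ratio in sorted(speedups.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {ratio:.2f}x")
```

On these numbers MINRES is the fastest configuration, while AMG and QMR are the only two that take longer than KSS.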

# Contributing to the Package

If you want to contribute to the package, the easiest way is to test that it’s working properly! If you notice a part of the package is giving incorrect results, please add a new post in Issues and we will do our best to fix it as soon as possible.

We are also happy to consider any suggestions to improve the package and documentation, whether to add a new feature, make a feature more user-friendly, or make the documentation clearer. Please also post suggestions in Issues.

Finally, if you would like to help with developing the package, please make a fork of the repository and submit pull requests with any changes you make! These will be promptly reviewed, and hopefully accepted!

We are extremely grateful for all contributions made by the community!

# Dependencies

Solving large sparse linear models relies on a combination of PyAMG (this is the package we use to estimate the different decompositions on US data) and SciPy’s iterative sparse linear solvers.
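
As a rough illustration of what a preconditioned iterative solver does, here is a minimal pure-Python sketch of the preconditioned conjugate gradient method. It is not the code PyTwoWay runs: it stores the matrix as dense lists rather than a SciPy sparse matrix, and it swaps in a simple Jacobi (diagonal) preconditioner in place of incomplete Cholesky. The function name `preconditioned_cg` and the example system are assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a dense matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def preconditioned_cg(A, b, inv_diag, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    inv_diag holds the inverse of the diagonal of A (Jacobi preconditioner).
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the starting point x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)  # search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        step = rz / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric, strictly diagonally dominant (hence positive definite) system
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
jacobi = [1.0 / A[k][k] for k in range(len(b))]
x = preconditioned_cg(A, b, jacobi)
print(x)
```

A better preconditioner reduces the number of iterations needed for a given tolerance, which is the reason for pairing the iterative solvers with incomplete Cholesky on large problems.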

Many tools for handling sparse matrices come from SciPy.

Additional preconditioners for linear solvers come from PyMatting (installing the package is not required, as the necessary files have been copied into the submodule preconditioners). The incomplete Cholesky preconditioner in turn relies on Numba.

Constrained optimization is handled by QPSolvers.

Progress bars are generated with tqdm.

Parameter dictionaries are constructed using ParamsDict.

Data cleaning is handled by BipartitePandas.

We also rely on a number of standard libraries, such as NumPy, Pandas, matplotlib, etc.

Optionally, the code is compatible with:

- multiprocess. Installing this may help if multiprocessing is raising errors related to pickling objects.
- PyTorch. This may speed up BLM estimation, and adds the option to compute some operations using the GPU.

# Citation

Please use the following citation to cite PyTwoWay in academic publications:

BibTeX entry:

```
@techreport{bhlmms2020,
title={How Much Should We Trust Estimates of Firm Effects and Worker Sorting?},
author={Bonhomme, St{\'e}phane and Holzheu, Kerstin and Lamadon, Thibaut and Manresa, Elena and Mogstad, Magne and Setzler, Bradley},
year={2020},
institution={National Bureau of Economic Research}
}
```