{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Borovickova-Shimer example" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Add PyTwoWay to system path (do not run this)\n", "# import sys\n", "# sys.path.append('../../..')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import the PyTwoWay package\n", "\n", "Make sure to install it using `pip install pytwoway`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-01-15T23:38:19.123052Z", "start_time": "2021-01-15T23:38:18.565950Z" } }, "outputs": [], "source": [ "from pandas import Series\n", "import pytwoway as tw\n", "import bipartitepandas as bpd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First, check out parameter options\n", "\n", "Do this by running:\n", "\n", "- Cleaning - `bpd.clean_params().describe_all()`\n", "\n", "- Simulating - `tw.sim_bs_params().describe_all()`\n", "\n", "Alternatively, run `x_params().keys()` to view all the keys for a parameter dictionary, then `x_params().describe(key)` to get a description for a single key." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second, set parameter choices\n", "\n", "
\n", "\n", "Note\n", "\n", "We specify `connectedness=strongly_connected` in `clean_params` because the Borovickova-Shimer estimator must be fit on the strongly connected set of firms.\n", "\n", "
\n", "\n", "
\n", "\n", "Note\n", "\n", "We set `copy=False` in `clean_params` to avoid unnecessary copies (although this may modify the original dataframe).\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Cleaning\n", "clean_params_1 = bpd.clean_params(\n", " {\n", " 'drop_single_stayers': True,\n", " 'drop_returns': 'returns',\n", " 'copy': False,\n", " 'verbose': False\n", " }\n", ")\n", "clean_params_2 = bpd.clean_params(\n", " {\n", " 'is_sorted': True,\n", " 'copy': False,\n", " 'verbose': False\n", " }\n", ")\n", "# Simulating\n", "sim_params = tw.sim_bs_params(\n", " {\n", " 'n_workers': 10000,\n", " 'n_firms': 100,\n", " 'sigma_lambda_sq': 1.25,\n", " 'sigma_mu_sq': 0.75,\n", " 'sigma_wages': 2.5,\n", " 'rho': -0.5\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Third, extract data (we simulate for the example)\n", "\n", "`PyTwoWay` contains the class `SimBS` which we use here to simulate from the Borovickova-Shimer dgp. If you have your own data, you can import it during this step. Load it as a `Pandas DataFrame` and then convert it into a `BipartitePandas DataFrame` in the next step." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "sim_data = tw.SimBS(sim_params).simulate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fourth, prepare data\n", "\n", "This is exactly how you should prepare real data prior to running the Borovickova-Shimer estimator.\n", "\n", "- First, we convert the data into a `BipartitePandas DataFrame`\n", "\n", "- Second, we clean the data (e.g. drop NaN observations, drop returns, make sure firm and worker ids are contiguous, etc.)\n", "\n", "- Third, we collapse the data at the worker-firm spell level (take mean wage over the spell)\n", "\n", "- Fourth, we ensure all firms and workers have at least 2 observations\n", "\n", "- Fifth, we clean up firm and worker ids\n", "\n", "Further details on `BipartitePandas` can be found in the package documentation, available [here](https://tlamadon.github.io/bipartitepandas/)." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Convert into BipartitePandas DataFrame\n", "bdf = bpd.BipartiteDataFrame(sim_data)\n", "# Clean\n", "bdf = bdf.clean(clean_params_1)\n", "# Collapse\n", "bdf = bdf.collapse(is_sorted=True, copy=False)\n", "# Make sure all workers and firms have at least 2 observations\n", "bdf = bdf.min_joint_obs_frame(2, 2, 'j', 'i', is_sorted=True, copy=False)\n", "# Clean up worker and firm ids\n", "bdf = bdf.clean(clean_params_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fifth, initialize and run the estimator\n", "\n", "
\n", "\n", "Note\n", "\n", "To fit the alternative estimator instead, pass `alternative_estimator=True` to `fit()`.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Initialize Borovickova-Shimer estimator\n", "bs_estimator = tw.BSEstimator()\n", "# Fit Borovickova-Shimer estimator\n", "bs_estimator.fit(bdf, alternative_estimator=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finally, investigate the results\n", "\n", "Results correspond to:\n", "\n", "- `y`: income (outcome) column\n", "- `lambda`: worker effects\n", "- `mu`: firm effects" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-12-22T21:42:51.498849Z", "start_time": "2020-12-22T21:42:51.489723Z" }, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'mean(y)': -0.07862680349885089,\n", " 'var(lambda)': 1.2422121496618905,\n", " 'var(mu)': 0.8050475806005325,\n", " 'cov(lambda, mu)': -0.47932450909729013,\n", " 'corr(lambda, mu)': -0.4793149502917461}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bs_estimator.res" ] } ], "metadata": { "celltoolbar": "Tags", "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "nbsphinx-toctree": { "hidden": true, "maxdepth": 1, "titlesonly": true }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, 
"skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }