{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sorkin example" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Add PyTwoWay to system path (do not run this)\n", "# import sys\n", "# sys.path.append('../../..')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import the PyTwoWay package\n", "\n", "Make sure to install it using `pip install pytwoway`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-01-15T23:38:19.123052Z", "start_time": "2021-01-15T23:38:18.565950Z" } }, "outputs": [], "source": [ "from pandas import Series\n", "import pytwoway as tw\n", "import bipartitepandas as bpd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First, check out parameter options\n", "\n", "Do this by running:\n", "\n", "- Cleaning - `bpd.clean_params().describe_all()`\n", "\n", "- Simulating - `bpd.sim_params().describe_all()`\n", "\n", "Alternatively, run `x_params().keys()` (replacing `x` with `clean` or `sim`) to view all the keys of a parameter dictionary, then `x_params().describe(key)` to get a description of a single key." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second, set parameter choices\n", "\n", "Note\n", "\n", "The Sorkin estimator requires a strongly connected set of firms, so we set `connectedness='strongly_connected'` in `clean_params`.\n", "\n", "Note\n", "\n", "We set `copy=False` in `clean_params` to avoid unnecessary copies (although this may modify the original dataframe)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Cleaning\n", "clean_params = bpd.clean_params(\n", "    {\n", "        'connectedness': 'strongly_connected',\n", "        'drop_single_stayers': True,\n", "        'drop_returns': 'returners',\n", "        'copy': False\n", "    }\n", ")\n", "# Simulating\n", "sim_params = bpd.sim_params(\n", "    {\n", "        'n_workers': 1000,\n", "        'firm_size': 5,\n", "        'alpha_sig': 2, 'w_sig': 2,\n", "        'c_sort': 1.5, 'c_netw': 1.5,\n", "        'p_move': 0.1\n", "    }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Third, extract data (we simulate for the example)\n", "\n", "`BipartitePandas` contains the class `SimBipartite`, which we use here to simulate a bipartite network. If you have your own data, import it at this step: load it as a `Pandas DataFrame`, then convert it into a `BipartitePandas DataFrame` in the next step." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "sim_data = bpd.SimBipartite(sim_params).simulate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fourth, prepare data\n", "\n", "This is exactly how you should prepare real data before running the Sorkin estimator.\n", "\n", "- First, we convert the data into a `BipartitePandas DataFrame`\n", "\n", "- Second, we clean the data (e.g., drop NaN observations, make firm and worker ids contiguous, and construct the strongly connected set of firms)\n", "\n", "- Third, we collapse the data to the worker-firm spell level (taking the mean wage over each spell)\n", "\n", "- Fourth, we convert the data into event study format\n", "\n", "Further details on `BipartitePandas` can be found in the package documentation, available [here](https://tlamadon.github.io/bipartitepandas/)." 
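] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is a minimal, standard-library-only sketch of what the collapse, event study, and strongly connected steps do conceptually, run on a tiny hypothetical panel. It is **not** how `BipartitePandas` implements them. The toy data and the helper names `collapse_spells`, `to_eventstudy`, and `largest_strongly_connected_set` are made up for illustration. For brevity, the sketch also builds the firm mobility graph from the event study rows, whereas `BipartitePandas` computes the connected set during cleaning, before collapsing.\n", "\n", "A set of firms is strongly connected if, following observed moves from an origin firm to a destination firm, every firm in the set can be reached from every other firm in the set. The sketch finds the largest such set with Kosaraju's algorithm: one depth-first search on the move graph, then one on the reversed graph." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical sketch, NOT the BipartitePandas implementation:\n", "# the toy panel and all helper names below are illustrative only.\n", "from itertools import groupby\n", "from statistics import mean\n", "\n", "# Toy long-format panel (worker i, year t, firm j, wage y), sorted by i then t\n", "panel = [\n", "    (0, 1, 'A', 1.0), (0, 2, 'A', 3.0), (0, 3, 'B', 2.0),\n", "    (1, 1, 'B', 1.5), (1, 2, 'A', 2.5),\n", "    (2, 1, 'C', 4.0), (2, 2, 'A', 4.0),\n", "]\n", "\n", "\n", "def collapse_spells(rows):\n", "    # One row per spell (consecutive years at the same firm), with the mean wage\n", "    return [(i, j, mean(r[3] for r in grp))\n", "            for (i, j), grp in groupby(rows, key=lambda r: (r[0], r[2]))]\n", "\n", "\n", "def to_eventstudy(spells):\n", "    # Pair each worker's consecutive spells as (i, j1, j2, y1, y2);\n", "    # in this sketch, workers with a single spell produce no rows\n", "    events = []\n", "    for i, grp in groupby(spells, key=lambda s: s[0]):\n", "        grp = list(grp)\n", "        for (_, j1, y1), (_, j2, y2) in zip(grp, grp[1:]):\n", "            events.append((i, j1, j2, y1, y2))\n", "    return events\n", "\n", "\n", "def largest_strongly_connected_set(events):\n", "    # Directed move graph j1 -> j2 and its reverse\n", "    graph, rev = {}, {}\n", "    for _, j1, j2, _, _ in events:\n", "        graph.setdefault(j1, set()).add(j2)\n", "        graph.setdefault(j2, set())\n", "        rev.setdefault(j2, set()).add(j1)\n", "        rev.setdefault(j1, set())\n", "    # Pass 1: iterative DFS on the move graph, recording finishing order\n", "    order, seen = [], set()\n", "    for start in graph:\n", "        if start in seen:\n", "            continue\n", "        seen.add(start)\n", "        stack = [(start, iter(graph[start]))]\n", "        while stack:\n", "            node, nbrs = stack[-1]\n", "            nxt = next((n for n in nbrs if n not in seen), None)\n", "            if nxt is None:\n", "                stack.pop()\n", "                order.append(node)\n", "            else:\n", "                seen.add(nxt)\n", "                stack.append((nxt, iter(graph[nxt])))\n", "    # Pass 2: DFS on the reversed graph in reverse finishing order;\n", "    # each search tree is one strongly connected component\n", "    components, assigned = [], set()\n", "    for start in reversed(order):\n", "        if start in assigned:\n", "            continue\n", "        comp, stack = set(), [start]\n", "        assigned.add(start)\n", "        while stack:\n", "            node = stack.pop()\n", "            comp.add(node)\n", "            for n in rev[node]:\n", "                if n not in assigned:\n", "                    assigned.add(n)\n", "                    stack.append(n)\n", "        components.append(comp)\n", "    return max(components, key=len) if components else set()\n", "\n", "\n", "spells = collapse_spells(panel)\n", "events = to_eventstudy(spells)\n", "firms = largest_strongly_connected_set(events)\n", "print(sorted(firms))  # ['A', 'B']: the move C -> A is observed, but no move back into C" 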
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "checking required columns and datatypes\n", "sorting rows\n", "dropping NaN observations\n", "generating 'm' column\n", "keeping highest paying job for i-t (worker-year) duplicates (how='max')\n", "dropping workers who leave a firm then return to it (how='returners')\n", "making 'i' ids contiguous\n", "making 'j' ids contiguous\n", "computing largest connected set (how='strongly_connected')\n", "making 'i' ids contiguous\n", "making 'j' ids contiguous\n", "sorting columns\n", "resetting index\n" ] } ], "source": [ "# Convert into BipartitePandas DataFrame\n", "bdf = bpd.BipartiteDataFrame(sim_data)\n", "# Clean\n", "bdf = bdf.clean(clean_params)\n", "# Collapse\n", "bdf = bdf.collapse(is_sorted=True, copy=False)\n", "# Convert to event study format\n", "bdf = bdf.to_eventstudy(is_sorted=True, copy=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fifth, initialize and run the estimator" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Initialize Sorkin estimator\n", "sorkin_estimator = tw.SorkinEstimator()\n", "# Fit Sorkin estimator\n", "sorkin_estimator.fit(bdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finally, investigate the results\n", "\n", "After fitting, the estimated firm values are stored in the estimator's `.V_EE` attribute." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-12-22T21:42:51.498849Z", "start_time": "2020-12-22T21:42:51.489723Z" }, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 -5.642380\n", "1 -6.047845\n", "2 -4.543767\n", "3 -5.642380\n", "4 -4.949233\n", " ...\n", "102 -5.354698\n", "103 -5.642380\n", "104 -6.335527\n", "105 -5.354698\n", "106 -4.949233\n", "Length: 107, dtype: float64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(Series(sorkin_estimator.V_EE))" ] } ], "metadata": { "celltoolbar": "Tags", "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "nbsphinx-toctree": { "hidden": true, "maxdepth": 1, "titlesonly": true }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }