{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Attrition example" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Add PyTwoWay to system path (do not run this)\n", "# import sys\n", "# sys.path.append('../../..')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import the PyTwoWay package\n", "\n", "Make sure to install it using `pip install pytwoway`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-01-15T23:38:19.123052Z", "start_time": "2021-01-15T23:38:18.565950Z" } }, "outputs": [], "source": [ "import pytwoway as tw\n", "import bipartitepandas as bpd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First, check out parameter options\n", "\n", "Do this by running:\n", "\n", "- FE - `tw.fe_params().describe_all()`\n", "\n", "- CRE - `tw.cre_params().describe_all()`\n", "\n", "- Clustering - `bpd.cluster_params().describe_all()`\n", "\n", "- Cleaning - `bpd.clean_params().describe_all()`\n", "\n", "- Simulating - `bpd.sim_params().describe_all()`\n", "\n", "Alternatively, run `x_params().keys()` to view all the keys for a parameter dictionary, then `x_params().describe(key)` to get a description for a single key." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second, set parameter choices\n", "\n", "Note that we set `copy=False` in `clean_params` to avoid unnecessary copies (although this will modify the original dataframe)." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# FE\n", "fe_params = tw.fe_params(\n", " {\n", " 'ndraw_trace_ho': 10,\n", " 'ndraw_trace_he': 20\n", " }\n", ")\n", "# Cleaning\n", "clean_params = bpd.clean_params(\n", " {\n", " 'connectedness': None,\n", " 'drop_returns': 'returners',\n", " 'copy': False\n", " }\n", ")\n", "# Simulating\n", "sim_params = bpd.sim_params(\n", " {\n", " 'n_workers': 8000,\n", " 'firm_size': 20,\n", " 'alpha_sig': 2, 'w_sig': 2,\n", " 'c_sort': 1.5, 'c_netw': 1.5,\n", " 'p_move': 0.2\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Third, extract data (we simulate for the example)\n", "\n", "`BipartitePandas` provides the class `SimBipartite`, which we use here to simulate a bipartite network. If you have your own data, import it during this step: load it as a `Pandas DataFrame`, then convert it into a `BipartitePandas DataFrame` in the next step." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "sim_data = bpd.SimBipartite(sim_params).simulate()[['i', 'j', 'y', 't']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fourth, prepare data\n", "\n", "This is exactly how you should prepare real data prior to running the FE estimator.\n", "\n", "- First, we convert the data into a `BipartitePandas DataFrame`\n", "\n", "- Second, we clean the data (e.g., drop NaN observations and make sure firm and worker ids are contiguous)\n", "\n", "- Third, we collapse the data at the worker-firm spell level (taking the mean wage over each spell)\n", "\n", "Further details on `BipartitePandas` can be found in the [package documentation](https://tlamadon.github.io/bipartitepandas/).\n", "\n", "