Simulating and estimating the interacted model
require(rblm)
require(knitr)
require(kableExtra)
options(knitr.table.format = "html")
The interacted model has the following equations:
\[ y_{it} = a_t(k_{it}) + b_t(k_{it}) \cdot \alpha_i + \epsilon_{it}\] where in the model we define the average worker quality conditional on a firm type:
\[ \text{Em} = E[ \alpha_i |k,t=1 ] \]
and the average worker quality conditional on a job change: \[ \text{EEm} = E[ \alpha_i |k_1{=}k,k_2{=}k',m{=}1 ] \]
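The variance-decomposition statistics reported throughout (cor_kl, cov_kl, var_k, var_l, rsq) can be read as follows. This is our interpretation of the column names, writing the firm component as \(u\) and the worker component as \(v\); the exact definitions are in the rblm source:
\[ y_{it} = u_{it} + v_{it} + \epsilon_{it}, \qquad u_{it} = a_t(k_{it}), \qquad v_{it} = b_t(k_{it}) \cdot \alpha_i \]
\[ \text{var\_k} = \mathrm{Var}(u), \quad \text{var\_l} = \mathrm{Var}(v), \quad \text{cov\_kl} = \mathrm{Cov}(u,v), \quad \text{cor\_kl} = \frac{\mathrm{Cov}(u,v)}{\sqrt{\mathrm{Var}(u)\,\mathrm{Var}(v)}} \]
\[ \text{rsq} = \frac{\mathrm{Var}(u + v)}{\mathrm{Var}(y_{it})} \]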
set.seed(324313)
model = m2.mini.new(10,serial = F,fixb=T)
# we set the parameters to something simple
model$A1 = seq(0,2,l=model$nf) # setting increasing intercepts
model$B1 = seq(1,2,l=model$nf) # adding complementarity (increasing interactions)
model$Em = seq(0,1,l=model$nf) # adding sorting (mean type of workers is increasing in k)
# we make the model stationary (same in both periods)
model$A2 = model$A1
model$B2 = model$B1
# setting the number of movers and stayers
model$Ns = array(300000/model$nf,model$nf)
model$Nm = 10*toeplitz(ceiling(seq(100,10,l=model$nf)))
# creating a simulated data set
ad = m2.mini.simulate(model)
## INFO [2018-12-19 23:34:26] computing var decomposition with ns=817440 nm=182550
## cor_kl cov_kl var_k var_l rsq
## 1 0.2353 0.1394 0.7423 0.1183 0.7891
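In the model object above, `Nm` is a Toeplitz matrix, so the number of movers between firm classes \(k\) and \(k'\) depends only on \(|k - k'|\) and is largest between similar classes. A standalone illustration with 5 classes (base R only; `l=` in the chunk above is an abbreviation of `length.out`):

```r
# mover-count matrix for 5 firm classes: counts fall with |k - k'|
Nm <- 10 * toeplitz(ceiling(seq(100, 10, length.out = 5)))
Nm
# the diagonal (within-class moves) holds the largest counts,
# and counts shrink as the two classes grow further apart
```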
# plot firm size distribution
ggplot(ad$sdata[,.N,f1],aes(x=N)) + geom_histogram(binwidth=1) + theme_bw()
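A quick sanity check on the simulated sample sizes, assuming `ad` is the list returned by `m2.mini.simulate` with stayers in `ad$sdata` and movers in `ad$jdata` (as used below); the counts should line up with the `ns` and `nm` values printed in the log above:

```r
# sanity check: sample sizes should match ns and nm in the log above
nrow(ad$sdata)   # stayer observations
nrow(ad$jdata)   # mover observations
nrow(ad$jdata) / (nrow(ad$sdata) + nrow(ad$jdata))  # share of movers
```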
We start by extracting the measures that will be used to cluster firms:
ms = grouping.getMeasures(ad,"ecdf",Nw=20,y_var = "y1")
## INFO [2018-12-19 23:34:37] processing 3000 firms
## INFO [2018-12-19 23:34:37] computing measures...
## INFO [2018-12-19 23:34:37] computing weights...
# then we group firms, choosing k=10
grps = grouping.classify.once(ms,k = 10,nstart = 1000,iter.max = 200,step=250)
## INFO [2018-12-19 23:34:37] clustering T=101.015687, Nw=20 , measure=ecdf
## INFO [2018-12-19 23:34:37] running weigthed kmeans step=250 total=1000
## INFO [2018-12-19 23:34:37] nobs=3000 nmeasures=20
## INFO [2018-12-19 23:34:39] [25%] tot=358.144483 best=358.144483 <<<<
## INFO [2018-12-19 23:34:41] [50%] tot=358.144483 best=358.144483
## INFO [2018-12-19 23:34:42] [75%] tot=358.144483 best=358.144483
## INFO [2018-12-19 23:34:43] [100%] tot=358.144483 best=358.144483
## INFO [2018-12-19 23:34:43] k=10 WSS=358.144483 nstart=1000 nfrims=3000
# finally we append the grouping results to ad
ad = grouping.append(ad,grps$best_cluster,drop=T)
# we can also check the classification
ggplot(ad$sdata[,.N,list(j1,j1true)],aes(x=j1true,y=j1,size=N)) + geom_point() + theme_bw() +
scale_x_continuous("true type") + scale_y_continuous("estimated group")
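The same comparison can be summarized numerically. A hedged sketch, assuming (as in the plot above) that `ad$sdata` carries both the estimated group `j1` and the true type `j1true`, and that the k-means labels happen to be aligned with the true types, which k-means does not guarantee in general:

```r
# share of stayers whose estimated group equals the true firm type
ad$sdata[, mean(j1 == j1true)]
# full confusion table between true types and estimated groups
ad$sdata[, table(j1true, j1)]
```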
In the previous command we tell rblm that we want to use the firm-specific empirical measure “ecdf” with 20 points of support, and that the dependent variable is “y1”. The firm identifier should be “f1”.
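To make the clustering step concrete, here is a minimal, self-contained sketch of the idea behind the “ecdf” measure: evaluate each firm's empirical wage CDF on a common grid of support points, then run k-means on those firm-level profiles. This is an illustration on toy data, not the rblm implementation (which, among other things, weights firms by size):

```r
set.seed(1)
# toy panel: 50 firms, 40 workers each, firm-specific wage shifts
wages <- data.frame(f = rep(1:50, each = 40),
                    y = rnorm(2000) + rep(runif(50, 0, 2), each = 40))
# common support grid: 20 quantiles of the pooled wage distribution
supp <- quantile(wages$y, probs = seq(0.05, 0.95, length.out = 20))
# each firm's empirical CDF evaluated on the grid (one row per firm)
M <- t(sapply(split(wages$y, wages$f),
              function(y) sapply(supp, function(s) mean(y <= s))))
# cluster firms on their CDF profiles
cl <- kmeans(M, centers = 5, nstart = 100)
table(cl$cluster)
```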
This is relatively old code; we will be pushing an update soon!
res = m2.mini.estimate(ad$jdata,ad$sdata,model0 = model,method = "fixb")
## INFO [2018-12-19 23:34:45] Beginning m2.mini.estimate with method fixb
## INFO [2018-12-19 23:34:46] getting worker compositions in movers
## INFO [2018-12-19 23:34:46] getting residual wage variances in movers
## INFO [2018-12-19 23:34:46] getting residual wage variances in stayers
## INFO [2018-12-19 23:34:46] computing wage variance decomposition
## INFO [2018-12-19 23:34:46] computing var decomposition with ns=817439 nm=182564
## cor_kl cov_kl var_k var_l rsq
## 1 0.2445 0.1416 0.7463 0.1122 0.7914
## cor with true model:0.954619
We can show the decompositions next to each other:
kable(rbind(model$vdec$stats,res$vdec$stats),digits = 4) %>%
kable_styling(bootstrap_options = c("striped", "bordered","condensed"), full_width = F)
| cor_kl | cov_kl | var_k | var_l | rsq |
|---|---|---|---|---|
| 0.2353 | 0.1394 | 0.7423 | 0.1183 | 0.7891 |
| 0.2445 | 0.1416 | 0.7463 | 0.1122 | 0.7914 |
And we can plot the resulting wages next to the true ones:
m2.mini.plotw(res) + ggtitle("Estimated wages")
m2.mini.plotw(model) + ggtitle("Model wages")
This is a new estimator we have been working on.
We start by extracting all necessary moments for off-line estimation:
mstats = ad$jdata[,list(m1=mean(y1),sd1=sd(y1),
m2=mean(y2),sd2=sd(y2),
v12 = cov(y1,y2),.N),list(j1,j2)]
cstats = ad$sdata[,list(m1=mean(y1),sd1=sd(y1),
m2=mean(y2),sd2=sd(y2),
v12 = cov(y1,y2),.N),list(j1)]
Then we estimate using the quasi-likelihood estimator:
res2 = m2.minirc.estimate(cstats,mstats,method = 2)
Here are the variance decompositions:
dd = rbind(model$vdec$stats,res$vdec$stats,res2$vdec$stats)
kable(dd,digits = 4) %>%
kable_styling(bootstrap_options = c("striped", "bordered","condensed"), full_width = F)
| cor_kl | cov_kl | var_k | var_l | rsq |
|---|---|---|---|---|
| 0.2353 | 0.1394 | 0.7423 | 0.1183 | 0.7891 |
| 0.2445 | 0.1416 | 0.7463 | 0.1122 | 0.7914 |
| 0.2399 | 0.1424 | 0.7382 | 0.1194 | 0.7742 |