clear
cd "/Users/vojtabartos/Documents/4 Teaching/2018 Development Economics LMU/Tutorials/Exercise 1/Data/"
import delimited "pwt61_data.csv", delimiter(";") 
set more off
keep if year==1960 | year==1985
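* the first variable name carries encoding debris (most likely a byte-order mark in the CSV); clean it up
rename ïcountry country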

/* Let's define the variables as in MRW: */

* Back out the number of workers: we have real GDP per capita (rgdpch) and real GDP per worker (rgdpwok)
gen workers = rgdpch*pop/rgdpwok

*Y/L: real GDP divided by the working-age population
gen y_l = (cgdp*pop)/workers

* n: average rate of growth of working-age population (annual)
reshape wide pop xrat ppp cgdp cc ci cg p pc pg pi openc cgnp csave y rgdpl rgdpch rgdpeqa rgdpwok rgdptt openk kc kg ki grgdpch y_l workers, i(country) j(year)

* exact annual growth rate = (end value/initial value)^(1/# of years)-1
* gen n=exp(ln(workers1985/workers1960)/(1985-1960))-1
* or use the approximation for small growth rates, ln(1+g) ≈ g:
gen n=ln(workers1985/workers1960)/(1985-1960)

* s: average share of real investment (including government investment) in real GDP
gen s1985=((ci1985+cg1985+ci1960+cg1960)/2)/100

* g and delta: g+delta is assumed to be 0.05
gen g_d=0.05

* generate variables for regressions (log is a natural logarithm, alternative syntax is ln)
gen ln_y_l1960=log(y_l1960)
gen ln_y_l1985=log(y_l1985)
gen ln_s1985=log(s1985)
gen ln_n_g_d=log(n+g_d)

* generate sub-samples
gen intermediate= country=="Algeria" | country=="Argentina" | country=="Australia" | country=="Austria" | country=="Bangladesh" | country=="Belgium" | country=="Bolivia" | country=="Botswana" | country=="Brazil" | country=="Burma" | country=="Cameroon" | country=="Canada" | country=="Chile" | country=="Colombia" | country=="Costa Rica" | country=="Cote d`Ivoire" | country=="Denmark" | country=="Dominican Republic" | country=="Ecuador" | country=="El Salvador" | country=="Ethiopia" | country=="Finland" | country=="France" | country=="Germany" | country=="Greece" | country=="Guatemala" | country=="Haiti" | country=="Honduras" | country=="Hong Kong" | country=="India" | country=="Indonesia" | country=="Ireland" | country=="Israel" | country=="Italy" | country=="Jamaica" | country=="Japan" | country=="Jordan" | country=="Kenya" | country=="Korea, Republic of" | country=="Madagascar" | country=="Malawi" | country=="Malaysia" | country=="Mali" | country=="Mexico" | country=="Morocco" | country=="Netherlands" | country=="New Zealand" | country=="Nicaragua" | country=="Nigeria" | country=="Norway" | country=="Pakistan" | country=="Panama" | country=="Paraguay" | country=="Peru" | country=="Philippines" | country=="Portugal" | country=="Senegal" | country=="Singapore" | country=="South Africa" | country=="Spain" | country=="Sri Lanka" | country=="Sweden" | country=="Switzerland" | country=="Syria" | country=="Tanzania" | country=="Thailand" | country=="Trinidad &Tobago" | country=="Tunisia" | country=="Turkey" | country=="United Kingdom" | country=="United States" | country=="Uruguay" | country=="Venezuela" | country=="Zambia" | country=="Zimbabwe"
gen oecd= country=="Australia" | country=="Austria" | country=="Belgium" | country=="Canada" | country=="Denmark" | country=="Finland" | country=="France" | country=="Germany" | country=="Greece" | country=="Ireland" | country=="Italy" | country=="Japan" | country=="Netherlands" | country=="New Zealand" | country=="Norway" | country=="Portugal" | country=="Spain" | country=="Sweden" | country=="Switzerland" | country=="Turkey" | country=="United Kingdom" | country=="United States"

/*
Importantly, we assume that the countries have reached the steady state in 1985 or that their
deviations from steady state are random with mean zero.
We try to explain cross-country differences by differences in the determinants of the steady
state: capital accumulation and population growth.

What are the assumptions we have to make so that OLS is a valid estimator?
We assume that the rates of saving and population growth are independent of country-specific 
factors shifting the production function. That is, we assume that s and n are independent of e. 
This assumption implies that we can estimate equation (7) with ordinary least squares (OLS).*/
* UNRESTRICTED MODEL

reg ln_y_l1985 ln_s1985 ln_n_g_d if intermediate
outreg2 using "table1.xls" , replace dec(2) se
* restriction: coefficients of equal size and opposite sign, i.e. their sum is zero
test ln_s1985+ln_n_g_d=0
test ln_s1985=0.5
test ln_n_g_d=-0.5

reg ln_y_l1985 ln_s1985 ln_n_g_d if oecd
outreg2
test ln_s1985+ln_n_g_d=0
test ln_s1985=0.5
test ln_n_g_d=-0.5

* RESTRICTED MODEL
* see calculations: the coefficients on ln_s1985 and ln_n_g_d should have the same size and opposite signs
* thus we impose the following restriction (just rewrite the model)
gen ln_s1985_ln_n_g_d=ln_s1985-ln_n_g_d

reg ln_y_l1985 ln_s1985_ln_n_g_d if intermediate
outreg2
* we can also test for the alpha parameter directly
* we first create a non-linear combination of estimators; recall that beta=alpha/(1-alpha)
* => alpha=beta/(1+beta)
nlcom (alpha: (_b[ln_s1985_ln_n_g_d]/(_b[ln_s1985_ln_n_g_d]+1))), post
* and then we test if alpha differs from 1/3
test _b[alpha] = 1/3

reg ln_y_l1985 ln_s1985_ln_n_g_d if oecd
outreg2
nlcom (alpha: (_b[ln_s1985_ln_n_g_d]/(_b[ln_s1985_ln_n_g_d]+1))), post
test _b[alpha] = 1/3
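
* A hedged alternative to rewriting the model by hand: impose the same restriction
* (coefficients on ln_s1985 and ln_n_g_d sum to zero) with constrained least squares.
* The constraint number 1 is an arbitrary choice for this sketch; the slope on
* ln_s1985 should match the coefficient on ln_s1985_ln_n_g_d in the regressions above.
constraint define 1 ln_s1985 + ln_n_g_d = 0
cnsreg ln_y_l1985 ln_s1985 ln_n_g_d if intermediate, constraints(1)
cnsreg ln_y_l1985 ln_s1985 ln_n_g_d if oecd, constraints(1)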

/*
What can we learn from the alphas?
Because the model predicts that factors are paid their marginal products, it predicts not only the 
signs but also the magnitudes of the coefficients on saving and population growth. Specifically, 
because capital's share in income ($\alpha$) is roughly one third, the model implies an elasticity 
of income per capita with respect to the saving rate of approximately 0.5 and an elasticity with 
respect to $n + g + \delta$ of approximately -0.5.
To see why: (1/3)/(1-1/3)=0.5
[...]
If OLS yields coefficients that are substantially different from these values, then we can reject 
the joint hypothesis that the Solow model and our identifying assumption are correct.*/
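
* Quick arithmetic check of the mapping between alpha and the regression coefficient
* (illustrative only, no data involved): beta = alpha/(1-alpha) and alpha = beta/(1+beta)
display (1/3)/(1 - 1/3)
display 0.5/(1 + 0.5)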

********************
* CONVERGENCE
gen ln_y_l_diff=ln_y_l1985-ln_y_l1960

* Unconditional convergence
reg ln_y_l_diff ln_y_l1960 if intermediate
outreg2 using "table2.xls" , replace dec(2) se

reg ln_y_l_diff ln_y_l1960 if oecd
outreg2

* Conditional convergence
/* p. 422: "The Solow model predicts that countries reach different steady states.
 In Section II we argued that much of the cross-country differences in income per
 capita can be traced to differing determinants of the steady state in the Solow
 growth model: accumulation of human and physical capital and population growth.
 Thus, the Solow model does not predict convergence; it predicts only that income
 per capita in a given country converges to that country's steady-state value.
 In other words, the Solow model predicts convergence only after controlling for
 the determinants of the steady state, a phenomenon that might be called 
 "conditional convergence.""
*/
* unrestricted
* Model specification in equation 16 (p. 423)
/* Note on the regression: p.424 "Equation (16) has the advantage of explicitly 
taking into account out-of-steady-state dynamics. Yet, implementing equation (16) 
introduces a new problem. If countries have permanent differences in their 
production functions—that is, different A (0)'s—then these A (0)'s would enter 
as part of the error term and would be positively correlated with initial income. 
Hence, variation in A (0) would bias the coefficient on initial income toward 
zero (and would potentially influence the other coefficients as well). In other 
words, permanent cross-country differences in the production function would 
lead to differences in initial incomes uncorrelated with subsequent growth 
rates and, therefore, would bias the results against finding convergence."
*/
reg ln_y_l_diff ln_y_l1960 ln_s1985 ln_n_g_d if intermediate
outreg2
* this will be used for the graphs later
predict ln_gdppw_diff_conditional_inter
local r2_inter: display %5.2f e(r2)

reg ln_y_l_diff ln_y_l1960 ln_s1985 ln_n_g_d if oecd
outreg2
* this will be used for the graphs later
predict ln_gdppw_diff_conditional_oecd
local r2_oecd: display %5.2f e(r2)


* restricted
reg ln_y_l_diff ln_y_l1960 ln_s1985_ln_n_g_d if intermediate
outreg2
nlcom (lambda: (-(ln(_b[ln_y_l1960]+1)/(1985-1960)))), post
test _b[lambda] = 0


reg ln_y_l_diff ln_y_l1960 ln_s1985_ln_n_g_d if oecd
outreg2
nlcom (lambda: (-(ln(_b[ln_y_l1960]+1)/(1985-1960)))), post
test _b[lambda] = 0	
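* A hedged add-on: the implied half-life of convergence is ln(2)/lambda.
* It uses the lambda posted by the nlcom call just above (OECD sample) and is
* only meaningful when the estimated lambda is positive.
display "Implied half-life in years (OECD): " ln(2)/_b[lambda]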


* Graphs
graph twoway ///
	(scatter ln_gdppw_diff_conditional_inter ln_y_l1960, mlabel(country_isocode)) ///
	(lfit ln_gdppw_diff_conditional_inter ln_y_l1960) ///
	if intermediate, note(R-squared=`r2_inter') title("Intermediate")
	
graph twoway ///
	(scatter ln_gdppw_diff_conditional_oecd ln_y_l1960, mlabel(country_isocode)) ///
	(lfit ln_gdppw_diff_conditional_oecd ln_y_l1960) ///
	if oecd, note(R-squared=`r2_oecd') title("OECD")

* ADDING HUMAN CAPITAL
preserve
* Load the Barro-Lee data
* NOTE: We use a different human-capital proxy than MRW, but the logic remains the same:
/* MRW (1992, p. 419): "We use a proxy for the rate of human-capital accumulation 
(sh ) that measures approximately the percentage of the working-age population 
that is in secondary school. We begin with data on the fraction of the eligible 
population (aged 12 to 17) enrolled in secondary school, which we obtained from 
the UNESCO yearbook. We then multiply this enrollment rate by the fraction of the 
working-age population that is of school age (aged 15 to 19). This variable, which 
we call SCHOOL, is clearly imperfect: the age ranges in the two data series are 
not exactly the same, the variable does not include the input of teachers, and 
it completely ignores primary and higher education. Yet if SCHOOL is proportional 
to sh, then we can use it to estimate equation (11); the factor of proportionality 
will affect only the constant term."
*/

* p. 416: "We are assuming that human capital depreciates at the same rate as 
* physical capital."
use BL2013.dta, clear
* we'll merge the dataset based on country ISO code; let's rename the WBcode
* variable so that it matches the name of the original variable
rename WBcode country_isocode
* keep data for 1960 only (the base year of our analysis)
keep if year==1960
* focus on the working-age population
keep if agefrom==15 & ageto==999
* save the data
save bl_educ.dta, replace
restore

merge 1:1 country_isocode using bl_educ.dta
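* Hedged check of the merge (assumes country_isocode uses the same codes in both files):
* _merge==1 are PWT countries without Barro-Lee data, _merge==2 are Barro-Lee-only rows.
tabulate _merge
* list sample countries without schooling data; they drop out of the regressions with school
list country country_isocode if _merge==1 & (intermediate==1 | oecd==1)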

* CONDITIONAL CONVERGENCE WITH HUMAN CAPITAL
gen school=(ls+lh)/100
reg ln_y_l_diff ln_y_l1960 ln_s1985 ln_n_g_d school if intermediate
outreg2

reg ln_y_l_diff ln_y_l1960 ln_s1985 ln_n_g_d school if oecd
outreg2

/* We can also go further and estimate the factor shares from the 
coefficients in the regression adding human capital (not the convergence regression, 
though): p. 417 "Like the textbook Solow model, the augmented model predicts 
coefficients in equation (11) that are functions of the factor shares. As before, 
$\alpha$ is physical capital's share of income, so we expect a value of $\alpha$ of 
about one third. Gauging a reasonable value of $\beta$, human capital's share, is more 
difficult. In the United States the minimum wage—roughly the return to labor without 
human capital—has averaged about 30 to 50 percent of the average wage in manufacturing. 
This fact suggests that 50 to 70 percent of total labor income represents the 
return to human capital, or that $\beta$ is between one third and one half."
+ Predictions on coefficient sizes in last two paragraphs on p. 417
*/
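
* A minimal sketch of that step, under loud assumptions: school (the Barro-Lee based
* proxy defined above) stands in for MRW's SCHOOL variable, and ln_school and
* ln_school_ln_n_g_d are names introduced here for illustration only.
* Countries with school==0 get a missing log and drop out of the regression.
gen ln_school=log(school)
gen ln_school_ln_n_g_d=ln_school-ln_n_g_d
* restricted level regression: coefficients are alpha/(1-alpha-beta) and beta/(1-alpha-beta)
reg ln_y_l1985 ln_s1985_ln_n_g_d ln_school_ln_n_g_d if intermediate
* back out the factor shares: alpha=b1/(1+b1+b2), beta=b2/(1+b1+b2)
nlcom (alpha: _b[ln_s1985_ln_n_g_d]/(1+_b[ln_s1985_ln_n_g_d]+_b[ln_school_ln_n_g_d])) ///
	(beta: _b[ln_school_ln_n_g_d]/(1+_b[ln_s1985_ln_n_g_d]+_b[ln_school_ln_n_g_d]))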