// ECON 626: EMPIRICAL MICROECONOMICS (FALL 2019)
// L2. REGRESSION BASICS
// IN-CLASS ACTIVITY 1. ATTENUATION BIAS
// WRITTEN BY PAM JAKIELA & OWEN OZIER

// preliminaries
clear all
cd "E:\Dropbox\econ-626-2019\lectures\L2 Regression\activities"
version 14.2 // replace with earlier version as needed
set seed 12345

// 1. Overview:
//
// Consider a regression of the form y* = beta x* + epsilon
// where x*~U(0,2) and epsilon~N(0,1). You do not observe x*;
// instead you observe x = x* + nu where nu~N(0,0.5). Assume
// x*, epsilon, and nu are independent.
//
// 1a. Let beta = 1. Write a do file that simulates this data-generating
// process in a sample of ten thousand observations.

set obs 10000
gen xstar = 2*runiform()                // x* ~ U(0,2)
gen eps = rnormal()                     // epsilon ~ N(0,1)
gen ystar = xstar + eps                 // beta = 1
gen x = xstar + (1/sqrt(2))*rnormal()   // nu ~ N(0,0.5), so sd = 1/sqrt(2)

// 1b. You are interested in recovering the true coefficient, beta.
// Since you know the data-generating process, you know that beta = 1.
// How does the true coefficient compare to the coefficient you get
// from a regression of y* on the observed x?

reg ystar x

/* OUTPUT: confidence interval for x does not include 1.

. reg ystar x

      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(1, 9998)      =   1197.75
       Model |  1417.77862         1  1417.77862   Prob > F        =    0.0000
    Residual |  11834.6376     9,998   1.1837005   R-squared       =    0.1070
-------------+----------------------------------   Adj R-squared   =    0.1069
       Total |  13252.4162     9,999  1.32537416   Root MSE        =     1.088

------------------------------------------------------------------------------
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .4097467   .0118395    34.61   0.000      .386539    .4329544
       _cons |   .5889379   .0162641    36.21   0.000      .557057    .6208188
------------------------------------------------------------------------------
*/

// 1c. When the independent variable of interest is measured with (mean-zero)
// error, the OLS coefficient is biased toward zero. Let beta-hat be the OLS
// coefficient resulting from a regression of y* on x. Show that your
// answer to (1b) is consistent with the formula:
//
//     plim beta-hat = beta*(1 - (s/(1+s)))
//
// where s = sigma^2_nu / sigma^2_x* (the noise-to-signal ratio). See
// Cameron and Trivedi, pp. 903-904 for discussion.

// ANSWER: We know that sigma^2_nu = 0.5 and sigma^2_x* = 1/3 (recall
// the variance of a uniform: (1/12)*(b-a)^2 = (1/12)*2^2 in this case).
// So s = 0.5/(1/3) = 1.5, and plim beta-hat = 1*(1 - 1.5/2.5) = 0.4,
// consistent with the estimate in (1b). (A numerical check appears after
// the (1d) code below.)

// 1d. Now imagine that you observe x* but y* is measured with error -
// specifically, assume that you only observe y = y* + eta where
// eta~N(0,0.5). How does the estimated beta-hat compare to the true
// beta? How does the estimated standard error of beta-hat compare to
// the standard error from a regression of y* on x*? Why is this the case?

gen y = ystar + (1/sqrt(2))*rnormal()   // eta ~ N(0,0.5)
reg y xstar
reg ystar xstar
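
// A minimal numerical check of the attenuation formula in (1c), using the
// sample variance of xstar as a stand-in for sigma^2_x* (a sketch, not part
// of the original activity):
quietly summarize xstar
local s = 0.5/r(Var)                                // noise-to-signal ratio
di "implied plim of beta-hat: " 1 - `s'/(1 + `s')   // should be close to 0.4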
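
// For (1d), a back-of-the-envelope answer (a sketch, assuming the usual OLS
// variance formula): eta is absorbed into the residual, so beta-hat remains
// unbiased, but the residual variance rises from 1 to 1.5, inflating the
// standard error by roughly a factor of sqrt(1.5):
di "expected SE inflation factor: " sqrt(1.5)   // about 1.22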
// 2. Overview:
//
// In statistics, power is the probability of rejecting a false null
// hypothesis. Consider the DGP y* = beta x* + epsilon where
// x*~U(-sqrt(3),sqrt(3)) and epsilon~N(0,sigma^2_epsilon). Write a
// program to generate an empirical estimate of the statistical power
// of a regression of y* on x* in a sample of 100 observations.
// Specifically, write a loop that generates one thousand data sets
// of 100 observations each using the DGP described above; for each data
// set, record the p-value associated with a test of the hypothesis that
// beta-hat = 0. For a test size of 0.05, the fraction of replications
// with p<0.05 provides an empirical estimate of the power of the test.

// 2a. Once you've written your loop, use it to identify a value of
// sigma^2_epsilon that will lead to a power of 0.8 in your N=100
// sample.

// 2b. Now modify your loop to compare the results of the ideal regression
// (described above) to a regression of y* on x = x* + nu where nu~N(0,1).
// How much does attenuation bias reduce statistical power?

// 2c. Now modify your loop to compare the results of the ideal regression
// to a regression of y = y* + eta on x* for eta~N(0,1). How much does
// measurement error in the dependent variable reduce statistical power?

clear
set obs 1000   // one row per replication, matching the overview
gen pval = .
gen p_xnoise = .
gen p_ynoise = .

forvalues i = 1/1000 {
	di "Loop `i'"
	// each replication simulates a fresh N=100 sample in rows 1-100
	quietly gen xstar = 2*sqrt(3)*runiform() - sqrt(3) in 1/100
	quietly gen eps = 3.6*rnormal() in 1/100   // sd(eps) = 3.6 gives power near 0.8 (2a)
	quietly gen ystar = xstar + eps in 1/100
	quietly gen x = xstar + rnormal() in 1/100   // nu ~ N(0,1)
	quietly gen y = ystar + rnormal() in 1/100   // eta ~ N(0,1)

	** 2a: ideal regression
	quietly reg ystar xstar
	matrix V = r(table)
	local temp_pval = V[4,1]   // p-value on the slope
	quietly replace pval = `temp_pval' in `i'
	matrix drop V

	** 2b: measurement error in x
	quietly reg ystar x
	matrix V = r(table)
	local temp_pval = V[4,1]
	quietly replace p_xnoise = `temp_pval' in `i'
	matrix drop V

	** 2c: measurement error in y
	quietly reg y xstar
	matrix V = r(table)
	local temp_pval = V[4,1]
	quietly replace p_ynoise = `temp_pval' in `i'
	matrix drop V

	drop xstar ystar x y eps
}

gen reject = (pval<=0.05)
gen rej_xnoise = (p_xnoise<=0.05)
gen rej_ynoise = (p_ynoise<=0.05)
tab reject
tab rej_xnoise
tab rej_ynoise
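
// A back-of-the-envelope cross-check on (2a), assuming beta = 1, sd(x*) = 1,
// and N = 100, so se(beta-hat) is roughly sigma_epsilon/sqrt(N) = 0.36
// (a normal-approximation sketch, not part of the original activity):
di "approximate power: " normal(1/(3.6/sqrt(100)) - invnormal(0.975))   // ~0.79
// For (2b), the (1c) formula with s = 1 gives plim beta-hat = 0.5, so the
// empirical power from the loop should fall well below 0.8.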