* This code generates standard BH (1995) q-values as described in Anderson (2008), "Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects", Journal of the American Statistical Association, 103(484), 1481-1495
* BH (1995) q-values are introduced in Benjamini and Hochberg (1995), "Controlling the False Discovery Rate", Journal of the Royal Statistical Society: Series B, 57(1), 289-300
* Last modified: M. Anderson, 11/20/07
* Test Platform: Stata/MP 10.0 for Macintosh (Intel 32-bit), Mac OS X 10.5.1
* Should be compatible with Stata 10 or greater on all platforms
* Likely compatible with with Stata 9 or earlier on all platforms (remove "version 10" line below)
version 10
**** INSTRUCTIONS:
**** Please start with a clear data set
**** When prompted, paste the vector of p-values you are testing into the "pval" variable
**** Please use the "do" button rather than the "run" button to run this file (if you use "run", you will miss the instructions at the prompts
pause on
set more off
if _N>0 {
display "Please clear data set before proceeding"
display "After clearing, type 'q' to resume"
pause
}
quietly gen float pval = .
display "***********************************"
display "Please paste the vector of p-values that you wish to test into the variable 'pval'"
display "After pasting, type 'q' to resume"
display "***********************************"
pause
* Collect the total number of p-values tested
quietly sum pval
local totalpvals = r(N)
* Sort the p-values in ascending order and generate a variable that codes each p-value's rank
quietly gen int original_sorting_order = _n
quietly sort pval
quietly gen int rank = _n if pval~=.
* Set the initial counter to 1
local qval = 1
* Generate the variable that will contain the BH (1995) q-values
gen bh95_qval = 1 if pval~=.
* Set up a loop that begins by checking which hypotheses are rejected at q = 1.000, then checks which hypotheses are rejected at q = 0.999, then checks which hypotheses are rejected at q = 0.998, etc. The loop ends by checking which hypotheses are rejected at q = 0.001.
while `qval' > 0 {
* Generate value qr/M
quietly gen fdr_temp = `qval'*rank/`totalpvals'
* Generate binary variable checking condition p(r) <= qr/M
quietly gen reject_temp = (fdr_temp>=pval) if fdr_temp~=.
* Generate variable containing p-value ranks for all p-values that meet above condition
quietly gen reject_rank = reject_temp*rank
* Record the rank of the largest p-value that meets above condition
quietly egen total_rejected = max(reject_rank)
* A p-value has been rejected at level q if its rank is less than or equal to the rank of the max p-value that meets the above condition
replace bh95_qval = `qval' if rank <= total_rejected & rank~=.
* Reduce q by 0.001 and repeat loop
quietly drop fdr_temp reject_temp reject_rank total_rejected
local qval = `qval' - .001
}
quietly sort original_sorting_order
pause off
set more on
display "Code has completed."
display "Benjamini Hochberg (1995) q-vals are in variable 'bh95_qval'"
display "Sorting order is the same as the original vector of p-values"