Title: | Computing Power and Sample Size for the False Discovery Rate in Multiple Applications |
---|---|
Description: | Defines a collection of functions to compute average power and sample size for studies that use the false discovery rate as the final measure of statistical significance. A three-rectangle approximation method of a p-value histogram is proposed to derive a formula to compute the statistical power for analyses that involve the FDR. The methodology paper of this package is under review. |
Authors: | Yonghui Ni [aut, cre], Stanley Pounds [aut] |
Maintainer: | Yonghui Ni <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2025-02-26 04:30:13 UTC |
Source: | https://github.com/cran/FDRsamplesize2 |
Given the proportion pi0 of tests with a true null, find the p-value threshold that results in a desired FDR and average power.
alpha.power.fdr(fdr, pwr, pi0, method = "HH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
pi0 |
the proportion of tests with a true null hypothesis |
method |
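approximation method: "HH" (p-value histogram height), "HM" (p-value histogram mean), "BH" (Benjamini & Hochberg 1995), or "Jung" (Jung 2005) |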
Four approximation methods are provided to obtain the fixed p-value threshold for a multiple testing procedure: the Benjamini & Hochberg procedure (1995), Jung's formula (2005), the p-value histogram height (HH) method, and the p-value histogram mean (HM) method. For details of the last two methods, see Ni et al., "Computing Power and Sample Size for the False Discovery Rate in Multiple Applications".
The fixed p-value threshold for multiple testing procedure
Pounds S and Cheng C, "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Gadbury GL, et al. (2004) Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13(4):325-38.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
Ni Y, Seffernick A, Onar-Thomas A, Pounds S. "Computing Power and Sample Size for the False Discovery Rate in Multiple Applications", Manuscript.
alpha.power.fdr(fdr = 0.1, pwr = 0.9, pi0=0.9, method = "HH")
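The link between the threshold, the FDR, the average power, and pi0 can be illustrated with a short base-R sketch. It assumes the common two-group approximation FDR = pi0*alpha / (pi0*alpha + (1 - pi0)*pwr), which has the general form of Jung's (2005) formula. The helper name alpha_from_fdr_sketch is hypothetical and is not part of the package; the four methods implemented by alpha.power.fdr may differ in detail.

```r
# Hypothetical helper (not part of FDRsamplesize2): solve the two-group
# approximation FDR = pi0*alpha / (pi0*alpha + (1 - pi0)*pwr) for alpha.
alpha_from_fdr_sketch <- function(fdr, pwr, pi0) {
  stopifnot(fdr > 0, fdr < 1, pwr > 0, pwr <= 1, pi0 > 0, pi0 < 1)
  fdr * (1 - pi0) * pwr / (pi0 * (1 - fdr))
}

a <- alpha_from_fdr_sketch(fdr = 0.1, pwr = 0.9, pi0 = 0.9)

# Plugging the threshold back into the approximation recovers the target FDR.
fdr_check <- 0.9 * a / (0.9 * a + (1 - 0.9) * 0.9)
```

Under this approximation the threshold shrinks as pi0 grows, because more true nulls contribute false positives at any fixed alpha.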
Compute the average power of many Cox regression models for a given number of events, p-value threshold, vector of effect sizes (log hazard ratios), and variance of predictor variables.
average.power.coxph(n, alpha, logHR, v)
n |
number of events (scalar) |
alpha |
p-value threshold (scalar) |
logHR |
log hazard ratio (vector) |
v |
variance of predictor variable (vector) |
Average power estimate for multiple testing procedure
Hsieh, FY and Lavori, Philip W (2000) Sample-size calculations for the Cox proportional hazards regression model with non-binary covariates. Controlled Clinical Trials 21(6):552-560.
See power.cox for more details about the power calculation of a single-predictor Cox regression model. The power calculation is based on an asymptotic normal approximation.
logHR = log(rep(c(1, 2),c(900, 100))); v = rep(1, 1000); average.power.coxph(n = 50, alpha = 0.05, logHR = logHR, v = v)
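As a rough illustration of the asymptotic normal approximation mentioned above, the following base-R sketch treats the test statistic as approximately N(logHR * sqrt(n * v), 1) and averages the two-sided power over the tests with a non-zero log hazard ratio. The function name cox_power_sketch and the choice to average only over non-null tests are assumptions made for illustration; this is not the package source.

```r
# Hypothetical sketch: two-sided power when the test statistic is
# approximately N(logHR * sqrt(n * v), 1); n = number of events,
# v = variance of the predictor.
cox_power_sketch <- function(n, alpha, logHR, v) {
  shift <- logHR * sqrt(n * v)
  z <- qnorm(1 - alpha / 2)
  # Probability of landing in either rejection tail
  pnorm(-z - shift) + pnorm(-z + shift)
}

logHR <- log(rep(c(1, 2), c(900, 100)))
v <- rep(1, 1000)
pw <- cox_power_sketch(n = 50, alpha = 0.05, logHR = logHR, v = v)

# Assumption: average power is taken over tests with a false null only.
avg_power <- mean(pw[logHR != 0])
```

For tests with logHR = 0 the sketch returns alpha, as expected for a level-alpha test under the null.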
Compute average power of many Fisher's exact tests
average.power.fisher(p1, p2, n, alpha, alternative)
p1 |
probability in one group (vector) |
p2 |
probability in other group (vector) |
n |
per-group sample size |
alpha |
p-value threshold |
alternative |
one- or two-sided test |
Average power estimate for multiple testing procedure
See power.fisher for more details about the power calculation of Fisher's exact test.
set.seed(1234); p1 = sample(seq(0,0.5,0.1),5,replace = TRUE); p2 = sample(seq(0.5,1,0.1),5,replace = TRUE); average.power.fisher(p1 = p1,p2 = p2,n = 20,alpha = 0.05,alternative = "two.sided")
Compute average power for RNA-seq experiments assuming Negative Binomial distribution
average.power.hart(n, alpha, log.fc, mu, sig)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
log.fc |
log fold-change (vector), usual null hypothesis is log.fc=0 |
mu |
read depth per gene (vector, same length as log.fc) |
sig |
coefficient of variation (CV) per gene (vector, same length as log.fc) |
The power function is based on equation (1) of Hart et al (2013). It assumes a Negative Binomial model for RNA-seq read counts and equal sample size per group.
Average power estimate for multiple testing procedure
SN Hart, TM Therneau, Y Zhang, GA Poland, and J-P Kocher (2013). Calculating Sample Size Estimates for RNA Sequencing Data. Journal of Computational Biology 20: 970-978.
See power.hart for more details about the power calculation for data under a Negative Binomial distribution. The power calculation is based on an asymptotic normal approximation.
logFC = log(rep(c(1,2),c(900,100))); mu = rep(5,1000); sig = rep(0.6,1000); average.power.hart(n = 50, alpha = 0.05,log.fc = logFC, mu = mu, sig = sig)
Use the formula of Li et al (2013) to compute power for comparing RNA-seq expression across two groups assuming the Poisson distribution.
average.power.li(n, alpha, rho, mu0, w, type)
n |
per-group sample size |
alpha |
p-value threshold (scalar) |
rho |
fold-change, usual null hypothesis is that rho=1 (vector) |
mu0 |
average count in control group (vector) |
w |
ratio of the total number of reads mapped between the two groups (scalar or vector) |
type |
type of test: "w" for Wald, "s" for score, "lw" for log-transformed Wald, "ls" for log-transformed score |
This function computes the average power for a series of two-sided tests defined by the input parameters. The power is based on the sample size formulas in equations (10-13) of Li et al (2013). Also, note that the null.effect is set to 1 in the examples because the usual null hypothesis is that the fold-change = 1.
Average power estimate for multiple testing procedure
C-I Li, P-F Su, Y Guo, and Y Shyr (2013). Sample size calculation for differential expression analysis of RNA-seq data under Poisson distribution. Int J Comput Biol Drug Des 6(4).<doi:10.1504/IJCBDD.2013.056830>
See power.li for more details about the power calculation for data under a Poisson distribution.
rho = rep(c(1,1.25),c(900,100)); mu0 = rep(5,1000); w = rep(0.5,1000); average.power.li(n = 50, alpha = 0.05, rho = rho, mu0 = mu0, w = w, type = "w")
Compute average power of many one-way ANOVA tests
average.power.oneway(n, alpha, theta, k)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
theta |
sum of ((group mean - overall mean)/stdev)^2 across all groups for each hypothesis test (vector) |
k |
the number of groups to be compared |
Average power estimate for multiple testing procedure
See power.oneway for more details about the power calculation of one-way ANOVA.
theta=rep(c(2,0),c(100,900)); average.power.oneway(n = 50, alpha = 0.05, theta = theta, k = 2)
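The role of theta can be made concrete with a base-R sketch of the textbook power calculation for a balanced one-way ANOVA, where the F statistic has k - 1 and k(n - 1) degrees of freedom and noncentrality n * theta. The function name oneway_power_sketch is hypothetical, and the package's own computation may differ in detail.

```r
# Hypothetical sketch: power of a balanced one-way ANOVA F-test using the
# noncentral F distribution with noncentrality parameter n * theta.
oneway_power_sketch <- function(n, alpha, theta, k) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  crit <- qf(1 - alpha, df1, df2)
  pf(crit, df1, df2, ncp = n * theta, lower.tail = FALSE)
}

theta <- rep(c(2, 0), c(100, 900))
pw <- oneway_power_sketch(n = 50, alpha = 0.05, theta = theta, k = 2)

# Assumption: average power is taken over tests with theta != 0.
avg_power <- mean(pw[theta != 0])
```

Tests with theta = 0 have power equal to alpha, so they are excluded from the average in this sketch.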
Compute average power of rank-sum tests
average.power.ranksum(n, alpha, p)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p |
Pr(Y>X), as in Noether (JASA 1987) |
Average power estimate for multiple testing procedure
See power.ranksum for more details about the power calculation of the rank-sum test. The power calculation is based on an asymptotic normal approximation.
p = rep(c(0.8,0.5),c(100,900)); average.power.ranksum(n = 50, alpha = 0.05, p=p)
Compute average power of many signed-rank tests
average.power.signrank(n, alpha, p1, p2)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p1 |
Pr(X>0), as in Noether (JASA 1987) |
p2 |
Pr(X+X'>0), as in Noether (JASA 1987) |
Average power estimate for multiple testing procedure
See power.signrank for more details about the power calculation of the signed-rank test. The power calculation is based on an asymptotic normal approximation.
p1 = rep(c(0.8,0.5),c(100,900)); p2 = rep(c(0.8,0.5),c(100,900)); average.power.signrank(n = 50, alpha = 0.05, p1 = p1, p2 = p2)
Compute average power of many sign tests
average.power.signtest(n, alpha, p)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p |
Pr(Y>X), as in Noether (JASA 1987) |
Average power estimate for multiple testing procedure
See power.signtest for more details about the power calculation of the sign test. The power calculation is based on an asymptotic normal approximation.
p = rep(c(0.8,0.5),c(100,900)); average.power.signtest(n = 50, alpha = 0.05, p=p)
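A minimal base-R sketch of a normal-approximation power calculation for the sign test follows. It rearranges Noether's (1987) sample size formula N = (z_alpha + z_beta)^2 / (4 (p - 1/2)^2) for the power, using alpha/2 for a two-sided test and ignoring the far rejection tail. The function name signtest_power_sketch is hypothetical, and the package's implementation may differ.

```r
# Hypothetical sketch: approximate two-sided sign test power obtained by
# solving Noether's sample size formula for the power.
signtest_power_sketch <- function(n, alpha, p) {
  z <- qnorm(1 - alpha / 2)
  pnorm(2 * sqrt(n) * abs(p - 0.5) - z)
}

p <- rep(c(0.8, 0.5), c(100, 900))
pw <- signtest_power_sketch(n = 50, alpha = 0.05, p = p)

# Assumption: average power is taken over tests with p != 0.5.
avg_power <- mean(pw[p != 0.5])
```

Because the far tail is ignored, the sketch returns alpha/2 rather than alpha when p = 0.5; this has no effect on the average over non-null tests.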
Compute average power of many t-tests. Uses the classical power formula for the t-test and assumes equal variance and sample size.
average.power.t.test( n, alpha, delta, sigma = 1, type = "two.sample", alternative = "two.sided" )
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
delta |
difference of population means (vector) |
sigma |
standard deviation (vector or scalar, default=1) |
type |
type of t-test: "two.sample", "one.sample" |
alternative |
one- or two-sided test |
Average power estimate for multiple testing procedure
d = rep(c(2,0),c(100,900)); average.power.t.test(n = 20, alpha = 0.05,delta = d)
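For comparison, the same kind of quantity can be approximated with stats::power.t.test from base R. The sketch below evaluates the classical t-test power once per distinct non-zero effect size and averages over the tests with a false null. Averaging only over non-null tests is an assumption made for illustration, not a statement about the package internals.

```r
# Sketch using stats::power.t.test: per-test power for the non-null tests,
# then the average across them.
d <- rep(c(2, 0), c(100, 900))
nonnull <- d[d != 0]
effects <- unique(nonnull)

# Evaluate the power once per distinct effect size
pw_by_effect <- vapply(effects, function(delta) {
  power.t.test(n = 20, delta = delta, sd = 1, sig.level = 0.05,
               type = "two.sample", alternative = "two.sided")$power
}, numeric(1))

# Map the per-effect power back to each non-null test and average
avg_power <- mean(pw_by_effect[match(nonnull, effects)])
```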
Compute average power of many t-tests for non-zero correlation
average.power.tcorr(n, alpha, rho)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
rho |
population correlation coefficient (vector) |
For many applications, the null.effect is rho = 0
Average power estimate for multiple testing procedure
See power.tcorr for more details about the power calculation of the t-test for non-zero correlation.
rho = rep(c(0.3,0),c(100,900)); average.power.tcorr(n = 50, alpha = 0.05, rho = rho)
Compute average power of many two-proportion z-tests. The power calculation of the two-proportion z-test is based on an asymptotic normal approximation.
average.power.twoprop(n, alpha, p1, p2, alternative)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
p1 |
probability in one group (vector) |
p2 |
probability in other group (vector) |
alternative |
one- or two-sided test |
Average power estimate for multiple testing procedure
set.seed(1234); p1 = sample(seq(0,0.5,0.1),40,replace = TRUE); p2 = sample(seq(0.5,1,0.1),40,replace = TRUE); average.power.twoprop(n = 30, alpha = 0.05, p1 = p1,p2 = p2,alternative="two.sided")
For a given fixed sample size and effect size vector, compute FDR and average power as a function of the p-value threshold alpha.
fdr.avepow(n, avepow.func, null.hypo, alpha = 1:100/1000, method = "BH", ...)
n |
sample size |
avepow.func |
function to compute average power |
null.hypo |
string to evaluate null hypothesis |
alpha |
p-value threshold(s) to consider |
method |
method to estimate proportion pi0 of tests with a true null hypothesis, including: "HH" (p-value histogram height) , "HM" (p-value histogram mean), "BH" (Benjamini & Hochberg 1995), "Jung" (Jung 2005) |
... |
additional arguments, including effect size vector for average power function |
A list with the following components:
n |
input sample size |
avepow.func |
average power function |
null.hypo |
null hypothesis string |
pi0 |
computed value of pi0 |
method |
method to estimate the proportion pi0 of tests with a true null hypothesis |
other.args |
additional arguments |
res.tbl |
table of alpha, fdr, and average power |
Pounds S and Cheng C, "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Gadbury GL, et al. (2004) Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13(4):325-38.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
Ni Y, Seffernick A, Onar-Thomas A, Pounds S. "Computing Power and Sample Size for the False Discovery Rate in Multiple Applications", Manuscript.
n = 50 # number of events
logHR = rep(c(0,0.5),c(950,50))
v = rep(1,length(logHR)) # variance of predictor variable (vector)
res = fdr.avepow(n,average.power.coxph,"logHR==0",logHR=logHR,v=v)
res$pi0
head(res$res.tbl)
Compute the FDR for given values of the p-value threshold alpha, average power, and proportion pi0 of tests with a true null hypothesis.
fdr.power.alpha(alpha, pwr, pi0, method = "HH")
alpha |
p-value threshold (vector) |
pwr |
average power |
pi0 |
actual proportion of tests with a true null hypothesis |
method |
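approximation method: "HH" (p-value histogram height), "HM" (p-value histogram mean), "BH" (Benjamini & Hochberg 1995), or "Jung" (Jung 2005) |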
FDR
Pounds S and Cheng C, "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.
Gadbury GL, et al. (2004) Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13(4):325-38.
Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.
Ni Y, Seffernick A, Onar-Thomas A, Pounds S. "Computing Power and Sample Size for the False Discovery Rate in Multiple Applications", Manuscript.
alpha = 1:100/1000; pwr = rep(0.8,length(alpha)); pi0 = 0.95; fdr.power.alpha(alpha,pwr,pi0,method="HH")
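How the FDR varies with the threshold can be sketched in base R under the two-group approximation FDR = pi0*alpha / (pi0*alpha + (1 - pi0)*pwr). The helper name fdr_from_alpha_sketch is hypothetical, and the approximation is only one of several possible forms; it is not claimed to reproduce any particular method option of fdr.power.alpha.

```r
# Hypothetical helper: approximate FDR at a fixed p-value threshold,
# given average power and the proportion pi0 of true nulls.
fdr_from_alpha_sketch <- function(alpha, pwr, pi0) {
  pi0 * alpha / (pi0 * alpha + (1 - pi0) * pwr)
}

alpha <- 1:100 / 1000
fdr <- fdr_from_alpha_sketch(alpha, pwr = 0.8, pi0 = 0.95)

# Largest threshold on the grid whose approximate FDR stays at or below 10%
best_alpha <- max(alpha[fdr <= 0.1])
```

With power held fixed, the approximate FDR increases monotonically in alpha, so scanning a grid like this is enough to locate a usable threshold.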
Determines the sample size needed to achieve the desired FDR and average power, given the proportion of tests with a true null hypothesis.
find.sample.size(alpha, pwr, avepow.func, n0 = 3, n1 = 6, max.its = 50, ...)
alpha |
the fixed p-value threshold (scalar numeric) |
pwr |
desired average power (scalar numeric) |
avepow.func |
an R function to compute average power |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
max.its |
maximum number of iterations |
... |
additional arguments to average power function |
A list with the following components:
n |
a sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
For tests whose power calculation is based on an asymptotic normal approximation, we suggest checking the FDRsamplesize2 calculation by simulation.
# Calculate the sample size for a study involving many sign tests (average.power.signtest)
p.adj = 0.001
p = rep(c(0.8,0.5), c(100,9900))
find.sample.size(alpha = p.adj, pwr = 0.8, avepow.func = average.power.signtest, p = p)
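One plausible way to search for the sample size, given an initial range (n0, n1) and an iteration cap, is to double the upper limit until the desired average power is reached and then bisect on integers. The sketch below follows that idea in base R. The names find_n_sketch and avepow_z are hypothetical, the stand-in power function is a simple z-test approximation, and the search assumes average power is non-decreasing in n. It is not the package's algorithm.

```r
# Hypothetical sketch of a bracketing-plus-bisection search for the smallest
# n with avepow_fun(n, alpha, ...) >= pwr.
find_n_sketch <- function(alpha, pwr, avepow_fun, n0 = 3, n1 = 6,
                          max_its = 50, ...) {
  its <- 0
  # Step 1: double the upper limit until it reaches the desired power.
  while (avepow_fun(n1, alpha, ...) < pwr) {
    if (its >= max_its) stop("desired power not reached within max_its")
    n0 <- n1
    n1 <- 2 * n1
    its <- its + 1
  }
  # If the lower limit already suffices, report it directly.
  if (avepow_fun(n0, alpha, ...) >= pwr) n1 <- n0
  # Step 2: integer bisection; n1 meets the target, n0 falls short.
  while (n1 - n0 > 1 && its < max_its) {
    mid <- floor((n0 + n1) / 2)
    if (avepow_fun(mid, alpha, ...) >= pwr) n1 <- mid else n0 <- mid
    its <- its + 1
  }
  list(n = n1, computed.avepow = avepow_fun(n1, alpha, ...), n.its = its)
}

# Stand-in average power function (assumption): one-tail normal approximation
# for a z-test, averaged over the tests with a non-zero effect.
avepow_z <- function(n, alpha, delta) {
  nonnull <- delta != 0
  mean(pnorm(sqrt(n) * abs(delta[nonnull]) - qnorm(1 - alpha / 2)))
}

delta <- rep(c(0.5, 0), c(100, 9900))
res <- find_n_sketch(alpha = 0.001, pwr = 0.8, avepow_fun = avepow_z,
                     delta = delta)
```

The returned n is the smallest integer in the bracket that meets the target, which can be confirmed by evaluating the power function at n - 1.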
Find number of events needed to have a desired false discovery rate and average power for a large number of Cox regression models with non-binary covariates.
n.fdr.coxph(fdr, pwr, logHR, v, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
logHR |
log hazard ratio (vector) |
v |
variance of predictor variable (vector) |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
number of events estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
For tests whose power calculation is based on an asymptotic normal approximation, we suggest checking the FDRsamplesize2 calculation by simulation.
Hsieh, FY and Lavori, Philip W (2000) Sample-size calculations for the Cox proportional hazards regression model with non-binary covariates. Controlled Clinical Trials 21(6):552-560.
log.HR = log(rep(c(1,2),c(900,100)))
v = rep(1,1000)
n.fdr.coxph(fdr=0.1, pwr=0.8, logHR=log.HR, v=v, pi0.hat="BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of Fisher's exact tests.
n.fdr.fisher(fdr, pwr, p1, p2, alternative = "two.sided", pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
p1 |
probability in one group (vector) |
p2 |
probability in other group (vector) |
alternative |
one- or two-sided test |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
per-group sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
set.seed(1234); p1 = sample(seq(0,0.5,0.1),10,replace = TRUE); p2 = sample(seq(0.5,1,0.1),10,replace = TRUE); n.fdr.fisher(fdr = 0.1, pwr = 0.8, p1 = p1, p2 = p2, alternative = "two.sided", pi0.hat = "BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of Negative Binomial comparisons.
n.fdr.negbin(fdr, pwr, log.fc, mu, sig, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
log.fc |
log fold-change (vector), usual null hypothesis is log.fc=0 |
mu |
read depth per gene (vector, same length as log.fc) |
sig |
coefficient of variation (CV) per gene (vector, same length as log.fc) |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
per-group sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
For tests whose power calculation is based on an asymptotic normal approximation, we suggest checking the FDRsamplesize2 calculation by simulation.
SN Hart, TM Therneau, Y Zhang, GA Poland, and J-P Kocher (2013). Calculating Sample Size Estimates for RNA Sequencing Data. Journal of Computational Biology 20: 970-978.
logFC = log(rep(c(1,2),c(900,100))); mu = rep(5,1000); sig = rep(0.6,1000); n.fdr.negbin(fdr = 0.1, pwr = 0.8, log.fc = logFC, mu = mu, sig = sig, pi0.hat = "BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of one-way ANOVA tests.
n.fdr.oneway(fdr, pwr, theta, k, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
theta |
sum of ((group mean - overall mean)/stdev)^2 across all groups for each hypothesis test (vector) |
k |
the number of groups to be compared |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
per-group sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
theta=rep(c(2,0),c(100,900)); n.fdr.oneway(fdr = 0.1, pwr = 0.8, theta = theta, k = 2, pi0.hat = "BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of two-group comparisons under Poisson distribution.
n.fdr.poisson(fdr, pwr, rho, mu0, w, type, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
rho |
fold-change, usual null hypothesis is that rho=1 (vector) |
mu0 |
average count in control group (vector) |
w |
ratio of the total number of reads mapped between the two groups |
type |
type of test: "w" for Wald, "s" for score, "lw" for log-transformed Wald, "ls" for log-transformed score. |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
per-group sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
C-I Li, P-F Su, Y Guo, and Y Shyr (2013). Sample size calculation for differential expression analysis of RNA-seq data under Poisson distribution. Int J Comput Biol Drug Des 6(4).<doi:10.1504/IJCBDD.2013.056830>
rho = rep(c(1,1.25),c(900,100)); mu0 = rep(5,1000); w = rep(0.5,1000); n.fdr.poisson(fdr = 0.1, pwr = 0.8, rho = rho, mu0 = mu0, w = w, type = "w", pi0.hat = "BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of rank-sum tests.
n.fdr.ranksum(fdr, pwr, p, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
p |
Pr(Y>X), as in Noether (JASA 1987) |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
p = rep(c(0.8,0.5),c(100,900)); n.fdr.ranksum(fdr = 0.1, pwr = 0.8, p = p, pi0.hat = "BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of signed-rank tests.
n.fdr.signrank(fdr, pwr, p1, p2, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
p1 |
Pr(X>0), as in Noether (JASA 1987) |
p2 |
Pr(X+X'>0), as in Noether (JASA 1987) |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
p1 = rep(c(0.8,0.5),c(100,900)); p2 = rep(c(0.8,0.5),c(100,900)); n.fdr.signrank(fdr = 0.1, pwr = 0.8, p1 = p1, p2 = p2, pi0.hat = "BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of sign tests.
n.fdr.signtest(fdr, pwr, p, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
p |
Pr(X>0), as in Noether (JASA 1987) |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
For tests whose power calculation is based on an asymptotic normal approximation, we suggest checking the FDRsamplesize2 calculation by simulation.
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
p = rep(c(0.8, 0.5), c(100, 900)); n.fdr.signtest(fdr = 0.1, pwr = 0.8, p = p, pi0.hat = "BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of t-tests for non-zero correlation.
n.fdr.tcorr(fdr, pwr, rho, pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
rho |
population correlation coefficient (vector) |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
rho = rep(c(0.3,0),c(100,900)); n.fdr.tcorr(fdr = 0.1, pwr = 0.8, rho = rho, pi0.hat="BH")
Find the sample size needed to have a desired false discovery rate and average power for a large number of t-tests.
n.fdr.ttest( fdr, pwr, delta, sigma = 1, type = "two.sample", pi0.hat = "BH", alternative = "two.sided" )
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
delta |
difference of population means (vector) |
sigma |
standard deviation (vector or scalar) |
type |
type of t-test |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
alternative |
one- or two-sided test |
A list with the following components:
n |
sample size (per group) estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
d = rep(c(2,0),c(100,900)); n.fdr.ttest(fdr = 0.1, pwr = 0.8, delta = d)
Find the sample size needed to have a desired false discovery rate and average power for a large number of two-group comparisons using the two proportion z-test.
n.fdr.twoprop(fdr, pwr, p1, p2, alternative = "two.sided", pi0.hat = "BH")
fdr |
desired FDR (scalar numeric) |
pwr |
desired average power (scalar numeric) |
p1 |
probability in one group (vector) |
p2 |
probability in other group (vector) |
alternative |
one- or two-sided test |
pi0.hat |
method to estimate the proportion pi0 of tests with a true null hypothesis |
A list with the following components:
n |
per-group sample size estimate |
computed.avepow |
average power |
desired.avepow |
desired average power |
desired.fdr |
desired FDR |
input.pi0 |
proportion of tests with a true null hypothesis |
alpha |
fixed p-value threshold for multiple testing procedure |
n.its |
number of iterations |
max.its |
maximum number of iterations, default is 50 |
n0 |
lower limit for initial sample size range |
n1 |
upper limit for initial sample size range |
For tests whose power calculation is based on an asymptotic normal approximation, we suggest checking the FDRsamplesize2 calculation by simulation.
set.seed(1234); p1 = sample(seq(0,0.5,0.1),40,replace = TRUE); p2 = sample(seq(0.5,1,0.1),40,replace = TRUE); n.fdr.twoprop(fdr = 0.1, pwr = 0.8, p1 = p1, p2 = p2, alternative = "two.sided", pi0.hat = "BH")
Use the formula of Hsieh and Lavori (2000), which is based on an asymptotic normal approximation, to compute the power of a single-predictor Cox model.
power.cox(n, alpha, logHR, v)
n |
number of events (scalar) |
alpha |
p-value threshold (scalar) |
logHR |
log hazard ratio (vector) |
v |
variance of predictor variable (vector) |
Vector of power estimates for two-sided test
Hsieh, FY and Lavori, Philip W (2000) Sample-size calculations for the Cox proportional hazards regression model with non-binary covariates. Controlled Clinical Trials 21(6):552-560.
logHR = log(rep(c(1, 2),c(900, 100))); v = rep(1, 1000); res = power.cox(n = 50,alpha = 0.05,logHR = logHR, v = v)
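For illustration, the normal approximation behind this kind of calculation can be sketched in base R. The helper `power.cox.sketch` is hypothetical and is not the package's `power.cox`. It assumes the Wald z-statistic is approximately normal with mean `sqrt(n * v) * logHR` and unit variance, which corresponds to the Hsieh and Lavori sample size formula with both rejection tails included.

```r
# Hypothetical helper, not the package's power.cox(): two-sided power from the
# normal approximation, with n the number of events.
power.cox.sketch <- function(n, alpha, logHR, v) {
  mu <- sqrt(n * v) * logHR         # approximate mean of the Wald z-statistic
  zc <- qnorm(1 - alpha / 2)        # two-sided critical value
  pnorm(mu - zc) + pnorm(-mu - zc)  # probability of rejecting in either tail
}

logHR <- log(rep(c(1, 2), c(900, 100)))
v <- rep(1, 1000)
res <- power.cox.sketch(n = 50, alpha = 0.05, logHR = logHR, v = v)
range(res)
```

Under this sketch, tests with `logHR = 0` have rejection probability equal to `alpha`, as expected for true null hypotheses.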
Compute power for Fisher's exact test
power.fisher(p1, p2, n, alpha, alternative)
p1 |
probability in one group (scalar) |
p2 |
probability in other group (scalar) |
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
alternative |
one- or two-sided test, must be one of "greater", "less", or "two.sided" |
Power estimate for one- or two-sided tests
power.fisher(p1 = 0.5, p2 = 0.9, n=20, alpha = 0.05, alternative = 'two.sided')
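One way to obtain such a power value, shown here only as a sketch, is to enumerate every possible pair of success counts, run `fisher.test` on each table, and add up the binomial probabilities of the tables that are rejected. The helper `power.fisher.sketch` is hypothetical, handles only the two-sided case, and is not necessarily how `power.fisher` is implemented.

```r
# Hypothetical helper, not the package's power.fisher(): exact power of the
# two-sided Fisher's exact test with n subjects per group.
power.fisher.sketch <- function(p1, p2, n, alpha) {
  power <- 0
  for (x1 in 0:n) {
    for (x2 in 0:n) {
      # Columns are the two groups; rows are successes and failures.
      tab <- matrix(c(x1, n - x1, x2, n - x2), nrow = 2)
      pval <- fisher.test(tab, conf.int = FALSE)$p.value
      if (pval <= alpha) {
        # Probability of observing this table under the assumed p1 and p2.
        power <- power + dbinom(x1, n, p1) * dbinom(x2, n, p2)
      }
    }
  }
  power
}

pw <- power.fisher.sketch(p1 = 0.5, p2 = 0.9, n = 20, alpha = 0.05)
pw
```

The double loop calls `fisher.test` (n + 1)^2 times, so this approach is practical only for modest per-group sample sizes.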
Use the formula of Hart et al (2013) to compute power for comparing RNA-seq expression across two groups assuming a Negative Binomial distribution. The power calculation is based on an asymptotic normal approximation.
power.hart(n, alpha, log.fc, mu, sig)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
log.fc |
log fold-change (vector), usual null hypothesis is log.fc=0 |
mu |
read depth per gene (vector, same length as log.fc) |
sig |
coefficient of variation (CV) per gene (vector, same length as log.fc) |
This function is based on equation (1) of Hart et al (2013). It assumes a Negative Binomial model for RNA-seq read counts and equal sample size per group.
Vector of power estimates for the set of two-sided tests
SN Hart, TM Therneau, Y Zhang, GA Poland, and J-P Kocher (2013). Calculating Sample Size Estimates for RNA Sequencing Data. Journal of Computational Biology 20: 970-978.
n.hart = 2*(qnorm(0.975)+qnorm(0.9))^2*(1/20+0.6^2)/(log(2)^2) # Equation (6) of Hart et al
power.hart(n.hart,0.05,log(2),20,0.6) # Recapitulate 90% power
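The relationship between the sample size formula in the example and the power can be made explicit with a small base R sketch. The helper `power.hart.sketch` is hypothetical and is not the package's `power.hart`. It simply solves the sample size expression used in the example for power, keeping only the dominant rejection tail.

```r
# Hypothetical helper, not the package's power.hart(): solve
#   n = 2 * (z_{1-alpha/2} + z_{power})^2 * (1/mu + sig^2) / log.fc^2
# for power, ignoring the negligible opposite rejection tail.
power.hart.sketch <- function(n, alpha, log.fc, mu, sig) {
  z <- abs(log.fc) * sqrt(n / (2 * (1 / mu + sig^2)))
  pnorm(z - qnorm(1 - alpha / 2))
}

# Sample size for 90% power at alpha = 0.05, as in the example.
n.hart <- 2 * (qnorm(0.975) + qnorm(0.9))^2 * (1 / 20 + 0.6^2) / log(2)^2
pw <- power.hart.sketch(n.hart, 0.05, log(2), 20, 0.6)
pw
```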
Use the formula of Li et al (2013) to compute power for comparing RNA-seq expression across two groups assuming the Poisson distribution.
power.li(n, alpha, rho, mu0, w, type)
n |
per-group sample size |
alpha |
p-value threshold (scalar) |
rho |
fold-change, usual null hypothesis is that rho=1 (vector) |
mu0 |
average count in control group |
w |
ratio of the total number of reads mapped between the two groups |
type |
type of test: "w" for Wald, "s" for score, "lw" for log-transformed Wald, "ls" for log-transformed score |
This function computes the power for each of a series of two-sided tests defined by the input parameters. The power is based on the sample size formulas in equations (10-13) of Li et al (2013). Note that the usual null hypothesis is that the fold-change is 1 (rho = 1).
Vector of power estimates for two-sided tests
C-I Li, P-F Su, Y Guo, and Y Shyr (2013). Sample size calculation for differential expression analysis of RNA-seq data under Poisson distribution. Int J Comput Biol Drug Des 6(4). <doi:10.1504/IJCBDD.2013.056830>
power.li(n = 88, alpha = 0.05, rho = 1.25, mu0 = 5, w = 0.5,type = "w") # recapitulate 80% power in Table 1 of Li et al (2013)
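As a rough cross-check of the Wald case, the sketch below assumes that the Wald-type sample size formula has the form n = (z_{1-alpha/2} + z_{power})^2 (1 + rho/w) / (mu0 (rho - 1)^2) and solves it for power. Both this form and the helper `power.li.wald.sketch` are assumptions made for illustration and should be checked against equation (10) of Li et al (2013). The score and log-transformed variants are not covered.

```r
# Hypothetical helper, not the package's power.li(): power for the Wald test
# under the ASSUMED sample size formula
#   n = (z_{1-alpha/2} + z_{power})^2 * (1 + rho / w) / (mu0 * (rho - 1)^2),
# keeping only the dominant rejection tail.
power.li.wald.sketch <- function(n, alpha, rho, mu0, w) {
  z <- abs(rho - 1) * sqrt(n * mu0 / (1 + rho / w))
  pnorm(z - qnorm(1 - alpha / 2))
}

pw <- power.li.wald.sketch(n = 88, alpha = 0.05, rho = 1.25, mu0 = 5, w = 0.5)
pw
```

With the inputs from the example, this assumed form gives a power close to the 80% that the example is meant to recapitulate.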
Compute the power of one-way ANOVA using the classical power formula, assuming equal variance and equal sample size across groups.
power.oneway(n, alpha, theta, k = 2)
n |
per-group sample size (scalar) |
alpha |
p-value threshold (scalar) |
theta |
sum of ((group mean - overall mean)/stdev)^2 across all groups for each hypothesis test (vector) |
k |
the number of groups to be compared, default k=2 |
For many applications, the null effect for the parameter theta described above is zero.
Vector of power estimates for test of equal means
theta=rep(c(2,0),c(100,900)); res = power.oneway(n = 50, alpha = 0.05, theta = theta, k = 2)
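The classical ANOVA power calculation can be sketched with the noncentral F distribution in base R. The helper `power.oneway.sketch` is hypothetical and not the package's `power.oneway`. It assumes k groups of n subjects each, so that the F statistic has k - 1 and k(n - 1) degrees of freedom and noncentrality parameter n * theta.

```r
# Hypothetical helper, not the package's power.oneway(): power of the one-way
# ANOVA F-test with k groups of n subjects each.
power.oneway.sketch <- function(n, alpha, theta, k = 2) {
  df1 <- k - 1                       # numerator degrees of freedom
  df2 <- k * (n - 1)                 # denominator degrees of freedom
  f.crit <- qf(1 - alpha, df1, df2)  # critical value under the null
  # Upper tail of the noncentral F with noncentrality n * theta.
  pf(f.crit, df1, df2, ncp = n * theta, lower.tail = FALSE)
}

theta <- rep(c(2, 0), c(100, 900))
res <- power.oneway.sketch(n = 50, alpha = 0.05, theta = theta, k = 2)
range(res)
```

Tests with `theta = 0` return `alpha`, which is consistent with the null effect being zero for this parameter.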
Compute the power of the rank-sum test using the formula of Noether (JASA 1987), which is based on an asymptotic normal approximation.
power.ranksum(n, alpha, p)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p |
Pr(Y>X), as in Noether (JASA 1987) |
In most applications, the null effect size will be designated by p = 0.5
Vector of power estimates for two-sided tests
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
p = rep(c(0.8,0.5),c(100,900)); res = power.ranksum(n = 50, alpha = 0.5, p=p)
Use the Noether (1987) formula to compute the power of the signed-rank test, which is based on asymptotic normal approximation.
power.signrank(n, alpha, p1, p2)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p1 |
Pr(X>0), as in Noether (JASA 1987) |
p2 |
Pr(X+X'>0), as in Noether (JASA 1987) |
In most applications, the null effect size will be designated by p1 = p2 = 0.5
Vector of power estimates for two-sided tests
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
p1 = rep(c(0.8,0.5),c(100,900)); p2 = rep(c(0.8,0.5),c(100,900)); res = power.signrank(n = 50, alpha = 0.05, p1 = p1, p2 = p2)
Use the Noether (1987) formula to compute the power of the sign test, which is based on asymptotic normal approximation.
power.signtest(n, alpha, p)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
p |
Pr(X>0), as in Noether (JASA 1987) |
In most applications, the null effect size will be designated by p = 0.5
Vector of power estimates for two-sided tests
Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.
p = rep(c(0.8,0.5),c(100,900)); res = power.signtest(n = 50, alpha = 0.05, p = p)
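A minimal base R sketch of a Noether-style calculation for the sign test follows. The helper `power.signtest.sketch` is hypothetical and not the package's `power.signtest`. It assumes the standardized count of positive observations is approximately normal with mean `2 * sqrt(n) * (p - 0.5)` and unit variance, which corresponds to Noether's sample size formula N = (z_alpha + z_beta)^2 / (4 (p - 1/2)^2) with both rejection tails included.

```r
# Hypothetical helper, not the package's power.signtest(): two-sided power of
# the sign test from a normal approximation with null variance.
power.signtest.sketch <- function(n, alpha, p) {
  mu <- 2 * sqrt(n) * (p - 0.5)     # approximate mean of the z-statistic
  zc <- qnorm(1 - alpha / 2)        # two-sided critical value
  pnorm(mu - zc) + pnorm(-mu - zc)  # probability of rejecting in either tail
}

p <- rep(c(0.8, 0.5), c(100, 900))
res <- power.signtest.sketch(n = 50, alpha = 0.05, p = p)
range(res)
```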
Compute power of the t-test for non-zero correlation
power.tcorr(n, alpha, rho)
n |
sample size (scalar) |
alpha |
p-value threshold (scalar) |
rho |
population correlation coefficient (vector) |
For many applications, the null effect is rho = 0.
Vector of power estimates for two-sided tests
rho = rep(c(0.3,0),c(100,900)); res = power.tcorr(n = 50, alpha = 0.05, rho = rho)
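For illustration, the power of the t-test for a correlation can be approximated with the noncentral t distribution in base R. The helper `power.tcorr.sketch` is hypothetical and not the package's `power.tcorr`. The noncentrality parameter `sqrt(n - 2) * rho / sqrt(1 - rho^2)` is one common approximation and is an assumption here, since other references use slightly different forms.

```r
# Hypothetical helper, not the package's power.tcorr(): two-sided power of the
# t-test of rho = 0 using a noncentral t approximation with n - 2 df.
power.tcorr.sketch <- function(n, alpha, rho) {
  df <- n - 2
  ncp <- sqrt(df) * rho / sqrt(1 - rho^2)  # assumed noncentrality parameter
  t.crit <- qt(1 - alpha / 2, df)          # two-sided critical value
  pt(t.crit, df, ncp = ncp, lower.tail = FALSE) + pt(-t.crit, df, ncp = ncp)
}

rho <- rep(c(0.3, 0), c(100, 900))
res <- power.tcorr.sketch(n = 50, alpha = 0.05, rho = rho)
range(res)
```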