A general method for controlling the genome-wide type I error rate in linkage and association mapping experiments in plants
ABSTRACT

Control of the genome-wide type I error rate (GWER) is an important issue in association mapping and linkage mapping experiments. For linkage mapping, different approaches have been proposed, such as permutation procedures and the Bonferroni correction. The permutation test, however, cannot account for the population structure present in most association mapping populations, which can lead to false positive associations. The Bonferroni correction is applicable but usually conservative, because it cannot exploit the correlation among tests. Therefore, a new approach is proposed that controls the genome-wide error rate while accounting for population structure. This approach is based on a simulation procedure that is equally applicable in linkage and association mapping. Using the parameter settings of three real data sets, it is shown that the procedure provides control of the GWER and of the generalized genome-wide type I error rate (GWER_k_).

INTRODUCTION

Of central importance for marker-assisted selection is the estimation of positions and effects of quantitative trait loci
(QTL). Two of the most commonly used tools for estimating the position of QTL are classical linkage mapping (Lander and Botstein, 1989) and association mapping (Bodmer, 1986; Thornsberry et al., 2001; Yu et al., 2006; Sun et al., 2010). The two methods differ in that linkage mapping relies on the few opportunities for recombination within families and pedigrees of known ancestry, which results in a relatively low mapping resolution (Flint-Garcia et al., 2003). By contrast, association mapping exploits historical recombination and the natural genetic diversity of different populations, which leads to a higher mapping resolution (Ersoz et al., 2008; Zhu et al., 2008). The resolution of association mapping depends on the structure and degree of linkage disequilibrium across the genome. Genetic and non-genetic factors, such as recombination, drift and selection, affect the structure of linkage disequilibrium (Stich et al., 2005). Linkage disequilibrium caused by population structure and familial relatedness leads to false positive results if it is not controlled correctly in the statistical analysis (Pritchard et al., 2000; Yu et al., 2006). To overcome this problem, several procedures have been proposed, including the logistic regression ratio test (_Q_ model) (Pritchard et al., 2000; Thornsberry et al., 2001), linear mixed models with effects for subpopulations (Breseghello and Sorrells, 2006) and a unified mixed model approach (_QK_ model) (Yu et al., 2006). In the _QK_ mixed model, Bayesian clustering (Pritchard et al., 2000) is used to estimate probabilities of subpopulation membership (matrix _Q_), which are fitted as fixed effects, whereas random genetic effects are fitted with covariance proportional to the relative kinship matrix _K_ (Hardy and Vekemans, 2002). Both _Q_ and _K_ account for population structure when scanning for marker–trait associations (Yu et al., 2006).

One major concern in both linkage and association mapping studies is the statistical power and the control of false positive associations (type I error rate). A false positive association occurs when a significant QTL is declared where none exists. A genome-wide type I error occurs if at least one false QTL is declared. In both linkage and association mapping, multiple testing therefore needs to be accounted for to control the genome-wide type I error rate (GWER).
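To see why multiple testing matters, consider the simplest case in which all _Z_ marker tests are independent. This is an assumption made only for illustration: tests on linked markers are correlated, and that correlation is exactly what a Bonferroni correction ignores. Under the global null hypothesis, the probability of at least one false positive is then 1 − (1 − α′)^Z for a point-wise level α′. The minimal Python sketch below evaluates this for _Z_ = 37, the number of markers in the winter wheat data set. It also shows that the Bonferroni level α/_Z_ keeps the GWER at or below α. The function names are hypothetical.

```python
def gwer_independent(alpha_pointwise: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent
    tests, each performed at level alpha_pointwise, when all nulls are true.
    Assumes independence between tests (illustration only)."""
    return 1.0 - (1.0 - alpha_pointwise) ** n_tests


def bonferroni_level(gwer: float, n_tests: int) -> float:
    """Point-wise significance level of the Bonferroni correction
    for a target genome-wide error rate."""
    return gwer / n_tests


if __name__ == "__main__":
    Z = 37  # number of markers in the winter wheat data set
    print(f"unadjusted GWER: {gwer_independent(0.05, Z):.3f}")  # about 0.850
    level = bonferroni_level(0.05, Z)
    print(f"Bonferroni level: {level:.5f}")  # 0.00135
    print(f"GWER at Bonferroni level: {gwer_independent(level, Z):.4f}")
```

Bonferroni guarantees a GWER of at most α for any correlation structure. With positively correlated tests, the actual GWER typically falls further below α, which is the source of its conservativeness. The same formula α/_Z_ gives the Bonferroni thresholds 0.00135, 0.00050 and 0.000847 quoted in the Results for 37, 100 and 59 markers.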
Different methods have been proposed for linkage mapping to control the GWER. Traditionally, the type I error rate has been controlled by a Bonferroni correction. This correction is conservative and sacrifices statistical power, because it cannot exploit the correlation structure among the multiple tests. Several alternative analytical methods that exploit the correlation structure of multiple tests on the same chromosome have been proposed (Davies, 1977; Lander and Botstein, 1989; Feingold et al., 1993; Rebai et al., 1994; Dupuis and Siegmund, 1999; Piepho, 2001; Li and Ji, 2005). A further approach to control the GWER, commonly used in linkage mapping, is the permutation test of Churchill and Doerge (1994) and Doerge and Churchill (1996). This approach makes no distributional assumptions and is characterized by simplicity and applicability to different experimental populations. In this approach, the trait values are permuted relative to the genotypic data. A disadvantage of the permutation procedure is its computational workload: to compute a critical threshold for a GWER of 0.01, 10 000 permutations of the trait values are necessary, whereas for a GWER of 0.05, 1000 permutations are recommended (Churchill and Doerge, 1994). Although permutation testing is the standard method in linkage mapping, it is not applicable in an association mapping context, because permutation would destroy any correlation between trait and population structure (Aulchenko et al., 2007). This would be inappropriate, because a valid test must control for any such structure. Furthermore, analytical methods such as those proposed for linkage mapping are not available for association mapping.
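The permutation procedure can be sketched in Python (NumPy) under several assumptions: complete marker data, a single-marker regression scan and no population structure. The squared-correlation statistic, the function names and the simulated data are illustrative assumptions, not the implementation of Churchill and Doerge (1994). Each permutation shuffles the trait values relative to the genotypes. The genome-wide threshold is then the (1 − α)-quantile of the maximum statistic across markers, so the correlation among tests on the same genotypes is retained.

```python
import numpy as np


def marker_scan(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Single-marker scan: squared Pearson correlation between the trait y
    (length n) and each marker column of X (n x Z). For a one-degree-of-freedom
    regression on each marker with the same n, r^2 is monotone in the F statistic."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return r ** 2


def permutation_threshold(y, X, alpha=0.05, n_perm=1000, seed=0):
    """Genome-wide critical value: the (1 - alpha)-quantile of the maximum scan
    statistic over markers, with trait values shuffled relative to genotypes."""
    rng = np.random.default_rng(seed)
    max_stat = np.empty(n_perm)
    for b in range(n_perm):
        # Permuting y breaks every trait-genotype link, including any link
        # between the trait and population structure.
        max_stat[b] = marker_scan(rng.permutation(y), X).max()
    return float(np.quantile(max_stat, 1.0 - alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.binomial(2, 0.5, size=(150, 37)).astype(float)  # hypothetical 0/1/2 scores
    y = rng.normal(size=150)  # trait unrelated to the markers
    thr = permutation_threshold(y, X, alpha=0.05, n_perm=1000)
    hits = np.flatnonzero(marker_scan(y, X) > thr)
    print(f"threshold r^2 = {thr:.4f}; markers declared significant: {hits.tolist()}")
```

`rng.permutation(y)` reassigns trait values to arbitrary individuals. Any association between trait and subpopulation membership is therefore broken as well, which is why this scheme is not valid for structured association mapping populations.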
Another error rate that has been used for linkage mapping and association mapping is the false discovery rate (FDR). Loosely speaking, the FDR is the expected proportion of false positives among all detections. It was proposed by Benjamini and Hochberg (1995) and adapted to genome-wide studies by Storey and Tibshirani (2003). The popularity of the FDR stems from the fact that it leads to more liberal thresholds than the GWER. Chen and Storey (2006), however, have shown that the FDR is difficult to interpret when applied to genome-wide linkage scans, because it counts multiple true discoveries as distinct even though they stem from the same underlying gene (De Silva and Ball, 2007). As the marker density used in association mapping studies will increase dramatically in the near future (Donnelly, 2008), more markers will be in linkage disequilibrium with each QTL, which aggravates this problem. The FDR therefore does not seem to be an appropriate error rate concept for association studies, and we will not consider it further.

Use of the GWER can lead to conservative tests if there are numerous QTL. Control of the GWER requires that not a single false positive result occurs among all tests, and it may be argued that this requirement is too stringent in the presence of many QTL. Therefore, Chen and Storey (2006) proposed to relax the definition of the GWER by allowing a small number _k_>0 of false positives, the so-called generalized genome-wide _k_-error rate (GWER_k_). The usual GWER corresponds to _k_=0.

In this study, a new approach for controlling both the GWER and the GWER_k_ is proposed. This method, which is based on simulation, is equally applicable in linkage and association mapping. In the simulation procedure, _S_ random samples are generated under the null hypothesis from the multivariate normal distribution of the estimated marker effects being tested. For each sample, the test statistic is calculated at each putative QTL position. The critical value used as threshold for controlling the GWER_k_ is given by the _α_-quantile of the simulated distribution of the _S_ values of the (_k_+1)th smallest _P_-value. The simulation reflects both the population structure and the correlation among tests. The performance of the method is assessed on three real data sets for different GWER_k_ (_k_=0, 1, 2 and 5).

MATERIALS AND METHODS

PLANT MATERIALS, PHENOTYPIC DATA AND MOLECULAR MARKERS

To assess the performance of our method, we used three empirical data sets that were described in detail by Stich et al. (2008) (winter wheat) and by
Stich and Melchinger (2009) (sugar beet and rapeseed).

WINTER WHEAT

A total of 303 winter wheat genotypes (_Triticum aestivum_) developed by KWS Lochow GmbH (Bergen-Wohlde, Germany) were used for this study. The entries were evaluated for grain yield in a series of five breeding trials at four to six locations, with the number of entries per trial ranging from 36 to 110. All 303 inbreds were fingerprinted by KWS Lochow GmbH following standard protocols with 36 simple sequence repeat markers and one single nucleotide polymorphism marker. The 37 marker loci were randomly distributed across 19 of the 21 wheat chromosomes.

SUGAR BEET

A total of 178 sugar beet inbreds (_Beta vulgaris_) of the pollen parent heterotic pool of KWS SAAT AG (Einbeck, Germany) were analyzed. The test-cross progenies of these entries with an inbred of the seed parent heterotic pool were evaluated in a series of plant breeding trials. Data were recorded for several traits, including beet yield. All entries were fingerprinted by KWS SAAT AG following standard protocols with 59 simple sequence repeat markers and 41 single nucleotide polymorphism markers, both randomly distributed across the sugar beet genome.

RAPESEED

A total of 136 rapeseed (_Brassica napus_) inbreds of Norddeutsche Pflanzenzucht Hans-Georg Lembke KG (Holtsee, Germany) were studied. All entries were evaluated in a series of field trials, in which data were collected for thousand-kernel weight. Furthermore, all entries were fingerprinted with 59 genome-wide distributed simple sequence repeat markers by Saaten-Union Resistenzlabor GmbH (Hovedissen, Germany) following standard protocols.

STATISTICAL ANALYSES

PHENOTYPIC DATA ANALYSES

In the study of Stich et al. (2008), the empirical type I error rates of association mapping approaches based on adjusted entry means calculated in a two-step analysis were only slightly higher than those of approaches in which the phenotypic data analysis and the association analysis were performed in one step (one-step analysis) (see also Möhring and Piepho, 2009). We therefore calculated, in the first step, adjusted entry means (winter wheat and rapeseed) or entry means (sugar beet) for each entry under consideration (for more details, see Stich et al., 2008; Stich and Melchinger, 2009). These estimates were then used in a second step for the association analyses.

POPULATION STRUCTURE ANALYSES

For each of the three data sets, the kinship matrix _K_ was calculated from the available marker data using the software package SPAGeDi (Hardy and Vekemans, 2002), and negative kinship values between entries were set to 0. Instead of the _Q_ matrix of STRUCTURE (Pritchard et al., 2000), we used the first _p_ principal components of an allele frequency matrix (PC-matrix). Previous studies suggested that both methods are comparable with respect to adherence to the nominal _α_ level, whereas the principal component approach requires much less computational effort (Yu et al., 2006; Zhao et al., 2007). The first _p_ principal components explained about 25% of the variance (Stich and Melchinger, 2009).

METHOD FOR CONTROLLING GWER

To scan the genome for
QTL in linkage mapping or association mapping, we use a mixed linear model to represent the phenotypic data. At each putative QTL position or marker, we test the null hypothesis of no QTL effect. Under this hypothesis, the null model for genotype means can be written as

$$Y = X_0\beta_0 + E, \tag{1}$$

where $Y' = (y_1, y_2, \ldots, y_G)$, $y_i$ is the mean of the $i$th genotype ($i = 1, \ldots, G$), $\beta_0$ is a vector of fixed effects, $X_0$ is the corresponding design matrix and $E$ is a random residual. In association mapping, $X_0$ might contain the probabilities of subpopulation membership ($Q$ matrix) or the PC-matrix of allele frequencies and, possibly, cofactors accounting for major background QTL. Meanwhile, $E$ models genetic correlation due to coancestry plus independent and identically distributed noise, that is, $\operatorname{var}(E) = V = 2A\sigma_A^2 + I\sigma^2$, where $A$ is the numerator relationship matrix. Alternatively, $A$ can be replaced by the kinship matrix $K$ (Yu et al., 2006), which was done in this study. For the rapeseed data set, we used $\operatorname{var}(E) = V = I\sigma^2$, because $A$ was similar to $I$ and no change in the log-likelihood was visible when fitting the full model including $A$.

To test the null hypothesis at the $q$th putative position ($q = 1, 2, \ldots, Z$), we augment the null model to

$$Y = X_0\beta_0 + W_q\alpha_q + E, \tag{2}$$

where $\alpha_q$ is the vector of fixed genetic effects at the $q$th putative position and $W_q$ is the associated design matrix. Note that the dimension of $\alpha_q$ may vary among markers, depending on the genetic model and the number of alleles per marker. Furthermore, we need to allow for missing marker information, especially in association mapping, where imputation is not straightforward. The approach taken in this study is simply to discard the records of individuals with missing information at the $q$th marker when testing that marker. Different subsets of the data are therefore used for different markers. We accordingly add a subscript $q$ to the data vector as well, writing $Y_q$, which contains all records with complete data at the $q$th marker. Consequently, the design matrix $W_q$ has rows only for observations in $Y_q$.

The marker-specific data vector may be formally defined as follows. Let $B$ be a $G \times Z$ indicator matrix of zeros and ones, with rows corresponding to genotypes and columns to markers, reflecting the missing data pattern. Let $D_q$ be obtained from $\operatorname{diag}(B_q)$, where $B_q$ is the $q$th column of $B$, by deleting all rows that contain only zeros. We then have $Y_q = D_q Y$, that is, $D_q$ selects from $Y$ all observations that have complete data for the $q$th marker. The reduced data vector $Y_q$ has variance $\operatorname{var}(Y_q) = V_{qq} = D_q V D_q'$. The full model can be written compactly as

$$Y_q = X_q\beta_q + E_q, \tag{3}$$

where $X_q = (D_q X_0, W_q)$, $\beta_q' = (\beta_0', \alpha_q')$ and $E_q = D_q E$. The null hypothesis at the $q$th position can be stated as

$$H_0\colon\; H_q\beta_q = 0, \tag{4}$$

where $H_q$ is a suitable matrix of known constants. The size and form of $H_q$ depend on the putative position $q$, for example, on the number of marker alleles. Furthermore, the null hypothesis pertains to $\alpha_q$ only, that is, $H_q = (0_q\;\; \tilde{H}_q)$. Here $0_q$ is a null matrix with as many columns as $D_q X_0$, and $\tilde{H}_q$ states the null hypothesis pertaining to $\alpha_q$. For example, when $H_0$ states equality of all additive allele effects at a locus, then $\tilde{H}_q = (I_{n(q)}, -1_{n(q)})$. In this expression, $n(q)$ equals the number of marker alleles minus one and $1_{n(q)}$ is an $n(q)$-vector of ones. Thus, $H_q\beta_q = \tilde{H}_q\alpha_q$. When $V$ is known, the Wald statistic

$$T_q = (H_q\hat\beta_q)'\left[H_q\left(X_q' V_{qq}^{-1} X_q\right)^{-} H_q'\right]^{-1} H_q\hat\beta_q, \tag{5}$$

where $\hat\beta_q = (X_q' V_{qq}^{-1} X_q)^{-} X_q' V_{qq}^{-1} Y_q$, has an exact central $\chi^2$-distribution with $\operatorname{rank}(H_q)$ degrees of freedom under $H_0$. In practice, $V$ needs to be estimated from the data based on the null model (1). In this case, the Kenward–Roger method may be used to approximate the distribution of $T_q$. Provided that the number of genotypes $G$ is not very small (for example, not below 50), $T_q$ in Equation (5) will have an approximate $\chi^2$-distribution, and we expect the approximation to be very accurate in most practical cases.

SIMULATION OF THE JOINT DISTRIBUTION OF $T_1, T_2, \ldots, T_Z$

It is convenient to rewrite $T_q$ as

$$T_q = \hat\gamma_q' M_{qq}^{-1}\hat\gamma_q, \tag{6}$$

where

$$M_{qq} = H_q\left(X_q' V_{qq}^{-1} X_q\right)^{-} H_q' \tag{7}$$

and

$$\hat\gamma_q = H_q\hat\beta_q = H_q\left(X_q' V_{qq}^{-1} X_q\right)^{-} X_q' V_{qq}^{-1} D_q Y. \tag{8}$$

Under the global null hypothesis, the joint distribution of $\hat\gamma' = (\hat\gamma_1', \hat\gamma_2', \ldots, \hat\gamma_Z')$ is multivariate normal with zero mean and variance–covariance matrix $\operatorname{var}(\hat\gamma) = M = \{M_{qq'}\}$, with blocks

$$M_{qq'} = \operatorname{cov}(\hat\gamma_q, \hat\gamma_{q'}) = H_q\left(X_q' V_{qq}^{-1} X_q\right)^{-} X_q' V_{qq}^{-1} D_q V D_{q'}' V_{q'q'}^{-1} X_{q'} \left(X_{q'}' V_{q'q'}^{-1} X_{q'}\right)^{-} H_{q'}'. \tag{9}$$

This result is explained in more detail in the Appendix. Note that when $q = q'$, Equation (9) simplifies to Equation (7).

To simulate $\hat\gamma$, it is convenient to proceed in four steps. First, compute Equation (9). Second, obtain a decomposition $\operatorname{var}(\hat\gamma) = PP'$, where the number of columns of $P$ equals the rank $r$ of $\operatorname{var}(\hat\gamma)$. Third, store $P$ in memory during the iterations. Finally, at each iteration, simulate $\hat\gamma_{\mathrm{SIM}} = P u_{\mathrm{SIM}}$, where $u_{\mathrm{SIM}}$ is a vector of $r$ independent standard normal deviates. We can use the singular value decomposition

$$\operatorname{var}(\hat\gamma) = U F U',$$

where $F$ is a diagonal matrix whose first $r$ diagonal elements are the non-zero singular values of $\operatorname{var}(\hat\gamma)$ and whose remaining diagonal elements are zero. We can then choose $P = (U F^{1/2})_r$, where $(\cdot)_r$ denotes the first $r$ columns of the matrix in parentheses.

To compute a critical threshold for the Wald tests controlling the GWER at level $\alpha$, we generate $S$ random samples $\hat\gamma_{\mathrm{SIM}}$ from this multivariate normal distribution. For each sample, we compute the corresponding test statistics $T_q$ ($q = 1, \ldots, Z$) from Equation (6). The test statistics $T_q$ may involve hypotheses with differing degrees of freedom for different $q$. We therefore convert each $T_q$ to a point-wise $P$-value $p_q$ based on a $\chi^2$-distribution with $\operatorname{rank}(H_q)$ degrees of freedom. Conversion to $P$-values allows us to use the same rejection region for all QTL (Storey, 2002). We then determine the minimum of the $p_q$ across positions, $p_{\min}$. The critical value is given by the $\alpha$-quantile of the simulated distribution of the $S$ values of $p_{\min}$. The approach can be extended further using the GWER_k_ approach of Chen and Storey (2006), which defines a genome-wide error to occur when more than $k$ point-wise tests are falsely declared significant. In this more general case, the $(k+1)$th smallest $p_q$ across positions is determined in each simulation run. Note that the ordinary GWER corresponds to $k = 0$.

SIMULATION STUDY

The performance of the above
method is verified by simulation. As the method for determining the threshold is itself based on simulation, there are two levels of simulation: (1) an inner simulation that generates the thresholds for a given data set, and (2) an outer simulation that generates data to be analyzed by a mixed model. The simulation scheme can be described as follows:

Do $i = 1$ to $n$ ($n$ = number of outer loops)
(a) Generate a data set $Y_{\mathrm{SIM}}$ from a multivariate normal distribution with zero mean, using the restricted maximum likelihood estimate of $V$ from a real data set.
(b) Determine the threshold by simulation with $S$ runs of the inner loop, using $Y_{\mathrm{SIM}}$ together with $X_0$ and $W_q$ ($q = 1, \ldots, Z$) from the real data set.
(c) Evaluate the significance tests for the scan of the $i$th simulated data set $Y_{\mathrm{SIM}}$ and determine the $(k+1)$th ordered $P$-value across positions.
End
Determine the threshold $P$-value for GWER_k_ $= \alpha$ as the $\alpha$-quantile of the $n$ $(k+1)$th ordered $P$-values.

To start a simulation, we analyze a real data set under the global $H_0$ based on model (1), obtain an estimate of $V$ and compute its Cholesky decomposition $V = LL'$. In each run of the outer loop, we then simulate data under the global $H_0$ as $Y_{\mathrm{SIM}} = Lv$, where $v$ is a vector of independent standard normal deviates. The same $L$ is used in all iterations of the outer loop, so $L$ needs to be stored throughout the whole simulation.

RESULTS

The proposed method for controlling the GWER_k_ (Chen and Storey, 2006)
was tested on three empirical data sets from commercial plant-breeding programs. The threshold computation and the analysis of the PC-K mixed model were repeated 1000 times, that is, there were 1000 inner simulations and 1000 outer simulations. Note that for a test to be declared significant, its _P_-value had to fall below the threshold _P_-value. At a nominal error rate of 5%, a 95% prediction interval for the observed error rate has a lower limit of 3.65% and an upper limit of 6.35% when all 1000 runs converged. Thus, if the tests control _α_ exactly, the observed number of errors should not exceed 64 or fall below 36 of the 1000 simulations. For the sugar beet data set, only 978 outer simulations converged, so the 95% prediction interval has a lower limit of 3.63% and an upper limit of 6.37%. We also computed Bonferroni-adjusted prediction intervals accounting for the 12 cases studied (Table 1). For both the 1000 runs and the 978 runs of the sugar beet data set, the Bonferroni-adjusted lower and upper limits are approximately 3.0% and 7.0%, corresponding to 30 and 70 cases per 1000 runs.

The empirical error rates for the GWER_k_ are given in Table 1. The nominal GWER could be maintained for the winter wheat data set of KWS Lochow. For GWER_k_=0, the smallest _P_-value of the PC-K mixed model fell below the critical threshold in 6.0% of the simulations. The threshold for GWER_k_=0 was 0.00139, which is higher than the Bonferroni-corrected threshold (0.00135). The extension of Chen and Storey (2006) further reduced the proportion of simulations in which the critical threshold exceeded the _P_-values of the PC-K mixed model. Such an error occurred in 4.9% of the simulations for GWER_k_=1, in 3.6% for GWER_k_=2 and in 2.2% for GWER_k_=5 (Table 1).

For the sugar beet data set of KWS, the nominal GWER could also be maintained. In 6.1% of the simulations for GWER_k_=0, the smallest _P_-value of the PC-K mixed model fell below the threshold. The threshold for GWER_k_=0 was 0.00052 and therefore higher than the Bonferroni-corrected threshold (0.00050). Furthermore, the nominal rate of 5% could be maintained for the generalized GWER_k_ with _k_=1, 2 and 5. A type I error occurred in 5.4% of the simulations for GWER_k_=1, in 4.7% for GWER_k_=2 and in 2.4% for GWER_k_=5 (Table 1).

Our method also satisfactorily controlled the nominal GWER for the third data set. For the rapeseed data set of Norddeutsche Pflanzenzucht, the smallest _P_-value of the PC-K mixed model fell below the threshold for GWER_k_=0 in 6.3% of the simulations. The threshold for GWER_k_=0 was 0.000937 and therefore also higher than the Bonferroni-corrected threshold of 0.000847. For GWER_k_=1, the empirical error rate was 5.0%; for GWER_k_=2, it was 3.3%; and for GWER_k_=5, it was 2.7% (Table 1).

DISCUSSION

Error rates for
controlling multiple testing in linkage and association mapping experiments include the FDR, which was proposed by Benjamini and Hochberg (1995) and Storey and Tibshirani (2003), and the GWER and its extension GWER_k_, which was proposed by Chen and Storey (2006). For linkage mapping, different approaches that control the GWER have been proposed. These include the Bonferroni correction, the permutation procedure (Churchill and Doerge, 1994; Doerge and Churchill, 1996) and several analytical methods for specific population structures (Davies, 1977; Lander and Botstein, 1989; Feingold et al., 1993; Rebai et al., 1994; Dupuis and Siegmund, 1999; Piepho, 2001). By contrast, there do not at present seem to be tailor-made methods for controlling the GWER in association mapping experiments. This study has proposed a simulation-based approach for controlling the type I error rate that incorporates information on population structure. The approach is akin to that proposed by Edwards and Berry (1987) for multiple mean comparisons in linear models, and it is also similar in spirit to the method of Zou et al. (2004) for linkage mapping. The simulation approach can also be regarded as a parametric bootstrap procedure (Efron and Tibshirani, 1993). The simulations based on the three commercial plant breeding data sets have shown that the calculated thresholds provide reasonable, slightly conservative control of the genome-wide type I error rate.

An advantage of our method over the permutation procedure of Churchill and Doerge (1994) is that information on population structure is accounted for in the threshold computation. The associations between trait and population structure are not destroyed, as they are in the permutation procedure. Aulchenko et al. (2007) proposed an approach in which the permutation test is applied to residuals from a mixed model fit that ignores markers but corrects for family effects. The method was developed in an animal breeding context for genetically homogeneous populations, but its principles could be applied to the more general setting considered here. However, residuals from a mixed model fit typically display correlation and heteroscedasticity arising from the estimation of model effects, which may affect the performance of the method. Our procedure does not have these limitations, because the null distribution is simulated rather than computed from permutations.

Li and Ji (2005), Seaman and Müller-Myhsok (2005) and Conneely and Boehnke (2007) suggested methods to adjust _P_-values for the correlation structure of the markers. These approaches are therefore similar to ours, but they do not account for population structure. Moreover, the approaches of Seaman and Müller-Myhsok (2005) and Conneely and Boehnke (2007) require imputation if there are missing values in the marker data, whereas our method handles missing values without imputation.

For the three data sets used in this study, the computation time for one approximate threshold ranged from 1 min 23 s for the rapeseed data set to 9 min 20 s for the sugar beet data set (Intel Pentium Dual central processing unit, 2.20 GHz, 1.95 GB random access memory). The computation time depends on the number of markers and on the number of genotypes. It increases with the number of markers mainly because of the construction of the matrix $M$, and with the number of genotypes because the mixed model analysis takes longer. If necessary, the computation time could be reduced by performing the threshold computation separately for each chromosome and using a Bonferroni correction across chromosomes (Piepho, 2001). Moreover, when the number of markers by far exceeds the number of genotypes, it will be computationally more efficient to simulate data $Y$ instead of test statistics $T_q$ (Supplementary Information).

REFERENCES

* Aulchenko YS, de Koning DJ, Haley C (2007). Genomewide rapid association using mixed model and
regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. _Genetics_ 177: 577–585.
* Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. _J R Stat Soc Series B_ 57: 289–300.
* Bodmer WF (1986). Human genetics: the molecular challenge. _Cold Spring Harb Symp Quant Biol_ 51: 1–13.
* Breseghello F, Sorrells ME (2006). Association mapping of kernel size and milling quality in wheat (_Triticum aestivum_ L.) cultivars. _Genetics_ 172: 1165–1177.
* Chen L, Storey JD (2006). Relaxed significance criteria for linkage analysis. _Genetics_ 173: 2371–2381.
* Churchill GA, Doerge RW (1994). Empirical threshold values for quantitative trait mapping. _Genetics_ 138: 963–971.
* Conneely KN, Boehnke M (2007). So many correlated tests, so little time! Rapid adjustment of _P_-values for multiple correlated tests. _Am J Hum Genet_ 81: 1158–1168.
* Davies RB (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. _Biometrika_ 64: 247–254.
* De Silva HN, Ball RD (2007). Linkage disequilibrium mapping concepts. In: Oraguzie NC, Rikkerink EHA, Gardiner SE, De Silva HN (eds). _Association Mapping in Plants_. Springer: New York, NY, USA.
* Doerge RW, Churchill GA (1996). Permutation tests for multiple loci affecting a quantitative character. _Genetics_ 142: 285–294.
* Donnelly P (2008). Progress and challenges in genome-wide association studies in humans. _Nature_ 456: 728–731.
* Dupuis J, Siegmund D (1999). Statistical methods for mapping quantitative trait loci from a dense set of markers. _Genetics_ 151: 373–386.
* Edwards D, Berry J (1987). The efficiency of simulation-based multiple comparisons. _Biometrics_ 43: 913–928.
* Efron B, Tibshirani RJ (1993). _An Introduction to the Bootstrap_. Chapman & Hall: London.
* Ersoz ES, Yu J, Buckler ES (2008). Applications of linkage disequilibrium and association mapping in maize. In: Kriz A, Larkins B (eds). _Molecular Genetic Approaches to Maize Improvement_. Springer: Dordrecht, The Netherlands.
* Feingold EP, Brown PO, Siegmund D (1993). Gaussian models for genetic linkage analysis using complete high-resolution maps of identity by descent. _Am J Hum Genet_ 53: 234–251.
* Flint-Garcia SA, Thornsberry JM, Buckler ES (2003). Structure of linkage disequilibrium in plants. _Annu Rev Plant Biol_ 54: 357–374.
* Hardy OJ, Vekemans X (2002). SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. _Mol Ecol Notes_ 2: 618–620.
* Lander ES, Botstein D (1989). Mapping Mendelian factors underlying quantitative traits using RFLP markers. _Genetics_ 121: 185–199.
* Li J, Ji L (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. _Heredity_ 95: 221–227.
* Möhring J, Piepho HP (2009). Comparison of weighting in two-stage analysis of plant breeding trials. _Crop Sci_ 49: 1977–1988.
* Piepho HP (2001). A quick method for computing approximate thresholds for quantitative trait loci detection. _Genetics_ 157: 425–432.
* Pritchard JK, Stephens M, Rosenberg NA, Donnelly P (2000). Association mapping in structured populations. _Am J Hum Genet_ 67: 170–181.
* Rebai A, Goffinet B, Mangin B (1994). Approximate thresholds of interval mapping tests for QTL detection. _Genetics_ 138: 235–240.
* Seaman SR, Müller-Myhsok B (2005). Rapid simulation of _P_-values for product methods and multiple-testing adjustment in association studies. _Am J Hum Genet_ 76: 399–408.
* Stich B, Melchinger AE (2009). Comparison of mixed-model approaches for association mapping in rapeseed, potato, sugar beet, maize, and _Arabidopsis_. _BMC Genomics_ 10: 94.
* Stich B, Melchinger AE, Frisch M, Maurer HP, Heckenberger M, Reif JC (2005). Linkage disequilibrium in European elite maize germplasm investigated with SSRs. _Theor Appl Genet_ 111: 723–730.
* Stich B, Möhring J, Piepho HP, Heckenberger M, Buckler ES, Melchinger AE (2008). Comparison of mixed-model approaches for association mapping. _Genetics_ 178: 1745–1754.
* Storey JD (2002). A direct approach to false discovery rates. _J R Stat Soc Ser B Stat Methodol_ 64: 479–498.
* Storey JD, Tibshirani R (2003). Statistical significance for genomewide studies. _Proc Natl Acad Sci USA_ 100: 9440–9445.
* Sun G, Zhu C, Kramer MH, Yang SS, Song W, Piepho HP _et al_. (2010). Comparing different R² statistics for mixed model association mapping. _Heredity_ 105: 333–340.
* Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES IV (2001). _Dwarf8_ polymorphisms associate with variation in flowering time. _Nat Genet_ 28: 286–289.
* Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF _et al_. (2006). A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. _Nat Genet_ 38: 203–208.
* Zhao J, Paulo MJ, Jamar D, Lou P, Van Eeuwijk F, Bonnema G _et al_. (2007). Association mapping of leaf traits, flowering time, and phytate content in _Brassica rapa_. _Genome_ 50: 963–973.
* Zhu C, Gore M, Buckler ES, Yu J (2008). Status and prospects of association mapping in plants. _Plant Genome_ 1: 5–19.
* Zou F, Fine JP, Hu J, Lin DY (2004). An efficient resampling method for assessing genome-wide statistical significance in mapping quantitative trait loci. _Genetics_ 168: 2307–2316.
ACKNOWLEDGEMENTS
We thank the breeding companies KWS, KWS Lochow and Norddeutsche Pflanzenzucht for providing the data sets within the GABI BRAIN project. This study was supported by the GABI GAIN project (Grant no. FKZ0315072C).

AUTHOR INFORMATION
AUTHORS AND AFFILIATIONS
* Institute for Crop Science, Bioinformatic Unit, Universität Hohenheim, Stuttgart, Germany: B U Müller & H-P Piepho
* Max Planck Institute for Plant Breeding Research, Quantitative Crop Genetics, Köln, Germany: B Stich

CORRESPONDING AUTHOR
Correspondence to H-P Piepho.

ETHICS DECLARATIONS
COMPETING INTERESTS
The authors declare no conflict of interest.

ADDITIONAL INFORMATION
Supplementary Information accompanies the paper on the _Heredity_ website.

SUPPLEMENTARY INFORMATION
* Program for threshold calculation (DOC 52 KB)
* Dataset 1: marker information (XLS 15 KB)
* Dataset 2: trait values (XLS 13 KB)
* Dataset 3: population (XLS 13 KB)
* Dataset 4: A matrix (XLS 15 KB)
* A general method for controlling the genome-wide Type I error rate in linkage and association mapping experiments in plants (PDF 89 KB)

APPENDIX
Let $Y_Q = D_Q Y$ and $Y_{Q'} = D_{Q'} Y$. Then [equation not available], where [equation not available]. Similarly, noting that [equation not available] with $C_Q = H_Q (X_Q' V_{QQ}^{-1} X_Q)^{-} X_Q' V_{QQ}^{-1}$, we have [equation not available] and [equation not available], where $M_{QQ'} = C_Q V_{QQ'} C_{Q'}'$. Inserting the expression for $C_Q$, we find [equation not available] and [equation not available].

ABOUT THIS ARTICLE
CITE THIS ARTICLE
Müller, B., Stich, B. & Piepho, HP. A general method for controlling the genome-wide type I error rate in linkage and association mapping experiments in plants. _Heredity_ 106, 825–831 (2011). https://doi.org/10.1038/hdy.2010.125

* Received: 18 January 2010
* Revised: 04 August 2010
* Accepted: 08 August 2010
* Published: 20 October 2010
* Issue Date: May 2011
* DOI: https://doi.org/10.1038/hdy.2010.125

KEYWORDS
* association mapping
* genome-wide type I error rate
* linkage mapping
* mixed model
* Monte Carlo simulation
* parametric bootstrap
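The appendix works with the matrix $C_Q = H_Q (X_Q' V_{QQ}^{-1} X_Q)^{-} X_Q' V_{QQ}^{-1}$ and the cross-covariance $M_{QQ'} = C_Q V_{QQ'} C_{Q'}'$. The NumPy sketch below evaluates both on a small toy example so the matrix products can be inspected directly. It is not the authors' supplementary threshold program, and it rests on assumptions that this part of the text does not state: $V_{QQ'}$ is taken to mean $\operatorname{Cov}(Y_Q, Y_{Q'}) = D_Q V D_{Q'}'$ with $V = \operatorname{Var}(Y)$; $D_Q = D_{Q'} = I$; $H_Q$ is a hypothetical row vector that picks out the marker effect; and the Moore–Penrose inverse stands in for the generalized inverse $(\cdot)^{-}$. The helper names `block_cov`, `gls_map` and `cross_cov`, the covariance matrix and the marker scores are all invented for illustration.

```python
import numpy as np


def block_cov(D_Q, V, D_Qp):
    """Cov(Y_Q, Y_Q') = D_Q V D_Q'^T for Y_Q = D_Q Y, Y_Q' = D_Q' Y, Var(Y) = V.

    Assumption: this is the quantity written V_QQ' in the appendix.
    """
    return D_Q @ V @ D_Qp.T


def gls_map(X_Q, V_QQ, H_Q=None):
    """C_Q = H_Q (X_Q' V_QQ^{-1} X_Q)^- X_Q' V_QQ^{-1}.

    np.linalg.pinv (Moore-Penrose) is used as one valid generalized inverse;
    H_Q = None means the identity (a hypothetical default).
    """
    Vi_X = np.linalg.solve(V_QQ, X_Q)              # V_QQ^{-1} X_Q without an explicit inverse
    C_Q = np.linalg.pinv(X_Q.T @ Vi_X) @ Vi_X.T    # Vi_X.T = X_Q' V_QQ^{-1} because V_QQ is symmetric
    return C_Q if H_Q is None else H_Q @ C_Q


def cross_cov(C_Q, V_QQp, C_Qp):
    """M_QQ' = C_Q V_QQ' C_Q'^T, the covariance of C_Q Y_Q and C_Q' Y_Q'."""
    return C_Q @ V_QQp @ C_Qp.T


# Hypothetical toy setting: 8 genotypes, a positive definite Var(Y), two 0/1 markers.
n = 8
rng = np.random.default_rng(2010)
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)                          # symmetric positive definite
m_Q = np.array([0, 1, 0, 1, 1, 0, 1, 0], dtype=float)
m_Qp = np.array([1, 1, 0, 0, 1, 0, 0, 1], dtype=float)
X_Q = np.column_stack([np.ones(n), m_Q])             # intercept + marker Q
X_Qp = np.column_stack([np.ones(n), m_Qp])           # intercept + marker Q'
D = np.eye(n)                                        # D_Q = D_Q' = I (assumption)
H = np.array([[0.0, 1.0]])                           # selects the marker effect (hypothetical)

V_QQ = block_cov(D, V, D)
V_QQp = block_cov(D, V, D)
C_Q = gls_map(X_Q, V_QQ, H)
C_Qp = gls_map(X_Qp, V_QQ, H)                        # V_Q'Q' equals V_QQ here because D_Q' = D_Q

M_QQ = cross_cov(C_Q, V_QQ, C_Q)
M_QpQp = cross_cov(C_Qp, V_QQ, C_Qp)
M_QQp = cross_cov(C_Q, V_QQp, C_Qp)
rho = M_QQp[0, 0] / np.sqrt(M_QQ[0, 0] * M_QpQp[0, 0])
print(f"Var(Q) = {M_QQ[0, 0]:.4f}, Var(Q') = {M_QpQp[0, 0]:.4f}, corr = {rho:.4f}")
```

With $H_Q = I$ and a full-rank $X_Q$, $C_Q V_{QQ} C_Q'$ reduces to $(X_Q' V_{QQ}^{-1} X_Q)^{-1}$, the familiar generalized least squares covariance, which gives a quick sanity check on the implementation. The correlation `rho` between the two marker-effect estimates is the kind of dependence among tests that a Bonferroni correction cannot exploit.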