Study design features increase replicability in brain-wide association studies

Nature

ABSTRACT Brain-wide association studies (BWAS) are a fundamental tool in discovering brain–behaviour associations1,2. Several recent studies have shown that thousands of study participants

are required for good replicability of BWAS1,2,3. Here we performed analyses and meta-analyses of a robust effect size index using 63 longitudinal and cross-sectional MRI studies from the

Lifespan Brain Chart Consortium4 (77,695 total scans) to demonstrate that optimizing study design is critical for increasing standardized effect sizes and replicability in BWAS. A

meta-analysis of brain volume associations with age indicates that BWAS with larger variability of the covariate and longitudinal studies have larger reported standardized effect sizes.

Analysing age effects on global and regional brain measures from the UK Biobank and the Alzheimer’s Disease Neuroimaging Initiative, we showed that modifying study design through sampling

schemes improves standardized effect sizes and replicability. To ensure that our results are generalizable, we further evaluated the longitudinal sampling schemes on cognitive,

psychopathology and demographic associations with structural and functional brain outcome measures in the Adolescent Brain Cognitive Development dataset. We demonstrated that commonly

used longitudinal models, which assume equal between-subject and within-subject changes, can, counterintuitively, reduce standardized effect sizes and replicability. Explicitly modelling the

between-subject and within-subject effects avoids conflating them and enables optimizing the standardized effect sizes for each separately. Together, these results provide guidance for study

designs that improve the replicability of BWAS.

MAIN BWAS use non-invasive MRI to identify associations between inter-individual differences in behaviour, cognition,

biological or clinical measurements and brain structure or function1,2. A fundamental goal of BWAS is to identify true underlying biological associations that improve our understanding of

how brain organization and function are linked to health across the lifespan. Recent studies have raised concerns about the replicability of BWAS1,2,3. Statistical replicability is typically

defined as the probability of obtaining consistent results from hypothesis tests across different studies. Like statistical power, replicability is a function of both the standardized

effect size and the sample size5,6,7. Low replicability in BWAS has been attributed to a combination of small sample sizes, small standardized effect sizes and bad research practices (such

as _p_-hacking and publication bias)1,2,8,9,10,11,12. The most obvious solution to increase the replicability in BWAS is to increase study sample sizes. Several recent studies have shown

that thousands of study participants are required to obtain replicable findings in BWAS1,2. However, massive sample sizes are often infeasible in practice. Standardized effect sizes (such as

Pearson’s correlation and Cohen’s _d_) are statistical values that not only depend on the underlying biological association in the population but also on the study design. Two studies of

the same biological effect with different study designs will have different standardized effect sizes. For example, contrasting brain function of groups with depression versus those without

depression will have a different Cohen’s _d_ effect size if the study design measures more extreme depressed states contemporaneously with measures of brain function, as opposed to less

extreme depressed states, even if the underlying biological effect is the same. Although researchers cannot increase the magnitude of the underlying biological association, its standardized

effect size — and thus its replicability — can be increased by critical features of study design. In this study, we focus on identifying modifiable study design features that can be used to

improve the replicability of BWAS by increasing standardized effect sizes. Increasing standardized effect sizes through study design before data collection stands in contrast to bad research

practices that can artificially inflate reported effect sizes, such as _p_-hacking and publication bias. There has been very little research regarding how modifications to the study design

might improve BWAS replicability. Specifically, we focus on two major design features that directly influence standardized effect sizes: variation in sampling scheme and longitudinal

designs1,13,14,15. Of note, these design features can be implemented without inflating the sample estimate of the underlying biological effect when using correctly specified models16. By

increasing the replicability of BWAS through study design, we can more efficiently utilize the US$1.8 billion average annual investment in neuroimaging research from the US National

Institutes of Health (https://reporter.nih.gov/search/_dNnH1VaiEKU_vZLZ7L2xw/projects/charts). Here we conducted a comprehensive investigation of cross-sectional and longitudinal BWAS

designs by capitalizing on multiple large-scale data resources. Specifically, we begin by analysing and meta-analysing 63 neuroimaging datasets including 77,695 scans from 60,900 cognitively

normal participants from the Lifespan Brain Chart Consortium4 (LBCC). We leverage data from the UK Biobank (UKB; up to 29,031 scans), the Alzheimer’s Disease Neuroimaging Initiative (ADNI;

2,232 scans) and the Adolescent Brain Cognitive Development study (ABCD; up to 17,210 scans) to investigate the most commonly measured phenotypes of brain structure and function. To ensure

that our results are broadly generalizable, we evaluated associations with diverse covariates of interest, including age, sex, cognition and psychopathology. To facilitate comparison between

BWAS designs, we also introduce a new version of the robust effect size index (RESI)17,18,19 that allows us to demonstrate how longitudinal study design directly impacts standardized effect

sizes. STANDARDIZED EFFECT SIZES DEPEND ON STUDY DESIGN To fit each study-level analysis, we regressed each of the global brain measures (total grey matter volume (GMV), total subcortical

grey matter volume (sGMV), total white matter volume (WMV) and mean cortical thickness) and regional brain measures (regional GMV and cortical thickness, based on Desikan–Killiany

parcellation20) on sex and age in each of the 63 neuroimaging datasets from the LBCC. Age was modelled using a non-linear spline function in linear regression models for the cross-sectional

datasets and generalized estimating equations (GEEs) for the longitudinal datasets (Methods). Site effects were removed before the regressions using ComBat21,22 (Methods). Analyses for total

GMV, total sGMV and total WMV used all 63 neuroimaging datasets (16 longitudinal; Supplementary Table 1). Analyses of regional brain volumes and cortical thickness used 43 neuroimaging

datasets (13 longitudinal; Methods and Supplementary Table 2). Throughout the present study, we used the RESI17,18,19 as a measure of standardized effect size. The RESI is a recently

developed index that is equal to 1/2 Cohen’s _d_ under the same assumptions for Cohen’s _d_17 (Methods; section 3 in Supplementary Information). We used the RESI as a standardized effect

size because it is broadly applicable to many types of models and is robust to model misspecification. To investigate the effects of study design features on the RESI, we performed

meta-analyses for the four global brain measures and two regional brain measures in the LBCC to model the association of study-level design features with standardized effect sizes. Study

design features are quantified as the sample mean, standard deviation and skewness of the age covariate as non-linear terms, and a binary variable indicating the design type (cross-sectional

or longitudinal). After obtaining the estimates of the standardized effect sizes of age and sex in each analysis of the global and regional brain measures, we conducted meta-analyses of the

estimated standardized effect sizes using weighted linear regression models with study design features as covariates (Methods). For total GMV, the partial regression plots of the effect of

each study design feature demonstrate a strong cubic-shape relationship between the standardized effect size for total GMV–age association and study population mean age. This cubic shape

indicates that the strength of the age effect varies with respect to the age of the population being studied. The largest age effect on total GMV in the human lifespan occurs during early

and late adulthood (Fig. 1a and Supplementary Table 3). There is also a strong positive linear effect of the study population standard deviation of age on the standardized effect size for

total GMV–age association. For each unit increase in the standard deviation of age (in years), the expected standardized effect size increases by about 0.1 (Fig. 1a). This aligns with the

well-known relationship between correlation strength and covariate standard deviation indicated by statistical principles23. Plots for total sGMV, total WMV and mean cortical thickness show

U-shaped changes of the age effect with respect to the study population mean age (Fig. 1b–d). A similar but sometimes weaker relationship is shown between expected standardized effect size

and study population standard deviation and skewness of the age covariate (Fig. 1b–d and Supplementary Tables 4–6). Finally, the meta-analyses also show a moderate effect of study design on

the standardized effect size of age on each of the global brain measures (Fig. 1a–d and Supplementary Tables 3–6). The average standardized effect size for total GMV–age associations in

longitudinal studies (RESI = 0.39) is substantially larger than in cross-sectional studies (RESI = 0.08) after controlling for the study design variables, corresponding to a more than 380%

increase in the standardized effect size for longitudinal studies. This value quantifies the systematic differences in the standardized effect sizes between the cross-sectional and

longitudinal studies among the 63 neuroimaging studies. Of note, longitudinal study design does not improve the standardized effect size for biological sex, because sex does not vary within

participants in these studies (Supplementary Tables 7–10 and Supplementary Fig. 2). For regional GMV and cortical thickness, similar effects of study design features also occur across

regions (Fig. 1e,f; 34 regions per hemisphere). In most of the regions, the standardized effect sizes of age on regional GMV and cortical thickness are strongly associated with the study

population standard deviation of age. Longitudinal study designs tend to have a positive effect on the standardized effect sizes for regional GMV–age associations and a positive

but weaker effect on the standardized effect sizes for regional cortical thickness–age associations. To improve the comparability of standardized effect sizes between cross-sectional and

longitudinal studies, we propose a new effect size index: the cross-sectional RESI for longitudinal datasets (section 3 in Supplementary Information). The cross-sectional RESI for

longitudinal datasets represents the RESI in the same study population, if the longitudinal study had been conducted cross-sectionally. This newly developed effect size index allows us to

quantify the benefit of using a longitudinal study design in a single dataset (section 3.3 in Supplementary Information). The meta-analysis results demonstrate that standardized effect sizes

are dependent on study design features, such as mean age, standard deviation of the age of the sample population, and cross-sectional or longitudinal design. Moreover, the results suggest

that modifying study design features, such as increasing variability and conducting longitudinal studies, can increase the standardized effect sizes in BWAS. IMPROVED SAMPLING BOOSTS

REPLICABILITY To investigate the effect of modifying the variability of the age covariate on increasing standardized effect sizes and replicability, we implemented three sampling schemes

that produce different sample standard deviations of the age covariate. We treated the large-scale cross-sectional UKB data as the population and drew samples whose age distributions follow

a pre-specified shape (bell shaped, uniform and U shaped; Methods and Extended Data Fig. 1a). In the UKB, the U-shaped sampling scheme on age increases the standardized effect size for the

total GMV–age association by 60% compared with bell shaped and by 27% compared with uniform (Fig. 2a), with an associated increase in replicability (Fig. 2b). To achieve 80% replicability

for detecting the total GMV–age association (Methods), fewer than 100 participants are sufficient if using the U-shaped sampling scheme, whereas about 200 participants are needed if the

bell-shaped sampling scheme is used (Fig. 2b). A similar pattern can be seen for the regional outcomes of GMV and cortical thickness (Fig. 2c–f). The U-shaped sampling scheme typically

provides the largest standardized effect sizes of age and the highest replicability, followed by the uniform and bell-shaped schemes. The U-shaped sampling scheme shows greater

region-specific improvement in the standardized effect sizes and replicability for regional GMV–age and regional cortical thickness–age associations than the bell-shaped scheme (Extended

Data Fig. 1d,e). To investigate the effect of increasing the variability of the age covariate longitudinally, we implemented sampling schemes to adjust the between-subject and within-subject

variability of age in the bootstrap samples from the longitudinal ADNI dataset. In the bootstrap samples, each participant had two measurements (baseline and a follow-up). To mimic how a

real study would operate, we selected the two measurements of each participant based on baseline age and the follow-up age by targeting specific distributions for the baseline age and the

age change at the follow-up time point (Methods; Extended Data Fig. 1b,c). Increasing between-subject and within-subject variability of age increases the average observed standardized effect

sizes, with corresponding increases in replicability (Fig. 3). A U-shaped between-subject sampling scheme on age increases the standardized effect size for total GMV–age association by

23.6% compared with bell shaped and by 12.1% compared with uniform, when using the uniform within-subject sampling scheme (Fig. 3a). In addition, we investigated the effect of the number of

measurements per participant on the standardized effect size and replicability in longitudinal data using the ADNI dataset. Adding a single additional measurement after the baseline

increases the standardized effect size for total GMV–age association by 156% and replicability by 350%. The benefit of further measurements beyond the first follow-up is minimal (Fig. 3c,d). Finally, we also

evaluated the effects of the longitudinal sampling schemes on regional GMV and cortical thickness in the ADNI dataset (Fig. 3e–h). When sampling two measurements per participant, the

between-subject and within-subject sampling schemes producing larger age variability increase the standardized effect size and replicability across most regions. Together, these results

suggest that having larger spacing in between-subject and within-subject age measurements increases standardized effect size and replicability. Most of the benefit of the number of

within-subject measurements is due to the first additional measurement after baseline. SAMPLING BENEFIT VARIES BY BRAIN MEASURE As standardized effect sizes for brain–age associations are

often larger than for brain–behaviour associations, we investigated whether the proposed sampling schemes are effective on various non-brain covariates and their associations with structural

and functional brain measures in all participants (with and without neuropsychiatric symptoms) with cross-sectional and longitudinal measurements from the ABCD dataset. The non-brain

covariates include the NIH toolbox24, Child Behavior Checklist (CBCL), body mass index (BMI), birth weight and handedness (Methods; Supplementary Tables 11 and 12). Functional connectivity

is used as a functional brain measure and is computed for all pairs of regions in the Gordon atlas25 (Methods). We used the bell-shaped and U-shaped target sampling distributions to control

the between-subject and within-subject variability of each non-brain covariate (Methods). For each non-brain covariate, we show the results for the four combinations of between-subject and

within-subject sampling schemes. Overall, there is a consistent benefit to increasing between-subject variability of the covariate (Fig. 4 and Extended Data Fig. 2). These preferred sampling

schemes lead to a more than 1.8-fold reduction in the sample size needed for 80% replicability and a more than 1.4-fold increase in the standardized effect size for over 50% of associations. Moreover,

72% of covariate-outcome associations had increased standardized effect sizes by increasing the between-subject variability of the covariates (Extended Data Fig. 3). Importantly, increasing

within-subject variability decreases the standardized effect sizes for many structural associations (Fig. 4a–f and Extended Data Fig. 2a–f), suggesting that conducting longitudinal analyses

can result in decreased replicability compared with cross-sectional analyses. For the functional connectivity outcomes, there is a slight positive effect of increasing within-subject

variability (Fig. 4g,h and Extended Data Fig. 2g,h). To evaluate the lower replicability of the structural associations with increasing within-subject variability, we compared

cross-sectional standardized effect sizes of the non-brain covariates on each brain measure using the baseline measurements to the standardized effect sizes estimated using the full

longitudinal data (Fig. 5a–d and Extended Data Fig. 4). Consistent with the reduction in standardized effect size by increasing within-subject variability, for most structural associations

(GMV and cortical thickness), conducting cross-sectional analyses using the baseline measurements results in larger standardized effect sizes (and higher replicability) than conducting

analyses using the full longitudinal data. This finding holds when fitting a cross-sectional model only using the 2-year follow-up measurement (Extended Data Fig. 4). Identical results are

found using linear mixed models with individual-specific random intercepts, which are commonly used in BWAS (Supplementary Fig. 3). Together, these results suggest that the benefit of

conducting longitudinal studies and larger within-subject variability is highly dependent on the brain–behaviour association. Counterintuitively, longitudinal designs can reduce the

standardized effect sizes and replicability. ACCURATE LONGITUDINAL MODELS ARE CRUCIAL To investigate why increasing within-subject variability or using longitudinal designs is not beneficial

for some associations, we examined an assumption common to GEEs and linear mixed models in BWAS. These widely used models assume that there is consistent association strength between the

brain measure and non-brain covariate across between-subject and within-subject changes in the non-brain covariate. However, the between-subject and within-subject association strengths can

differ because non-brain measures can be more variable than structural brain measures for various reasons. For example, crystallized composite scores may vary within a participant

longitudinally because of time-of-day effects, lack of sleep or natural noise in the measurement. By contrast, GMV is more precise and it is not vulnerable to other sources of variability

that might accompany the crystallized composite score. This combination leads to a low within-subject association between these variables (Supplementary Table 13). Functional connectivity

measures are more similar to crystallized composite scores in that they are subject to higher within-subject variability and natural noise, so they have a higher potential for stronger

within-subject associations with crystallized composite scores (that is, they are more likely to vary together based on many factors such as time of day and lack of sleep). To demonstrate

this, we fitted models that estimated distinct effects for between-subject and within-subject associations in the ABCD dataset (Methods) and found that there are large between-subject

parameter estimates and small within-subject parameter estimates in total and regional GMV (Fig. 5e, Supplementary Table 13 and section 5.2 in Supplementary Information), whereas the

functional connectivity associations are distributed more evenly across between-subject and within-subject parameters (Fig. 5f). If the between-subject and within-subject associations are

different, these widely used longitudinal models average the two associations (equation (13) in section 5 in Supplementary Information). Fitting these associations separately avoids

averaging the larger effect with the smaller effect and can inform our understanding of brain–behaviour associations (section 5.2 in Supplementary Information). This approach ameliorates the

reduction in standardized effect sizes caused by longitudinal designs for structural brain measures in the ABCD (Extended Data Figs. 5 and 6 and section 5 in Supplementary Information).
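One common way to estimate the two associations separately is to split the covariate into each participant's mean (between-subject part) and the deviation from that mean at each visit (within-subject part), and to fit both terms in the same model. The Python sketch below illustrates this on simulated data. It is an assumption-laden illustration, not the model specification used in the analyses: the simulated effect values, the helper names (`split_between_within`, `ols_slopes`, `simulate`) and the use of ordinary least squares for point estimates only (no robust standard errors, unlike a GEE) are all choices made for this example.

```python
import numpy as np


def split_between_within(x, subject):
    """Return each observation's subject mean and its deviation from that mean.

    `subject` holds integer subject indices 0..n_subjects-1.
    """
    counts = np.bincount(subject)
    means = np.bincount(subject, weights=x) / counts
    x_between = means[subject]
    return x_between, x - x_between


def ols_slopes(y, *columns):
    """Ordinary least-squares slopes (an intercept is fitted and dropped)."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def simulate(n_subjects=2000, n_visits=2, beta_between=1.0, beta_within=0.1, seed=0):
    """Toy longitudinal data in which the two effects differ (assumed values)."""
    rng = np.random.default_rng(seed)
    subject = np.repeat(np.arange(n_subjects), n_visits)
    # Covariate = stable subject-level part + visit-to-visit fluctuation.
    x = rng.normal(0, 1, n_subjects)[subject] + rng.normal(0, 1, subject.size)
    x_between, x_within = split_between_within(x, subject)
    intercepts = rng.normal(0, 0.5, n_subjects)[subject]
    y = (beta_between * x_between + beta_within * x_within
         + intercepts + rng.normal(0, 0.5, subject.size))
    return x, y, subject


if __name__ == "__main__":
    x, y, subject = simulate()
    x_between, x_within = split_between_within(x, subject)
    print("single pooled slope:", ols_slopes(y, x))
    print("between, within slopes:", ols_slopes(y, x_between, x_within))
```

With these assumed values, the single pooled slope falls between the separately estimated between-subject and within-subject slopes, which is the averaging behaviour described above.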

This longitudinal model has a similar between-subject standardized effect size to the cross-sectional model (see ‘Estimation of the between-subject and within-subject effects’ in the Methods

section; Extended Data Fig. 6). In short, longitudinal designs can be detrimental to replicability when the between-subject and within-subject effects differ and the model is incorrectly

specified. OPTIMAL DESIGN CONSIDERATIONS With increasing evidence of small standardized effect sizes and low replicability in BWAS, optimizing study design to increase standardized effect

sizes and replicability is a critical prerequisite for progress1,26. Our results demonstrate that standardized effect size and replicability can be increased by enriched sampling of

participants with small and large values of the covariate of interest. This is well known in linear models in which the standardized effect size is explicitly a function of the standard

deviation of the covariate23. We showed that designing a study to have a larger covariate standard deviation increases standardized effect sizes by a median factor of 1.4, even when there is

non-linearity in the association, such as with age and GMV (Supplementary Fig. 1). When the association is very non-monotonic — as in the case of a U-shape relationship between covariate

and outcome — sampling the tails more heavily could decrease replicability and diminish our ability to detect non-linearities in the centre of the study population. In such a case, sampling

to obtain a uniform distribution of the covariate balances power across the range of the covariate and can increase replicability relative to random sampling when the covariate has a normal

distribution in the population. Increasing between-subject variability is beneficial in more than 72% of the association pairs that we studied, despite the presence of such non-linearities

(Extended Data Fig. 3). Because standardized effect sizes are dependent on study design, careful design choices can simultaneously increase standardized effect sizes and study replicability.
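As a rough illustration of why the sampling scheme matters, the following Python sketch resamples a simulated population under bell-shaped, uniform and U-shaped targets for a covariate and compares the average sample correlation. The simulated population, the target weight shapes, the number of bins and the function names (`scheme_weights`, `mean_abs_correlation`, `simulate_population`) are assumptions made for this example; they are not the data or the exact resampling procedure used in the analyses.

```python
import numpy as np


def scheme_weights(x, scheme, n_bins=10):
    """Per-individual sampling probabilities so that the sampled covariate
    approximately follows the target shape over equal-width bins of x.
    The target shapes below are illustrative assumptions."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.digitize(x, edges[1:-1])            # bin index in 0..n_bins-1
    centers = (np.arange(n_bins) + 0.5) / n_bins  # bin centres rescaled to (0, 1)
    if scheme == "bell":
        target = np.exp(-0.5 * ((centers - 0.5) / 0.2) ** 2)
    elif scheme == "uniform":
        target = np.ones(n_bins)
    elif scheme == "u":
        target = (centers - 0.5) ** 2 + 0.02
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    counts = np.bincount(bins, minlength=n_bins)
    weights = target[bins] / counts[bins]         # share each bin's mass among its members
    return weights / weights.sum()


def mean_abs_correlation(x, y, scheme, n=200, n_reps=200, seed=0):
    """Average |Pearson r| over repeated samples of size n drawn under a scheme."""
    rng = np.random.default_rng(seed)
    p = scheme_weights(x, scheme)
    r = []
    for _ in range(n_reps):
        idx = rng.choice(len(x), size=n, replace=False, p=p)
        r.append(abs(np.corrcoef(x[idx], y[idx])[0, 1]))
    return float(np.mean(r))


def simulate_population(size=20000, seed=1):
    """Toy population: a linear covariate-outcome association plus noise."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(45, 80, size)
    outcome = -0.5 * age + rng.normal(0, 10, size)
    return age, outcome


if __name__ == "__main__":
    age, outcome = simulate_population()
    for scheme in ("bell", "uniform", "u"):
        print(scheme, round(mean_abs_correlation(age, outcome, scheme), 3))
```

In this toy setting the association is linear, so the scheme with the largest covariate spread gives the largest average correlation; for strongly non-monotonic associations the ordering could differ, as noted above.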

Two-phase, extreme group and outcome-dependent sampling designs can inform which participants should be selected for imaging from a larger sample to increase the efficiency and standardized

effect sizes of brain–behaviour associations27,28,29,30,31,32,33. For example, given the high degree of accessibility of cognitive and behavioural testing (for example, to be performed

virtually or electronically), individuals scoring at the extremes on a testing scale or battery (‘phase I’) could be prioritized for subsequent brain scanning (‘phase II’). When there are

multiple covariates of interest, multivariate two-phase designs can be used to increase standardized effect sizes and replicability34. Multivariate designs are also needed to stratify

sampling to avoid confounding by other sociodemographic variables. Together, the use of optimal designs can increase both standardized effect sizes and replicability relative to a design

that uses random sampling31. If desired, weighted regression (such as inverse probability weighting) can be combined with optimized designs to estimate a standardized effect size that is

consistent with the standardized effect size if the study had been conducted in the full population34,35,36. Choosing highly reliable psychometric measurements or interventions (for example,

medications or neuromodulation within a clinical trial)37,38,39 may also be effective for increasing replicability. The decision to pursue an optimized design will depend on other practical

factors, such as the cost and complexity of acquiring other (non-imaging) measures of interest and the specific translational goals of the research. LONGITUDINAL DESIGN CONSIDERATIONS In

the meta-analysis, longitudinal studies of the total GMV–age associations have, on average, more than 380% larger standardized effect sizes than cross-sectional studies. However, in

subsequent analyses, we noticed that the benefit of conducting a longitudinal design is highly dependent on both the between-subject and the within-subject effects. When the between-subject

and the within-subject effects are equal and the within-subject brain measurement error is low, longitudinal studies offer larger standardized effect sizes than cross-sectional studies40

(section 5.1 in Supplementary Information). This combination of equal between-subject and within-subject effects and low within-subject measurement error is the reason that there is a

benefit of longitudinal design in the ADNI for the total GMV–age association (Supplementary Fig. 4). Comparing efficiency per measurement supports the approach of collecting two measurements

per participant in this scenario (section 5.1 in Supplementary Information). Longitudinal models offer the unique ability to separately estimate between-subject and within-subject effects.

When the between-subject and within-subject effects differ but we still fit them with a single effect, we mistakenly assume they are equal, and the interpretation of that coefficient becomes

complicated: the effect becomes a weighted average of the between-subject and within-subject effects whose weights are determined by the study design features (section 5 in Supplementary

Information). The apparent lack of benefit of longitudinal designs in the ABCD on the study of GMV associations is because within-subject changes in the non-brain measures are not associated

with within-subject changes in the GMV (Fig. 5e and Supplementary Table 13). The smaller standardized effect sizes that we found in longitudinal analyses are due to the contribution from

the smaller within-subject effect to the weighted average of the between-subject and within-subject effects (equation (14) in section 5 in Supplementary Information). Fitting the

between-subject and within-subject effects separately prevents averaging the two effects (section 5.2 in Supplementary Information). These two effects are often not directly comparable with

the effect obtained from a cross-sectional model because they have different interpretations41,42,43,44,45 (section 5.2 in Supplementary Information). Using sampling strategies to increase

between-subject and within-subject variability of the covariate will increase the standardized effect sizes for between-subject and within-subject associations, respectively (Extended Data

Fig. 5). DESIGN AND ANALYSIS RECOMMENDATIONS Although it is difficult to provide universal recommendations for study design and analysis, the present study provides general guidelines for

designing and analysing BWAS for optimal standardized effect sizes and replicability based on both empirical and theoretical results (Extended Data Figs. 7 and 8). Although the decision for

a particular design or analysis strategy may depend on unknown features of the brain and non-brain measures and their associations, these characteristics can be evaluated in pilot data or

the analysis dataset (Supplementary Fig. 4 and section 5.2 in Supplementary Information). One general principle that increases standardized effect sizes for most associations is to increase

the covariate standard deviation (for example, through two-phase, extreme group and outcome-dependent sampling), which is practically applicable to a wide range of BWAS contexts.
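The role of the covariate standard deviation can be made explicit in the simplest case. As an illustrative assumption (not the model used in the analyses), take a linear association y = βx + ε, where the covariate x has standard deviation σ_x, the noise ε is independent of x with variance σ_ε² > 0, and β ≠ 0. The Pearson correlation and its sensitivity to σ_x are then:

```latex
\rho_{xy} = \frac{\beta\,\sigma_x}{\sqrt{\beta^{2}\sigma_x^{2} + \sigma_\varepsilon^{2}}},
\qquad
\frac{\partial\,\lvert\rho_{xy}\rvert}{\partial\,\sigma_x}
  = \frac{\lvert\beta\rvert\,\sigma_\varepsilon^{2}}
         {\left(\beta^{2}\sigma_x^{2} + \sigma_\varepsilon^{2}\right)^{3/2}} > 0 .
```

Under these assumptions the underlying slope β is unchanged by the design, while the standardized effect size grows with σ_x and approaches 1 in magnitude as σ_x becomes large relative to σ_ε.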

Longitudinal designs can be helpful and optimal even when the between-subject and within-subject effects differ, if modelled correctly. Moreover, longitudinal BWAS enable us to study

between-subject and within-subject effects separately, and they should be used when the two effects are hypothesized to be different. Although striving for large sample sizes remains

important when designing a study, our findings emphasize the importance of considering other design features to improve standardized effect sizes and replicability of BWAS. METHODS LBCC

DATASET AND PROCESSING The original LBCC dataset included 123,984 MRI scans from 101,457 human participants across more than 100 studies (which include multiple publicly available

datasets46,47,48,49,50,51,52,53,54,55,56) and was described in previous work4 (see Supplementary Information and supplementary table S1 from ref. 4). We filtered to the subset of cognitively

normal participants whose data were processed using FreeSurfer (v6.1). Studies were curated for the analysis by excluding duplicated observations and studies with fewer than 4 unique age

points, sample size less than 20 and/or only participants of one sex. If there were fewer than three participants having longitudinal observations, only the baseline observations were

included and the study was considered cross-sectional. If a participant had changing demographic information during the longitudinal follow-up (for example, changing biological sex), only

the most recent observation was included. We updated the LBCC dataset with the ABCD release 5, resulting in a final dataset that includes 77,695 MRI scans from 60,900 cognitively normal

participants with available total GMV, sGMV and WMV measures across 63 studies (Supplementary Table 1). In this dataset, 74,148 MRI scans from 57,538 participants across 43 studies have

complete-case regional brain measures (regional GMV, regional surface area and regional cortical thickness, based on Desikan–Killiany parcellation20; Supplementary Table 2). The global brain

measure mean cortical thickness was derived using the regional brain measures (see below). STRUCTURAL BRAIN MEASURES Details of data processing have been described in our previous work4. In

brief, total GMV, sGMV and WMV were estimated from T1-weighted and T2-weighted (when available) MRIs using the ‘aseg’ output from FreeSurfer (v6.0.1). All three cerebrum tissue volumes were

extracted from the aseg.stats files output by the recon-all process: ‘Total cortical gray matter volume’ for GMV; ‘Total cerebral white matter volume’ for WMV; and ‘Subcortical gray matter

volume’ for sGMV (inclusive of the thalamus, caudate nucleus, putamen, pallidum, hippocampus, amygdala and nucleus accumbens area; https://freesurfer.net/fswiki/SubcorticalSegmentation).

Regional GMV and cortical thickness across 68 regions (34 per hemisphere, based on Desikan–Killiany parcellation20) were obtained from the aparc.stats files output by the recon-all process.

Mean cortical thickness across the whole brain is the weighted average of the regional cortical thickness weighted by the corresponding regional surface areas. PREPROCESSING SPECIFIC TO ABCD

FUNCTIONAL CONNECTIVITY MEASURES Longitudinal functional connectivity measures were obtained from the ABCD-BIDS community collection, which houses a community-shared and continually updated

ABCD neuroimaging dataset available under Brain Imaging Data Structure (BIDS) standards. The data used in these analyses were processed using the abcd-hcp-pipeline (v0.1.3), an updated

version of The Human Connectome Project MRI pipeline57. In brief, resting-state functional MRI time series were demeaned and detrended, and a generalized linear model was used to regress out

mean white matter, cerebrospinal fluid and global signal, as well as motion variables; the time series were then band-pass filtered. High-motion frames (filtered frame displacement > 0.2 mm) were censored

during the demeaning and detrending. After preprocessing, the time series were parcellated using the 352 regions of the Gordon atlas (including 19 subcortical structures) and pairwise

Pearson correlations were computed among the regions. Functional connectivity measures were estimated from resting-state fMRI time series using a minimum of 5 min of data. After Fisher’s

_z_-transformation, the connectivities were averaged across the 24 canonical functional networks25, forming 276 inter-network connectivities and 24 intra-network connectivities. COGNITIVE

AND OTHER COVARIATES The ABCD dataset is a large-scale repository aiming to track the brain and psychological development of over 10,000 children 9–16 years of age by measuring hundreds of

variables, including demographic, physical, cognitive and mental health variables58. We used release 5 of the ABCD study to examine the effect of the sampling schemes on other types of

covariates including cognition (fully corrected _T_-scores of the individual subscales and total composite scores of the NIH Toolbox24), mental health (total problem CBCL syndrome scale) and

other common demographic variables (BMI, birth weight and handedness). For each of the covariates, we evaluated the effect of the sampling schemes on their associations with the global and

regional structural brain measures and functional connectivity after controlling for non-linear age and sex (and for functional connectivity outcomes only, mean frame displacement). For the

analyses of structural brain measures, there were three non-brain covariates with fewer than 5% non-missing follow-ups at both 2-year and 4-year follow-ups (that is, the Dimensional Change

Card Sort Test, Cognition Fluid Composite and Cognition Total Composite Score; Supplementary Table 11), and only their baseline cognitive measurements were included in the analyses. For the

remaining 11 variables (that is, the Picture Vocabulary Test, Flanker Inhibitory Control and Attention Test, List Sorting Working Memory Test, Pattern Comparison Processing Speed Test,

Picture Sequence Memory Test, Oral Reading Recognition Test, Crystallized Composite, CBCL, birth weight, BMI and handedness), all of the available baseline, 2-year and 4-year follow-up

observations were used. For the analyses of the functional connectivity, only the baseline observations for the List Sorting Working Memory Test were used due to missingness (Supplementary

Table 12). The records with BMI lying outside the lower and upper 1% quantiles (that is, BMI < 13.5 or BMI > 36.9) were considered misinput and replaced with missing values. The

variable handedness was imputed using the last observation carried forwards. STATISTICAL ANALYSIS REMOVAL OF SITE EFFECTS For multisite or multistudy neuroimaging studies, it is necessary to

control for potential heterogeneity between sites to obtain unconfounded and generalizable results. Before estimating the main effects of age and sex on the global and regional brain

measures (total GMV, total WMV, total sGMV, mean cortical thickness, regional GMV and regional cortical thickness), we applied ComBat21 and LongComBat22 in cross-sectional datasets and

longitudinal datasets, respectively, to remove the potential site effects. The ComBat algorithm involves several steps including data standardization, site-effect estimation, empirical

Bayesian adjustment, removing estimated site effects and data rescaling. In the analysis of cross-sectional datasets, the models for ComBat were specified as a linear regression model

illustrated below using total GMV: $${\rm{GMV\; \sim \; ns(age,\; d.f.\; =\; 2)\; +\; sex\; +\; site}},$$ where ns denotes natural cubic splines on 2 d.f., which means that there were two

boundary knots and one internal knot placed at the median of the covariate age. Splines were used to accommodate non-linearity in the age effect. For the longitudinal datasets, the model for

LongComBat used a linear mixed effects model with participant-specific random intercepts: $${\rm{GMV\; \sim \; (1| participant)\; +\; ns(age,\; d.f.\; =\; 2)\; +\; sex\; +\; site}}.$$ When

estimating the effects of other non-brain covariates in the ABCD dataset, ComBat was used to control for site effects separately for each of the cross-sectional covariates. The ComBat

models were specified as illustrated below using GMV: $${\rm{GMV\; \sim \; ns(age,\; d.f.\; =\; 2)\; +\; sex}}+x+{\rm{site,}}$$ where _x_ denotes the non-brain covariate. LongComBat was used

for each of the longitudinal covariates with a linear mixed effects model with participant-specific random intercepts only: $${\rm{GMV\; \sim \; (1| participant)\; +\; ns(age,\; d.f.\; =\; 2)\; +\; sex}}+x+{\rm{site.}}$$

When estimating the effects of other covariates on the functional connectivity (FC) in the ABCD data, we additionally controlled for the mean frame

displacement (FD) of the frames remaining after scrubbing. The LongComBat models were specified as: $$\begin{array}{l}\text{FC}\, \sim \,(1| \text{participant})+\text{ns}(\text{age, d.f.}=2)\,+\,\text{sex}\\ \,+\,\text{ns}(\text{mean}\_\text{FD, d.f.}=3)+x+\text{site}.\end{array}$$

ComBat and LongComBat were implemented using the neuroCombat59 and longCombat60 R

packages. Site effects were removed before all subsequent analyses including the bootstrap analyses described below. RESI FOR ASSOCIATION STRENGTH The RESI is a recently developed

standardized effect size index that has consistent interpretation across many model types, encompassing all types of test statistics in most regression models17,18. In brief, the RESI is a

standardized effect size parameter describing the deviation of the true parameter value (or values) $\beta $ from the reference value (or values) ${\beta }_{0}$ from the statistical null

hypothesis ${H}_{0}:\,\beta ={\beta }_{0}$, $$S=\sqrt{{(\beta -{\beta }_{0})}^{T}{{\varSigma }_{\beta }}^{-1}(\beta -{\beta }_{0})},$$ where _S_ denotes the parameter RESI, $\beta $ and

${\beta }_{0}$ can be vectors, _T_ denotes the transpose of a matrix, ${\varSigma }_{\beta }$ is the covariance matrix for $\sqrt{N}\hat{\beta }$ (where $\hat{\beta }$ is the

estimator for $\beta $, _N_ is the number of participants; section 3 in Supplementary Information). In previous work, we defined a consistent estimator for RESI17, $$\hat{S}={\left(\max \left\{0,\frac{{T}^{2}-m}{N}\right\}\right)}^{1/2},$$

where ${T}^{2}$ is the chi-squared test statistic ${T}^{2}=N{(\hat{\beta }-{\beta }_{0})}^{T}{{\hat{\varSigma }}_{\beta }}^{-1}(\hat{\beta }-{\beta }_{0})$ for testing the null hypothesis ${H}_{0}:\,\beta ={\beta }_{0}$ (with ${\hat{\varSigma }}_{\beta }$ an estimate of ${\varSigma }_{\beta }$), $m$ is the number of parameters being tested (that is, the length of $\beta $) and $N$ is the number of

participants. As RESI is generally applicable across different models and data types, it is also applicable to the situation where Cohen’s _d_ was defined. In this scenario, the RESI is

equal to ½ Cohen’s _d_17, so Cohen’s suggested thresholds for effect size can be adopted for RESI: small (RESI = 0.1), medium (RESI = 0.25) and large (RESI = 0.4). Because RESI is robust,

when the assumptions of Cohen’s _d_ are not satisfied, such as when the variances between the groups are not equal, RESI is still a consistent estimator, but Cohen’s _d_ is not. The

confidence intervals for RESI in our analyses were constructed using 1,000 non-parametric bootstraps18. The systematic difference in the standardized effect sizes between cross-sectional and

longitudinal studies puts extra challenges on the comparison and aggregation of standardized effect size estimates across studies with different designs. To improve the comparability of

standardized effect sizes between cross-sectional and longitudinal studies, we proposed a new effect size index: the cross-sectional RESI (CS-RESI) for longitudinal datasets. The CS-RESI for

longitudinal datasets represents the RESI in the same study population if the longitudinal study had been conducted cross-sectionally. Detailed definition, point estimator and confidence

interval construction procedure for CS-RESI can be found in section 3 in Supplementary Information. Comprehensive statistical simulation studies were also performed to demonstrate the valid

performance of the proposed estimator and confidence interval for CS-RESI (section 3.2 in Supplementary Information). With CS-RESI, we can quantify the benefit of using a longitudinal study

design in a single dataset (section 3.3 in Supplementary Information). STUDY-LEVEL MODELS After removing the site effects using ComBat or LongComBat in the multisite data, we estimated the

effects of age and sex on each of the global or regional brain measures using GEEs and linear regression models in the longitudinal datasets and cross-sectional datasets, respectively. The

mean model was specified as below after ComBat or LongComBat: $${y}_{ij} \sim {\rm{ns}}({{\rm{age}}}_{ij},{\rm{d.f.}}=2)+{{\rm{sex}}}_{i},$$ where ${y}_{ij}$ was taken to be a global brain

measure (that is, total GMV, WMV, sGMV or mean cortical thickness) or regional brain measure (that is, regional GMV or cortical thickness) at the _j_-th visit from the participant _i_ and

_j_ = 1 for cross-sectional datasets. The age effect was estimated with natural cubic splines with 2 d.f., which means that there were two boundary knots and one internal knot placed at the

median of the covariate age. For the GEEs, we used an exchangeable working correlation structure and an identity link function. The model assumes that the mean is correctly

specified, but makes no assumption about the error distribution. The GEEs were fitted with the ‘geepack’ package61 in R. We used the RESI as a standardized effect size measure. RESIs and

confidence intervals were computed using the ‘RESI’ R package (v1.2.0)19. META-ANALYSIS OF THE AGE AND SEX EFFECTS The weighted linear regression model for the meta-analysis of age effects

across the studies was specified as:

$${\widehat{S}}_{{\rm{age}},k}={{\rm{design}}}_{k}+{\rm{ns}}[{\rm{mean}}{({\rm{age}})}_{k},3]+{\rm{ns}}[{\rm{s.d.}}{({\rm{age}})}_{k},3]+{\rm{ns}}[{\rm{skew}}{({\rm{age}})}_{k},3]+{{\epsilon }}_{k},$$

where ${\widehat{S}}_{{\rm{age}},k}$ denotes the estimated RESI of age for study _k_, and the weights were the inverse of the standard error of each RESI estimate. The sample mean,

standard deviation (s.d.) and skewness of the age were included as non-linear terms estimated using natural splines with 3 d.f. (that is, two boundary knots plus two internal knots at the

33rd and 66th percentiles of the covariates), and a binary variable indicating the design type (cross-sectional or longitudinal) was also included. The weighted linear regression model for

the meta-analysis of sex effects across the studies was specified as

$${\widehat{S}}_{{\rm{sex}},k}={{\rm{design}}}_{k}+{\rm{ns}}[{\rm{mean}}{({\rm{age}})}_{k},3]+{\rm{ns}}[{\rm{s.d.}}{({\rm{age}})}_{k},3]+{\rm{ns}}[{\rm{pr}}{({\rm{male}})}_{k},3]+{{\epsilon }}_{k},$$

where ${\widehat{S}}_{{\rm{sex}},k}$ denotes the estimated RESI of sex for study _k_, and the weights were the inverse of the standard error of each RESI estimate. The sample

mean, standard deviation of the age covariate and the proportion of males in each study were included as non-linear terms estimated using natural splines with 3 d.f., and a binary variable

indicating the design type (cross-sectional or longitudinal) was also included. These meta-analyses were performed for each of the global and regional brain measures. Inferences were

performed using robust standard errors62. In the partial regression plots, the expected standardized effect sizes for the age effects were estimated from the meta-analysis model after fixing

mean age at 45 years, standard deviation of age at 7 years and/or skewness at 0; the expected standardized effect sizes for the sex effects were estimated from the meta-analysis model after

fixing mean age at 45 years, standard deviation of age at 7 years and/or proportion of males at 0.5. SAMPLING SCHEMES FOR AGE IN THE UKB AND ADNI We used bootstrapping to evaluate the

effect of different sampling schemes with different target sample covariate distributions on the standardized effect sizes and replicability in the cross-sectional UKB and longitudinal ADNI

datasets. For a given sample size and sampling scheme, 1,000 bootstrap replicates were drawn. The standardized effect size was estimated as the mean standardized effect size (that is,

RESI) across the bootstrap replicates. The 95% confidence interval for the standardized effect size was estimated using the lower and upper 2.5% quantiles across the 1,000 estimates of the

standardized effect size in the bootstrap replicates. Power was calculated as the proportion of bootstrap replicates producing _P_ values less than or equal to 5% for those associations that

were significant at 0.05 in the full sample. In the UKB, only one region was not significant for age in each of GMV and cortical thickness, and in the ADNI, only one and four regions were

not significant for age in GMV and cortical thickness, respectively. Replicability in previous work has been defined as having a significant _P_ value and the same sign for the regression

coefficient. Because we were fitting non-linear effects, we defined replicability as the probability that two independent studies have significant _P_ values; this is equivalent to the

definition of power squared. The 95% confidence intervals for replicability were derived using Wilson’s method63. In the UKB dataset, to modify the (between-subject) variability of the age

variable, we used the following three target sampling distributions (Extended Data Fig. 1a): bell shaped, where the target distribution had most of the participants distributed in the middle

age range; uniform, where the target distribution had participants equally distributed across the age range; and U shaped, where the target distribution had most of the participants

distributed closer to the range limits of the age in the study. The samples with U-shaped age distribution had the largest sample variance of age, followed by the samples with uniform age

distribution and the samples with bell-shaped age distribution. The bell-shaped and U-shaped functions were proportional to a quadratic function. To sample according to these distributions,

each record was first inversely weighted by the frequency of the records with age falling in the range of ±0.5 years of the age for that record to achieve the uniform sampling distribution.
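The inverse-frequency step just described can be illustrated with a minimal Python sketch. This is not the authors' implementation (their analysis code is in R); the function names `uniform_target_weights` and `weighted_bootstrap`, the toy ages and the use of closed ±0.5-year windows are assumptions made for illustration only.

```python
import random
from bisect import bisect_left, bisect_right


def uniform_target_weights(ages, half_width=0.5, lower=None, upper=None):
    """Return normalized sampling weights that favour sparsely populated ages.

    Each record is weighted by 1 / (number of records whose age lies within
    +/- half_width years of its own age). Optional lower/upper bounds
    winsorize ages for the purpose of weight assignment only.
    """
    clipped = list(ages)
    if lower is not None:
        clipped = [max(a, lower) for a in clipped]
    if upper is not None:
        clipped = [min(a, upper) for a in clipped]

    ordered = sorted(clipped)
    weights = []
    for a in clipped:
        # Count records in the closed window [a - half_width, a + half_width].
        # Closed boundaries are an assumption; the count is always >= 1
        # because the record itself falls inside its own window.
        n_near = bisect_right(ordered, a + half_width) - bisect_left(ordered, a - half_width)
        weights.append(1.0 / n_near)

    total = sum(weights)
    return [w / total for w in weights]


def weighted_bootstrap(records, weights, n, seed=None):
    """Draw n records with replacement, with probability proportional to weights."""
    rng = random.Random(seed)
    return rng.choices(records, weights=weights, k=n)


# Hypothetical toy data: three clustered records and one isolated record.
ages = [60.0, 60.2, 60.4, 70.0]
w = uniform_target_weights(ages)  # the isolated record gets the largest weight
sample = weighted_bootstrap(ages, w, n=1000, seed=1)
```

In this toy example, each of the three clustered records has three records inside its window and the isolated record has one, so the clustered age range and the isolated age are each drawn with total probability one half, which is the sense in which the weights flatten the sampled age distribution.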

Each record was then rescaled to derive the weights for bell-shaped and U-shaped sampling distributions. The records with age < 50 or age > 78 years were winsorized at 50 or 78 years, respectively,

when assigning weights to limit the effects of outliers on the weight assignment, but the actual age values were used when analysing each bootstrapped dataset. In each bootstrap

from the ADNI dataset, each participant was sampled to have two records. We modified the between-subject and within-subject variability of age, respectively, by making the ‘baseline age’

follow one of the three target sampling distributions used for the UKB dataset and the ‘age change’ independently follow one of three new distributions: decreasing, uniform and increasing

(Extended Data Fig. 1b,c). The increasing and decreasing functions were proportional to an exponential function. The samples with increasing distribution of age change had the largest

within-subject variability of age, followed by the samples with the uniform distribution of age change and the samples with decreasing distribution of age change. To modify the baseline age

and the age change from baseline independently, we first created all combinations of the baseline record and one follow-up from each participant, and derived the baseline age and age change

for each combination. The ‘bivariate’ frequency of each combination was obtained as the number of combinations with values of baseline age and age change falling in the range of ±0.5 years

of the values of baseline age and age change for this combination. Then, each combination was inversely weighted by its bivariate frequency to target a uniform bivariate distribution of

baseline age and age change. The weight for each combination was then rescaled to make the baseline age and age change follow different sampling distributions independently. The combinations

with baseline age < 65 or age > 85 years were winsorized at 65 or 85 years, and the combinations with age change greater than 5 years were winsorized at 5 years when assigning weights

to limit the effects of outliers on the weight assignment, but the actual ages were used when analysing each bootstrapped dataset. The sampling methods could be easily extended to the scenario

in which each participant had three records (and more than three) in the bootstrap data by making combinations of the baseline and two follow-ups. Each combination was inversely weighted to

achieve uniform baseline age and age change distributions, respectively, by the ‘trivariate’ frequency of the combinations with baseline age and the two age changes from baseline for the

two follow-ups falling into the range of ±0.5 years of the corresponding values for this combination. As we only investigated the effect of modifying the number of measurements per

participant under uniform between-subject and within-subject sampling schemes (Fig. 3c,d), we did not need to consider rescaling the weights here to achieve other sampling distributions, although

this could be done similarly. For the scenario in which each participant only had one measurement (Fig. 3c,d), the standardized effect sizes and replicability were estimated only using the

baseline measurements. All site effects were removed using ComBat or LongComBat before performing the bootstrap analysis. SAMPLING SCHEMES FOR OTHER NON-BRAIN COVARIATES IN THE ABCD We used

bootstrapping to study how different sampling strategies affect the RESI in the ABCD dataset. Each participant in the bootstrap data had two measurements. We applied the same weight

assignment method described above for the ADNI dataset to modify the between-subject and within-subject variability of a covariate. We made the baseline covariate and the change in covariate

follow bell-shaped and/or U-shaped distributions, to let the sample have larger or smaller between-subject and/or within-subject variability of the covariate, respectively. The baseline

covariate and change of covariate were winsorized at the upper and lower 5% quantiles to limit the effect of outliers on sampling. For each cognitive variable, only the participants with

non-missing baseline measurements and at least one non-missing follow-up were included. Generalized linear models and GEEs were fitted to estimate the effect of each non-brain covariate on

the structural brain measures after controlling for age and sex, $${\rm{GMV}} \sim {\rm{ns}}({\rm{age,\; d.f.}}=2)+{\rm{sex}}+x,$$ where _x_ denotes one of the non-brain covariates. For the

GEEs, we used an exchangeable working correlation structure and an identity link function. Only the between-subject sampling schemes were applied for the non-brain

covariates that were stable over time (for example, birth weight and handedness). In other words, the participants were sampled based on their baseline covariate values, and then a follow-up

was selected randomly for each participant. The sampling schemes to increase the between-subject variability in the covariate handedness, which was a binary variable (right-handed or not),

were specified differently. The expected proportion of right-handed participants in the bootstrap samples was 50% under the sampling scheme with larger between-subject variability and 10%

under the sampling scheme with smaller between-subject variability. For given between-subject and/or within-subject sampling schemes, we obtained 1,000 bootstrap replicates. The standardized

effect size was estimated as the mean standardized effect size across the bootstrap replicates. The 95% confidence intervals for standardized effect size were estimated using the lower and

upper 2.5% quantiles across the 1,000 estimates of the standardized effect size in the bootstrap replicates. The sample sizes needed for 80% replicability were estimated based on the (mean)

estimated standardized effect size and _F_-distribution (see below). ANALYSIS OF FUNCTIONAL CONNECTIVITY IN THE ABCD In a subset of the ABCD in which we have preprocessed longitudinal

functional connectivity data at two time points (baseline and 2-year follow-up), we restricted our analysis to the participants with non-missing measurements at both time

points. In the GEEs used to estimate the effects of non-brain covariates on functional connectivity, the mean model was specified as below after LongComBat: $${y}_{ij} \sim {\rm{ns}}({{\rm{age}}}_{ij},{\rm{d.f.}}=2)+{{\rm{sex}}}_{i}+{\rm{ns}}({\rm{mean}}\_{\rm{FD}},\mathrm{d.f.}=3)+x,$$

where ${y}_{ij}$ was taken to be a functional connectivity outcome, and _x_ denotes a non-brain covariate. The mean frame-wise displacement (mean_FD) was also included as a covariate with natural cubic splines with 3 d.f. We used an exchangeable

working correlation structure and an identity link function in the GEEs. The frame count of each scan was used as the weights. When evaluating the effect of different sampling schemes

on the standardized effect sizes, we obtained 1,000 bootstrap replicates for given between-subject and/or within-subject sampling schemes. The standardized effect size was estimated as the

mean standardized effect size across the bootstrap replicates. Confidence intervals were computed as described above. The sample sizes needed for 80% replicability were estimated based on

the (mean) estimated standardized effect sizes and _F_-distribution (see below). SAMPLE SIZE CALCULATION FOR A TARGET POWER OR REPLICABILITY WITH A GIVEN STANDARDIZED EFFECT SIZE After

estimating the standardized effect size for an association, the calculation of the corresponding sample size _N_ needed for detecting this association with _γ_ × 100% power at significance

level of _α_ was based on an _F_-distribution. Here, d.f. denotes the total degrees of freedom of the analysis model and $F(z;\lambda )$ denotes the cumulative distribution function for a random variable

_z_, which follows the (non-central) _F_-distribution with degrees of freedom being 1 and _N_ − d.f. and non-centrality parameter $\lambda $. The corresponding sample size _N_ is:

$$N=\{N:F({F}^{-1}(1-\alpha )\,;\lambda =N{\hat{S}}^{2})=1-\gamma \},$$ where ${F}^{-1}(1-\alpha )$ is the $(1-\alpha )$ quantile of the corresponding central _F_-distribution, so that the probability of exceeding it (the power) is $\gamma $, and $\hat{S}$ is the estimated RESI for the standardized effect size. Power curves for the RESI are given in figure

3 of Vandekar et al.17. Replicability was defined as the probability that two independent studies have significant _P_ values, which is equivalent to power squared. ESTIMATION OF THE

BETWEEN-SUBJECT AND WITHIN-SUBJECT EFFECTS For the non-brain covariates that were analysed longitudinally in the ABCD dataset, GEEs with exchangeable correlation structures were fitted to

estimate their cross-sectional and longitudinal effects on structural and functional brain measures after controlling for age and sex, respectively. The mean model was specified as

illustrated with GMV: $${\rm{GMV\; \sim \; ns(age,\; d.f.\; =\; 2)\; +\; sex}}+X\_{\rm{bl}}+X\_{\rm{change,}}$$ where _X_\_bl denotes the participant-specific baseline covariate values, and

_X_\_change denotes the difference between the covariate value at each visit and the participant-specific baseline covariate value (see section 5.2 in Supplementary Information).

Participants without baseline measures were not included in the modelling. The model coefficients for the terms _X_\_bl and _X_\_change represent the between-subject and within-subject effects

of this non-brain covariate on total GMV, respectively. For the functional connectivity data, the same covariates and weighting were used as described above. Using the first time point as

the between-subject term was a special case that ensured that the parameter from the baseline cross-sectional model was equal to the parameter for the between-subject effect in

the longitudinal model. In this model, the between-subject variance was defined as the variance of the baseline measurement, and the within-subject variance was the mean square of

_X_\_change. This model specification ensured that the sampling schemes affected the between-subject and within-subject variances independently (equation (16) in section 5.2 in

Supplementary Information). REPORTING SUMMARY Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. DATA AVAILABILITY

Participant-level data from many datasets are available according to study-level data access rules. Study-level model parameters are available at https://github.com/KaidiK/RESI_BWAS. We

acknowledge the usage of several openly shared MRI datasets, which are available at the respective consortia websites and are subject to the sharing policies of each consortium: OpenNeuro

(https://openneuro.org/), UKB (https://www.ukbiobank.ac.uk/), ABCD (https://abcdstudy.org/), the Laboratory of NeuroImaging (https://loni.usc.edu/), data made available through the Open

Science Framework (https://osf.io/), the Human Connectome Project (http://www.humanconnectomeproject.org/) and the OpenPain project (https://www.openpain.org). The ABCD data repository grows

and changes over time. The ABCD data used in this paper are from the NIMH Data Archive (https://doi.org/10.15154/1503209) and the ABCD BIDS Community Collection (ABCC;

https://collection3165.readthedocs.io). Data used in this article were provided by the Brain Consortium for Reliability, Reproducibility and Replicability (3R-BRAIN)

(https://github.com/zuoxinian/3R-BRAIN). Data used in the preparation of this article were obtained from the Australian Imaging Biomarkers and Lifestyle (AIBL) flagship study of ageing

funded by the Commonwealth Scientific and Industrial Research Organisation (CSIRO), which was made available at the ADNI database

(https://adni.loni.usc.edu/aibl-australian-imaging-biomarkers-and-lifestyle-study-of-ageing-18-month-data-now-released/). The AIBL researchers contributed data but did not participate in

analysis or writing of this report. AIBL researchers are listed at https://www.aibl.csiro.au. Data used in preparation of this article were obtained from the ADNI database

(https://adni.loni.usc.edu/). The investigators within the ADNI contributed to the design and implementation of the ADNI and/or provided data but did not participate in analysis or writing

of this report. A complete listing of ADNI investigators can be found at https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. More information on the

ARWIBO Consortium can be found at https://www.arwibo.it/. More information on CALM team members can be found at https://calm.mrc-cbu.cam.ac.uk/team/ and in the Supplementary Information.

Data used in this article were obtained from the developmental component ‘Growing Up in China’ of the Chinese Color Nest Project (http://deepneuro.bnu.edu.cn/?p=163). Data used in the

preparation of this article were obtained from the Consortium on Vulnerability to Externalizing Disorders and Addictions (c-VEDA), India (https://cveda-project.org/). Data used in the

preparation of this article were obtained from the Harvard Aging Brain Study (HABS P01AG036694; https://habs.mgh.harvard.edu). Data used in the preparation of this article were obtained from

the IMAGEN Consortium (https://imagen-europe.com/). The POND Network (https://pond-network.ca/) is a Canadian translational network in neurodevelopmental disorders, primarily funded by the

Ontario Brain Institute. The LBCC dataset used in the preparation of this article includes data obtained from the ADNI database (https://adni.loni.usc.edu). The ADNI was launched in 2003 as

a public–private partnership, led by Principal Investigator M. W. Weiner. The primary goal of the ADNI has been to test whether serial MRI, positron emission tomography, other biological

markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer’s disease. Its data collection and sharing

for this project were funded by the ADNI (National Institutes of Health grant U01 AG024904) and Department of Defense ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is

funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through contributions from the following: AbbVie; Alzheimer’s Association;

Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica; Biogen; Bristol-Myers Squibb; CereSpir; Cogstate; Eisai; Elan Pharmaceuticals; Eli Lilly and Company; EuroImmun; F.

Hoffmann-La Roche and its affiliated company Genentech; Fujirebio; GE Healthcare; IXICO; Janssen Alzheimer Immunotherapy Research & Development; Johnson & Johnson Pharmaceutical

Research & Development; Lumosity; Lundbeck; Merck & Co.; Meso Scale Diagnostics; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer; Piramal

Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research are providing funds to support ADNI clinical sites in Canada. Private

sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and

Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for

NeuroImaging at the University of Southern California. CODE AVAILABILITY All code used to produce the analyses presented in this study is available at https://github.com/KaidiK/RESI_BWAS.

REFERENCES
* Marek, S. et al. Reproducible brain-wide association studies require thousands of individuals. _Nature_ 603, 654–660 (2022).
* Owens, M. M. et al. Recalibrating expectations about effect size: a multi-method survey of effect sizes in the ABCD study. _PLoS ONE_ 16, e0257535 (2021).
* Spisak, T., Bingel, U. & Wager, T. D. Multivariate BWAS can be replicable with moderate sample sizes. _Nature_ 615, E4–E7 (2023).
* Bethlehem, R. A. I. et al. Brain charts for the human lifespan. _Nature_ 604, 525–533 (2022).
* Nosek, B. A. et al. Replicability, robustness, and reproducibility in psychological science. _Annu. Rev. Psychol._ 73, 719–748 (2022).
* Patil, P., Peng, R. D. & Leek, J. T. What should we expect when we replicate? A statistical view of replicability in psychological science. _Perspect. Psychol. Sci. J. Assoc. Psychol. Sci._ 11, 539–544 (2016).
* Liu, S., Abdellaoui, A., Verweij, K. J. H. & van Wingen, G. A. Replicable brain–phenotype associations require large-scale neuroimaging data. _Nat. Hum. Behav._ 7, 1344–1356 (2023).
* Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. _Nat. Rev. Neurosci._ 14, 365–376 (2013).
* Szucs, D. & Ioannidis, J. P. Sample size evolution in neuroimaging research: an evaluation of highly-cited studies (1990–2012) and of latest practices (2017–2018) in high-impact journals. _NeuroImage_ 221, 117164 (2020).
* Reddan, M. C., Lindquist, M. A. & Wager, T. D. Effect size estimation in neuroimaging. _JAMA Psychiatry_ 74, 207–208 (2017).
* Vul, E. & Pashler, H. Voodoo and circularity errors. _NeuroImage_ 62, 945–948 (2012).
* Vul, E., Harris, C., Winkielman, P. & Pashler, H. Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. _Perspect. Psychol. Sci. J. Assoc. Psychol. Sci._ 4, 274–290 (2009).
* Nee, D. E. fMRI replicability depends upon sufficient individual-level data. _Commun. Biol._ 2, 130 (2019).
* Smith, P. L. & Little, D. R. Small is beautiful: in defense of the small-N design. _Psychon. Bull. Rev._ 25, 2083–2101 (2018).
* Klapwijk, E. T., van den Bos, W., Tamnes, C. K., Raschle, N. M. & Mills, K. L. Opportunities for increased reproducibility and replicability of developmental neuroimaging. _Dev. Cogn. Neurosci._ 47, 100902 (2020).
* Lawless, J. F., Kalbfleisch, J. D. & Wild, C. J. Semiparametric methods for response-selective and missing data problems in regression. _J. R. Stat. Soc. Ser. B Stat. Methodol._ 61, 413–438 (1999).
* Vandekar, S., Tao, R. & Blume, J. A robust effect size index. _Psychometrika_ 85, 232–246 (2020).
* Kang, K. et al. Accurate confidence and Bayesian interval estimation for non-centrality parameters and effect size indices. _Psychometrika_ https://doi.org/10.1007/s11336-022-09899-x (2023).
* Jones, M., Kang, K. & Vandekar, S. RESI: an R package for robust effect sizes. Preprint at https://arxiv.org/abs/2302.12345 (2023).
* Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. _NeuroImage_ 31, 968–980 (2006).
* Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. _Biostatistics_ 8, 118–127 (2007).
* Beer, J. C. et al. Longitudinal ComBat: a method for harmonizing longitudinal multi-scanner imaging data. _NeuroImage_ 220, 117129 (2020).
* Boos, D. D. & Stefanski, L. A. _Essential Statistical Inference: Theory and Methods_ (Springer-Verlag, 2013).
* Carlozzi, N. E. et al. Construct validity of the NIH Toolbox cognition battery in individuals with stroke. _Rehabil. Psychol._ 62, 443–454 (2017).
* Gordon, E. M. et al. Generation and evaluation of a cortical area parcellation from resting-state correlations. _Cereb. Cortex_ 26, 288–303 (2016).
* Noble, S., Mejia, A. F., Zalesky, A. & Scheinost, D. Improving power in functional magnetic resonance imaging by moving beyond cluster-level inference. _Proc. Natl Acad. Sci. USA_ 119, e2203020119 (2022).
* Tao, R., Zeng, D. & Lin, D.-Y. Optimal designs of two-phase studies. _J. Am. Stat. Assoc._ 115, 1946–1959 (2020).
* Schildcrout, J. S., Garbett, S. P. & Heagerty, P. J. Outcome vector dependent sampling with longitudinal continuous response data: stratified sampling based on summary statistics. _Biometrics_ 69, 405–416 (2013).
* Tao, R., Zeng, D. & Lin, D.-Y. Efficient semiparametric inference under two-phase sampling, with applications to genetic association studies. _J. Am. Stat. Assoc._ 112, 1468–1476 (2017).
* Fisher, J. E., Guha, A., Heller, W. & Miller, G. A. Extreme-groups designs in studies of dimensional phenomena: advantages, caveats, and recommendations. _J. Abnorm. Psychol._ 129, 14–20 (2020).
* Preacher, K. J., Rucker, D. D., MacCallum, R. C. & Nicewander, W. A. Use of the extreme groups approach: a critical reexamination and new recommendations. _Psychol. Methods_ 10, 178–192 (2005).
* Amanat, S., Requena, T. & Lopez-Escamez, J. A. A systematic review of extreme phenotype strategies to search for rare variants in genetic studies of complex disorders. _Genes_ 11, 987 (2020).
* Lotspeich, S. C., Amorim, G. G. C., Shaw, P. A., Tao, R. & Shepherd, B. E. Optimal multiwave validation of secondary use data with outcome and exposure misclassification. _Can. J. Stat._ https://doi.org/10.1002/cjs.11772 (2023).
* Tao, R. et al. Analysis of sequence data under multivariate trait-dependent sampling. _J. Am. Stat. Assoc._ 110, 560–572 (2015).
* Lin, H. et al. Strategies to design and analyze targeted sequencing data. _Circ. Cardiovasc. Genet._ 7, 335–343 (2014).
* Tao, R., Lotspeich, S. C., Amorim, G., Shaw, P. A. & Shepherd, B. E. Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors. _Stat. Med._ 40, 725–738 (2021).
* Nikolaidis, A. et al. Suboptimal phenotypic reliability impedes reproducible human neuroscience. Preprint at _bioRxiv_ https://doi.org/10.1101/2022.07.22.501193 (2022).
* Xu, T. et al. ReX: an integrative tool for quantifying and optimizing measurement reliability for the study of individual differences. _Nat. Methods_ 20, 1025–1028 (2023).
* Gell, M. et al. The burden of reliability: how measurement noise limits brain–behaviour predictions. Preprint at _bioRxiv_ https://doi.org/10.1101/2023.02.09.527898 (2024).
* Diggle, P., Heagerty, P., Liang, K.-Y. & Zeger, S. _Analysis of Longitudinal Data_ 2nd edn (Oxford Univ. Press, 2013).
* Pepe, M. S. & Anderson, G. L. A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. _Commun. Stat. Simul. Comput._ https://doi.org/10.1080/03610919408813210 (1994).
* Begg, M. D. & Parides, M. K. Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data. _Stat. Med._ 22, 2591–2602 (2003).
* Curran, P. J. & Bauer, D. J. The disaggregation of within-person and between-person effects in longitudinal models of change. _Annu. Rev. Psychol._ 62, 583–619 (2011).
* Di Biase, M. A. et al. Mapping human brain charts cross-sectionally and longitudinally. _Proc. Natl Acad. Sci. USA_ 120, e2216798120 (2023).
* Guillaume, B., Hua, X., Thompson, P. M., Waldorp, L. & Nichols, T. E. Fast and accurate modelling of longitudinal and repeated measures neuroimaging data. _NeuroImage_ 94, 287–302 (2014).
* The ADHD-200 Consortium. The ADHD-200 Consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience. _Front. Syst. Neurosci._ 6, 62 (2012).
* Di Martino, A. et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. _Mol. Psychiatry_ 19, 659–667 (2014).
* Snoek, L. et al. AOMIC-PIOP1. _OpenNeuro_ https://doi.org/10.18112/openneuro.ds002785.v2.0.0 (2020).
* Snoek, L. et al. AOMIC-PIOP2. _OpenNeuro_ https://doi.org/10.18112/openneuro.ds002790.v2.0.0 (2020).
* Snoek, L. et al. AOMIC-ID1000. _OpenNeuro_ https://doi.org/10.18112/openneuro.ds003097.v1.2.1 (2021).
* Bilder, R. et al. UCLA Consortium for Neuropsychiatric Phenomics LA5c study. _OpenNeuro_ https://doi.org/10.18112/openneuro.ds000030.v1.0.0 (2020).
* Nastase, S. A. et al. Narratives. _OpenNeuro_ https://doi.org/10.18112/openneuro.ds002345.v1.1.4 (2020).
* Alexander, L. M. et al. An open resource for transdiagnostic research in pediatric mental health and learning disorders. _Sci. Data_ 4, 170181 (2017).
* Richardson, H., Lisandrelli, G., Riobueno-Naylor, A. & Saxe, R. Development of the social brain from age three to twelve years. _Nat. Commun._ 9, 1027 (2018).
* Kuklisova-Murgasova, M. et al. A dynamic 4D probabilistic atlas of the developing brain. _NeuroImage_ 54, 2750–2763 (2011).
* Reynolds, J. E., Long, X., Paniukov, D., Bagshawe, M. & Lebel, C. Calgary preschool magnetic resonance imaging (MRI) dataset. _Data Brief._ 29, 105224 (2020).
* Feczko, E. et al. Adolescent Brain Cognitive Development (ABCD) community MRI collection and utilities. Preprint at _bioRxiv_ https://doi.org/10.1101/2021.07.09.451638 (2021).
* Casey, B. J. et al. The Adolescent Brain Cognitive Development (ABCD) study: imaging acquisition across 21 sites. _Dev. Cogn. Neurosci._ 32, 43–54 (2018).
* Fortin, J.-P. neuroCombat: harmonization of multi-site imaging data with ComBat. R package version 1.0.13 (2023).
* Beer, J. longCombat: longitudinal ComBat for harmonizing multi-batch longitudinal data. R package version 0.0.0.90000; https://github.com/jcbeer/longCombat (2020).
* Højsgaard, S., Halekoh, U., Yan, J. & Ekstrøm, C. T. geepack: Generalized estimating equation package; https://cran.r-project.org/web/packages/geepack/index.html (2022).
* Long, J. S. & Ervin, L. H. Using heteroscedasticity consistent standard errors in the linear regression model. _Am. Stat._ 54, 217–224 (2000).
* Agresti, A. & Coull, B. A. Approximate is better than ‘exact’ for interval estimation of binomial proportions. _Am. Stat._ 52, 119–126 (1998).

ACKNOWLEDGEMENTS S.V. was supported by

R01MH123563 from the National Institute of Mental Health (NIMH). A.A.-B. and J. Seidlitz were partially supported by R01MH132934 and R01MH133843 from the NIMH. B.T.-C. was supported by

K23DA057486 from the National Institute on Drug Abuse (NIDA). T.D.S. was supported by R01MH120482, R01MH112847, R01MH113550 and R37MH125829 from the NIMH, R01EB022573 from the National Institute

of Biomedical Imaging and Bioengineering (NIBIB), the AE Foundation and the Penn-CHOP Lifespan Brain Institute. B.L. was supported by R00MH127293 from the NIMH. D.F. was supported by

U01DA041148R and U24DA055330 from the NIDA, R37MH125829, R01MH096773 and R01MH115357 from the NIMH, the Masonic Institute for the Developing Brain, and the Lynne and Andrew Redleaf

Foundation. Data processing done at the University of Cambridge is supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014) and NIHR Applied Research Collaboration East of

England. Data used in the preparation of this article include data obtained from the ADNI database (https://adni.loni.usc.edu). As such, the investigators in the ADNI contributed to the

design and implementation of the ADNI and/or provided data, but did not participate in the analysis or writing of this report. A complete list of the ADNI investigators is available

(https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf). Data were used from the following consortia: 3R-BRAIN, AIBL, Alzheimer’s Disease Neuroimaging

Initiative (ADNI), Alzheimer’s Disease Repository Without Borders Investigators, CALM Team, CCNP, COBRE, cVEDA, Harvard Aging Brain Study, IMAGEN, POND, and The PREVENT-AD Research Group;

and lists of members and their affiliations appear in the Supplementary Information. Any views expressed are those of the authors and not necessarily those of the funders, IHU-JU2, the NIHR

or the Department of Health and Social Care. AUTHOR INFORMATION Author notes * These authors contributed equally: Aaron Alexander-Bloch, Simon Vandekar * A full list of members and their

affiliations appears in the Supplementary Information AUTHORS AND AFFILIATIONS * Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA Kaidi Kang, Jiangmei

Xiong, Megan T. Jones, Ran Tao, Jonathan Schildcrout & Simon Vandekar * Department of Child and Adolescent Psychiatry and Behavioral Sciences, The Children’s Hospital of Philadelphia,

Philadelphia, PA, USA Jakob Seidlitz & Aaron Alexander-Bloch * Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA Jakob Seidlitz,

Kahini Mehta, Aaron Alexander-Bloch & Theodore D. Satterthwaite * Lifespan Brain Institute of The Children’s Hospital of Philadelphia and Penn Medicine,

Philadelphia, PA, USA Jakob Seidlitz, Kahini Mehta, Aaron Alexander-Bloch & Theodore D. Satterthwaite * Department of Psychology, University of Cambridge,

Cambridge, UK Richard A. I. Bethlehem * Penn Lifespan Informatics and Neuroimaging Center (PennLINC), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA Kahini

Mehta & Theodore D. Satterthwaite * Department of Psychological Sciences, University of Connecticut, Mansfield, CT, USA Arielle S. Keller * Institute for the Brain and Cognitive

Sciences, University of Connecticut, Mansfield, CT, USA Arielle S. Keller * Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA Ran Tao * Department of

Pediatrics, University of Minnesota Medical School, Minneapolis, MN, USA Anita Randolph, Bart Larsen, Eric Feczko, Oscar Miranda Dominguez, Steven M. Nelson & Damien A.

Fair * Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, MN, USA Anita Randolph, Bart Larsen, Brenden Tervo-Clemmens, Eric Feczko, Oscar Miranda Dominguez,

Steven M. Nelson & Damien A. Fair * Department of Psychiatry and Behavioral Sciences, University of Minnesota Medical School, Minneapolis, MN, USA Brenden Tervo-Clemmens

* Institute of Child Development, University of Minnesota, Minneapolis, MN, USA Damien A. Fair Authors * Kaidi Kang * Jakob Seidlitz * Richard A. I. Bethlehem * Jiangmei Xiong * Megan T. Jones

* Kahini Mehta * Arielle S. Keller * Ran Tao * Anita Randolph * Bart Larsen * Brenden Tervo-Clemmens * Eric Feczko * Oscar Miranda Dominguez * Steven M. Nelson * Jonathan Schildcrout

* Damien A. Fair * Theodore D. Satterthwaite * Aaron Alexander-Bloch * Simon Vandekar

CONSORTIA LIFESPAN BRAIN CHART CONSORTIUM * Aaron F. Alexander-Bloch, Richard A. I. Bethlehem, Damien A. Fair, Theodore D. Satterthwaite, Jakob Seidlitz &

Simon Vandekar CONTRIBUTIONS K.K., J. Seidlitz, S.V. and A.A.-B. conceived the work. K.K., S.V., R.T. and J. Schildcrout performed the methodology. K.K. and S.V. conducted the analysis.

K.K., S.V., J. Seidlitz, R.A.I.B., K.M., A.S.K., R.T., A.R., B.L., B.T.-C., E.F., O.M.D., S.M.N., J. Schildcrout, D.F., T.D.S. and A.A.-B. interpreted the results. A.A.-B., J. Seidlitz,

R.A.I.B., K.M., A.S.K., A.R., B.L., B.T.-C., E.F., O.M.D., S.M.N., D.F., T.D.S. and consortia authors performed data acquisition and curation. J.X. and M.T.J. performed the validation. K.K.,

J. Seidlitz, A.A.-B. and S.V. drafted the manuscript. All authors revised the manuscript. CORRESPONDING AUTHORS Correspondence to Kaidi Kang or Simon Vandekar. ETHICS DECLARATIONS COMPETING

INTERESTS J. Seidlitz and R.A.I.B. are directors of and hold equity in Centile Bioscience. A.A.-B. holds equity in Centile Bioscience and received consulting income from Octave Bioscience in

2023. S.M.N. consults for Turing Medical, which commercializes FIRMM. This interest has been reviewed and managed by the University of Minnesota in accordance with its conflict of interest

policies. All other authors declare no competing interests. PEER REVIEW PEER REVIEW INFORMATION _Nature_ thanks the anonymous reviewer(s) for their contribution to the peer review of this

work. Peer reviewer reports are available. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional

affiliations. EXTENDED DATA FIGURES AND TABLES EXTENDED DATA FIG. 1 The illustration of implemented sampling schemes and the region-specific improvement in the standardized effect sizes and

replicability for the age associations in UKB. (A) The sampling schemes implemented in UKB. They adjust the variability of age in the samples by assigning heavier or lighter

weights to the participants with age at the two tails of the population. The U-shaped scheme produces the largest variability of age in the samples, followed by uniform and bell-shaped

sampling schemes. (B-C) Between- and within-subject sampling schemes implemented in ADNI. (B) The between-subject variability of age is adjusted by assigning heavier or lighter weights to

the participants with baseline age closer to the two tails of the population baseline age distribution. (C) The within-subject variability in age is adjusted by increasing or decreasing the

probability of selecting the follow-up observation(s) with a larger change in age since baseline. (D-E) Region-specific improvement in the RESI and replicability in UKB for the association

between age and (D) regional grey matter volume (GMV) and (E) regional cortical thickness (CT), respectively, from using the U-shaped sampling scheme rather than the bell-shaped sampling scheme,

when _N_ = 300. EXTENDED DATA FIG. 2 HETEROGENEOUS IMPROVEMENT OF STANDARDIZED EFFECT SIZES (ESS) FOR COGNITIVE, MENTAL HEALTH, AND DEMOGRAPHIC ASSOCIATIONS WITH STRUCTURAL AND FUNCTIONAL

BRAIN MEASURES IN THE ABCD STUDY WITH BOOTSTRAPPED SAMPLES OF _N_ = 500. (A) U-shaped between-subject sampling scheme (blue) that increases between-subject variability of the non-brain

covariate produces larger standardized ESs and (B) reduces the number of participants scanned to obtain 80% replicability in total grey matter volume (GMV). The points and triangles are the

average standardized ESs across bootstraps and the whiskers are the 95% confidence intervals. Increasing within-subject sampling (triangles) can reduce standardized ESs. A similar pattern

holds in (C-D) regional GMV and (E-F) regional cortical thickness (CT); boxplots show the distributions of the standardized ESs across regions. In contrast, (G) regional pairwise functional

connectivity (FC) standardized ESs are improved by increasing between- (blue) and within-subject variability (dashed borders) with a corresponding reduction in the (H) number of participants

scanned for 80% replicability. C-H, Boxplots show the median (horizontal line), interquartile range (grey box), and min-max values (vertical lines). EXTENDED DATA FIG. 3 BOXPLOTS SHOWING

THE DISTRIBUTIONS OF (LOG2 OF) REDUCTION FACTORS OF THE SAMPLE SIZE _N_ NEEDED FOR 80% REPLICABILITY BY INCREASING BETWEEN-SUBJECT VARIABILITY OF THE COVARIATES ACROSS ALL THE ASSOCIATIONS

WITH EACH OF THE OUTCOMES IN ABCD (FIG. 4 AND EXTENDED DATA FIG. 2). The reduction factors are derived by comparing the sample size needed for 80% replicability under the U-shaped

between-subject sampling scheme with that needed under the bell-shaped one, when the within-subject sampling scheme is bell-shaped (Extended Data Fig. 1b). GMV, grey matter volume; CT, cortical thickness; FC,

functional connectivity. Boxplots show the median (horizontal line), interquartile range (grey box), min-max values (vertical lines), and outliers (points). EXTENDED DATA FIG. 4 LONGITUDINAL

STUDY DESIGNS CAN REDUCE STANDARDIZED EFFECT SIZES (ESS) AND REPLICABILITY. Boxplots show the distributions of the standardized ESs across regions. The cross-sectional analyses use only the

baseline or the 2nd measures (indicated by “1st”s or “2nd”s on the x-axes, respectively). The longitudinal analyses use the full longitudinal data (indicated by “all”s on the x-axes). (A-C)

Cross-sectional analyses can have larger standardized ESs than the same longitudinal analyses for structural brain measures in ABCD. (D) The functional connectivity (FC) measures have a

slight benefit of longitudinal modeling. GMV, grey matter volume; CT, cortical thickness. B-D, Boxplots show the median (horizontal line), interquartile range (grey box), and min-max values

(vertical lines). EXTENDED DATA FIG. 5 THE INFLUENCE OF SAMPLING SCHEMES ON THE STANDARDIZED EFFECT SIZES (ESS) FOR BETWEEN- AND WITHIN-SUBJECT ASSOCIATIONS, RESPECTIVELY, OF COGNITION,

MENTAL HEALTH, AND DEMOGRAPHIC COVARIATES WITH DIFFERENT BRAIN MEASURES IN THE ABCD STUDY AT _N_ = 500. Boxplots show the distribution of the standardized ESs across regions. Between-subject

standardized ESs are predominantly affected by the between-subject variance, whereas within-subject standardized ESs are predominantly affected by the within-subject variance. Consistent

results were found for structural brain measures total grey matter volume (GMV; a, b), regional GMV (c-d), regional cortical thickness (CT; e,f) and functional brain measures (g,h). The

results for covariates birthweight and handedness, which do not vary within participants, are not included as the within-subject sampling schemes do not apply to them. C-H, Boxplots show the

median (horizontal line), interquartile range (box), and min-max values (vertical lines). EXTENDED DATA FIG. 6 THE ESTIMATED STANDARDIZED EFFECT SIZES (ESS) FROM CROSS-SECTIONAL AND

LONGITUDINAL ANALYSES, RESPECTIVELY, FOR THE BETWEEN-SUBJECT ASSOCIATIONS FOR COGNITION, MENTAL HEALTH, AND DEMOGRAPHIC COVARIATES WITH DIFFERENT BRAIN MEASURES IN THE ABCD STUDY. The

estimated RESIs for cross-sectional analyses (that only use the baseline measures) are indicated by “1st”s on the x-axes; the estimated RESIs for the between-subject effects from

longitudinal analyses (that use the full longitudinal data and specify separate between- and within-subject effects; see Methods: Estimation of the between-subject and

within-subject effects) are indicated by “all”s on the x-axes. By separating the between- and within-subject effects in the longitudinal model, we can avoid averaging the different between-

and within-subject effects and maintain the benefit of longitudinal designs on the estimated RESIs for the between-subject effects on both structural brain measures (a-c) and functional

brain measures (d). The results for covariates birthweight and handedness are not included, as they do not vary within subjects, so only their between-subject effects can be estimated (which

are shown in Extended Data Fig. 4). B-D, Boxplots show the median (horizontal line), interquartile range (grey box), and min-max values (vertical lines). EXTENDED DATA FIG. 7 DECISION TREE

FOR MODIFIED SAMPLING STRATEGY FOR A SINGLE PRIMARY COVARIATE. Random/representative sampling is needed to unbiasedly estimate the variance of the covariate distribution in the population in

order to obtain standardized effect size (ES) estimates consistent with the population. (A) A two-phase design is needed to modify the covariate distribution(s) in the sample to increase

standardized ESs and replicability, where random sampling is performed first in a larger dataset to collect covariate values and sampling based on collected covariates values is used to

optimize the standardized ESs and replicability; unbiased population standardized ES estimates can still be obtained using weighted estimation (see Discussion: Optimal design

considerations). (B) If the distribution(s) of the covariate(s) in the population is bell-shaped, a uniform covariate distribution in the sample can still increase the standardized ES and

replicability in detecting the overall association. (C) The particular target distribution will depend on the difficulty of collecting participants in the tail of the distributions (see

section 4.1 in Supplementary Information). EXTENDED DATA FIG. 8 OPTIMAL STUDY DESIGN AND ANALYSIS DEPENDS ON CHARACTERISTICS OF THE HYPOTHESIZED ASSOCIATION(S). (A) Visualization can be

performed in pilot or study data to evaluate this assumption as in Supplementary Fig. 4 (section 5 in Supplementary Information). (B) If the between- and within-subject effects are

hypothesized to be equal, either a cross-sectional or longitudinal design can be applied, but the efficiency per scan depends on the size of the within-subject error of the brain measure;

pilot/study data can be used to evaluate this question (section 5.1 in Supplementary Information). (C) If estimating the between- and within-subject effects separately, a longitudinal design

is required and common longitudinal data analysis tools such as generalized estimating equations (GEEs) and linear mixed models (LMMs) with separate between- and within-subject effects are

required to unbiasedly estimate these effects (see section 5.2 in Supplementary Information). (D) If there are different between- and within-subject effects, the investigators may still use

a model to target the average effect (i.e., a weighted average of the underlying between- and within-subject effects) if they have cross-sectional data, or if they want results from a

longitudinal study that are consistent for the same biological effect as cross-sectional studies. For longitudinal studies, a GEE with independence working covariance structure targets the

same average effect as the cross-sectional model, but it is less statistically efficient than the cross-sectional model (see section 5.3 in Supplementary Information). All recommendations

are based on the empirical findings in the paper and the theory for exchangeable covariance longitudinal linear models in the Supplementary Information.
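
The distinction drawn in Extended Data Fig. 8 (C, D) between separate between- and within-subject effects and a single averaged effect can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the toy ages, the two effect values and the helper names `split_between_within` and `simple_slope` are assumptions chosen for illustration, and ordinary least squares is used in place of a GEE or LMM fit (for a linear model, OLS gives the same point estimates as a GEE with an independence working covariance). The covariate is split into each participant's mean (between-subject part) and the deviation from that mean (within-subject part).

```python
from statistics import fmean


def split_between_within(x_by_subject):
    """Return (x_between, x_within), one entry per observation.

    x_between is the subject's mean covariate value; x_within is the
    visit-level deviation from that mean.
    """
    x_between, x_within = [], []
    for visits in x_by_subject:
        subject_mean = fmean(visits)
        for x in visits:
            x_between.append(subject_mean)
            x_within.append(x - subject_mean)
    return x_between, x_within


def simple_slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical, noise-free toy data: ages at three visits for four participants.
ages = [[60.0, 62.0, 64.0], [65.0, 66.0, 67.0], [70.0, 73.0, 76.0], [75.0, 76.0, 80.0]]
BETA_BETWEEN, BETA_WITHIN = -2.0, -0.5  # assumed effects, deliberately unequal

x_b, x_w = split_between_within(ages)
y = [100.0 + BETA_BETWEEN * b + BETA_WITHIN * w for b, w in zip(x_b, x_w)]
x_all = [b + w for b, w in zip(x_b, x_w)]  # the original, unsplit covariate

# The within-subject deviations sum to zero inside every participant, so they are
# orthogonal to both the intercept and the subject means. Each slope of the joint
# model can therefore be obtained from a separate simple regression.
slope_between = simple_slope(x_b, y)
slope_within = simple_slope(x_w, y)

# A model with one pooled covariate mixes the two effects into a weighted average.
slope_pooled = simple_slope(x_all, y)

print(slope_between, slope_within, slope_pooled)
```

In this toy setting the separated model recovers the two assumed effects, whereas the pooled slope falls between them, weighted by the between- and within-subject sums of squares of the covariate. With real data, standard errors would need a method that accounts for repeated measures, such as the robust variance of a GEE.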

SUPPLEMENTARY INFORMATION REPORTING SUMMARY PEER REVIEW FILE RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which

permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to

the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless

indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or

exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE CITE THIS ARTICLE Kang, K., Seidlitz, J., Bethlehem, R.A.I. _et al._ Study design features increase replicability in brain-wide association studies.

_Nature_ 636, 719–727 (2024). https://doi.org/10.1038/s41586-024-08260-9 * Received: 27 May 2023 * Accepted: 21 October 2024 * Published: 27 November 2024 * Issue Date: 19

December 2024 * DOI: https://doi.org/10.1038/s41586-024-08260-9
