
Promoting and supporting credibility in neuroscience

Editorial. Brain and Neuroscience Advances, Volume 3: 1–4. © The Author(s) 2019.
DOI: 10.1177/2398212819844167. journals.sagepub.com/home/bna
Article reuse guidelines: sagepub.com/journals-permissions

Guillaume A. Rousselet (1), Georgina Hazell (2), Anne Cooke (2) and Jeffrey W. Dalley (3)

(1) Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
(2) British Neuroscience Association, Bristol, UK
(3) Department of Psychology, University of Cambridge, Cambridge, UK

Corresponding author: Guillaume A. Rousselet, Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QB, UK. Email: Guillaume.Rousselet@glasgow.ac.uk

Received: 25 March 2019; accepted: 26 March 2019

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction

A core objective of the British Neuroscience Association (BNA) is to promote and support credibility in neuroscience. Creeping changes in the research culture have created a major problem for science today. Historically, scientific data that were dramatic, novel and positive have been valued and rewarded much more highly than incremental, reproduced or null results. Although novel and positive results are indeed to be celebrated, doing so at the cost of ignoring replication studies or null findings has led to a marked reduction in reproducible, replicable and reliable scientific research (Fanelli, 2010, 2012).

While the issue of scientific credibility is now being addressed by many research councils, institutes and journals, which support and adopt credibility initiatives, the archaic 'publish or perish' attitude still resonates throughout our neuroscience community. Neuroscience can learn much from fields that have already turned the credibility spotlight on themselves (e.g. Psychology), as well as from organisations such as the Centre for Open Science (COS, USA) and the UK Reproducibility Network (UKRN), who seek to increase the 'openness, integrity, and reproducibility of scientific research'.

Over the coming years, a core objective of the BNA is to promote and support credibility in neuroscience, facilitating a cultural shift away from 'publish or perish' towards one which is best for neuroscience, neuroscientists, policymakers and the public. Among many of our credibility activities, we will lead by example by ensuring that our journal, Brain and Neuroscience Advances, exemplifies scientific practices that aim to improve the reproducibility, replicability and reliability of neuroscience research.

To support these practices, we are implementing some of the Transparency and Openness Promotion (TOP) guidelines, including badges for open data, open materials and preregistered studies. The journal also offers the Registered Report (RR) article format. In this editorial, we describe our expectations for articles submitted to Brain and Neuroscience Advances.

Reproducibility, replicability and reliability

Three fundamental markers of credibility are the reproducibility, replicability and reliability of neuroscience research. We mainly refer to reproducibility and replicability in this editorial, but to aid understanding it is important to first describe these three terms. As the terms reproducibility and replicability are often used interchangeably, it is useful to define them separately. An analysis can be defined as reproducible if an independent researcher can obtain the same numerical results when provided with data and code from the original study (Peng, 2015). An effect is defined as replicable if a new experiment, following the exact protocol that led to the original result, produces results similar to the original ones. Replicability thus depends in part on the reproducibility of the methods, and is also less clearly defined than reproducibility because it depends on defining an acceptable level of similarity for two results. Finally, reliability mainly relates to the accuracy of the scientific tools employed.

Data and code sharing

The cornerstone of reproducibility is the availability of data and analysis code. While we are not making data sharing compulsory, we request that every article contains a data sharing statement, indicating where the data and analysis code can be downloaded. If data are not available, a reason for not sharing must be provided. Sharing on demand by contacting the authors is not a viable option in the short or the long term and will not be accepted as a valid statement (Houtkoop et al., 2018). Articles providing a URL or DOI to a third-party public repository containing their data and analysis code will be flagged by an 'Open data' badge.

Transparent reporting

Transparent reporting is key to replicability, starting with a clear description of the sample sizes involved at every level of analysis (Weissgerber et al., 2016). For instance, the number of subjects and the number of measurements per subject should be justified and described in the Methods section. Sample sizes should also be clearly indicated in figures or figure captions and for each analysis, unless sample size is constant across all analyses.

Articles providing a URL or DOI to a third-party public repository containing their experimental materials will be flagged by an 'Open material' badge. Materials are field dependent and could include, for instance, auditory stimuli and the code to present them to participants, or a detailed lab notebook describing all the steps carried out at the bench.

To let readers assess the results, as much as possible we request detailed illustrations of the observations, irrespective of the outcome of statistical analyses. In particular, we do not accept bar and line charts that hide distributions of observations and valuable information about the nature of the effects (Rousselet et al., 2016; Weissgerber et al., 2015). We expect authors to take advantage of modern software to make the most of their data and convey an informative and nuanced description of the results to the readers (Rousselet et al., 2017; Wickham, 2016). Enough information must also be provided about the statistical tests performed (Weissgerber et al., 2018).

Transparent contribution reporting

There are many ways authors could have contributed to an article. To recognise and acknowledge this diversity, a Contributions section must list the specific roles of everyone involved. To help report this important information, we recommend the CRediT taxonomy (https://www.casrai.org/credit.html).

Statistical reporting

Graphs

The first step in reporting statistical analyses is to describe the results in detail using graphical representations. In many situations, detailed graphs are sufficient to characterise a dataset without also presenting statistical tests, especially if the goal of a study is to estimate the size of an effect. Along with others, we believe that a focus on estimation is the most productive way to conduct and report statistical analyses (Cumming, 2014; Kruschke and Liddell, 2018).

Analysis

Whatever the graph choices, authors must justify them explicitly, as well as the choice of statistical tests, alpha level for error control, sample size and hypotheses tested (Lakens et al., 2018). Common choices include using t-tests on means, alpha = 0.05 and a null hypothesis, but these choices are often inappropriate. In particular, many types of variables quantified in neuroscience projects violate the core assumptions of techniques such as standard t-tests, analyses of variance (ANOVAs), correlations and regressions, potentially leading to lower statistical power or increased false positives; robust statistics can address these issues and help provide a better understanding of the data (Wilcox and Rousselet, 2018). Relying on standard techniques can also lead to inaccurate sample size estimation when planning for statistical power. Sample size does not need to be justified based on statistical power: another approach is to focus on estimation accuracy (Peters and Crutzen, 2017; Rothman and Greenland, 2018).

Significance

Adding the qualifier 'significant' or 'not significant' after a p value does not add any information. It only provides a false sense of certainty that no statistical technique can provide (Gelman, 2018). Instead, readers should be provided with sufficient information to decide for themselves what they think of the results. Indeed, the American Statistical Association's (ASA) statement on p values clearly states that 'Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold' (Wasserstein and Lazar, 2016). In addition to p values, authors should consider carefully the information provided by effect sizes, confidence intervals and other sources of information, and put that information in context (Amrhein et al., 2018; McShane et al., 2017).

Going beyond p values, assessing the practical significance of research findings is critical, in part because it is trivial to achieve 'significance' from noisy measurements or large collections of samples, even in the absence of underlying effects (Loken and Gelman, 2017). The problem gets worse with many implicit or explicit researchers' degrees of freedom (Forstmeier et al., 2016; Simmons et al., 2011). And when dealing with small samples, filtering by significance leads to inflated effect sizes, or even effect sizes in the wrong direction – the so-called type M and type S errors (Gelman and Carlin, 2014). A more productive approach is to describe the methods and results in as much detail as possible, share data and code, and let readers make up their own minds about the results, without forcing artificial dichotomies on them. In particular, we request that authors declare whether all measures and statistical analyses have been reported.

Registered reports

The introduction of the RR format aims to improve the reproducibility and replicability of neuroscience research. A thorough description of RR is available on the Open Science Framework (OSF) website (https://cos.io/rr/), in particular this FAQ (https://osf.io/gha9f/). In short, unlike standard research articles, RRs have been designed to minimise questionable research practices (such as p-hacking and HARKing) as well as the publishing incentives that promote them (Chambers et al., 2015). The RR format is applicable to standard studies, replication studies and studies planning the analysis of existing datasets. At the core of RR is an innovative reviewing process, in which the Introduction and Methods sections are reviewed before the research is carried out. Articles are thus evaluated solely based on the importance of the topic and the quality of the research methods and analyses, not based on the results.

Thus, RRs offer a great way to improve experimental methods and to make the most out of lab resources by getting feedback from experts when it matters most: before data collection, not after. Receiving feedback from the scientific community prior to commencing the actual experiments helps improve experimental designs, the choice of tools, data quantification methods and statistical tests. We all make mistakes or are unaware of better alternatives, or both, and this should not be demonised – instead we should work together to find solutions and make the best of our limited resources, which in turn will increase the reproducibility, replicability and reliability of our studies.

Proposals of sufficient quality are approved to progress to the data collection stage. Provided the authors followed the methods discussed and agreed during stage 1 and reached sensible conclusions about the results, the article is accepted for publication no matter how the results turned out.

This two-step process clearly delineates exploratory from confirmatory research (Forstmeier et al., 2016; Wagenmakers et al., 2012), such that readers can trust a study does not suffer from p-hacking and HARKing, for instance (Kerr, 1998; Simmons et al., 2011). RRs incorporate other critical features aimed at boosting research credibility, for instance, mandatory data and code sharing for reproducibility, and the demand for at least 90% power to improve replicability.

Why 0.9 power and not the more traditional 0.8? Both are completely arbitrary values. But let us look at this choice from the perspective of replicability: 'Studies are often designed or claimed to have 80% power against a key alternative when using a 0.05 significance level, although in execution often have less power due to unanticipated problems such as low subject recruitment. Thus, if the alternative is correct and the actual power of two studies is 80%, the chance that the studies will both show P ⩽ 0.05 will at best be only 0.80(0.80) = 64%; furthermore, the chance that one study shows P ⩽ 0.05 and the other does not (and thus will be misinterpreted as showing conflicting results) is 2(0.80)0.20 = 32% or about 1 chance in 3' (Greenland et al., 2016). With 90% power, the chance that the two studies will both show p ⩽ 0.05 will at most be 0.90(0.90) = 81%. This is much better than 64%, although it still leaves the door open for a large number of apparent discrepancies among studies if the outcomes are judged solely on the basis of p values. Also, power estimation typically assumes that the data do not violate the tests' assumptions and that there is no measurement noise or other source of variability beyond random sampling. As such, the actual power of a line of research will necessarily be lower than anticipated. Hence, we feel that aiming for at least 90% power is entirely justified, given that in practice power will tend to be lower.

Preregistration

In addition to RR, the journal welcomes the submission of preregistered work, for instance, using the OSF. Unlike RR, preregistered studies are not reviewed before data collection or data analysis. Preregistration affords only some of the benefits of RR: most notably, it allows a clear demarcation between confirmatory and exploratory analyses; it also enhances the discoverability of research that might not ultimately be published. If authors can provide a public time-stamped document describing their experimental design and analysis protocol, dated before the start of data collection or data examination, their articles could be awarded a 'Preregistered' badge. A badge can also be obtained when only the analyses are preregistered.

Exploratory research

By promoting RR and confirmatory research, we do not imply that exploratory research should be discouraged. High-quality exploratory research is necessary to the research enterprise by providing useful results that can be used to build theories and generate hypotheses, which in turn can be tested using a confirmatory approach (McIntosh, 2017). After all, some of the best work in neuroscience was exploratory, for instance, the Nobel Prize work of Hubel and Wiesel: 'Looking back, Hubel considered that their research "was by and large a huge fishing trip"' (Martin, 2014).

Worse is to present exploratory research as confirmatory, because while it is easy to obtain 'significant' results and to write post hoc stories about them, such findings do not tend to replicate and thus undermine the credibility of research. Adding p values to exploratory findings cannot make them more than what they are, and certainly cannot turn them into confirmatory results. In fact, p values can be difficult to interpret for exploratory research (Kruschke and Liddell, 2018; Wagenmakers, 2007). While we currently do not offer an exploratory report format, we welcome submission of high-quality exploratory work, on its own or as part of a stage 1 RR submission. Exploratory research should be presented without p values or confidence intervals.

A bright future for neuroscience

By creating a format that encourages transparent reporting (including negative results), rigorous statistical analyses, sharing at all stages of discovery and highlighting individual authors' contributions, we hope to increase the reproducibility, replicability and reliability of the research published in Brain and Neuroscience Advances and provide benefits to all involved. We also encourage other neuroscience (and non-neuroscience) journals to continue to adopt these initiatives. It is only when we work together as a neuroscience community that we will achieve more productive and beneficial investigations – and in turn improve the trust of the public in our research.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Guillaume A. Rousselet: https://orcid.org/0000-0003-0006-8729

References

Amrhein V, Trafimow D and Greenland S (2018) Inferential statistics as descriptive statistics: There is no replication crisis if we don't expect replication. The American Statistician 73(Suppl. 1): 262–270.
Chambers CD, Dienes Z, McIntosh RD, et al. (2015) Registered reports: Realigning incentives in scientific publishing. Cortex 66: A1–A2.
Cumming G (2014) The new statistics: Why and how. Psychological Science 25(1): 7–29.
Fanelli D (2010) 'Positive' results increase down the hierarchy of the sciences. PLoS ONE 5(4): e10068.
Fanelli D (2012) Negative results are disappearing from most disciplines and countries. Scientometrics 90(3): 891–904.
Forstmeier W, Wagenmakers E-J and Parker TH (2016) Detecting and avoiding likely false-positive findings – A practical guide. Biological Reviews 92(4): 1941–1968.
Gelman A (2018) The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personality and Social Psychology Bulletin 44(1): 16–23.
Gelman A and Carlin J (2014) Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science 9(6): 641–651.
Greenland S, Senn SJ, Rothman KJ, et al. (2016) Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology 31(4): 337–350.
Houtkoop BL, Chambers C, Macleod M, et al. (2018) Data sharing in psychology: A survey on barriers and preconditions. Advances in Methods and Practices in Psychological Science 1(1): 70–85.
Kerr NL (1998) HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review 2(3): 196–217.
Kruschke JK and Liddell TM (2018) The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review 25(1): 178–206.
Lakens D, Adolfi FG, Albers CJ, et al. (2018) Justify your alpha. Nature Human Behaviour 2: 168.
Loken E and Gelman A (2017) Measurement error and the replication crisis. Science 355(6325): 584–585.
Martin KAC (2014) David H. Hubel (1926–2013). Current Biology 24(1): R4–R7.
McIntosh RD (2017) Exploratory reports: A new article type for Cortex. Cortex 96: A1–A4.
McShane BB, Gal D, Gelman A, et al. (2017) Abandon statistical significance. arXiv. Available at: https://arxiv.org/abs/1709.07588
Peng R (2015) The reproducibility crisis in science: A statistical counterattack. Significance 12: 30–32.
Peters GY and Crutzen R (2017) Knowing how effective an intervention, treatment, or manipulation is and increasing replication rates: Accuracy in parameter estimation as a partial solution to the replication crisis. PsyArXiv. Epub ahead of print: 31 March 2017. DOI: 10.31234/osf.io/cjsk2.
Rothman KJ and Greenland S (2018) Planning study size based on precision rather than power. Epidemiology 29: 599.
Rousselet GA, Foxe JJ and Bolam JP (2016) A few simple steps to improve the description of group results in neuroscience. European Journal of Neuroscience 44(9): 2647–2651.
Rousselet GA, Pernet CR and Wilcox RR (2017) Beyond differences in means: Robust graphical methods to compare two groups in neuroscience. European Journal of Neuroscience 46(2): 1738–1748.
Simmons JP, Nelson LD and Simonsohn U (2011) False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22(11): 1359–1366.
Wagenmakers E-J (2007) A practical solution to the pervasive problems of P values. Psychonomic Bulletin & Review 14(2): 779–804.
Wagenmakers E-J, Wetzels R, Borsboom D, et al. (2012) An agenda for purely confirmatory research. Perspectives on Psychological Science 7(6): 632–638.
Wasserstein RL and Lazar NA (2016) The ASA's statement on p-values: Context, process, and purpose. The American Statistician 70(2): 129–133.
Weissgerber TL, Garcia-Valencia O, Garovic VD, et al. (2018) Why we need to report more than 'data were analyzed by t-tests or ANOVA'. eLife 7: e36163.
Weissgerber TL, Garovic VD, Winham SJ, et al. (2016) Transparent reporting for reproducible science. Journal of Neuroscience Research 94(10): 859–864.
Weissgerber TL, Milic NM, Winham SJ, et al. (2015) Beyond bar and line graphs: Time for a new data presentation paradigm. PLOS Biology 13(4): e1002128.
Wickham H (2016) ggplot2: Elegant Graphics for Data Analysis (2nd edn). London: Springer.
Wilcox RR and Rousselet GA (2018) A guide to robust statistical methods in neuroscience. Current Protocols in Neuroscience 82: 8.42.1–8.42.30.


Publisher: SAGE
Copyright: © 2022 SAGE Publications Ltd and British Neuroscience Association, unless otherwise noted. Manuscript content on this site is licensed under Creative Commons licenses.
ISSN / eISSN: 2398-2128
DOI: 10.1177/2398212819844167

Abstract

844167 BNA0010.1177/2398212819844167Brain and Neuroscience AdvancesRousselet et al. editorial2019 Editorial Brain and Neuroscience Advances Volume 3: 1 –4 The Author(s) 2019 Promoting and supporting credibility Article reuse guidelines: sagepub.com/journals-permissions in neuroscience https://doi.org/10.1177/2398212819844167 DOI: 10.1177/2398212819844167 journals.sagepub.com/home/bna 1 2 Guillaume A. Rousselet , Georgina Hazell , 2 3 Anne Cooke and Jeffrey W. Dalley Received: 25 March 2019; accepted: 26 March 2019 mainly refer to reproducibility and replicability in this editorial Introduction – but to aid understanding it is important to first describe these A core objective of the British Neuroscience Association (BNA) three terms. As the terms reproducibility and replicability are is to promote and support credibility in neuroscience. Creeping often used interchangeably, it is useful to define them sepa- changes in the research culture have created a major problem for rately. An analysis can be defined as reproducible if an inde- science today. Historically, scientific data that were dramatic, pendent researcher can obtain the same numerical results when novel and positive had been valued and rewarded much more provided with data and code from the original study (Peng, highly than incremental, reproduced or null results. Although 2015). An effect is defined as replicable if a new experiment, novel and positive results are indeed to be celebrated, doing so at following the exact protocol that led to the original result, pro- the cost of ignoring replication studies or null findings has led to duces results similar to the original ones. Replicability thus a marked reduction in reproducible, replicable and reliable sci- depends in part on the reproducibility of the methods and is also ence research (Fanelli, 2010, 2012). 
less clearly defined than reproducibility because it depends on While the issue of scientific credibility is now being addressed defining an acceptable level of similarity for two results. by many research councils, institutes and journals, which support Finally, reliability mainly relates to the accuracy of the scien- and adopt credibility initiatives, the archaic ‘publish or perish’ atti- tific tools employed. tude still resonates throughout our neuroscience community. Neuroscience can learn much from fields that have already turned the credibility spotlight on themselves (e.g. Psychology), as well as Data and code sharing organisations such as the Centre for Open Science (COS, USA) and The cornerstone of reproducibility is the availability of data and the UK Reproducibility Network (UKRN) who seek to increase the analysis code. While we are not making data sharing compul- ‘openness, integrity, and reproducibility of scientific research’. sory, we request that every article contains a data sharing state- Over the coming years, a core objective of the BNA is to pro- ment, indicating where the data and analysis code can be mote and support credibility in neuroscience, facilitating a cul- downloaded. If data are not available, a reason for not sharing tural shift away from ‘publish or perish’ towards one which is must be provided. Sharing on demand by contacting the authors best for neuroscience, neuroscientists, policymakers and the pub- is not a viable option in the short or the long term and will not be lic. Among many of our credibility activities, we will lead by accepted as a valid statement (Houtkoop et al., 2018). Articles example by ensuring that our journal, Brain and Neuroscience providing a URL or DOI to a third-party public repository con- Advances, exemplifies scientific practices that aim to improve taining their data and analysis code will be flagged by an ‘Open the reproducibility, replicability and reliability of neuroscience data’ badge. research. 
To support these practices, we are implementing some of the Transparency and Openness Promotion (TOP) guidelines, including badges for open data, open materials and preregistered studies. The journal also offers the Registered Report (RR) article Institute of Neuroscience and Psychology, College of Medical, format. In this editorial, we describe our expectations for articles Veterinary and Life Sciences, University of Glasgow, Glasgow, UK submitted to Brain and Neuroscience Advances. British Neuroscience Association, Bristol, UK Department of Psychology, University of Cambridge, Cambridge, UK Reproducibility, replicability and Corresponding author: Guillaume A. Rousselet, Institute of Neuroscience and Psychology, reliability College of Medical, Veterinary and Life Sciences, University of Glasgow, Three fundamental markers of credibility are the reproducibil- Glasgow G12 8QB, UK. Email: Guillaume.Rousselet@glasgow.ac.uk ity, replicability and reliability of neuroscience research. We Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 2 Brain and Neuroscience Advances increased false positives; robust statistics can address these Transparent reporting issues and help provide a better understanding of the data Transparent reporting is key to replicability, starting with a clear (Wilcox and Rousselet, 2018). Relying on standard techniques description of sample sizes involved at every level of analysis can also lead to inaccurate sample size estimation when plan- (Weissgerber et al., 2016). For instance, the number of subjects ning for statistical power. 
Sample size does not need to be justi- and the number of measurements per subject should be justified fied based on statistical power: another approach is to focus on and described in the Methods section. Sample sizes should also estimation accuracy (Peters and Crutzen, 2017; Rothman and be clearly indicated in figures or figure captions and for each Greenland, 2018). analysis, unless sample size is constant across all analyses. Articles providing a URL or DOI to a third-party public reposi- tory containing their experimental materials will be flagged by an Significance ‘Open material’ badge. Materials are field dependent and could Adding the qualifier ‘significant’ or ‘not significant’ after a p include, for instance, auditory stimuli and the code to present value does not add any information. It only provides a false sense them to participants, or a detailed lab notebook describing all the of certainty that no statistical technique can provide (Gelman, steps carried out at the bench. 2018). Instead, readers should be provided with sufficient infor- To let readers assess the results, as much as possible we mation to decide for themselves what they think of the results. request detailed illustrations of the observations, irrespective of Indeed, the American Statistical Association’s (ASA) statement the outcome of statistical analyses. In particular, we do not accept on p values clearly state that ‘Scientific conclusions and business bar and line charts that hide distributions of observations and or policy decisions should not be based only on whether a p-value valuable information about the nature of the effects (Rousselet passes a specific threshold’ (Wasserstein and Lazar, 2016). In et al., 2016; Weissgerber et al., 2015). 
We expect authors to take addition to p values, authors should consider carefully the infor- advantage of modern software to make the most of their data and mation provided by effect sizes, confidence intervals and other convey an informative and nuanced description of the results to sources of information, and put that information in context the readers (Rousselet et al., 2017; Wickham, 2016). Enough (Amrhein et al., 2018; McShane et al., 2017). information must also be provided about the statistical tests per- Going beyond p values, assessing the practical significance of formed (Weissgerber et al., 2018). research findings is critical, in part because it is trivial to achieve ‘significance’ from noisy measurements or large collections of Transparent contribution reporting samples, even in the absence of underlying effects (Loken and Gelman, 2017). The problem gets worse with many implicit or They are many ways authors could have contributed to an article. explicit researchers’ degrees of freedom (Forstmeier et al., 2016; To recognise and acknowledge this diversity, a Contributions Simmons et al., 2011). And when dealing with small samples, section must list the specific roles of everyone involved. To help filtering by significance leads to inflated effect sizes, or even reporting this important information, we recommend the CRediT effect sizes in the wrong direction – the so-called type M and type taxonomy (https://www.casrai.org/credit.html) S errors (Gelman and Carlin, 2014). A more productive approach is to describe the methods and results in as much detail as possi- ble, share data and code and let readers make their own mind Statistical reporting about the results, without forcing artificial dichotomies on the readers. In particular, we request that authors declare if all meas- Graphs ures and statistical analyses have been reported. The first step in reporting statistical analyses is to describe the results in detail using graphical representations. 
In many situations, detailed graphs are sufficient to characterise a dataset without also presenting statistical tests, especially if the goal of a study is to estimate the size of an effect. Along with others, we believe that a focus on estimation is the most productive way to conduct and report statistical analyses (Cumming, 2014; Kruschke and Liddell, 2018).

Analysis

Whatever the graph choices, authors must justify them explicitly, as well as the choice of statistical tests, alpha level for error control, sample size and hypotheses tested (Lakens et al., 2018). Common choices include using t-tests on means, alpha = 0.05 and a null hypothesis, but these choices are often inappropriate. In particular, many types of variables quantified in neuroscience projects violate the core assumptions of techniques such as standard t-tests, analyses of variance (ANOVAs), correlations and regressions, potentially leading to lower statistical power or misleading results; robust alternatives exist (Wilcox and Rousselet, 2018).

Registered reports

The introduction of the Registered Report (RR) format aims to improve the reproducibility and replicability of neuroscience research. A thorough description of RRs is available on the Open Science Framework (OSF) website (https://cos.io/rr/), in particular in this FAQ (https://osf.io/gha9f/). In short, unlike standard research articles, RRs have been designed to minimise questionable research practices (such as p-hacking and HARKing), as well as the publishing incentives that promote them (Chambers et al., 2015). The RR format is applicable to standard studies, replication studies and studies planning the analysis of existing datasets. At the core of the RR format is an innovative reviewing process, in which the Introduction and Methods sections are reviewed before the research is carried out. Articles are thus evaluated solely on the importance of the topic and the quality of the research methods and analyses, not on the results. Thus, RRs offer a great way to improve experimental methods and to make the most of lab resources by getting feedback from experts when it matters most: before data collection, not after. Receiving feedback from the scientific community prior to commencing the actual experiments helps improve experimental designs, the choice of tools, data quantification methods and statistical tests. We all make mistakes or are unaware of better alternatives, or both; this should not be demonised – instead, we should work together to find solutions and make the best of our limited resources, which in turn will increase the reproducibility, replicability and reliability of our studies. Proposals of sufficient quality are approved to progress to the data collection stage. Provided the authors followed the methods discussed and agreed during stage 1 and reached sensible conclusions about the results, the article is accepted for publication no matter how the results turn out.

This two-step process clearly delineates exploratory from confirmatory research (Forstmeier et al., 2016; Wagenmakers et al., 2012), such that readers can trust that a study does not suffer from p-hacking or HARKing, for instance (Kerr, 1998; Simmons et al., 2011). RRs incorporate other critical features aimed at boosting research credibility, for instance, mandatory data and code sharing for reproducibility, and the demand for at least 90% power to improve replicability.

Why 0.9 power and not the more traditional 0.8? Both are completely arbitrary values. But let us look at this choice from the perspective of replicability: 'Studies are often designed or claimed to have 80% power against a key alternative when using a 0.05 significance level, although in execution often have less power due to unanticipated problems such as low subject recruitment. Thus, if the alternative is correct and the actual power of two studies is 80%, the chance that the studies will both show P ⩽ 0.05 will at best be only 0.80(0.80) = 64%; furthermore, the chance that one study shows P ⩽ 0.05 and the other does not (and thus will be misinterpreted as showing conflicting results) is 2(0.80)0.20 = 32% or about 1 chance in 3' (Greenland et al., 2016). With 90% power, the chance that two studies will both show p ⩽ 0.05 will at most be 0.90(0.90) = 81%. This is much better than 64%, although it still leaves the door open for a large number of apparent discrepancies among studies if outcomes are judged solely on the basis of p values. Also, power estimation typically assumes that the data do not violate the tests' assumptions and that there is no measurement noise or other source of variability beyond random sampling. As such, the actual power of a line of research will necessarily be lower than anticipated. Hence, we feel that aiming for at least 90% power is entirely justified, given that in practice power will tend to be lower.

Exploratory research

By promoting RRs and confirmatory research, we do not imply that exploratory research should be discouraged. High-quality exploratory research is necessary to the research enterprise: it provides useful results that can be used to build theories and generate hypotheses, which in turn can be tested using a confirmatory approach (McIntosh, 2017). After all, some of the best work in neuroscience was exploratory, for instance, the Nobel Prize work of Hubel and Wiesel: 'Looking back, Hubel considered that their research "was by and large a huge fishing trip"' (Martin, 2014).

More problematic is to present exploratory research as confirmatory, because while it is easy to obtain 'significant' results and to write post hoc stories about them, such findings tend not to replicate and thus undermine the credibility of research. Adding p values to exploratory findings cannot make them more than what they are, and certainly cannot turn them into confirmatory results. In fact, p values can be difficult to interpret for exploratory research (Kruschke and Liddell, 2018; Wagenmakers, 2007). While we currently do not offer an exploratory report format, we welcome the submission of high-quality exploratory work, on its own or as part of a stage 1 RR submission. Exploratory research should be presented without p values or confidence intervals.

Preregistration

In addition to RRs, the journal welcomes the submission of preregistered work, for instance, using the OSF. Unlike RRs, preregistered studies are not reviewed before data collection or data analysis. Preregistration affords only some of the benefits of RRs: most notably, it allows a clear demarcation between confirmatory and exploratory analyses; it also enhances the discoverability of research that might not ultimately be published. If authors can provide a public time-stamped document describing their experimental design and analysis protocol, dated before the start of data collection or data examination, their articles can be awarded a 'Preregistered' badge. A badge can also be obtained when only the analyses are preregistered.

A bright future for neuroscience

By creating a format that encourages transparent reporting (including negative results), rigorous statistical analyses, sharing at all stages of discovery and the highlighting of individual authors' contributions, we hope to increase the reproducibility, replicability and reliability of the research published in Brain and Neuroscience Advances and provide benefits to all involved. We also encourage other neuroscience (and non-neuroscience) journals to continue to adopt these initiatives. It is only by working together as a neuroscience community that we will achieve more productive and beneficial investigations – and, in turn, improve the public's trust in our research.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Guillaume A. Rousselet https://orcid.org/0000-0003-0006-8729

References

Amrhein V, Trafimow D and Greenland S (2018) Inferential statistics as descriptive statistics: There is no replication crisis if we don't expect replication. The American Statistician 73(Suppl. 1): 262–270.
Chambers CD, Dienes Z, McIntosh RD, et al. (2015) Registered reports: Realigning incentives in scientific publishing. Cortex 66: A1–A2.
Cumming G (2014) The new statistics: Why and how. Psychological Science 25(1): 7–29.
Fanelli D (2010) 'Positive' results increase down the hierarchy of the sciences. PLoS ONE 5(4): e10068.
Fanelli D (2012) Negative results are disappearing from most disciplines and countries. Scientometrics 90(3): 891–904.
Forstmeier W, Wagenmakers E-J and Parker TH (2016) Detecting and avoiding likely false-positive findings – A practical guide. Biological Reviews 92(4): 1941–1968.
Gelman A (2018) The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personality and Social Psychology Bulletin 44(1): 16–23.
Gelman A and Carlin J (2014) Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science 9(6): 641–651.
Greenland S, Senn SJ, Rothman KJ, et al. (2016) Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology 31(4): 337–350.
Houtkoop BL, Chambers C, Macleod M, et al. (2018) Data sharing in psychology: A survey on barriers and preconditions. Advances in Methods and Practices in Psychological Science 1(1): 70–85.
Kerr NL (1998) HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review 2(3): 196–217.
Kruschke JK and Liddell TM (2018) The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review 25(1): 178–206.
Lakens D, Adolfi FG, Albers CJ, et al. (2018) Justify your alpha. Nature Human Behaviour 2: 168.
Loken E and Gelman A (2017) Measurement error and the replication crisis. Science 355(6325): 584–585.
Martin KAC (2014) David H. Hubel (1926–2013). Current Biology 24(1): R4–R7.
McIntosh RD (2017) Exploratory reports: A new article type for Cortex. Cortex 96: A1–A4.
McShane BB, Gal D, Gelman A, et al. (2017) Abandon statistical significance. arXiv. Available at: https://arxiv.org/abs/1709.07588
Peng R (2015) The reproducibility crisis in science: A statistical counterattack. Significance 12: 30–32.
Peters GY and Crutzen R (2017) Knowing how effective an intervention, treatment, or manipulation is and increasing replication rates: Accuracy in parameter estimation as a partial solution to the replication crisis. PsyArXiv. Epub ahead of print 31 March 2017. DOI: 10.31234/osf.io/cjsk2.
Rothman KJ and Greenland S (2018) Planning study size based on precision rather than power. Epidemiology 29: 599.
Rousselet GA, Foxe JJ and Bolam JP (2016) A few simple steps to improve the description of group results in neuroscience. European Journal of Neuroscience 44(9): 2647–2651.
Rousselet GA, Pernet CR and Wilcox RR (2017) Beyond differences in means: Robust graphical methods to compare two groups in neuroscience. European Journal of Neuroscience 46(2): 1738–1748.
Simmons JP, Nelson LD and Simonsohn U (2011) False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22(11): 1359–1366.
Wagenmakers E-J (2007) A practical solution to the pervasive problems of P values. Psychonomic Bulletin & Review 14(2): 779–804.
Wagenmakers E-J, Wetzels R, Borsboom D, et al. (2012) An agenda for purely confirmatory research. Perspectives on Psychological Science 7(6): 632–638.
Wasserstein RL and Lazar NA (2016) The ASA's statement on p-values: Context, process, and purpose. The American Statistician 70(2): 129–133.
Weissgerber TL, Garcia-Valencia O, Garovic VD, et al. (2018) Why we need to report more than 'data were analyzed by t-tests or ANOVA'. eLife 7: e36163.
Weissgerber TL, Garovic VD, Winham SJ, et al. (2016) Transparent reporting for reproducible science. Journal of Neuroscience Research 94(10): 859–864.
Weissgerber TL, Milic NM, Winham SJ, et al. (2015) Beyond bar and line graphs: Time for a new data presentation paradigm. PLoS Biology 13(4): e1002128.
Wickham H (2016) ggplot2: Elegant Graphics for Data Analysis (2nd edn). London: Springer.
Wilcox RR and Rousselet GA (2018) A guide to robust statistical methods in neuroscience. Current Protocols in Neuroscience 82: 8.42.1–8.42.30.

Journal: Brain and Neuroscience Advances (SAGE)

Published: Apr 10, 2019
