
Practical Experience of the Application of a Weighted Burden Test to Whole Exome Sequence Data for Obesity and Schizophrenia

doi: 10.1111/ahg.12135

David Curtis and The UK10K Consortium
UCL Genetics Institute, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK

Summary
For biological and statistical reasons it makes sense to combine information from variants at the level of the gene. One may wish to give more weight to variants which are rare and those that are more likely to affect function. A combined weighting scheme, implemented in the SCOREASSOC program, was applied to whole exome sequence data for 1392 subjects with schizophrenia and 982 with obesity from the UK10K project. Results conformed fairly well with null hypothesis expectations and no individual gene was strongly implicated. However, a number of the higher ranked genes appear plausible candidates as being involved in one or other phenotype and may warrant further investigation. These include MC4R, NLGN2, CRP, DONSON, GTF3A, IL36B, ADCYAP1R1, ARSA, DLG1, SIK2, SLAIN1, UBE2Q2, ZNF507, CRHR1, MUSK, NSF, SNORD115, GDF3 and HIBADH. Some individual variants in these genes have different frequencies between cohorts and could be genotyped in additional subjects. For other genes, there is a general excess of variants at many different sites so attempts at replication would be more difficult. Overall, the weighted burden test provides a convenient method for using sequence data to highlight genes of interest.
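The combined weighting scheme mentioned in the Summary, and set out under Method of Analysis, can be illustrated with a minimal sketch. This is not the SCOREASSOC source code. The function names are hypothetical, the parabolic form is an assumption chosen to give a weight of 10 at MAF 0 and 1 at MAF 0.5, and the MAF is assumed to be pooled across both cohorts. Only three effect weights from Table 1 are included, and the p value uses a normal approximation to the t distribution, which is reasonable only for large degrees of freedom.

```python
import math

# Functional weights for a few annotation classes, copied from Table 1.
EFFECT_WEIGHT = {
    "DOWNSTREAM": 1,
    "NON_SYNONYMOUS_CODING": 10,
    "STOP_GAINED": 20,
}


def rarity_weight(maf, max_weight=10.0):
    """Assumed parabolic weight: max_weight at MAF 0, falling to 1 at MAF 0.5."""
    return 1.0 + (max_weight - 1.0) * (1.0 - 2.0 * maf) ** 2


def variant_weight(minor_alleles, n_subjects, effect):
    """Overall weight = effect weight x rarity weight (MAF pooled over all subjects)."""
    maf = minor_alleles / (2.0 * n_subjects)
    return EFFECT_WEIGHT[effect] * rarity_weight(maf)


def subject_score(minor_allele_counts, weights):
    """Sum of weights over all variant alleles carried by one subject."""
    return sum(count * w for count, w in zip(minor_allele_counts, weights))


def slp_from_t(t):
    """Signed -log10 of a two-sided p value, using a normal approximation to t."""
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return math.copysign(-math.log10(p), t)


def genomewide_threshold(n_genes, alpha=0.05):
    """Bonferroni-style threshold on the -log10(p) scale."""
    return -math.log10(alpha / n_genes)


if __name__ == "__main__":
    # Stop variant at 18:58039478 in Table 3: one minor allele among 2374 subjects.
    print(round(variant_weight(1, 2374, "STOP_GAINED"), 2))
    print(round(genomewide_threshold(20438), 1))
    print(round(slp_from_t(3.8), 1))
```

Under these assumptions the sketch gives weights matching those listed in Table 3 for the variants at 18:58038832, 18:58039276 and 18:58039478, and a two-sided p value close to the 0.00015 reported there for t = 3.8. An exact t distribution would be needed for smaller samples.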
Keywords: Association, exome, burden test, DNA variant

Corresponding author: DAVID CURTIS, UCL Genetics Institute, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK. E-mail: d.curtis@ucl.ac.uk, Tel: 020 8702 3200, Fax: 020 3108 2194

Annals of Human Genetics (2016) 80, 38–49. 2015 The Authors. Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Introduction

Although next generation sequencing has been used extensively for the study of rare Mendelian diseases, there is less experience of its application to large case-control association studies of diseases with complex inheritance. Because of issues such as incomplete penetrance and allelic and locus heterogeneity, the task is to identify variants which are more frequent in cases rather than variants which are shared by all cases and not seen in any controls or in the general population. There are a number of approaches which can be used in an attempt to gain power by reducing the necessary correction for multiple testing. One general approach may be to restrict attention to variants which are judged to be "important" according to some criteria. These might relate to the predicted effect of the variant or to which gene it occurs in. The frequency of the variant might be used as a criterion; for example, if one is only interested in detecting variants with a large effect size then one may ignore common variants if the results of genomewide association studies have demonstrated that there are no common variants with a large effect size. Another approach is to group variants together and analyse them jointly. This may be done at the level of the gene, so that any correction only needs to be applied for the number of genes tested rather than the number of variants. Additionally, one may group genes into sets according to biological function and this can be viewed as a way of reducing multiple testing still further. Thus, if one is unable to conclusively implicate a gene then one may at least succeed in implicating a pathway.

A recent case-control study of schizophrenia using whole exome sequence from over 5000 subjects provides a useful illustration of these approaches (Purcell et al., 2014). Attention was focussed on genes deemed a priori to be of interest based on previous GWAS, CNV and de novo SNV studies. Analyses were carried out with three different criteria for including variants based on their predicted effect on gene function. For each of these, three sets of allele frequency were used to include variants: singletons, up to 0.1% or up to 0.5%, so that overall nine sets of analyses were carried out. Individual variants were tested and in addition two gene-based tests were applied: a one-sided burden test for increased rare variants in cases and the SNP-set (sequence) kernel association test (SKAT), which tests for differences in case-control allele frequencies in either direction (Wu et al., 2011). A polygenic burden test was applied to the predefined set of genes and also to subsets defined on the basis of biological function or through having been implicated by de novo SNVs. None of the variant and gene-based tests achieved statistical significance but the study did demonstrate increased variant alleles in the gene sets.

As has been suggested previously (Madsen & Browning, 2009; Curtis, 2012), an alternative approach to carrying out repeated analyses with different sets of variants included according to effect and/or frequency is to carry out a combined analysis which accords different weights to different variants. Such an approach is implemented in the SCOREASSOC program, which provides a parabolic function to give higher weights to rarer alleles and which allows a functional weight to be specified which can be based on the predicted effect of the variant.

Materials and Methods

Data Used

In order to assess the performance of the weighted burden test in a real world example, it was applied to data produced by the UK10K project (The UK10K Consortium, 2015). Two cohorts of subjects were used, selected from the UK10K exomes arm. The OB cohort consisted of 982 subjects from the Severe Childhood Onset Obesity Project (Wheeler et al., 2013) and the SZ cohort consisted of 1392 subjects with schizophrenia recruited from five British centres. All subjects were British. Although a small proportion of schizophrenia subjects consisted of between two and five members of the same multiply affected pedigrees, for purposes of analysis all subjects were treated as if they were unrelated. The reason for using these two cohorts, rather than other subjects included in UK10K, was primarily that they represented two groups, each of which was phenotypically fairly homogeneous and which had similar geographical origins. The "case–case" design for association studies has the advantage that one does not require an additional set of controls but may have the disadvantage that if allele frequencies differ between the groups then one may not know which is the relevant phenotype (Curtis et al., 2011). As described elsewhere (The UK10K Consortium, 2015), the exome was targeted with the Agilent SureSelect 50Mb V3 exome library, followed by Illumina next generation sequencing with 75bp paired-end reads. An average read depth of 79x was achieved in the bait regions.

Variants were called with samtools/bcftools version 0.1.19-3-g4b70907. GATK Unified Genotyper (v1.6-13-g91f02df) was only used to recall at SNP sites discovered by samtools. This was to enable VQSR filtering of SNP calls. Three filters were applied to SNPs: LowQual, Description = "Low quality variant according to GATK (GATK)"; MinVQSLOD, Description = "Minimum VQSLOD score [SNPs:-1.9667, truth sensitivity 99.48]"; SnpGap, Description = "SNP within INT bp around a gap to be filtered [10]." All SNP sites that did not fail these filters were marked as PASS. For the purpose of the current analyses, a number of additional constraints were applied to the downloaded VCF files to exclude some variants from analysis. Only single nucleotide variants (SNVs), not indels, were considered. Variants were excluded if they did not have a PASS in the information field, if there were more than five genotypes missing in either cohort or if the heterozygote count was smaller than both homozygote counts in both cohorts. At a subject level, variants were excluded if they had a genotype quality score less than 30.

Ethics Statement

This paper involves the analysis of data produced by UK10K. All subjects gave informed consent and each group received approval from the appropriate Research Ethics Committee in the United Kingdom.

Method of Analysis

Custom software was written to extract information for one gene at a time from the master VCF files. The variants from all transcripts of each refseq gene were extracted, using only variants called from within the targets. Variants were annotated using the hg19 reference sequence and where a variant had a different effect in different transcripts the one with the largest effect was used. The variants for each gene were analysed using the SCOREASSOC program. Rarer variants were accorded a higher weight than commoner ones, such that an extremely rare variant with minor allele frequency (MAF) close to 0 would be allocated a weight 10 times higher than a common one with MAF of 0.5, with variants of intermediate frequencies being allocated intermediate weights using a parabolic function (Madsen & Browning, 2009; Curtis, 2012). The allele of interest was considered to be the rarer allele, even if the reference allele was rarer than the alternate allele. If any variant had more than one alternate allele then these were grouped together so that the variant could be considered as biallelic. Weights were also allocated according to the effect of the variant. An arbitrary weighting scheme was devised, so that a variant producing a new stop codon within a coding region would be allocated 20 times the weight of an intergenic variant. Variants in coding regions were allocated a weight of 3 if they were synonymous and 10 if they were nonsynonymous. A full list of weights according to the effects of variants is presented in Table 1. The weights for each type of variant were chosen so that variants deemed more likely to have an effect on gene expression or protein function were allocated higher weights. Likewise, these weights were chosen to be of the same order of magnitude as those relating to the rarity of the variant. Thus, a variant might achieve a similar weight either through being very rare or through being likely to have a functional effect. Each variant was then allocated an overall weight, achieved by simply multiplying together the weight according to rarity and the weight according to effect. Thus rarer, more functional variants would be given a higher weight than common variants in noncoding regions. As described previously (Curtis, 2012), each subject would be assigned a score consisting of the sum of the weights of all the variant alleles possessed by that subject. An unpaired t test is used to test whether the average score for cases is higher than controls.

Table 1 Scheme used to assign weights to each variant according to the predicted effect. (In the analyses described in this report, INDEL variants were not in fact used.)

Predicted effect          Weight
NULL_CONSEQUENCE          1
INTERGENIC                1
DOWNSTREAM                1
INTRONIC                  3
3PRIME_UTR                5
SYNONYMOUS_CODING         3
UPSTREAM                  5
5PRIME_UTR                5
SPLICE_SITE               5
STOP_LOST                 5
NON_SYNONYMOUS_CODING     10
CODINGINDEL               15
FRAMESHIFT_CODING         20
STOP_GAINED               20

In the present study we wished to consider the results in two ways, firstly designating the SZ cohort as cases with the OB cohort as controls and then the other way round. All results were expressed as a signed log p value (SLP), this being the absolute value of the logarithm base 10 of the p value from the t test, given a positive sign if there was an excess of rare, functional variants in SZ cases and a negative sign if the excess was in OB cases. Two sets of analyses were performed with different categories of variant. The broad category included all valid variants. The narrow category was restricted to splice site or nonsynonymous or stop variants and having MAF < 0.1 in at least one of the cohorts.

The SLPs of individual genes were considered. Also, some genes were grouped into sets of a priori interest. For SZ, the same sets were used as described in Table S2 of the schizophrenia exome study, consisting of postsynaptic density (PSD), calcium channel, FMRP targets and SZ de novo (Purcell et al., 2014). For obesity, genes listed in OMIM were used, consisting of NR0B2, SDC3, POMC, GHRL, PPARG, UCP1, CART, ADRB2, PPARGC1B, SIM1, ENPP1, ADRB3, UCP3, AGRP, PYY, MC4R, LEP, LEPR and PCSK1.

Database Submission

Variants which appeared to be associated with the phenotypes studied were submitted to the Human Variation database at NCBI (http://www.ncbi.nlm.nih.gov/).

Results

Distribution of Test Statistic

There were 1,028,678 valid variants in 20,438 genes. The SLPs produced from the broad and narrow categories of variant were highly correlated, r = 0.65. The Q:Q plots are displayed in Figure 1. These show that when there was an excess of rare, functional variants in SZ subjects and the SLP was positive then the values obtained conformed well with null hypothesis expectations. However, when the excess was in the OB subjects then the line for the negative SLPs is steeper, indicating that the p value obtained is somewhat anticonservative. To explore the possibility that this might have happened because the scores were not normally distributed, Wilcoxon's signed rank test was applied instead of a t test but almost identical Q:Q plots were obtained. Taking a suitable threshold for "genomewide significance" as −log10(0.05 × 1/20438) = 5.6, only one gene, NSF, almost reached this with SLP = −5.5 in the broad analysis. However, this is fairly meaningless if one takes into account the nonconservative nature of the test for the negative SLPs and overall the results do not produce real evidence for the involvement of any particular gene.

Although the analyses did not highlight any genes reaching conventional criteria for statistical significance once appropriate allowance was made for the number of genes tested, the results could still be used to rank genes, with the idea being that genes contributing to risk of SZ or OB might tend to have the highest or lowest ranks respectively. Also, one might expect to see genes belonging to the predefined sets tending to have more extreme SLPs than the rest. However, when this was formally tested there was no evidence for such enrichment. For each set, the genes within it had the same average SLP as the others not in the set. Nevertheless, ranking genes according to SLP did draw attention to some individual genes which arguably are of interest. The highest and the lowest
ranked genes for the broad and narrow analyses are shown in Table 2. The full results for all genes are provided in Table S1.

Figure 1 Q:Q plots for SLP obtained from SCOREASSOC compared to expected under null hypothesis (axes labelled SLP(q) and SLP(p)). Positive SLPs indicate an excess of rare, functional variants in SZ subjects, negative SLPs indicate an excess in OB subjects. (A) Shows results using broad category of variants, (B) for narrow category.

Table 2 Highest and lowest ranked genes with corresponding SLPs using SZ and OB definitions of caseness and including broad and narrow categories of variant.

Highest SLPs (SZ cases)                   Lowest SLPs (OB cases)
Broad category       Narrow category      Broad category       Narrow category
Symbol        SLP    Symbol        SLP    Symbol        SLP    Symbol        SLP
DFNA5         4.3    MC4R          3.8    NSF           −5.5   SRSF8         −4.8
GTF3A         4.1    DONSON        3.4    CCDC58        −4.8   KIAA0947      −4.6
MC4R          3.8    SLAIN1        3.3    MAPT          −4.7   SLC17A2       −4.5
NLGN2         3.5    ARR3          3.3    KIAA0947      −4.7   HIBCH         −4.4
CRP           3.4    ZNF507        3.3    HIST1H4A      −4.7   SCEL          −4.3
SCARNA11      3.4    ADCYAP1R1     3.2    STH           −4.6   DUOX1         −4.3
TRIM77P       3.2    DFNA5         3.1    SNORD115      −4.6   TGM2          −4.1
FHIT          3.1    NPHS1         3.0    TGM2          −4.5   ATRX          −4.0
FAM212B       3.0    SKIV2L        3.0    SRSF8         −4.5   MFSD1         −3.9
AGXT          2.9    ARSA          2.9    RPAIN         −4.3   PXDN          −3.8
IL36B         2.9    PNN           2.9    MERTK         −4.3   CD164L2       −3.8
GALNTL5       2.9    SCO1          2.9    HIST1H2AJ     −4.2   OR6N2         −3.8
WDR24         2.9    OR2H1         2.8    AK4           −4.1   TINAG         −3.7
GABRA3        2.9    OPN4          2.8    LOC100128977  −4.1   MUC17         −3.6
MCCD1         2.8    WDYHV1        2.8    TCEB3         −4.0   MYL10         −3.6
FBXL16        2.8    GTF2E1        2.8    FUT2          −4.0   TCEB3         −3.5
PROCR         2.8    PSMB3         2.8    CRHR1         −3.9   FNDC9         −3.5
PLRG1         2.7    NFU1          2.7    IMP5          −3.9   GDF3          −3.5
OR2H1         2.7    CD8A          2.7    SLC17A2       −3.9   LMAN1L        −3.5
C2orf89       2.7    RXRB          2.6    ERBB2IP       −3.9   LILRA1        −3.4
FXYD2         2.6    UBE2Q2        2.6    NDST1         −3.8   TADA2A        −3.3
GTF2E1        2.6    CRP           2.6    HDAC1         −3.8   ZNF556        −3.3
SPINK8        2.6    MARK1         2.6    CD164L2       −3.8   SUSD4         −3.2
USP48         2.6    LURAP1        2.5    SYF2          −3.8   SYF2          −3.2
DONSON        2.5    NAIF1         2.5    TAB1          −3.8   HIBADH        −3.2
SACM1L        2.5    GGCX          2.5    PDE6B         −3.7   UFL1          −3.2
ZNF230        2.5    SIK2          2.5    COL4A4        −3.6   C8orf74       −3.2
GGT5          2.5    SPAG17        2.5    HSPA1A        −3.6   GPR107        −3.2
KRTAP13-1     2.4    DLG1          2.4    MUSK          −3.5   GBA           −3.2
LURAP1        2.4    MYH11         2.4    ARHGAP17      −3.5   OR5AR1        −3.2

One gene that appears to be of interest is MC4R, melanocortin 4 receptor, which is the highest ranked when the narrow category of variants is used and the third highest using the broad category, implying that there is an excess of rarer, functional variants among SZ subjects. In fact, though, MC4R variants have previously been reported to both increase and decrease the risk for obesity rather than having an effect on schizophrenia (Vaisse et al., 1998; Yeo et al., 1998; Mergen et al., 2001; Miraglia et al., 2002; Heid et al., 2005; Young et al., 2007; Chambers et al., 2008; Loos et al., 2008). To assist in understanding this result more fully, the raw output from the SCOREASSOC program is presented in Table 3. This shows that there is a broadly similar distribution of variants between OB and SZ except that two nonsynonymous variants, at positions 18:58038832 and 18:58039276, are more frequent in SZ subjects. There is also a single, highly weighted stop variant at 18:58039478 in a SZ subject which would also make a contribution to the high MLP for this gene. The two nonsynonymous variants are rs52820871 (chr18.hg19:g.58038832T>G) and rs2229616 (chr18.hg19:g.58039276C>T), which are already well established to be associated with lower BMI (Geller et al. 2004; Heid et al. 2005; Stutzmann et al. 2007; Young et al. 2007; Wang et al. 2010; Evans et al. 2014; Malzahn et al. 2014). The stop variant is rs13447324 (chr18.hg19:g.58039478G>T), Y35*, which is likewise well established as a cause of autosomal dominant obesity (Hinney et al., 1999; Sina et al., 1999; Farooqi & O'Rahilly, 2006). Thus, these results for MC4R are consistent with the previously reported effects of the two nonsynonymous variants as being protective against obesity but are anomalous in that the stop variant is observed in an SZ subject and not in any OB subject.

Table 3 Output from SCOREASSOC for the analysis of MC4R using the broad category of variants and treating SZ subjects as cases. Positions are hg19, chr18.

Position  OB_AA OB_AB OB_BB OB_MAF  SZ_AA SZ_AB SZ_BB SZ_MAF  Weight  Variant effect         VCF annotation
58038462  982   0     0     0.0000  1391  1     0     0.0004  9.99    DOWNSTREAM             T>C
58038470  981   1     0     0.0005  1391  0     0     0.0000  9.99    DOWNSTREAM             C>G
58038489  982   0     0     0.0000  1391  1     0     0.0004  9.99    DOWNSTREAM             G>A
58038514  982   0     0     0.0000  1391  1     0     0.0004  9.99    DOWNSTREAM             C>T
58038524  979   3     0     0.0015  1386  6     0     0.0022  9.93    DOWNSTREAM             G>A
58038612  982   0     0     0.0000  1391  1     0     0.0004  99.92   NON_SYNONYMOUS_CODING  C>T:PolyPhen:benign(0.048)
58038826  982   0     0     0.0000  1391  1     0     0.0004  99.92   NON_SYNONYMOUS_CODING  C>T:PolyPhen:possibly_damaging(0.45)
58038829  982   0     0     0.0000  1390  2     0     0.0007  99.85   NON_SYNONYMOUS_CODING  C>T:PolyPhen:probably_damaging(1)
58038832  968   14    0     0.0071  1361  31    0     0.0111  96.62   NON_SYNONYMOUS_CODING  T>G:PolyPhen:benign(0.008)
58038989  982   0     0     0.0000  1391  1     0     0.0004  49.96   SYNONYMOUS_CODING      G>A
58039013  982   0     0     0.0000  1391  1     0     0.0004  49.96   SYNONYMOUS_CODING      A>G
58039049  981   1     0     0.0005  1392  0     0     0.0000  49.96   SYNONYMOUS_CODING      C>T
58039203  982   0     0     0.0000  1391  1     0     0.0004  99.92   NON_SYNONYMOUS_CODING  G>A:PolyPhen:probably_damaging(0.997)
58039215  982   0     0     0.0000  1390  2     0     0.0007  99.85   NON_SYNONYMOUS_CODING  T>C:PolyPhen:probably_damaging(0.99)
58039219  981   1     0     0.0005  1392  0     0     0.0000  99.92   NON_SYNONYMOUS_CODING  C>A:PolyPhen:probably_damaging(0.996)
58039276  964   18    0     0.0092  1337  55    0     0.0198  94.55   NON_SYNONYMOUS_CODING  C>T:PolyPhen:benign(0.042)
58039301  982   0     0     0.0000  1391  1     0     0.0004  49.96   SYNONYMOUS_CODING      G>A
58039402  982   0     0     0.0000  1391  1     0     0.0004  99.92   NON_SYNONYMOUS_CODING  C>T:PolyPhen:probably_damaging(0.985)
58039473  982   0     0     0.0000  1391  1     0     0.0004  99.92   NON_SYNONYMOUS_CODING  T>A>C:PolyPhen:benign(0)
58039478  982   0     0     0.0000  1391  1     0     0.0004  199.85  STOP_GAINED            G>T
58039552  982   0     0     0.0000  1391  1     0     0.0004  99.92   NON_SYNONYMOUS_CODING  T>C:PolyPhen:benign(0)
58039642  981   1     0     0.0005  1391  1     0     0.0004  49.92   5PRIME_UTR             G>C

The table shows genotype counts, frequencies, weights and effects for each variant. The weighted scores were calculated for each subject and the means compared. Mean scores OB = 3.4, SZ = 7.0, t(2372 df) = 3.8, p = 0.00015, SLP = 3.8.

Another gene of interest is NLGN2, neuroligin 2, which is ranked fourth when using the broad category of variants. This codes for a postsynaptic protein and a previous study reported novel, functional mutations in NLGN2, each occurring in one or two subjects with schizophrenia (Sun et al., 2011). The SCOREASSOC output shows that the SLP of 3.5 reflects a general excess of rare variants among SZ subjects, mostly not affecting amino acid sequence. This effect is spread across dozens of variants, such that one cannot definitively identify any which might individually be associated with risk of schizophrenia.

Findings for genes which may be of interest are summarised in Table 4. One phenomenon to be aware of is that because of LD relationships between variants the same signal can be picked up by multiple genes. This is the case for CRHR1, MAPK, STH and IMP5. A haplotype spanning all these genes is somewhat commoner in OB than SZ subjects. If it represents a real signal then CRHR1 seems to be the gene most likely to be responsible. Likewise, both HIST1H2AJ and HIST1H4A are highly ranked. However, the variants making the most substantial contributions to the MLP for each gene are only 2 Mb apart and are in strong LD with each other, so there is in fact only one signal.

A final point worth making is that for some genes the SLP is driven by just one or two variants with a very marked difference in allele frequencies between cohorts whereas for others there are large numbers of very rare variants, with a tendency for these to occur more commonly in one cohort or the other. For example, in SLAIN1 there is a nonsynonymous variant in 18 SZ subjects and only 1 OB subject. Likewise, in IL36B there are stop variants at two loci which between them are seen in nine SZ subjects and no OB subject, in DLG1 there is a nonsynonymous variant in 10 SZ subjects and no OB subject and in CRP there is a nonsynonymous variant previously claimed to be associated with CRP levels which is present in 17 SZ and two OB subjects. However, in other genes, such as NLGN2, GTF3A and HIBADH, a number of very rare variants collectively account for the SLP.

Table 4 List of some of the highest and lowest ranked genes with explanatory notes. Each entry gives the symbol, SLP, analysis and gene name, with comments on the indented line below.

MC4R  3.8  SZ, broad  Melanocortin 4 receptor
  Increased frequency among SZ subjects of two nonsynonymous variants previously reported to be protective against obesity
NLGN2  3.5  SZ, broad  Neuroligin 2
  Codes for postsynaptic protein and regarded as candidate gene for schizophrenia. Overall generally increased numbers of rare variants in SZ subjects with no individual variant strongly associated
CRP  3.4  SZ, broad  C-reactive protein, pentraxin-related
  Involved in immunity and inflammation systems. Variants generally commoner in SZ subjects. Nonsynonymous variant at 1:159683814 is present in 17 SZ and 2 OB subjects. This is rs77832441, which has been reported to be associated with reduced CRP levels
DONSON  3.4  SZ, narrow  Downstream neighbour of SON
  Function unknown. Nonsynonymous variants at 21:34950728 seen in 21 SZ against 7 OB subjects and at 21:34955922 in 10 SZ and 0 OB subjects
GABRA3  2.9  SZ, broad  GABA A receptor, alpha 3
  Some previous reports of involvement of GABA receptors in schizophrenia. However, the gene is on the X chromosome and the result is likely an artifact due to counting hemizygote males as homozygotes
GTF3A  4.1  SZ, broad  General transcription factor IIIA
  Result is driven by modest excess of many different variants
IL36B  2.9  SZ, broad  Interleukin 36, beta
  Involved in inflammation. There are stop variants at 2:113785602 and 2:113788694 in 7 and 2 SZ subjects and no OB subjects
ADCYAP1R1  3.2  SZ, narrow  Adenylate cyclase activating polypeptide 1 (pituitary) receptor
  Previous reports of association with schizophrenia and involvement in adipose tissue expandability. Nonsynonymous variants at 7:31104520 and 7:31124376 commoner in SZ than OB subjects
ARSA  2.9  SZ, narrow  Arylsulfatase A
  Mutations in this gene are the known cause of metachromatic leucodystrophy, which can have features similar to schizophrenia. A few very rare nonsynonymous and splice site variants are seen only in SZ cases and the common nonsynonymous variant at 22:51065361 (rs6151415) has MAF 0.085 in SZ and 0.061 in OB subjects
DLG1  2.4  SZ, narrow  Discs, large homolog 1 (drosophila)
  Involved in synaptogenesis. A number of nonsynonymous variants somewhat commoner in SZ than OB subjects and nonsynonymous variant at 3:196792663 occurs in 10 SZ and no OB subjects
SIK2  2.5  SZ, narrow  Salt-inducible kinase 2
  Known to be involved in lipid homeostasis and adipogenesis. A number of nonsynonymous variants occur only in SZ subjects and nonsynonymous variant at 11:111590605 occurs in 14 SZ and 2 OB subjects
SLAIN1  3.3  SZ, narrow  SLAIN motif family, member 1
  Involved in neurodevelopment. Nonsynonymous variant at 13:78320801 occurs in 18 SZ subjects and 1 OB subject
UBE2Q2  2.6  SZ, narrow  Ubiquitin-conjugating enzyme E2Q family member 2
  Differentially expressed in mice with diet-induced obesity. Excess of several nonsynonymous variants in SZ versus OB subjects
ZNF507  3.3  SZ, narrow  Zinc finger protein 507
  Disruption associated with neurodevelopmental disorders. Several nonsynonymous variants common in SZ subjects and nonsynonymous variant at 19:32844995 occurs in 9 SZ and no OB subjects
CRHR1  −3.9  OB, broad  Corticotropin-releasing hormone receptor 1
  Previously implicated in physiological pathways including obesity and response to stress. A haplotype of several noncoding variants is somewhat commoner in OB than SZ subjects. This haplotype extends through MAPK, STH and IMP5, accounting for their MLPs
HIST1H2AJ  −4.2  OB, broad  Histone cluster 1, H2aj
  Downstream variant at 6:27782031 has MAF 0.14 in OB and 0.11 in SZ subjects
HIST1H4A  −4.7  OB, broad  Histone cluster 1, H4a
  3' UTR variant at 6:26022244 has MAF 0.14 in OB and 0.096 in SZ subjects. However, this variant is in LD with the one at 6:27782031 so these signals are not independent
MAPT  −4.7  OB, broad  Microtubule-associated protein tau
  A haplotype of several common variants is slightly commoner among OB subjects
MUSK  −3.6  OB, broad
  Interacts with NSF. Splice site variant at 9:113449377 has MAF 0.07 in OB and 0.05 in SZ subjects
NSF  −5.5  OB, broad  N-ethylmaleimide-sensitive factor
  Interacts with MUSK. Splice site variant at 17:44788310 has MAF 0.28 in OB and 0.23 in SZ subjects
SNORD115  −4.6  OB, broad  Small nucleolar RNA, C/D box 115-15
  In Prader–Willi region and regulates alternative splicing of CRHR1. Several variants have higher MAF in OB than SZ subjects
GDF3  −3.5  OB, narrow  Growth differentiation factor 3
  Implicated in regulation of adiposity and energy expenditure. Nonsynonymous variant at 12:7842587 has frequency 0.040 in OB and 0.022 in SZ subjects
HIBADH  −3.2  OB, narrow  3-hydroxyisobutyrate dehydrogenase
  Differentially expressed in T2DM. SLP is driven by splice site or nonsynonymous variants of which 7 are singletons occurring in OB subjects and the other, at 7:27570942, occurs in 8 OB and 3 SZ subjects

Discussion

The anticonservative nature of the test when using the OB cohort as cases was viewed as slightly puzzling, given that it had previously been shown to behave well when applied to simulated data (Curtis, 2012). The fact that the results treating the SZ cohort as cases were not markedly anticonservative makes it seem less likely that this phenomenon might be due to population effects such as stratification or linkage disequilibrium (LD) between variants. An alternative explanation is that the variance of the scores is underestimated when the excess occurs in OB subjects and this might, for example, result from subjects being related to each other or being the offspring of consanguineous matings. However, attempts to model relatedness by duplicating some subjects failed to reproduce the observed Q:Q plots. Likewise, treating homozygotes as heterozygotes, in an attempt to nullify the effects of consanguinity, failed to produce a closer fit to expected values. The weighted burden method does not adjust for population stratification and it assumes that the cohorts are ethnically well matched and that subjects are unrelated. It is not clear how violations of these assumptions might impact on the results obtained.

The finding that no genes produce results withstanding correction for multiple testing is in line with that of the previous schizophrenia exome study, which used a somewhat larger sample size. It is becoming apparent that next generation sequencing studies applied to complex diseases in samples numbering the low thousands are hypothesis-generating and are unlikely to produce results which conclusively implicate individual variants or genes. Nevertheless, some findings do appear to be of interest and worthy of attempts at follow-up, although there is a question of how much to focus attention on genes which seem to be plausible candidates without running the risk of overlooking novel findings which might point to previously unsuspected mechanisms of pathogenesis. For example, neuronal and inflammatory genes are thought to be involved in the susceptibility to schizophrenia and in this light the results for NLGN2, ARSA, DLG1, SLAIN1, ZNF507, CRP and IL36B seem interesting. Likewise, given previous findings related to obesity, the results for SIK2, CRHR1, SNORD115, GDF3 and HIBADH may be of note, especially in view of the fact that SNORD115 has been reported to be involved in the alternative splicing of CRHR1 (Kishore et al., 2010).

Following up suggestive results in larger samples may not be straightforward. For variants which are not extremely rare this might simply involve carrying out genotyping in additional, larger case-control cohorts. However, some genes are highlighted on the basis of an excess of many different variants, each occurring in only one or two subjects. Validating these findings might require sequencing the gene in large numbers of subjects. It has been suggested that an alternative approach to following up an extremely rare variant is to carry out family studies of the subjects possessing the variant (Curtis, 2011). Thus, if there are affected relatives who also have the variant one gains confidence that it has an effect whereas an affected relative not sharing the variant casts doubt on its relevance.

The results illustrate a problem with the analytic approach which was originally proposed, which was to test for an excess of rare and/or functional variants in cases compared with controls. In fact, three genes possibly involved in susceptibility to obesity, MC4R, ADCYAP1R1 and SIK2, produced highly ranked positive SLPs indicating that such an excess was occurring in SZ rather than OB subjects. Thus, had the method been applied as intended in a case-control study with OB chosen as the case phenotype then these findings might have been overlooked. It was argued previously that there were biological and statistical reasons to expect that, when dealing with a fairly rare disease with a deleterious effect on fitness, rare variants identified in a case-control study would be more likely to increase risk than reduce it (Curtis, 2012). Nevertheless, the example of MC4R makes it clear that one cannot rely on this. In the light of this, it seems that one should attend to both strongly positive and negative SLPs in order to detect either an increase or decrease in variants among cases. Of course, this would not at all address the issue of different variants within the same gene having effects in different directions, in which case a test such as SKAT would be needed alongside the weighted burden test.

Approaches such as this might benefit from an improvement in the ability to predict the likely consequences of a change in DNA sequence. The weighting scheme used was crude and fairly arbitrary. One could easily imagine introducing other considerations, such as utilising SIFT and PolyPhen scores or information on regulatory function (Ng & Henikoff, 2003; Adzhubei et al., 2010). If effects could be predicted accurately then weights could be assigned on a more rational basis. On the other hand, it can be argued that the whole point of performing empirical studies is that one does not know which variants contribute to risk until one sees the extent to which they are associated with a disease phenotype. The effects of varying the weights given to different types of variant were not explored systematically. One might expect that varying the weights would have some impact on the SLPs obtained and their ranks but, as is often the case, the advantages of carrying out exploratory analyses in order to find a more appropriate model may be outweighed by the difficulties in interpreting results obtained from testing multiple scenarios.

The weighted burden test provides a quick, simple and intuitive test summarising the extent to which a gene harbours more functional, rare variants in cases than controls. It can be used to rank genes and highlight genes of interest and one can then look at the results for individual genes and variants in more detail. The method allows the user to test all variants simultaneously in a single analysis, implementing a crude model of variant effects which may have some face validity. On the other hand, it requires that weights be specified in advance and makes no attempt to fit them to the observed data. The method is implemented only for dichotomous phenotypes although it might be possible to extend it to be applied to quantitative measures. In the light of the results obtained, it seems sensible to implement a two-tailed version of the approach, in that one should test for an excess of variants either in cases or in controls. However, this would not be helpful if some variants within a gene acted to increase risk and others to decrease it and in this situation one would not expect the method to be successful. Thus it should not be used in isolation. Hopefully, it will be possible to refine such approaches further once there is greater knowledge regarding the nature of genetic variation which influences risk of non-Mendelian disease.

Acknowledgement

Thanks to Sadaf Farooqi for helpful comments on MC4R variants.

a nonsense and a frameshift mutation associated with dominantly inherited obesity in humans. J Clin Endocrinol Metab 84, 1483–
Kishore, S., Khanna, A., Zhang, Z., Hui, J., Balwierz, P.J., Stefan,
This study makes use of data generated by the M., Beach, C., Nicholls, R. D., Zavolan, M., & Stamm, S. (2010) UK10K Consortium, derived from samples from UK10K_ The snoRNA MBII-52 (SNORD 115) is processed into smaller NEURO_Iop_Collier, UK10K_NEURO_UKSCZ, RNAs and regulates alternative splicing. Hum Mol Genet 19, 1153– UK10K_NEURO_ABERDEEN, UK10K_EURO_ Loos, R. J., Lindgren, C. M., Li, S., Wheeler, E., Zhao, J. H., NEDINBURGH, UK10K_NEURO_EDINBURGH, Prokopenko, I., Inouye, M., Freathy, R. M., Attwood, A. P., & UK10K_NEURO_UCL and UK10K_OBESITY_SCOOP. Beckmann, J.S. (2008) Common variants near MC4R are asso- A full list of the investigators who contributed to the genera- ciated with fat mass, weight and risk of obesity. Nat Genet 40, tion of the data is available online (http://www.UK10K.org). 768–775. Funding for UK10K was provided by the Wellcome Trust Madsen, B. E. & Browning, S. R. (2009) A groupwise association under award WT091310. test for rare mutations using a weighted sum statistic. PLoS Genetics 5, e1000384. Malzahn, D., Muller-Nurasyid, ¨ M., Heid, I. M., Wichmann, H.- E., & Bickeboller, ¨ H. (2014) Controversial association results for Conflict of Interest INSIG2 on body mass index may be explained by interactions with age and with MC4R. Eur J Hum Genet, 22, 1217–24. The authors declare they have no conflict of interest. Mergen, M., Mergen, H., Ozata, M., Oner, R., & Oner, C. (2001) Rapid communication: A novel melanocortin 4 receptor (MC4R) gene mutation associated with morbid obesity. J Clin Endocrinol References Metab 86, 3448-3448. Miraglia, D. G. E., Cirillo, G., Nigro, V., Santoro, N., D’urso, L., Adzhubei, I. A., Schmidt, S., Peshkin, L., Ramensky, V. E., Gerasi- Raimondo, P., Cozzolino, D., Scafato, D., & Perrone, L. (2002) mova, A., Bork, P., Kondrashov, A. S. & Sunyaev, S. R. (2010) A Low frequency of melanocortin-4 receptor (MC4R) mutations in method and server for predicting damaging missense mutations. 
a Mediterranean population with early-onset obesity. Int J Obes Nat Methods 7, 248–249. Relat Metab Disord 26, 647–651. Chambers, J. C., Elliott, P., Zabaneh, D., Zhang, W., Li, Y., Froguel, Ng, P. C. & Henikoff, S. (2003) SIFT: Predicting amino acid changes P., Balding, D., Scott, J. & Kooner, J. S. (2008) Common genetic that affect protein function. Nucleic Acids Res 31, 3812–3814. variation near MC4R is associated with waist circumference and Purcell, S. M., Moran, J. L., Fromer, M., Ruderfer, D., Solovieff, insulin resistance. Nat Genet 40, 716–718. N., Roussos, P., O’dushlaine, C., Chambert, K., Bergen, S. E., & Curtis, D. (2011) Assessing the contribution family data can make to Kahler, ¨ A. (2014) A polygenic burden of rare disruptive mutations case-control studies of rare variants. Ann Hum Genet 75, 630–638. in schizophrenia. Nature, 506, 185–90. Curtis, D. (2012) A rapid method for combined analysis of common Sina, M., Hinney, A., Ziegler, A., Neupert, T., Mayer, H., Siegfried, and rare variants at the level of a region, gene, or pathway. Adv W., Blum, W. F., Remschmidt, H., & Hebebrand, J. (1999) Appl Bioinform Chem 5,1–9. Phenotypes in three pedigrees with autosomal dominant obesity Curtis, D., Vine, A.E., Mcquillin, A., Bass, N. J., Pereira, A., Kan- caused by haploinsufficiency mutations in the melanocortin-4 re- daswamy, R., Lawrence, J., Anjorin, A., Choudhury, K. & Datta, ceptor gene. Am J Hum Genet 65, 1501–1507. S. R. (2011) Case-case genome wide association analysis reveals Stutzmann, F., Vatin, V., Cauchi, S., Morandi, A., Jouret, B., Landt, markers differentially associated with schizophrenia and bipolar O., Tounian, P., Levy-Marchal, C., Buzzetti, R., & Pinelli, L. disorder and implicates calcium channel genes. Psychiatr Genet 21, (2007) Non-synonymous polymorphisms in melanocortin-4 re- 1–4. ceptor protect against obesity: The two facets of a Janus obesity Evans, D. S., Calton, M. A., Kim, M. J., Kwok, P.-Y., Miljkovic, gene. 
Hum Mol Genet 16, 1837–1844. I., Harris, T., Koster, A., Liu, Y., Tranah, G. J., & Ahituv, N. Sun, C., Cheng, M.-C., Qin, R., Liao, D.-L., Chen, T.-T., Koong, (2014) Genetic association study of adiposity and melanocortin-4 F.-J., Chen, G., & Chen, C.-H. (2011) Identification and func- receptor (MC4R) common variants: Replication and functional tional characterization of rare mutations of the neuroligin-2 gene characterization of non-coding regions. PLoS One 9, e96805. (NLGN2) associated with schizophrenia. Hum Mol Genet 20, Farooqi, I. S. & O’rahilly, S. (2006) Genetics of obesity in humans. 3042–3051. Endocr Rev 27, 710–718. The UK10K Consortium. (2015) The UK10K project identifies rare Geller, F., Reichwald, K., Dempfle, A., Illig, T., Vollmert, C., Her- variants in health and disease. Nature. doi: 10.1038/nature14962. pertz, S., Siffert, W., Platzer, M., Hess, C., & Gudermann, T. Vaisse, C., Clement, K., Guy-Grand, B., & Froguel, P. (1998) A (2004) Melanocortin-4 receptor gene variant I103 is negatively frameshift mutation in human MC4R is associated with a domi- associated with obesity. Am J Hum Genet 74, 572–581. nant form of obesity. Nat Genet 20, 113–114. Heid, I., Vollmert, C., Hinney, A., Dor ¨ ing, A., Geller, F., Low ¨ el, Wang, D., Ma, J., Zhang, S., Hinney, A., Hebebrand, J., Wang, Y., H., Wichmann, H., Illig, T., Hebebrand, J., & Kronenberg, F. & Wang, H. J. (2010) Association of the MC4R V103I poly- (2005) Association of the 103I MC4R allele with decreased body morphism with obesity: A Chinese case–control study and meta- mass in 7937 participants of two population based surveys. JMed analysis in 55,195 individuals. Obesity 18, 573–579. Genet 42, e21-e21. Wheeler, E., Huang, N., Bochukova, E. G., Keogh, J. M., Lindsay, Hinney, A., Schmidt, A., Nottebom, K., Heibult, O., Becker, I., S., Garg, S., Henning, E., Blackburn, H., Loos, R. J., & Wareham, Ziegler, A., Gerber, G., Sina, M., Gorg, T., & Mayer, H. (1999) N. J. 
(2013) Genome-wide SNP and CNV analysis identifies Several mutations in the melanocortin-4 receptor gene including 48 Annals of Human Genetics (2016) 80,38–49 2015 The Authors. Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. Exome Study of Schizophrenia and Obesity common and low-frequency variants associated with severe early- Population based studies and meta-analysis of 29 563 individuals. onset obesity. Nat Genet 45, 513–517. Int J Obes 31, 1437–1441. Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M., & Lin, X. (2011) Rare-variant association testing for sequencing data with the se- quence kernel association test. Am J Hum Genet 89, 82–93. Supporting Information Yeo, G. S., Farooqi, I. S., Aminian, S., Halsall, D. J., Stanhope, R. G., & O’rahilly, S. (1998) A frameshift mutation in MC4R Additional Supporting Information may be found in the on- associated with dominantly inherited human obesity. Nat Genet line version of this article: 20, 111–112. Young, E. H., Wareham, N. J., Farooqi, S., Hinney, A., Hebe- Table S1 SLPs for all genes obtained from weighted burden brand, J., Scherag, A., O’rahilly, S., Barroso, I., & Sandhu, M. S. test using broad and narrow sets of variants. (2007) The V103I polymorphism of the MC4R gene and obesity: 2015 The Authors. Annals of Human Genetics (2016) 80,38–49 49 Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Annals of Human Genetics Wiley
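As an illustrative aside to the weighting scheme described in Materials and Methods, the following Python sketch shows one way the per-variant weight, the per-subject score and the gene-count threshold for the SLP could be computed. It is not the SCOREASSOC implementation. The article states only that the rarity weight is parabolic and ten times higher at MAF close to 0 than at MAF 0.5, so the exact form `1 + 9(1 - 2f)^2`, the use of the MAF pooled over both cohorts, and all function names are assumptions made here. Under those assumptions the sketch happens to reproduce the weight of 96.62 listed in Table 3 for the MC4R variant at 18:58038832.

```python
import math

# Functional weights copied from Table 1 of the article (subset only).
FUNCTIONAL_WEIGHT = {
    "INTERGENIC": 1,
    "DOWNSTREAM": 1,
    "NON_SYNONYMOUS_CODING": 10,
    "STOP_GAINED": 20,
}


def rarity_weight(maf, ratio=10.0):
    """Assumed parabolic form: `ratio` at MAF 0, falling to 1 at MAF 0.5."""
    return 1.0 + (ratio - 1.0) * (1.0 - 2.0 * maf) ** 2


def variant_weight(effect, maf):
    """Overall weight = functional weight multiplied by rarity weight."""
    return FUNCTIONAL_WEIGHT[effect] * rarity_weight(maf)


def subject_score(allele_counts, weights):
    """Sum of the weights of all variant alleles carried by one subject."""
    return sum(n * w for n, w in zip(allele_counts, weights))


def slp_threshold(n_genes, alpha=0.05):
    """Bonferroni-style threshold on |SLP| for n_genes tests."""
    return -math.log10(alpha / n_genes)


# MC4R variant at 18:58038832 (Table 3): 14 OB and 31 SZ heterozygotes
# among 982 + 1392 subjects; the pooled MAF is an assumption of this sketch.
maf = (14 + 31) / (2 * (982 + 1392))
w = variant_weight("NON_SYNONYMOUS_CODING", maf)
print(round(w, 2))                     # 96.62
print(round(slp_threshold(20438), 1))  # 5.6
```

A two-tailed reading of the results, as suggested in the Discussion, would compare the absolute value of each gene's SLP against this threshold rather than only the positive tail.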

Practical Experience of the Application of a Weighted Burden Test to Whole Exome Sequence Data for Obesity and Schizophrenia

Annals of Human Genetics , Volume 80 (1) – Jan 1, 2016

Publisher
Wiley
Copyright
Copyright © 2015 John Wiley & Sons Ltd/University College London
ISSN
0003-4800
eISSN
1469-1809
DOI
10.1111/ahg.12135
pmid
26474449

Abstract

doi: 10.1111/ahg.12135

Practical Experience of the Application of a Weighted Burden Test to Whole Exome Sequence Data for Obesity and Schizophrenia

David Curtis and The UK10K Consortium
UCL Genetics Institute, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK

Corresponding author: DAVID CURTIS, UCL Genetics Institute, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK. E-mail: d.curtis@ucl.ac.uk, Tel: 020 8702 3200, Fax: 020 3108 2194

Annals of Human Genetics (2016) 80, 38–49. 2015 The Authors. Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Summary

For biological and statistical reasons it makes sense to combine information from variants at the level of the gene. One may wish to give more weight to variants which are rare and those that are more likely to affect function. A combined weighting scheme, implemented in the SCOREASSOC program, was applied to whole exome sequence data for 1392 subjects with schizophrenia and 982 with obesity from the UK10K project. Results conformed fairly well with null hypothesis expectations and no individual gene was strongly implicated. However, a number of the higher ranked genes appear plausible candidates as being involved in one or other phenotype and may warrant further investigation. These include MC4R, NLGN2, CRP, DONSON, GTF3A, IL36B, ADCYAP1R1, ARSA, DLG1, SIK2, SLAIN1, UBE2Q2, ZNF507, CRHR1, MUSK, NSF, SNORD115, GDF3 and HIBADH. Some individual variants in these genes have different frequencies between cohorts and could be genotyped in additional subjects. For other genes, there is a general excess of variants at many different sites so attempts at replication would be more difficult. Overall, the weighted burden test provides a convenient method for using sequence data to highlight genes of interest.

Keywords: Association, exome, burden test, DNA variant

Introduction

Although next generation sequencing has been used extensively for the study of rare Mendelian diseases, there is less experience of its application to large case-control association studies of diseases with complex inheritance. Because of issues such as incomplete penetrance and allelic and locus heterogeneity, the task is to identify variants which are more frequent in cases rather than variants which are shared by all cases and not seen in any controls or in the general population. There are a number of approaches which can be used in an attempt to gain power by reducing the necessary correction for multiple testing. One general approach may be to restrict attention to variants which are judged to be “important” according to some criteria. These might relate to the predicted effect of the variant or to which gene it occurs in. The frequency of the variant might be used as a criterion; for example, if one is only interested in detecting variants with a large effect size then one may ignore common variants if the results of genomewide association studies have demonstrated that there are no common variants with a large effect size. Another approach is to group variants together and analyse them jointly. This may be done at the level of the gene, so that any correction only needs to be applied for the number of genes tested rather than the number of variants. Additionally, one may group genes into sets according to biological function and this can be viewed as a way of reducing multiple testing still further. Thus, if one is unable to conclusively implicate a gene then one may at least succeed in implicating a pathway.

A recent case-control study of schizophrenia using whole exome sequence from over 5000 subjects provides a useful illustration of these approaches (Purcell et al., 2014). Attention was focussed on genes deemed a priori to be of interest based on previous GWAS, CNV and de novo SNV studies. Analyses were carried out with three different criteria for including variants based on their predicted effect on gene function. For each of these, three sets of allele frequency were used to include variants: singletons, up to 0.1% or up to 0.5% so that overall nine sets of analyses were carried out. Individual variants were tested and in addition two gene-based tests were applied: a one-sided burden test for increased rare variants in cases and the SNP-set (sequence) kernel association test (SKAT), which tests for differences in case-control allele frequencies in either direction (Wu et al., 2011). A polygenic burden test was applied to the predefined set of genes and also to subsets defined on the basis of biological function or through having been implicated by de novo SNVs. None of the variant and gene-based tests achieved statistical significance but the study did demonstrate increased variant alleles in the gene sets.

As has been suggested previously (Madsen & Browning, 2009; Curtis, 2012), an alternative approach to carrying out repeated analyses with different sets of variants included according to effect and/or frequency is to carry out a combined analysis which accords different weights to different variants. Such an approach is implemented in the SCOREASSOC program, which provides a parabolic function to give higher weights to rarer alleles and which allows a functional weight to be specified which can be based on the predicted effect of the variant.

Materials and Methods

Data Used

In order to assess the performance of the weighted burden test in a real world example, it was applied to data produced by the UK10K project (The UK10K Consortium, 2015). Two cohorts of subjects were used, selected from the UK10K exomes arm. The OB cohort consisted of 982 subjects from the Severe Childhood Onset Obesity Project (Wheeler et al., 2013) and the SZ cohort consisted of 1392 subjects with schizophrenia recruited from five British centres. All subjects were British. Although a small proportion of schizophrenia subjects consisted of between two and five members of the same multiply affected pedigrees, for purposes of analysis all subjects were treated as if they were unrelated. The reason for using these two cohorts, rather than other subjects included in UK10K, was primarily that they represented two groups, each of which was phenotypically fairly homogeneous and which had similar geographical origins. The "case–case" design for association studies has the advantage that one does not require an additional set of controls but may have the disadvantage that if allele frequencies differ between the groups then one may not know which is the relevant phenotype (Curtis et al., 2011). As described elsewhere (The UK10K Consortium, 2015), the exome was targeted with the Agilent SureSelect 50Mb V3 exome library, followed by Illumina next generation sequencing with 75bp paired-end reads. An average read depth of 79x was achieved in the bait regions.

Variants were called with samtools/bcftools version 0.1.19-3-g4b70907. GATK Unified Genotyper (v1.6-13-g91f02df) was only used to recall at SNP sites discovered by samtools. This was to enable VQSR filtering of SNP calls. Three filters were applied to SNPs: LowQual, Description = "Low quality variant according to GATK (GATK)"; MinVQSLOD, Description = "Minimum VQSLOD score [SNPs:-1.9667, truth sensitivity 99.48]"; SnpGap, Description = "SNP within INT bp around a gap to be filtered [10]." All SNP sites that did not fail these filters were marked as PASS. For the purpose of the current analyses, a number of additional constraints were applied to the downloaded VCF files to exclude some variants from analysis. Only single nucleotide variants (SNVs), not indels, were considered. Variants were excluded if they did not have a PASS in the information field, if there were more than five genotypes missing in either cohort or if the heterozygote count was smaller than both homozygote counts in both cohorts. At a subject level, variants were excluded if they had a genotype quality score less than 30.

Ethics Statement

This paper involves the analysis of data produced by UK10K. All subjects gave informed consent and each group received approval from the appropriate Research Ethics Committee in the United Kingdom.

Method of Analysis

Custom software was written to extract information for one gene at a time from the master VCF files. The variants from all transcripts of each refseq gene were extracted, using only variants called from within the targets. Variants were annotated using the hg19 reference sequence and where a variant had a different effect in different transcripts the one with the largest effect was used. The variants for each gene were analysed using the SCOREASSOC program. Rarer variants were accorded a higher weight than commoner ones, such that an extremely rare variant with minor allele frequency (MAF) close to 0 would be allocated a weight 10 times higher than a common one with MAF of 0.5, with variants of intermediate frequencies being allocated intermediate weights using a parabolic function (Madsen & Browning, 2009; Curtis, 2012). The allele of interest was considered to be the rarer allele, even if the reference allele was rarer than the alternate allele. If any variant had more than one alternate allele then these were grouped together so that the variant could be considered as biallelic. Weights were also allocated according to the effect of the variant. An arbitrary weighting scheme was devised, so that a variant producing a new stop codon within a coding region would be allocated 20 times the weight of an intergenic variant. Variants in coding regions were allocated a weight of 3 if they were synonymous and 10 if they were nonsynonymous. A full list of weights according to the effects of variants is presented in Table 1. The weights for each type of variant were chosen so that variants deemed more likely to have an effect on gene expression or protein function were allocated higher weights. Likewise, these weights were chosen to be of the same order of magnitude as those relating to the rarity of the variant. Thus, a variant might achieve a similar weight either through being very rare or through being likely to have a functional effect. Each variant was then allocated an overall weight, achieved by simply multiplying together the weight according to rarity and the weight according to effect. Thus rarer, more functional variants would be given a higher weight than common variants in noncoding regions. As described previously (Curtis, 2012), each subject would be assigned a score consisting of the sum of the weights of all the variant alleles possessed by that subject. An unpaired t test is used to test whether the average score for cases is higher than controls.

Table 1 Scheme used to assign weights to each variant according to the predicted effect. (In the analyses described in this report, INDEL variants were not in fact used.)

Predicted effect        Weight
NULL_CONSEQUENCE        1
INTERGENIC              1
DOWNSTREAM              1
INTRONIC                3
3PRIME_UTR              5
SYNONYMOUS_CODING       3
UPSTREAM                5
5PRIME_UTR              5
SPLICE_SITE             5
STOP_LOST               5
NON_SYNONYMOUS_CODING   10
CODINGINDEL             15
FRAMESHIFT_CODING       20
STOP_GAINED             20

In the present study we wished to consider the results in two ways, firstly designating the SZ cohort as cases with the OB cohort as controls and then the other way round. All results were expressed as a signed log p value (SLP), this being the logarithm base 10 of the p value from the t test, being given a positive sign if there was an excess of rare, functional variants in SZ cases and a negative sign if the excess was in OB cases. Two sets of analyses were performed with different categories of variant. The broad category included all valid variants. The narrow category was restricted to splice site or nonsynonymous or stop variants and having MAF <0.1 in at least one of the cohorts.

The SLPs of individual genes were considered. Also, some genes were grouped into sets of a priori interest. For SZ, the same sets were used as described in Table S2 of the schizophrenia exome study, consisting of postsynaptic density (PSD), calcium channel, FMRP targets and SZ de novo (Purcell et al., 2014). For obesity, genes listed in OMIM were used, consisting of NR0B2, SDC3, POMC, GHRL, PPARG, UCP1, CART, ADRB2, PPARGC1B, SIM1, ENPP1, ADRB3, UCP3, AGRP, PYY, MC4R, LEP, LEPR and PCSK1.

Database Submission

Variants which appeared to be associated with the phenotypes studied were submitted to the Human Variation database at NCBI (http://www.ncbi.nlm.nih.gov/).

Results

Distribution of Test Statistic

There were 1,028,678 valid variants in 20,438 genes. The SLPs produced from the broad and narrow categories of variant were highly correlated, r = 0.65. The Q:Q plots are displayed in Figure 1. These show that when there was an excess of rare, functional variants in SZ subjects and the SLP was positive then the values obtained conformed well with null hypothesis expectations. However when the excess was in the OB subjects then the line for the negative SLPs is steeper, indicating that the p value obtained is somewhat anticonservative. To explore the possibility that this might have happened because the scores were not normally distributed, Wilcoxon’s signed rank test was applied instead of a t test but almost identical Q:Q plots were obtained. Taking a suitable threshold for “genomewide significance” as −log10(0.05 × 1/20438) = 5.6, only one gene, NSF, almost reached this with SLP = −5.5 in the broad analysis. However, this is fairly meaningless if one takes into account the nonconservative nature of the test for the negative SLPs and overall the results do not produce real evidence for the involvement of any particular gene.

[Figure 1: two Q:Q plot panels, A and B, with axes labelled SLP(q) and SLP(p).]
Figure 1 Q:Q plots for SLP obtained from SCOREASSOC compared to expected under null hypothesis. Positive SLPs indicate an excess of rare, functional variants in SZ subjects, negative SLPs indicate an excess in OB subjects. (A) Shows results using broad category of variants, (B) for narrow category.

Although the analyses did not highlight any genes reaching conventional criteria for statistical significance once appropriate allowance was made for the number of genes tested, the results could still be used to rank genes, with the idea being that genes contributing to risk of SZ or OB might tend to have the highest or lowest ranks respectively. Also, one might expect to see genes belonging to the predefined sets tending to have more extreme SLPs than the rest. However when this was formally tested there was no evidence for such enrichment. For each set, the genes within it had the same average SLP as the others not in the set. Nevertheless, ranking genes according to SLP did draw attention to some individual genes which arguably are of interest. The highest and the lowest ranked genes for the broad and narrow analyses are shown in Table 2. The full results for all genes are provided in Table S1.

Table 2 Highest and lowest ranked genes with corresponding SLPs using SZ and OB definitions of caseness and including broad and narrow categories of variant.

Highest SLPs (SZ cases)                  Lowest SLPs (OB cases)
Broad category    Narrow category        Broad category        Narrow category
Symbol     SLP    Symbol     SLP         Symbol        SLP     Symbol    SLP
DFNA5      4.3    MC4R       3.8         NSF           −5.5    SRSF8     −4.8
GTF3A      4.1    DONSON     3.4         CCDC58        −4.8    KIAA0947  −4.6
MC4R       3.8    SLAIN1     3.3         MAPT          −4.7    SLC17A2   −4.5
NLGN2      3.5    ARR3       3.3         KIAA0947      −4.7    HIBCH     −4.4
CRP        3.4    ZNF507     3.3         HIST1H4A      −4.7    SCEL      −4.3
SCARNA11   3.4    ADCYAP1R1  3.2         STH           −4.6    DUOX1     −4.3
TRIM77P    3.2    DFNA5      3.1         SNORD115      −4.6    TGM2      −4.1
FHIT       3.1    NPHS1      3.0         TGM2          −4.5    ATRX      −4.0
FAM212B    3.0    SKIV2L     3.0         SRSF8         −4.5    MFSD1     −3.9
AGXT       2.9    ARSA       2.9         RPAIN         −4.3    PXDN      −3.8
IL36B      2.9    PNN        2.9         MERTK         −4.3    CD164L2   −3.8
GALNTL5    2.9    SCO1       2.9         HIST1H2AJ     −4.2    OR6N2     −3.8
WDR24      2.9    OR2H1      2.8         AK4           −4.1    TINAG     −3.7
GABRA3     2.9    OPN4       2.8         LOC100128977  −4.1    MUC17     −3.6
MCCD1      2.8    WDYHV1     2.8         TCEB3         −4.0    MYL10     −3.6
FBXL16     2.8    GTF2E1     2.8         FUT2          −4.0    TCEB3     −3.5
PROCR      2.8    PSMB3      2.8         CRHR1         −3.9    FNDC9     −3.5
PLRG1      2.7    NFU1       2.7         IMP5          −3.9    GDF3      −3.5
OR2H1      2.7    CD8A       2.7         SLC17A2       −3.9    LMAN1L    −3.5
C2orf89    2.7    RXRB       2.6         ERBB2IP       −3.9    LILRA1    −3.4
FXYD2      2.6    UBE2Q2     2.6         NDST1         −3.8    TADA2A    −3.3
GTF2E1     2.6    CRP        2.6         HDAC1         −3.8    ZNF556    −3.3
SPINK8     2.6    MARK1      2.6         CD164L2       −3.8    SUSD4     −3.2
USP48      2.6    LURAP1     2.5         SYF2          −3.8    SYF2      −3.2
DONSON     2.5    NAIF1      2.5         TAB1          −3.8    HIBADH    −3.2
SACM1L     2.5    GGCX       2.5         PDE6B         −3.7    UFL1      −3.2
ZNF230     2.5    SIK2       2.5         COL4A4        −3.6    C8orf74   −3.2
GGT5       2.5    SPAG17     2.5         HSPA1A        −3.6    GPR107    −3.2
KRTAP13-1  2.4    DLG1       2.4         MUSK          −3.5    GBA       −3.2
LURAP1     2.4    MYH11      2.4         ARHGAP17      −3.5    OR5AR1    −3.2

One gene that appears to be of interest is MC4R, melanocortin 4 receptor, which is the highest ranked when the narrow category of variants is used and the third highest using the broad category, implying that there is an excess of rarer, functional variants among SZ subjects. In fact, though, MC4R variants have previously been reported to both increase and decrease the risk for obesity rather than having an effect on schizophrenia (Vaisse et al., 1998; Yeo et al., 1998; Mergen et al., 2001; Miraglia et al., 2002; Heid et al., 2005; Young et al., 2007; Chambers et al., 2008; Loos et al., 2008). To assist in understanding this result more fully, the raw output from the SCOREASSOC program is presented in Table 3. This shows that there is a broadly similar distribution of variants between OB and SZ except that two nonsynonymous variants, at positions 18:58038832 and 18:58039276, are more frequent in SZ subjects. There is also a single, highly weighted stop variant at 18:58039478 in a SZ subject which would also make a contribution to the high MLP for this gene. The two nonsynonymous variants are rs52820871 (chr18.hg19:g.58038832T>G) and rs2229616 (chr18.hg19:g.58039276C>T), which are already well established to be associated with lower BMI (Geller et al. 2004; Heid et al. 2005; Stutzmann et al. 2007; Young et al. 2007; Wang et al. 2010; Evans et al. 2014; Malzahn et al. 2014). The stop variant is rs13447324 (chr18.hg19:g.58039478G>T), Y35 , which is likewise well-established as a cause of autosomal dominant obesity (Hinney et al., 1999; Sina et al., 1999; Farooqi & O’Rahilly, 2006). Thus, these results for MC4R are consistent with the previously reported effects of the two nonsynonymous variants as being protective against obesity but are anomalous in that the stop variant is observed in an SZ subject and not in any OB subject.

Another gene of interest is NLGN2, neuroligin 2, which is ranked fourth when using the broad category of variants. This codes for a postsynaptic protein and a previous study reported novel, functional mutations in NLGN2, each occurring in one or two subjects with schizophrenia (Sun et al., 2011). The SCOREASSOC output shows that the SLP of 3.5 reflects a general excess of rare variants among SZ subjects, mostly not affecting amino acid sequence. This effect is spread across dozens of variants, such that one cannot definitively identify any which might individually be associated with risk of schizophrenia.

Findings for genes which may be of interest are summarised in Table 4. One phenomenon to be aware of is that because of LD relationships between variants the same signal can be picked up by multiple genes. This is the case for CRHR1, MAPK, STH and IMP5. A haplotype spanning all these genes is somewhat commoner in OB than SZ subjects. If it represents a real signal then CRHR1 seems to be the gene most likely to be responsible. Likewise, both HIST1H2AJ and HIST1H4A are highly ranked. However the variants making the most substantial contributions to the MLP for each gene are only 2 Mb apart and are in strong LD with each other, so there is in fact only one signal.

A final point worth making is that for some genes the SLP is driven by just one or two variants with a very marked difference in allele frequencies between cohorts whereas for others there are large numbers of very rare variants, with a tendency for these to occur more commonly in one cohort or the other. For example, in SLAIN1 there is a nonsynonymous variant in 18 SZ subjects and only 1 OB subject. Likewise, in IL36B there are stop variants at two loci which between them are seen in nine SZ subjects and no OB subject, in DLG1 there is a nonsynonymous variant in 10 SZ subjects and no OB subject and in CRP there is a nonsynonymous variant previously claimed to be associated with CRP levels which is present in 17 SZ and two OB subjects. However, in other genes, such as NLGN2, GTF3A and HIBADH a number of very rare variants collectively account for the SLP.

Discussion

The anticonservative nature of the test when using the OB cohort as cases was viewed as slightly puzzling, given that it had previously been shown to behave well when applied to simulated data (Curtis, 2012). The fact that the results treating the SZ cohort as cases were not markedly anticonservative makes it seem less likely that this phenomenon might be due to popula-

Table 3 Output from SCOREASSOC for the analysis of MC4R using the broad category of variants and treating SZ subjects as cases. Positions are hg19, chr18.

Position  OB AA  OB AB  OB BB  OB MAF  SZ AA  SZ AB  SZ BB  SZ MAF  Weight  Variant effect         VCF annotation
58038462  982    0      0      0.0000  1391   1      0      0.0004  9.99    DOWNSTREAM             T>C
58038470  981    1      0      0.0005  1391   0      0      0.0000  9.99    DOWNSTREAM             C>G
58038489  982    0      0      0.0000  1391   1      0      0.0004  9.99    DOWNSTREAM             G>A
58038514  982    0      0      0.0000  1391   1      0      0.0004  9.99    DOWNSTREAM             C>T
58038524  979    3      0      0.0015  1386   6      0      0.0022  9.93    DOWNSTREAM             G>A
58038612  982    0      0      0.0000  1391   1      0      0.0004  99.92   NON_SYNONYMOUS_CODING  C>T:PolyPhen:benign(0.048)
58038826  982    0      0      0.0000  1391   1      0      0.0004  99.92   NON_SYNONYMOUS_CODING  C>T:PolyPhen:possibly_damaging(0.45)
58038829  982    0      0      0.0000  1390   2      0      0.0007  99.85   NON_SYNONYMOUS_CODING  C>T:PolyPhen:probably_damaging(1)
58038832  968    14     0      0.0071  1361   31     0      0.0111  96.62   NON_SYNONYMOUS_CODING  T>G:PolyPhen:benign(0.008)
58038989  982    0      0      0.0000  1391   1      0      0.0004  49.96   SYNONYMOUS_CODING      G>A
58039013  982    0      0      0.0000  1391   1      0      0.0004  49.96   SYNONYMOUS_CODING      A>G
58039049  981    1      0      0.0005  1392   0      0      0.0000  49.96   SYNONYMOUS_CODING      C>T
58039203  982    0      0      0.0000  1391   1      0      0.0004  99.92   NON_SYNONYMOUS_CODING  G>A:PolyPhen:probably_damaging(0.997)
58039215  982    0      0      0.0000  1390   2      0      0.0007  99.85   NON_SYNONYMOUS_CODING  T>C:PolyPhen:probably_damaging(0.99)
(Continued)

Table 3 Continued.
OB SZ Position (hg19, chr18) AA AB BB MAF AA AB BB MAF Weight Variant effect VCF annotation 58039219 981 1 0 0.0005 1392 0 0 0.0000 99.92 NON_ C>A:PolyPhen: SYNONYMOUS_ probably_ CODING damaging(0.996) 58039276 964 18 0 0.0092 1337 55 0 0.0198 94.55 NON_ C>T:PolyPhen: SYNONYMOUS_ benign(0.042) CODING 58039301 982 0 0 0.0000 1391 1 0 0.0004 49.96 SYNONYMOUS_ G>A CODING 58039402 982 0 0 0.0000 1391 1 0 0.0004 99.92 NON_ C>T:PolyPhen: SYNONYMOUS_ probably_ CODING damaging(0.985) 58039473 982 0 0 0.0000 1391 1 0 0.0004 99.92 NON_ T>A>C: SYNONYMOUS_ PolyPhen: CODING benign(0) 58039478 982 0 0 0.0000 1391 1 0 0.0004 199.85 STOP_ G>T GAINED 58039552 982 0 0 0.0000 1391 1 0 0.0004 99.92 NON_ T>C: SYNONYMOUS_ PolyPhen: CODING benign(0) 58039642 981 1 0 0.0005 1391 1 0 0.0004 49.92 5PRIME_ G>C UTR The table shows genotype counts, frequencies, weights and effects for each variant. The weighted scores were calculated for each subject and the means compared. Mean scores OB = 3.4, SZ = 7.0, t(2372 df) = 3.8, p = 0.00015, SLP = 3.8. Exome Study of Schizophrenia and Obesity Table 4 List of some of the highest and lowest ranked genes with explanatory notes. Symbol SLP Analysis Gene name Comments MC4R 3.8 SZ, broad Melanocortin 4 Increased frequency among SZ subjects of two receptor nonsynonymous variants previously reported to be protective against obesity NLGN2 3.5 SZ, broad Neuroligin 2 Codes for postsynaptic protein and regarded as candidate gene for schizophrenia. Overall generally increased numbers of rare variants in SZ subjects with no individual variant strongly associated CRP 3.4 SZ, broad C-reactive protein, Involved in immunity and inflammation systems. pentraxin-related Variants generally commoner in SZ subjects. Nonsynonymous variant at 1:159683814 is present in 17 SZ and 2 OB subjects. This is rs77832441, which has been reported to be associated with reduced CRP levels DONSON 3.4 SZ, narrow Downstream Function unknown. 
Nonsynonymous variants at neighbour of SON 21:34950728 seen in 21 SZ against 7 OB subjects and at 21:34955922 in 10 SZ and 0 OB subjects GABRA3 2.9 SZ, broad GABA A receptor, Some previous reports of involvement of GABA alpha 3 receptors in schizophrenia. However, the gene is on the X chromosome and the result is likely an artifact due to counting hemizygote males as homozygotes GTF3A 4.1 SZ, broad General transcription Result is driven by modest excess of many factor IIIA different variants IL36B 2.9 SZ, broad Interleukin 36, beta Involved in inflammation. There are stop variants at 2:113785602 and 2:113788694 in 7 and 2 SZ subjects and no OB subjects ADCYAP1R1 3.2 SZ, narrow Adenylate cyclase Previous reports of association with schizophrenia activating and involvement in adipose tissue expandability. polypeptide 1 Nonsynonymous variants at 7:31104520 and (pituitary) receptor 7:31124376 commoner in SZ than OB subjects ARSA 2.9 SZ, narrow Arylsulfatase A Mutations in this gene are the known cause of metachromatic leucodystrophy, which can have features similar to schizophrenia. A few very rare nonsynonymous and splice site variants are seen only in SZ cases and the common nonsynonymous variant at 22:51065361 (rs6151415) has MAF 0.085 in SZ and 0.061 in OB subjects DLG1 2.4 SZ, narrow Discs, large homolog 1 Involved in synaptogenesis. A number of (drosophila) nonsynonymous variants somewhat commoner in SZ than OB subjects and non-synonymous variant at 3:196792663 occurs in 10 SZ and no OB subjects SIK2 2.5 SZ, narrow Salt-inducible kinase 2 Known to be involved in lipid homeostasis and adipogenesis. A number of nonsynonymous variants occur only in SZ subjects and nonsynonymous variant at 11:111590605 occurs in 14 SZ and 2 OB subjects (Continued) 2015 The Authors. Annals of Human Genetics (2016) 80,38–49 45 Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. D. Curtis Table 4 Continued. 
Symbol SLP Analysis Gene name Comments SLAIN1 3.3 SZ, narrow SLAIN motif family, Involved in neurodevelopment. Nonsynonymous member 1 variant at 13:78320801 occurs in 18 SZ subjects and1OBsubject UBE2Q2 2.6 SZ, narrow Ubiquitin-conjugating Differentially expressed in mice with diet-induced enzyme E2Q family obesity. Excess of several nonsynonymous member 2 variants in SZ versus OB subjects ZNF 507 3.3 SZ, narrow Zinc finger protein 507 Disruption associated with neurodevelopmental disorders. Several nonsynonymous variants common in SZ subjects and nonsynonymous variant at 19:32844995 occurs in 9 SZ and no OB subjects CRHR1 -3.9 OB, broad Corticotropin-releasing Previously implicated in physiological pathways hormone receptor 1 including obesity and response to stress. A haplotype of several noncoding variants is somewhat commoner in OB than SZ subjects. This haplotype extends through MAPK, STH and IMP5, accounting for their MLPs HIST1H2AJ −4.2 OB, broad Histone cluster 1, H2aj Downstream variant at 6:27782031 has MAF 0.14 in OB and 0.11 in SZ subjects HIST1H4A −4.7 OB, broad Histone cluster 1, H4a 3’ UTR variant at 6:26022244 has MAF 0.14 in OB and 0.096 in SZ subjects. However, this variant is in LD with the one at 6:27782031 so these signals are not independent MAPT -4.7 OB, broad Microtubule-associated A haplotype of several common variants is slightly protein tau commoner among OB subjects MUSK −3.6 OB, broad Interacts with NSF. Splice site variant at 9:113449377 has MAF 0.07 in OB and 0.05 in SZ subjects NSF −5.5 OB, broad N-ethylmaleimide- Interacts with MUSK. Splice site variant at sensitive 17:44788310 has MAF 0.28 in OB and 0.23 in factor SZ subjects SNORD115 −4.6 OB, broad Small nucleolar RNA, In Prader–Willi region and regulates alternative C/D box 115-15 splicing of CRHR1. Several variants have higher MAF in OB than SZ subjects GDF3 −3.5 OB, narrow Growth differentiation Implicated in regulation of adiposity and energy factor 3 expenditure. 
Nonsynonymous variant at 12:7842587 has frequency 0.040 in OB and 0.022 in SZ subjects HIBADH −3.2 OB, narrow 3-hydroxyisobutyrate Differentially expressed in T2DM. SLP is driven dehydrogenase by splice site or nonsynonymous variants of which 7 are singletons occurring in OB subjects and the other, at 7:27570942, occurs in 8 OB and3SZsubjects tion effects such as stratification or linkage disequlibrium (LD) duplicating some subjects failed to reproduce the observed between variants. An alternative explanation is that the vari- Q:Q plots. Likewise, treating homozygotes as heterozygotes, ance of the scores is underestimated when the excess occurs in in an attempt to nullify the effects of consanguinity, failed to OB subjects and this might, for example, result from subjects produce a closer fit to expected values. The weighted burden being related to each other or being the offspring of consan- method does not adjust for population stratification and it guineous matings. However, attempts to model relatedness by assumes that the cohorts are ethnically well matched and that 46 Annals of Human Genetics (2016) 80,38–49 2015 The Authors. Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. Exome Study of Schizophrenia and Obesity subjects are unrelated. It is not clear how violations of these theless, the example of MC4R makes it clear that one cannot assumptions might impact on the results obtained. rely on this. In the light of this, it seems that one should at- The finding that no genes produce results withstanding tend to both strongly positive and negative SLPs in order to correction for multiple testing is in line with that of the pre- detect either an increase or decrease in variants among cases. vious schizophrenia exome study, which used a somewhat Of course, this would not at all address the issue of different larger sample size. 
It is becoming apparent that next genera- variants within the same gene having effects in different di- tion sequencing studies applied to complex diseases in samples rections, in which case a test such as SKAT would be needed numbering the low thousands are hypothesis-generating and alongside the weighted burden test. are unlikely to produce results which conclusively implicate Approaches such as this might benefit from an improve- individual variants or genes. Nevertheless, some findings do ment in the ability to predict the likely consequences of a appear to be of interest and worthy of attempts at follow-up, change in DNA sequence. The weighting scheme used was although there is a question of how much to focus attention on crude and fairly arbitrary. One could easily imagine introduc- genes which seem to be plausible candidates without running ing other considerations, such as utilising SIFT and PolyPhen the risk of overlooking novel findings which might point to scores or information on regulatory function (Ng & Henikoff, previously unsuspected mechanisms of pathogenesis. For ex- 2003; Adzhubei et al., 2010). If effects could be predicted ac- ample, neuronal and inflammatory genes are thought to be curately then weights could be assigned on a more rational involved in the susceptibility to schizophrenia and in this light basis. On the other hand, it can be argued that the whole the results for NLGN2, ARSA, DLG1, SLAIN1, ZNF507, point of performing empirical studies is that one does not CRP and IL36B seem interesting. Likewise, given previous know which variants contribute to risk until one sees the ex- findings related to obesity, the results for SIK2, CRHR1, tent to which they are associated with a disease phenotype. SNORD115, GDF3 and HIBADH may be of note, espe- The effects of varying the weights given to different types of cially in view of the fact that SNORD115 has been reported variant were not explored systematically. 
One might expect to be involved in the alternative splicing of CRHR1 (Kishore that varying the weights would have some impact on the SLPs et al., 2010). obtained and their ranks but, as is often the case, the advan- Following up suggestive results in larger samples may not be tages of carrying out exploratory analyses in order to find straightforward. For variants which are not extremely rare this a more appropriate model may be outweighed by the diffi- might simply involve carrying out genotyping in additional, culties in interpreting results obtained from testing multiple larger case-control cohorts. However, some genes are high- scenarios. lighted on the basis of an excess of many different variants, The weighted burden test provides a quick, simple and in- each occurring in only one or two subjects. Validating these tuitive test summarising the extent to which a gene harbours findings might require sequencing the gene in large numbers more functional, rare variants in cases than controls. It can be of subjects. It has been suggested that an alternative approach used to rank genes and highlight genes of interest and one to following up an extremely rare variant is to carry out family can then look at the results for individual genes and variants studies of the subjects possessing the variant (Curtis, 2011). in more detail. The method allows the user to test all vari- Thus, if there are affected relatives who also have the variant ants simultaneously in a single analysis, implementing a crude one gains confidence that it has an effect whereas an affected model of variant effects which may have some face validity. relative not sharing the variant casts doubt on its relevance. On the other hand, it requires that weights be specified in ad- The results illustrate a problem with the analytic approach vance and makes no attempt to fit them to the observed data. 
which was originally proposed, which was to test for an ex- The method is implemented only for dichotomous pheno- cess of rare and/or functional variants in cases compared with types although it might be possible to extend it to be applied controls. In fact, three genes possibly involved in susceptibility to quantitative measures. In the light of the results obtained, to obesity, MC4R, ADCYAP1R1 and SIK2, produced highly it seems sensible to implement a two-tailed version of the ap- ranked positive SLPs indicating that such an excess was oc- proach, in that one should test for an excess of variants either curring in SZ rather than OB subjects. Thus, had the method in cases or in controls. However, this would not be helpful if been applied as intended in a case-control study with OB some variants within a gene acted to increase risk and others chosen as the case phenotype then these findings might have to decrease it and in this situation one would not expect the been overlooked. It was argued previously that there were method to be successful. Thus it should not be used in isola- biological and statistical reasons to expect that, when dealing tion. Hopefully, it will be possible to refine such approaches with a fairly rare disease with a deleterious effect on fitness, further once there is greater knowledge regarding the nature rare variants identified in a case-control study would be more of genetic variation which influences risk of non-Mendelian likely to increase risk than reduce it (Curtis, 2012). Never- disease. 2015 The Authors. Annals of Human Genetics (2016) 80,38–49 47 Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. D. Curtis a nonsense and a frameshift mutation associated with dominantly Acknowledgement inherited obesity in humans. J Clin Endocrinol Metab 84, 1483– Thanks to Sadaf Farooqi for helpful comments on MC4R Kishore, S., Khanna, A., Zhang, Z., Hui, J., Balwierz, P.J., Stefan, variants. 
This study makes use of data generated by the M., Beach, C., Nicholls, R. D., Zavolan, M., & Stamm, S. (2010) UK10K Consortium, derived from samples from UK10K_ The snoRNA MBII-52 (SNORD 115) is processed into smaller NEURO_Iop_Collier, UK10K_NEURO_UKSCZ, RNAs and regulates alternative splicing. Hum Mol Genet 19, 1153– UK10K_NEURO_ABERDEEN, UK10K_EURO_ Loos, R. J., Lindgren, C. M., Li, S., Wheeler, E., Zhao, J. H., NEDINBURGH, UK10K_NEURO_EDINBURGH, Prokopenko, I., Inouye, M., Freathy, R. M., Attwood, A. P., & UK10K_NEURO_UCL and UK10K_OBESITY_SCOOP. Beckmann, J.S. (2008) Common variants near MC4R are asso- A full list of the investigators who contributed to the genera- ciated with fat mass, weight and risk of obesity. Nat Genet 40, tion of the data is available online (http://www.UK10K.org). 768–775. Funding for UK10K was provided by the Wellcome Trust Madsen, B. E. & Browning, S. R. (2009) A groupwise association under award WT091310. test for rare mutations using a weighted sum statistic. PLoS Genetics 5, e1000384. Malzahn, D., Muller-Nurasyid, ¨ M., Heid, I. M., Wichmann, H.- E., & Bickeboller, ¨ H. (2014) Controversial association results for Conflict of Interest INSIG2 on body mass index may be explained by interactions with age and with MC4R. Eur J Hum Genet, 22, 1217–24. The authors declare they have no conflict of interest. Mergen, M., Mergen, H., Ozata, M., Oner, R., & Oner, C. (2001) Rapid communication: A novel melanocortin 4 receptor (MC4R) gene mutation associated with morbid obesity. J Clin Endocrinol References Metab 86, 3448-3448. Miraglia, D. G. E., Cirillo, G., Nigro, V., Santoro, N., D’urso, L., Adzhubei, I. A., Schmidt, S., Peshkin, L., Ramensky, V. E., Gerasi- Raimondo, P., Cozzolino, D., Scafato, D., & Perrone, L. (2002) mova, A., Bork, P., Kondrashov, A. S. & Sunyaev, S. R. (2010) A Low frequency of melanocortin-4 receptor (MC4R) mutations in method and server for predicting damaging missense mutations. 
a Mediterranean population with early-onset obesity. Int J Obes Nat Methods 7, 248–249. Relat Metab Disord 26, 647–651. Chambers, J. C., Elliott, P., Zabaneh, D., Zhang, W., Li, Y., Froguel, Ng, P. C. & Henikoff, S. (2003) SIFT: Predicting amino acid changes P., Balding, D., Scott, J. & Kooner, J. S. (2008) Common genetic that affect protein function. Nucleic Acids Res 31, 3812–3814. variation near MC4R is associated with waist circumference and Purcell, S. M., Moran, J. L., Fromer, M., Ruderfer, D., Solovieff, insulin resistance. Nat Genet 40, 716–718. N., Roussos, P., O’dushlaine, C., Chambert, K., Bergen, S. E., & Curtis, D. (2011) Assessing the contribution family data can make to Kahler, ¨ A. (2014) A polygenic burden of rare disruptive mutations case-control studies of rare variants. Ann Hum Genet 75, 630–638. in schizophrenia. Nature, 506, 185–90. Curtis, D. (2012) A rapid method for combined analysis of common Sina, M., Hinney, A., Ziegler, A., Neupert, T., Mayer, H., Siegfried, and rare variants at the level of a region, gene, or pathway. Adv W., Blum, W. F., Remschmidt, H., & Hebebrand, J. (1999) Appl Bioinform Chem 5,1–9. Phenotypes in three pedigrees with autosomal dominant obesity Curtis, D., Vine, A.E., Mcquillin, A., Bass, N. J., Pereira, A., Kan- caused by haploinsufficiency mutations in the melanocortin-4 re- daswamy, R., Lawrence, J., Anjorin, A., Choudhury, K. & Datta, ceptor gene. Am J Hum Genet 65, 1501–1507. S. R. (2011) Case-case genome wide association analysis reveals Stutzmann, F., Vatin, V., Cauchi, S., Morandi, A., Jouret, B., Landt, markers differentially associated with schizophrenia and bipolar O., Tounian, P., Levy-Marchal, C., Buzzetti, R., & Pinelli, L. disorder and implicates calcium channel genes. Psychiatr Genet 21, (2007) Non-synonymous polymorphisms in melanocortin-4 re- 1–4. ceptor protect against obesity: The two facets of a Janus obesity Evans, D. S., Calton, M. A., Kim, M. J., Kwok, P.-Y., Miljkovic, gene. 
Hum Mol Genet 16, 1837–1844. I., Harris, T., Koster, A., Liu, Y., Tranah, G. J., & Ahituv, N. Sun, C., Cheng, M.-C., Qin, R., Liao, D.-L., Chen, T.-T., Koong, (2014) Genetic association study of adiposity and melanocortin-4 F.-J., Chen, G., & Chen, C.-H. (2011) Identification and func- receptor (MC4R) common variants: Replication and functional tional characterization of rare mutations of the neuroligin-2 gene characterization of non-coding regions. PLoS One 9, e96805. (NLGN2) associated with schizophrenia. Hum Mol Genet 20, Farooqi, I. S. & O’rahilly, S. (2006) Genetics of obesity in humans. 3042–3051. Endocr Rev 27, 710–718. The UK10K Consortium. (2015) The UK10K project identifies rare Geller, F., Reichwald, K., Dempfle, A., Illig, T., Vollmert, C., Her- variants in health and disease. Nature. doi: 10.1038/nature14962. pertz, S., Siffert, W., Platzer, M., Hess, C., & Gudermann, T. Vaisse, C., Clement, K., Guy-Grand, B., & Froguel, P. (1998) A (2004) Melanocortin-4 receptor gene variant I103 is negatively frameshift mutation in human MC4R is associated with a domi- associated with obesity. Am J Hum Genet 74, 572–581. nant form of obesity. Nat Genet 20, 113–114. Heid, I., Vollmert, C., Hinney, A., Dor ¨ ing, A., Geller, F., Low ¨ el, Wang, D., Ma, J., Zhang, S., Hinney, A., Hebebrand, J., Wang, Y., H., Wichmann, H., Illig, T., Hebebrand, J., & Kronenberg, F. & Wang, H. J. (2010) Association of the MC4R V103I poly- (2005) Association of the 103I MC4R allele with decreased body morphism with obesity: A Chinese case–control study and meta- mass in 7937 participants of two population based surveys. JMed analysis in 55,195 individuals. Obesity 18, 573–579. Genet 42, e21-e21. Wheeler, E., Huang, N., Bochukova, E. G., Keogh, J. M., Lindsay, Hinney, A., Schmidt, A., Nottebom, K., Heibult, O., Becker, I., S., Garg, S., Henning, E., Blackburn, H., Loos, R. J., & Wareham, Ziegler, A., Gerber, G., Sina, M., Gorg, T., & Mayer, H. (1999) N. J. 
(2013) Genome-wide SNP and CNV analysis identifies Several mutations in the melanocortin-4 receptor gene including 48 Annals of Human Genetics (2016) 80,38–49 2015 The Authors. Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd. Exome Study of Schizophrenia and Obesity common and low-frequency variants associated with severe early- Population based studies and meta-analysis of 29 563 individuals. onset obesity. Nat Genet 45, 513–517. Int J Obes 31, 1437–1441. Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M., & Lin, X. (2011) Rare-variant association testing for sequencing data with the se- quence kernel association test. Am J Hum Genet 89, 82–93. Supporting Information Yeo, G. S., Farooqi, I. S., Aminian, S., Halsall, D. J., Stanhope, R. G., & O’rahilly, S. (1998) A frameshift mutation in MC4R Additional Supporting Information may be found in the on- associated with dominantly inherited human obesity. Nat Genet line version of this article: 20, 111–112. Young, E. H., Wareham, N. J., Farooqi, S., Hinney, A., Hebe- Table S1 SLPs for all genes obtained from weighted burden brand, J., Scherag, A., O’rahilly, S., Barroso, I., & Sandhu, M. S. test using broad and narrow sets of variants. (2007) The V103I polymorphism of the MC4R gene and obesity: 2015 The Authors. Annals of Human Genetics (2016) 80,38–49 49 Annals of Human Genetics published by University College London (UCL) and John Wiley & Sons Ltd.
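The arithmetic behind Table 3 can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this section: a parabolic frequency weight that equals 10 for the rarest variants and falls to 1 at a minor allele frequency of 0.5, functional weights of 1, 5, 10 and 20 for the annotation classes shown, and a reading of the SLP as a signed −log10(p). The weight function and the functional weights were inferred from the printed values, which they happen to reproduce to two decimal places. All function and variable names are hypothetical and this is not the SCOREASSOC implementation.

```python
import math

# Cohort sizes reported in the paper.
N_OB, N_SZ = 982, 1392

# ASSUMPTION: functional weights inferred from the printed Table 3 weights;
# they are not quoted from the SCOREASSOC documentation.
FUNCTIONAL_WEIGHT = {
    "DOWNSTREAM": 1.0,
    "SYNONYMOUS_CODING": 5.0,
    "5PRIME_UTR": 5.0,
    "NON_SYNONYMOUS_CODING": 10.0,
    "STOP_GAINED": 20.0,
}

# Table 3 rows: (variant effect, OB heterozygotes, SZ heterozygotes, printed weight).
# No BB homozygotes are reported, so heterozygote counts equal minor allele counts.
TABLE3 = [
    ("DOWNSTREAM", 0, 1, 9.99),
    ("DOWNSTREAM", 1, 0, 9.99),
    ("DOWNSTREAM", 0, 1, 9.99),
    ("DOWNSTREAM", 0, 1, 9.99),
    ("DOWNSTREAM", 3, 6, 9.93),
    ("NON_SYNONYMOUS_CODING", 0, 1, 99.92),
    ("NON_SYNONYMOUS_CODING", 0, 1, 99.92),
    ("NON_SYNONYMOUS_CODING", 0, 2, 99.85),
    ("NON_SYNONYMOUS_CODING", 14, 31, 96.62),
    ("SYNONYMOUS_CODING", 0, 1, 49.96),
    ("SYNONYMOUS_CODING", 0, 1, 49.96),
    ("SYNONYMOUS_CODING", 1, 0, 49.96),
    ("NON_SYNONYMOUS_CODING", 0, 1, 99.92),
    ("NON_SYNONYMOUS_CODING", 0, 2, 99.85),
    ("NON_SYNONYMOUS_CODING", 1, 0, 99.92),
    ("NON_SYNONYMOUS_CODING", 18, 55, 94.55),
    ("SYNONYMOUS_CODING", 0, 1, 49.96),
    ("NON_SYNONYMOUS_CODING", 0, 1, 99.92),
    ("NON_SYNONYMOUS_CODING", 0, 1, 99.92),
    ("STOP_GAINED", 0, 1, 199.85),
    ("NON_SYNONYMOUS_CODING", 0, 1, 99.92),
    ("5PRIME_UTR", 1, 1, 49.92),
]


def frequency_weight(maf, factor=10.0):
    # ASSUMPTION: parabolic in MAF, equal to `factor` as MAF -> 0 and 1 at MAF = 0.5.
    return 1.0 + (factor - 1.0) * (1.0 - 2.0 * maf) ** 2


def variant_weight(effect, ob_alt, sz_alt):
    # Minor allele frequency pooled over both cohorts (two alleles per subject).
    maf = (ob_alt + sz_alt) / (2.0 * (N_OB + N_SZ))
    return FUNCTIONAL_WEIGHT[effect] * frequency_weight(maf)


def mean_scores(rows):
    # A subject's score is the sum of the weights of the minor alleles carried,
    # so each cohort mean is the weighted allele total divided by cohort size.
    ob = sum(variant_weight(e, o, s) * o for e, o, s, _ in rows) / N_OB
    sz = sum(variant_weight(e, o, s) * s for e, o, s, _ in rows) / N_SZ
    return ob, sz


def signed_log_p(p_value, case_mean, control_mean):
    # SLP read here as -log10(p), made negative when controls score higher.
    slp = -math.log10(p_value)
    return slp if case_mean >= control_mean else -slp


mean_ob, mean_sz = mean_scores(TABLE3)
max_weight_error = max(abs(variant_weight(e, o, s) - w) for e, o, s, w in TABLE3)
# The p value is the one reported under Table 3; SZ subjects are treated as cases.
slp = signed_log_p(0.00015, case_mean=mean_sz, control_mean=mean_ob)
print(f"mean OB = {mean_ob:.1f}, mean SZ = {mean_sz:.1f}, SLP = {slp:.1f}")
print(f"largest gap to a printed weight: {max_weight_error:.4f}")
```

Under these assumptions the cohort means come out at about 3.4 for OB and 7.0 for SZ, and the SLP at 3.8, in line with the values reported under Table 3. Swapping the case and control arguments flips the sign, which is how the same gene would appear with a negative SLP when OB subjects are treated as cases. The t statistic itself is not recomputed here because it depends on per-subject genotype combinations that the table does not give.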

Journal

Annals of Human Genetics, Wiley

Published: Jan 1, 2016

