Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant

www.nature.com/npjbcancer

ARTICLE OPEN

Lizelle Correia (1,10), Ramiro Magno (2,10), Joana M. Xavier (2), Bernardo P. de Almeida (1,8), Isabel Duarte (2), Filipa Esteves (1,3), Marinella Ghezzo (2), Matthew Eldridge (4), Chong Sun (5,9), Astrid Bosma (5), Lorenza Mittempergher (5), Ana Marreiros (1), Rene Bernards (5), Carlos Caldas (4,6,7), Suet-Feung Chin (4,6,11 ✉) and Ana-Teresa Maia (1,2,11 ✉)

PIK3CA mutations are the most common in breast cancer, particularly in the estrogen receptor-positive cohort, but the benefit of PI3K inhibitors has been limited compared with approaches targeting other less common mutations. We found a frequent allelic expression imbalance between the missense mutant and wild-type PIK3CA alleles in breast tumors from the METABRIC (70.2%) and the TCGA (60.1%) projects. When considering the mechanisms controlling allelic expression, 27.7% and 11.8% of tumors showed imbalance due to regulatory variants in cis, in the two studies respectively. Furthermore, preferential expression of the mutant allele due to cis-regulatory variation is associated with poor prognosis in the METABRIC tumors (P = 0.031). Interestingly, ER−, PR−, and HER2+ tumors showed significant preferential expression of the mutated allele in both datasets. Our work provides compelling evidence to support the clinical utility of PIK3CA allelic expression in breast cancer in identifying patients of poorer prognosis, and those with low expression of the mutated allele, who are unlikely to benefit from PI3K inhibitors. Furthermore, our work proposes a model of differential regulation of a critical cancer-promoting gene in breast cancer.
npj Breast Cancer (2022) 8:71; https://doi.org/10.1038/s41523-022-00435-9

1 Faculty of Medicine and Biomedical Sciences (FMCB), Universidade do Algarve, Faro, Portugal.
2 Center for Research in Health Technologies and Information Systems (CINTESIS), Universidade do Algarve, Faro, Portugal.
3 ProRegeM-PhD Program in Mechanisms of Disease and Regenerative Medicine, Universidade do Algarve, Faro, Portugal.
4 Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, UK.
5 Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
6 Department of Oncology, University of Cambridge, Cambridge, UK.
7 Cancer Research UK Cambridge Cancer Centre, Cambridge, UK.
8 Present address: The Research Institute of Molecular Pathology, Vienna, Austria.
9 Present address: DKFZ, Heidelberg, Germany.
10 These authors contributed equally: Lizelle Correia, Ramiro Magno.
11 These authors jointly supervised this work: Suet-Feung Chin, Ana-Teresa Maia.
✉ email: suet-feung.chin@cruk.cam.ac.uk; atmaia@ualg.pt

Published in partnership with the Breast Cancer Research Foundation

INTRODUCTION

Activating oncogenic mutations are often characterized by gain-of-function single-base alterations or focal DNA copy-number amplification, where the gain of just a single copy of a mutant allele is sufficient for tumorigenesis [1]. These gains change the stoichiometric balance between mutant and wild-type alleles and are selected for in cancers, affecting approximately half of all oncogenic driver mutations. Ultimately, they could dictate prognosis and therapeutic sensitivity.

However, the impact of gene dosage differences of oncogenic mutations generated at the gene expression level has been largely unexplored. Genetic variation and mutations regulate gene expression in an allele-specific manner, known as cis-regulatory variation, by altering protein and miRNA binding, for example. Normal cis-regulatory variation affects most of the human genome in all tissues and generates the wealth of phenotypic variation seen in species [3–5]. Moreover, an extensive contribution from noncoding variants to RNA alterations was recently observed in tumors [6], including allelic imbalance of somatic mutations [7].

Nevertheless, one unsolved aspect is how much each mechanism contributes to generating allelic imbalances in expression and whether they do it independently or in synergy. In breast tissue, germline regulatory variation is associated with disease risk and affects frequently mutated genes. We and others have shown that variants affecting the expression levels of BRCA1 and BRCA2 modify the risk of breast cancer in germline mutation carriers [10,11]. We found that carriers of germline nonsense mutations in the tumor suppressor gene BRCA2 were at a lower risk of developing breast cancer when the remaining wild-type allele was highly expressed.

Here, we hypothesize that cis-regulatory variation also modulates the penetrance of oncogenic coding mutations. In the context of a gene cis-regulated by a genetic variant generating imbalanced allelic expression, we postulate that an oncogenic activating mutation in the same gene will have a different clinical impact depending upon whether it occurs in the preferentially expressed allele or the less expressed one. We tested this model in the context of heterozygous mutations in PIK3CA, the most frequently mutated gene in breast cancer. First, we investigated whether normal cis-regulatory variation regulated the expression of PIK3CA in normal breast tissue. Then, we calculated allelic expression ratios between mutant and wild-type copies in tumors from two large breast cancer datasets, METABRIC and TCGA, both with and without normalization for DNA copy number. Finally, we correlated the allelic expression ratios with clinical data. This approach allows us to distinguish between expression imbalances generated from cis-regulatory variation alone, altered DNA copy number, or both mechanisms.

RESULTS

Normal cis-regulatory variation affects PIK3CA expression in healthy breast tissue

To investigate whether cis-regulatory variation modulates the expression of PIK3CA in normal breast tissue, we analyzed data from previous allelic expression analysis of normal breast tissue from 64 healthy donors. We calculated the ratio of expression of one allele by the other in heterozygous variant positions, which is a robust approach to detect cis-acting variant effects, as it cancels out the trans effects that act on the same gene and influence both alleles equally. We found six variants in PIK3CA displaying differential allelic expression (daeSNPs) (see “Methods”) (Fig. 1). Of these six, only rs3729679 is not in strong linkage disequilibrium (LD) with the others (Supplementary Table 1). rs3960984 showed the largest proportion of heterozygotes displaying allelic differences (57%), while three other daeSNPs shared the smallest fraction (14%): rs12488074, rs4855093, and rs9838411.

Fig. 1 Cis-regulatory variation impacts on PIK3CA gene expression in normal breast tissue. AE ratios for six daeSNPs in the PIK3CA gene region, each dot is a heterozygous individual for the corresponding variant indicated in the x-axis, dotted lines delimit the levels of 1.5-fold difference for either allele preferential expression (|AE| = 0.58). Boxplots display the median, the lower and upper hinges corresponding to the first and third quartiles, and lower and upper whiskers corresponding to the smallest and largest values from the 1.5 * IQR (interquartile range), respectively. [Plot axes: x, daeSNPs (rs12488074, rs3729679, rs3960984, rs4855093, rs7636454, rs9838411); y, AE ratios (ticks from −2.5 to 2.5).]

In the daeSNPs rs7636454, rs3960984, rs12488074, and rs9838411, the ratios showed a unilateral distribution, with samples displaying preferential expression towards the same allele. These distribution patterns of allelic expression ratios suggest that the daeSNPs at which allelic expression is being measured and the possible functional regulatory variants (rSNPs) are in strong, yet incomplete, LD with each other.

While the mapping analysis carried out to identify candidate rSNPs did not find a significant association after multiple testing correction (Supplementary Table 2), one of the variants with nominal P value ≤ 0.05, rs2699887 (Wilcoxon two-sample test estimated difference of 0.22, 95% CI = [0.031, Inf]) (Supplementary Fig. 1A), showed great regulatory potential. Namely, it is an eQTL (expression quantitative trait locus) for PIK3CA (P = 0.011, Supplementary Fig. 1B) in tumors from METABRIC, is located at its promoter region and at a DNase I hypersensitivity site (Supplementary Fig. 1C), and is bound by POL2 in a breast cancer cell line (Supplementary Table 3). In-silico functional analysis of this variant suggested a disruption of the binding motif of the transcription factor NF-YA (Supplementary Fig. 1D), and in vitro studies revealed a preferential protein::DNA binding to the minor T allele of rs2699887, which is associated with higher expression of PIK3CA in tumors (Supplementary Fig. 1E).

Preferential expression of the PIK3CA mutated alleles is frequent in breast tumors

Changes in DNA copy number in tumors are associated with changes in gene expression in cis leading to dosage imbalances of coding mutations [1,14–17]. However, these differences can also be due to germline and somatic cis-regulatory variation, but their effect on mutation dosage imbalance is underexplored. So, we set out to assess whether PIK3CA somatic mutations would have their functional effects, or penetrance, modified by imbalances in allelic expression generated by regulatory variants. We hypothesized that preferential expression of a gain-of-function mutation would have a more substantial clinical impact than those occurring in lowly expressed alleles, thus generating intertumor clinical heterogeneity (Fig. 2a). To test this, we carried out mutant vs. wild-type allelic expression analysis in breast tumor samples carrying somatic PIK3CA missense mutations on two independent sets of data, the METABRIC (n = 94) and the TCGA (n = 178) projects. Supplementary Table 4 presents a summary description of the two datasets and Supplementary Fig. 2 shows the number, location, and amino acid alterations of the mutations across the two datasets.

Next, we calculated three allelic ratios from DNA-seq and RNA-seq data for each mutation:

(1) $\alpha = \log(\text{number of mutant RNA-seq reads} / \text{number of wild-type RNA-seq reads})$, i.e., the net mutant allele expression imbalance;
(2) $\beta = \log(\text{number of mutant DNA-seq reads} / \text{number of wild-type DNA-seq reads})$, i.e., the mutant allele relative copy-number;

Fig. 2 Mutant allelic imbalance in gene expression of somatic missense PIK3CA mutations is frequent in breast tumors, particularly for preferential expression of the mutant allele. a Schematic representation of the hypothesis: cis-acting regulatory variants (rVar), either from germline or somatically acquired, generate different relative allelic expression ratios of mutant and wild-type alleles, resulting in tumors of different prognosis. b Top: log ratio α, β, and γ 89% credible intervals (CI) in breast tumors. Bottom: CIs collective posterior distribution split according to imbalance. A sample is deemed imbalanced if the CI does not cross zero.
Samples with significant imbalance are displayed in red. c Correlation analysis of α vs. β and α vs. γ, showing that both genomic copy-number dosage and allelic expression regulation contribute to imbalances in the expression of mutated alleles in tumors. Point coordinates are Maximum A Posteriori probability estimates (MAP) of the 89% CIs. d Comparison of matched γ and β values, showing predominance of tumors with a preferential allelic expression of the mutated allele. Point coordinates are Maximum A Posteriori probability estimates (MAP) of the 89% CIs.

(3) $\gamma = \alpha - \beta$, i.e., the net mutant allele expression imbalance normalized for the DNA allelic copy-number imbalances, which corresponds to a putative mutant allele expression imbalance due to cis-regulation.

In this way, α reports on the net allelic expression imbalance, generated by different mechanisms including copy-number aberrations, cellularity differences, and cis-regulatory variation, while γ reports specifically on the contribution from cis-regulatory variation (rVar in Fig. 2a), including normal genetic variation, somatic noncoding mutations, and allelic epigenetic changes. Figure 2b displays the distributions of the different ratios.

We found that net mutant allele expression imbalances (α ratio) are frequent in breast tumors, at 70.2% in METABRIC (66 out of 94) and 60.1% in TCGA (107 out of 178). The same is true for γ ratios, at 27.7% for METABRIC (26 out of 94) and 11.8% for TCGA (21 out of 178), indicating that cis-regulatory effects acting on mutations are also frequent in breast tumors. In both sets, we found samples with striking net preferential allelic expression for the mutant allele (maximum 44.8-fold and 220-fold in METABRIC and TCGA, respectively), but not so for the preferential expression of the wild-type allele (fold differences of 5.4 and 29 in METABRIC and TCGA, respectively) (Fig. 2b). Similarly, the mutant allele’s most pronounced preferential expression trend was found for the γ ratio, 10- and 4.2-fold for METABRIC and TCGA, respectively, albeit with smaller fold differences between alleles.

Interestingly, we observed that within the samples with significant mutant allele expression imbalance due to cis-regulatory variation there was a significant prevalence of samples that preferentially expressed the mutated allele in both datasets (binomial test Prob. = 1, 89% CI = [0.89, 1.00], P = 3 × 10^−8 for METABRIC and Prob. = 0.90, 89% CI = [0.73, 0.98], P = 2 × 10^−4 for TCGA).

Cis-regulatory variants contribute significantly to imbalances in the expression of mutant alleles

Next, hypothesizing that both copy number and cis-regulatory variants are the major contributors to allelic expression, we set out to assess the contribution of each mechanism toward the net mutant allele expression imbalances detected in these tumors. First, we found positive correlations between net allelic expression and both copy number and cis-regulatory variation (Fig. 2c), albeit with an effect for copy number more than double the size of that found for cis-regulatory variation (average Pearson correlation r = 0.80 and 0.34, respectively). Next, we considered the variance (Var) of the net allelic expression as the sum of the effects of both mechanisms, plus the covariance (Cov) accounting for predicted non-mutual exclusion of mechanisms acting on any given allele:

$$\mathrm{Var}(\alpha) = \mathrm{Var}(\beta) + \mathrm{Var}(\gamma) + 2\,\mathrm{Cov}(\beta, \gamma)$$ (1)

We then calculated the contribution of cis-regulatory variation to the variance of net allelic expression as $(\mathrm{Var}(\gamma) + \mathrm{Cov}(\beta, \gamma)) / \mathrm{Var}(\alpha)$. Here, we found that cis-regulatory variants explain 20.6% and 14.4% of the variability of net mutant allelic expression seen in METABRIC and TCGA, respectively (Supplementary Table 5).

Finally, assessing how the two mechanisms act simultaneously on each tumor, we found that the majority of samples (70.2% and 54.5% for the METABRIC and TCGA, respectively) had positive γ and negative β values (Fig. 2d), suggesting that although the mutant allele was in lower genomic quantity, it was nevertheless preferentially expressed compared to the wild-type allele. Interestingly, there were 10.6% and 11.2% of samples with positive α and negative β values, in METABRIC and TCGA respectively. This shows that these tumors overexpress the mutant allele despite this allele being in lower copy number. Only a minor fraction of samples displayed co-occurring preferential allelic expression and a higher allele copy number of the mutant allele (6.38% and 8.43% for the METABRIC and TCGA, respectively). These results were independent of the effect of tumor cellularity (Supplementary Fig. 3).

Preferential expression of mutant alleles by cis-regulatory variation associates with poor prognosis

To investigate the impact of differential cis-regulation of PIK3CA’s mutations on clinical outcome (overall and disease-specific survival), we performed univariate survival analysis with γ ratios categorized in three groups, based on the existence of imbalance and its direction, i.e., whether there was significant predominance of expression of the mutated allele (γ_mut), of the wild-type allele (γ_wt), or balanced allelic expression (γ_balanced). We uncovered that the group γ_mut had a poorer disease-specific survival rate (P = 0.031, Fig. 3a) than the γ_balanced group for METABRIC. The median overall survival for the γ_mut group was 5.88 years and for the γ_balanced group was 12.46 years (Supplementary Fig. 4B), whereas, in the disease-specific analysis, the mean survival of the γ_mut patients was 7.07 years, and 41% of patients died during the length of the follow-up, in comparison with 25.3% deaths in the γ_balanced group (Fig. 3a).

Fig. 3 Allelic preferential expression of PIK3CA mutations is associated with survival and clinicopathological parameters in breast cancer. a Kaplan–Meier curve of disease-specific survival showing the worse prognosis of patients with differential expression of the PIK3CA mutations (γ_mut group, shown in blue) compared to those expressing equimolar levels of mutation and wild-type alleles (γ_balanced group, shown in red), in METABRIC. Shown below the graph are the numbers of patients at risk per group throughout time. b Preferential expression of the mutated allele is associated with ER-negative, PR-negative, and HER2-positive breast tumors. In all graphs, samples were colored according to the significance of the allelic expression imbalance. q values indicated correspond to the Wilcoxon rank-sum test with continuity correction, corrected for multiple testing using the Benjamini & Hochberg method. Survival plots indicate the 95% CI as colored shades. Boxplots display the median, the lower and upper hinges corresponding to the first and third quartiles, and lower and upper whiskers corresponding to the smallest and largest values from the 1.5 * IQR (interquartile range), respectively.

The categorized γ ratios were not significantly associated with overall survival in the multivariate analysis (Supplementary Fig. 5). However, some of the variables that are usually independent prognosis factors, such as PR and HER2 statuses, were not significantly associated with survival either in this analysis. In the TCGA set, there was a trend toward a worse disease-specific survival of those patients whose tumors preferentially express the mutated allele (Supplementary Fig. 6). However, due to the relatively shorter follow-up time of this dataset (median ~1 year) and the fact that tumors were mainly Luminal A (~61.2% of samples), the power to detect significant differences is smaller than that of METABRIC. Nevertheless, the joint analysis of the two datasets showed a significantly worse disease-specific survival of the α_mut group of patients, with a concordant trend in the γ_mut group (Supplementary Fig. 7).

PIK3CA preferential mutant allele expression associates with clinicopathological variables

Next, we sought to investigate whether PIK3CA’s differential mutant allele expression was associated with known prognostic clinicopathological variables, namely hormone receptors (ER, PR) and HER2 amplification, which are directly and indirectly connected to gene expression regulation, respectively.

For both datasets, we observed that preferential mutant allele expression driven by cis-regulatory variation (γ) was associated with markers of worse prognosis, namely it was significantly higher in ER-negative tumors and PR-negative tumors, and in HER2-positive tumors only in METABRIC (Fig. 3b). When evaluating the contribution of cis-regulatory variation to this association, we also found that higher average γ values associated with lower PR expression (P = 0.040) and HER2-positive tumors (P = 0.025), but we did not find a significant association with ER expression (P = 0.129) (Supplementary Fig. 8).

Given these results, we took γ into consideration in the survival analysis within the expression subgroups of ER, PR, and HER2, but did not find significant differences in overall and disease-specific survival in METABRIC (Supplementary Fig. 4).

Considering other known prognostic variables, including tumor size, grade, and molecular subtypes (PAM50 and IntClust), we found a significant association between γ ratios and PAM50 subtypes only in METABRIC (q = 0.027) (Supplementary Table 6 and Supplementary Fig. 9).

Finally, we did not find an association between the candidate germline regulatory variant rs2699887 and γ or clinical outcome, suggesting germline variants are unlikely to be involved in the significant associations described above (data not shown). However, supporting the involvement of somatic cis-regulatory variants instead, we found smaller fold changes and fewer samples with imbalances measured at common PIK3CA variants in normal-matched tissue data than those measured at mutations in tumor tissue (Supplementary Fig. 10).

DISCUSSION

Our work reveals the role of cis-regulatory variation acting on PIK3CA somatic mutations as modifiers of mutation penetrance. We show for the first time that allelic expression imbalance between PIK3CA’s mutant and wild-type alleles is common and prognostic in breast cancer.

Particularly, preferential expression of the mutant allele is significantly more common than that of the wild-type allele, and considering that PIK3CA is an oncogene, one possibility is that positive selection could have a role in generating this difference, which should be further investigated. Furthermore, we also found that allelic imbalance in expression observed for the mutant alleles in the tumors was greater than that observed for single-nucleotide polymorphisms in the normal-matched tissue of patients. These findings support the hypothesis of the involvement of somatic regulatory mutations in generating the imbalances observed in the tumors. While genomic allelic imbalance remains the largest determinant of allelic expression dosage (showing the highest correlation with and contributing the most to the variability observed in net allelic expression), cis-regulatory variation is also significantly correlated with net allelic expression and explains ~16% of its variability across samples in these sets of tumors.

The analysis of RNA-seq data from two independent cohorts of tumor samples, the METABRIC and TCGA projects, strongly supports our findings.

Moreover, we show that preferential expression of the mutant allele due to cis-variation is associated with poor prognosis variables, such as ER-negative, PR-negative, and HER2-positive statuses [20,21]. In the METABRIC dataset, we also found that preferential expression of the mutant allele was associated with worse overall and disease-specific survival. The high stringency in calling imbalance and the focus on a specific type of mutation in one gene limit this study in terms of the sample size analyzed, but on the other hand they provide the simplest scenario for testing our hypothesis. Interestingly, the joint analysis of the datasets revealed some level of association between disease-specific survival and the preferential expression of the mutant allele, both net and due to cis-regulation, reaffirming the clinical importance of the expression level of a mutation commonly associated with aggressive tumors. In addition, some tumors presented preferential expression of the wild-type allele of PIK3CA, suggesting that these mutations are lowly expressed and possibly passenger events.

Besides the potential use of our findings as a prognosis biomarker in the clinic, these results may also have therapeutic implications. Some of the major clinical challenges in cancer treatment are identifying biomarkers of prognosis and defining which patients will benefit from a given therapy. Particularly, it is crucial to identify patients unlikely to respond to specific therapies to prevent unnecessary drug cytotoxicity without any therapeutic benefits. Our results reveal the importance of considering allelic expression in somatic mutation screens in these two aspects of patient management. Despite the high frequency of PIK3CA mutations in breast cancers, the response to PI3K inhibitor therapy has been more challenging than expected, and the prognostic significance of detecting somatic PIK3CA mutations in breast tumors is unclear. Relevant to this discussion, we have previously shown that the presence of PIK3CA mutations confers a poorer prognosis in patients with ER-positive breast cancer only when stratified into copy-number driven subgroups (IntClust 1+, 2+, 9+).

In this study, we provide new evidence for the prognostic significance of these mutations at the expression level in breast tumors. Particularly for tumors with significant preferential expression of the wild-type allele, this prognostic significance has a potential impact on therapy response and clinical management since one may hypothesize that little to no benefit would come from treatment in the cases not expressing the targetable mutation. Further studies evaluating the allelic expression of mutant oncogenes in the tumors of patients enrolled in molecular-driven trials will clarify this impact.

More challenging is determining which cis-regulatory mechanisms are promoting allelic expression imbalances. Both inherited [11,24] and acquired variants [6,25–27] can affect gene expression in an allelic manner [28,29].

Here we show that normal cis-regulatory variation regulates PIK3CA’s expression in normal breast tissue, with the possible contribution of rs2699887 as a regulatory variant. We also found that the heterozygotes for rs2699887 were associated with higher expression of the PIK3CA gene compared to the common homozygotes. Although there is published data supporting the clinical association of rs2699887 with poor prognosis in other cancers [30,31], linked to an increase in PI3K signaling, there is still some data supporting the opposite association [32,33]. We did not find an association between rs2699887 and survival, which opens the possibility for other mechanisms besides normal cis-regulatory variation to be considered as contributors to the preferential allelic expression in these tumors (data not shown). Double PIK3CA mutations in the same allele are frequent in breast tumors, and the impact of noncoding mutations in cancer is just starting to be explored. So, a possibility is that the combination of noncoding and coding mutations in the same gene might be underlying the allelic expression imbalances we are detecting.

Further studies on allelic expression imbalances of activating mutations, and even inactivating ones, should further reveal the contribution of cis-regulatory mechanisms in tumor development and progression. Particularly interesting to determine is whether the coding mutation originates in an allele predisposed with higher expression, or whether a sequence of somatic events introduces the coding activating mutation and additional cis-regulatory noncoding mutations. The answers could have significant repercussions on our understanding of tumor evolution.

In summary, we show that differential expression between the mutant and wild-type alleles of PIK3CA is common in breast cancer and with a significant contribution from allele-specific cis-regulatory effects. We further show that mutant allele differential expression is associated with clinical parameters such as ER, PR, and HER2 statuses and is prognostically significant.

Collectively, our work establishes the prognostic relevance of the allele-specific transcriptional regulation of PIK3CA somatic mutations. It also supports a shift in the mutation testing in patient management, where the level of expression of these mutations should be considered, besides the detection at the DNA level.

METHODS

Subjects

Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue [9], METABRIC [14], TCGA [35].

Differential allelic expression analysis

DNA and total RNA from 64 samples of normal breast tissue were hybridized onto Illumina Exon510S-Duo arrays (humanexon510s-duo), and data were analyzed as described before. In short, after sample filtering and normalization, variants with average RNA log2 allelic intensity values greater than 9.5 and heterozygous in five or more samples were kept for further analysis.

Allelic log ratios were calculated for RNA and DNA intensity data:

$$\text{log ratio} = \log_2(A) - \log_2(B)$$ (2)

for alleles A and B. Next, variants that showed significant differences between the RNA log ratios between heterozygous (AB) and homozygous groups (AA and BB) (two-sample Student’s t test, P value < 0.05) were selected for differential allelic expression analysis.

Allelic expression (AE) ratios were normalized for allelic DNA content:

$$\text{AE ratio} = \text{RNA log ratio} - \text{DNA log ratio}$$ (3)

Differential allelic expression (DAE) at the sample level was defined as |AE ratio| ≥ 0.58 (1.5-fold or greater between alleles), based on previous studies using microarray data [3,36]. Variants with at least 10% and three heterozygous samples displaying DAE were further classified as daeSNPs. Linkage disequilibrium (LD) between daeSNPs was evaluated using the genetic variant-centered annotation browser SNiPA.

Genotype imputation analysis on normal breast tissue samples

Illumina Exon 510 Duo germline genotype data from the 64 samples that passed microarray quality control were filtered to keep variants with call rates ≥85%, minor allele frequency >0.01, and Hardy–Weinberg equilibrium with P > 1 × 10^−5. Next, genotypes were imputed with MACH1.0 [38] for all additional known variants on chromosome 3, using as reference panel the phased CEU panel haplotypes from the HapMap3 release (HapMap3 NCBI Build, CEU panel: Utah residents with Northern and Western European ancestry), and the recommended two-step imputation process: model parameters (crossover and error rates) were estimated before imputation using all haplotypes from the study subjects and running 100 Hidden Markov Model (HMM) iterations; then genotypes were imputed using the model parameter estimates from the previous round. Imputation results were filtered based on an rq score ≥ 0.3, a platform-specific measurement of variant imputation uncertainty.

Differential allelic expression (DAE) mapping analysis on normal breast tissue samples

Differential allelic expression mapping analysis was performed by stratifying AE ratios at each PIK3CA daeSNP according to the genotype at variants located within ±250 Kb. A Mann–Whitney test was applied to test if the mean of the absolute AE ratios of the heterozygous samples was greater than those of the combined reference and alternative allele homozygous samples. Correction for multiple testing was performed using the BH method (p.adjust, R stats 4.0.3 package) and limiting the significance to q values ≤0.05.

Functional annotation of DAE mapping associated variants

Variants in LD with SNPs with DAE mapping nominal P value ≤0.05 were retrieved using the function get_ld_variants_by_window from the ensemblr R package (https://github.com/ramiromagno/ensemblr) using the 1000 GENOMES project data (phase_3) for the EUR population and an r² > 0.95. These proxy SNPs were assessed for overlap with epigenetic marks derived from the Encyclopedia of DNA Elements (ENCODE) and NIH Roadmap Epigenomics projects, such as chromatin states (chromHMM) annotation, regions of DNase I hypersensitivity, transcription factor binding sites, and histone modifications of epigenetic markers (H3K4Me1, H3K4Me3, and H3K27Ac) (http://genome.ucsc.edu/ENCODE/) for normal human mammary epithelial cells (HMECs), human mammary fibroblasts (HMFs), BR.MYO (breast myoepithelial cells) and BR.H35 (breast vHMEC) and two breast cancer cell lines, MCF-7 and T47D. We prioritized variants located on either active promoter or enhancer regions in mammary cell lines, and for which ChIP-Seq data indicated protein binding or position weight matrix (PWM) scores predicted differential protein binding for different alleles. Two publicly available tools, RegulomeDB and HaploReg v4.1, and the MotifBreakR Bioconductor package, were also used to evaluate those candidate functional variants [39,41,42].

Electrophoretic mobility shift assay (EMSA)

MCF-7 (ER-positive) and HCC1954 (ER-negative) breast cancer cell lines were cultured in DMEM and RPMI culture media, respectively, supplemented with 10% FBS and 1% PS (penicillin and streptomycin). Nuclear protein extracts were prepared using the Thermo Scientific Pierce™ NER kit, according to the manufacturer’s instructions. Oligonucleotide sequences corresponding to the C (common) and T (minor) alleles of rs2699887 (5’-AGCGTGAGTAGAGCGCGGA[C/T]TGGCCGGTAGCGGGTGCGGTG-3’) were labeled using the Thermo Scientific Pierce Biotin 3’ End DNA Labelling Kit, according to the manufacturer’s instructions. Oligonucleotides with known binding motifs for NF-YA [43] and E2F1 [44] were used in competition assays. Undiluted antibodies used for supershift competition assays were NF-YA (H-209) (Santa Cruz Biotechnology, SC-10779X) and HMGA1a/HMGA1b (Abcam, ab4078). EMSA experiments were performed using the Thermo Scientific LightShift™ Chemiluminescent EMSA Kit, using the buffer and binding reaction conditions previously described. Each EMSA was repeated at least twice for all combinations of cell extract and oligonucleotide, which were also tested in serial dilution amounts.

Breast tumor samples

The METABRIC dataset of tumor samples included 2433 samples from the METABRIC project with DNA sequencing data, among which 480 were subjected to a capture-based RNA sequencing study. Sequencing libraries were generated as previously described. In brief, sequencing libraries were generated from total RNA from frozen tissues with a TruSeq mRNA Library Preparation Kit using poly-A-enriched RNA (Illumina, San Diego, CA, USA) and enriched with the human kinome DNA capture baits (Agilent Technologies, Santa Clara, CA, USA).

… the normalized mutant allele expression ratio, a proxy for the mutant allelic expression imbalance due to cis-regulation alone.

Statistical inference of allelic expression imbalances. According to these log-ratio definitions, a positive value indicates an imbalance toward the mutant allele, and a negative value an imbalance favoring the wild-type allele. However, the statistical significance of each log ratio depends on the read coverage of each allele, e.g., low read-coverage values are subject to greater random variation, and hence less reliable log ratios and imbalances estimation. To assign a measure of uncertainty to our imbalances’ estimates, we assumed that the read counts are well modeled by a Beta-Binomial distribution, and following Bayesian reasoning, we estimated 89%
Six libraries were pooled for credible intervals (CI) and Maximum A Posteriori probability estimates each capture reaction, with 100 ng of each library, and sequenced (paired- (MAP) for the log ratios β, α, and γ (reported in Fig. 2). end 51bp) on an Illumina HiSeq2000 platform. We selected a subset of samples with DNA and RNA sequencing data and PIK3CA missense Allelic expression imbalances in normal-matched tissue data. Solid normal mutations for further analysis. breast tissue from breast cancer female patients was obtained from TCGA- The TCGA dataset comprised 695 samples from TCGA breast cancers , BRCA. We selected 112 samples with RNA-Seq data, obtained in bam file from which we selected a subset of 289 samples with PIK3CA missense format. Sequence data were converted to fastq format (samtools ), mutations for further analysis. Supplementary Table 3 summarizes the underwent initial quality control (FastQC ), and trimming (Trimmo- demographic features and disease characteristics of the two datasets. matic ). Following QC, six samples were removed from analysis. The remaining sequence data was mapped to the reference genome (hg38) DNA-seq and RNA-seq variants calling in tumors using STAR aligner (v.2.7.7a ). Otherwise, alignment, preprocessing and Alignment and preprocessing. Sequence data (FASTQ) mapped to the variant calling was performed as described below. RNA data was filtered to reference genome (hg19) were aligned using STAR v2.4.1 . A two-pass contain only heterozygous variants at the DNA level, circumscribed to alignment was carried out: splice junctions detected in the first alignment PIK3CA’s genomic location. DNA data was accessed from TCGA-BRCA’s run are used to guide the final alignment. Duplicates were marked with microarray raw data for 111 of the 112 initial RNA-Seq samples. Genotypes Picard v1.131 (http://picard.sourceforge.net). 
Genome Analysis Toolkit were obtained using the CRLMM algorithm (‘crlmm’ R Bioconductor 46 56 (GATK) was used for indel realignment and base quality score recalibration . package, ) and quality controlled for HWE, major allele frequency and 10% missing genotypes (‘SNPassoc’ R package ). Genotypes were lifted over from hg38 to hg37 (‘rtracklayer’ R Bioconductor package ), Variant calling and annotations. SNV and indel variants were called using harmonized (‘GenotypeHarmonizer’ ), and imputed (Michigan Imputation GATK Haplotype Caller. Hard filters using GATK VariantFiltration were 60 61 Server ). Obtained genotypes were quality controlled using PLINK and applied to variants . Variants were annotated with Ensembl Variant Effect lifted over back to hg38. Allelic expression imbalances, equivalent to α Predictor (VEP) . Heterozygous genotypes were called from DNA data to ratios in tumors, were inferred for heterozygous germline variants as avoid RNA editing and other RNA-related variants because true allelic described above. imbalance can lead to heterozygous sites being called homozygous in RNA-based genotype calling. Two-sample tests of imbalance ratios with clinical covariates. Association between allelic expression imbalance ratios and clinical data was achieved Analysis of allelic expression imbalances in tumors by bivariate analysis Wilcoxon rank-sum test with continuity correction or Before the analysis, a set of filtering steps was performed to select samples: Kruskal–Wallis rank-sum test, as indicated in tables and figures. P values (1) presence of missense mutations; (2) and a minimum of 30 reads for were adjusted per study using the Benjamini & Hochberg correction and 48–50 RNA-seq and DNA-seq data . were considered significant when ≤ 0.05. Clinical data for METABRIC were updated from the original studies with the latest available records. Clinical data for TCGA were imported from Correlation analysis. 
Correlation analysis α vs β and α vs γ ratios for both https://portal.gdc.cancer.gov/ on November 26, 2018. sets of samples were performed using a Pearson’s test. All statistical analysis and data visualization were performed using R. Filtering of tumor samples. For both datasets —METABRIC and TCGA—,a set of quality control criteria were applied to filter the DNA-seq and RNA- seq samples, namely: Survival analyses a. Keep only samples containing PIK3CA missense mutations; Kaplan–Meier plots and multivariate Cox proportional hazard models were b. Keep samples whose coverage at mutated loci is, all together in both used to examine the association between alpha and gamma allelic 48–50 62,63 alleles, at least 30 reads for both RNA-seq and DNA-seq data . expression ratios and survival using the survival package from R . Death due to all causes was used as the endpoint, and all alive subjects were Clinical data of METABRIC patients were updated from the original censored at the date of the last contact. Kaplan–Meier survival curves were studies with the latest available records. The TCGA clinical dataset was 51,52 obtained from cBioPortal on 28 November 2021 by programmatic compared using the log-rank test. access with the R package cgdsr. For the multivariate analysis, Cox proportional hazard model was used to assess the effect of γ on the overall survival. Hazard ratios (HRs) and Allelic expression imbalances in tumor data. Allelic expression imbalances 95% confidence intervals (CI) were estimated by fitting the Cox model are calculated as follows. 
For each mutated loci, the pair of read counts (X, while adjusting for age and tumor characteristics, such as size, Y), for wild-type (X) and mutant (Y) alleles, respectively, measured either by Scarff–Bloom–Richardson histological grade, clinical stage and estrogen DNA-seq or RNAseq, are transformed using the log ratios β, α, and γ, which receptor (ER), progesterone (PR), and human epidermal growth factor 2 are defined as follows: (HER2) statuses. For the bivariate analysis, Wilcoxon rank-sum two-sample tests were β ¼ log ðY =X Þ; (4) 2 DNA DNA used to compare α and γ between different hormone receptor statuses the DNA mutant allele ratio, which served to control for sequencing and q ≤ 0.05, calculated using the Benjamini & Hochberg method, were artifacts from heterozygous genotypes and to account for differences in considered statistically significant. variant frequencies in DNA; α ¼ log ðY =X Þ; (5) RNA RNA Reporting summary that served as a measure of the net allelic expression imbalance in tumors; Further information on research design is available in the Nature Research γ ¼ α  β; (6) Reporting Summary linked to this article. Published in partnership with the Breast Cancer Research Foundation npj Breast Cancer (2022) 71 L. Correia et al. DATA AVAILABILITY 22. Keegan, N. M., Gleeson, J. P., Hennessy, B. T. & Morris, P. G. PI3K inhibition to overcome endocrine resistance in breast cancer. Expert Opin. Investig. Drugs 27, Microarray raw data are deposited in the Gene Expression Omnibus under accession 1–15 (2018). number GSE35023. Primary data (BAM files) for DNA-seq are deposited at the 23. Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refine their European Genome-phenome Archive (EGA) under study accession number genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016). EGAS00001001753 and may be downloaded upon request and authorization by 24. Yan, H. Allelic variation in human gene expression. 
Science 297, 1143–1143 the METABRIC Data Access Committee. Primary data (BAM files) for RNAseq are (2002). available from the authors upon reasonable request. Primary data (BAM files) for 25. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human mela- DNA-seq and RNAseq from TCGA are deposited in the database of Genotypes and noma. Science 339, 957–959 (2013). Phenotypes (dbGaP) under the study accession number phs000178. 26. Mansour, M. R. et al. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014). 27. Przytycki, P. F. & Singh, M. Differential allele-specific expression uncovers breast CODE AVAILABILITY cancer genes dysregulated by cis noncoding mutations. Cell Syst. 10, 193–203.e4 The filtered data and code for the analysis of mutant allele expression imbalances (2020). and the survival analysis can be publicly accessed at https://github.com/maialab/ 28. Shoemaker, R., Deng, J., Wang, W. & Zhang, K. Allele-specific methylation is npjbcPIK3CA. prevalent and is contributed by CpG-SNPs in the human genome. Genome Res. 20, 883–889 (2010). 29. Ongen, H. et al. Putative cis-regulatory drivers in colorectal cancer. Nature 512, Received: 2 June 2021; Accepted: 31 March 2022; 87–90 (2014). 30. Morgese, F. et al. Impact of phosphoinositide-3-kinase and vitamin D3 nuclear receptor single-nucleotide polymorphisms on the outcome of malignant mela- noma patients. Oncotarget 8, 75914–75923 (2017). 31. Li, Q. et al. Associations between single-nucleotide polymorphisms in the PI3K- REFERENCES PTEN-AKT-mTOR pathway and increased risk of brain metastasis in patients with 1. Bielski, C. M. et al. Widespread selection for oncogenic mutant allele imbalance in non-small cell lung cancer. Clin. Cancer Res. 19, 6252–6260 (2013). cancer. Cancer Cell 3, 852–862.e4 (2018). 32. Wang, L.-E. et al. Roles of genetic variants in the PI3K and RAS/RAF pathways in 2. Pastinen, T. 
Cis-acting regulatory variation in the human genome. Science 306, susceptibility to endometrial cancer and clinical outcomes. J. Cancer Res. Clin. 647–650 (2004). Oncol.138, 377–385 (2011). 3. Ge, B. et al. Global patterns of cis variation in human cells revealed by high- 33. Pu, X. et al. PI3K/PTEN/AKT/mTOR pathway genetic variation predicts toxicity and density allelic expression analysis. Nature Genet. 41, 1216–1222 (2009). distant progression in lung cancer patients receiving platinum-based che- 4. Pastinen, T. et al. A survey of genetic and epigenetic variation affecting human motherapy. Lung Cancer 71,82–88 (2011). gene expression. Physiol. Genomics 16, 184–193 (2004). 34. Vasan, N. et al. Double PIK3CA mutations in cis increase oncogenicity and sen- 5. Morley, M. et al. Genetic analysis of genome-wide variation in human gene sitivity to PI3Kα inhibitors. Science 366, 714–723 (2019). expression. Nature 430, 743–747 (2004). 35. Wilkerson, M. D. et al. Integrated RNA and DNA sequencing improves mutation 6. Calabrese, C. et al. Genomic basis for RNA alterations in cancer. Nature 578, detection in low purity tumors. Nucleic Acids Res. 42, e107–e107 (2014). 129–136 (2020). 36. Verlaan, D. J. et al. Targeted screening of cis-regulatory variation in human 7. Rhee, J.-K., Lee, S., Park, W.-Y., Kim, Y.-H. & Kim, T.-M. Allelic imbalance of somatic haplotypes. Genome Res. 19, 118–127 (2009). mutations in cancer genomes and transcriptomes. Sci. Rep. 7, 1653 (2017). 37. Arnold, M., Raffler, J., Pfeufer, A., Suhre, K. & Kastenmüller, G. SNiPA: an inter- 8. Meyer, K. B. et al. Allele-specific up-regulation of FGFR2 increases susceptibility to active, genetic variant-centered annotation browser. Bioinformatics 31, breast cancer. PLoS Biol. 6, e108 (2008). 1334–1336 (2014). 9. Maia, A.-T. et al. Extent of differential allelic expression of candidate breast cancer 38. Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. 
MaCH: using sequence and genes is similar in blood and breast. Breast Cancer Res. 11, R88 (2009). genotype data to estimate haplotypes and unobserved genotypes. Genetic Epi- 10. Cox, D. G. et al. Common variants of the BRCA1 wild-type allele modify the risk of demiol. 34, 816–834 (2010). breast cancer in BRCA1 mutation carriers. Human Mol. Genet. 20, 4732–4747 39. Ward, L. D. & Kellis, M. HaploReg: a resource for exploring chromatin states, (2011). conservation, and regulatory motif alterations within sets of genetically linked 11. Maia, A.-T. et al. Effects of BRCA2 cis-regulation in normal breast and cancer risk variants. Nucleic Acids Res. 40, D930–D934 (2011). amongst BRCA2 mutation carriers. Breast Cancer Res. 14, R63 (2012). 40. R Core Team. R: A Language and Environment for Statistical Computing (R Foun- 12. Liu, R. et al. Allele-specific expression analysis methods for high-density SNP dation for Statistical Computing, 2013). microarray data. Bioinformatics 28, 1102–1108 (2012). 41. Auton, A. et al. A global reference for human genetic variation. Nature 526,68–74 13. Xiao, R. & Scott, L. J. Detection of cis-acting regulatory SNPs using allelic (2015). expression data. Genetic Epidemiol. 35, 515–525 (2011). 42. Coetzee, S. G., Coetzee, G. A. & Hazelett, D. J. motifbreakR: an R/Bioconductor 14. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast package for predicting variant effects at transcription factor binding sites: Fig. 1. tumours reveals novel subgroups. Nature 486, 346–352 (2012). Bioinformatics btv470 https://doi.org/10.1093/bioinformatics/btv470 (2015). 15. Hartman, D. J., Davison, J. M., Foxwell, T. J., Nikiforova, M. N. & Chiosea, S. I. Mutant 43. Xu, H. et al. The CCAAT box-binding transcription factor NF-Y regulates basal allele-specific imbalance modulates prognostic impact of KRAS mutations in expression of human proteasome genes. 
Biochimica et Biophysica Acta (BBA) - colorectal adenocarcinoma and is associated with worse overall survival. Int. J. Molecular Cell Research 1823, 818–825 (2012). Cancer 131, 1810–1817 (2012). 44. Lees, E., Faha, B., Dulic, V., Reed, S. I. & Harlow, E. Cyclin E/cdk2 and cyclin A/cdk2 16. Soh, J. et al. Oncogene mutations, copy number gains and mutant allele specific kinases associate with p107 and E2F in a temporally distinct manner. Genes Dev. imbalance (MASI) frequently occur together in tumor cells. PLoS ONE 4, e7464 6, 1874–1885 (1992). (2009). 45. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29,15–21 17. Krasinskas, A. M., Moser, A. J., Saka, B., Adsay, N. V. & Chiosea, S. I. KRAS mutant (2012). allele-specific imbalance is associated with worse prognosis in pancreatic cancer 46. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for ana- and progression to undifferentiated carcinoma of the pancreas. Modern Pathol. lyzing next-generation DNA sequencing data. Genome Res. 20,1297–1303 (2010). 26, 1346–1354 (2013). 47. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 18. Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high- (2016). quality survival outcome analytics. Cell 173, 400–416.e11 (2018). 48. Heap, G. A. et al. Genome-wide analysis of allelic expression imbalance in human 19. Sørlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor primary cells by high-throughput transcriptome resequencing. Human Mol. subclasses with clinical implications. Proc. Natl Acad. Sci. USA 98, 10869–10874 Genet. 19, 122–134 (2009). (2001). 49. Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools 20. Dunnwald, L. K., Rossing, M. A. & Li, C. I. Hormone receptor status, tumor char- and best practices for data processing in allelic expression analysis. Genome Biol. 
acteristics, and prognosis: a prospective cohort of breast cancer patients. Breast 16, 195 (2015). Cancer Research 9, R6 (2007). 50. Chen, J. et al. A uniform survey of allele-specific binding and expression over 21. Chia, S. et al. Human epidermal growth factor receptor 2 overexpression as a 1000-Genomes-Project individuals. Nat. Commun. 7, 11101 (2016). prognostic factor in a large tissue microarray series of node-negative breast 51. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring cancers. J. Clin. Oncol. 26, 5697–5704 (2008). multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012). npj Breast Cancer (2022) 71 Published in partnership with the Breast Cancer Research Foundation L. Correia et al. 52. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles the Genomics, Histopathology, and Biorepository Core Facilities at the Cancer using the cBioPortal. Sci. Signaling 6, pl1 (2013). Research UK Cambridge Institute and the Addenbrooke’s Human Research Tissue 53. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, Bank (supported by the National Institute for Health Research Cambridge Biomedical 2078–2079 (2009). Research Centre). 54. Andrews, S. FastQC: A QualityControl Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/ fastqc/ (2010). AUTHOR CONTRIBUTIONS 55. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina L.C., R.M., J.M.X., S.F.C., and A.T.M. wrote the manuscript. R.M., J.M.X., S.F.C., and A.T.M. sequence data. Bioinformatics 30, 2114–2120 (2014). contributed to the overall design of this study. Data were collected by F.E., A.B., L.M., 56. Carvalho, B. S., Louis, T. A. & Irizarry, R. A. Quantifying uncertainty in genotype R.B., C.C., S.F.C., and A.T.M. Data were analyzed and interpreted by L.C., R.M., J.M.X., calls. Bioinformatics 26, 242–249 (2009). 
B.P.A., F.E., C.S., I.D., M.E., A.M., I.A.S., J.S., and A.T.M. All authors have read and 57. Gonzalez, J. R. et al. SNPassoc: an R package to perform whole genome asso- approved the final version of the manuscript. ciation studies. Bioinformatics 23, 654–655 (2007). 58. Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009). COMPETING INTERESTS 59. Deelen, P. et al. Genotype harmonizer: automatic strand alignment and format The authors declare no competing interests. conversion for genotype data integration. BMC Res. Notes 7, 901 (2014). 60. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016). ADDITIONAL INFORMATION 61. Purcell, S. et al. PLINK: a tool set for whole-genome association and population- based linkage analyses. Am. J. Human Genet. 81, 559–575 (2007). Supplementary information The online version contains supplementary material 62. Therneau, T. M. & Grambsch, P. M. Modeling Survival Data: Extending the Cox available at https://doi.org/10.1038/s41523-022-00435-9. Model (Springer New York, 2000). 63. Therneau, T. M. A Package for Survival Analysis in R. https://cran.r-project.org/web/ Correspondence and requests for materials should be addressed to Suet-Feung Chin packages/survival/ (2021). or Ana-Teresa Maia. Reprints and permission information is available at http://www.nature.com/ reprints ACKNOWLEDGEMENTS We thank all the patients who donated tissue and the associated pseudo- Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims anonymized clinical data for this project. The authors would also like to thank the in published maps and institutional affiliations. Functional Genomics of Cancer group members at CINTESIS-UAlg for helpful discussions and Vitor Morais at UAIC for administrative support. 
This work was supported by Portuguese national funding through FCT-Fundação para a Ciência e a Tecnologia, and CRESC ALGARVE 2020, institutional support ALG-01-0145-FEDER- Open Access This article is licensed under a Creative Commons 31477—DevoCancer, ALG-01-0145-FEDER-30895—Intergen, CBMR—UID/BIM/04773/ Attribution 4.0 International License, which permits use, sharing, 2013, CINTESIS R&D Unit—UIDB/4255/2020, POCI-01-0145-FEDER-022184 - Geno- adaptation, distribution and reproduction in any medium or format, as long as you give mePT, the contract DL 57/2016/CP1361/CT0042 (J.M.X.) and individual fellowships appropriate credit to the original author(s) and the source, provide a link to the Creative SFRH/BPD/99502/2014 (J.M.X.) and PD/BD/114252/2016 (F.E.). Funding was also Commons license, and indicate if changes were made. The images or other third party received from the People Programme (Marie Curie Actions) of the European Union’s material in this article are included in the article’s Creative Commons license, unless Seventh Framework Programme FP7/2007-2013/303745 (A.T.M.), and a Maratona da indicated otherwise in a credit line to the material. If material is not included in the Saúde Award (A.T.M.). The METABRIC project was funded by Cancer Research UK, the article’s Creative Commons license and your intended use is not permitted by statutory British Columbia Cancer Foundation, and the Canadian Breast Cancer Foundation BC/ regulation or exceeds the permitted use, you will need to obtain permission directly Yukon. This sequencing project was funded by CRUK grant C507/A16278 and CRUK from the copyright holder. To view a copy of this license, visit http://creativecommons. core grant A16942. The authors also acknowledge the support of the University of org/licenses/by/4.0/. 
Cambridge, Hutchinson Whampoa, the NIHR Cambridge Biomedical Research Centre, the Cambridge Experimental Cancer Medicine Centre, the Centre for Translational Genomics (CTAG) Vancouver, and the BCCA Breast Cancer Outcomes Unit. We thank © The Author(s) 2022 Published in partnership with the Breast Cancer Research Foundation npj Breast Cancer (2022) 71 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png npj Breast Cancer Springer Journals
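The Methods repeatedly adjust P values with the Benjamini & Hochberg procedure (the text cites `p.adjust` from R). As a rough illustration of what that adjustment does, here is a minimal pure-Python sketch. The function name `benjamini_hochberg` and the example P values are hypothetical and are not taken from the authors' code, which is written in R.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values (q values) in the input order.

    Sketch of the step-up procedure: sort the P values, scale the
    rank-th smallest by m / rank, then enforce monotonicity by taking
    a running minimum from the largest P value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative values only, not data from the study.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [q <= 0.05 for q in example_q]
```

Tests whose adjusted value is at most 0.05 would be the ones retained under the threshold stated in the Methods.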



Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022
eISSN
2374-4677
DOI
10.1038/s41523-022-00435-9

Abstract

www.nature.com/npjbcancer

ARTICLE OPEN

Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant

Lizelle Correia (1,10), Ramiro Magno (2,10), Joana M. Xavier (2), Bernardo P. de Almeida (1,8), Isabel Duarte (2), Filipa Esteves (1,3), Marinella Ghezzo (2), Matthew Eldridge (4), Chong Sun (5,9), Astrid Bosma (5), Lorenza Mittempergher (5), Ana Marreiros (1), Rene Bernards (5), Carlos Caldas (4,6,7), Suet-Feung Chin (4,6,11) ✉ and Ana-Teresa Maia (1,2,11) ✉

1 Faculty of Medicine and Biomedical Sciences (FMCB), Universidade do Algarve, Faro, Portugal. 2 Center for Research in Health Technologies and Information Systems (CINTESIS), Universidade do Algarve, Faro, Portugal. 3 ProRegeM-PhD Program in Mechanisms of Disease and Regenerative Medicine, Universidade do Algarve, Faro, Portugal. 4 Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, UK. 5 Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, The Netherlands. 6 Department of Oncology, University of Cambridge, Cambridge, UK. 7 Cancer Research UK Cambridge Cancer Centre, Cambridge, UK. 8 Present address: The Research Institute of Molecular Pathology, Vienna, Austria. 9 Present address: DKFZ, Heidelberg, Germany. 10 These authors contributed equally: Lizelle Correia, Ramiro Magno. 11 These authors jointly supervised this work: Suet-Feung Chin, Ana-Teresa Maia. ✉ email: suet-feung.chin@cruk.cam.ac.uk; atmaia@ualg.pt

PIK3CA mutations are the most common in breast cancer, particularly in the estrogen receptor-positive cohort, but the benefit of PI3K inhibitors has had limited success compared with approaches targeting other less common mutations. We found a frequent allelic expression imbalance between the missense mutant and wild-type PIK3CA alleles in breast tumors from the METABRIC (70.2%) and the TCGA (60.1%) projects. When considering the mechanisms controlling allelic expression, 27.7% and 11.8% of tumors showed imbalance due to regulatory variants in cis, in the two studies respectively. Furthermore, preferential expression of the mutant allele due to cis-regulatory variation is associated with poor prognosis in the METABRIC tumors (P = 0.031). Interestingly, ER−, PR−, and HER2+ tumors showed significant preferential expression of the mutated allele in both datasets. Our work provides compelling evidence to support the clinical utility of PIK3CA allelic expression in breast cancer in identifying patients of poorer prognosis, and those with low expression of the mutated allele, who will unlikely benefit from PI3K inhibitors. Furthermore, our work proposes a model of differential regulation of a critical cancer-promoting gene in breast cancer.

npj Breast Cancer (2022) 8:71; https://doi.org/10.1038/s41523-022-00435-9

INTRODUCTION
Activating oncogenic mutations are often characterized by gain-of-function single-base alterations or focal DNA copy-number amplification, where the gain of just a single copy of a mutant allele is sufficient for tumorigenesis [1]. These gains change the stoichiometric balance between mutant and wild-type alleles and are selected for in cancers, affecting approximately half of all oncogenic driver mutations. Ultimately, they could dictate prognosis and therapeutic sensitivity.

However, the impact of gene dosage differences of oncogenic mutations generated at the gene expression level has been largely unexplored. Genetic variation and mutations regulate gene expression in an allele-specific manner (known as cis-regulatory variation) by altering protein and miRNA binding, for example. Normal cis-regulatory variation affects most of the human genome in all tissues and generates the wealth of phenotypic variation seen in species [3–5]. Moreover, an extensive contribution from noncoding variants to RNA alterations was recently observed in tumors [6], including allelic imbalance of somatic mutations [7].

Nevertheless, one unsolved aspect is how much each mechanism contributes to generating allelic imbalances in expression and whether they do it independently or in synergy. In breast tissue, germline regulatory variation is associated with disease risk and affects frequently mutated genes. We and others have shown that variants affecting the expression levels of BRCA1 and BRCA2 modify the risk of breast cancer in germline mutation carriers [10,11]. We found that carriers of germline nonsense mutations in the tumor suppressor gene BRCA2 were at a lower risk of developing breast cancer when the remaining wild-type allele was highly expressed.

Here, we hypothesize that cis-regulatory variation also modulates the penetrance of oncogenic coding mutations. In the context of a gene cis-regulated by a genetic variant generating imbalanced allelic expression, we postulate that an oncogenic activating mutation in the same gene will have a different clinical impact depending upon whether it occurs in the preferentially expressed allele or the less expressed one. We tested this model in the context of heterozygous mutations in PIK3CA, the most frequently mutated gene in breast cancer. First, we investigated whether normal cis-regulatory variation regulated the expression of PIK3CA in normal breast tissue. Then, we calculated allelic expression ratios between mutant and wild-type copies in tumors from two large breast cancer datasets, METABRIC and TCGA, both normalized for DNA copy number or not. Finally, we correlated the allelic expression ratios with clinical data. This approach allows us to distinguish between expression imbalances generated from cis-regulatory variation alone, altered DNA copy number, or both mechanisms.

RESULTS
Normal cis-regulatory variation affects PIK3CA expression in healthy breast tissue
To investigate whether cis-regulatory variation modulates the expression of PIK3CA in normal breast tissue, we analyzed data from previous allelic expression analysis of normal breast tissue from 64 healthy donors. We calculated the ratio of expression of one allele by the other in heterozygous variant positions, which is a robust approach to detect cis-acting variant effects, as it cancels out the trans effects that act on the same gene and influence both alleles equally. We found six variants in PIK3CA displaying differential allelic expression (daeSNPs) (see “Methods”) (Fig. 1). Of these six, only rs3729679 is not in strong linkage disequilibrium (LD) with the others (Supplementary Table 1). rs3960984 showed the largest proportion of heterozygotes displaying allelic differences (57%), while three other daeSNPs shared the smallest fraction (14%): rs12488074, rs4855093, and rs9838411.

In the daeSNPs rs7636454, rs3960984, rs12488074, and rs9838411, the ratios showed a unilateral distribution, with samples displaying preferential expression towards the same allele. These patterns of allelic expression ratios’ distribution suggest that the daeSNPs at which allelic expression is being measured and the possible functional regulatory variants (rSNPs) are in strong, yet incomplete, LD with each other.

While the mapping analysis carried out to identify candidate rSNPs did not find a significant association after multiple testing correction (Supplementary Table 2), one of the variants with nominal P value ≤ 0.05, rs2699887 (Wilcoxon two-sample test estimated difference of 0.22, 95% CI = [0.031-Inf]) (Supplementary Fig. 1A), showed great regulatory potential. Namely, it is an eQTL (expression quantitative trait locus) for PIK3CA (P = 0.011, Supplementary Fig. 1B) in tumors from METABRIC, is located at its promoter region and at a DNase I hypersensitivity site (Supplementary Fig. 1C), and is bound by POL2 in a breast cancer cell line (Supplementary Table 3). In-silico functional analysis of this variant suggested a disruption of the binding motif of the transcription factor NF-YA (Supplementary Fig. 1D), and in vitro studies revealed a preferential protein::DNA binding to the minor T allele of rs2699887, which is associated with higher expression of PIK3CA in tumors (Supplementary Fig. 1E).

[Figure 1: y-axis, AE ratios; x-axis, daeSNPs (rs12488074, rs3729679, rs3960984, rs4855093, rs7636454, rs9838411).]

Fig. 1 Cis-regulatory variation impacts on PIK3CA gene expression in normal breast tissue. AE ratios for six daeSNPs in the PIK3CA gene region; each dot is a heterozygous individual for the corresponding variant indicated in the x-axis; dotted lines delimit the levels of 1.5-fold difference for either allele preferential expression (|AE| = 0.58). Boxplots display the median, the lower and upper hinges corresponding to the first and third quartiles, and lower and upper whiskers corresponding to the smallest and largest values from the 1.5 * IQR (interquartile range), respectively.

Preferential expression of the PIK3CA mutated alleles is frequent in breast tumors
Changes in DNA copy number in tumors are associated with changes in gene expression in cis leading to dosage imbalances of coding mutations [1,14–17]. However, these differences can also be due to germline and somatic cis-regulatory variation, but their effect on mutation dosage imbalance is underexplored. So, we set out to assess whether PIK3CA somatic mutations would have their functional effects, or penetrance, modified by imbalances in allelic expression generated by regulatory variants. We hypothesized that preferential expression of a gain-of-function mutation would have a more substantial clinical impact than those occurring in lowly expressed alleles, thus generating intertumor clinical heterogeneity (Fig. 2a). To test this, we carried out mutant vs. wild-type allelic expression analysis in breast tumor samples carrying somatic PIK3CA missense mutations on two independent sets of data, the METABRIC (n = 94) and the TCGA (n = 178) projects. Supplementary Table 4 presents a summary description of the two datasets and Supplementary Fig. 2 shows the number, location, and amino acid alterations of the mutations across the two datasets.

Next, we calculated three allelic ratios from DNA-seq and RNA-seq data for each mutation:

(1) $\alpha = \log_2(\text{number of mutant RNA-seq reads} / \text{number of wild-type RNA-seq reads})$, i.e., the net mutant allele expression imbalance;
(2) $\beta = \log_2(\text{number of mutant DNA-seq reads} / \text{number of wild-type DNA-seq reads})$, i.e., the mutant allele relative copy-number;

Fig. 2 Mutant allelic imbalance in gene expression of somatic missense PIK3CA mutations is frequent in breast tumors, particularly for preferential expression of the mutant allele. a Schematic representation of the hypothesis: cis-acting regulatory variants (rVar), either from germline or somatically acquired, generate different relative allelic expression ratios of mutant and wild-type alleles, resulting in tumors of different prognosis. b Top: log ratio α, β, and γ 89% credible intervals (CI) in breast tumors. Bottom: CIs collective posterior distribution split according to imbalance. A sample is deemed imbalanced if the CI does not cross zero. Samples with significant imbalance are displayed in red. c Correlation analysis of α vs. β and α vs. γ, showing that both genomic copy-number dosage and allelic expression regulation contribute to imbalances in the expression of mutated alleles in tumors.
Point coordinates are Maximum A Posteriori probability estimates (MAP) of the 89% CIs. d Comparison of matched γ and β values, showing predominance of tumors with a preferential allelic expression of the mutated allele. Point coordinates are Maximum A Posteriori probability estimates (MAP) of the 89% CIs.

(3) γ = α − β, i.e., the net mutant allele expression imbalance normalized for the DNA allelic copy-number imbalances, which corresponds to a putative mutant allele expression imbalance due to cis-regulation.

In this way, α reports on the net allelic expression imbalance, generated by different mechanisms including copy-number aberrations, cellularity differences, and cis-regulatory variation, while γ reports specifically on the contribution from cis-regulatory variation (rVar in Fig. 2a), including normal genetic variation, somatic noncoding mutations, and allelic epigenetic changes. Figure 2b displays the distributions of the different ratios.

We found that net mutant allele expression imbalances (α ratio) are frequent in breast tumors, at 70.2% in METABRIC (66 out of 94) and 60.1% in TCGA (107 out of 178). The same is true for γ ratios, at 27.7% for METABRIC (26 out of 94) and 11.8% for TCGA (21 out of 178), indicating that cis-regulatory effects acting on mutations are also frequent in breast tumors. In both sets, we found samples with striking net preferential allelic expression for the mutant allele (maximum 44.8-fold and 220-fold in METABRIC and TCGA, respectively), but not so for the preferential expression of the wild-type allele (fold differences of 5.4 and 29 in METABRIC and TCGA, respectively) (Fig. 2b). Similarly, the γ ratio showed its most pronounced preferential expression toward the mutant allele (10- and 4.2-fold for METABRIC and TCGA, respectively), albeit with smaller fold differences between alleles.

Interestingly, we observed that within the samples with significant mutant allele expression imbalance due to cis-regulatory variation there was a significant prevalence of samples that preferentially expressed the mutated allele in both datasets (binomial test Prob. = 1, 89%-CI = [0.89, 1.00], P = 3×10⁻⁸ for METABRIC and Prob. = 0.90, 89%-CI = [0.73, 0.98], P = 2×10⁻⁴ for TCGA).

Cis-regulatory variants contribute significantly to imbalances in the expression of mutant alleles

Next, hypothesizing that both copy number and cis-regulatory variants are the major contributors to allelic expression, we set out to assess the contribution of each mechanism toward the net mutant allele expression imbalances detected in these tumors. First, we found positive correlations between net allelic expression and both copy number and cis-regulatory variation (Fig. 2c), albeit with an effect for copy number more than double the size of that found for cis-regulatory variation (average Pearson correlation r = 0.80 and 0.34, respectively). Next, we considered the variance (Var) of the net allelic expression as the sum of the effects of both mechanisms, plus the covariance (Cov) accounting for predicted non-mutual exclusion of mechanisms acting on any given allele:

Var(α) = Var(β) + Var(γ) + 2 Cov(β, γ), (1)

and we calculated the contribution of cis-regulatory variation to the variance of net allelic expression as (Var(γ) + Cov(β, γ))/Var(α). Here, we found that cis-regulatory variants explain 20.6% and 14.4% of the variability of net mutant allelic expression seen in METABRIC and TCGA, respectively (Supplementary Table 5).

Finally, assessing how the two mechanisms act simultaneously on each tumor, we found that the majority of samples (70.2% and 54.5% for the METABRIC and TCGA, respectively) had positive γ and negative β values (Fig. 2d), suggesting that although the mutant allele was in lower genomic quantity, it was nevertheless preferentially expressed compared to the wild-type allele. Interestingly, there were 10.6% and 11.2% samples with positive α and negative β values, in METABRIC and TCGA respectively. This shows that these tumors overexpress the mutant allele despite this allele being in lower copy number. Only a minor fraction of samples displayed co-occurring preferential allelic expression and a higher allele copy number of the mutant allele (6.38% and 8.43% for the METABRIC and TCGA, respectively). These results were independent of the effect of tumor cellularity (Supplementary Fig. 3).

Preferential expression of mutant alleles by cis-regulatory variation associates with poor prognosis

To investigate the impact of differential cis-regulation of PIK3CA's mutations on clinical outcome (overall and disease-specific survival), we performed univariate survival analysis with γ ratios categorized in three groups, based on the existence of imbalance and its direction, i.e. whether there was significant predominance of expression of the mutated allele (γ_mut), of the wild-type allele (γ_wt), or balanced allelic expression (γ_balanced). We uncovered that the group γ_mut had a poorer disease-specific survival rate (P = 0.031, Fig. 3a)

Fig. 3 Allelic preferential expression of PIK3CA mutations is associated with survival and clinicopathological parameters in breast cancer.
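As a rough illustration of the log ratios α, β, and γ and of the variance decomposition in Eq. (1) described in the Results, the following Python sketch computes the three ratios from allele read counts and the share of Var(α) attributed to cis-regulation, (Var(γ) + Cov(β, γ))/Var(α). It is not the authors' implementation: the function names (`log_ratios`, `covariance`, `cis_contribution`) and the read counts are hypothetical, population (not sample) variances are an assumption, and all counts are assumed to be nonzero.

```python
import math
from statistics import fmean


def log_ratios(mut_rna, wt_rna, mut_dna, wt_dna):
    """Return (alpha, beta, gamma) for one mutation from raw read counts."""
    alpha = math.log2(mut_rna / wt_rna)  # net mutant allele expression imbalance
    beta = math.log2(mut_dna / wt_dna)   # mutant allele relative copy number
    gamma = alpha - beta                 # imbalance attributed to cis-regulation
    return alpha, beta, gamma


def covariance(xs, ys):
    """Population covariance; covariance(xs, xs) is the population variance."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def cis_contribution(betas, gammas):
    """Share of Var(alpha) given by (Var(gamma) + Cov(beta, gamma)) / Var(alpha)."""
    alphas = [b + g for b, g in zip(betas, gammas)]  # alpha = beta + gamma
    var_alpha = covariance(alphas, alphas)
    return (covariance(gammas, gammas) + covariance(betas, gammas)) / var_alpha


# Hypothetical read counts per tumor:
# (mutant RNA, wild-type RNA, mutant DNA, wild-type DNA)
toy_counts = [(120, 40, 30, 60), (50, 50, 40, 40), (20, 80, 45, 45), (90, 30, 60, 30)]
ratios = [log_ratios(*counts) for counts in toy_counts]
betas = [r[1] for r in ratios]
gammas = [r[2] for r in ratios]
print(round(cis_contribution(betas, gammas), 3))
```

Because Var(α) = Var(β) + Var(γ) + 2 Cov(β, γ), the cis-regulatory share and the analogous copy-number share, (Var(β) + Cov(β, γ))/Var(α), add up to one, which gives a quick sanity check on any dataset.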
a Kaplan–Meier curve of disease-specific survival showing the worse prognosis of patients with differential expression of the PIK3CA mutations (γ_mut group, shown in blue) compared to those expressing equimolar levels of mutation and wild-type alleles (γ_balanced group, shown in red), in METABRIC. Shown below the graph are the numbers of patients at risk per group throughout time. b Preferential expression of the mutated allele is associated with ER-negative, PR-negative, and Her2-positive breast tumors. In all graphs, samples were colored according to the significance of the allelic expression imbalance. q values indicated correspond to the Wilcoxon rank-sum test with continuity correction, corrected for multiple testing using the Benjamini & Hochberg method. Survival plots indicate the 95% CI as colored shades. Boxplots display the median, the lower and upper hinges corresponding to the first and third quartiles, and lower and upper whiskers corresponding to the smallest and largest values from the 1.5 * IQR (interquartile range), respectively.

than the γ_balanced group for METABRIC. The median overall survival for the γ_mut group was 5.88 years and for the γ_balanced group was 12.46 years (Supplementary Fig. 4B), whereas, in the disease-specific analysis, the mean survival of the γ_mut patients was 7.07 years, and 41% of patients died during the length of the follow-up, in comparison with 25.3% deaths in the γ_balanced group (Fig. 3a).

The categorized γ ratios were not significantly associated with overall survival in the multivariate analysis (Supplementary Fig. 5). However, some of the variables that are usually independent prognosis factors, such as PR and HER2 statuses, were not significantly associated with survival either in this analysis. In the TCGA set, there was a trend toward a worse disease-specific survival of those patients whose tumors preferentially express the mutated allele (Supplementary Fig. 6). However, due to the relatively shorter follow-up time of this dataset (median ~1 year) and the fact that tumors were mainly Luminal A (~61.2% of samples), the power to detect significant differences is smaller than that of METABRIC. Nevertheless, the joint analysis of the two datasets showed a significantly worse disease-specific survival of the α_mut group of patients, with a concordant trend in the γ_mut group (Supplementary Fig. 7).

PIK3CA preferential mutant allele expression associates with clinicopathological variables

Next, we sought to investigate whether PIK3CA's differential mutant allele expression was associated with known prognostic clinicopathological variables, namely hormone receptors (ER, PR) and HER2 amplification, which are directly and indirectly connected to gene expression regulation, respectively.

For both datasets, we observed that preferential mutant allele expression driven by cis-regulatory variation (γ) was associated with markers of worse prognosis, namely it was significantly higher in ER-negative tumors and PR-negative tumors, and in HER2-positive tumors only in METABRIC (Fig. 3b). When evaluating the contribution of cis-regulatory variation to this association, we also found that higher average γ values associated with lower PR expression (P = 0.040) and HER2-positive tumors (P = 0.025), but we did not find a significant association with ER expression (P = 0.129) (Supplementary Fig. 8).

Given these results, we took γ into consideration in the survival analysis within the expression subgroups of ER, PR, and HER2, but did not find significant differences in overall and disease-specific survival in METABRIC (Supplementary Fig. 4).

Considering other known prognostic variables, including tumor size, grade, and molecular subtypes (PAM50 and IntClust), we found a significant association between γ ratios and PAM50 subtypes only in METABRIC (q = 0.027) (Supplementary Table 6 and Supplementary Fig. 9).

Finally, we did not find an association between the candidate germline regulatory variant rs2699887 and γ or clinical outcome, suggesting germline variants are unlikely to be involved in the significant associations described above (data not shown). However, supporting the involvement of somatic cis-regulatory variants instead, we found smaller fold changes and fewer samples with imbalances measured at common PIK3CA variants in normal-matched tissue data than those measured at mutations in tumor tissue (Supplementary Fig. 10).

DISCUSSION

Our work reveals the role of cis-regulatory variation acting on PIK3CA somatic mutations as modifiers of mutation penetrance. We show for the first time that allelic expression imbalance between PIK3CA's mutant and wild-type alleles is common and prognostic in breast cancer.

Particularly, preferential expression of the mutant allele is significantly more common than that of the wild-type allele, and considering that PIK3CA is an oncogene, one possibility is that positive selection could have a role in generating this difference, which should be further investigated. Furthermore, we also found that allelic imbalance in expression observed for the mutant alleles in the tumors was greater than that observed for single-nucleotide polymorphisms in the normal-matched tissue of patients. These findings support the hypothesis of somatic regulatory mutations involvement in generating the imbalances observed in the tumors. While genomic allelic imbalance remains the largest determinant of allelic expression dosage (showing the highest correlation with and contributing the most to the variability observed in net allelic expression), cis-regulatory variation is also significantly correlated with net allelic expression and explains ~16% of its variability across samples in these sets of tumors.

The analysis of RNA-seq data from two independent cohorts of tumor samples, the METABRIC and TCGA projects, strongly supports our findings.

Moreover, we show that preferential expression of the mutant allele due to cis-variation is associated with poor prognosis variables, such as ER-negative, PR-negative, and Her2-positive statuses²⁰,²¹. In the METABRIC dataset, we also found that preferential expression of the mutant allele was associated with worse overall and disease-specific survival. The high stringency in calling imbalance and the focus on a specific type of mutation in one gene limit this study in terms of the sample size analyzed, but on the other hand it provides the simplest scenario for testing our hypothesis. Interestingly, the joint analysis of the datasets revealed some level of association between disease-specific survival and the preferential expression of the mutant allele, both net and due to cis-regulation, reaffirming the clinical importance of the expression level of a mutation commonly associated with aggressive tumors. In addition, some tumors presented preferential expression of the wild-type allele of PIK3CA, suggesting that these mutations are lowly expressed and possibly passenger events.

Besides the potential use of our findings as a prognosis biomarker in the clinic, these results may also have therapeutic implications. Some of the major clinical challenges in cancer treatment are identifying biomarkers of prognosis and defining which patients will benefit from a given therapy. Particularly, it is crucial to identify patients unlikely to respond to specific therapies to prevent unnecessary drug cytotoxicity without any therapeutic benefits. Our results reveal the importance of considering allelic expression in somatic mutation screens in these two aspects of patient management. Despite the high frequency of PIK3CA mutations in breast cancers, the response to PI3K inhibitor therapy has been more challenging than expected, and the prognostic significance of detecting somatic PIK3CA mutations in breast tumors is unclear. Relevant to this discussion, we have previously shown that the presence of PIK3CA mutations confers a poorer prognosis in patients with ER-positive breast cancer only when stratified into copy-number driven subgroups (IntClust 1+,2+,9+).

In this study, we provide new evidence for the prognostic significance of these mutations at the expression level in breast tumors. Particularly for tumors with significant preferential expression of the wild-type allele, this prognostic significance has a potential impact on therapy response and clinical management since one may hypothesize that little to no benefit would come from treatment in the cases not expressing the targetable mutation. Further studies evaluating the allelic expression of mutant oncogenes in the tumors of patients enrolled in molecular-driven trials will clarify this impact.

More challenging is determining which cis-regulatory mechanisms are promoting allelic expression imbalances. Both inherited¹¹,²⁴ and acquired⁶,²⁵⁻²⁷ variants can affect gene expression in an allelic manner²⁸,²⁹.

Here we show that normal cis-regulatory variation regulates PIK3CA's expression in normal breast tissue, with the possible contribution of rs2699887 as a regulatory variant. We also found that the heterozygotes for rs2699887 were associated with higher expression of the PIK3CA gene compared to the common homozygotes. Although there is published data supporting the clinical association of rs2699887 with poor prognosis in other cancers³⁰,³¹, linked to an increase in PI3K signaling, there is still some data supporting the opposite association³²,³³. We did not find an association between rs2699887 and survival, which opens the possibility for other mechanisms besides normal cis-regulatory variation to be considered as contributors to the preferential allelic expression in these tumors (data not shown).

Double PIK3CA mutations in the same allele are frequent in breast tumors, and the impact of noncoding mutations in cancer is just starting to be explored. So, a possibility is that the combination of noncoding and coding mutations in the same gene might be underlying the allelic expression imbalances we are detecting.

Further studies on allelic expression imbalances of activating mutations, and even inactivating ones, should further reveal the contribution of cis-regulatory mechanisms in tumor development and progression. Particularly interesting to determine is whether the coding mutation originates in an allele predisposed with higher expression, or whether a sequence of somatic events introduces the coding activating mutation and additional cis-regulatory noncoding mutations. The answers could have significant repercussions on our understanding of tumor evolution.

In summary, we show that differential expression between the mutant and wild-type alleles of PIK3CA is common in breast cancer, with a significant contribution from allele-specific cis-regulatory effects. We further show that mutant allele differential expression is associated with clinical parameters such as ER, PR, and HER2 statuses and is prognostically significant.

Collectively, our work establishes the prognostic relevance of allele-specific transcriptional regulation of PIK3CA somatic mutations. It also supports a shift in the mutation testing in patient management, where the level of expression of these mutations should be considered, besides the detection at the DNA level.

METHODS

Subjects

Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue⁹, METABRIC¹⁴, TCGA³⁵.

Differential allelic expression analysis

DNA and total RNA from 64 samples of normal breast tissue were hybridized onto Illumina Exon510S-Duo arrays (humanexon510s-duo), and data were analyzed as described before. In short, after sample filtering and normalization, variants with average RNA log2 allelic intensity values greater than 9.5 and heterozygous in five or more samples were kept for further analysis.

Allelic log ratios were calculated for RNA and DNA intensity data:

log ratio = log₂(A) − log₂(B), (2)

for alleles A and B. Next, variants that showed significant differences in the RNA log ratios between heterozygous (AB) and homozygous groups (AA and BB) (two-sample Student's t test, P value < 0.05) were selected for differential allelic expression analysis.

Allelic expression (AE) ratios were normalized for allelic DNA content:

AE ratio = RNA log-ratio − DNA log-ratio. (3)

Differential allelic expression (DAE) at the sample level was defined as |AE ratio| ≥ 0.58 (1.5-fold or greater between alleles), based on previous studies using microarray data³,³⁶. Variants with at least 10% and three heterozygous samples displaying DAE were further classified as daeSNPs.

Linkage disequilibrium (LD) between daeSNPs was evaluated using the genetic variant-centered annotation browser SNiPA.

Genotype imputation analysis on normal breast tissue samples

Illumina Exon 510 Duo germline genotype data from the 64 samples that passed microarrays quality control were filtered to keep variants with call rates ≥85%, minor allele frequency >0.01, and Hardy–Weinberg equilibrium with P > 1×10⁻⁵. Next, genotypes were imputed with MACH1.0³⁸ for all additional known variants on chromosome 3, using as reference panel the phased CEU panel haplotypes from the HapMap3 release (HapMap3 NCBI Build, CEU panel: Utah residents with Northern and Western European ancestry), and the recommended two-step imputation process: model parameters (crossover and error rates) were estimated before imputation using all haplotypes from the study subjects and running 100 Hidden Markov Model (HMM) iterations; then genotypes were imputed using the model parameter estimates from the previous round. Imputation results were filtered based on an rq score ≥ 0.3, a platform-specific measurement of variant imputation uncertainty.

Differential allelic expression (DAE) mapping analysis on normal breast tissue samples

Differential allelic expression mapping analysis was performed by stratifying AE ratios at each PIK3CA daeSNP according to the genotype at variants located within ±250 Kb.

A Mann–Whitney test was applied to test if the mean of the absolute AE ratios of the heterozygous samples was greater than those of the combined reference and alternative allele homozygous samples. Correction for multiple testing was performed using the BH method (p.adjust, R stats 4.0.3 package), limiting the significance to q values ≤0.05.

Functional annotation of DAE mapping associated variants

Variants in LD with SNPs with DAE mapping nominal P value ≤0.05 were retrieved using the function get_ld_variants_by_window from the ensemblr R package (https://github.com/ramiromagno/ensemblr) using the 1000 GENOMES project data (phase_3) for the EUR population and an r > 0.95. These proxy SNPs were assessed for overlap with epigenetic marks derived from the Encyclopedia of DNA Elements (ENCODE) and NIH Roadmap Epigenomics projects, such as chromatin states (chromHMM) annotation, regions of DNase I hypersensitivity, transcription factor binding sites, and histone modifications of epigenetic markers (H3K4Me1, H3K4Me3, and H3K27Ac) (http://genome.ucsc.edu/ENCODE/) for normal human mammary epithelial cells (HMECs), human mammary fibroblasts (HMFs), BR.MYO (breast myoepithelial cells) and BR.H35 (breast vHMEC) and two breast cancer cell lines MCF-7 and T47D. We prioritized variants located on either active promoter or enhancer regions in mammary cell lines, and for which ChIP-Seq data indicated protein binding or position weight matrix (PWM) scores predicted differential protein binding for different alleles. Two publicly available tools, RegulomeDB and HaploReg v4.1, and the MotifBreakR Bioconductor package, were also used to evaluate those candidate functional variants³⁹,⁴¹,⁴².

Electrophoretic mobility shift assay (EMSA)

MCF-7 (ER-positive) and HCC1954 (ER-negative) breast cancer cell lines were cultured in DMEM and RPMI culture media, respectively, supplemented with 10% FBS and 1% PS (penicillin and streptomycin). Nuclear protein extracts were prepared using the Thermo Scientific Pierce™ NER kit, according to the manufacturer's instructions. Oligonucleotide sequences corresponding to the C (common) and T (minor) alleles of rs2699887 (5'-AGCGTGAGTAGAGCGCGGA[C/T]TGGCCGGTAGCGGGTGCGGTG-3') were labeled using the Thermo Scientific Pierce Biotin 3' End DNA Labelling Kit, according to the manufacturer's instructions. Oligonucleotides with known binding motifs for NF-YA⁴³ and E2F1⁴⁴ were used in competition assays. Undiluted antibodies used for supershift competition assays were NF-YA (H-209) (Santa Cruz Biotechnology, SC-10779X) and HMGA1a/HMGA1b (Abcam, ab4078). EMSA experiments were performed using the Thermo Scientific LightShift™ Chemiluminescent EMSA Kit, using the buffer and binding reaction conditions previously described. Each EMSA was repeated at least twice for all combinations of cell extract and oligonucleotide, which were also tested in serial dilution amounts.

Breast tumor samples

The METABRIC dataset of tumor samples included 2433 samples from the METABRIC project with DNA sequencing data, among which 480 were subjected to a capture-based RNA sequencing study. Sequencing libraries were generated as previously described. In brief, sequencing libraries were generated from total RNA from frozen tissues with a TruSeq mRNA Library Preparation Kit using poly-A-enriched RNA (Illumina, San Diego, CA, USA) and enriched with the human kinome DNA capture baits (Agilent Technologies, Santa Clara, CA, USA). Six libraries were pooled for each capture reaction, with 100 ng of each library, and sequenced (paired-end 51bp) on an Illumina HiSeq2000 platform. We selected a subset of samples with DNA and RNA sequencing data and PIK3CA missense mutations for further analysis.

The TCGA dataset comprised 695 samples from TCGA breast cancers, from which we selected a subset of 289 samples with PIK3CA missense mutations for further analysis. Supplementary Table 3 summarizes the demographic features and disease characteristics of the two datasets.

DNA-seq and RNA-seq variants calling in tumors

Alignment and preprocessing. Sequence data (FASTQ) were aligned to the reference genome (hg19) using STAR v2.4.1. A two-pass alignment was carried out: splice junctions detected in the first alignment run are used to guide the final alignment. Duplicates were marked with Picard v1.131 (http://picard.sourceforge.net). Genome Analysis Toolkit (GATK) was used for indel realignment and base quality score recalibration⁴⁶.

Variant calling and annotations. SNV and indel variants were called using GATK Haplotype Caller. Hard filters using GATK VariantFiltration were applied to variants. Variants were annotated with Ensembl Variant Effect Predictor (VEP). Heterozygous genotypes were called from DNA data to avoid RNA editing and other RNA-related variants because true allelic imbalance can lead to heterozygous sites being called homozygous in RNA-based genotype calling.

Analysis of allelic expression imbalances in tumors

Before the analysis, a set of filtering steps was performed to select samples: (1) presence of missense mutations; and (2) a minimum of 30 reads for RNA-seq and DNA-seq data⁴⁸⁻⁵⁰.

Clinical data for METABRIC were updated from the original studies with the latest available records. Clinical data for TCGA were imported from https://portal.gdc.cancer.gov/ on November 26, 2018.

Filtering of tumor samples. For both datasets, METABRIC and TCGA, a set of quality control criteria were applied to filter the DNA-seq and RNA-seq samples, namely:

a. Keep only samples containing PIK3CA missense mutations;
b. Keep samples whose coverage at mutated loci is, all together in both alleles, at least 30 reads for both RNA-seq and DNA-seq data⁴⁸⁻⁵⁰.

Clinical data of METABRIC patients were updated from the original studies with the latest available records. The TCGA clinical dataset was obtained from cBioPortal⁵¹,⁵² on 28 November 2021 by programmatic access with the R package cgdsr.

Allelic expression imbalances in tumor data. Allelic expression imbalances are calculated as follows. For each mutated locus, the pair of read counts (X, Y), for wild-type (X) and mutant (Y) alleles, respectively, measured either by DNA-seq or RNA-seq, are transformed using the log ratios β, α, and γ, which are defined as follows:

β = log₂(Y_DNA / X_DNA), (4)

the DNA mutant allele ratio, which served to control for sequencing artifacts from heterozygous genotypes and to account for differences in variant frequencies in DNA;

α = log₂(Y_RNA / X_RNA), (5)

that served as a measure of the net allelic expression imbalance in tumors;

γ = α − β, (6)

the normalized mutant allele expression ratio, a proxy for the mutant allelic expression imbalance due to cis-regulation alone.

Statistical inference of allelic expression imbalances. According to these log-ratio definitions, a positive value indicates an imbalance toward the mutant allele, and a negative value an imbalance favoring the wild-type allele. However, the statistical significance of each log ratio depends on the read coverage of each allele, e.g., low read-coverage values are subject to greater random variation, and hence yield less reliable log ratios and imbalance estimates. To assign a measure of uncertainty to our imbalance estimates, we assumed that the read counts are well modeled by a Beta-Binomial distribution, and following Bayesian reasoning, we estimated 89% credible intervals (CI) and Maximum A Posteriori probability estimates (MAP) for the log ratios β, α, and γ (reported in Fig. 2).

Allelic expression imbalances in normal-matched tissue data. Solid normal breast tissue from breast cancer female patients was obtained from TCGA-BRCA. We selected 112 samples with RNA-Seq data, obtained in bam file format. Sequence data were converted to fastq format (samtools), underwent initial quality control (FastQC), and trimming (Trimmomatic). Following QC, six samples were removed from analysis. The remaining sequence data was mapped to the reference genome (hg38) using STAR aligner (v.2.7.7a). Otherwise, alignment, preprocessing and variant calling were performed as described for the tumor samples. RNA data was filtered to contain only heterozygous variants at the DNA level, circumscribed to PIK3CA's genomic location. DNA data was accessed from TCGA-BRCA's microarray raw data for 111 of the 112 initial RNA-Seq samples. Genotypes were obtained using the CRLMM algorithm ('crlmm' R Bioconductor package⁵⁶) and quality controlled for HWE, major allele frequency and 10% missing genotypes ('SNPassoc' R package). Genotypes were lifted over from hg38 to hg37 ('rtracklayer' R Bioconductor package), harmonized ('GenotypeHarmonizer'), and imputed (Michigan Imputation Server⁶⁰). Obtained genotypes were quality controlled using PLINK⁶¹ and lifted over back to hg38. Allelic expression imbalances, equivalent to α ratios in tumors, were inferred for heterozygous germline variants as described above.

Two-sample tests of imbalance ratios with clinical covariates. Association between allelic expression imbalance ratios and clinical data was assessed by bivariate analysis with the Wilcoxon rank-sum test with continuity correction or the Kruskal–Wallis rank-sum test, as indicated in tables and figures. P values were adjusted per study using the Benjamini & Hochberg correction and were considered significant when ≤ 0.05.

Correlation analysis. Correlation analyses of α vs β and α vs γ ratios for both sets of samples were performed using a Pearson's test. All statistical analysis and data visualization were performed using R.

Survival analyses

Kaplan–Meier plots and multivariate Cox proportional hazard models were used to examine the association between alpha and gamma allelic expression ratios and survival using the survival package from R⁶²,⁶³. Death due to all causes was used as the endpoint, and all alive subjects were censored at the date of the last contact. Kaplan–Meier survival curves were compared using the log-rank test.

For the multivariate analysis, a Cox proportional hazard model was used to assess the effect of γ on the overall survival. Hazard ratios (HRs) and 95% confidence intervals (CI) were estimated by fitting the Cox model while adjusting for age and tumor characteristics, such as size, Scarff–Bloom–Richardson histological grade, clinical stage and estrogen receptor (ER), progesterone (PR), and human epidermal growth factor 2 (HER2) statuses.

For the bivariate analysis, Wilcoxon rank-sum two-sample tests were used to compare α and γ between different hormone receptor statuses and q ≤ 0.05, calculated using the Benjamini & Hochberg method, were considered statistically significant.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY

Microarray raw data are deposited in the Gene Expression Omnibus under accession number GSE35023. Primary data (BAM files) for DNA-seq are deposited at the European Genome-phenome Archive (EGA) under study accession number EGAS00001001753 and may be downloaded upon request and authorization by the METABRIC Data Access Committee. Primary data (BAM files) for RNAseq are available from the authors upon reasonable request. Primary data (BAM files) for DNA-seq and RNAseq from TCGA are deposited in the database of Genotypes and Phenotypes (dbGaP) under the study accession number phs000178.

CODE AVAILABILITY

The filtered data and code for the analysis of mutant allele expression imbalances and the survival analysis can be publicly accessed at https://github.com/maialab/npjbcPIK3CA.

Received: 2 June 2021; Accepted: 31 March 2022;

REFERENCES

1. Bielski, C. M. et al. Widespread selection for oncogenic mutant allele imbalance in cancer. Cancer Cell 34, 852–862.e4 (2018).
22. Keegan, N. M., Gleeson, J. P., Hennessy, B. T. & Morris, P. G. PI3K inhibition to overcome endocrine resistance in breast cancer. Expert Opin. Investig. Drugs 27, 1–15 (2018).
23. Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016).
24. Yan, H. Allelic variation in human gene expression. Science 297, 1143–1143 (2002).
25. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
26. Mansour, M. R. et al. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014).
27. Przytycki, P. F. & Singh, M. Differential allele-specific expression uncovers breast cancer genes dysregulated by cis noncoding mutations. Cell Syst. 10, 193–203.e4 (2020).
28. Shoemaker, R., Deng, J., Wang, W. & Zhang, K. Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome. Genome Res. 20, 883–889 (2010).
29. Ongen, H. et al. Putative cis-regulatory drivers in colorectal cancer. Nature 512, 87–90 (2014).
30. Morgese, F. et al. Impact of phosphoinositide-3-kinase and vitamin D3 nuclear receptor single-nucleotide polymorphisms on the outcome of malignant melanoma patients. Oncotarget 8, 75914–75923 (2017).
31. Li, Q. et al. Associations between single-nucleotide polymorphisms in the PI3K-PTEN-AKT-mTOR pathway and increased risk of brain metastasis in patients with non-small cell lung cancer. Clin. Cancer Res. 19, 6252–6260 (2013).
32. Wang, L.-E. et al. Roles of genetic variants in the PI3K and RAS/RAF pathways in
2. Pastinen, T.
Cis-acting regulatory variation in the human genome. Science 306, susceptibility to endometrial cancer and clinical outcomes. J. Cancer Res. Clin. 647–650 (2004). Oncol.138, 377–385 (2011). 3. Ge, B. et al. Global patterns of cis variation in human cells revealed by high- 33. Pu, X. et al. PI3K/PTEN/AKT/mTOR pathway genetic variation predicts toxicity and density allelic expression analysis. Nature Genet. 41, 1216–1222 (2009). distant progression in lung cancer patients receiving platinum-based che- 4. Pastinen, T. et al. A survey of genetic and epigenetic variation affecting human motherapy. Lung Cancer 71,82–88 (2011). gene expression. Physiol. Genomics 16, 184–193 (2004). 34. Vasan, N. et al. Double PIK3CA mutations in cis increase oncogenicity and sen- 5. Morley, M. et al. Genetic analysis of genome-wide variation in human gene sitivity to PI3Kα inhibitors. Science 366, 714–723 (2019). expression. Nature 430, 743–747 (2004). 35. Wilkerson, M. D. et al. Integrated RNA and DNA sequencing improves mutation 6. Calabrese, C. et al. Genomic basis for RNA alterations in cancer. Nature 578, detection in low purity tumors. Nucleic Acids Res. 42, e107–e107 (2014). 129–136 (2020). 36. Verlaan, D. J. et al. Targeted screening of cis-regulatory variation in human 7. Rhee, J.-K., Lee, S., Park, W.-Y., Kim, Y.-H. & Kim, T.-M. Allelic imbalance of somatic haplotypes. Genome Res. 19, 118–127 (2009). mutations in cancer genomes and transcriptomes. Sci. Rep. 7, 1653 (2017). 37. Arnold, M., Raffler, J., Pfeufer, A., Suhre, K. & Kastenmüller, G. SNiPA: an inter- 8. Meyer, K. B. et al. Allele-specific up-regulation of FGFR2 increases susceptibility to active, genetic variant-centered annotation browser. Bioinformatics 31, breast cancer. PLoS Biol. 6, e108 (2008). 1334–1336 (2014). 9. Maia, A.-T. et al. Extent of differential allelic expression of candidate breast cancer 38. Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. 
MaCH: using sequence and genes is similar in blood and breast. Breast Cancer Res. 11, R88 (2009). genotype data to estimate haplotypes and unobserved genotypes. Genetic Epi- 10. Cox, D. G. et al. Common variants of the BRCA1 wild-type allele modify the risk of demiol. 34, 816–834 (2010). breast cancer in BRCA1 mutation carriers. Human Mol. Genet. 20, 4732–4747 39. Ward, L. D. & Kellis, M. HaploReg: a resource for exploring chromatin states, (2011). conservation, and regulatory motif alterations within sets of genetically linked 11. Maia, A.-T. et al. Effects of BRCA2 cis-regulation in normal breast and cancer risk variants. Nucleic Acids Res. 40, D930–D934 (2011). amongst BRCA2 mutation carriers. Breast Cancer Res. 14, R63 (2012). 40. R Core Team. R: A Language and Environment for Statistical Computing (R Foun- 12. Liu, R. et al. Allele-specific expression analysis methods for high-density SNP dation for Statistical Computing, 2013). microarray data. Bioinformatics 28, 1102–1108 (2012). 41. Auton, A. et al. A global reference for human genetic variation. Nature 526,68–74 13. Xiao, R. & Scott, L. J. Detection of cis-acting regulatory SNPs using allelic (2015). expression data. Genetic Epidemiol. 35, 515–525 (2011). 42. Coetzee, S. G., Coetzee, G. A. & Hazelett, D. J. motifbreakR: an R/Bioconductor 14. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast package for predicting variant effects at transcription factor binding sites: Fig. 1. tumours reveals novel subgroups. Nature 486, 346–352 (2012). Bioinformatics btv470 https://doi.org/10.1093/bioinformatics/btv470 (2015). 15. Hartman, D. J., Davison, J. M., Foxwell, T. J., Nikiforova, M. N. & Chiosea, S. I. Mutant 43. Xu, H. et al. The CCAAT box-binding transcription factor NF-Y regulates basal allele-specific imbalance modulates prognostic impact of KRAS mutations in expression of human proteasome genes. 
Biochimica et Biophysica Acta (BBA) - colorectal adenocarcinoma and is associated with worse overall survival. Int. J. Molecular Cell Research 1823, 818–825 (2012). Cancer 131, 1810–1817 (2012). 44. Lees, E., Faha, B., Dulic, V., Reed, S. I. & Harlow, E. Cyclin E/cdk2 and cyclin A/cdk2 16. Soh, J. et al. Oncogene mutations, copy number gains and mutant allele specific kinases associate with p107 and E2F in a temporally distinct manner. Genes Dev. imbalance (MASI) frequently occur together in tumor cells. PLoS ONE 4, e7464 6, 1874–1885 (1992). (2009). 45. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29,15–21 17. Krasinskas, A. M., Moser, A. J., Saka, B., Adsay, N. V. & Chiosea, S. I. KRAS mutant (2012). allele-specific imbalance is associated with worse prognosis in pancreatic cancer 46. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for ana- and progression to undifferentiated carcinoma of the pancreas. Modern Pathol. lyzing next-generation DNA sequencing data. Genome Res. 20,1297–1303 (2010). 26, 1346–1354 (2013). 47. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 18. Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high- (2016). quality survival outcome analytics. Cell 173, 400–416.e11 (2018). 48. Heap, G. A. et al. Genome-wide analysis of allelic expression imbalance in human 19. Sørlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor primary cells by high-throughput transcriptome resequencing. Human Mol. subclasses with clinical implications. Proc. Natl Acad. Sci. USA 98, 10869–10874 Genet. 19, 122–134 (2009). (2001). 49. Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools 20. Dunnwald, L. K., Rossing, M. A. & Li, C. I. Hormone receptor status, tumor char- and best practices for data processing in allelic expression analysis. Genome Biol. 
acteristics, and prognosis: a prospective cohort of breast cancer patients. Breast 16, 195 (2015). Cancer Research 9, R6 (2007). 50. Chen, J. et al. A uniform survey of allele-specific binding and expression over 21. Chia, S. et al. Human epidermal growth factor receptor 2 overexpression as a 1000-Genomes-Project individuals. Nat. Commun. 7, 11101 (2016). prognostic factor in a large tissue microarray series of node-negative breast 51. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring cancers. J. Clin. Oncol. 26, 5697–5704 (2008). multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012). npj Breast Cancer (2022) 71 Published in partnership with the Breast Cancer Research Foundation L. Correia et al. 52. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles the Genomics, Histopathology, and Biorepository Core Facilities at the Cancer using the cBioPortal. Sci. Signaling 6, pl1 (2013). Research UK Cambridge Institute and the Addenbrooke’s Human Research Tissue 53. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, Bank (supported by the National Institute for Health Research Cambridge Biomedical 2078–2079 (2009). Research Centre). 54. Andrews, S. FastQC: A QualityControl Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/ fastqc/ (2010). AUTHOR CONTRIBUTIONS 55. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina L.C., R.M., J.M.X., S.F.C., and A.T.M. wrote the manuscript. R.M., J.M.X., S.F.C., and A.T.M. sequence data. Bioinformatics 30, 2114–2120 (2014). contributed to the overall design of this study. Data were collected by F.E., A.B., L.M., 56. Carvalho, B. S., Louis, T. A. & Irizarry, R. A. Quantifying uncertainty in genotype R.B., C.C., S.F.C., and A.T.M. Data were analyzed and interpreted by L.C., R.M., J.M.X., calls. Bioinformatics 26, 242–249 (2009). 
B.P.A., F.E., C.S., I.D., M.E., A.M., I.A.S., J.S., and A.T.M. All authors have read and 57. Gonzalez, J. R. et al. SNPassoc: an R package to perform whole genome asso- approved the final version of the manuscript. ciation studies. Bioinformatics 23, 654–655 (2007). 58. Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009). COMPETING INTERESTS 59. Deelen, P. et al. Genotype harmonizer: automatic strand alignment and format The authors declare no competing interests. conversion for genotype data integration. BMC Res. Notes 7, 901 (2014). 60. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016). ADDITIONAL INFORMATION 61. Purcell, S. et al. PLINK: a tool set for whole-genome association and population- based linkage analyses. Am. J. Human Genet. 81, 559–575 (2007). Supplementary information The online version contains supplementary material 62. Therneau, T. M. & Grambsch, P. M. Modeling Survival Data: Extending the Cox available at https://doi.org/10.1038/s41523-022-00435-9. Model (Springer New York, 2000). 63. Therneau, T. M. A Package for Survival Analysis in R. https://cran.r-project.org/web/ Correspondence and requests for materials should be addressed to Suet-Feung Chin packages/survival/ (2021). or Ana-Teresa Maia. Reprints and permission information is available at http://www.nature.com/ reprints ACKNOWLEDGEMENTS We thank all the patients who donated tissue and the associated pseudo- Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims anonymized clinical data for this project. The authors would also like to thank the in published maps and institutional affiliations. Functional Genomics of Cancer group members at CINTESIS-UAlg for helpful discussions and Vitor Morais at UAIC for administrative support. 
This work was supported by Portuguese national funding through FCT-Fundação para a Ciência e a Tecnologia, and CRESC ALGARVE 2020, institutional support ALG-01-0145-FEDER- Open Access This article is licensed under a Creative Commons 31477—DevoCancer, ALG-01-0145-FEDER-30895—Intergen, CBMR—UID/BIM/04773/ Attribution 4.0 International License, which permits use, sharing, 2013, CINTESIS R&D Unit—UIDB/4255/2020, POCI-01-0145-FEDER-022184 - Geno- adaptation, distribution and reproduction in any medium or format, as long as you give mePT, the contract DL 57/2016/CP1361/CT0042 (J.M.X.) and individual fellowships appropriate credit to the original author(s) and the source, provide a link to the Creative SFRH/BPD/99502/2014 (J.M.X.) and PD/BD/114252/2016 (F.E.). Funding was also Commons license, and indicate if changes were made. The images or other third party received from the People Programme (Marie Curie Actions) of the European Union’s material in this article are included in the article’s Creative Commons license, unless Seventh Framework Programme FP7/2007-2013/303745 (A.T.M.), and a Maratona da indicated otherwise in a credit line to the material. If material is not included in the Saúde Award (A.T.M.). The METABRIC project was funded by Cancer Research UK, the article’s Creative Commons license and your intended use is not permitted by statutory British Columbia Cancer Foundation, and the Canadian Breast Cancer Foundation BC/ regulation or exceeds the permitted use, you will need to obtain permission directly Yukon. This sequencing project was funded by CRUK grant C507/A16278 and CRUK from the copyright holder. To view a copy of this license, visit http://creativecommons. core grant A16942. The authors also acknowledge the support of the University of org/licenses/by/4.0/. 
Cambridge, Hutchinson Whampoa, the NIHR Cambridge Biomedical Research Centre, the Cambridge Experimental Cancer Medicine Centre, the Centre for Translational Genomics (CTAG) Vancouver, and the BCCA Breast Cancer Outcomes Unit. We thank © The Author(s) 2022 Published in partnership with the Breast Cancer Research Foundation npj Breast Cancer (2022) 71
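As an illustration of the log ratios defined in Eqs. (4) to (6) of the Methods, the following is a minimal Python sketch and not the authors' code (their analysis was done in R and is available from the repository named under Code availability). The function names, the `MIN_READS` constant, and the read counts in the example are hypothetical. The 30-read threshold is applied to the combined coverage of both alleles, as stated in the sample-filtering step. How zero read counts were handled is not described in the text, so rejecting them here is an assumption.

```python
import math

# Minimum combined coverage (wild-type + mutant reads) required in both
# DNA-seq and RNA-seq, following the sample-filtering step in the Methods.
MIN_READS = 30


def log2_ratio(mutant_reads, wildtype_reads):
    """Return log2(mutant / wild-type) for one locus."""
    # Assumption: zero counts are rejected, because the text does not say
    # how they were treated and the log ratio is undefined for them.
    if mutant_reads <= 0 or wildtype_reads <= 0:
        raise ValueError("read counts must be positive to form a log ratio")
    return math.log2(mutant_reads / wildtype_reads)


def allelic_ratios(x_dna, y_dna, x_rna, y_rna, min_reads=MIN_READS):
    """Return (beta, alpha, gamma) for one mutated locus.

    X denotes wild-type reads and Y mutant reads, as in the Methods.
    Returns None when coverage is below the threshold in either assay.
    """
    if x_dna + y_dna < min_reads or x_rna + y_rna < min_reads:
        return None
    beta = log2_ratio(y_dna, x_dna)    # Eq. (4): DNA mutant allele ratio
    alpha = log2_ratio(y_rna, x_rna)   # Eq. (5): net allelic expression imbalance
    gamma = alpha - beta               # Eq. (6): alpha corrected for the DNA ratio
    return beta, alpha, gamma


# Hypothetical locus: balanced in DNA (40:40), mutant-biased in RNA (20:80).
print(allelic_ratios(x_dna=40, y_dna=40, x_rna=20, y_rna=80))  # (0.0, 2.0, 2.0)
```

In this made-up example β is 0 because the two alleles are equally represented in DNA, so the whole four-fold RNA excess of the mutant allele (α = 2) is carried into γ.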
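The Methods state that P values were adjusted per study with the Benjamini & Hochberg correction and called significant at ≤ 0.05. The sketch below shows that step-up adjustment in plain Python. It is an illustration only: the function name and the example P values are hypothetical, and the text does not say which R function the authors used for the adjustment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Visit the P values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank of this P value in ascending order
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four two-sample tests within one study.
raw = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in q_values]
print([round(q, 4) for q in q_values])  # [0.04, 0.0533, 0.0533, 0.2]
print(significant)                      # [True, False, False, False]
```

Adjusting within each study separately, as the Methods describe, means the number of tests `m` is counted per dataset (METABRIC or TCGA) and not across both.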

Journal

npj Breast Cancer, Springer Journals

Published: Jun 8, 2022
