Epigenome erosion and SOX10 drive neural crest phenotypic mimicry in triple-negative breast cancer

www.nature.com/npjbcancer

ARTICLE OPEN

Jodi M. Saunus (1,2,✉), Xavier M. De Luca (1), Korinne Northwood (1), Ashwini Raghavendra (1), Alexander Hasson (3), Amy E. McCart Reed (1), Malcolm Lim (1), Samir Lal (1), A. Cristina Vargas (1), Jamie R. Kutasovic (1), Andrew J. Dalley (1), Mariska Miranda (4), Emarene Kalaw (1), Priyakshi Kalita-de Croft (1), Irma Gresshoff (1), Fares Al-Ejeh (4), Julia M. W. Gee (5), Chris Ormandy (6), Kum Kum Khanna (4), Jonathan Beesley (4), Georgia Chenevix-Trench (4), Andrew R. Green (7), Emad A. Rakha (7), Ian O. Ellis (7), Dan V. Nicolau Jr (3,8), Peter T. Simpson (1) and Sunil R. Lakhani (1,9)

Intratumoral heterogeneity is caused by genomic instability and phenotypic plasticity, but how these features co-evolve remains unclear. SOX10 is a neural crest stem cell (NCSC) specifier and candidate mediator of phenotypic plasticity in cancer. We investigated its relevance in breast cancer by immunophenotyping 21 normal breast and 1860 tumour samples. Nuclear SOX10 was detected in normal mammary luminal progenitor cells, the histogenic origin of most TNBCs. In tumours, nuclear SOX10 was almost exclusive to TNBC, and predicted poorer outcome amongst cross-sectional (p = 0.0015, hazard ratio 2.02, n = 224) and metaplastic (p = 0.04, n = 66) cases. To understand SOX10’s influence over the transcriptome during the transition from normal to malignant states, we performed a systems-level analysis of co-expression data, de-noising the networks with an eigen-decomposition method. This identified a core module in SOX10’s normal mammary epithelial network that becomes rewired to NCSC genes in TNBC. Crucially, this reprogramming was proportional to genome-wide promoter methylation loss, particularly at lineage-specifying CpG-island shores.
We propose that the progressive, genome-wide methylation loss in TNBC simulates more primitive epigenome architecture, making cells vulnerable to SOX10-driven reprogramming. This study demonstrates potential utility for SOX10 as a prognostic biomarker in TNBC and provides new insights about developmental phenotypic mimicry, a major contributor to intratumoral heterogeneity.

npj Breast Cancer (2022) 8:57; https://doi.org/10.1038/s41523-022-00425-x

Affiliations:
1. The University of Queensland Faculty of Medicine, UQ Centre for Clinical Research, Herston, QLD, Australia.
2. Mater Research Institute-The University of Queensland, Translational Research Institute, Woolloongabba, QLD, Australia.
3. School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD, Australia.
4. QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia.
5. Breast Cancer Molecular Pharmacology Unit, School of Pharmacy and Pharmaceutical Sciences, Cardiff University, Cardiff, UK.
6. The Kinghorn Cancer Centre, Garvan Institute of Medical Research and St. Vincent’s Hospital Clinical School, UNSW Sydney, Darlinghurst, NSW, Australia.
7. Nottingham Breast Cancer Research Centre, Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham Biodiscovery Institute, University Park, Nottingham, UK.
8. Mathematical Institute, University of Oxford, and Molecular Sense Ltd, Oxford, UK.
9. Pathology Queensland, Royal Brisbane Women’s Hospital, Herston, QLD, Australia.
✉ email: j.saunus@uq.edu.au; s.lakhani@uq.edu.au

Published in partnership with the Breast Cancer Research Foundation

INTRODUCTION

Effective management of triple-negative breast cancer (TNBC) remains a significant challenge worldwide. These tumours lack expression of oestrogen and progesterone receptors (ER/PR) and HER2, hence are not indicated for treatment with classical molecular-targeted agents. Chemotherapy remains the most reliable systemic treatment option, producing durable responses in ~60% of patients, while the other ~40% typically present with lung, liver and/or brain metastases within 5 years [1–3]. Second-line chemotherapy can temporarily stabilise metastatic disease but is rarely curative, so these patients endure a heavy treatment burden for no lasting benefit. Efforts to develop alternative treatments have been hampered by molecular and cellular variability between, and within, individual tumours. Intra-tumoural heterogeneity (ITH) directly increases the probability of relapse because it diversifies the substrate for clonal selection [4–7]. It has been proposed that to further improve the prognosis for TNBC patients, we need to develop agents that target the drivers of heterogeneity itself.

TNBCs are characterised by defective DNA repair, mitotic spindle dysfunction, chromosomal aberrations, and a mutation rate around 13 times that of other breast tumours [4,5]. Genomic instability is a key driver of ITH, however only some cases can be explained by the selection of individual driver mutations, and other sources of heterogeneity are coming to light [10–12]. For example, cellular heterogeneity is influenced by the differentiation state of the normal cellular precursor(s), which in TNBC is thought to be the luminal progenitor (LP) cell [14–17].

ITH is also driven by phenotypic plasticity, the dynamic reprogramming of cell state in response to extrinsic stimuli [10,11]. Cancer cell state transitions can be de-differentiating (the loss of lineage commitment and acquisition of stem cell features) and/or trans-differentiating (assuming the state of another cell type). Compared to genomic and histogenic sources of ITH, how tumour cells invoke this capability is poorly understood, and yet potentially more ominous for the patient, as cell state transitions can be induced by treatment via heritable-epigenetic change. In controlled experimental conditions, drug-tolerant TNBC cell states can be averted by epigenome remodelling inhibitors [19–23], suggesting these agents might reduce rates of relapse if used clinically [8,11]. However, epigenetic therapies have genome-wide effects, so our ability to use them rationally requires a deeper understanding of the epigenome-driven features of treatment-refractory human tumours.

SOX10 is a transcription factor that was recently implicated in phenotypic plasticity in experimental models of TNBC. It is first expressed in embryonic neural crest stem cells (NCSCs), where its self-reinforcing gene regulatory module facilitates multipotency and cell migration, orchestrating the embryo patterning process [25–28]. Once patterning is complete, SOX10 is silenced in all NCSC descendants except glial and melanocyte progenitors; and is nascently induced in ectoderm-derived epithelial progenitor cells of the salivary, lacrimal, and mammary glands [29–33]. In the mouse, Sox10 is an obligate requirement for mammary gland development. Its expression marks gland repopulating potential in the basal (myoepithelial) compartment, while Sox10+ luminal cells represent the committed progenitor fraction. Functional studies have shown that Sox10 is one of several fate specifiers that regulates the equilibrium between mammary stem cell (MaSC) and LP states [29,32].

In NCSCs where the genome is unmethylated and accessible, SOX10 facilitates a mesenchymal, migratory state, whereas its function in adult tissues is influenced by the tissue-specific growth factor milieu and lineage-specific DNA methylation. Remarkably, ectopic expression of SOX10 reprogrammed postnatal fibroblasts with multipotency and migration capabilities equivalent to NCSCs, providing they were also exposed to chromatin unpacking agents and early morphogens (DNA methylation and histone deacetylase inhibitors plus Wnt activation). This established that with the erasure of lineage-specific epigenetic marks and appropriate extrinsic cues, SOX10 can recreate its ‘default’ regulatory circuit and that this is sufficient to phenocopy NCSCs.

SOX10 expression in human breast cancer is associated with TN, basal-like, metaplastic and neural progenitor-like phenotypes [4,35–39]. In transgenic mouse mammary tumour cells, it promoted invasiveness, expression of mammary stem/progenitor, EMT and NCSC genes and the repression of epithelial differentiation genes. These findings suggest that SOX10 could mediate de-differentiation in TNBC; but the relevance is unclear, particularly given there are no available inhibitors of SOX10 itself. We explored the significance of SOX10 in breast cancer development and progression by immunophenotyping histologically normal breast tissue, and large breast tumour sample cohorts. To understand its contribution to phenotypic plasticity and identify drivers of this capability, we performed systems-level analysis to map SOX10’s regulatory circuit in the broader TNBC transcriptional network.

RESULTS

SOX10 is expressed in luminal progenitor cells of the human mammary gland

Functional studies have shown that SOX10 marks stem and luminal progenitor (LP) cells of the mouse mammary gland [29,32], but its expression pattern in the human breast has not been established. Therefore, we performed immunohistochemical (IHC) analysis of 19 histologically normal reduction mammoplasty (RM) samples using a validated antibody (Supplementary Fig. 1a and Supplementary Table 1). SOX10 was detected in nuclei of ductal and lobular epithelia, with individual terminal ducto-lobular units (TDLUs) exhibiting either basal-restricted or combined baso-luminal expression (Fig. 1a). Compared to ducts, lobules were more likely to exhibit luminal compartment expression of SOX10 (Fig. 1b), consistent with a role in lobulogenesis. Indeed, TDLUs with basal-restricted SOX10 expressed high levels of luminal cytokeratins (CK)8/18, while TDLUs with dual-compartment SOX10 had low CK8/18. This was evident even in neighbouring structures of the same specimen (Fig. 1c and Supplementary Fig. 1b).

IHC analysis of serial sections showed SOX10+ luminal cells lacked ER and were positive for the LP marker c-Kit, with no obvious relationship to proliferation marker Ki67 (Fig. 1d). We also analysed SOX10 mRNA in a published dataset from FACS-sorted human mammary epithelial cells (hMECs). SOX10 levels were similar to established LP markers ELF5 and KIT: highest in EpCAM+/CD49f+ LP cells, moderate in the EpCAM-/CD49f+ basal compartment (myoepithelia and mammary stem cells (MaSCs)) and low in EpCAM+/CD49f- mature luminal (ML) cells (Fig. 1e).

SOX10 is epigenetically regulated in mouse mammary gland [40,41], so we investigated this in human tissue. We isolated hMECs from two fresh RM samples using FACS with antibodies against CD49f and EpCAM, then performed high-density DNA methylation array profiling. SOX10 was hypomethylated in LP and basal samples (p < 1.0E−06; Fig. 1f). Consistently, analysis of hMEC chromatin immunoprecipitation sequencing (ChIP-seq) data from six independent RM samples showed the SOX10 locus is enriched with activating (H3K4me3, H3K27ac) and depleted of repressive H3K27me3 marks in LP and basal samples (Fig. 1f).

SOX10 is associated with poor clinical outcomes in TNBC

Analysis of TCGA, METABRIC and ICGC breast tumour datasets [43–45] showed SOX10 mRNA is expressed almost exclusively in TNBC, with a bimodal distribution suggesting distinct SOX10 positive and negative (+/−) subgroups (Fig. 2a and Supplementary Fig. 2a). Consistent with other data, SOX10 mRNA is highest amongst TNBCs classified as ‘basal-like, immune-suppressed’ (BLIS), though we noted that expression was heterogeneous amongst TNBC subtypes classified by gene expression profile (e.g. 23% of ‘basal-like, immune-activated’ (BLIA) TNBCs also had SOX10 levels in the top quartile; Supplementary Fig. 2b). In terms of genomic drivers of SOX10 expression in breast cancer, copy-number (CN) amplification or gain at the SOX10 locus was evident in ~20% of TNBCs (Fig. 2b) and was associated with higher mRNA levels in both METABRIC and TCGA datasets (Fisher’s Exact p ≤ 0.001). Analysis of TCGA HM450k methylation array data indicated that SOX10 is frequently hypomethylated in TNBC (Fig. 2b) and that this correlates strongly with expression (Fig. 2c and Figs. S2c, d), but does not extend to adjacent genes on chromosome 22 (Fig. 2d). Hence, like normal basal and luminal progenitor cells, gene-specific hypomethylation also underpins SOX10 expression in a subset of TNBCs, and in some cases, this appears to be reinforced by clonally selected CN gains.

Analysing published cell line gene expression and methylation array datasets [46,47] and our cell line bank [48,49], we found that in contrast to tumours, TNBC cell lines express very low to undetectable levels of SOX10, and the SOX10 gene is hypermethylated (Fig. S2e, f). shRNA-mediated depletion of SOX10 in one of the few positive lines (HCC1569) resulted in 100% cell death within a few passages (Supplementary Fig. 2g).

Next, we performed IHC studies to investigate the prognostic significance of SOX10 expression at the protein level. Surveying a large, cross-sectional cohort of invasive primary breast tumours from Australia and the UK (n = 1330), we detected SOX10 almost exclusively in tumour cell nuclei of TN cases (Fig. 2e; see Supplementary Table 2 for cohort characteristics). Approximately 38% of TNBCs were classified as SOX10+, and another 11.5% exhibited heterogeneous staining (see Fig. 2e and Supplementary Fig. 2h for scoring thresholds). SOX10 positivity was associated with histologic features typical of this group, such as high grade, metaplastic and medullary morphology, pushing margins and a larger size at diagnosis (Supplementary Table 2). Similar, though statistically weaker trends were found between these variables and heterogeneous SOX10 staining (Supplementary Fig. 2i).

Rather than a simple correlate of the TN phenotype, SOX10 positivity stratified TNBC-specific survival in both univariate (Fig. 2f and Supplementary Fig. 2j) and multivariate regression analyses, with a prognostic value greater than clinicopathologic indicators used in current clinical practice: tumour size, grade, and the density of tumour-infiltrating lymphocytes (TILs) (hazard ratio 1.8–2.5; p = 0.02–0.002; Supplementary Table 2). Increased propensity for brain metastasis is one of the factors underlying premature death in TNBC, so we also analysed patient-matched pairs of

[Figure 1, panels a to f. Recoverable labels: SOX10 staining patterns (basal; basal + luminal; SOX10+, SOX10-het, SOX10−); % lobules and % ducts; % SOX10+ lum cells; stains SOX10, CK8/18, c-kit, ER, Ki67, merge; hMEC fractions Basal, LP, ML, indistinct; genes EGFR, ESR1, CK8, CK18, SOX10, KIT, ELF5, CK14, CK5; tracks DNAme, H3K27me3, H3K27ac, H3K4me3 over the SOX10 exons, UTR, TSS and intron; x-axis: Chrom 22 position (hg19 Mbp).]

Fig. 1 SOX10 is expressed in basal and luminal progenitor cells of the human mammary gland. a Representative SOX10 IHC analysis of reduction mammoplasty (RM) samples.
Some terminal ducto-lobular units (TDLUs) had exclusive basal compartment expression (i) while others had expression in both basal and luminal compartments (ii). b (i) Analysis of SOX10 expression in ducts vs lobules of RM samples from 19 donors (whole sections). (ii) SOX10 expression in lobules was heterogeneous and more likely to occur in the luminal compartment (Mann–Whitney p = 0.011; n = 102 ducts and 102 lobules; median ± 95% confidence interval shown). c Representative immunofluorescent staining of SOX10 and CK8/18. Circled lobules and isolated cells (arrows) exhibited reciprocal expression of SOX10 (green) and CK8/18 (red) in structures with either (i) dual-compartment or (ii) basal-restricted SOX10 expression. d IHC analysis of SOX10, c-kit, ER and Ki67 in serial RM sections. The three magnified regions represent major SOX10 staining patterns: (i) dual compartment, heterogeneous; (ii) dual compartment, homogeneous; and (iii) basal-restricted. Luminal SOX10 expression was directly associated with c-kit and inversely associated with ER, with no obvious relationship to Ki67 (e.g., cell cluster indicated with an arrow). e SOX10 mRNA levels in FACS-sorted human mammary epithelial cell (hMEC) subtypes. Differentiation markers were analysed for comparison: basal markers CK14 and CK5; luminal progenitor (LP) markers KIT and ELF5; and markers enriched in mature luminal (ML) cells: CK18 and ESR1 (isolates with significantly different marker levels according to paired ANOVA tests are indicated and colour-coded: ****p < 0.00001; ***p < 0.0001; **p < 0.001). Data shown were means ± standard error of the mean from three donors. f Average methylation beta-values of SOX10 probes in FACS-sorted hMEC samples (DNAme), aligned with histone modification signals in a published ChIP-seq dataset: H3K4me3, H3K27ac (activating) and H3K27me3 (repressive). Data were represented to scale on human chromosome 22. TSS transcription start site, UTR untranslated region. Indistinct = negative for CD45 (hematopoietic cells), CD31 (endothelia), CD140b (fibroblasts), EpCAM and CD49f (epithelia).

Fig. 2 Expression of SOX10 in human breast cancer. a Bimodal expression of SOX10 in TNBC compared to other breast cancers (nonTNBC) in the METABRIC cohort. b Frequency of copy-number alterations (CNAs) and DNA hypomethylation affecting SOX10 in TNBC and nonTNBC compared to the archetypal SOX10+ malignancy, melanoma (SKCM; TCGA datasets). c Correlation between SOX10 methylation and expression (normalised RNAseq counts) in SKCM, TNBC and nonTNBC (Spearman correlation coefficients (r) and p values are shown; derived from TCGA data). d Proportions of TNBC and nonTNBC cases with hypomethylation at each probe across the SOX10 locus (as defined in (b)). e Representative IHC showing SOX10-neg, heterogeneous and nuclear-positive (+) TNBCs. Tumours with absent or very weak nuclear staining in ≥50% of tumour cells were classified as SOX10-negative, while those with any one of replicate TMA cores exhibiting moderate-strong nuclear staining in <50% OR weak-moderate nuclear staining in ≥50% of tumour cells were classified as heterogeneous (see also Supplementary Fig. 2h). Survival curves of heterogeneous and negative categories overlapped (Supplementary Fig. 2j) and hence are grouped together here. f Kaplan–Meier analysis of the relationship between SOX10 nuclear positivity and breast cancer-specific survival (BCSS) in cross-sectional TNBCs. Log-rank test p value and hazard ratio (HR) are shown (95% confidence interval). g Kaplan–Meier analysis of the relationship between SOX10 nuclear positivity and BCSS in TNBCs classified as metaplastic breast cancers. Gehan–Breslow–Wilcoxon test p value shown. h SOX10 expression in brain-metastatic TNBC and matching brain metastases (BrM), compared to the frequency in cross-sectional TNBCs (Chi-square p value shown).

primary TNBCs and brain metastases (n = 19 pairs). Compared to cross-sectional TNBCs, SOX10 was over-represented in brain-metastatic cases, with SOX10 status concordant in ~90% of matching brain tumours (Fig. 2h). Consistent with previous reports [37,50], we also detected nuclear SOX10 in an independent cohort of metaplastic breast cancers (MBC; Asia-Pacific Metaplastic Breast Cancer consortium [51]). Compared to cross-sectional cases, SOX10 staining was more heterogeneous in MBCs, and was not associated with TN status (Supplementary Fig. 2k); but was prognostic amongst MBCs with a TN phenotype (Fig. 2g).

Considering all our IHC study findings, we concluded that strong nuclear expression of SOX10 is associated with TNBC progression.

SOX10’s TNBC regulatory module confers transcriptomic similarity to NCSCs

To investigate the basis of SOX10’s association with poor patient outcomes, we compared the expression profiles of TNBCs expressing high versus low levels of SOX10 mRNA and found that high SOX10 tumours were significantly enriched with the expression of mesenchymal, neural, and glial development genes (Supplementary Fig. 3 and Tables S3, S4).

We then mapped SOX10’s regulatory neighbourhood within the breast cancer transcriptome using weighted gene co-expression network analysis (WGCNA). This approach quantifies co-variation in gene expression across a biological sample set to identify genes with highly coordinated regulation, which is indicative of functional relatedness [52,53]. We built a network from TCGA breast cancer RNAseq data (n = 919 cases) and validated it with datasets from METABRIC (n = 1278, expression array) and ICGC (n = 342, RNAseq).
In this model, all genes expressed above a background threshold are connected (12,588 genes, 12,588 connections). The connection between each gene pair is based on a weighted correlation coefficient, and unsupervised clustering can reveal groups of genes with a high probability of co-functionality (modular transcription programmes). The module eigengene (ME) is a centroid calculated for each module in each sample that represents both module expression and net connection strength. WGCNA partitioned ~20% of expressed genes into eight consensus modules that align with established hallmarks of breast cancer; for example, an ER/FOXA1-driven module expressed in luminal tumours, and a mitotic instability module in basal-like and luminal-B tumours (Table 1, Fig. 3a, Tables S5–S8 and Supp File 2). The remaining ~80% of genes were not linked to any one module.

SOX10 was identified as one of the most interconnected genes in the ‘green’ module, which has a hierarchical structure (Fig. S4a, b) and is predominantly expressed in high-grade TNBCs (Supplementary Fig. 4c). In this module, SOX10’s co-expression profile was highly similar to genes implicated in Wnt signalling, neuroglial differentiation and embryo patterning (Fig. 3b). We named it the SOXE-module and ascribed ‘multipotency’ as its primary ontology, as the member gene list is enriched with developmental phenotypes, and includes all three SOXE family members (SOX8/9/10) and embryonic stem cell genes (LMO4, POU5F1) (Fig. 3c and Supplementary Table 9). IHC analysis of six other module members confirmed that their co-expression in TNBC holds true at the protein level (Fig. 3d), with staining often observed in the same cells within individual tumour-rich tissue cores (Fig. 3e).
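The quantities named above (weighted correlation, intramodular connectivity and the module eigengene) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline. It assumes an unsigned soft-threshold adjacency |r|^β with β = 6 (a value not stated in the text) and takes the eigengene as the first principal component of the standardised module expression, which is the usual WGCNA convention rather than something the paper specifies. The gene names and values in `toy` are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |r|**beta for every gene pair (beta is assumed)."""
    genes = sorted(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta
            for i, g in enumerate(genes) for h in genes[i + 1:]}

def k_within(adj, module):
    """Intramodular connectivity: summed adjacency to the other module genes."""
    members = set(module)
    k = {g: 0.0 for g in module}
    for (g, h), w in adj.items():
        if g in members and h in members:
            k[g] += w
            k[h] += w
    return k

def standardise(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def module_eigengene(expr, module, iters=200):
    """First principal component across samples, found by power iteration."""
    z = [standardise(expr[g]) for g in module]   # genes x samples
    n = len(z[0])
    v = z[0][:]                                  # start inside the row space
    for _ in range(iters):
        scores = [sum(row[s] * v[s] for s in range(n)) for row in z]
        v = [sum(scores[g] * z[g][s] for g in range(len(z))) for s in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the component so it follows the average module profile
    mean_profile = [sum(row[s] for row in z) / len(z) for s in range(n)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v

# Hypothetical toy data: three co-varying genes and one unrelated gene, five samples
toy = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 4, 3, 2, 1],
    "g4": [3, 1, 4, 1, 5],
}
adj = soft_adjacency(toy)
module = ["g1", "g2", "g3"]
kw = k_within(adj, module)
me = module_eigengene(toy, module)
```

Raising |r| to a power suppresses weak correlations (g1 and g4 end up almost unconnected) while leaving strong ones, including the anti-correlated pair g1 and g3, close to 1. This is why hub status in such a network reflects tightly coordinated expression rather than many weak links.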
Consistent with the defining features of TNBCs (de-differentiation, genomic instability, high mitotic index and the presence of TILs), TNBCs express variable proportions of primarily three modules: green (SOXE), blue (mitotic instability) and yellow (TILs) (Fig. 3f). Kaplan–Meier analysis showed that cases expressing high levels of both SOXE and mitotic instability modules had shorter survival compared to those with predominant expression of one or the other, while co-expression of the yellow module was associated with better prognosis, consistent with the protective effect of TILs in TNBC (Fig. 3g and Supplementary Fig. 4d).

The SOXE-module represents the shift from a luminal progenitor to an NCSC-like state

Ontology analysis showed that the SOXE-module includes genes typically expressed in differentiating glia, cardiomyocytes, and odontoblasts, which all descend from NCSCs. In fact, developmental genes comprised a large proportion of SOXE-module hubs (genes with the highest network connectivity and centrality values; Fig. 4a and Supplementary Table 10), hence representing points of maximal module vulnerability. These include cell-fate regulators ELF5, FOXC1 and SOX10; Wnt/β-catenin signalling genes SFRP1, MAML2 and TRIM29; and embryonic cell migration and neuronal development genes RGMA, ROPN1, ROPN1B, MID1 and APCN. To directly investigate if the SOXE-module is associated with NCSC phenotypic mimicry, as has been reported for Sox10 in

Table 1. Key features of eight predominant gene co-expression modules extracted by WGCNA.

Group | Module | Major functional ontologies (a) | Signalling pathways (a) / intrinsic activators (b) | Size (no. genes) | Top ten hub genes (highest kWithin; see Supplementary Table 5)
Tumour-centric | Blue | Mitotic instability | FOXM1, MYBL2 | 1239 | TPX2, BUB1, CEP55, HJURP, NCAPH, KIF4A, KIF2C, CCNB2, NCAPG, FOXM1
Tumour-centric | Green | Multipotency (SOXE) | Wnt signalling | 487 | ROPN1, SFRP1, FOXC1, RGMA, GABRP, CHST3, MAML2, APCN, ROPN1B, SOX10
Tumour-centric | Brown | Primary cilium | ER, FOXA1 | 1008 | FOXA1, MLPH, ESR1, AGR3, XBP1, THSD4, GATA3, CA12, PRR15, ZMYND10
Tumour-stromal | Magenta | ECM-1 (structural) | FBN1, RUNX2 | 186 | COL5A2, COL1A2, COL3A1, COL5A1, COL6A3, FAP, THBS2, COL1A1, LUM, VCAN
Tumour-stromal | Black | ECM2 (regulatory) | – | 207 | OLFML1, RECK, FSTL1, DCN, MSRB3, ECM2, CCDC80, TCF4, ZEB1, GLT8D2
Tumour-stromal | Red | Fatty acid metabolism | PPARγ | 274 | DIA1R, PDE2A, LHFP, LDB2, ARHGEF15, S1PR1, SDPR, EBF1, CD34, ERG
Tumour-stromal | Tan | Type-I IFN response | STAT1, IRF9 | 33 | IFIT3, OAS2, CMPK2, IFI44L, IFI44, IFIT1, MX1, OASL, IFIT2, RSAD2
Stromal | Yellow | Adaptive immunity (TILs) | CD40L, CD40, IFNγ, IRF1 | 712 | SASH3, IL2RG, CD53, PTPN7, CD48, CD2, CD3E, ARHGAP9, CD5, CD3D, SIT1, SH2D1A

ECM extracellular matrix.
(a) Gene set enrichment analysis (GSEA) of all BRCA genes ranked according to module eigengene correlation (Supplementary Table 9).
(b) Ingenuity pathways analysis upstream regulator prediction (p ≤ 1.0E-07) based on kWithin values for module genes.

Fig. 3 SOX10’s regulatory network is associated with multipotency, cell migration and poor prognosis in TNBC. a Relative expression of eight predominant transcription modules in human breast tumours, according to the PAM50 subtype (TCGA dataset). b SOXE-module co-expression profile similarity matrix, clustered to highlight genes with very highly coordinated expression. The similarity is based on cosine distance and has a maximum value of 1. SOX10 mapped to one of six module sub-clusters, the members of which are shown to the right of the matrix. See also Supplementary Fig. 4a, b.
c Summary of results from unsupervised gene set enrichment analysis of the breast cancer transcriptome after ordering transcripts according to their correlations with SOXE-module expression (denoted by the ME value, TCGA dataset). d Tile plot showing overlapping expression of SOXE-module representatives. For each protein, significant co-expression with ≥2 other module members is indicated by a Fisher’s exact test result (*p < 0.05; ***p < 0.001; ****p < 0.0001). Refer to Supplementary Table 1 for scoring criteria. e IHC staining of representative SOXE-module nodes in serial sections from the same tumour. f Proportional expression of all eight modules (coloured as for (a)) in TNBCs annotated with PAM50 and TNBC subtypes (METABRIC dataset; LAR luminal androgen receptor-like, MES mesenchymal, BLIS basal-like immune-suppressed, BLIA basal-like immune-activated). g Kaplan–Meier analysis of METABRIC TNBCs expressing different proportions of the three predominant TNBC modules. BCSS breast cancer-specific survival. ME fraction thresholds for classifying cases as high or low were 0.33 for SOXE/blue and 0.1 for yellow.

mouse mammary tumour cells [24], we performed expression and enrichment analyses using two independent genesets: (1) 308 genes represented in at least two of the 78 terms matching ‘neural crest’ in the gene ontology database (‘NC terms’); and (2) transcripts specific to migratory, Sox10+ NCSCs in chick embryos (‘ch.NCSC’; n = 200 genes) [55], representing Sox10’s most primitive transcription programme (Supplementary Table 11). Except for SOX10, SOX8 and LMO4, there is minimal overlap between the SOXE-module and these genesets (Fig. 4b), but their expression is strongly correlated (Fig. 4c). This was confirmed by geneset enrichment analysis (GSEA; Fig. 4d). Hence, the SOXE-module confers transcriptomic similarity to NCSCs.

Since several SOXE-module genes (e.g. SOX10, SOX9, LGR6 and ELF5) are key regulators of normal hMEC states, we hypothesised that the SOXE-module might evolve from the deregulation of a lineage differentiation programme expressed in TNBC’s normal cellular precursors. Module preservation analysis using RNAseq data from TCGA normal breast samples indicated that the SOXE-module does not exist as an interconnected unit in the normal breast transcriptome (Supplementary Fig. 4e). But after performing de novo WGCNA module identification on this dataset (Supplementary Table 12), we found that SOX10’s normal breast module overlaps with the TNBC-specific SOXE-module significantly more than expected by chance (Fig. 4e; 109 shared genes, Chi-square p = 2.8E−26).

Both ‘normal-exclusive’ and ‘shared’ genes were enriched with epithelial differentiation ontologies, with cell adhesion distinctly over-represented in the shared set (Fig. 4e and Supplementary Table 13). According to network influence metrics, the shared genes were significantly more important to the SOXE-module than SOXE-exclusive genes (Fig. 4f and Supplementary Fig. 4f). This suggests that while SOXE-exclusive genes are primarily responsible for conferring NCSC-like attributes, genes ‘inherited’ from TNBC’s normal precursors are comparatively more important to the SOXE-module’s regulatory structure. Together, these data suggest that SOXE-module and its associated NCSC-like phenotype arise because a core set of epithelial differentiation and adhesion genes becomes rewired during TNBC development (Fig. 4g).

Fig. 4 The SOXE-module drives the transition from normal mammary epithelial stem/progenitor to NCSC-like phenotypic states. a Influence of SOXE-module genes over network architecture and information flow. kWithin: intramodular ‘connectivity’ based on weighted correlations with all other module genes; Eigencentrality: considers the connectivity of each node’s nearest neighbours as an indicator of ‘local influence’; Betweenness centrality: ‘conductivity’ based on each node’s position along the shortest paths between other nodes (genes with high betweenness are information conduits). Key hub genes are indicated (see Supplementary Table 10 for the full dataset). b Chick (ch.)NCSC and neural crest (NC) terms genesets are largely independent of each other and from the SOXE-module. c Correlations between SOXE-ME values and NCSC genesets (singscore values) in TNBC (n = 106 TCGA cases with tumour cellularity ≥0.6). Correlation coefficients (r) and p values are shown. d GSEA using three TNBC gene expression datasets (ICGC, METABRIC, TCGA). Normalised enrichment scores (NES) and corrected p values (q) shown. e Overlap between members of the SOXE-module and SOX10’s normal breast module (from de novo module identification on n = 97 TCGA normal breast samples; Supplementary Table 12). Generic ontology enrichment results are summarised (full GO term lists in Supplementary Table 13). f Comparison of network structure and information flow metrics (as for (a)) between shared and SOXE-module-exclusive genes. Groups were compared using Mann–Whitney tests (**p = 2.4E-03; ***p = 5.6E-04). Boxes show the 10–90th percentiles and median, with whiskers extending to the minimum and maximum values. Mean is indicated with ‘+’. g Model depicting the mammary epithelial progenitor gene regulatory network core being sustained through transformation and rewired as the SOXE-module in TNBC. Shared hub genes are listed.

Genomic and epigenomic determinants of the NCSC-like transcriptional shift in TNBC

To address the central question of what drives this transcriptomic shift, we analysed case-matched gene copy-number (CN), RNAseq and WGCNA data (TCGA cases). Candidate module drivers were defined as those for which both CN and expression correlated significantly with SOXE-ME values. About 182 genes met these criteria (130 gains and 52 losses), of which 140 (77%) are part of large chromosomal alterations: 6p21-22 (gained/amplified in 56.7% of TNBC cases), 8q22-24 (gained/amplified in 78.7%), 9q34 (lost in 59.6%) (Supplementary Fig. 5a). SOXE-module genes were over-represented amongst the positively correlated genes (25/130 (19.2%) and had increased CN and expression in SOXE-high TNBC; ChiSq p = 9.7E−31; Fig. 5a). However, network influence metrics for these 25 were no higher than other module genes (Fig. 5b). Hence, the SOXE-module may be augmented by

[Figure 5, panels a to f. Recoverable labels: a decision tree: increased CN correlated with SOXE ME (p<0.01)? Yes 1387 (7.8%), No 16,307 (92.2%); expression also correlated with SOXE ME (p<0.01)? Yes 130 (9.4%), No 1257 (90.6%); SOXE module member gene? Yes 25 (19.2%), No 105. b connectivity, local influence and conductivity for CNA-correlated (n=25 genes) vs not correlated (n=463 genes) module genes. c substitution signatures (sig-1, sig-2, sig-3: HR-deficiency, sig-5: unknown, sig-8, sig-13: APOBEC, sig-17), indels, rearrangement signatures RS1 to RS6 and HRDetect. d t-SNE panels coloured by PAM50 subtype, SOX-E ME and median methylation. e average methylation-β correlation with the SOXE eigengene by region (CGI, shore, shelf, open sea, IGR, 3'UTR, gene body, 5'UTR, TSS200, TSS1500). f hypomethylated clusters a (n=106), b (n=40), c (n=35) and other (n=306).]

Fig. 5 The SOXE-module is driven by the erosion of lineage-specific epigenetic marks.
a Decision tree for identifying candidate copy-number alteration (CNA) drivers of the SOXE-module. Of 17,694 genes with case-matched GISTIC, RNAseq and WGCNA data, CN and expression of 130 correlated with the SOXE-module in TNBC, including 25 SOXE-module nodes. b Network influence metrics for SOXE-module nodes coloured according to candidate CN driver status (intramodular connectivity (kWithin), local influence (Eigencentrality) and conductivity (betweenness centrality) defined in Fig. 4a). Boxes show the 10–90th percentiles and median, with whiskers extending to the minimum and maximum values. Mean is indicated with '+'. No significant differences by ordinary ANOVA test. c Relationship between SOXE-module levels and mutation signatures in ICGC TNBCs (COSMIC v2 SigProfiler and HRDetect on n = 74 ICGC TNBCs). Associations are depicted according to the correlation between SOXE-ME values and signature event count (y-axis); and by the significance of average SOXE-ME differences between ICGC TNBCs with low (quartile-1) vs higher (quartile 2–4) signature burden. d t-Distributed stochastic neighbour embedding (t-SNE) visualisation of genome methylation profile similarities amongst cases in the BRCA-TCGA 450k methylation array dataset. Panels are coloured according to PAM50 intrinsic subtype, SOXE-ME values or global median methylation-β values. Circled cases are epigenetically divergent, basal-like TNBCs that express high levels of the SOXE-module and have eroded methylomes. e Correlation analysis summary showing relationships between SOXE-ME values and region-specific methylation (n = 75 TCGA TNBCs, tumour cellularity ≥0.6; n = 215,323 probes after quality filtering); ****p < 1.0E-07. CGI CpG island, IGR intergenic region, TSS transcription start site, UTR untranslated region. Solo-WCpGW: consensus sequence for late-replicating loci demethylated via replicative senescence.
f Unsupervised clustering of the BRCA-TCGA 450k methylation dataset according to ME correlation. Data shown were minimum correlation coefficients of ME values versus gene-averaged methylation-β data from promoter region probes (TSS1500, TSS200 and 5′UTR). Of three clusters inversely correlated with SOXE-module expression, two (a, b) were enriched with developmental ontologies (Supplementary Table 14). g Network influence metrics for SOXE-module genes in the hypomethylated clusters versus other SOXE-module genes, as for (b). Ordinary ANOVA p values: *p < 0.05; **p < 0.01; ***p < 0.001; ns not significant.

[Fig. 5 axis and module labels: log2 kWithin; eigencentrality; log10 betweenness; tSNE-y; TSS1500 meth-β (case mean); mutation signature vs SOXE-ME correlation; modules: brown, black, red, magenta, tan, yellow, blue, SOXE.]

increased CN of some of its component genes, but this seemed unlikely to be an early or dominant driver of module evolution.

Next, we investigated whether mutational processes that shape the breast cancer genome could be involved. To this end, we utilised case-matched mutational signature and WGCNA data for the ICGC cohort [45,57]. There were direct relationships between the SOXE-module and overall mutation burden (substitutions and small insertions-deletions (indels)), as well as specific signatures of genome instability (rearrangement signatures (RS) 3 and 5), homologous recombination (HR)-directed repair of double-strand DNA breaks (DSBs) and genome editing (sig-3: HR deficiency; HRDetect; sig-13: APOBEC; Fig. 5c).

APOBEC activity and DSB repair are both indirectly demethylating. For example, 5-methyl cytosine (5mC) loss occurs because of APOBEC-mediated genome editing and/or during the repair of edited bases, and DSB repair has been causally linked to the progressive loss of 5mC during cellular ageing [58,59]. Therefore, we hypothesised that the evolution of the SOXE-module in TNBC may be related to epigenetic dysregulation. Consistent with this idea, the 105 CN-driven SOXE-module correlates (i.e., those not part of the SOXE-module itself; Fig. 5a) were enriched with transcription factor, chromatin remodelling and DNA repair genes (Fisher's Exact p < 0.001). Furthermore, visualising SOXE-module strength relative to the overall methylome profile using t-SNE showed that SOXE-ME values were highest in the most epigenetically divergent tumours (Fig. 5d).

To investigate this further, we then correlated SOXE-ME values with probe-level methylation data directly, in the following regional categories: CpG islands (CGIs), CGI shores, shelves or open sea regions at transcription start site (TSS) regions, untranslated regions (UTRs), gene bodies or intergenic regions (IGRs). We also quantified methylation at 'solo-WCpGW' sites at late-replicating, heterochromatic loci, which act as a biomarker of replicative senescence and are hypomethylated in breast tumours compared to hMECs (Supplementary Fig. 5b). There was no relationship with solo-WCpGW sites (Supplementary Fig. 5c), but there was a striking inverse correlation between SOXE-ME values and genome-wide promoter methylation, particularly at CGI shores, the substrate for lineage-specific methylation in adult tissues (Fig. 5e and Supplementary Fig. 5c). These data indicate that SOXE-module expression and connectivity are directly proportional to promoter demethylation in TNBC (Fig. 5e). There was no such relationship with any other module in TNBC (Supplementary Fig. 5d).
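The region-level comparison described above can be illustrated with a minimal sketch. It assumes a Pearson correlation between SOXE-ME values and the per-sample mean beta-value of each region category; the correlation type, the function names (`pearson_r`, `region_correlations`), the probe identifiers and all numbers are hypothetical and chosen only to show the shape of the calculation, not to reproduce the study's pipeline.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def region_correlations(me_values, probe_betas, probe_region):
    """Correlate ME values with the per-sample mean beta of each region category."""
    by_region = defaultdict(list)
    for probe, betas in probe_betas.items():
        by_region[probe_region[probe]].append(betas)
    result = {}
    for region, rows in by_region.items():
        # Average across the probes of a region, one value per sample.
        sample_means = [sum(col) / len(col) for col in zip(*rows)]
        result[region] = pearson_r(me_values, sample_means)
    return result


# Toy data (hypothetical): four samples, three probes in two region categories.
me = [-0.03, 0.00, 0.03, 0.06]
betas = {
    "cg_a": [0.80, 0.70, 0.50, 0.30],
    "cg_b": [0.70, 0.60, 0.50, 0.20],
    "cg_c": [0.40, 0.45, 0.40, 0.45],
}
regions = {"cg_a": "CGI shore", "cg_b": "CGI shore", "cg_c": "open sea"}

corr = region_correlations(me, betas, regions)
for region, r in sorted(corr.items()):
    print(f"{region}: r = {r:.2f}")
```

In this toy example the shore probes lose methylation as the eigengene rises, so the shore category shows a strongly negative coefficient, which is the pattern the text reports for promoter CGI shores.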
Having established that SOXE-module levels correspond with loss of tissue-specific 5mC marks, we then built a correlation matrix from ME and genome-wide promoter methylation data (TCGA) and performed unsupervised clustering to look for evidence of epigenetic control. The SOXE-module had a distinct promoter methylation signature: three clusters of genes that are hypomethylated when SOXE-module strength is highest, of which two were enriched with developmental ontologies (Fig. 5f and Supplementary Table 14). Only 10% of these correspond to SOXE-module genes, but this 10% is enriched with hub genes (Fig. 5g), suggesting a higher level of epigenetic control over module structure and information flow. We then used GSEA to test the enrichment of the SOXE-associated promoter methylome with NCSC genesets. Like the transcriptome (Fig. 4d), the methylation landscape associated with the SOXE-module was also enriched with NCSC genes (NC terms: normalised enrichment score (NES) −1.5; q = 6.0E−03; Ch.NCSC: NES −1.3; q = 3.6E−02).

Finally, we investigated direct demethylation processes as potential enablers of SOXE-module formation by cross-referencing SOXE-ME values from our three WGCNA datasets (TCGA, ICGC, METABRIC) against the expression of demethylases in the EpiFactors database. There were direct associations with APOBEC3A/3B cytosine deaminases and TET1 (Supplementary Fig. 5e). TET dioxygenase enzymes catalyse the first step of 5mC demethylation and are involved in processes requiring cell states to be reset or adjusted, such as methylome erasure in preimplantation embryos, and epigenetic plasticity in brain regions that facilitate learning and memory. TET1 is a maintenance demethylase that prevents methylation from spreading from silenced loci, particularly at CGI shores [62,63]. It has been causally implicated in TNBC metastasis and our findings suggest this may be at least partly due to reinforcement of the SOXE-module.

In summary, the SOXE-module's dominance over the TNBC transcriptome is directly proportional to APOBEC activity, DSB repair and TET1 expression, which are all demethylating. Of all methylation domains across the genome, the module is most strongly correlated with hypomethylated promoter CGI shores, the substrate for lineage-specific methylation. Kim et al. showed that the minimal genetic requirements for reprogramming postnatal fibroblasts with an NCSC identity are SOX10 expression and the erasure of previous epigenetic memory. We postulate that progressive erosion of the epigenome in SOX10+ tumour-initiating cells simulates these conditions, driving NCSC-like reprogramming and poor clinical outcomes in SOX10+ TNBCs (Fig. 6).

Fig. 6 Model summarising the study findings. Proposed links between established drivers of TNBC progression, epigenome erosion and the emergence of a neural crest-like transcriptional programme in de-differentiated TNBCs.

DISCUSSION
Heterogeneity has emerged as a major bottleneck to effective sub-classification and treatment of cancer, and TNBC is no exception. Post-treatment relapse occurs through clonal expansion of cells with pre-existing, advantageous mutations, but also cell state changes brought about by adaptive epigenetic remodelling, a phenomenon that unites the 'cancer stem cell' and 'epigenetic progenitor' models of cancer [65]. The intrinsic plasticity of TNBC is problematic because existing therapies cannot eradicate a shifting target. Early evidence implies that blocking this capability with epigenetic therapy may improve treatment efficacy, but this will require a deeper understanding of how phenotypic plasticity evolves. TNBC exhibits genome-wide hypomethylation, which evidently drives de-differentiation by destroying the state-defining epigenetic barcode of its normal cellular precursor, the LP cell [14–17,65,67]. Differential methylation at certain genomic loci is prognostic in TNBC, and myriad studies have helped to decipher the mechanistic contributions of individual writers, readers, and erasers of epigenetic marks, but the phenotypic manifestations of genome-wide 5mC loss have not been extensively studied.

Consistent with functional analysis of Sox10 in experimental mice [29,32], our human tumour network studies show that SOX10's TNBC-specific regulatory module confers similarity to highly plastic NCSCs. We traced a cluster of super-connected SOXE-module genes back to the tissue-resident mammary stem and progenitor cells and found that, in contrast to the normal breast where it was associated with epithelial lineage differentiation, in TNBC this core was connected to Wnt signalling, neuroglial differentiation and embryo patterning genes. Critically, we found that expression of the SOXE-module amongst TNBCs was proportional to overall transcriptional similarity to Sox10+ migratory NCSCs from chick embryos, despite there being minimal direct overlap in member genes. We also identified SOXE-module hub genes as points of maximum network vulnerability and hence candidate therapeutic targets. In support of this approach, two of these (BBOX1 and BCL11A) have already been validated as such in TNBC [68–72].

To better understand the evolution of NCSC-like transcriptional reprogramming, we investigated potential links to the established drivers of TNBC development: genomic instability, large-scale CNAs, and defective DNA repair. We identified several processes that correlate significantly with the SOXE-module eigengene (DSB repair, APOBEC and TET1 activity, which are all demethylating); but most discernibly, the loss of lineage-specific methylation marks at CGI shores. Several mechanisms have been postulated to contribute to widespread methylome erosion in cancer, including DSB repair [58,59] and reduced availability of 5mC substrates through metabolic reprogramming. Accepting that there are probably multiple contributing factors in any individual tumour, our findings nevertheless suggest that NCSC-like reprogramming occurs concomitantly with epithelial de-programming in TNBC. The gene regulatory networks that operate in NCSCs are amongst the most evolutionarily conserved in vertebrates [25,74]. We postulate that when the broadly open chromatin landscape of the early embryo is simulated in epigenetically eroded tumours, dominant fate specifiers like SOX10 may recreate their ancestral regulatory circuits by default.

In summary, our data indicate that the extent of promoter methylation loss in SOX10+ breast tumours correlates with their transcriptomic similarity to NCSCs, the earliest developmental cell state programmed by SOX10 activity and one synonymous with migration, multipotency and phenotypic plasticity. We propose that during TNBC development, progressive erosion of the epigenome drives de-differentiation while simultaneously making cells vulnerable to NCSC-like reprogramming. Broadly, these findings support preclinical data [19–23] on the potential for epigenetic modulators to combat phenotypic plasticity in TNBC.

METHODS
Human tissue samples (also see Table 2)
This study involved immuno-detection of SOX10 and other biomarkers in the following human tissue cohorts:

1. Reduction mammoplasty (RM) samples: obtained in collaboration with Dr William Cockburn (Wesley Hospital, Brisbane) and the Royal Brisbane and Women's Hospital (RBWH) Plastics Unit. Nineteen RM specimens were used for IHC and IF analysis, and two for methylation arrays. Age, parity and menopausal status of these patients were unknown. 30% of cases showed fibrocystic change and 10% presented with columnar cell lesions (histopathology review by SRL).
2. Clinically annotated, primary breast tumour samples:
a. A cross-sectional primary breast tumour cohort comprising samples from Australia (treated by the RBWH Breast Unit) and the UK (Nottingham University Hospital), from patients treated in the mid-1980s to mid-1990s. Tumour blocks were sampled as 0.6 mm cores in tissue microarrays (TMAs). For baseline characteristics see Supplementary Table 2.
b. Metaplastic carcinomas (Asia-Pacific Metaplastic Breast Cancer Consortium; whole sections).
3. Patient-matched primary TNBC and brain metastases (n = 19 pairs). Tumour blocks were sampled as 1.0 mm cores in TMAs.

Ethics approval
Human research ethics approval was obtained from the Royal Brisbane and Women's Hospital (2005000785), The University of Queensland (HREC/2005/022) and North West Greater Manchester Central Health (15/NW/0685). Written patient consent to use tissue for research purposes was obtained where required under the conditions of these approvals and all

Table 2. Biological resources.
Resource | Source, identifier and relevant citations | Related figure(s)
Tissue samples
Histologically normal breast FFPE whole sections | The Brisbane breast bank | 1a–e
Fresh RM surgical samples | The Brisbane breast bank [48,76] | 1f, Supp-1b, Supp-6a
Australian BC series, FFPE TMA sections & clinical data | Pathology Qld & The Brisbane breast bank [48,89] | 2e, f, 3d–e, Supp-2h–k
UK breast cancer series, FFPE TMA sections & clinical data | Nottingham Breast Cancer Research Centre [90,91] | 2e, f, Supp-2h–k
Metaplastic tumour series, FFPE sections & clinical data | Asia-Pacific MBC consortium [51,92] | 2g
Patient-matched primary TNBCs and brain metastases | Pathology Qld & The Brisbane breast bank [48,89] | 2h
Cancer cell lines
293T | ATCC CRL-3216™ | Supp-1a, Supp-2g
MDA-MB-435S | ATCC HTB-129™ | Supp-1a, Supp-2e, Supp-2g
HCC38 | ATCC CRL-2314™ | Supp-2e
HCC1569 | ATCC CRL-2330™ | Supp-2e, Supp-2g
Primary melanoma cells (D41, D05) | Dr. Chris Schmidt, QIMR Berghofer | Supp-2e
TaqMan gene expression assays
SOX10 | ThermoFisher, Hs00366918_m1 | Supp-2e
RPL13A | ThermoFisher, Hs03043885_g1 | Supp-2e
shRNA sequences
SOX10_1 | Sigma-Aldrich TRCN0000018984 | Supp-1a, Supp-2g
SOX10_2 | Sigma-Aldrich TRCN0000018987 | Supp-1a, Supp-2g
SOX10_3 | Sigma-Aldrich TRCN0000018988 | Supp-1a, Supp-2g
Non-targeted negative control (NTNC) | Sigma-Aldrich SHC002 | Supp-1a, Supp-2g
Supp, supplementary.

Table 3. Software, code, and published datasets.
Resource | Source, identifier and relevant citations | Related figure(s) | Related table(s)
Software packages and code
ChAMP | https://bioconductor.org/packages/release/bioc/html/ChAMP.html | 5d–f | –
Clustergrammer | https://maayanlab.cloud/clustergrammer/ | 3b | Supp-10
Community detection algorithms | Refs. [85,86] | Supp-4a | –
Epifactors database | https://epifactors.autosome.ru | Supp-5e | –
FACSDiva™ | BD Biosciences, licensed | 1f, Supp-6a | –
FCS Express (v7) | De Novo Software, licensed | 1f, Supp-6a | –
GSEAPreranked | https://genepattern.org | 3c, 4d, 5f, Supp-3 | 1, Supp-4, Supp-
Ingenuity Pathways Analysis (IPA) | Ingenuity, licensed | – | 1
MATLAB | Mathworks, licensed | Supp-4a | Supp-10
Princeton Generic GO term finder | https://go.princeton.edu | 5a | Supp-13, 14
Prism (v8.4.3) | GraphPad, licensed | Multiple | S2
R package, Cluster | https://cran.r-project.org/web/packages/cluster/index.html | 5f | –
R package, FlashClust | https://cran.r-project.org/web/packages/flashClust/index.html | 5f, g | Supp-14
R package, Limma | https://www.bioconductor.org/packages/release/bioc/html/limma.html | Supp-3 | Supp-3
R package, t-SNE | https://CRAN.R-project.org/package=Rtsne | 5d | –
R package, WGCNA | https://cran.r-project.org/web/packages/WGCNA/index.html [52,53] | Multiple | Multiple
REVIGO | http://revigo.irb.hr | Supp-3 | Supp-4
Singscore | https://www.bioconductor.org/packages/release/bioc/html/singscore.html | 4c | –
SPSS | IBM, licensed | – | Supp-2
Tableau desktop (2020.4) | Tableau, licensed | 4a | –
Published datasets
Cell line expression data | https://www.ebi.ac.uk/arrayexpress (E-TABM-157) | Supp-2e, f | –
Cell line expression, CNA and methylation datasets | https://www.ncbi.nlm.nih.gov/gds (GSE42944; GSE48216) | Supp-2e, f | –
Chicken embryo neural crest gene set | Ref. , Supplementary Table 1 | 4b–d | Supp-11
Gene ontology resource | http://geneontology.org | – | Supp-11
Genomic locations of solo-WCpGW sites | Ref. | Supp-5c | –
hMEC ChIP-seq data | www.epigenomes.ca; ref. | 1f | –
hMEC gene expression array data | Gene expression omnibus, https://www.ncbi.nlm.nih.gov/geo/ (GSE16997); and ref. (Tables S5–8) | 1e | –
Human reference genome NCBI build 37 (GRCh37/hg19) | UCSC Genome Browser https://genome.ucsc.edu | 2d, Supp-5a | –
ICGC gene expression data | Ref. , Supplementary Table 7 | – | Supp-8
ICGC HRDetect scores | Ref. , Supplementary Table 3b | 5c | –
ICGC mutational signatures (COSMIC, v2 SigProfiler) | Ref. , Supplementary Table 21B, S21E | 5c | –
Illumina Infinium Omni2.5 array data | https://www.ncbi.nlm.nih.gov/geo/ (GSE199579) | 1f, Supp-5b | –
METABRIC gene expression & clinical data | EGAD00010000210, EGAD00010000211, EGAS00000000083; EGA portal, via data access committee | 2a, 3f, g, Supp-3, Supp-4c, d | Supp-4, Supp-7
MetaCore | https://portal.genego.com | Supp-3 | Supp-4
SOXE-module network metrics | This paper | 4a, f, 5b, g | Supp-10
TCGA clinicopathologic annotation | Ref. | 2a–d, 3a | –
TCGA gene copy-number data | Gistic2.Level_4; TCGA Data Analysis Center Firehose https://gdac.broadinstitute.org | 2b, 5a, b, Supp-5a | –
TCGA gene-level methylation data | Preprocess/meth.by_min_expr_corr; TCGA Data Analysis Center Firehose https://gdac.broadinstitute.org | 2b, c | –
TCGA Illumina HiSeq RNASeq-v2 RSEM level-3 normalised datasets | illuminahiseq_rnaseqv2-RSEM_genes_normalized (MD5); TCGA Data Analysis Center Firehose https://gdac.broadinstitute.org | 2a, c | Supp-4
TCGA Illumina HiSeq RNASeq-v2 RSEM level-3 raw counts | TCGA Data Analysis Center Firehose https://gdac.broadinstitute.org | 3a, S3 | Supp-3, 5, 6, 9, 10, 12, 13
TCGA probe-level methylation data | Humanmethylation_450; TCGA Data Analysis Center Firehose https://gdac.broadinstitute.org | 5d–f, Supp-5b–d | –
Triple-negative breast cancer subtypes (Burstein et al) | Ref. , Supplementary Table 19 | 3f, Supp-2b | –
Tumour purity for TCGA cases | Supp data-1 (CPE metric) & infinium metric, refs. [81,95] | Multiple | –
WGCNA ME dataset, ICGC cases | This paper | Multiple | Supp-8
WGCNA ME dataset, METABRIC cases | This paper | Multiple | Supp-7
WGCNA ME dataset, TCGA normal cases | This paper | Multiple | Supp-12
WGCNA ME dataset, TCGA tumour cases | This paper | Multiple | Supp-6
WGCNA mod membership dataset (TCGA cohort) | This paper | Multiple | Supp-5
Supp, supplementary.
samples were de-identified in the analytical database. This study complies Fluorescence data acquisition, gate placement and sorting were performed with the World Medical Association Declaration of Helsinki. on a BD FACS Aria II instrument with FACSDiva software (v6.1.3; QIMR Berghofer). Sorted cells were collected on ice before being pelleted (80×g, 2 min) and snap-frozen at −70 °C. Immunohistochemistry (IHC) Formalin-fixed, paraffin-embedded (FFPE) tissue samples or TMAs were Methylation array profiling and ChIP-seq meta-analysis sectioned, deparaffinised, subjected to antigen retrieval and chromogeni- cally stained as described in ref. and detailed in Supplementary Table 1. DNA was extracted from FACS-sorted hMEC samples using the QIAGEN Slides were scanned using the Aperio ScanScope T2 digital scanning AllPrep DNA/RNA mini kit, with bisulphite conversion using the EZ DNA system at 40x magnification. TMA images were segmented using Spectrum methylation Kit (Zymo Research) following the manufacturer’s protocol software (Aperio), and high-resolution images of individual cores were with modification for Illumina methylation arrays. Bisulphite-converted extracted and scored by two experienced observers in a blinded fashion DNA was amplified and hybridised to Infinium methylationEPIC 850k (hidden metadata tags corresponding to TMA position were used to link beadchips (Illumina) according to the manufacturer’s protocol. Arrays were clinical and sample data). Digital image files were scored according to the scanned on an iScan, and data were processed using GenomeStudio criteria set out in the legends to Figs. 2e and S2h. (Illumina) with BMIQ array normalisation to derive average methylation beta-values. Histone modification ChIP-seq data were obtained from Pellacani et al. . 
Immunofluorescence (IF) Bigwig format files were retrieved from www.epigenomes.ca, and the FFPE RM tissue sections (Table 2) were sectioned, deparaffinised, subjected 76 mean signal/bin was plotted across the region chr22:38365030-38396083 to antigen retrieval and stained as described in ref. (Supplementary for each histone mark in each cell type. Table 1). Briefly, primary antibodies diluted in tris-buffered saline (TBS) were incubated on tissue sections for 1 h at room temperature, washed in TBS then incubated with secondary antibodies for 30 min in the dark. To Analysis of SOX10 expression in cell lines minimise tissue autofluorescence, slides were stained with SUDAN Black MDA-MB-435, HCC1569 and HCC38 cells were from the American Type Cell for 20 min in the dark (Sigma #S-2380), then washed (0.1% TBS-Tween Culture Collection (ATCC; (Table 2); authenticated in our laboratory and (30 min), TBS (10 min). Slides were mounted using Vectashield (Vecta Labs) cultured according to ATCC recommendations . D41 and D05 melanoma with DAPI (Sigma-Aldrich), cover-slipped, sealed and imaged on a Carl cells were selected from the primary melanoma cell line bank of Dr Chris Zeiss MicroImaging system using Axio Vision LE version 4.8.2 (PerkinElmer). Schmidt and Prof Nick Hayward (QIMR Berghofer) based on having high and low baseline SOX10 expression, respectively . Cells were routinely Fresh reduction mammoplasty (RM) tissue processing and cultured at 37 °C in a humidified atmosphere with 5% CO and routinely fluorescence-activated cell sorting (FACS) screened for mycoplasma. RNA and protein were extracted from cells in the exponential phase of growth using standard Trizol and RIPA buffer RM samples were processed, and single-cell suspensions were prepared as 48,76 methods . SOX10 mRNA was quantified relative to RPL13A as previously previously described (Table 2 and refs. ). Briefly, tissue was cut into described (ref. and Table 2). 
For Western analysis (MDA-MB-435, small pieces (~5 mm ) and digested overnight with agitation at 37 °C in HCC1569, HCC38 cells), protein lysates (30 μg) were resolved by SDS-PAGE DMEM-F12 (Gibco), foetal bovine serum ((FBS), 5%, Gibco), antibiotic/ then SOX10 and β-actin were detected using standard chemiluminescence antimycotic (Gibco), Amphotericin B (2.5 μg/mL, Gibco), collagenase type I-A (200 U/mL, Sigma-Aldrich) and Hyaluronidase I-S (100 U/mL, Sigma- (Supplementary Table 1). Aldrich). Epithelial organoids were obtained by centrifugation (80 × g, 1 min), then dissociated to single-cell suspensions for 5–10 min in TrypLE Stable-shRNA knockdown of SOX10 in breast cancer cell lines (Gibco), followed by Dispase (5 mg/mL, Gibco) and DNAse-I (100 ug/mL, Three pre-validated SOX10-targeted shRNA constructs, and a non-targeting Invitrogen). Enzymatic activity was quenched in ice-cold Hank’s Balanced negative control (NTNC) construct (pLKO.1), were purchased from Sigma- Salt Solution ((HBSS), Gibco) with 2% FBS and cells were filtered through a Aldrich (Table 2). Plasmid DNA was isolated from overnight bacterial 40-μm cell strainer (BD Falcon). cultures, then lentiviral particles were produced by triple transient Cell concentration and viability were determined using a Countess transfection of HEK-293T (human embryonic kidney) packaging cells with automated counter (Invitrogen) with trypan blue and adjusted to 2.0E /mL. one of the four transfer plasmids (pLKO.1-puro; 2 μg), together with Single-cell suspensions (typically 30–60 mL) were labelled for 10 min on ice TM companion plasmids encoding lentiviral packaging and replication with Sytox green (Invitrogen) plus a cocktail of fluorescent antibody elements (2 μg pHR’8.2ΔR + 0.25 μg pCMV-VSV-G; donated by Dr Wei Shi, conjugates to discriminate hMEC subsets (negatively gated, non-epithelial QIMR Berghofer). 
Virus-containing supernatants (in target cell media) were ‘lineage’ markers: CD31, CD45, CD140b; positively gated hMEC markers: then collected over the following two days and filtered (0.45 μm). MDA- CD49f, EpCAM—see Supplementary Table 1 and Supplementary Fig 6a). 4 2 MB-435 target cells were seeded at 3.1 × 10 /cm in six-well plates, then Samples were washed (80×g, 2 min) and then resuspended in cold HBSS+ after 24–48 h (at ~50% confluence), cells were infected with filtered viral 2% FBS. For robust fluorescence compensation and gating of specific hMEC populations, we also tested in parallel small samples stained with supernatants, supplemented with 1 mg/mL polybrene (Sigma-Aldrich) for isotype control antibodies, and ‘fluorescence minus one’ negative controls 24 h. Stably transduced cells were then selected with 1 μg/mL puromycin (samples from which one of the main conjugates was omitted). (Sigma-Aldrich) for 2 weeks to eliminate uninfected cells. npj Breast Cancer (2022) 57 Published in partnership with the Breast Cancer Research Foundation J.M. Saunus et al. Datasets and processing Modules were identified using the TCGA RNAseq (n = 919 samples after quality filtering) and validated using METABRIC (n = 1278; TCGA level-3 normalised RNAseq data ('rnaseqv2 illuminahiseq rnaseqv2 unc expression array; Supplementary Fig 6b–d). A consensus set of eight edu Level 3 RSEM genes normalised data.data.txt') from the Data Analysis modules was determined according to satisfactory concordance Center Firehose (http://firebrowse.org/) were used for all single-gene between these two orthogonal networks and a third was generated analyses (Supplementary Figs. 2a, 5e; test group stratification for Supplementary Fig. 3; SOX10 heatstrips in Fig. 3a and Supplementary from the ICGC dataset (n = 342; RNAseq). We further validated the Fig. 6a, c). 
Scaled estimate columns of the 'rnaseqv2 illuminahiseq rnaseqv2 eight consensus modules using preservation analysis on a third breast unc edu Level 3 RSEM genes data.data.txt' were used for all other cancer expression dataset. For normal breast samples, WGCNA was algorithmic analyses. performed independently on TCGA normal breast samples (n = 97 after For methylation datasets, TCGA level-3 Illumina HM450k data were quality filtering). downloaded from the National Cancer Institute Genomics Data Commons Standard WGCNA outputs include the following (raw data in Supple- (GDC) data portal (https://portal.gdc.cancer.gov/) and processed using the mentary Tables 5–11): ChAMP package . We applied the champ.filter function to remove Module eigengene (ME): a theoretical gene that is the most strongly problematic probes (those mapping to X/Y chromosomes, mapping to connected to all other genes in the module and hence represents net multiple locations, located near an SNP and non-CG probes). Filtered data module expression and connectivity. Mathematically, the first principal were normalised using the champ.norm function, according to the Beta- component of each module’s adjacency matrix. Mixture Quantile (BMIQ) algorithm; is an intra-sample normalisation Module membership and connectivity: Each gene is ascribed k values procedure that corrects the bias of type-2 probe values. describing modular and network connectivity (kTotal, kWithin and Level-4 GISTIC-2 copy-number data for TCGA cases were downloaded kOut). These continuous variables are amenable to integrated analysis from the Data Analysis Center Firehose (http://firebrowse.org/) and used of overlapping transcriptional programmes, utilising the granularity in for correlative analyses with no further processing. To apply tumour purity expression datasets rather than levelling it as is done when assigning cutoffs (TCGA cases), we used a consensus measurement of four different 81 fixed phenotypes or categories. 
kME correlation and kME p values purity estimation methods . describe how tightly individual genes are linked to all other genes With permission from the METABRIC data access committee, normalised within each module. Illumina HT 12 expression array data were downloaded from the European To identify hub genes (Supplementary Fig. 6e), additional network Genome-phenome Archive (EGAD00010000210-211). For the ICGC RNAseq 45 connectivity and influence measures were calculated for each node in dataset, normalised data were downloaded as supplementary data and the SOXE-module topological overlap matrix using igraph toolkit used with no further processing. Mutational signature data (COSMIC, v2 45 functions in R: SigProfiler) were downloaded as raw event counts from ref. and 57 betweenness centrality: betweenness(graph, v = V(graph), directed= HRDetect probability scores for these cases from ref. . FALSE, weights= NULL, nobigint = TRUE, normalised= FALSE). eigencentrality: eigencentrality(graph, directed= FALSE, scale= TRUE, Differential expression analysis of SOX10-high and -low TNBCs weights= NULL, options = arpack defaults). (Supplementary Fig. 3) 85,86 Finally, we used community detection algorithms to examine the To characterise the transcriptomic phenotype associated with SOX10 substructure of the SOXE-module (MATLAB 2020a), using the adjacency expression in TNBC, we performed differential expression analysis of matrix as input. This revealed a hierarchical, sub-modular organisation, and SOX10-high versus SOX10-low (median split) TCGA and METABRIC datasets consistently discriminated two partitions (59 and 41% of nodes each). To using limma (differential expression was defined by a corrected p value identify the module ‘control centre’ and hub genes as points of structural cutoff of 0.01). vulnerability, submodule assignment was cross-referenced against clus- tered Cosine similarity data (Fig. 3b, Clustergrammer ) with the same input (Supplementary Fig. 4). 
Ontology enrichment analyses
GO term enrichment analysis was performed using the Generic GO term finder hosted by Princeton University (Lewis-Sigler Institute for Integrative Genomics; https://go.princeton.edu). Gene set enrichment analysis (GSEA) was performed using the Prerank function of GenePattern using 1000 permutations. For Supplementary Fig. 3, GSEA inputs comprised differentially expressed genes (q ≤ 0.01) ranked by fold-change in each dataset. The input for all other GSEA experiments was whole transcriptome gene lists ranked by a Spearman correlation coefficient. Biological process genesets (Gene Ontology v7.2; gene set size 15–500) were mined for unsupervised analyses and neural crest genesets for supervised analyses (Supplementary Table 11). Datasets and ranking metrics are indicated in the respective Figure legends. Normalised enrichment scores (NES) and corrected p values are reported. GeneGo (Metacore® Clarivate Analytics) and Ingenuity Pathway Analysis (Ingenuity) were also used to analyse pre-ranked gene lists. REVIGO was used to resolve semantic redundancy and identify major themes amongst the enriched terms.

Weighted gene co-expression network analysis (WGCNA): module identification and validation
WGCNA is a powerful network analysis tool that identifies groups of transcripts (modules) that fluctuate in a highly coordinated fashion, implying co-functionality [52,53]. First, it iteratively correlates the expression of every pair of transcripts in a test dataset, producing an adjacency matrix. It then converts this to a topological overlap matrix that reflects net connection weight, accounting for both direct connections and the impacts of shared neighbours. In this study, we created ‘signed’ networks, which reflect the overall topological overlap considering both positive and negative correlations. Dynamic module identification and characterisation (derivation of network metrics, sample eigengene values and module preservation in orthogonal datasets, see below) were performed in the R coding environment, and publication-quality figures were prepared from raw datasets using GraphPad Prism or Clustergrammer (Table 2).

Neural crest genesets
Geneset-1 (NC terms) comprises 308 genes represented in at least two of the 78 terms matching ‘neural crest’ and ‘human’ in the gene ontology database (http://geneontology.org). Geneset-2 (ch.NCSC) comprises the top 200 transcripts statistically over-represented in Sox10+ chick neural crest cells compared to all other embryo cells (fold-change 3.9–23.3; false discovery rate 9.3E−03–1.0E−15) [55] (Supplementary Table 11). The ch.NCSC gene set represents genes coordinately expressed with Sox10 in a stem cell state and hence was also suitable for network analyses (see below). We used the singscore algorithm to score RNAseq datasets against the neural crest genesets at the individual sample level.

Breast cancer methylation data analyses
Methylation beta-values were derived from TCGA level-3 Illumina HM450k data as outlined above. Beta-values for all probes corresponding to TSS1500, TSS200 and 5′UTR regions in each sample were first normalised to correct for their bimodal distribution (median absolute deviation (MAD): P_β − median(P_β − median(R_β)), where P_β = probe in the promoter region and R_β = all probes in the promoter region). After filtering out genes with >2 missing probes and those for which >2% of samples were missing data, the final dataset included average MAD-normalised promoter methylation beta-values for 4482 genes (determined from a total of 518 samples with complete clinical annotation). Pairwise Spearman correlations were then calculated between each promoter region and each module eigengene across the sample cohort. Unsupervised hierarchical clustering of correlation values was performed in R using the Flashclust package based on the Euclidean distance method. Clusters were visualised and validated with the cluster package, using the Silhouette coefficient to confirm distinct clusters. To generate t-distributed stochastic neighbour embedding (t-SNE) plots, we used the Rtsne package (https://cran.r-project.org/web/packages/Rtsne/) on normalised beta methylation values, with 5000 iterations and a perplexity parameter of 40.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
Published datasets used in this paper are outlined in Table 3. Network data generated by the study are also outlined in Table 3, and available as supplementary data. Raw DNA methylation array data for FACS-sorted normal breast epithelial cell subsets are available from the Gene Expression Omnibus (GSE199579; Table 3).

CODE AVAILABILITY
This study used published code and/or publicly available tools (see Table 3).

Received: 9 October 2021; Accepted: 5 April 2022

REFERENCES
1. Fulford, L. G. et al. Basal-like grade III invasive ductal carcinoma of the breast: patterns of metastasis and long-term survival. Breast Cancer Res. 9, R4 (2007).
2. Prat, A. et al. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 12, R68 (2010).
3. Symmans, W. F. et al. Long-term prognostic risk after neoadjuvant chemotherapy associated with residual cancer burden and breast cancer subtype. J. Clin. Oncol. 35, 1049–1060 (2017).
4. Gao, R. et al. Punctuated copy number evolution and clonal stasis in triple-negative breast cancer. Nat. Genet. 48, 1119–1130 (2016).
5. Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160 (2014).
6. Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).
7. Yang, F. et al. Intratumor heterogeneity predicts metastasis of triple-negative breast cancer. Carcinogenesis 38, 900–909 (2017).
8. Lin, B. et al. Modulating cell fate as a therapeutic strategy. Cell Stem Cell 23, 329–341 (2018).
9. Nguyen, D. X., Bos, P. D. & Massagué, J. Metastasis: from dissemination to organ-specific colonization. Nat. Rev. Cancer 9, 274–284 (2009).
10. Gupta, P. B., Pastushenko, I., Skibinski, A., Blanpain, C. & Kuperwasser, C. Phenotypic plasticity: driver of cancer initiation, progression, and therapy resistance. Cell Stem Cell 24, 65–78 (2019).
11. Hinohara, K. & Polyak, K. Intratumoral heterogeneity: more than just mutations. Trends Cell Biol. 29, 569–579 (2019).
12. Bell, C. C. & Gilan, O. Principles and mechanisms of non-genetic resistance in cancer. Br. J. Cancer 122, 465–472 (2020).
13. Granit, R. Z. et al. Regulation of cellular heterogeneity and rates of symmetric and asymmetric divisions in triple-negative breast cancer. Cell Rep. 24, 3237–3250 (2018).
14. Keller, P. J. et al. Defining the cellular precursors to human breast cancer. Proc. Natl Acad. Sci. USA 109, 2772–2777 (2012).
15. Lim, E. et al. Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Nat. Med. 15, 907–913 (2009).
16. Molyneux, G. et al. BRCA1 basal-like breast cancers originate from luminal epithelial progenitors and not from basal stem cells. Cell Stem Cell 7, 403–417 (2010).
17. Proia, T. A. et al. Genetic predisposition directs breast cancer phenotype by dictating progenitor cell fate. Cell Stem Cell 8, 149–163 (2011).
18. Chaffer, C. L. et al. Normal and neoplastic nonstem cells can spontaneously convert to a stem-like state. Proc. Natl Acad. Sci. USA 108, 7950–7955 (2011).
19. Hinohara, K. et al. KDM5 histone demethylase activity links cellular transcriptomic heterogeneity to therapeutic resistance. Cancer Cell 34, 939–953 e9 (2018).
20. Risom, T. et al. Differentiation-state plasticity is a targetable resistance mechanism in basal-like breast cancer. Nat. Commun. 9, 3815 (2018).
21. Flavahan, W. A., Gaskell, E. & Bernstein, B. E. Epigenetic plasticity and the hallmarks of cancer. Science 357, eaal2380 (2017).
22. Stirzaker, C. et al. Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value. Nat. Commun. 6, 5899 (2015).
23. Deblois, G. et al. Epigenetic switch-induced viral mimicry evasion in chemotherapy resistant breast cancer. Cancer Discov. 10, 1312–1329 (2020).
24. Dravis, C. et al. Epigenetic and transcriptomic profiling of mammary gland development and tumor models disclose regulators of cell state plasticity. Cancer Cell 34, 466–482 e6 (2018).
25. Hu, N., Strobl-Mazzulla, P. H. & Bronner, M. E. Epigenetic regulation in neural crest development. Dev. Biol. 396, 159–168 (2014).
26. Southard-Smith, E. M., Kos, L. & Pavan, W. J. Sox10 mutation disrupts neural crest development in Dom Hirschsprung mouse model. Nat. Genet. 18, 60–64 (1998).
27. Kim, J., Lo, L., Dormand, E. & Anderson, D. J. SOX10 maintains multipotency and inhibits neuronal differentiation of neural crest stem cells. Neuron 38, 17–31 (2003).
28. McKeown, S. J., Lee, V. M., Bronner-Fraser, M., Newgreen, D. F. & Farlie, P. G. Sox10 overexpression induces neural crest-like cells from all dorsoventral levels of the neural tube but inhibits differentiation. Dev. Dyn. 233, 430–444 (2005).
29. Dravis, C. et al. Sox10 regulates stem/progenitor and mesenchymal cell states in mammary epithelial cells. Cell Rep. 12, 2035–2048 (2015).
30. Chen, Z. et al. FGF signaling activates a Sox9-Sox10 pathway for the formation and branching morphogenesis of mouse ocular glands. Development 141, 2691–2701 (2014).
31. Athwal, H. K. et al. Sox10 regulates plasticity of epithelial progenitors toward secretory units of exocrine glands. Stem Cell Rep. 12, 366–380 (2019).
32. Guo, W. et al. Slug and Sox9 cooperatively determine the mammary stem cell state. Cell 148, 1015–1028 (2012).
33. Mertelmeyer, S. et al. The transcription factor Sox10 is an essential determinant of branching morphogenesis and involution in the mouse mammary gland. Sci. Rep. 10, 17807 (2020).
34. Kim, Y. J. et al. Generation of multipotent induced neural crest by direct reprogramming of human postnatal fibroblasts with a single transcription factor. Cell Stem Cell 15, 497–506 (2014).
35. Ivanov, S. V. et al. Diagnostic SOX10 gene signatures in salivary adenoid cystic and breast basal-like carcinomas. Br. J. Cancer 109, 444–451 (2013).
36. Panaccione, A., Guo, Y., Yarbrough, W. G. & Ivanov, S. V. Expression profiling of clinical specimens supports the existence of neural progenitor-like stem cells in basal breast cancers. Clin. Breast Cancer 17, 298–306 e7 (2017).
37. Cimino-Mathews, A. et al. Neural crest transcription factor Sox10 is preferentially expressed in triple-negative and metaplastic breast carcinomas. Hum. Pathol. 44, 959–965 (2013).
38. Jamidi, S. K. et al. SOX10 as a sensitive marker for triple negative breast cancer. Histopathology 77, 936–948 (2020).
39. Burstein, M. D. et al. Comprehensive genomic analysis identifies novel subtypes and targets of triple-negative breast cancer. Clin. Cancer Res. 21, 1688–1698 (2015).
40. Hu, N., Strobl-Mazzulla, P. H., Simoes-Costa, M., Sanchez-Vasquez, E. & Bronner, M. E. DNA methyltransferase 3B regulates duration of neural crest production via repression of Sox10. Proc. Natl Acad. Sci. USA 111, 17911–17916 (2014).
41. Strobl-Mazzulla, P. H. & Bronner, M. E. A PHD12-Snail2 repressive complex epigenetically mediates neural crest epithelial-to-mesenchymal transition. J. Cell Biol. 198, 999–1010 (2012).
42. Pellacani, D. et al. Analysis of normal human mammary epigenomes reveals cell-specific active enhancer states and associated transcription factor networks. Cell Rep. 17, 2060–2074 (2016).
43. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
44. TCGA. Cancer Genome Atlas Network: Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
45. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
46. Daemen, A. et al. Modeling precision treatment of breast cancer. Genome Biol. 14, R110 (2013).
47. Neve, R. M. et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515–527 (2006).
48. McCart Reed, A. E. et al. The Brisbane breast bank. Open J. Bioresour. 5, 5 (2018).
49. Saunus, J. M. et al. Multidimensional phenotyping of breast cancer cell lines to guide preclinical research. Breast Cancer Res. Treat. 167, 289–301 (2018).
50. Qi, J. et al. SOX10 - A novel marker for the differential diagnosis of breast metaplastic squamous cell carcinoma. Cancer Manag. Res. 12, 4039–4044 (2020).
51. McCart Reed, A. E. et al. Phenotypic and molecular dissection of metaplastic breast cancer and the prognostic implications. J. Pathol. 247, 214–227 (2019).
52. Zhang, B. & Horvath, S. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, Article17 (2005).
53. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 559 (2008).
54. Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).
55. Simoes-Costa, M., Tan-Cabugao, J., Antoshechkin, I., Sauka-Spengler, T. & Bronner, M. E. Transcriptome analysis reveals novel players in the cranial neural crest gene regulatory network. Genome Res. 24, 281–290 (2014).
56. Pellacani, D., Tan, S., Lefort, S. & Eaves, C. J. Transcriptional regulation of normal human mammary cell heterogeneity and its perturbation in breast cancer. EMBO J. 38, e100330 (2019).
57. Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).
58. Hayano, M. et al. DNA break-induced epigenetic drift as a cause of mammalian aging. Preprint at bioRxiv https://doi.org/10.1101/808659 (2019).
59. Yang, J.-H. et al. Erosion of the Epigenetic Landscape and Loss of Cellular Identity as a Cause of Aging in Mammals. Preprint at bioRxiv https://doi.org/10.1101/808642 (2019).
60. Zhou, W. et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat. Genet. 50, 591–602 (2018).
61. Medvedeva, Y. A. et al. EpiFactors: a comprehensive database of human epigenetic factors and complexes. Database 2015, bav067 (2015).
62. Jin, C. et al. TET1 is a maintenance DNA demethylase that prevents methylation spreading in differentiated cells. Nucleic Acids Res. 42, 6956–6971 (2014).
63. Putiri, E. L. et al. Distinct and overlapping control of 5-methylcytosine and 5-hydroxymethylcytosine by the TET proteins in human cancer cells. Genome Biol. 15, R81 (2014).
64. Good, C. R. et al. TET1-mediated hypomethylation activates oncogenic signaling in triple-negative breast cancer. Cancer Res. 78, 4126–4137 (2018).
65. Feinberg, A. P., Ohlsson, R. & Henikoff, S. The epigenetic progenitor origin of human cancer. Nat. Rev. Genet. 7, 21–33 (2006).
66. Wahl, G. M. & Spike, B. T. Cell state plasticity, stem cells, EMT, and the generation of intra-tumoral heterogeneity. NPJ Breast Cancer 3, 14 (2017).
67. Visvader, J. E. & Stingl, J. Mammary stem cells and the differentiation hierarchy: current status and perspectives. Genes Dev. 28, 1143–1158 (2014).
68. Liao, C. & Zhang, Q. BBOX1 promotes triple-negative breast cancer progression by controlling IP3R3 stability. Mol. Cell Oncol. 7, 1813526 (2020).
69. Liao, C. et al. Identification of BBOX1 as a therapeutic target in triple-negative breast cancer. Cancer Discov. 10, 1706–1721 (2020).
70. Zhu, L., Pan, R., Zhou, D., Ye, G. & Tan, W. BCL11A enhances stemness and promotes progression by activating Wnt/beta-catenin signaling in breast cancer. Cancer Manag. Res. 11, 2997–3007 (2019).
71. Errico, A. Genetics: BCL11A-targeting triple-negative breast cancer? Nat. Rev. Clin. Oncol. 12, 127 (2015).
72. Khaled, W. T. et al. BCL11A is a triple-negative breast cancer gene with critical functions in stem and progenitor cells. Nat. Commun. 6, 5987 (2015).
73. Saggese, P. et al. Metabolic regulation of epigenetic modifications and cell differentiation in cancer. Cancers 12, 3788 (2020).
74. Simoes-Costa, M. & Bronner, M. E. Establishing neural crest identity: a gene regulatory recipe. Development 142, 242–257 (2015).
75. Saunus, J. M. et al. Integrated genomic and transcriptomic analysis of human brain metastases identifies alterations of potential clinical significance. J. Pathol. 237, 363–378 (2015).
76. Johnston, R. L. et al. High content screening application for cell-type specific behaviour in heterogeneous primary breast epithelial subpopulations. Breast Cancer Res. 18, 18 (2016).
77. Pavey, S. et al. Microarray expression profiling in melanoma reveals a BRAF mutation signature. Oncogene 23, 4060–4067 (2004).
78. Momeny, M. et al. Heregulin-HER3-HER2 signaling promotes matrix metalloproteinase-dependent blood-brain-barrier transendothelial migration of human breast cancer cell lines. Oncotarget 6, 3932–3946 (2015).
79. Vargas, A. C. et al. Gene expression profiling of tumour epithelial and stromal compartments during breast cancer progression. Breast Cancer Res. Treat. 135, 153–165 (2012).
80. Tian, Y. et al. ChAMP: updated methylation analysis pipeline for Illumina BeadChips. Bioinformatics 33, 3982–3984 (2017).
81. Aran, D., Sirota, M. & Butte, A. J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015).
82. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
83. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
84. Supek, F., Bosnjak, M., Skunca, N. & Smuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE 6, e21800 (2011).
85. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
86. Lambiotte, R., Delvenne, J. C. & Barahona, M. IEEE Trans. Netw. Sci. Eng. 1, 76–90 https://doi.org/10.1109/TNSE.2015.2391998 (2014).
87. Fernandez, N. F. et al. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci. Data 4, 170151 (2017).
88. Foroutan, M. et al. Single sample scoring of molecular phenotypes. BMC Bioinforma. 19, 404 (2018).
89. Kalita-de Croft, P. et al. Clinicopathologic significance of nuclear HER4 and phospho-YAP(S(127)) in human breast cancers and matching brain metastases. Ther. Adv. Med. Oncol. 12, 1758835920946259 (2020).
90. Tarek, M. A. et al. SPAG5 as a prognostic biomarker and chemotherapy sensitivity predictor in breast cancer: a retrospective integrated genomic transcriptomic and protein analysis. Lancet Oncol. 17, 1004–1018 (2016).
91. Tarek, M. A. et al. Association of Sperm-Associated Antigen 5 and Treatment Response in Patients With Estrogen Receptor–Positive Breast Cancer. JAMA Network Open 3, e209486 (2020).
92. Kalaw, E. et al. Metaplastic breast cancers frequently express immune checkpoint markers FOXP3 and PD-L1. Br. J. Cancer 123, 1665–1672 (2020).
93. Boyle, E. I. et al. GO::TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20, 3710–3715 (2004).
94. Liu, J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 173, 400–416.e11 (2018).
95. Zheng, X., Zhang, N., Wu, H. J. & Wu, H. Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies. Genome Biol. 18, https://doi.org/10.1186/s13059-016-1143-5 (2017).

ACKNOWLEDGEMENTS
We thank the many thousands of patients who have donated tissue for cancer research, and clinical staff who facilitate biobanking, particularly the Brisbane Breast Bank and Pathology Queensland. We acknowledge the support of Metro North Hospital and Health Services for the collection of the clinical subject data and clinical subject materials. We are grateful to Dr Lynne Reid and Clay Winterford for valuable contributions; Dr Katia Nones (QIMR Berghofer) who supervised X.M.D.L.; Dr Chris Schmidt (QIMR Berghofer) and Prof. Alex Swarbrick (Garvan Institute) for donating cell lines; Dr William Cockburn and clinical staff (Wesley Hospital) for normal breast tissue collections; Drs. Nic Waddell and Olga Kondrashova (QIMR Berghofer) for supportive data analyses; and Drs. Juliet French (QIMR Berghofer) and Delphine Merino (Olivia Newton-John Cancer Research Institute) for critical feedback. This study makes use of data generated by the Molecular Taxonomy of Breast Cancer International Consortium, funded by Cancer Research UK, and the British Columbia Cancer Agency Branch. It was funded by NHMRC programme awards to S.R.L., G.C.-T. and K.K.K. (APP1017028 and APP1113867), NHMRC project grants to P.T.S. (APP1080985 and APP1164770) and an Australian Leadership Award to A.R.

AUTHOR CONTRIBUTIONS
Conception and design: J.M.S., X.M.D.L., K.N., A.R., D.V.N., P.T.S. and S.R.L. Data collection/contribution: J.M.S., X.M.D.L., K.N., A.R., A.H., A.E.M.R., M.L., A.C.V., J.R.K., A.J.D., M.M., E.K., P.K.-d.C., I.G., F.A.-E., J.M.W.G., C.O., K.K.K., J.B., G.C.-T., A.R.G., E.A.R., I.O.E., D.V.N. and P.T.S. Data analysis: J.M.S., X.M.D.L., K.N., A.R., A.H., S.L., D.V.N. Manuscript drafting: J.M.S., A.E.M.R., D.V.N., P.T.S. and S.R.L. All authors read and approved the final manuscript.

COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41523-022-00425-x.

Correspondence and requests for materials should be addressed to Jodi M. Saunus or Sunil R. Lakhani.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2022

Publisher: Springer Journals
Copyright: © The Author(s) 2022
eISSN: 2374-4677
DOI: 10.1038/s41523-022-00425-x

Abstract

www.nature.com/npjbcancer
ARTICLE OPEN

Epigenome erosion and SOX10 drive neural crest phenotypic mimicry in triple-negative breast cancer

Jodi M. Saunus¹,²✉, Xavier M. De Luca¹, Korinne Northwood¹, Ashwini Raghavendra¹, Alexander Hasson³, Amy E. McCart Reed¹, Malcolm Lim¹, Samir Lal¹, A. Cristina Vargas¹, Jamie R. Kutasovic¹, Andrew J. Dalley¹, Mariska Miranda⁴, Emarene Kalaw¹, Priyakshi Kalita-de Croft¹, Irma Gresshoff¹, Fares Al-Ejeh⁴, Julia M. W. Gee⁵, Chris Ormandy⁶, Kum Kum Khanna⁴, Jonathan Beesley⁴, Georgia Chenevix-Trench⁴, Andrew R. Green⁷, Emad A. Rakha⁷, Ian O. Ellis⁷, Dan V. Nicolau Jr³,⁸, Peter T. Simpson¹ and Sunil R. Lakhani¹,⁹

¹The University of Queensland Faculty of Medicine, UQ Centre for Clinical Research, Herston, QLD, Australia. ²Mater Research Institute-The University of Queensland, Translational Research Institute, Woolloongabba, QLD, Australia. ³School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD, Australia. ⁴QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia. ⁵Breast Cancer Molecular Pharmacology Unit, School of Pharmacy and Pharmaceutical Sciences, Cardiff University, Cardiff, UK. ⁶The Kinghorn Cancer Centre, Garvan Institute of Medical Research and St. Vincent’s Hospital Clinical School, UNSW Sydney, Darlinghurst, NSW, Australia. ⁷Nottingham Breast Cancer Research Centre, Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham Biodiscovery Institute, University Park, Nottingham, UK. ⁸Mathematical Institute, University of Oxford, and Molecular Sense Ltd, Oxford, UK. ⁹Pathology Queensland, Royal Brisbane Women’s Hospital, Herston, QLD, Australia. ✉email: j.saunus@uq.edu.au; s.lakhani@uq.edu.au

Intratumoral heterogeneity is caused by genomic instability and phenotypic plasticity, but how these features co-evolve remains unclear. SOX10 is a neural crest stem cell (NCSC) specifier and candidate mediator of phenotypic plasticity in cancer. We investigated its relevance in breast cancer by immunophenotyping 21 normal breast and 1860 tumour samples. Nuclear SOX10 was detected in normal mammary luminal progenitor cells, the histogenic origin of most TNBCs. In tumours, nuclear SOX10 was almost exclusive to TNBC, and predicted poorer outcome amongst cross-sectional (p = 0.0015, hazard ratio 2.02, n = 224) and metaplastic (p = 0.04, n = 66) cases. To understand SOX10’s influence over the transcriptome during the transition from normal to malignant states, we performed a systems-level analysis of co-expression data, de-noising the networks with an eigen-decomposition method. This identified a core module in SOX10’s normal mammary epithelial network that becomes rewired to NCSC genes in TNBC. Crucially, this reprogramming was proportional to genome-wide promoter methylation loss, particularly at lineage-specifying CpG-island shores. We propose that the progressive, genome-wide methylation loss in TNBC simulates more primitive epigenome architecture, making cells vulnerable to SOX10-driven reprogramming. This study demonstrates potential utility for SOX10 as a prognostic biomarker in TNBC and provides new insights about developmental phenotypic mimicry, a major contributor to intratumoral heterogeneity.

npj Breast Cancer (2022) 8:57; https://doi.org/10.1038/s41523-022-00425-x

INTRODUCTION
Effective management of triple-negative breast cancer (TNBC) remains a significant challenge worldwide. These tumours lack expression of oestrogen and progesterone receptors (ER/PR) and HER2, hence are not indicated for treatment with classical molecular-targeted agents. Chemotherapy remains the most reliable systemic treatment option, producing durable responses in ~60% of patients, while the other ~40% typically present with lung, liver and/or brain metastases within 5 years [1–3]. Second-line chemotherapy can temporarily stabilise metastatic disease but is rarely curative, so these patients endure a heavy treatment burden for no lasting benefit. Efforts to develop alternative treatments have been hampered by molecular and cellular variability between, and within, individual tumours. Intra-tumoural heterogeneity (ITH) directly increases the probability of relapse because it diversifies the substrate for clonal selection [4–7]. It has been proposed that to further improve the prognosis for TNBC patients, we need to develop agents that target the drivers of heterogeneity itself.

TNBCs are characterised by defective DNA repair, mitotic spindle dysfunction, chromosomal aberrations, and a mutation rate around 13 times that of other breast tumours [4,5]. Genomic instability is a key driver of ITH; however, only some cases can be explained by the selection of individual driver mutations, and other sources of heterogeneity are coming to light [10–12]. For example, cellular heterogeneity is influenced by the differentiation state of the normal cellular precursor(s), which in TNBC is thought to be the luminal progenitor (LP) cell [14–17].

ITH is also driven by phenotypic plasticity, the dynamic reprogramming of cell state in response to extrinsic stimuli [10,11]. Cancer cell state transitions can be de-differentiating (the loss of lineage commitment and acquisition of stem cell features) and/or trans-differentiating (assuming the state of another cell type). Compared to genomic and histogenic sources of ITH, how tumour cells invoke this capability is poorly understood, and yet potentially more ominous for the patient, as cell state transitions can be induced by treatment via heritable-epigenetic change. In controlled experimental conditions, drug-tolerant TNBC cell states can be averted by epigenome remodelling inhibitors [19–23], suggesting these agents might reduce rates of relapse if used clinically [8,11]. However, epigenetic therapies have genome-wide effects, so our ability to use them rationally requires a deeper understanding of the epigenome-driven features of treatment-refractory human tumours.

SOX10 is a transcription factor that was recently implicated in phenotypic plasticity in experimental models of TNBC. It is first expressed in embryonic neural crest stem cells (NCSCs), where its self-reinforcing gene regulatory module facilitates multipotency and cell migration, orchestrating the embryo patterning process [25–28]. Once patterning is complete, SOX10 is silenced in all NCSC descendants except glial and melanocyte progenitors, and is nascently induced in ectoderm-derived epithelial progenitor cells of the salivary, lacrimal, and mammary glands [29–33]. In the mouse, Sox10 is an obligate requirement for mammary gland development. Its expression marks gland repopulating potential in the basal (myoepithelial) compartment, while Sox10+ luminal cells represent the committed progenitor fraction. Functional studies have shown that Sox10 is one of several fate specifiers that regulate the equilibrium between mammary stem cell (MaSC) and LP states [29,32].

In NCSCs where the genome is unmethylated and accessible, SOX10 facilitates a mesenchymal, migratory state, whereas its function in adult tissues is influenced by the tissue-specific growth factor milieu and lineage-specific DNA methylation. Remarkably, ectopic expression of SOX10 reprogrammed postnatal fibroblasts with multipotency and migration capabilities equivalent to NCSCs, providing they were also exposed to chromatin unpacking agents and early morphogens (DNA methylation and histone deacetylase inhibitors plus Wnt activation). This established that with the erasure of lineage-specific epigenetic marks and appropriate extrinsic cues, SOX10 can recreate its ‘default’ regulatory circuit and that this is sufficient to phenocopy NCSCs.

SOX10 expression in human breast cancer is associated with TN, basal-like, metaplastic and neural progenitor-like phenotypes [4,35–39]. In transgenic mouse mammary tumour cells, it promoted invasiveness, expression of mammary stem/progenitor, EMT and NCSC genes and the repression of epithelial differentiation genes. These findings suggest that SOX10 could mediate de-differentiation in TNBC, but the relevance is unclear, particularly given there are no available inhibitors of SOX10 itself. We explored the significance of SOX10 in breast cancer development and progression by immunophenotyping histologically normal breast tissue, and large breast tumour sample cohorts. To understand its contribution to phenotypic plasticity and identify drivers of this capability, we performed systems-level analysis to map SOX10’s regulatory circuit in the broader TNBC transcriptional network.

RESULTS
SOX10 is expressed in luminal progenitor cells of the human mammary gland
Functional studies have shown that SOX10 marks stem and luminal progenitor (LP) cells of the mouse mammary gland [29,32], but its expression pattern in the human breast has not been established. Therefore, we performed immunohistochemical (IHC) analysis of 19 histologically normal reduction mammoplasty (RM) samples using a validated antibody (Supplementary Fig. 1a and Supplementary Table 1). SOX10 was detected in nuclei of ductal and lobular epithelia, with individual terminal ducto-lobular units (TDLUs) exhibiting either basal-restricted or combined baso-luminal expression (Fig. 1a). Compared to ducts, lobules were more likely to exhibit luminal compartment expression of SOX10 (Fig. 1b), consistent with a role in lobulogenesis. Indeed, TDLUs with basal-restricted SOX10 expressed high levels of luminal …

… +/CD49f+ LP cells, moderate in the EpCAM−/CD49f+ basal compartment (myoepithelia and mammary stem cells (MaSCs)) and low in EpCAM+/CD49f− mature luminal (ML) cells (Fig. 1e).

SOX10 is epigenetically regulated in mouse mammary gland [40,41], so we investigated this in human tissue. We isolated hMECs from two fresh RM samples using FACS with antibodies against CD49f and EpCAM, then performed high-density DNA methylation array profiling. SOX10 was hypomethylated in LP and basal samples (p < 1.0E−06; Fig. 1f). Consistently, analysis of hMEC chromatin immunoprecipitation sequencing (ChIP-seq) data from six independent RM samples showed the SOX10 locus is enriched with activating (H3K4me3, H3K27ac) and depleted of repressive H3K27me3 marks in LP and basal samples (Fig. 1f).

SOX10 is associated with poor clinical outcomes in TNBC
Analysis of TCGA, METABRIC and ICGC breast tumour datasets [43–45] showed SOX10 mRNA is expressed almost exclusively in TNBC, with a bimodal distribution suggesting distinct SOX10 positive and negative (+/−) subgroups (Fig. 2a and Supplementary Fig. 2a). Consistent with other data, SOX10 mRNA is highest amongst TNBCs classified as ‘basal-like, immune-suppressed’ (BLIS), though we noted that expression was heterogeneous amongst TNBC subtypes classified by gene expression profile (e.g. 23% of ‘basal-like, immune-activated’ (BLIA) TNBCs also had SOX10 levels in the top quartile; Supplementary Fig. 2b). In terms of genomic drivers of SOX10 expression in breast cancer, copy-number (CN) amplification or gain at the SOX10 locus was evident in ~20% of TNBCs (Fig. 2b) and was associated with higher mRNA levels in both METABRIC and TCGA datasets (Fisher’s Exact p ≤ 0.001). Analysis of TCGA HM450k methylation array data indicated that SOX10 is frequently hypomethylated in TNBC (Fig. 2b) and that this correlates strongly with expression (Fig. 2c and Figs. S2c, d), but does not extend to adjacent genes on chromosome 22 (Fig. 2d). Hence, like normal basal and luminal progenitor cells, gene-specific hypomethylation also underpins SOX10 expression in a subset of TNBCs, and in some cases, this appears to be reinforced by clonally selected CN gains.

Analysing published cell line gene expression and methylation array datasets [46,47] and our cell line bank [48,49], we found that in contrast to tumours, TNBC cell lines express very low to undetectable levels of SOX10, and the SOX10 gene is hypermethylated (Fig. S2e, f). shRNA-mediated depletion of SOX10 in one of the few positive lines (HCC1569) resulted in 100% cell death within a few passages (Supplementary Fig. 2g).

Next, we performed IHC studies to investigate the prognostic significance of SOX10 expression at the protein level. Surveying a large, cross-sectional cohort of invasive primary breast tumours from Australia and the UK (n = 1330), we detected SOX10 almost exclusively in tumour cell nuclei of TN cases (Fig. 2e; see Supplementary Table 2 for cohort characteristics). Approximately 38% of TNBCs were classified as SOX10+, and another 11.5% exhibited heterogeneous staining (see Fig. 2e and Supplementary Fig. 2h for scoring thresholds). SOX10 positivity was associated with histologic features typical of this group, such as high grade, metaplastic and medullary morphology, pushing margins and a larger size at diagnosis (Supplementary Table 2). Similar, though statistically weaker, trends were found between these variables and heterogeneous SOX10 staining (Supplementary Fig. 2i).
cytokeratins (CK)8/18, while TDLUs with dual-compartment SOX10 Rather than a simple correlate of the TN phenotype, SOX10 had low CK8/18. This was evident even in neighbouring structures positivity stratified TNBC-specific survival in both univariate (Fig. 2f of the same specimen (Fig. 1c and Supplementary Fig. 1b). and Supplementary Fig. 2j) and multivariate regression analyses, IHC analysis of serial sections showed SOX10+ luminal cells with a prognostic value greater than clinicopathologic indicators lacked ER and were positive for the LP marker c-Kit, with no used in current clinical practice: tumour size, grade, and the obvious relationship to proliferation marker Ki67 (Fig. 1d). We also density of tumour-infiltrating lymphocytes (TILs) (hazard ratio 1.8- analysed SOX10 mRNA in a published dataset from FACS-sorted 2.5; p = 0.02–0.002; Supplementary Table 2). Increased propensity human mammary epithelial cells (hMECs) . SOX10 levels were for brain metastasis is one of the factors underlying premature similar to established LP markers ELF5 and KIT: highest in EpCAM death in TNBC, so we also analysed patient-matched pairs of npj Breast Cancer (2022) 57 Published in partnership with the Breast Cancer Research Foundation 1234567890():,; **** **** **** **** **** **** **** **** ** J.M. Saunus et al. (i) basal (ii) basal + luminal c (i) (ii) SOX10 + (i) SOX10-het SOX10 – SOX10 (i) (ii) (iii) SOX10 0 50 100 50 100 % lobules % ducts CK8/18 (ii) lobules ducts c-kit 025 50 75 100 % SOX10+ lum cells merge Basal LP ML indistinct EGFR ESR1 CK8 CK18 SOX10 KIT ELF5 CK14 CK5 ER DNAme 1.00 0.50 0.00 H3K27me3 30 H3K27ac Basal Ki67 LP 10 ML Indistinct H3K4me3 0 exons UTR SOX10 TSS intron 383.7 383.8 383.9 Chrom 22 position (hg19 Mbp) Fig. 1 SOX10 is expressed in basal and luminal progenitor cells of the human mammary gland. a Representative SOX10 IHC analysis of reduction mammoplasty (RM) samples. 
Some terminal ducto-lobular units (TDLUs) had exclusive basal compartment expression (i) while others had expression in both basal and luminal compartments (ii). b (i) Analysis of SOX10 expression in ducts vs lobules of RM samples from 19 donors (whole sections). (ii) SOX10 expression in lobules was heterogeneous and more likely to occur in the luminal compartment (Mann–Whitney p = 0.011; n = 102 ducts and 102 lobules; median ± 95% confidence interval shown). c Representative immunofluorescent staining of SOX10 and CK8/18. Circled lobules and isolated cells (arrows) exhibited reciprocal expression of SOX10 (green) and CK8/18 (red) in structures with either (i) dual-compartment or (ii) basal-restricted SOX10 expression. d IHC analysis of SOX10, c-kit, ER and Ki67 in serial RM sections. The three magnified regions represent major SOX10 staining patterns: (i) dual compartment, heterogeneous; (ii) dual compartment, homogeneous; and (iii) basal-restricted. Luminal SOX10 expression was directly associated with c-kit and inversely associated with ER, with no obvious relationship to Ki67 (e.g., cell cluster indicated with an arrow). e SOX10 mRNA levels in FACS-sorted human mammary epithelial cell (hMEC) subtypes. Differentiation markers were analysed for comparison: basal markers CK14 and CK5; luminal progenitor (LP) markers KIT and ELF5; and markers enriched in mature luminal (ML) cells: CK18 and ESR1 (isolates with significantly different marker levels according to paired ANOVA tests are indicated and colour-coded: ****p < 0.00001; ***p < 0.0001; **p < 0.001). Data shown are means ± standard error of the mean from three donors. f Average methylation beta-values of SOX10 probes in FACS-sorted hMEC samples (DNAme), aligned with histone modification signals in a published ChIP-seq dataset: H3K4me3, H3K27ac (activating) and H3K27me3 (repressive). Data are represented to scale on human chromosome 22. TSS transcription start site, UTR untranslated region.
Indistinct = negative for CD45 (hematopoietic cells), CD31 (endothelia), CD140b (fibroblasts), EpCAM and CD49f (epithelia).

Fig. 2 Expression of SOX10 in human breast cancer. a Bimodal expression of SOX10 in TNBC compared to other breast cancers (nonTNBC) in the METABRIC cohort. b Frequency of copy-number alterations (CNAs) and DNA hypomethylation affecting SOX10 in TNBC and nonTNBC compared to the archetypal SOX10+ malignancy, melanoma (SKCM; TCGA datasets). c Correlation between SOX10 methylation and expression (normalised RNAseq counts) in SKCM, TNBC and nonTNBC (Spearman correlation coefficients (r) and p values are shown; derived from TCGA data). d Proportions of TNBC and nonTNBC cases with hypomethylation at each probe across the SOX10 locus (as defined in (b)). e Representative IHC showing SOX10-neg, heterogeneous and nuclear-positive (+) TNBCs. Tumours with absent or very weak nuclear staining in ≥50% of tumour cells were classified as SOX10-negative, while those with any one of replicate TMA cores exhibiting moderate-strong nuclear staining in <50% OR weak-moderate nuclear staining in ≥50% of tumour cells were classified as heterogeneous (see also Supplementary Fig. 2h). Survival curves of heterogeneous and negative categories overlapped (Supplementary Fig. 2j) and hence are grouped together here. f Kaplan–Meier analysis of the relationship between SOX10 nuclear positivity and breast cancer-specific survival (BCSS) in cross-sectional TNBCs. Log-rank test p value and hazard ratio (HR) are shown (95% confidence interval). g Kaplan–Meier analysis of the relationship between SOX10 nuclear positivity and BCSS in TNBCs classified as metaplastic breast cancers. Gehan–Breslow–Wilcoxon test p value shown. h SOX10 expression in brain-metastatic TNBC and matching brain metastases (BrM), compared to the frequency in cross-sectional TNBCs (Chi-square p value shown).

primary TNBCs and brain metastases (n = 19 pairs). Compared to cross-sectional TNBCs, SOX10 was over-represented in brain-metastatic cases, with SOX10 status concordant in ~90% of matching brain tumours (Fig. 2h). Consistent with previous reports [37,50], we also detected nuclear SOX10 in an independent cohort of metaplastic breast cancers (MBC; Asia-Pacific Metaplastic Breast Cancer consortium [51]). Compared to cross-sectional cases, SOX10 staining was more heterogeneous in MBCs, and was not associated with TN status (Supplementary Fig. 2k); but was prognostic amongst MBCs with a TN phenotype (Fig. 2g).

Considering all our IHC study findings, we concluded that strong nuclear expression of SOX10 is associated with TNBC progression.

SOX10’s TNBC regulatory module confers transcriptomic similarity to NCSCs

To investigate the basis of SOX10’s association with poor patient outcomes, we compared the expression profiles of TNBCs expressing high versus low levels of SOX10 mRNA and found that high SOX10 tumours were significantly enriched with the expression of mesenchymal, neural, and glial development genes (Supplementary Fig. 3 and Tables S3, S4).

We then mapped SOX10’s regulatory neighbourhood within the breast cancer transcriptome using weighted gene co-expression network analysis (WGCNA). This approach quantifies co-variation in gene expression across a biological sample set to identify genes with highly coordinated regulation, which is indicative of functional relatedness [52,53]. We built a network from TCGA breast cancer RNAseq data (n = 919 cases) and validated it with datasets from METABRIC (n = 1278, expression array) and ICGC (n = 342, RNAseq).
In this model, all genes expressed above a background threshold are connected (12,588 genes, 12,588 connections). The connection between each gene pair is based on a weighted correlation coefficient, and unsupervised clustering can reveal groups of genes with a high probability of co-functionality (modular transcription programmes). The module eigengene (ME) is a centroid calculated for each module in each sample that represents both module expression and net connection strength. WGCNA partitioned ~20% of expressed genes into eight consensus modules that align with established hallmarks of breast cancer; for example, an ER/FOXA1-driven module expressed in luminal tumours, and a mitotic instability module in basal-like and luminal-B tumours (Table 1, Fig. 3a, Tables S5–S8 and Supp File 2). The remaining ~80% of genes were not linked to any one module. SOX10 was identified as one of the most interconnected genes in the ‘green’ module, which has a hierarchical structure (Fig. S4a, b) and is predominantly expressed in high-grade TNBCs (Supplementary Fig. 4c). In this module, SOX10’s co-expression profile was highly similar to genes implicated in Wnt signalling, neuroglial differentiation and embryo patterning (Fig. 3b). We named it the SOXE-module and ascribed ‘multipotency’ as its primary ontology, as the member gene list is enriched with developmental phenotypes, and includes all three SOXE family members (SOX8/9/10) and embryonic stem cell genes (LMO4, POU5F1) (Fig. 3c and Supplementary Table 9). IHC analysis of six other module members confirmed that their co-expression in TNBC holds true at the protein level (Fig. 3d), with staining often observed in the same cells within individual tumour-rich tissue cores (Fig. 3e).
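The network quantities described above (weighted pairwise connections, intramodular connectivity and the module eigengene) can be illustrated with a minimal NumPy sketch. This is not the WGCNA R package used in the study. The unsigned adjacency |cor|^beta, the soft-threshold power beta = 6, the use of the first principal component as the eigengene (the usual WGCNA convention), the function names and the simulated expression matrix are all assumptions made for illustration only.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta from a samples x genes matrix.
    beta=6 is an illustrative assumption, not a value taken from the paper."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj

def intramodular_connectivity(adj, module_idx):
    """kWithin: summed connection weight of each module gene to the other module genes."""
    sub = adj[np.ix_(module_idx, module_idx)]
    return sub.sum(axis=1)

def module_eigengene(expr, module_idx):
    """First principal component of the standardised module expression matrix."""
    block = expr[:, module_idx]
    z = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0]
    # Orient the sign so the eigengene follows average module expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

# Simulated data (hypothetical): 60 samples, 5 co-regulated genes plus 3 unrelated genes.
rng = np.random.default_rng(0)
latent = rng.normal(size=60)
module_genes = latent[:, None] + 0.3 * rng.normal(size=(60, 5))
other_genes = rng.normal(size=(60, 3))
expr = np.hstack([module_genes, other_genes])

module_idx = [0, 1, 2, 3, 4]
adj = soft_threshold_adjacency(expr)
k_within = intramodular_connectivity(adj, module_idx)
me = module_eigengene(expr, module_idx)
```

In this toy setting the eigengene tracks the shared latent signal, and genes with the largest `k_within` would be the candidate hubs; real analyses would also need module detection by clustering, which is omitted here.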
Consistent with the defining features of TNBCs (de-differentiation, genomic instability, high mitotic index and the presence of TILs), TNBCs express variable proportions of primarily three modules: green (SOXE), blue (mitotic instability) and yellow (TILs) (Fig. 3f). Kaplan–Meier analysis showed that cases expressing high levels of both SOXE and mitotic instability modules had shorter survival compared to those with predominant expression of one or the other, while co-expression of the yellow module was associated with better prognosis, consistent with the protective effect of TILs in TNBC (Fig. 3g and Supplementary Fig. 4d).

The SOXE-module represents the shift from a luminal progenitor to an NCSC-like state

Ontology analysis showed that the SOXE-module includes genes typically expressed in differentiating glia, cardiomyocytes, and odontoblasts, which all descend from NCSCs. In fact, developmental genes comprised a large proportion of SOXE-module hubs (genes with the highest network connectivity and centrality values; Fig. 4a and Supplementary Table 10), hence representing points of maximal module vulnerability. These include cell-fate regulators ELF5, FOXC1 and SOX10; Wnt/β-catenin signalling genes SFRP1, MAML2 and TRIM29; and embryonic cell migration and neuronal development genes RGMA, ROPN1, ROPN1B, MID1 and APCN.

Table 1. Key features of eight predominant gene co-expression modules extracted by WGCNA.
Modules | Major functional ontologies (a) | Signalling pathways (a)/intrinsic activators (b) | Size (no. genes) | Top ten hub genes (highest kWithin; see Supplementary Table 5)
Tumour-centric
Blue | Mitotic instability | FOXM1, MYBL2 | 1239 | TPX2, BUB1, CEP55, HJURP, NCAPH, KIF4A, KIF2C, CCNB2, NCAPG, FOXM1
Green | Multipotency (SOXE) | Wnt signalling | 487 | ROPN1, SFRP1, FOXC1, RGMA, GABRP, CHST3, MAML2, APCN, ROPN1B, SOX10
Brown | Primary cilium | ER, FOXA1 | 1008 | FOXA1, MLPH, ESR1, AGR3, XBP1, THSD4, GATA3, CA12, PRR15, ZMYND10
Tumour-stromal
Magenta | ECM-1 (structural) | FBN1, RUNX2 | 186 | COL5A2, COL1A2, COL3A1, COL5A1, COL6A3, FAP, THBS2, COL1A1, LUM, VCAN
Black | ECM2 (regulatory) | – | 207 | OLFML1, RECK, FSTL1, DCN, MSRB3, ECM2, CCDC80, TCF4, ZEB1, GLT8D2
Red | Fatty acid metabolism | PPARγ | 274 | DIA1R, PDE2A, LHFP, LDB2, ARHGEF15, S1PR1, SDPR, EBF1, CD34, ERG
Tan | Type-I IFN response | STAT1, IRF9 | 33 | IFIT3, OAS2, CMPK2, IFI44L, IFI44, IFIT1, MX1, OASL, IFIT2, RSAD2
Stromal
Yellow | Adaptive immunity (TILs) | CD40L, CD40, IFNγ, IRF1 | 712 | SASH3, IL2RG, CD53, PTPN7, CD48, CD2, CD3E, ARHGAP9, CD5, CD3D, SIT1, SH2D1A
ECM extracellular matrix.
(a) Gene set enrichment analysis (GSEA) of all BRCA genes ranked according to module eigengene correlation (Supplementary Table 9).
(b) Ingenuity pathways analysis upstream regulator prediction (p ≤ 1.0E-07) based on kWithin values for module genes.

Fig. 3 SOX10’s regulatory network is associated with multipotency, cell migration and poor prognosis in TNBC. a Relative expression of eight predominant transcription modules in human breast tumours, according to the PAM50 subtype (TCGA dataset). b SOXE-module co-expression profile similarity matrix, clustered to highlight genes with very highly coordinated expression. The similarity is based on cosine distance and has a maximum value of 1. SOX10 mapped to one of six module sub-clusters, the members of which are shown to the right of the matrix. See also Supplementary Fig. 4a, b. c Summary of results from unsupervised gene set enrichment analysis of the breast cancer transcriptome after ordering transcripts according to their correlations with SOXE-module expression (denoted by the ME value, TCGA dataset). d Tile plot showing overlapping expression of SOXE-module representatives. For each protein, significant co-expression with ≥2 other module members is indicated by a Fisher’s exact test result (*p < 0.05; ***p < 0.001; ****p < 0.0001). Refer to Supplementary Table 1 for scoring criteria. e IHC staining of representative SOXE-module nodes in serial sections from the same tumour. f Proportional expression of all eight modules (coloured as for (a)) in TNBCs annotated with PAM50 and TNBC subtypes (METABRIC dataset; LAR luminal androgen receptor-like, MES mesenchymal, BLIS basal-like immune-suppressed, BLIA basal-like immune-activated). g Kaplan–Meier analysis of METABRIC TNBCs expressing different proportions of the three predominant TNBC modules. BCSS breast cancer-specific survival. ME fraction thresholds for classifying cases as high or low were 0.33 for SOXE/blue and 0.1 for yellow.

To directly investigate if the SOXE-module is associated with NCSC phenotypic mimicry, as has been reported for Sox10 in mouse mammary tumour cells [24], we performed expression and enrichment analyses using two independent genesets: (1) 308 genes represented in at least two of the 78 terms matching ‘neural crest’ in the gene ontology database (‘NC terms’); and (2) transcripts specific to migratory, Sox10+ NCSCs in chick embryos (‘ch.NCSC’; n = 200 genes) [55], representing Sox10’s most primitive transcription programme (Supplementary Table 11). Except for SOX10, SOX8 and LMO4, there is minimal overlap between the SOXE-module and these genesets (Fig. 4b), but their expression is strongly correlated (Fig. 4c). This was confirmed by geneset

Fig. 4 The SOXE-module drives the transition from normal mammary epithelial stem/progenitor to NCSC-like phenotypic states.
a Influence of SOXE-module genes over network architecture and information flow. kWithin: intramodular ‘connectivity’ based on weighted correlations with all other module genes; Eigencentrality: considers the connectivity of each node’s nearest neighbours as an indicator of ‘local influence’; Betweenness centrality: ‘conductivity’ based on each node’s position along the shortest paths between other nodes (genes with high betweenness are information conduits). Key hub genes are indicated (see Supplementary Table 10 for the full dataset). b Chick (ch.)NCSC and neural crest (NC) terms genesets are largely independent of each other and from the SOXE-module. c Correlations between SOXE-ME values and NCSC genesets (singscore values) in TNBC (n = 106 TCGA cases with tumour cellularity ≥0.6). Correlation coefficients (r) and p values are shown. d GSEA using three TNBC gene expression datasets (ICGC, METABRIC, TCGA). Normalised enrichment scores (NES) and corrected p values (q) shown. e Overlap between members of the SOXE-module and SOX10’s normal breast module (from de novo module identification on n = 97 TCGA normal breast samples; Supplementary Table 12). Generic ontology enrichment results are summarised (full GO term lists in Supplementary Table 13). f Comparison of network structure and information flow metrics (as for (a)) between shared and SOXE-module-exclusive genes. Groups were compared using Mann–Whitney tests (**p = 2.4E-03; ***p = 5.6E-04). Boxes show the 10–90th percentiles and median, with whiskers extending to the minimum and maximum values. Mean is indicated with ‘+’. g Model depicting the mammary epithelial progenitor gene regulatory network core being sustained through transformation and rewired as the SOXE-module in TNBC. Shared hub genes are listed.

enrichment analysis (GSEA; Fig. 4d). Hence, the SOXE-module confers transcriptomic similarity to NCSCs.

Since several SOXE-module genes (e.g. SOX10, SOX9, LGR6 and ELF5) are key regulators of normal hMEC states, we hypothesised that the SOXE-module might evolve from the deregulation of a lineage differentiation programme expressed in TNBC’s normal cellular precursors. Module preservation analysis using RNAseq data from TCGA normal breast samples indicated that the SOXE-module does not exist as an interconnected unit in the normal breast transcriptome (Supplementary Fig. 4e). But after performing de novo WGCNA module identification on this dataset (Supplementary Table 12), we found that SOX10’s normal breast module overlaps with the TNBC-specific SOXE-module significantly more than expected by chance (Fig. 4e; 109 shared genes, Chi-square p = 2.8E-26).

Both ‘normal-exclusive’ and ‘shared’ genes were enriched with epithelial differentiation ontologies, with cell adhesion distinctly over-represented in the shared set (Fig. 4e and Supplementary Table 13). According to network influence metrics, the shared genes were significantly more important to the SOXE-module than SOXE-exclusive genes (Fig. 4f and Supplementary Fig. 4f). This suggests that while SOXE-exclusive genes are primarily responsible for conferring NCSC-like attributes, genes ‘inherited’ from TNBC’s normal precursors are comparatively more important to the SOXE-module’s regulatory structure. Together, these data suggest that the SOXE-module and its associated NCSC-like phenotype arise because a core set of epithelial differentiation and adhesion genes becomes rewired during TNBC development (Fig. 4g).

Genomic and epigenomic determinants of the NCSC-like transcriptional shift in TNBC

To address the central question of what drives this transcriptomic shift, we analysed case-matched gene copy-number (CN), RNAseq and WGCNA data (TCGA cases). Candidate module drivers were defined as those for which both CN and expression correlated significantly with SOXE-ME values. A total of 182 genes met these criteria (130 gains and 52 losses), of which 140 (77%) are part of large chromosomal alterations: 6p21-22 (gained/amplified in 56.7% of TNBC cases), 8q22-24 (gained/amplified in 78.7%), 9q34 (lost in 59.6%) (Supplementary Fig. 5a). SOXE-module genes were over-represented amongst the positively correlated genes (25/130 (19.2%)) and had increased CN and expression in SOXE-high TNBC (ChiSq p = 9.7E-31; Fig. 5a). However, network influence metrics for these 25 were no higher than other module genes (Fig. 5b). Hence, the SOXE-module may be augmented by

[Fig. 5 image, panels a–g. Recoverable labels: decision tree (increased CN correlated with SOXE ME (p<0.01)?; expression also correlated with SOXE ME (p<0.01)?; SOXE module member gene?); mutation signatures (substitutions, indels, rearrangements RS1–RS6, sig-1, sig-2, sig-3: HR-deficiency, sig-5: unknown, sig-8, sig-13: APOBEC, sig-17, HRDetect) vs SOXE ME correlation and -log p (t-test); t-SNE panels coloured by PAM50 subtype, SOXE ME and median methylation; regional methylation categories (CGI, shore, shelf, open sea; TSS1500, TSS200, 5'UTR, gene body, 3'UTR, IGR); network metrics (connectivity, local influence, conductivity).]

Fig. 5 The SOXE-module is driven by the erosion of lineage-specific epigenetic marks.
a Decision tree for identifying candidate copy-number alteration (CNA) drivers of the SOXE-module. Of 17,694 genes with case-matched GISTIC, RNAseq and WGCNA data, CN, and expression of 130 correlated with the SOXE-module in TNBC, including 25 SOXE-module nodes. b Network influence metrics for SOXE-module nodes coloured according to candidate CN driver status (intramodular connectivity (kWithin), local influence (Eigencentrality) and conductivity (betweenness centrality) defined in Fig. 4a). Boxes show the 10–90th percentiles and median, with whiskers extending to the minimum and maximum values. Mean is indicated with ‘+’. No significant differences by ordinary ANOVA test. c Relationship between SOXE-module levels and mutation signatures in ICGC TNBCs (COSMIC v2 SigProfiler and HRDetect on n = 74 ICGC TNBCs). Associations are depicted according to the correlation between SOXE-ME values and signature event count (y-axis); and by the significance of average SOXE-ME differences between ICGC TNBCs with low (quartile-1) vs higher (quartile 2–4) signature burden. d t-Distributed stochastic neighbour embedding (t-SNE) visualisation of genome methylation profile similarities amongst cases in the BRCA-TCGA 450k methylation array dataset. Panels are coloured according to PAM50 intrinsic subtype, SOXE-ME values or global median methylation-β values. Circled cases are epigenetically divergent, basal-like TNBCs that express high levels of the SOXE-module and have eroded methylomes. e Correlation analysis summary showing relationships between SOXE-ME values and region-specific methylation (n = 75 TCGA TNBCs, tumour cellularity ≥0.6; n = 215,323 probes after quality filtering); ****p < 1.0E-07. CGI CpG island, IGR intergenic region, TSS transcription start site, UTR untranslated region. Solo-WCpGW: consensus sequence for late-replicating loci demethylated via replicative senescence. f Unsupervised clustering of the BRCA-TCGA 450k methylation dataset according to ME correlation. Data shown were minimum correlation coefficients of ME values versus gene-averaged methylation-β data from promoter region probes (TSS1500, TSS200 and 5′UTR). Of three clusters inversely correlated with SOXE-module expression, two (a, b) were enriched with developmental ontologies (Supplementary Table 14). g Network influence metrics for SOXE-module genes in the hypomethylated clusters versus other SOXE-module genes, as for (b). Ordinary ANOVA p values: *p < 0.05; **p < 0.01; ***p < 0.001; ns not significant.

increased CN of some of its component genes, but this seemed unlikely to be an early or dominant driver of module evolution.

Next, we investigated whether mutational processes that shape the breast cancer genome could be involved. To this end, we utilised case-matched mutational signature and WGCNA data for the ICGC cohort [45,57]. There were direct relationships between the SOXE-module and overall mutation burden (substitutions and small insertion-deletions (indels)), as well as specific signatures of genome instability (rearrangement sigs (RS)3 and RS5), homologous recombination (HR)-directed repair of double-strand DNA breaks (DSBs) and genome editing (sig-3: HR deficiency; HRDetect; sig-13: APOBEC; Fig. 5c).

APOBEC activity and DSB repair are both indirectly demethylating. For example, 5-methyl cytosine (5mC) loss occurs because of APOBEC-mediated genome editing and/or during the repair of edited bases, and DSB repair has been causally linked to the progressive loss of 5mC during cellular ageing [58,59]. Therefore, we hypothesised that the evolution of the SOXE-module in TNBC may be related to epigenetic dysregulation. Consistent with this idea, the 105 CN-driven SOXE-module correlates (i.e., those not part of the SOXE-module itself; Fig. 5a) were enriched with transcription factor, chromatin remodelling and DNA repair genes (Fisher’s Exact p < 0.001). Furthermore, visualising SOXE-module strength relative to the overall methylome profile using t-SNE showed that SOXE-ME values were highest in the most epigenetically divergent tumours (Fig. 5d). To investigate this further, we then correlated SOXE-ME values with probe-level methylation data directly, in the following regional categories: CpG islands (CGIs), CGI shores, shelves or open sea regions at transcription start site (TSS) regions, untranslated regions (UTRs), gene bodies or intergenic regions (IGRs). We also quantified methylation at ‘solo-WCpGW’ sites at late-replicating, heterochromatic loci, which act as a biomarker of replicative senescence and are hypomethylated in breast tumours compared to hMECs (Supplementary Fig. 5b). There was no relationship with solo-WCpGW sites (Supplementary Fig. 5c), but there was a striking inverse correlation between SOXE-ME values and genome-wide promoter methylation; particularly at CGI shores, the substrate for lineage-specific methylation in adult tissues (Fig. 5e and Supplementary Fig. 5c). These data indicate that SOXE-module expression and connectivity are directly proportional to promoter demethylation in TNBC (Fig. 5e). There was no such relationship with any other module in TNBC (Supplementary Fig. 5d).
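The region-wise comparison described above can be sketched as a small calculation: average the methylation β-values of each case within a regional category, then correlate those case means with the module eigengene. The sketch below is only an illustration under stated assumptions. The use of Pearson correlation, per-case averaging before correlation, the function name `region_me_correlation`, the two region labels and the simulated β-values are all hypothetical choices, not details confirmed by the text.

```python
import numpy as np

def region_me_correlation(beta, regions, me):
    """Correlate per-case mean methylation in each region with module eigengene values.

    beta:    cases x probes array of methylation beta-values (0 to 1)
    regions: one region label per probe (e.g. 'CGI shore', 'open sea')
    me:      module eigengene value per case
    Pearson correlation is an assumption made for this sketch.
    """
    regions = np.asarray(regions)
    result = {}
    for region in np.unique(regions):
        case_mean = beta[:, regions == region].mean(axis=1)
        result[str(region)] = float(np.corrcoef(case_mean, me)[0, 1])
    return result

# Simulated data (hypothetical): 75 cases, 40 probes per region.
rng = np.random.default_rng(1)
n_cases = 75
me = rng.normal(size=n_cases)
# Shore probes lose methylation as the eigengene increases; open-sea probes do not.
shore = np.clip(0.6 - 0.1 * me[:, None] + 0.05 * rng.normal(size=(n_cases, 40)), 0, 1)
open_sea = np.clip(0.7 + 0.05 * rng.normal(size=(n_cases, 40)), 0, 1)
beta = np.hstack([shore, open_sea])
regions = ["CGI shore"] * 40 + ["open sea"] * 40

corr_by_region = region_me_correlation(beta, regions, me)
```

With these simulated inputs the shore category shows a strong inverse correlation with the eigengene while the open-sea category does not, which mirrors the shape of the comparison reported in the text without reproducing its data.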
Having established that SOXE-module levels correspond with loss of tissue-specific 5mC marks, we then built a correlation matrix from ME and genome-wide promoter methylation data (TCGA) and performed unsupervised clustering to look for evidence of epigenetic control. The SOXE-module had a distinct promoter methylation signature: three clusters of genes that are hypomethylated when SOXE-module strength is highest, of which two were enriched with developmental ontologies (Fig. 5f and Supplementary Table 14). Only 10% of these correspond to SOXE-module genes, but this 10% is enriched with hub genes (Fig. 5g), suggesting a higher level of epigenetic control over module structure and information flow. We then used GSEA to test the enrichment of the SOXE-associated promoter methylome with NCSC genesets. Like the transcriptome (Fig. 4d), the methylation landscape associated with the SOXE-module was also enriched with NCSC genes (NC terms: normalised enrichment score (NES) −1.5; q = 6.0E−03; Ch.NCSC: NES −1.3; q = 3.6E−02).

Finally, we investigated direct demethylation processes as potential enablers of SOXE-module formation by cross-referencing SOXE-ME values from our three WGCNA datasets (TCGA, ICGC, METABRIC) against the expression of demethylases in the EpiFactors database. There were direct associations with APOBEC3A/3B cytosine deaminases and TET1 (Supplementary Fig. 5e). TET dioxygenase enzymes catalyse the first step of 5mC demethylation and are involved in processes requiring cell states to be reset or adjusted, such as methylome erasure in preimplantation embryos, and epigenetic plasticity in brain regions that facilitate learning and memory. TET1 is a maintenance demethylase that prevents methylation from spreading from silenced loci, particularly at CGI shores [62,63]. It has been causally implicated in TNBC metastasis and our findings suggest this may be at least partly due to reinforcement of the SOXE-module.

In summary, the SOXE-module’s dominance over the TNBC transcriptome is directly proportional to APOBEC activity, DSB repair and TET1 expression, which are all demethylating. Of all methylation domains across the genome, the module is most strongly correlated with hypomethylated promoter CGI shores, the substrate for lineage-specific methylation. Kim et al. showed that the minimal genetic requirements for reprogramming postnatal fibroblasts with an NCSC identity are SOX10 expression and the erasure of previous epigenetic memory. We postulate that progressive erosion of the epigenome in SOX10+ tumour-initiating cells simulates these conditions, driving NCSC-like reprogramming and poor clinical outcomes in SOX10+ TNBCs (Fig. 6).

Fig. 6 Model summarising the study findings. Proposed links between established drivers of TNBC progression, epigenome erosion and the emergence of a neural crest-like transcriptional programme in de-differentiated TNBCs.

DISCUSSION

Heterogeneity has emerged as a major bottleneck to effective sub-classification and treatment of cancer, and TNBC is no exception. Post-treatment relapse occurs through clonal expansion of cells with pre-existing, advantageous mutations, but also cell state changes brought about by adaptive epigenetic remodelling, a phenomenon that unites the ‘cancer stem cell’ and ‘epigenetic progenitor’ models of cancer [65]. The intrinsic plasticity of TNBC is problematic because existing therapies cannot eradicate a shifting target. Early evidence implies that blocking this capability with epigenetic therapy may improve treatment efficacy, but this will require a deeper understanding of how phenotypic plasticity evolves. TNBC exhibits genome-wide hypomethylation, which evidently drives de-differentiation by destroying the state-defining epigenetic barcode of its normal cellular precursor, the LP cell [14–17,65,67]. Differential methylation at certain genomic loci is prognostic in TNBC, and myriad studies have helped to decipher the mechanistic contributions of individual writers, readers, and erasers of epigenetic marks, but the phenotypic manifestations of genome-wide 5mC loss have not been extensively studied.

Consistent with functional analysis of Sox10 in experimental mice [29,32], our human tumour network studies show that SOX10’s TNBC-specific regulatory module confers similarity to highly plastic NCSCs. We traced a cluster of super-connected SOXE-module genes back to the tissue-resident mammary stem and progenitor cells and found that in contrast to the normal breast where it was associated with epithelial lineage differentiation, in TNBC this core was connected to Wnt signalling, neuroglial differentiation and embryo patterning genes. Critically, we found that expression of the SOXE-module amongst TNBCs was proportional to overall transcriptional similarity to Sox10+ migratory NCSCs from chick embryos, despite there being minimal direct overlap in member genes. We also identified SOXE-module hub genes as points of maximum network vulnerability as candidate therapeutic targets. In support of this approach, two of these (BBOX1 and BCL11A) have already been validated as such in TNBC [68–72].

To better understand the evolution of NCSC-like transcriptional reprogramming, we investigated potential links to the established drivers of TNBC development: genomic instability, large-scale CNAs, and defective DNA repair. We identified several processes that correlate significantly with the SOXE-module eigengene (DSB repair, APOBEC and TET1 activity, which are all demethylating); but most discernibly, the loss of lineage-specific methylation marks at CGI shores. Several mechanisms have been postulated to contribute to widespread methylome erosion in cancer, including DSB repair [58,59] and reduced availability of 5mC substrates through metabolic reprogramming. Accepting that there are probably multiple contributing factors in any individual tumour, our findings nevertheless suggest that NCSC-like reprogramming occurs concomitantly with epithelial de-programming in TNBC. The gene regulatory networks that operate in NCSCs are amongst the most evolutionarily conserved in vertebrates [25,74]. We postulate that when the broadly open chromatin landscape of the early embryo is simulated in epigenetically eroded tumours, dominant fate specifiers like SOX10 may recreate their ancestral regulatory circuits by default.

In summary, our data indicate that the extent of promoter methylation loss in SOX10+ breast tumours correlates with their transcriptomic similarity to NCSCs, the earliest developmental cell state programmed by SOX10 activity and one synonymous with migration, multipotency and phenotypic plasticity. We propose that during TNBC development, progressive erosion of the epigenome drives de-differentiation while simultaneously making cells vulnerable to NCSC-like reprogramming. Broadly, these findings support preclinical data [19–23] on the potential for epigenetic modulators to combat phenotypic plasticity in TNBC.

METHODS

Human tissue samples (also see Table 2)
This study involved immuno-detection of SOX10 and other biomarkers in the following human tissue cohorts:

1. Reduction mammoplasty (RM) samples: obtained in collaboration with Dr William Cockburn (Wesley Hospital, Brisbane) and the Royal Brisbane and Women’s Hospital (RBWH) Plastics Unit. Nineteen RM specimens were used for IHC and IF analysis, and two for methylation arrays. Age, parity and menopausal status of these patients were unknown. 30% of cases showed fibrocystic change and 10% presented with columnar cell lesions (histopathology review by SRL).
2. Clinically annotated, primary breast tumour samples:
   a. A cross-sectional primary breast tumour cohort comprising samples from Australia (treated by the RBWH Breast Unit) and the UK (Nottingham University Hospital), from patients treated in the mid-1980s to mid-1990s. Tumour blocks were sampled as 0.6 mm cores in tissue microarrays (TMAs). For baseline characteristics see Supplementary Table 2.
   b. Metaplastic carcinomas (Asia-Pacific Metaplastic Breast Cancer Consortium; whole sections).
3. Patient-matched primary TNBC and brain metastases (n = 19 pairs). Tumour blocks were sampled as 1.0 mm cores in TMAs.

Ethics approval
Human research ethics approval was obtained from the Royal Brisbane and Women’s Hospital (2005000785), The University of Queensland (HREC/2005/022) and North West Greater Manchester Central Health (15/NW/0685). Written patient consent to use tissue for research purposes was obtained where required under the conditions of these approvals and all

Table 2. Biological resources.
Resource | Source, identifier and relevant citations | Related figure(s)
Tissue samples
Histologically normal breast FFPE whole sections | The Brisbane breast bank | 1a–e
Fresh RM surgical samples | The Brisbane breast bank [48,76] | 1f, Supp-1b, Supp-6a
Australian BC series, FFPE TMA sections & clinical data | Pathology Qld & The Brisbane breast bank [48,89] | 2e, f, 3d–e, Supp-2h-k
UK breast cancer series, FFPE TMA sections & clinical data | Nottingham Breast Cancer Research Centre [90,91] | 2e, f, Supp-2h-k
Metaplastic tumour series, FFPE sections & clinical data | Asia-Pacific MBC consortium [51,92] | 2g
Patient-matched primary TNBCs and brain metastases | Pathology Qld & The Brisbane breast bank [48,89] | 2h
Cancer cell lines
293T | ATCC CRL-3216™ | Supp-1a, Supp-2g
MDA-MB-435S | ATCC HTB-129™ | Supp-1a, Supp-2e, Supp-2g
HCC38 | ATCC CRL-2314™ | Supp-2e
HCC1569 | ATCC CRL-2330™ | Supp-2e, Supp-2g
Primary melanoma cells (D41, D05) | Dr. Chris Schmidt, QIMR Berghofer | Supp-2e
TaqMan gene expression assays
SOX10 | ThermoFisher, Hs00366918_m1 | Supp-2e
RPL13A | ThermoFisher, Hs03043885_g1 | Supp-2e
shRNA sequences
SOX10_1 | Sigma-Aldrich TRCN0000018984 | Supp-1a, Supp-2g
SOX10_2 | Sigma-Aldrich TRCN0000018987 | Supp-1a, Supp-2g
SOX10_3 | Sigma-Aldrich TRCN0000018988 | Supp-1a, Supp-2g
Non-targeted negative control (NTNC) | Sigma-Aldrich SHC002 | Supp-1a, Supp-2g
Supp, supplementary.

Table 3. Software, code, and published datasets. Resource Source, identifier and relevant citations Related figure(s) Related table(s) Software packages and code ChAMP https://bioconductor.org/packages/release/bioc/html/ChAMP.html 5d–f – Clustergrammer https://maayanlab.cloud/clustergrammer/ 3b Supp-10 Community detection algorithms Refs. 85,86
Supp-4a – Epifactors database https://epifactors.autosome.ru Supp-5e – FACSDiva™ BD Biosciences, licensed 1f, Supp-6a – FCS Express (v7) De Novo Software, licensed 1f, Supp-6a – GSEAPreranked https://genepattern.org 3c, 4d, 5f, Supp-3 1, Supp-4, Supp- Ingenuity Pathways Analysis (IPA) Ingenuity, licensed – 1 MATLAB Mathworks, licensed Supp-4a Supp-10 Princeton Generic GO term finder https://go.princeton.edu 5a Supp-13, 14 Prism (v8.4.3) GraphPad, licensed Multiple S2 R package, Cluster https://cran.r-project.org/web/packages/cluster/index.html 5f – R package, FlashClust https://cran.r-project.org/web/packages/flashClust/index.html 5f, g Supp-14 R package, Limma https://www.bioconductor.org/packages/release/bioc/html/limma. Supp-3 Supp-3 html R package, t-SNE https://CRAN.R-project.org/package=Rtsne 5d – 52,53 R package, WGNCA https://cran.r-project.org/web/packages/WGCNA/index.html Multiple Multiple REVIGO http://revigo.irb.hr Supp-3 Supp-4 Singscore https://www.bioconductor.org/packages/release/bioc/html/ 4c – singscore.html SPSS IBM, licensed – Supp-2 Tableau desktop (2020.4) Tableau, licensed 4a – Published datasets Cell line expression data https://www.ebi.ac.uk/arrayexpress (E-TABM-157) Supp-2e, f – Cell line expression, CNA and https://www.ncbi.nlm.nih.gov/gds (GSE42944; GSE48216) Supp-2e, f – methylation datasets Chicken embryo neural crest gene set Ref. , Supplementary Table 1 4b–d Supp-11 Gene ontology resource http://geneontology.org – Supp-11 Genomic locations of solo-WCpGW sites Ref. Supp-5c – hMEC ChIP-seq data www.epigenomes.ca; ref. 1f – hMEC gene expression array data Gene expression omnibus, https://www.ncbi.nlm.nih.gov/geo/ 1e – (GSE16997); and ref. (Tables S5–8) Human reference genome NCBI build 37 UCSC Genome Browser https://genome.ucsc.edu 2d, Supp-5a – (GRCh37/hg19) ICGC gene expression data Ref. , Supplementary Table 7 – Supp-8 ICGC HRDetect scores Ref. , Supplementary Table 3b 5c – ICGC mutational signatures (COSMIC, v2 Ref. 
, Supplementary Table 21B, S21E 5c – SigProfiler) Illumina Infinium Omni2.5 array data https://www.ncbi.nlm.nih.gov/geo/ (GSE199579) 1f, Supp-5b – METABRIC gene expression & EGAD00010000210, EGAD00010000211, EGAS00000000083; EGA 2a, 3f, g, Supp-3, Supp-4, Supp-7 clinical data portal, via data access committee Supp-4c, d MetaCore https://portal.genego.com Supp-3 Supp-4 SOXE-module network metrics This paper 4a, f, 5b, g Supp-10 TCGA clinicopathologic annotation Ref. 2a–d, 3a – TCGA gene copy-number data Gistic2.Level_4; TCGA Data Analysis Center Firehose https://gdac. 2b, 5a, b, Supp-5a – broadinstitute.org TCGA gene-level methylation data Preprocess/meth.by_min_expr_corr; TCGA Data Analysis Center 2b, c – Firehose https://gdac.broadinstitute.org TCGA Illumina HiSeq RNASeq-v2 RSEM illuminahiseq_rnaseqv2-RSEM_genes_normalized (MD5); TCGA Data 2a, c Supp-4 level-3 normalised datasets Analysis Center Firehose https://gdac.broadinstitute.org TCGA Illumina HiSeq RNASeq-v2 RSEM TCGA Data Analysis Center Firehose https://gdac.broadinstitute. 3a, S3 Supp-3, 5, 6, 9, level-3 raw counts org 10, 12, 13 TCGA probe-level methylation data Humanmethylation_450; TCGA Data Analysis Center Firehose 5d–f, Supp-5b–d – https://gdac.broadinstitute.org Published in partnership with the Breast Cancer Research Foundation npj Breast Cancer (2022) 57 J.M. Saunus et al. Table 3 continued ResRource Source, identifier and relevant citations Related figure(s) Related table(s) Triple-negative breast cancer subtypes Ref. , Supplementary Table 19 3f, Supp-2b – (Burstein et al) 81,95 Tumour purity for TCGA cases Supp data-1 (CPE metric) & infinium metric, refs. Multiple – WGCNA ME dataset, ICGC cases This paper Multiple Supp-8 WGCNA ME dataset, METABRIC cases This paper Multiple Supp-7 WGCNA ME dataset, TCGA normal cases This paper Multiple Supp-12 WGCNA ME dataset, TCGA tumour cases This paper Multiple Supp-6 WGCNA mod membership dataset This paper Multiple Supp-5 (TCGA cohort) Supp supplementary. 
samples were de-identified in the analytical database. This study complies with the World Medical Association Declaration of Helsinki.

Immunohistochemistry (IHC)
Formalin-fixed, paraffin-embedded (FFPE) tissue samples or TMAs were sectioned, deparaffinised, subjected to antigen retrieval and chromogenically stained as described in ref. and detailed in Supplementary Table 1. Slides were scanned using the Aperio ScanScope T2 digital scanning system at 40x magnification. TMA images were segmented using Spectrum software (Aperio), and high-resolution images of individual cores were extracted and scored by two experienced observers in a blinded fashion (hidden metadata tags corresponding to TMA position were used to link clinical and sample data). Digital image files were scored according to the criteria set out in the legends to Figs. 2e and S2h.

Immunofluorescence (IF)
FFPE RM tissue sections (Table 2) were sectioned, deparaffinised, subjected to antigen retrieval and stained as described in ref. 76 (Supplementary Table 1). Briefly, primary antibodies diluted in tris-buffered saline (TBS) were incubated on tissue sections for 1 h at room temperature, washed in TBS then incubated with secondary antibodies for 30 min in the dark. To minimise tissue autofluorescence, slides were stained with SUDAN Black for 20 min in the dark (Sigma #S-2380), then washed (0.1% TBS-Tween (30 min), TBS (10 min)). Slides were mounted using Vectashield (Vecta Labs) with DAPI (Sigma-Aldrich), cover-slipped, sealed and imaged on a Carl Zeiss MicroImaging system using Axio Vision LE version 4.8.2 (PerkinElmer).

Fresh reduction mammoplasty (RM) tissue processing and fluorescence-activated cell sorting (FACS)
RM samples were processed, and single-cell suspensions were prepared as previously described (Table 2 and refs. 48,76). Briefly, tissue was cut into small pieces (~5 mm ) and digested overnight with agitation at 37 °C in DMEM-F12 (Gibco), foetal bovine serum ((FBS), 5%, Gibco), antibiotic/antimycotic (Gibco), Amphotericin B (2.5 μg/mL, Gibco), collagenase type I-A (200 U/mL, Sigma-Aldrich) and Hyaluronidase I-S (100 U/mL, Sigma-Aldrich). Epithelial organoids were obtained by centrifugation (80 × g, 1 min), then dissociated to single-cell suspensions for 5–10 min in TrypLE (Gibco), followed by Dispase (5 mg/mL, Gibco) and DNAse-I (100 μg/mL, Invitrogen). Enzymatic activity was quenched in ice-cold Hank’s Balanced Salt Solution ((HBSS), Gibco) with 2% FBS and cells were filtered through a 40-μm cell strainer (BD Falcon).

Cell concentration and viability were determined using a Countess automated counter (Invitrogen) with trypan blue and adjusted to 2.0E /mL. Single-cell suspensions (typically 30–60 mL) were labelled for 10 min on ice with Sytox™ green (Invitrogen) plus a cocktail of fluorescent antibody conjugates to discriminate hMEC subsets (negatively gated, non-epithelial ‘lineage’ markers: CD31, CD45, CD140b; positively gated hMEC markers: CD49f, EpCAM; see Supplementary Table 1 and Supplementary Fig 6a). Samples were washed (80×g, 2 min) and then resuspended in cold HBSS + 2% FBS. For robust fluorescence compensation and gating of specific hMEC populations, we also tested in parallel small samples stained with isotype control antibodies, and ‘fluorescence minus one’ negative controls (samples from which one of the main conjugates was omitted). Fluorescence data acquisition, gate placement and sorting were performed on a BD FACS Aria II instrument with FACSDiva software (v6.1.3; QIMR Berghofer). Sorted cells were collected on ice before being pelleted (80×g, 2 min) and snap-frozen at −70 °C.

Methylation array profiling and ChIP-seq meta-analysis
DNA was extracted from FACS-sorted hMEC samples using the QIAGEN AllPrep DNA/RNA mini kit, with bisulphite conversion using the EZ DNA methylation Kit (Zymo Research) following the manufacturer’s protocol with modification for Illumina methylation arrays. Bisulphite-converted DNA was amplified and hybridised to Infinium methylationEPIC 850k beadchips (Illumina) according to the manufacturer’s protocol. Arrays were scanned on an iScan, and data were processed using GenomeStudio (Illumina) with BMIQ array normalisation to derive average methylation beta-values. Histone modification ChIP-seq data were obtained from Pellacani et al. Bigwig format files were retrieved from www.epigenomes.ca, and the mean signal/bin was plotted across the region chr22:38365030-38396083 for each histone mark in each cell type.

Analysis of SOX10 expression in cell lines
MDA-MB-435, HCC1569 and HCC38 cells were from the American Type Cell Culture Collection (ATCC; Table 2), authenticated in our laboratory and cultured according to ATCC recommendations. D41 and D05 melanoma cells were selected from the primary melanoma cell line bank of Dr Chris Schmidt and Prof Nick Hayward (QIMR Berghofer) based on having high and low baseline SOX10 expression, respectively. Cells were routinely cultured at 37 °C in a humidified atmosphere with 5% CO₂ and routinely screened for mycoplasma. RNA and protein were extracted from cells in the exponential phase of growth using standard Trizol and RIPA buffer methods. SOX10 mRNA was quantified relative to RPL13A as previously described (ref. and Table 2). For Western analysis (MDA-MB-435, HCC1569, HCC38 cells), protein lysates (30 μg) were resolved by SDS-PAGE then SOX10 and β-actin were detected using standard chemiluminescence (Supplementary Table 1).

Stable-shRNA knockdown of SOX10 in breast cancer cell lines
Three pre-validated SOX10-targeted shRNA constructs, and a non-targeting negative control (NTNC) construct (pLKO.1), were purchased from Sigma-Aldrich (Table 2). Plasmid DNA was isolated from overnight bacterial cultures, then lentiviral particles were produced by triple transient transfection of HEK-293T (human embryonic kidney) packaging cells with one of the four transfer plasmids (pLKO.1-puro; 2 μg), together with companion plasmids encoding lentiviral packaging and replication elements (2 μg pHR’8.2ΔR + 0.25 μg pCMV-VSV-G; donated by Dr Wei Shi, QIMR Berghofer). Virus-containing supernatants (in target cell media) were then collected over the following two days and filtered (0.45 μm). MDA-MB-435 target cells were seeded at 3.1 × 10⁴/cm² in six-well plates, then after 24–48 h (at ~50% confluence), cells were infected with filtered viral supernatants, supplemented with 1 mg/mL polybrene (Sigma-Aldrich) for 24 h. Stably transduced cells were then selected with 1 μg/mL puromycin (Sigma-Aldrich) for 2 weeks to eliminate uninfected cells.

Datasets and processing
TCGA level-3 normalised RNAseq data ('rnaseqv2 illuminahiseq rnaseqv2 unc edu Level 3 RSEM genes normalised data.data.txt') from the Data Analysis Center Firehose (http://firebrowse.org/) were used for all single-gene analyses (Supplementary Figs. 2a, 5e; test group stratification for Supplementary Fig. 3; SOX10 heatstrips in Fig. 3a and Supplementary Fig. 6a, c). Scaled estimate columns of the 'rnaseqv2 illuminahiseq rnaseqv2 unc edu Level 3 RSEM genes data.data.txt' were used for all other algorithmic analyses.

For methylation datasets, TCGA level-3 Illumina HM450k data were downloaded from the National Cancer Institute Genomics Data Commons (GDC) data portal (https://portal.gdc.cancer.gov/) and processed using the ChAMP package. We applied the champ.filter function to remove problematic probes (those mapping to X/Y chromosomes, mapping to multiple locations, located near an SNP and non-CG probes). Filtered data were normalised using the champ.norm function, according to the Beta-Mixture Quantile (BMIQ) algorithm, an intra-sample normalisation procedure that corrects the bias of type-2 probe values.

Level-4 GISTIC-2 copy-number data for TCGA cases were downloaded from the Data Analysis Center Firehose (http://firebrowse.org/) and used for correlative analyses with no further processing. To apply tumour purity cutoffs (TCGA cases), we used a consensus measurement of four different purity estimation methods [81].

With permission from the METABRIC data access committee, normalised Illumina HT 12 expression array data were downloaded from the European Genome-phenome Archive (EGAD00010000210-211). For the ICGC RNAseq dataset, normalised data were downloaded as supplementary data [45] and used with no further processing. Mutational signature data (COSMIC, v2 SigProfiler) were downloaded as raw event counts from ref. 45 and HRDetect probability scores for these cases from ref. 57.

Differential expression analysis of SOX10-high and -low TNBCs (Supplementary Fig. 3)
To characterise the transcriptomic phenotype associated with SOX10 expression in TNBC, we performed differential expression analysis of SOX10-high versus SOX10-low (median split) TCGA and METABRIC datasets using limma (differential expression was defined by a corrected p value cutoff of 0.01).

Ontology enrichment analyses
GO term enrichment analysis was performed using the Generic GO term finder hosted by Princeton University (Lewis-Sigler Institute for Integrative Genomics; https://go.princeton.edu). Gene set enrichment analysis (GSEA) was performed using the Prerank function of GenePattern using 1000 permutations. For Supplementary Fig 3, GSEA inputs comprised differentially expressed genes (q ≤ 0.01) ranked by fold-change in each dataset. The input for all other GSEA experiments was whole transcriptome gene lists ranked by a Spearman correlation coefficient. Biological process genesets (Gene Ontology v7.2; gene set size 15-500) were mined for unsupervised analyses and neural crest genesets for supervised analyses (Supplementary Table 11). Datasets and ranking metrics are indicated in the respective Figure legends. Normalised enrichment scores (NES) and corrected p values are reported. GeneGo (Metacore®, Clarivate Analytics) and Ingenuity Pathway Analysis (Ingenuity) were also used to analyse pre-ranked gene lists. REVIGO was used to resolve semantic redundancy and identify major themes amongst the enriched terms.

Weighted gene co-expression network analysis (WGCNA): module identification and validation
WGCNA is a powerful network analysis tool that identifies groups of transcripts (modules) that fluctuate in a highly coordinated fashion, implying co-functionality [52,53]. First, it iteratively correlates the expression of every pair of transcripts in a test dataset, producing an adjacency matrix. It then converts this to a topological overlap matrix that reflects net connection weight, accounting for both direct connections and the impacts of shared neighbours. In this study, we created ‘signed’ networks, which reflect the overall topological overlap considering both positive and negative correlations. Dynamic module identification and characterisation (derivation of network metrics, sample eigengene values and module preservation in orthogonal datasets, see below) were performed in the R coding environment, and publication-quality figures were prepared from raw datasets using GraphPad Prism or Clustergrammer (Table 2).

Modules were identified using the TCGA RNAseq (n = 919 samples after quality filtering) and validated using METABRIC (n = 1278; expression array; Supplementary Fig 6b–d). A consensus set of eight modules was determined according to satisfactory concordance between these two orthogonal networks and a third was generated from the ICGC dataset (n = 342; RNAseq). We further validated the eight consensus modules using preservation analysis on a third breast cancer expression dataset. For normal breast samples, WGCNA was performed independently on TCGA normal breast samples (n = 97 after quality filtering).

Standard WGCNA outputs include the following (raw data in Supplementary Tables 5–11):

Module eigengene (ME): a theoretical gene that is the most strongly connected to all other genes in the module and hence represents net module expression and connectivity. Mathematically, the first principal component of each module’s adjacency matrix.

Module membership and connectivity: Each gene is ascribed k values describing modular and network connectivity (kTotal, kWithin and kOut). These continuous variables are amenable to integrated analysis of overlapping transcriptional programmes, utilising the granularity in expression datasets rather than levelling it as is done when assigning fixed phenotypes or categories. kME correlation and kME p values describe how tightly individual genes are linked to all other genes within each module.

To identify hub genes (Supplementary Fig. 6e), additional network connectivity and influence measures were calculated for each node in the SOXE-module topological overlap matrix using igraph toolkit functions in R:

betweenness centrality: betweenness(graph, v = V(graph), directed = FALSE, weights = NULL, nobigint = TRUE, normalised = FALSE).
eigencentrality: eigencentrality(graph, directed = FALSE, scale = TRUE, weights = NULL, options = arpack_defaults).

Finally, we used community detection algorithms [85,86] to examine the substructure of the SOXE-module (MATLAB 2020a), using the adjacency matrix as input. This revealed a hierarchical, sub-modular organisation, and consistently discriminated two partitions (59 and 41% of nodes each). To identify the module ‘control centre’ and hub genes as points of structural vulnerability, submodule assignment was cross-referenced against clustered Cosine similarity data (Fig. 3b, Clustergrammer) with the same input (Supplementary Fig. 4).

Neural crest genesets
Geneset-1 (NC terms) comprises 308 genes represented in at least two of the 78 terms matching ‘neural crest’ and ‘human’ in the gene ontology database (http://geneontology.org). Geneset-2 (ch.NCSC) comprises the top 200 transcripts statistically over-represented in Sox10+ chick neural crest cells compared to all other embryo cells (fold-change 3.9–23.3; false discovery rate 9.3E−03–1.0E−15) [55] (Supplementary Table 11). The ch.NCSC gene set represents genes coordinately expressed with Sox10 in a stem cell state hence was also suitable for network analyses (see below). We used the singscore algorithm to score RNAseq datasets against the neural crest genesets at the individual sample level.

Breast cancer methylation data analyses
Methylation beta-values were derived from TCGA level-3 Illumina HM450k data as outlined above. Beta-values for all probes corresponding to TSS1500, TSS200 and 5′UTR regions in each sample were first normalised to correct for their bimodal distribution (median absolute deviation (MAD): Pβ − median(Pβ − median(Rβ)); where P = probe in the promoter region and R = all probes in promoter region). After filtering out genes with >2 missing probes and those for which >2% of samples were missing data, the final dataset included average MAD-normalised promoter methylation beta-values for 4482 genes (determined from a total of 518 samples with complete clinical annotation). Pairwise Spearman correlations were then calculated between each promoter region and each module eigengene across the sample cohort. Unsupervised hierarchical clustering of correlation values was performed in R using the Flashclust package based on the Euclidean distance method. Clusters were visualised and validated with the cluster package, using the Silhouette coefficient to confirm distinct clusters. To generate t-distributed stochastic neighbour embedding (t-SNE) plots, we used the Rtsne package (https://cran.r-project.org/web/packages/Rtsne/) on normalised beta methylation values, with 5000 iterations and a perplexity parameter of 40.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
Published datasets used in this paper are outlined in Table 3. Network data generated by the study are also outlined in Table 3, and available as supplementary data. Raw DNA methylation array data for FACS-sorted normal breast epithelial cell subsets are available from the Gene Expression Omnibus (GSE199579; Table 3).

23. Deblois, G. et al. Epigenetic switch-induced viral mimicry evasion in chemotherapy resistant breast cancer. Cancer Discov. 10, 1312–1329 (2020).
24. Dravis, C. et al. Epigenetic and transcriptomic profiling of mammary gland development and tumor models disclose regulators of cell state plasticity. Cancer Cell 34, 466–482 e6 (2018).
25. Hu, N., Strobl-Mazzulla, P. H. & Bronner, M. E. Epigenetic regulation in neural crest development. Dev. Biol. 396, 159–168 (2014).
26. Southard-Smith, E. M., Kos, L. & Pavan, W. J. Sox10 mutation disrupts neural crest development in Dom Hirschsprung mouse model. Nat. Genet. 18, 60–64 (1998).
27. Kim, J., Lo, L., Dormand, E. & Anderson, D. J. SOX10 maintains multipotency and inhibits neuronal differentiation of neural crest stem cells. Neuron 38, 17–31 (2003).
28. McKeown, S. J., Lee, V. M., Bronner-Fraser, M., Newgreen, D. F. & Farlie, P. G. Sox10 overexpression induces neural crest-like cells from all dorsoventral levels of the neural tube but inhibits differentiation. Dev. Dyn.
233, 430–444 (2005). 29. Dravis, C. et al. Sox10 regulates stem/progenitor and mesenchymal cell states in mammary epithelial cells. Cell Rep. 12, 2035–2048 (2015). CODE AVAILABILITY 30. Chen, Z. et al. FGF signaling activates a Sox9-Sox10 pathway for the formation This study used published code and/or publicly available tools (see Table 3). and branching morphogenesis of mouse ocular glands. Development 141, 2691–2701 (2014). 31. Athwal, H. K. et al. Sox10 regulates plasticity of epithelial progenitors toward Received: 9 October 2021; Accepted: 5 April 2022; secretory units of exocrine glands. Stem Cell Rep. 12, 366–380 (2019). 32. Guo, W. et al. Slug and Sox9 cooperatively determine the mammary stem cell state. Cell 148, 1015–1028 (2012). 33. Mertelmeyer, S. et al. The transcription factor Sox10 is an essential determinant of branching morphogenesis and involution in the mouse mammary gland. Sci. Rep. REFERENCES 10, 17807 (2020). 1. Fulford, L. G. et al. Basal-like grade III invasive ductal carcinoma of the breast: 34. Kim, Y. J. et al. Generation of multipotent induced neural crest by direct repro- patterns of metastasis and long-term survival. Breast Cancer Res. 9, R4 (2007). gramming of human postnatal fibroblasts with a single transcription factor. Cell 2. Prat, A. et al. Phenotypic and molecular characterization of the claudin-low Stem Cell 15, 497–506 (2014). intrinsic subtype of breast cancer. Breast cancer Res. 12, R68 (2010). 35. Ivanov, S. V. et al. Diagnostic SOX10 gene signatures in salivary adenoid cystic 3. Symmans, W. F. et al. Long-term prognostic risk after neoadjuvant chemotherapy and breast basal-like carcinomas. Br. J. Cancer 109, 444–451 (2013). associated with residual cancer burden and breast cancer subtype. J. Clin. Oncol. 36. Panaccione, A., Guo, Y., Yarbrough, W. G. & Ivanov, S. V. Expression profiling of 35, 1049–1060 (2017). clinical specimens supports the existence of neural progenitor-like stem cells in 4. Gao, R. et al. 
Punctuated copy number evolution and clonal stasis in triple- basal breast cancers. Clin. Breast Cancer 17, 298–306 e7 (2017). negative breast cancer. Nat. Genet. 48, 1119–1130 (2016). 37. Cimino-Mathews, A. et al. Neural crest transcription factor Sox10 is preferentially 5. Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus expressed in triple-negative and metaplastic breast carcinomas. Hum. Pathol. 44, genome sequencing. Nature 512, 155–160 (2014). 959–965 (2013). 6. Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by 38. Jamidi, S. K. et al. SOX10 as a sensitive marker for triple negative breast cancer. multiregion sequencing. Nat. Med. 21, 751–759 (2015). Histopathology 77, 936–948 (2020). 7. Yang, F. et al. Intratumor heterogeneity predicts metastasis of triple-negative 39. Burstein, M. D. et al. Comprehensive genomic analysis identifies novel subtypes breast cancer. Carcinogenesis 38, 900–909 (2017). and targets of triple-negative breast cancer. Clin. Cancer Res 21, 1688–1698 (2015). 8. Lin, B. et al. Modulating cell fate as a therapeutic strategy. Cell Stem Cell 23, 40. Hu, N., Strobl-Mazzulla, P. H., Simoes-Costa, M., Sanchez-Vasquez, E. & Bronner, M. 329–341 (2018). E. DNA methyltransferase 3B regulates duration of neural crest production via 9. Nguyen, D. X., Bos, P. D. & Massagué, J. Metastasis: from dissemination to organ- repression of Sox10. Proc. Natl Acad. Sci. USA 111, 17911–17916 (2014). specific colonization. Nat. Rev. Cancer 9, 274–284 (2009). 41. Strobl-Mazzulla, P. H. & Bronner, M. E. A PHD12-Snail2 repressive complex epi- 10. Gupta, P. B., Pastushenko, I., Skibinski, A., Blanpain, C. & Kuperwasser, C. Pheno- genetically mediates neural crest epithelial-to-mesenchymal transition. J. Cell Biol. typic plasticity: driver of cancer initiation, progression, and therapy resistance. 198, 999–1010 (2012). Cell Stem Cell 24,65–78 (2019). 42. Pellacani, D. et al. 
Analysis of normal human mammary epigenomes reveals cell- 11. Hinohara, K. & Polyak, K. Intratumoral heterogeneity: more than just mutations. specific active enhancer states and associated transcription factor networks. Cell Trends Cell Biol. 29, 569–579 (2019). Rep. 17, 2060–2074 (2016). 12. Bell, C. C. & Gilan, O. Principles and mechanisms of non-genetic resistance in 43. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast cancer. Br. J. Cancer 122, 465–472 (2020). tumours reveals novel subgroups. Nature 486, 346–352 (2012). 13. Granit, R. Z. et al. Regulation of cellular heterogeneity and rates of symmetric and 44. TCGA. Cancer Genome Atlas Network: Comprehensive molecular portraits of asymmetric divisions in triple-negative breast cancer. Cell Rep. 24, 3237–3250 human breast tumours. Nature 490,61–70 (2012). (2018). 45. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole- 14. Keller, P. J. et al. Defining the cellular precursors to human breast cancer. Proc. genome sequences. Nature 534,47–54 (2016). Natl Acad. Sci. USA 109, 2772–2777 (2012). 46. Daemen, A. et al. Modeling precision treatment of breast cancer. Genome Biol. 14, 15. Lim, E. et al. Aberrant luminal progenitors as the candidate target population for R110 (2013). basal tumor development in BRCA1 mutation carriers. Nat. Med. 15, 907–913 47. Neve, R. M. et al. A collection of breast cancer cell lines for the study of func- (2009). tionally distinct cancer subtypes. Cancer Cell 10, 515–527 (2006). 16. Molyneux, G. et al. BRCA1 basal-like breast cancers originate from luminal epi- 48. McCart Reed, A. E. et al. The Brisbane breast bank. Open J. Bioresour. 5, 5 (2018). thelial progenitors and not from basal stem cells. Cell Stem Cell 7, 403–417 (2010). 49. Saunus, J. M. et al. Multidimensional phenotyping of breast cancer cell lines to 17. Proia, T. A. et al. Genetic predisposition directs breast cancer phenotype by guide preclinical research. 
Breast Cancer Res. Treat. 167, 289–301 (2018). dictating progenitor cell fate. Cell Stem Cell 8, 149–163 (2011). 50. Qi, J. et al. SOX10 - A novel marker for the differential diagnosis of breast 18. Chaffer, C. L. et al. Normal and neoplastic nonstem cells can spontaneously metaplastic squamous cell carcinoma. Cancer Manag. Res 12, 4039–4044 (2020). convert to a stem-like state. Proc. Natl Acad. Sci. USA 108, 7950–7955 (2011). 51. McCart Reed, A. E. et al. Phenotypic and molecular dissection of metaplastic 19. Hinohara, K. et al. KDM5 histone demethylase activity links cellular transcriptomic breast cancer and the prognostic implications. J. Pathol. 247, 214–227 (2019). heterogeneity to therapeutic resistance. Cancer Cell. 34, 939–953 e9 (2018). 52. Zhang, B. & Horvath, S. A general framework for weighted gene co-expression 20. Risom, T. et al. Differentiation-state plasticity is a targetable resistance mechan- network analysis. Stat. Appl. Genet. Mol. Biol. 4, Article17 (2005). ism in basal-like breast cancer. Nat. Commun. 9, 3815 (2018). 53. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation 21. Flavahan, W. A., Gaskell, E. & Bernstein, B. E. Epigenetic plasticity and the hall- network analysis. BMC Bioinforma. 9, 559 (2008). marks of cancer. Science. 357, eaal2380 (2017). 54. Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different 22. Stirzaker, C. et al. Methylome sequencing in triple-negative breast cancer reveals subtypes of breast cancer: a pooled analysis of 3771 patients treated with distinct methylation clusters with prognostic value. Nat. Commun. 6, 5899 (2015). neoadjuvant therapy. Lancet Oncol. 19,40–50 (2018). npj Breast Cancer (2022) 57 Published in partnership with the Breast Cancer Research Foundation J.M. Saunus et al. 55. Simoes-Costa, M., Tan-Cabugao, J., Antoshechkin, I., Sauka-Spengler, T. & Bronner, 87. Fernandez, N. F. et al. Clustergrammer, a web-based heatmap visualization and M. E. 
Transcriptome analysis reveals novel players in the cranial neural crest gene analysis tool for high-dimensional biological data. Sci. Data 4, 170151 (2017). regulatory network. Genome Res. 24, 281–290 (2014). 88. Foroutan, M. et al. Single sample scoring of molecular phenotypes. BMC Bioin- 56. Pellacani, D., Tan, S., Lefort, S. & Eaves, C. J. Transcriptional regulation of normal forma. 19, 404 (2018). human mammary cell heterogeneity and its perturbation in breast cancer. EMBO 89. Kalita-de Croft, P. et al. Clinicopathologic significance of nuclear HER4 and J. 38, e100330 (2019). phospho-YAP(S(127)) in human breast cancers and matching brain metastases. 57. Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on Ther. Adv. Med. Oncol. 12, 1758835920946259 (2020). mutational signatures. Nat. Med. 23, 517–525 (2017). 90. Tarek, M. A. et al. SPAG5 as a prognostic biomarker and chemotherapy sensitivity 58. Hayano, M. et al. DNA break-induced epigenetic drift as a cause of mammalian predictor in breast cancer: a retrospective integrated genomic transcriptomic and aging. Preprint at bioRxiv https://doi.org/10.1101/808659 (2019). protein analysis. Lancet Oncol 17, 1004–1018 (2016). 59. Yang, J.-H. et al. Erosion of the Epigenetic Landscape and Loss of Cellular Identity as 91. Tarek, M. A. et al. Association of Sperm-Associated Antigen 5 and Treatment a Cause of Aging in Mammals. BioRxiv preprint: https://doi.org/10.1101/808642. Response in Patients With Estrogen Receptor–Positive Breast Cancer. JAMA Network (2019). Open 3, e209486 (2020). 60. Zhou, W. et al. DNA methylation loss in late-replicating domains is linked to 92. Kalaw, E. et al. Metaplastic breast cancers frequently express immune checkpoint mitotic cell division. Nat. Genet. 50, 591–602 (2018). markers FOXP3 and PD-L1. Br J Cancer 123, 1665–1672 (2020). 61. Medvedeva, Y. A. et al. EpiFactors: a comprehensive database of human epige- 93. Boyle, E. I. et al. 
GO::TermFinder–open source software for accessing Gene netic factors and complexes. Database 2015, bav067 (2015). Ontology information and finding significantly enriched Gene Ontology terms 62. Jin, C. et al. TET1 is a maintenance DNA demethylase that prevents methylation associated with a list of genes. Bioinformatics 20, 3710–3715 (2004). spreading in differentiated cells. Nucleic Acids Res. 42, 6956–6971 (2014). 94. Liu, J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High- 63. Putiri, E. L. et al. Distinct and overlapping control of 5-methylcytosine and Quality Survival Outcome Analytics. Cell 173, 400–416.e11 (2018). 5-hydroxymethylcytosine by the TET proteins in human cancer cells. Genome Biol. 95. Zheng, X., Zhang, N., Wu, H. J. & Wu, H. Estimating and accounting for tumor 15, R81 (2014). purity in the analysis of DNA methylation data from cancer studies. Genom Biol 64. Good, C. R. et al. TET1-mediated hypomethylation activates oncogenic signaling 18, https://doi.org/10.1186/s13059-016-1143-5 (2017). in triple-negative breast cancer. Cancer Res. 78, 4126–4137 (2018). 65. Feinberg, A. P., Ohlsson, R. & Henikoff, S. The epigenetic progenitor origin of human cancer. Nat. Rev. Genet. 7,21–33 (2006). ACKNOWLEDGEMENTS 66. Wahl, G. M. & Spike, B. T. Cell state plasticity, stem cells, EMT, and the generation We thank the many thousands of patients who have donated tissue for cancer of intra-tumoral heterogeneity. NPJ Breast Cancer 3, 14 (2017). research, and clinical staff who facilitate biobanking, particularly the Brisbane 67. Visvader, J. E. & Stingl, J. Mammary stem cells and the differentiation hierarchy: Breast Bank and Pathology Queensland. We acknowledge the support of Metro current status and perspectives. Genes Dev. 28, 1143–1158 (2014). North Hospital and Health Services for the collection of the clinical subject data 68. Liao, C. & Zhang, Q. BBOX1 promotes triple-negative breast cancer progression by controlling IP3R3 stability. 
Mol. Cell Oncol. 7, 1813526 (2020). and clinical subject materials. We are grateful to Dr Lynne Reid and Clay Winterford 69. Liao, C. et al. Identification of BBOX1 as a therapeutic target in triple-negative for valuable contributions; Dr Katia Nones (QIMR Berghofer) who supervised XMDL; breast cancer. Cancer Discov. 10, 1706–1721 (2020). Dr Chris Schmidt (QIMR Berghofer) and Prof. Alex Swarbrick (Garvan Institute) for 70. Zhu, L., Pan, R., Zhou, D., Ye, G. & Tan, W. BCL11A enhances stemness and donating cell lines; Dr William Cockburn and clinical staff (Wesley Hospital) for promotes progression by activating Wnt/beta-catenin signaling in breast cancer. normal breast tissue collections; Drs. Nic Waddell and Olga Kondrashova Cancer Manag. Res. 11, 2997–3007 (2019). (QIMR Berghofer) for supportive data analyses; and Drs. Juliet French (QIMR 71. Errico, A. Genetics: BCL11A-targeting triple-negative breast cancer? Nat. Rev. Clin. Berghofer) and Delphine Merino (Olivia Newton-John Cancer Research Institute) Oncol. 12, 127 (2015). for critical feedback. This study makes use of data generated by the Molecular 72. Khaled, W. T. et al. BCL11A is a triple-negative breast cancer gene with critical Taxonomy of Breast Cancer International Consortium, funded by Cancer Research functions in stem and progenitor cells. Nat. Commun. 6, 5987 (2015). UK, and the British Columbia Cancer Agency Branch. It was funded by NHRMC 73. Saggese, P. et al. Metabolic regulation of epigenetic modifications and cell dif- programme awards to S.R.L., G.C.-T. and K.K.K. (APP1017028 and APP1113867), ferentiation in cancer. Cancers 12, 3788 (2020). NHRMC project grants to PTS (APP1080985 and APP1164770) and an Australian 74. Simoes-Costa, M. & Bronner, M. E. Establishing neural crest identity: a gene Leadership Award to A.R. regulatory recipe. Development 142, 242–257 (2015). 75. Saunus, J. M. et al. 
Integrated genomic and transcriptomic analysis of human brain metastases identifies alterations of potential clinical significance. J. Pathol. 237, 363–378 (2015). AUTHOR CONTRIBUTIONS 76. Johnston, R. L. et al. High content screening application for cell-type specific Conception and design: J.M.S., X.M.D.L., K.N., A.R., D.V.N., P.T.S. and S.R.L. Data behaviour in heterogeneous primary breast epithelial subpopulations. Breast collection/contribution: J.M.S., X.M.D.L., K.N., A.R., A.H., A.E.M.R., M.L., A.C.V., J.R.K., Cancer Res. 18, 18 (2016). A.J.D., M.M., E.K., P.K.-d.C., I.G., F.A.-E., J.M.W.G., C.O., K.K.K., J.B., G.C.-T., A.R.G., E.A.R., 77. Pavey, S. et al. Microarray expression profiling in melanoma reveals a BRAF I.O.E., D.V.N. and P.T.S. Data analysis: J.M.S., X.M.D.L., K.N., A.R., A.H., S.L., D.V.N. mutation signature. Oncogene 23, 4060–4067 (2004). Manuscript drafting: J.M.S., A.E.M.R., D.V.N., P.T.S. and S.R.L. All authors read and 78. Momeny, M. et al. Heregulin-HER3-HER2 signaling promotes matrix approved the final manuscript. metalloproteinase-dependent blood-brain-barrier transendothelial migration of human breast cancer cell lines. Oncotarget 6, 3932–3946 (2015). 79. Vargas, A. C. et al. Gene expression profiling of tumour epithelial and stromal compartments during breast cancer progression. Breast Cancer Res. Treat. 135, COMPETING INTERESTS 153–165 (2012). The authors declare no competing interests. 80. Tian, Y. et al. ChAMP: updated methylation analysis pipeline for Illumina Bead- Chips. Bioinformatics 33, 3982–3984 (2017). 81. Aran, D., Sirota, M. & Butte, A. J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015). ADDITIONAL INFORMATION 82. Ritchie, M. E. et al. limma powers differential expression analyses for RNA- Supplementary information The online version contains supplementary material sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015). available at https://doi.org/10.1038/s41523-022-00425-x. 83. 
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. Usa. 102, Correspondence and requests for materials should be addressed to Jodi M. Saunus 15545–15550 (2005). or Sunil R. Lakhani. 84. Supek, F., Bosnjak, M., Skunca, N. & Smuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE 6, e21800 (2011). Reprints and permission information is available at http://www.nature.com/ 85. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of reprints communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008). 86. Lambiotte, R., Delvenne, J. C. & Barahona, M. IEEE Trans. Netw. Sci. Eng. 1, 76-90 Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims https://doi.org/10.1109/TNSE.2015.2391998 (2014). in published maps and institutional affiliations. Published in partnership with the Breast Cancer Research Foundation npj Breast Cancer (2022) 57 J.M. Saunus et al. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/. 
© The Author(s) 2022 npj Breast Cancer (2022) 57 Published in partnership with the Breast Cancer Research Foundation

Journal

npj Breast Cancer (Springer Journals)

Published: May 2, 2022

References