(2018)
Publisher's Note. Anaesthesia, 73
Steffen Klamt, U. Haus, Fabian Theis (2009)
Hypergraphs and Cellular Networks. PLoS Computational Biology, 5
E. Schmidt, E. Birney, David Croft, B. Bono, P. D’Eustachio, M. Gillespie, Gopal Gopinath, B. Jassal, S. Lewis, L. Matthews, L. Stein, Imre Vastrik, Guanming Wu (2004)
Reactome: a knowledgebase of biological pathways. Nucleic Acids Research, 33
M. Chin, K. Maemura, S. Fukumoto, M. Jain, M. Layne, Masafumi Watanabe, C. Hsieh, Mu-En Lee (2000)
Cardiovascular Basic Helix Loop Helix Factor 1, a Novel Transcriptional Repressor Expressed Preferentially in the Developing and Adult Cardiovascular System. The Journal of Biological Chemistry, 275
Michael Schwob, J. Zhan, A. Dempsey (2019)
Modeling Cell Communication with Time-Dependent Signaling Hypergraphs. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 18
V. Acuña, P. Milreu, Ludovic Cottret, A. Marchetti-Spaccamela, L. Stougie, M. Sagot (2012)
Algorithms and complexity of enumerating minimal precursor sets in genome-wide metabolic networks. Bioinformatics, 28(19)
Zhenjun Hu, J. Mellor, Jie Wu, M. Kanehisa, Joshua Stuart, C. DeLisi (2007)
Towards zoomable multidimensional maps of the cell. Nature Biotechnology, 25
Maya Huguenin, E. Müller, Sandra Trachsel-Rösmann, B. Oneda, Daniel Ambort, E. Sterchi, D. Lottaz (2008)
The Metalloprotease Meprinβ Processes E-Cadherin and Weakens Intercellular Adhesion. PLoS ONE, 3
G. Ausiello, L. Laura (2017)
Directed hypergraphs: Introduction and fundamental algorithms - A survey. Theor. Comput. Sci., 658
Lenwood Heath, A. Sioson (2009)
Semantics of Multimodal Network Models. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6
Wanding Zhou, Luay Nakhleh (2011)
Properties of metabolic graphs: biological organization or representation artifacts? BMC Bioinformatics, 12
S. Miravet, J. Piedra, J. Castaño, Imma Raurell, C. Francí, M. Duñach, A. Herreros (2003)
Tyrosine Phosphorylation of Plakoglobin Causes Contrary Effects on Its Association with Desmosomes and Adherens Junction Components and Modulates β-Catenin-Mediated Transcription. Molecular and Cellular Biology, 23
(Cottret L, Vieira Milreu P, Acuña V, Marchetti-Spaccamela A, Viduani Martinez F, Sagot M-F, Stougie L. Enumerating precursor sets of target metabolites in a metabolic network. In: Proceedings of the 8th Workshop on Algorithms in Bioinformatics (WABI). 2008. p. 233–244.)
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations
L. Nielsen, D. Pretolani (2001)
A remark on the definition of B-hyperpath
Anna Ritz, T. Murali (2014)
Pathway analysis with signaling hypergraphs. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
H. Hamacher, M. Queyranne (1985)
K best solutions to combinatorial optimization problems. Annals of Operations Research, 4
Pablo Carbonell, Davide Fichera, S. Pandit, J. Faulon (2012)
Enumerating metabolic pathways for the production of heterologous target chemicals in chassis organisms. BMC Systems Biology, 6
Spencer Krieger, J. Kececioglu (2021)
Fast Approximate Shortest Hyperpaths for Inferring Pathways in Cell Signaling Hypergraphs
B. Jiang, E. Rue, Guang-lan Wang, Rick Roe, G. Semenza (1996)
Dimerization, DNA Binding, and Transactivation Properties of Hypoxia-inducible Factor 1. The Journal of Biological Chemistry, 271
(Krieger S, Kececioglu J. Hhugin: hypergraph heuristic for general shortest source-sink hyperpaths, version 1.0. 2021 http://hhugin.cs.arizona.edu)
(2021)
Hhugin: hypergraph heuristic for general shortest source-sink hyperpaths, version
E. Demir, Michael Cary, S. Paley, Ken Fukuda, C. Lemer, Imre Vastrik, Guanming Wu, P. D’Eustachio, C. Schaefer, Joanne Luciano, F. Schacherer, Irma Martínez-Flores, Zhenjun Hu, V. Jiménez-Jacinto, G. Joshi-Tope, K. Kandasamy, Alejandra López-Fuentes, H. Mi, E. Pichler, I. Rodchenkov, A. Splendiani, S. Tkachev, Jeremy Zucker, Gopal Gopinath, H. Rajasimha, R. Ramakrishnan, I. Shah, M. Syed, Nadia Anwar, O. Babur, M. Blinov, Erik Brauner, D. Corwin, S. Donaldson, F. Gibbons, Robert Goldberg, P. Hornbeck, Augustin Luna, Peter Murray-Rust, Eric Neumann, Oliver Reubenacker, M. Samwald, Martijn Iersel, S. Wimalaratne, Keith Allen, Burk Braun, M. Whirl‐Carrillo, K. Dahlquist, A. Finney, M. Gillespie, E. Glass, L. Gong, R. Haw, Michael Honig, Olivier Hubaut, D. Kane, Shiva Krupa, M. Kutmon, Julie Leonard, D. Marks, D. Merberg, V. Petri, A. Pico, Dean Ravenscroft, Liya Ren, N. Shah, M. Sunshine, Rebecca Tang, R. Whaley, Stan Letovksy, K. Buetow, A. Rzhetsky, V. Schachter, B. Sobral, U. Dogrusoz, S. McWeeney, M. Aladjem, E. Birney, J. Collado-Vides, S. Goto, M. Hucka, N. Novère, N. Maltsev, A. Pandey, P. Thomas, E. Wingender, P. Karp, C. Sander, Gary Bader (2010)
BioPAX – A community standard for pathway data sharing. Nature Biotechnology, 28
Nicholas Franzese, Adam Groce, T. Murali, Anna Ritz (2019)
Hypergraph-based connectivity measures for signaling pathway topologies. PLoS Computational Biology, 15
Ludovic Cottret, P. Milreu, V. Acuña, A. Marchetti-Spaccamela, F. Martinez, M. Sagot, L. Stougie (2008)
Enumerating Precursor Sets of Target Metabolites in a Metabolic Network
S. Dolomatov, Elizaveta Ageeva, Walery Zukow (2018)
MOLECULAR BIOLOGY OF THE CELL. Color Atlas of Clinical Hematology
Tobias Christensen, A. Oliveira, J. Nielsen (2009)
Reconstruction and logical modeling of glucose repression signaling pathways in Saccharomyces cerevisiae. BMC Systems Biology, 3
X. Hou, Jessica Liu, W. Liu, C.-Y. Liu, Z. Liu, Z.-Y. Sun (2011)
A new role of NUAK1: directly phosphorylating p53 and regulating cell proliferation. Oncogene, 30
T. Iso, Gene Chung, Y. Hamamori, L. Kedes (2002)
HERP1 Is a Cell Type-specific Primary Target of Notch. The Journal of Biological Chemistry, 277
Ublishing Roup, Site Icenses (2008)
Nature Publishing Group
Xin-She Yang (2021)
Introduction to Algorithms. Nature-Inspired Optimization Algorithms
M. Vidal, M. Cusick, A. Barabási (2011)
Interactome Networks and Human Disease. Cell, 144
Ricardo Andrade, Martin Wannagat, C. Klein, V. Acuña, A. Marchetti-Spaccamela, P. Milreu, L. Stougie, M. Sagot (2016)
Enumeration of minimal stoichiometric precursor sets in metabolic networks. Algorithms for Molecular Biology, 11
(Ramadan E, Tarafdar A, Pothen A. A hypergraph model for the yeast protein complex network. In: Proceedings of the 18th Parallel and Distributed Processing Symposium. 2004. p. 189–196.)
G. Italiano, U. Nanni (1989)
Online Maintenance of Minimal Directed Hypergraphs
E. Demir, O. Babur, I. Rodchenkov, B. Aksoy, Ken Fukuda, Benjamin Gross, Selçuk Sümer, Gary Bader, C. Sander (2013)
Using Biological Pathway Data with Paxtools. PLoS Computational Biology, 9
R. Sharan, T. Ideker (2006)
Modeling cellular machinery through biological network comparison. Nature Biotechnology, 24
(Ramadan E, Perincheri S, Tuck D. A hyper-graph approach for analyzing transcriptional networks in breast cancer. In: Proceedings of the 1st ACM Conference on Bioinformatics and Computational Biology (ACM-BCB). 2010:556–562.)
Anna Ritz, A. Tegge, Hyunju Kim, Christopher Poirel, T. Murali (2014)
Signaling hypergraphs. Trends in Biotechnology, 32(7)
(Ritz A, Murali TM. Pathway analysis with signaling hypergraphs. In: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB). 2014. p. 249–258.)
E. Ramadan, S. Perincheri, D. Tuck (2010)
A hyper-graph approach for analyzing transcriptional networks in breast cancer
Felipe Palacios, J. Tushir, Yasuyuki Fujita, C. D’Souza-Schorey (2005)
Lysosomal Targeting of E-Cadherin: a Unique Mechanism for the Down-Regulation of Cell-Cell Adhesion during Epithelial to Mesenchymal Transitions. Molecular and Cellular Biology, 25
(2017)
Pathway analysis with signaling hypergraphs. IEEE/ACM Trans Comput Biol Bioinform, 14
(2009)
PID: the pathway interaction database. Nucl Acids Res, 37
G. Gallo, Giustino Longo, S. Pallottino (1993)
Directed Hypergraphs and Applications. Discret. Appl. Math., 42
Yongsheng Li, D. McGrail, N. Latysheva, S. Yi, M. Babu, Nidhi Sahni (2020)
Pathway perturbations in signaling networks: Linking genotype to phenotype. Seminars in Cell & Developmental Biology
(Nielsen LR, Pretolani D. A remark on the definition of a B-hyperpath. Technical Report, Department of Operations Research, University of Aarhus. 2001.)
(Italiano GF, Nanni U. Online maintenance of minimal directed hypergraphs. Technical Report, Department of Computer Science, Columbia University. 1989.)
B Alberts (2007)
10.1201/9780203833445
E. Ramadan, Arijit Tarafdar, A. Pothen (2004)
A hypergraph model for the yeast protein complex network. 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
(Krieger S, Kececioglu J. Fast approximate shortest hyperpaths for inferring pathways in cell signaling hypergraphs. In: Proceedings of the 21st ISCB Workshop on Algorithms in Bioinformatics (WABI). Leibniz International Proceedings in Informatics, vol 201. 2021. p. 1–20.)
Background: Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The current state-of-the-art for shortest hyperpaths in cell signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees.

Results: We present, for the first time, a heuristic for general shortest hyperpaths that properly handles cycles, and is guaranteed to be efficient. We show the heuristic finds provably optimal hyperpaths for the class of singleton-tail hypergraphs, and also give a practical algorithm for tractably generating all source-sink hyperpaths. The accuracy of the heuristic is demonstrated through comprehensive experiments on all source-sink instances from the standard NCI-PID and Reactome pathway databases, which show it finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state-of-the-art, which finds no solution; on every such cyclic instance, enumerating all source-sink hyperpaths shows the solution found by the heuristic was in fact optimal.

Conclusions: The new shortest hyperpath heuristic is both fast and accurate. This makes finding source-sink hyperpaths, which in general may contain cycles, now practical for real cell signaling networks.
Availability: Source code for the hyperpath heuristic in a new tool we call Hhugin (as well as for hyperpath enumeration, and all dataset instances) is available free for non-commercial use at http://hhugin.cs.arizona.edu.

Keywords: Systems biology, cell signaling networks, reaction pathways, directed hypergraphs, shortest hyperpaths, efficient heuristics, hyperpath enumeration

*Correspondence: spencer.krieger@gmail.com
Department of Computer Science, The University of Arizona, Tucson, Arizona 85721, USA

Krieger and Kececioglu, Algorithms for Molecular Biology (2022) 17:12

© The Author(s) 2022, corrected publication 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Background

Cell signaling pathways are cornerstones of molecular and cellular biology. They underlie cellular communication, govern environmental response, and their perturbation has been implicated in the cause of many diseases [1]. While signaling pathways have classically been modeled as ordinary graphs, using directed or undirected edges to link pairs of interacting molecules [2, 3], both Klamt, Haus and Theis [4] and Ritz, Tegge, Kim, Poirel and Murali [5] have shown that ordinary graphs cannot adequately represent cellular activity that involves the assembly and disassembly of protein complexes, or multiway reactions among such complexes.

Directed hypergraphs are generalizations of ordinary graphs where an edge, now called a hyperedge, is directed from one set of vertices, called its tail, to another set of vertices, called its head. Hypergraphs have been used to model many cellular processes [4–12]. In particular, a biochemical reaction that involves multiple reactants, all of which must be present for the reaction to proceed, and that results in multiple products, all of which are produced upon its completion, is correctly captured by a single hyperedge directed from its set of reactants to its set of products. Despite hypergraphs affording more faithful models of reaction networks, the lack of practical hypergraph algorithms has hindered their potential for properly representing and reasoning about molecular reactions.

Biologically, a typical cell-signaling pathway consists of membrane-bound receptors that bind to extracellular ligands, triggering intracellular cascades of reactions, culminating in the activation of transcriptional regulators and factors [13]. Computationally, treating receptors as sources, and transcription factors as targets, finding the most efficient way to synthesize a particular transcription factor from a set of receptors maps to the shortest hyperpath problem we consider here: Given a cell-signaling network whose reactants and reactions are modeled by the vertices and weighted hyperedges of a directed hypergraph, together with a set of sources and a target, find a hyperpath consisting of hyperedges from the sources to the target of minimum total weight. We briefly summarize prior work on related problems next.

Related work

Hypergraphs have been studied in the algorithms community [14–16], and applied within systems biology to metabolic networks [17–20] and cell-signaling networks [12, 21–23].

In the field of algorithms, Italiano and Nanni [14] first proved that finding a shortest source-sink hyperpath is NP-complete, even when hyperedges have a single head vertex. In a seminal paper that is the source for much of the subsequent work on hypergraphs, Gallo, Longo, Pallottino and Nguyen [15] explore special cases of hypergraphs, and define several versions of hyperpaths, including what they call a B-path (though see the correction of Nielsen and Pretolani [24]), which is essentially equivalent to our definition of hyperpath (given in the following section on shortest hyperpaths in directed hypergraphs). They show the vertices reachable from a source vertex in a hypergraph can be found in time linear in the total size of the tail and head sets of all hyperedges, give an efficient algorithm for a variant of shortest hyperpaths with a so-called additive cost function, and prove that finding a minimum cut in a hypergraph is NP-complete. Ausiello and Laura [16] survey results on hypergraphs whose hyperedges have singleton head sets, and note that a consequence of the NP-completeness reduction [14] for shortest hyperpaths from the set cover problem is that, unless P = NP, no approximation algorithm can exist for shortest hyperpaths on hypergraphs of n vertices with approximation ratio (1 − o(1)) ln n.

In metabolic networks, Cottret, Milreu and Acuña et al. [17] examine the minimum precursor problem: given a hypergraph G, a set of sources S, and a set of targets T, find a source subset P ⊆ S of minimum cardinality that has a factory from P to T, where a factory is a set of hyperedges that produce targets T from precursor set P while satisfying weaker ordering constraints on hyperedges than required by hyperpaths. They show this problem is NP-complete, and give an algorithm that enumerates all minimal precursor sets whose factory is acyclic. Acuña, Milreu and Cottret et al. [18] subsequently enumerate all minimal precursor sets allowing cycles. Andrade, Wannagat and Klein et al. [19] extend these algorithms to accommodate stoichiometry and conserve intermediate metabolites within the factory. Carbonell, Fichera, Pandit and Faulon [20] give an efficient algorithm to find a source-sink hyperpath if one exists, irrespective of its length, and prove that finding any hyperpath that must contain a specified set of hyperedges is NP-complete. They also offer an approach to hyperpath enumeration that relies on solutions to this NP-complete problem, for which they employ a heuristic.

In cell-signaling networks, Ritz, Avent and Murali [12, 21] were the first to solve the shortest acyclic hyperpath problem by formulating it as a mixed-integer linear program (MILP), the current state-of-the-art for shortest hyperpaths, and showed that in practice, optimal acyclic hyperpaths can be found even for large cell-signaling hypergraphs. Their formulation does not extend to hyperpaths with cycles, and requires exponential time in the worst-case (which may be unavoidable, as the acyclic problem remains NP-complete). Recently, Franzese, Groce, Murali and Ritz [22] defined a parameterized notion of connectivity that interpolates between hyperpath- and ordinary-path-connectivity, while Schwob, Zhan and Dempsey [23] modified the acyclic MILP of Ritz et al. [21] to include time-dependence among reactions.

Our contributions

In contrast to prior work, we present a heuristic for shortest hyperpaths that handles cycles, is worst-case efficient, and finds hyperpaths that are demonstrably optimal or close to optimal in real cell-signaling hypergraphs. In more detail, we make the following contributions.

• We present an efficient heuristic for shortest hyperpaths, that on a hypergraph of size ℓ, which measures the total cardinality of all hyperedge tail and head sets, with m hyperedges that are doubly-reachable from the source and sink vertices, and k defined analogously to ℓ over these doubly-reachable hyperedges, runs in O(ℓ + m k) time.
• We prove that the heuristic finds an optimal shortest hyperpath for the class of singleton-tail hypergraphs, where the tails of all hyperedges in the hypergraph are single vertices.
• We also give a practical algorithm for hyperpath enumeration that generates all possible source-sink hyperpaths, allowing us to tractably measure how close our heuristic is to the optimum.
• Our heuristic matches the state-of-the-art MILP for shortest acyclic hyperpaths on over 99% of all instances from two standard databases of cell-signaling pathways.
• Our heuristic surpasses the state-of-the-art on instances where every source-sink hyperpath is cyclic, and hence the MILP finds no solution. On all such cyclic biological instances, our hyperpath enumeration algorithm verified that the heuristic was in fact optimal.

To our knowledge, this heuristic is the first in the literature for shortest source-sink hyperpaths in general directed hypergraphs, where hyperedges have arbitrary tail and head sets, and the length of a hyperpath is the sum of the weights of its hyperedges.

We note that the worst-case efficiency and subclass optimality of the heuristic highlighted in the first two points above show that the shortest hyperpaths problem is polynomial-time solvable for singleton-tail hypergraphs (in contrast to its NP-completeness for singleton-head hypergraphs [14]), which does not appear to have been observed before in the literature [16]. Furthermore, while prior work has developed specialized algorithms that are tailored to shortest hyperpaths under so-called additive cost functions [15] (which also handle singleton-tail hypergraphs), in distinction, we give a general heuristic for arbitrary hypergraphs under the non-additive cost function of total weight of the hyperpath, that as a consequence is optimal for the special case of singleton-tail hypergraphs.

Source code for an implementation of the shortest hyperpath heuristic in a new tool we call Hhugin [25] (short for “hypergraph heuristic for general shortest source-sink hyperpaths”), as well as the hyperpath enumeration algorithm and all dataset instances, is available free for non-commercial use at http://hhugin.cs.arizona.edu.

Plan of the paper

The next section defines the general shortest hyperpath problem, allowing cycles. The following section then presents our heuristic for shortest hyperpaths, analyzes its time complexity, shows it returns a feasible solution whenever one exists, and proves it finds optimal solutions for singleton-tail hypergraphs. The next section gives our algorithm for generating all source-sink hyperpaths, proves its correctness, and analyzes its time complexity. The subsequent section compares the heuristic, through experiments on all source-sink instances from standard databases, to the state-of-the-art MILP for acyclic instances, or to the optimum of all enumerated hyperpaths for cyclic instances, and discusses three examples of cyclic shortest hyperpaths in cell signaling networks. Finally, the last section concludes, and provides directions for further research.

Shortest hyperpaths in directed hypergraphs

A directed hypergraph is a generalization of an ordinary directed graph, where an edge, instead of touching two vertices, now connects two subsets of vertices. Formally, a directed hypergraph is a pair (V, E), where V is a set of vertices, and E is a set of directed hyperedges. (The literature sometimes uses the term hyperarc for an edge in a directed hypergraph, but we prefer the simpler term hyperedge, just as the term edge is conventionally used for both directed and undirected ordinary graphs. We will occasionally abbreviate the term hyperedge to simply edge, when it is clear that the context is with respect to a directed hypergraph.) Each hyperedge e ∈ E is an ordered pair (X, Y), where both X, Y ⊆ V are vertex subsets. Edge e is directed from set X to set Y. We call set X the tail of e, and set Y the head of e, and refer to these sets by the functions tail(e) = X and head(e) = Y. We also refer to the in-edges of vertex v by in(v) := {e ∈ E : v ∈ head(e)}, and the out-edges of v by out(v) := {e ∈ E : v ∈ tail(e)}. Figure 1 shows a directed hyperedge.

Fig. 1 Hyperedge. A hyperedge e with tail(e) = {v_1, …, v_k} and head(e) = {w_1, …, w_ℓ}. To use e in a hyperpath P, every vertex v ∈ tail(e) must have a preceding hyperedge f in P with v ∈ head(f).

In ordinary directed graphs, a path from a vertex s to a vertex t is a sequence of edges starting from s and ending at t, where for consecutive edges e and f in the sequence, the preceding edge e must enter the vertex that the following edge f leaves. We say t is reachable from s when there is such a path from s to t.

In generalizing these notions to directed hypergraphs, the conditions both for when a hyperedge can follow another in a hyperpath, and when a vertex is reachable from another, become more involved. A hyperpath is again a sequence of hyperedges, but now for hyperedge f in a hyperpath, for every vertex v ∈ tail(f), there must be some hyperedge e that precedes f in the hyperpath for which v ∈ head(e). Reachability is captured by the following notion of superpath.

Definition 1 (Superpath) In a directed hypergraph (V, E), an s,t-superpath, for vertices s, t ∈ V, is an edge subset F ⊆ E such that the hyperedges of F can be ordered e_1, e_2, …, e_k, where

(i) tail(e_1) = {s},
(ii) for each 1 < i ≤ k, tail(e_i) ⊆ {s} ∪ ⋃_{1 ≤ j < i} head(e_j),
(iii) and t ∈ head(e_k).

For an s,t-superpath, we call s its source, t its sink, and we say t is reachable from s.

We can now define hyperpaths in terms of superpaths. Recall that a set S is minimal with respect to some property X if S satisfies X, but no proper subset of S satisfies X.

Definition 2 (Hyperpath) An s,t-hyperpath is a minimal s,t-superpath.

In other words, a hyperpath P is a superpath for which removing any edge e ∈ P leaves a subset P − {e} that is no longer a superpath. Essentially, hyperpaths eliminate unnecessary edges from superpaths. Figures 7, 8, and 9 later show examples of hyperpaths.

We say a hyperpath P contains a cycle if, for every ordering e_1, …, e_k of its hyperedges satisfying properties (i)–(iii) in the definition of superpath, P contains some hyperedge f with a vertex in head(f) that also occurs in tail(e) for an earlier hyperedge e in the ordering. While in ordinary graphs a minimal s,t-path can never contain a cycle, in hypergraphs an s,t-hyperpath can in fact contain cycles, as shown in our later section on biological examples.

We can now define the shortest hyperpath problem. For an edge weight function ω(e), we extend ω to edge subsets F ⊆ E by ω(F) := Σ_{e ∈ F} ω(e).

Definition 3 (Shortest Hyperpaths) The Shortest Hyperpaths problem is the following. Given a directed hypergraph (V, E), a positive edge weight function ω : E → R, source s ∈ V and sink t ∈ V, find an s,t-hyperpath P ⊆ E of minimum total weight ω(P).

Note that for positive edge weights, Shortest Hyperpaths is equivalent to finding an s,t-superpath of minimum total weight.

Shortest Hyperpaths with a single source and sink vertex also captures more general versions of the problem with multiple sources and multiple sinks, as follows. To find a hyperpath that starts from a set of sources S ⊆ V, simply add a new source vertex s to the hypergraph together with a single hyperedge ({s}, S) of zero weight, and equivalently find a hyperpath from the single source s. To find a hyperpath that reaches all vertices in a set of sinks T ⊆ V, add a new sink vertex t, a zero-weight hyperedge (T, {t}), and equivalently find a hyperpath to the single sink t. To find a hyperpath that reaches some vertex in a set of sinks T ⊆ V, add new sink vertex t, zero-weight hyperedges ({v}, {t}) from all v ∈ T, and again equivalently find a hyperpath to the single sink t. Thus versions of shortest hyperpaths with multiple sources and sinks can be reduced to the problem with a single source and sink.

Shortest Hyperpaths is NP-complete [14] (even for acyclic hypergraphs with singleton head sets), so we likely cannot efficiently compute shortest hyperpaths in the worst-case. The next section presents an efficient heuristic for shortest hyperpaths that is highly accurate at finding demonstrably optimal or near-optimal hyperpaths in real cell-signaling hypergraphs.

An efficient shortest hyperpath heuristic

We now give a fast heuristic for Shortest Hyperpaths that always finds an s,t-hyperpath if one exists. While the heuristic is not guaranteed to find a shortest s,t-hyperpath in general, our later experiments on real cell-signaling hypergraphs show it quickly finds a hyperpath that is optimal or remarkably close to optimal on the vast majority of instances in comprehensive experiments over the two standard cell-signaling databases in the literature. Furthermore, we will prove that the heuristic is guaranteed to find a shortest s,t-hyperpath for the class of singleton-tail hypergraphs, where the tail-sets of all hyperedges are single vertices.

We present the heuristic by providing detailed pseudocode at a level that can be directly implemented, as the heuristic is carefully designed and many of its component algorithms are surprisingly tricky to implement correctly. After describing the heuristic, we give a time analysis that shows it is always efficient, prove its feasibility, and then show that it finds optimal hyperpaths for singleton-tail hypergraphs.

While at a high level the heuristic has some aspects in common with Dijkstra’s algorithm for single-source shortest paths in an ordinary directed graph (see [26, pp. 658–659]), in that the heuristic maintains a heap of elements prioritized by estimated path lengths, it has significant differences. In contrast to Dijkstra’s algorithm, the heuristic is edge-based, rather than vertex-based, and the heap maintains hyperedges e prioritized by the length of the shortest known hyperpath from the source s to edge e, which will be formally defined later. Also in contrast to Dijkstra’s algorithm, maintaining a single in-edge to a vertex no longer suffices for recovering a path back to source s; instead, recovering an s,t-hyperpath now requires the heuristic to maintain a set of in-edges to each hyperedge e that are candidates for the final edges on the path from s to e. Furthermore, the total length of …

… subgraph of the input hypergraph that only contains hyperedges that are reachable both from source s and in reverse from sink t. Hence at base, the heuristic builds upon fast algorithms for computing reachability in a hypergraph.

Accordingly, to present the heuristic, we first give pseudocode for these fundamental algorithms for directed reachability. These algorithms use the following terminology of forward-reachable, backward-traceable, and doubly-reachable, which we define next.

Definition 4 (Reachability and Traceability) Vertex v is forward reachable from source s in hypergraph G if there is an s,v-superpath in G. Hyperedge e is forward reachable from s if all vertices v ∈ tail(e) are forward reachable from s.

Vertex v is backward traceable from sink t if v = t, or recursively if v ∈ tail(e) for an edge e where some w ∈ head(e) is backward traceable from t. Hyperedge e is backward traceable from t if some v ∈ head(e) is backward traceable from t.

A vertex v or hyperedge e is doubly reachable if v or e, respectively, is both forward reachable from s and backward traceable from t.

To describe the heuristic, it will also be convenient to extend the definitions of superpath and hyperpath to a path from a source s to a hyperedge e.
a hyperpath P to e is no longer a simple function (like a minimum or a sum) of the lengths of hyperpaths to the Definition 5 (Superpath and Hyperpath from Source in-edges of e in P that cover the tail of e, since the constit- to Hyperedge) An s,e-superpath is an edge subset S with uent hyperpaths within P to these in-edges of e can have e ∈ S where all vertices in tail(e) are forward reachable arbitrarily-complicated sharing of hyperedges. Simply from source s using hyperedges in S. An s,e-hyperpath is determining the length of the best recovered hyperpath a minimal s,e-superpath. for a hyperedge e on the heap, using these stored in-edges to each hyperedge, is itself now a hard combinatorial The pseudocode that we present accesses a hypergraph G problem, which the heuristic tackles by a carefully-con- through the fields G.vertices and G.edges. We access the structed greedy procedure. tail-set and head-set of a hyperedge e through the fields The overall structure of the heuristic is a breadth-first e.head and e.tail. We access the set of in-edges and out- search over the hyperedges e reachable from source s, edges of a vertex v through the fields v.in and v.out. For a ordered by the estimated length of the shortest hyper- list Q that is handled as a queue, the operation Q.Put(x) path from s to e. (Admittedly a shortest s,t-hyperpath P is appends item x to the rear of Q, while the operation not necessarily composed of shortest hyperpaths from s Q.Get() removes and returns the item at the front of Q. to individual hyperedges e in P, which is partly why this For a min-heap H, the operation H.Insert(x, k) inserts approach is a heuristic.) The search repeatedly invokes item x with key k into H, and returns a reference p to the a greedy procedure to recover the currently best-known heap node containing this pair (x, k) in H; the operation hyperpath to e in order to evaluate its length. 
As hyper- H.Extract() removes and returns the item in H with mini- paths are by definition minimal superpaths, to determine mum key; and the operation H.Decrease(p, k) takes a ref- minimality this greedy recovery procedure repeatedly erence p to a heap node in H and decreases its key to k tests reachability of hyperedges. Moreover, for efficiency, if k is smaller than its current key. All functions assume the overall breadth-first search proceeds over a smaller hypergraph G is passed by reference. Krieger and Kececioglu Algorithms for Molecular Biology (2022) 17:12 Page 6 of 24 function ForwardReachable (s, G) begin • Find all edges forward-reachable... • ... from source s in G Create queue Q • Initializea queue of reached vertices Q.Put(s) s.reached := true (F, R):=(∅, ∅) while not Q.Empty() do begin • Process the reached vertices v v := Q.Get() R ∪ := {v} for e ∈ v.out do begin • Detectwhich out-edges e of v arenow reached e.count −:= 1 if e.count= 0 then begin • All vertices in tail(e) have been reached... F ∪ := {e} • ...so e is reached for w ∈ e.head do if not w.reached thenbegin Q.Put(w) w.reached := true end end end end for v ∈ R do begin • Restore fields and return the reachable hyperedges v.reached:= false for e ∈ v.out do e.count+:= 1 end return F end function BackwardTraceable (t, G) begin • Find all edges backward-traceable ... • ... from sink t in G Create queue Q • Initializea queue of reached vertices Q.Put(t) t.reached := true (F, B):=(∅, ∅) while not Q.Empty() do begin • Process the reached vertices v v := Q.Get() B ∪ := {v} for e ∈ v.in do • Collect the traceable hyperedges e if not e.marked then begin F ∪ := {e} e.marked := true for w ∈ e.tail do if not w.reached thenbegin Q.Put(w) w.reached:= true end end end for v ∈ B do v.reached := false • Restore fields and return the traceablehyperedges for e ∈ F do e.marked:= false return F end Fig. 2 Reachability computations. 
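As an illustration only, the two reachability computations of Fig. 2 can be sketched in Python. The sketch assumes a hypothetical representation in which the hypergraph is a dictionary mapping an edge name to a (tail, head) pair of vertex sets; this representation, the function names, and the small example are assumptions made here, not part of the paper's pseudocode. Unlike Fig. 2, the sketch keeps its counters in local dictionaries, so there are no fields to restore afterward.

```python
from collections import deque

def forward_reachable(source, edges):
    """Return the names of hyperedges whose whole tail is reachable from source.

    edges: dict mapping edge name -> (tail, head), both sets of vertices
    (a hypothetical representation chosen for this sketch).
    """
    out_edges = {}
    for name, (tail, _head) in edges.items():
        for v in tail:
            out_edges.setdefault(v, []).append(name)
    # Number of tail vertices of each edge not yet reached (e.count in Fig. 2).
    count = {name: len(tail) for name, (tail, _head) in edges.items()}
    reached_vertices = {source}
    reached_edges = set()
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for name in out_edges.get(v, []):
            count[name] -= 1
            if count[name] == 0:          # every tail vertex has been reached
                reached_edges.add(name)
                for w in edges[name][1]:
                    if w not in reached_vertices:
                        reached_vertices.add(w)
                        queue.append(w)
    return reached_edges

def backward_traceable(sink, edges):
    """Return the names of hyperedges with some head vertex traceable back from sink."""
    in_edges = {}
    for name, (_tail, head) in edges.items():
        for v in head:
            in_edges.setdefault(v, []).append(name)
    seen_vertices = {sink}
    traced_edges = set()
    queue = deque([sink])
    while queue:
        v = queue.popleft()
        for name in in_edges.get(v, []):
            if name not in traced_edges:
                traced_edges.add(name)
                for w in edges[name][0]:
                    if w not in seen_vertices:
                        seen_vertices.add(w)
                        queue.append(w)
    return traced_edges

# Small made-up example: e3 needs vertex "x", which is never reached from "s",
# and e4 leaves the sink, so it cannot be traced back from "t".
example_edges = {
    "e1": ({"s"}, {"a", "b"}),
    "e2": ({"a", "b"}, {"t"}),
    "e3": ({"a", "x"}, {"t"}),
    "e4": ({"t"}, {"y"}),
}
doubly_reachable = (forward_reachable("s", example_edges)
                    & backward_traceable("t", example_edges))
```

Intersecting the two returned sets gives the doubly-reachable hyperedges, which in this example are e1 and e2.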
Function ForwardReachable, given source vertex s in hypergraph G, returns all hyperedges e for which tail(e) is reachable by a hyperpath from s. Function BackwardTraceable, given sink vertex t in G, returns all hyperedges e for which some vertex v ∈ head(e) is backward-traceable from t. These functions assume fields v.reached, e.marked, and e.count have been initialized to false, false, and |tail(e)|, respectively, for all v and e in G

Figure 2 gives pseudocode for the two functions ForwardReachable and BackwardTraceable. Function ForwardReachable returns the set of all hyperedges that are forward reachable from source s, while function BackwardTraceable returns the set of hyperedges that are backward traceable from sink t. Function ForwardReachable uses the Boolean vertex field v.reached, and the integer edge field e.count, which it assumes have already been initialized to the values v.reached = false for all v ∈ V and e.count = |tail(e)| for all e ∈ E. Function BackwardTraceable also uses the Boolean edge field e.marked, which it similarly assumes is initialized to false for all e. (This initialization will be done once for hypergraph G in the shortest hyperpath heuristic, which allows these functions when called repeatedly to run in time bounded by just the size of the forward-reachable or backward-traceable subgraphs.) Function ForwardReachable uses the field e.count to detect when all vertices in tail(e) have been reached from s, and hence e is now reached from s. Function BackwardTraceable performs a similar but simpler computation in reverse from sink t. The worst-case time for both these functions is linear in the size of the subgraph they explore, as analyzed in the following section on the time complexity of the heuristic.

Figure 3 gives pseudocode for the function ShortestHyperpathHeuristic, our heuristic. Like Dijkstra's shortest path algorithm for ordinary graphs, this function maintains a heap H, but in contrast to Dijkstra's algorithm this is now a heap of hyperedges e (rather than a heap of vertices), which are prioritized by keys that are the best known estimate of the length of a shortest s,e-hyperpath. We refer to this estimate as the current path length for e. The heuristic starts from the out-edges of source s, and in a reaching computation repeatedly extracts from heap H the hyperedge e with minimum key. When hyperedge e is removed from H, the estimated path length for e is evaluated, and stored in field e.length. To compute this length estimate, it must construct the best s,e-hyperpath it can find, and evaluate its total weight. Of course, computing an optimal s,e-hyperpath is NP-complete, so it uses a greedy heuristic to construct this path by the function RecoverShortHyperpath. This greedy path-construction heuristic consists of two steps: (1) recovering an s,e-superpath by recursively backward-traversing hyperedges that enter tail(e), followed by (2) finding a minimal subset of this superpath that is an s,e-hyperpath while attempting to minimize its total weight.

Figure 4 gives pseudocode for the function RecoverShortHyperpath that implements this greedy path-construction heuristic. For the first step, recovering the s,e-superpath S is done by recursively backward-traversing what we call in-edges: those hyperedges whose head-sets intersect the tail-set of a given hyperedge. Function ShortestHyperpathHeuristic maintains for a hyperedge e the field e.inedges, which stores the subset of in-edges f to e whose field f.length has been determined.

For the second step, function RecoverShortHyperpath attempts to find the minimum weight subset of S that is still a superpath by greedily considering hyperedges f ∈ S for removal in decreasing order of f.length, which is the estimated total length of a shortest s,f-hyperpath. (Note this is more sophisticated than a naive greedy approach that simply removes hyperedges f in decreasing order of their edge-weight ω(f), which would degenerate to removing edges in random order in real cell-signaling networks where hyperedges typically all have unit weight, and hence would all be tied for removal.) This greedy process for trimming superpath S repeatedly tests whether tail(e) is still reachable from s on removing f by calling Boolean function IsReachable. Pseudocode for IsReachable is not given, but it simply implements a version of function ForwardReachable that halts and returns true as soon as it adds e to the set of hyperedges reachable from s, or returns false after collecting the entire reachable set without encountering e.

We note that most of the computation of the shortest hyperpath heuristic proceeds over a much smaller subgraph of the input hypergraph G: namely the subgraph induced by the hyperedges D ⊆ E that are doubly reachable (both forward reachable from s and backward traceable from t). This preserves correctness, since hyperedges that are not doubly reachable cannot be on an s,t-hyperpath and can safely be ignored (as argued in the later section on feasibility of the heuristic in the proof of Theorem 2).

To summarize, the shortest hyperpath heuristic proceeds greedily like Dijkstra's algorithm, but with some important differences: it maintains a heap of hyperedges prioritized by estimated shortest path lengths to tail-sets, records a set of potential in-edges to a given hyperedge used for recovering a hyperpath to the hyperedge, and recovering such a hyperpath now involves another greedy heuristic to find a minimal superpath of small total weight.

Our later section on experimental results shows this heuristic is remarkably close to optimal on real cell-signaling hypergraphs. Given that no practical exact algorithm exists for general shortest hyperpaths, we determine the optimum by enumerating all s,t-hyperpaths and taking the minimum of their lengths, using an algorithm we develop in the later section on tractably generating all source-sink hyperpaths.

function ShortestHyperpathHeuristic(s, t, G, ω) begin         • Find a short s,t-hyperpath
    for v ∈ G.vertices do                                     • Initialize fields
        (v.reached, v.removed) := (false, false)
    for e ∈ G.edges do
        (e.count, e.marked, e.node, e.inedges) := (|e.tail|, false, nil, ∅)
    D := ForwardReachable(s, G) ∩ BackwardTraceable(t, G)     • Restrict G to doubly-reachable edges D
    Remove from G all edges not in D
    Create min-heap H                                         • Initialize edge heap H
    for e ∈ s.out with e.tail = {s} do
        e.node := H.Insert(e, ω(e))
    s.reached := true
    while not H.Empty() do begin                              • Process reached hyperedges by their path lengths
        e := H.Extract()
        e.removed := true
        P := RecoverShortHyperpath(s, e, G)                   • Recover a short hyperpath to e ...
        e.length := ω(P)                                      • ... and its path length
        F := ∅                                                • Collect the out-edges F of e ...
        for v ∈ e.head do begin                               • ... and detect which are reached
            for f ∈ v.out do begin
                if not v.reached then
                    f.count −:= 1
                if not f.marked then begin
                    F ∪:= {f}
                    f.marked := true
                end
            end
            v.reached := true
        end
        for f ∈ F do f.marked := false
        for f ∈ F do begin                                    • Update path lengths, in-edges, and add reached edges to H
            f.inedges ∪:= {e}
            if f.node ≠ nil and not f.removed then            • Update path length to edge on H
                H.Decrease(f.node, ω(RecoverShortHyperpath(s, f, G)))
            else if f.node = nil and f.count = 0 then         • Add reached edge to H
                f.node := H.Insert(f, ω(RecoverShortHyperpath(s, f, G)))
        end
    end
    (P*, L*) := (∅, ∞)                                        • Recover the best s,t-hyperpath P*
    for e ∈ t.in do
        if e.node ≠ nil then begin
            P := RecoverShortHyperpath(s, e, G)
            if ω(P) < L* then
                (P*, L*) := (P, ω(P))
        end
    Restore to G all edges not in D                           • Unrestrict G and return the best hyperpath
    return P*
end

Fig. 3 Efficient heuristic for shortest source-sink hyperpaths. Given source s, sink t, and edge weights ω, function ShortestHyperpathHeuristic finds an s,t-hyperpath in hypergraph G, attempting to minimize its length. If no s,t-hyperpath exists, the empty path is returned.
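The greedy trimming idea used when recovering a hyperpath (consider hyperedges for removal in decreasing order of their estimated path length, and keep a hyperedge only if removing it would disconnect the target) can be illustrated by the following minimal Python sketch. It is not the authors' implementation: the dictionary representation of edges as (tail, head) pairs of vertex sets, the function names, and the example lengths are all assumptions made for illustration, and the reachability test is a simple fixed-point loop rather than the counter-based routine of Fig. 2.

```python
def tail_reachable(source, target, edges):
    """True if every tail vertex of edge `target` is reachable from `source`.

    edges: dict mapping edge name -> (tail, head), both sets of vertices
    (a hypothetical representation chosen for this sketch).
    """
    reached = {source}
    changed = True
    while changed:                        # simple fixed point, not the fast BFS
        changed = False
        for tail, head in edges.values():
            if tail <= reached and not head <= reached:
                reached |= head
                changed = True
    return edges[target][0] <= reached

def trim_to_hyperpath(source, target, superpath, length):
    """Greedily trim an s,e-superpath to a minimal one (an s,e-hyperpath).

    superpath: dict of edges forming a superpath to edge `target`
    length:    dict of estimated path lengths, one per edge other than target
    """
    kept = dict(superpath)
    candidates = sorted((n for n in superpath if n != target),
                        key=lambda n: length[n], reverse=True)
    for name in candidates:
        trial = {n: te for n, te in kept.items() if n != name}
        if tail_reachable(source, target, trial):
            kept = trial                  # edge was unnecessary, so drop it
    return set(kept)

# Made-up example: two alternative routes from "s" to "c", then e5 into "t".
example_superpath = {
    "e1": ({"s"}, {"a"}),
    "e2": ({"s"}, {"b"}),
    "e3": ({"a"}, {"c"}),
    "e4": ({"b"}, {"c"}),
    "e5": ({"c"}, {"t"}),
}
example_length = {"e1": 1, "e2": 1, "e3": 2, "e4": 3}
hyperpath = trim_to_hyperpath("s", "e5", example_superpath, example_length)
```

In this example the longer route through e4 is removed first, which then forces e3 and e1 to be kept, leaving e1, e3, and e5.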
For doubly-reachable hyperedges e, the heuristic maintains fields e.length (the total weight of the shortest hyperpath found to e), and e.inedges (the subset of edges f with head(f) touching tail(e) where f.length is known), which are used in RecoverShortHyperpath to recover a short hyperpath to e

function RecoverShortHyperpath(s, e, G) begin        • Recover a short s,e-hyperpath in G
    Create queue Q                                   • Initialize a queue with the in-edges entering e
    for f ∈ e.inedges do begin
        Q.Put(f)
        f.marked := true
    end
    S := {e}                                         • (I) Recover s,e-superpath S tracing backward from e
    while not Q.Empty() do begin
        f := Q.Get()
        S ∪:= {f}
        for g ∈ f.inedges do
            if not g.marked then begin
                Q.Put(g)
                g.marked := true
            end
    end
    for f ∈ S do f.marked := false
    Remove from G all edges not in S                 • (II) Trim S greedily to an s,e-hyperpath P
    S −:= {e}
    P := {e}
    for f ∈ S in decreasing order of f.length do begin
        Remove f from G
        if not IsReachable(s, e, G) then begin
            Restore f back to G
            P ∪:= {f}
        end
    end
    Restore back to G all edges removed              • Restore G and return hyperpath P
    return P
end

Fig. 4 Recovering a short hyperpath from the source to a hyperedge. Given source vertex s and hyperedge e, function RecoverShortHyperpath returns an s,e-hyperpath P in hypergraph G, attempting to minimize its length. The edges of hyperpath P are greedily selected from an s,e-superpath S that is guaranteed to exist in G, where S is recovered by tracing backward from e

We note for this heuristic that the inapproximability of the shortest hyperpath problem [16], together with the polynomial time analysis of the next subsection, imply that unless P = NP, the heuristic cannot be a constant-factor approximation algorithm for shortest hyperpaths.

In the following subsections, we first analyze the running time of the heuristic; then show it always finds a feasible solution whenever one exists; and finally prove it actually finds an optimal solution for the class of singleton-tail hypergraphs.

Time complexity of the heuristic

We now bound the time complexity of the shortest hyperpath heuristic. Our analysis is in terms of the following parameters measured on a hypergraph, or an induced subgraph. For a hypergraph G with vertices V and hyperedges E, we denote its number of vertices and hyperedges by
\[ n := |V| , \qquad m := |E| . \]
We also use the size parameter
\[ \ell \;:=\; \sum_{e \in E} \Bigl( |\mathrm{tail}(e)| + |\mathrm{head}(e)| \Bigr) , \]
and degree parameter
\[ d \;:=\; \max_{v \in V} \, \max \bigl\{ |\mathrm{in}(v)| ,\, |\mathrm{out}(v)| \bigr\} . \]
Note that in general, the space required to represent all hyperedges is Θ(ℓ). We assume all tail and head sets are nonempty, and every vertex is touched by a hyperedge, which implies m + n = O(ℓ). When we need to refer to these measures for a particular hypergraph G, such as on an induced subgraph, we explicitly subscript the parameters by the specific hypergraph, such as $n_G, \ldots, d_G$, where these parameters are then measured in terms of the vertices and edges of the subscripted hypergraph G.

The running time of the shortest hyperpath heuristic may be expressed as a function of parameters measured on both the input hypergraph and its doubly-reachable subgraph (induced by the hyperedges that are simultaneously forward reachable from the source and backward traceable from the sink).

Theorem 1 (Time complexity of the heuristic) The time complexity of the shortest hyperpath heuristic, in terms of the number of hyperedges m and size parameter ℓ for both the input hypergraph G and its doubly-reachable subgraph H, is
\[ O\bigl( \ell_G + \ell_H \, m_H^2 \bigr) . \]

Proof To bound the running time of the function ShortestHyperpathHeuristic, we analyze in turn its component functions ForwardReachable, BackwardTraceable, and RecoverShortHyperpath. The running time of the reachability computations ForwardReachable and BackwardTraceable (in Fig. 2) can be expressed in an output-sensitive way in terms of the size of the edge sets they return.

For ForwardReachable, let R ⊆ V be the set of vertices reachable from source s, and F ⊆ E be the set of hyperedges reachable from s that are returned. The total time for ForwardReachable is dominated by the time for its main while-loop, which takes time
\[ \Theta\Bigl( \sum_{v \in R} |\mathrm{out}(v)| \;+\; \sum_{e \in F} |\mathrm{head}(e)| \Bigr) , \]
or equivalently,
\[ \Theta\Bigl( \sum_{e \in E} |\mathrm{tail}(e) \cap R| \;+\; \sum_{f \in F} |\mathrm{head}(f)| \Bigr) \;=\; O(\ell_G) . \]

For BackwardTraceable, let B ⊆ V be the set of vertices it reaches from sink t, and F ⊆ E be the set of hyperedges traceable from t that are returned. A similar analysis shows the time for BackwardTraceable is
\[ \Theta\Bigl( \sum_{e \in E} |\mathrm{head}(e) \cap B| \;+\; \sum_{f \in F} |\mathrm{tail}(f)| \Bigr) \;=\; O(\ell_G) . \]

So the time for both ForwardReachable and BackwardTraceable on the input hypergraph G is $O(\ell_G)$, but can be bounded more tightly in terms of the subgraph of G they actually explore.

For the function RecoverShortHyperpath (in Fig. 4), when it is called by ShortestHyperpathHeuristic, all its computations are performed on G restricted to the edge subset D ⊆ E of doubly-reachable hyperedges. We denote by hypergraph H the doubly-reachable subgraph induced by D. In RecoverShortHyperpath, the time to recover s,e-superpath S by tracing back from e is at most
\[ O\Bigl( \sum_{f \in S} \; \sum_{v \in \mathrm{tail}(f)} |\mathrm{in}(v)| \Bigr) \;=\; O\bigl( d_H \, \ell_H \bigr) . \]
The time to greedily trim superpath S to s,e-hyperpath P ⊆ S, in terms of cardinality k = |S|, is at most
\[ O\bigl( m_H + k \log k + k \, \ell_H \bigr) \;=\; O\bigl( k \, \ell_H \bigr) . \]
Thus the total time for RecoverShortHyperpath, since $d_H \le m_H$ and $k \le m_H$, is
\[ O\bigl( d_H \, \ell_H \bigr) + O\bigl( k \, \ell_H \bigr) \;=\; O\bigl( \ell_H \, m_H \bigr) . \]

For the function ShortestHyperpathHeuristic (in Fig. 3), we break its time down into the following components. The time for the initialization, collecting the doubly-reachable edges D by calling ForwardReachable and BackwardTraceable, and restricting G to its subgraph H induced by D, is $O(\ell_G)$. The main while-loop executes for $m_H$ iterations, and spends $O(m_H \log m_H)$ time for all Extracts. The total time across all iterations to compute s,e-hyperpath P for all extracted edges e by calling RecoverShortHyperpath is $O(\ell_H \, m_H^2)$. The total time to collect the out-edges F for the extracted e across all iterations is
\[ O\Bigl( \sum_{e \in D} \; \sum_{v \in \mathrm{head}(e)} |\mathrm{out}(v)| \Bigr) \;=\; O\bigl( d_H \, \ell_H \bigr) . \]
The total time across all iterations for Decrease and Insert, which take O(1) amortized time per edge in F using a Fibonacci heap (see [26, pp. 510–522]), is also $O(d_H \, \ell_H)$. The time to recover the best s,t-hyperpath P* is $O(d_H \, \ell_H \, m_H)$.

Finally, adding up the bounds for the above components, the total time for the shortest hyperpath heuristic is
\[ O(\ell_G) + O(m_H \log m_H) + O(\ell_H \, m_H^2) + O(d_H \, \ell_H) + O(d_H \, \ell_H \, m_H) , \]
which is in turn $O(\ell_G + \ell_H \, m_H^2)$.

Notice that the overall running time of the heuristic is dominated by the total time to recover short hyperpaths, which requires invoking RecoverShortHyperpath whenever the path length to a hyperedge is updated. This is necessary in hypergraphs, since in contrast to ordinary graphs the length of the hyperpath to a hyperedge can no longer be expressed as a simple function (such as a minimum or a sum) of the lengths of the hyperpaths to its in-edges.

As demonstrated in our later section on experimental results, for real biological instances the size of the doubly-reachable subgraph H is significantly smaller than the full input hypergraph G, so designing the heuristic to compute mainly over the much smaller hypergraph H yields a significant performance speedup in practice.

Next we show the heuristic always finds a feasible solution.

Feasibility of the heuristic

The most basic property that a heuristic for a combinatorial optimization problem should satisfy is feasibility: that it always returns a feasible solution whenever one exists. In the context of Shortest Hyperpaths, a feasible solution is any s,t-hyperpath, while an optimal solution is a feasible solution of minimum total edge-weight.

For the hyperpath heuristic, we now show feasibility.

Theorem 2 (Feasibility of the heuristic) The shortest hyperpath heuristic finds a source-sink hyperpath whenever one exists.

Proof Function ShortestHyperpathHeuristic (in Fig. 3) first restricts the input hypergraph G to its doubly-reachable subgraph, consisting of the hyperedges D that are both forward reachable from source s and backward traceable from sink t. Note that functions ForwardReachable and BackwardTraceable (in Fig. 2) together correctly collect these doubly-reachable hyperedges D: function ForwardReachable explores breadth-first the hyperedges that are forward reachable from s, maintaining a counter for each hyperedge e that records the number of vertices in its tail that have not yet been reached from s, and detecting when e is reached by this counter hitting zero; while function BackwardTraceable directly implements Definition 4 of backward traceability from t.

Furthermore, we claim that when restricting to the doubly-reachable subgraph, the heuristic does not lose any hyperedges on source-sink hyperpaths. Note that any hyperedge e on an s,t-hyperpath P in the input hypergraph G is forward reachable from s: consider the ordering of hyperedges in P from Definition 1, and take the prefix of this ordering up through e; this prefix is an s,e-superpath, so e is by definition forward reachable from s. Note also that any e on P in G is backward traceable from t as well: if t ∈ head(e), backward traceability immediately holds; otherwise, in the ordering of P there must be a hyperedge f following e with nonempty head(e) ∩ tail(f) (else e can be removed from P, contradicting minimality); applying this same process again at f yields a subsequence of the ordering of P that ends in a hyperedge whose head contains t; considering this subsequence in reverse order satisfies Definition 4 for backward traceability of e from t. Hence restricting to the doubly-reachable subgraph is safe.

To show the implication of the theorem, notice ShortestHyperpathHeuristic explores all hyperedges that are forward reachable from s in the doubly-reachable subgraph, inserting hyperedge e into heap H when e is initially reached, again detecting when traversing e causes another hyperedge f to be first reached using counter f.count, and recording in field f.inedges all such e that have reached f. So if an s,t-hyperpath exists in G, which implies sink t has an in-edge e that is forward reachable from s in the doubly-reachable subgraph, this e will eventually be inserted into H, making e.node non-nil, and at the end of the heuristic causing RecoverShortHyperpath to be called on e.

We claim that when function RecoverShortHyperpath (in Fig. 4) is ultimately called on an in-edge to sink t, phase (I) first recovers an edge set S that is an s,t-superpath in G. Considering the hyperedges of S in reverse order of their removal from queue Q, they satisfy the three conditions for an s,t-superpath in Definition 1: the last hyperedge removed from Q solely has s in its tail, each hyperedge in S (other than this last one) has its tail set covered by hyperedges removed later from Q, and the first edge removed has t in its head.

Function RecoverShortHyperpath in phase (II) then trims S to a minimal s,t-superpath, yielding an s,t-hyperpath. Finally, ShortestHyperpathHeuristic returns the shortest such hyperpath found.

Thus whenever a source-sink hyperpath exists, the heuristic finds one.

Next we prove the heuristic actually solves Shortest Hyperpaths when the input is a singleton-tail hypergraph.

Optimality of the heuristic for singleton-tail hypergraphs

While our heuristic does not necessarily find shortest hyperpaths in general hypergraphs, we can prove that it does find optimal solutions for the following class of hypergraphs.

A singleton-tail hypergraph is a directed hypergraph G where every hyperedge e in G has |tail(e)| = 1. (The head sets of hyperedges can be arbitrary.) In other words, in singleton-tail hypergraphs, the tails of hyperedges are single vertices.

At a high level, the optimality argument for singleton-tail hypergraphs first shows that shortest source-sink hyperpaths are composed of shortest s,e-hyperpaths; then argues that the heuristic's greedy superpath trimming recovers shortest s,e-hyperpaths when the hyperedge fields hold shortest hyperpath lengths; and finally proves that the heuristic computes exact shortest s,e-hyperpath lengths.

The following characterization states that in singleton-tail hypergraphs, a shortest s,t-hyperpath is composed of shortest s,e-hyperpaths to its constituent hyperedges. This does not hold for general hypergraphs, and is partly why the special case of shortest singleton-tail hyperpaths is polynomial-time solvable.

Lemma 1 (Characterizing shortest singleton-tail hyperpaths) In singleton-tail hypergraphs with nonnegative edge weights, every shortest s,t-hyperpath can be ordered as a sequence $e_1 \cdots e_k$ of hyperedges where

(i) each $\mathrm{head}(e_i) \supseteq \mathrm{tail}(e_{i+1})$, and
(ii) every prefix $e_1 \cdots e_i$ is a shortest $s,e_i$-hyperpath.

Proof Consider a shortest s,t-hyperpath P in a singleton-tail hypergraph. By definition, P is a minimal s,t-superpath, so its edges can be ordered as a sequence $e_1 \cdots e_k$ where $\mathrm{tail}(e_1) = \{s\}$, $\mathrm{head}(e_k) \supseteq \{t\}$, and since tail sets contain a single vertex, for every hyperedge $e_j$ in this sequence other than the first one, there is a prior hyperedge $e_i$ with $\mathrm{head}(e_i) \supseteq \mathrm{tail}(e_j)$.

Starting from the last hyperedge $e_k$, and repeatedly picking a prior hyperedge whose head covers the tail of the current hyperedge until reaching tail {s}, yields a subsequence $f_1 \cdots f_\ell$ specifying subset $Q = \{f_1, \ldots, f_\ell\} \subseteq P$, where again $\mathrm{tail}(f_1) = \{s\}$, $\mathrm{head}(f_\ell) \supseteq \{t\}$, and now $\mathrm{head}(f_i) \supseteq \mathrm{tail}(f_{i+1})$ for $1 \le i < \ell$. Furthermore Q = P, otherwise P is not minimal. So subsequence $f_1 \cdots f_\ell$ is exactly sequence $e_1 \cdots e_k$.

Clearly every prefix $e_1 \cdots e_i$ is an $s,e_i$-superpath. Moreover this prefix must be a minimal $s,e_i$-superpath, otherwise P is not minimal. Thus every prefix ending in $e_i$ is an $s,e_i$-hyperpath.

Finally, every prefix $e_1 \cdots e_i$ must be a shortest $s,e_i$-hyperpath. Otherwise, replacing this prefix by a shortest $s,e_i$-hyperpath yields an s,t-superpath S of total weight less than P. Furthermore, trimming S to a minimal s,t-superpath under nonnegative edge weights yields an s,t-hyperpath of total weight less than P, contradicting the optimality of P.

In the following, the distance of hyperedge e from source s is the total weight of a shortest s,e-hyperpath, which we denote by d(e). Recall that function ShortestHyperpathHeuristic (in Fig. 3) maintains the field e.length, which holds the total weight of the best-known s,e-hyperpath, which upper bounds d(e).

The next lemma states that in singleton-tail hypergraphs, given two key conditions, the greedy superpath trimming that is used by the heuristic to recover a hyperpath to hyperedge e in fact finds a shortest s,e-hyperpath.

Lemma 2 (Recovering hyperpaths in singleton-tail hypergraphs) In a singleton-tail hypergraph with nonnegative edge weights, when the hyperpath heuristic recovers a hyperpath from source s to hyperedge e, suppose

(i) field e.inedges contains among its hyperedges an in-edge to e from a shortest s,e-hyperpath, and
(ii) in the s,e-superpath S found when recovering a hyperpath to e, for all hyperedges f ∈ S − {e}, field f.length holds distance d(f).

Then the hyperpath to e that the heuristic recovers is a shortest s,e-hyperpath.

Proof We first claim that under the assumptions of the lemma, when the hyperpath heuristic calls RecoverShortHyperpath (in Fig. 4) on a hyperedge e, its first phase recovers an s,e-superpath S that contains a shortest s,e-hyperpath. By assumption (i), field e.inedges contains a hyperedge f on a shortest s,e-hyperpath, and f will be in superpath S, hence by assumption (ii), the value of f.length is d(f). This value came from a shortest s,f-hyperpath Q that was found in a prior call to RecoverShortHyperpath on f, by trimming an s,f-superpath T. Notice that Q followed by e is an s,e-superpath $P'$, as head(f) ⊇ tail(e). Now trim $P'$ to an s,e-hyperpath $\widetilde{P}$, and let $P^*$ be a shortest s,e-hyperpath containing f that exists by assumption (i). By Lemma 1 and minimality of hyperpaths, $P^*$ must consist of a shortest s,f-hyperpath $Q^*$ followed by e. Under nonnegative edge weights,
\[
\begin{aligned}
\omega(\widetilde{P}) \;&\le\; \omega(P') \\
&=\; \omega(Q) + \omega(e) \\
&=\; \omega(Q^*) + \omega(e) \\
&=\; \omega(P^*) .
\end{aligned}
\]
Thus $\widetilde{P}$ is also a shortest s,e-hyperpath. Since f is in e.inedges, tracing back from e recovers the superpath
\[ S \;\supseteq\; T \cup \{e\} \;\supseteq\; Q \cup \{e\} \;=\; P' \;\supseteq\; \widetilde{P} , \]
so the claim holds.

We next claim that when RecoverShortHyperpath in its second phase greedily trims superpath S, the resulting superpath T ⊆ S still contains a shortest hyperpath. To show this, we prove that each superpath $S_i$ that remains after i iterations of greedy trimming contains a shortest s,e-hyperpath, by induction on i. For the basis at i = 0, the initial superpath $S_0$ before any trimming contains a shortest hyperpath by our first claim on S. For the induction at i > 0, let P be a shortest s,e-hyperpath that superpath $S_{i-1}$ contains by our hypothesis, and let f be the hyperedge removed from $S_{i-1}$ at iteration i. If f ∉ P, then $S_i = S_{i-1} - \{f\}$ trivially contains P. So we assume f ∈ P. In the following, the core of hyperpath P consists of the tail vertices of its hyperedges.

In an ordering of shortest hyperpath P that satisfies Lemma 1, consider the hyperedges in the suffix of P that begins with f. As edge weights are nonnegative, by Lemma 1 the distances of these hyperedges must be at least d(f), so by assumption (ii) the values of the length field for these hyperedges must be at least f.length. Greedy trimming proceeds in decreasing order of length-field values, so the hyperedges in this suffix of P must either have been already considered for trimming before f, or not yet considered due to being tied with f (from having zero edge-weight). If they were considered before f, then since they were not trimmed, there must be no alternate s,e-hyperpath in $S_{i-1}$ that enters their head vertices on the core of P. If they were not considered yet, then since f can be removed from $S_{i-1}$, there must be an alternate s,e-hyperpath Q ⊆ S distinct from P that enters one of the core head-vertices of the hyperedges in this suffix of P whose length field is tied with f. Moreover, this alternate hyperpath Q must enter P with the same length-field value as the edge of P sharing this core head-vertex. (If Q enters P at a smaller length-value, then P is not a shortest s,e-hyperpath; if Q enters at a greater length-value, hyperedge f would not be the next hyperedge removed, as instead a hyperedge from Q of greater length would be.)

We now show that the hyperpath heuristic solves Shortest Hyperpaths for singleton-tail hypergraphs.

Theorem 3 (Optimality of the heuristic on singleton-tail hypergraphs) For singleton-tail hypergraphs with nonnegative edge weights, the hyperpath heuristic finds a shortest source-sink hyperpath.

Proof The key to proving optimality is showing that in singleton-tail hypergraphs, the estimates that the heuristic computes for shortest hyperpath lengths are exact. Recall that when function ShortestHyperpathHeuristic (in Fig. 3) removes hyperedge e from heap H, it calls RecoverShortHyperpath on e to recover an s,e-hyperpath P, and sets the field e.length to ω(P), the total weight of P.

We claim that when this assignment occurs, field e.length holds distance d(e), the total weight of a shortest s,e-hyperpath. We now prove this claim by induction on the number of heap extractions. At a high level, the argument is similar to that for Dijkstra's shortest-path algorithm (see [26, pp. 659–661]) on ordinary directed graphs.

For the basis, the first hyperedge extracted has tail(e) = {s} and e.key = ω(e), which equals d(e), as e itself is a shortest s,e-hyperpath (since all edge weights are nonnegative). The recovered s,e-hyperpath will consist of e (as e.inedges is empty), so after the assignment field e.length holds d(e).

For the induction, let e be the next hyperedge to be removed from the heap, and assume for all hyperedges h extracted prior to e that h.length holds d(h). Now consider a shortest s,e-hyperpath P, and in the ordering of P given by Lemma 1, let f be the first hyperedge in P that has not yet been removed from the heap. Note that f exists, as e has not been removed yet.

We first show f.key = d(f). In the special case where f is the first edge of P, notice d(f) = ω(f) by the same reasoning as in the basis. Furthermore f.key = ω(f), as f.key starts at ω(f), never increases, and cannot decrease below this minimum value. So f.key = d(f) in this special case.

In the general case where f is not the first edge of P, let g be the in-edge to f on P, and Q ⊆ P be the prefix of P
Since Q enters P at the same length- ending in f, as illustrated in Fig. 5. Notice g has already value, hyperpath Q is also a shortest s,e-hyperpath. been extracted from the heap (by the definition of f), so g Hence S ⊇ Q still contains a shortest hyperpath, which is in f.inedges (as when a hyperedge is extracted, for all its proves the second claim. out-edges h it is added to h.inedges). Furthermore Q is a So the final trimmed s,e-superpath T returned by shortest s,f-hyperpath by Lemma 1, so g is on a shortest RecoverShortHyperpath contains a shortest s,e- hyperpath to f. For all hyperedges h extracted before e, hyperpath P ⊆ T . Since T is minimal (as no further edges by the induction hypothesis h.length = d(h) , and only could be trimmed), and P by definition is minimal, we extracted h add themselves to the field inedges of their must have T = P , which proves the lemma. Krieger and Kececioglu Algorithms for Molecular Biology (2022) 17:12 Page 14 of 24 induction hypothesis had h.length = d(h) . Fur thermore, hyperedges are never removed from the field inedges, and h.length never changes after h is extracted. Thus the assumptions in Lemma 2 are still met upon extrac- tion of e, so when ShortestHyperpathHeuristic assigns to e.length the total weight of the hyperpath P recovered for e, by Lemma 2 this recovered P will again be a shortest s,e-hyperpath, hence e.length = d(e) . This completes the inductive proof of our claim. So for every hyperedge h in the doubly-reachable sub- graph explored by ShortestHyperpathHeuris- tic, after extracting h from the heap, the relation h.length = d(h) holds. Finally, when recovering the best s,t-hyperpath at the end of the heuristic by examin- ing the in-edges e to sink t, for each such hyperedge e the assumptions of Lemma 2 are still met (by the same Fig. 5 Hyperpath from the proof of optimality for singleton-tail reasoning as above), so the hyperpaths P obtained from hypergraphs. 
Hyperedges inside the dashed circle have been calling RecoverShortHyperpath on these sink in- extracted from the heap; those outside have not. The next hyperedge edges e are again shortest s,e-hyperpaths. Since a shortest to be extracted is e, and P is a shortest s,e-hyperpath. The first s,t-hyperpath consists of doubly-reachable hyperedges hyperedge of P not yet extracted is f, and Q is the prefix of P up through f (by the proof of Theorem 2), and is a shortest s,e-hyper- path for some in-edge e to sink t, the best of these recov- ered hyperpaths P, which is the hyperpath returned by out-edges. Hence when g was extracted, added itself to the heuristic, is a shortest s,t-hyperpath. f.inedges, and updated f.key by recovering an s,f-hyper- path, in the s,f-superpath S first found during recovery, all Theorem 3 (in combination with Theorem 1) shows hyperedges h ∈ S had h.length = d(h) . Thus by Lemma 2, that, while Shortest Hyperpaths is NP-complete for sin- the recovered s,f-hyperpath was a shortest hyperpath, so gleton-head hypergraphs [14], it is polynomial-time solv- this updated f.key to d(f), and as argued before in the spe- able for singleton-tail hypergraphs. cial case, this key will not change. So again f .key = d(f ). We next show, Generating all source‑sink hyperpaths In this section, we give a practical algorithm for gener- e.key ≤ f . key (1) ating all s,t-hyperpaths in a given hypergraph for a fixed = d(f ) (2) source s and sink t. In our later experimental results, we ≤ d(e) (3) use this algorithm on specific source-sink instances from ≤ e.key . (4) real cell-signaling networks to tractably measure how close our heuristic is to optimal. 
In the above, inequality (1) holds since e and f are both In general, the technique of inclusion and exclusion on the heap (as f was inserted in the heap either during of Hamacher and Queyranne [27] provides a widely- initialization or when g was extracted), but e is removed applicable method for generating all the solutions to before f. Equation (2) is from our prior analysis of any combinatorial optimization problem whose fea- f. Inequality (3) holds as Q and P are shortest s,f- and s,e- sible solutions are subsets of a ground set—where in hyperpaths respectively, while Q ⊆ P and edge weights our context, hyperpaths are subsets of hyperedges are nonnegative. Lastly, inequality (4) holds since the key from a hypergraph—but it relies on the ability to effi- of e while it is on the heap is the total weight of some s,e- ciently compute a feasible solution that is constrained hyperpath. Thus relations (1)–(4) must all be equalities, to include a given in-set and exclude a given out-set. so e.key = d(e). Interestingly, for hyperpaths, Carbonell et al. [20] have We now argue e.length = d(e) after e is extracted. shown that just determining whether an s,t-hyperpath Since e.key = d(e) is the weight of a hyperpath recovered exists that contains a specified in-set of hyperedges earlier for e, notice (i) there was an in-edge to e on a short- (regardless of the length of the hyperpath) is already est s,e-hyperpath in e.inedges; moreover (ii) all hyper- NP-complete. Consequently, we cannot generate all edges h in the s,e-superpath collected while recovering a s,t-hyperpaths using the standard inclusion-exclusion hyperpath for e were extracted earlier, and hence by the K rieger and Kececioglu Algorithms for Molecular Biology (2022) 17:12 Page 15 of 24 technique, as we cannot tractably solve the resulting hyperedges from the set Keep (though their solutions subproblem that has both in- and out-set constraints. are not required to actually use edges from Keep). 
The Instead, we generate all hyperpaths through a sim- purpose of this set Keep is to ensure that all subprob- ple and practical algorithm that only involves out-sets, lems ever placed on the queue have distinct Out sets. given in Fig. 6. Function AllHyperpaths returns a (So any given subproblem described by an out-set is list of all s,t-hyperpaths in hypergraph G, leveraging only ever solved once, as argued in the later section a function OneHyperpath that just has to return on the time complexity of the hyperpath enumeration one s,t-hyperpath P in G that does not contain any algorithm in the proof of Theorem 5.) A subproblem hyperedges from set Out (so P ∩ Out =∅ ), or deter- that directly arises from a given one we call a child mine that no such hyperpath exists. This constrained subproblem (as the entire collection of subproblems hyperpath problem with only out-sets is easy to solve: conceptually forms a tree that is explored breadth-first remove all hyperedges in set Out from G, collect all using the queue). Each child subproblem excludes one vertices R and hyperedges F reachable from s in this edge from the hyperpath found for its parent subprob- reduced hypergraph, and if t ∈ R , then find any mini- lem; in this way, the children will generate hyperpaths mal subset P ⊆ F in which t is still reachable from s; that are distinct from their parent hyperpath, if they otherwise if t ∈R , no such hyperpath exists. Func- have a solution. (Once a subproblem becomes infea- tion OneHyperpath can efficiently find such an s,t- sible due to its out-set eliminating any s,t-hyperpath hyperpath P excluding set Out using repeated calls as a solution, it also does not generate further sub- to ForwardReachable (given earlier in Fig. 2). problems.) Though the whole approach never repeat - Function AllHyperpaths uses a queue of subprob- edly solves the same subproblem, in contrast to the lems. 
A subproblem is described by a pair (Out, Keep) , inclusion-exclusion technique it can generate the same which corresponds to finding an s ,t-hyperpath exclud- hyperpath from different subproblems, so we check ing Out, where any subsequent subproblems that arise whether hyperpath P is distinct from those already from this given subproblem must not exclude any found before adding it to the list A of all hyperpaths. function AllHyperpaths (s, t, G) begin • Generate all s, t-hyperpaths in G Create queue Q • Initialize a queue of subproblems,anda set A of hyperpaths Q.Put (∅, ∅) A := ∅ while not Q.Empty() do begin • Process all subproblemson the queue (Out, Keep) := Q.Get() P := OneHyperpath(s, t, Out,G) • Find an s, t-hyperpath excluding edges in Out if P = ∅ and P ∈ A thenbegin A∪ := {P} • Save the new hyperpath K := Keep • Addall child subproblemstothe queue for e ∈ P with e ∈ Keep do begin Q.Put (Out ∪{e},K) • Children cannot excludeedges in Keep ... K ∪ := {e} • ...or edges excluded by prior siblings end end end return A • Return the set A of all hyperpaths end Fig. 6 Generating all source-sink hyperpaths. Function AllHyperpaths, given source vertex s, sink vertex t, and hypergraph G, returns the set of all s,t-hyperpaths in G. It calls a function OneHyperpath that returns an s,t-hyperpath not containing any hyperedge from a specified set Out, and which returns the empty path if no such hyperpath exists Krieger and Kececioglu Algorithms for Molecular Biology (2022) 17:12 Page 16 of 24 We first prove this enumeration approach is correct, algorithm, when solving k subproblems on a hypergraph of and then analyze its time complexity. size ℓ with m hyperedges, is O k ℓm = O 2 ℓm . Correctness of the hyperpath enumeration algorithm We next show that function AllHyperpaths solves the Proof We bound the running time of function problem of source-sink hyperpath enumeration. AllHyperpaths (in Fig. 6) as follows. 
Solving a given Theorem 4 (Correctness of hyperpath enumeration) subproblem from the queue by function OneHyperpath The hyperpath enumeration algorithm generates every s,t- (which finds an s,t-hyperpath by iteratively removing hyperpath exactly once. hyperedges from the hypergraph and testing reachability to identif y a minimal set in which t is still reachable from s), Proof For the function AllHyperpaths (in Fig. 6), involves at most m calls to function ForwardReachable. we view the subproblems it processes as forming a A call to ForwardReachable takes O(ℓ) time (by tree: when a problem p is pulled off queue Q and the analysis in the proof of Theorem 1), so solving a causes a new subproblem q to be put onto Q, these subproblem takes O(ℓm) time. If AllHyperpaths subproblems q comprise the children of p in the tree. terminates after processing k subproblems, its total time Each subproblem is specified by a pair (Out, Keep) , is then O(k ℓm). representing the problem of finding an s,t-hyperpath We argue next that the out-sets of subproblems are all that contains no hyperedge in the set Out. Let P be an distinct. Consider the tree of subproblems processed s,t-hyperpath satisfying this out-constraint for prob- by AllHyperpaths (as in the proof of Theorem 4), and lem p. Any other s,t-hyperpath P distinct from P that two arbitrary subproblems x and y in this tree. If one of x also satisfies the out-constraint for p must not contain and y is a descendant of the other, their out-sets are dis- some hyperedge in P. (If P contains every hyperedge tinct, as a child always adds a hyperedge to the out-set of of P yet is distinct, it is a strict superset of P, con- its parent. If neither x nor y is a descendant of the other, let tradicting minimality.) 
Function AllHyperpaths subproblem u be their nearest common ancestor, subprob- forms the children of p by adding each hyperedge in P lems v and w be the children of u on the paths to x and y to the out-set of p for a different child. (So the hyper- respectively, and assume without loss of generality that paths satisfying the out-constraints of the children child v precedes child w. When child v adds hyperedge e to are all hyperpaths that both satisfy the constraints of the set Out of its parent u, edge e is not added to set Out parent p and are distinct from hyperpath P.) Conse- for any other children of u, and e is also added to set Keep quently hyperpath P, together with every solution to for all children of u following v, including w. Furthermore, the children of p, comprise all possible solutions to the set Out for a descendant is a superset of set Out for its problem p. ancestors, and set Out for a descendant is always disjoint This tree-like process begins at the root with a problem from set Keep for its ancestors. Consequently, the above having an empty out-set (whose solutions are all possi- hyperedge e is in the out-set of subproblem x but not sub- ble s,t-hyperpaths), and continues refining each problem problem y, so their out-sets are again distinct. into its children subproblems until reaching the leaves Since subproblem out-sets are distinct, k = O(2 ) . Com- (which have no solution). u Th s the set consisting of each bining this with the prior total time for hyperpath enumera- hyperpath P found at the nodes of this tree contains all tion yields a worst-case time bound of O(2 ℓm). s,t-hyperpaths. In brief, function AllHyperpaths generates every In practice, typically k ≪ 2 , so the running time is s,t-hyperpath. Since it checks for uniqueness, the enu- much faster than the worst-case bound suggests. Func- meration algorithm generates every source-sink hyper- tion AllHyperpaths can tractably generate all source- path exactly once. 
sink hyperpaths for large hypergraphs, as shown in the next section on experimental results, since many of its Time complexity of the hyperpath enumeration algorithm subproblems quickly become infeasible for real cell-sign- We now bound the running time of function AllHy- aling networks. perpaths in terms of the number of subproblems it solves, and parameters of the input hypergraph. Experimental results Theorem 5 (Time complexity of hyperpath enumer- We now present results from computational experiments ation) The running time of the hyperpath enumeration on real pathway databases that compare the hyperpath found by our heuristic to the optimal solution. We also K rieger and Kececioglu Algorithms for Molecular Biology (2022) 17:12 Page 17 of 24 remark on the prevalence of biological instances with the NCI-PID datasets. The hypergraphs from the Large cyclic shortest hyperpaths, study the cause of subopti- and Reactome datasets contain respectively 40 and 433 mality in our heuristic, report actual running times, and self-loops, showing that many cyclic hyperpaths are likely discuss biological examples of cyclic hyperpaths. to exist. However, a small number of these self-loops are unreachable, due to an otherwise unreachable vertex appear- Datasets ing in both the head and tail of the hyperedge. The sources We evaluate the quality of our heuristic on four data- and targets used in all our experiments are respectively ver- sets built by combining different annotated signaling tices with no in-edges (or vertices whose only in-edge is an pathways from two pathway databases, NCI-PID and unreachable self-loop), and vertices with no out-edges. The Reactome. 
NCI-PID [28] is a curated human-pathway number of forward-reachable, backward-traceable, and database containing biochemical reactions for complex doubly-reachable hyperedges shows how many hyperedges assembly, cellular transport, and transcriptional regula- remain after the heuristic prunes the input hypergraph to tion. Reactome [29] also contains curated human sign- the doubly-reachable subgraph before computing a solution. aling pathways, and is actively maintained with new On average, hyperedges from all four hypergraphs have small reactions being continuously added. We constructed head and tail sets, and vertices have low in- and out-degree, hypergraphs from three subsets of NCI-PID pathways reflecting the sparseness of the hypergraphs. used in Ritz et al. [5], named the Small, Medium, and Large datasets. The Small dataset is a small Wnt sign - Experimental setup aling pathway consisting of the union of two pathways: To prepare the hypergraphs from each dataset for our “degradation of β-catenin” and “canonical Wnt signal- experiments, we parsed the union of the pathways in the ing”. The Medium dataset is a larger Wnt signaling path - dataset. We connected a supersource s to all source ver- way including four additional pathways: “noncanonical tices—namely, the input vertices with no in-edges—by a Wnt signaling”, “Wnt signaling network”, “regulation single zero-weight hyperedge whose tail consisted of the of nuclear β-catenin”, and “presenilin action in Notch supersource s and whose head contained all the source and Wnt signaling”, which correspond to non-canonical vertices. We also included in the head of this hyperedge branches of Wnt signaling. The Large dataset contains from supersource s all input vertices whose sole in-edge all NCI-PID pathways. Similarly, the Reactome dataset was a self-loop, since otherwise such a self-loop was not is the union of all Reactome pathways. The NCI-PID and traversable. 
For each specific target vertex v—namely, Reactome pathways were downloaded in the BioPAX for- each input vertex with no out-edges—we had a sepa- mat [30] from Pathway Commons, and processed using rate version of the hypergraph that differed only by con - a parser from Franzese et al. [22] built on PaxTools [31]. necting this target v to a sink t by a single zero-weight To construct the hypergraphs for each dataset, we ordinary-graph edge directed from v to t, giving us a mapped each entity (such as a protein, small molecule, and specific target instance. Note that these choices for the so on) to a vertex in the hypergraph. Each complex was source and target vertices are reasonable, as they are the represented as a unique vertex distinct from the entities molecules where biologists stopped annotating a given in the complex. Multiple forms of the same protein map pathway. Note also that the supersource s and the sink t to different vertices denoting compartmentalization and remain the same across all target instances in a dataset. post-translational modifications, such as phosphorylation For each target instance, we trimmed the hypergraph and ubiquitination. We treated each variant as a distinct to the doubly-reachable set: the set of hyperedges that entity because many pathways describe the transportation were both forward-reachable from supersource s, and of a protein from one cellular compartment to another, or backward-traceable from sink t. Table 1 gives the aver- the marking of a protein for degradation by ubiquitina- age and maximum size of the forward-reachable, back- tion, necessitating that the corresponding vertices be dis- ward-traceable, and doubly-reachable sets over all target tinct to reflect these variants. 
Each reaction was mapped instances for a given dataset, which dramatically reduces to a hyperedge, where the reactants and positive regula- the size of the hypergraph over which the heuristic per- tors comprise the tail of the hyperedge, and the products forms most of its computation. comprise the head. All hyperedges were given unit weight, For each target instance, we found a hyperpath from even though the heuristic handles weighted edges, as NCI- supersource s to sink t using our shortest hyperpath PID is missing reaction rates for some reactions. heuristic implemented in the new tool Hhugin [25], Table 1 gives statistics on the hypergraphs constructed and compared its length to the solution of the MILP from each of the four datasets. The hypergraphs are very of Ritz et al. [21] if the heuristic hyperpath was acyclic. sparse: there are fewer hyperedges than vertices in all For each cyclic target instance where the heuristic out- four datasets, with Reactome being even sparser than put a cyclic hyperpath, we exhaustively enumerated all Krieger and Kececioglu Algorithms for Molecular Biology (2022) 17:12 Page 18 of 24 Table 1 Dataset Summaries NCI-PID Measure Small Medium Large Reactome Vertices 56 350 9,009 20,458 Hyperedges 36 228 8,456 11,802 Pathways 2 6 213 2,516 Sources 19 138 3,200 8,296 Targets 10 102 2,636 5,066 Self-loops 1 8 40 433 Unreachable self-loops 1 7 14 32 mean max mean max mean max mean max Tail size 1.8 3 1.9 5 1.9 10 2.4 26 Head size 1.3 3 1.3 4 1.1 5 1.6 28 Forward-reachable set 35 35 192 192 6,169 6,169 4,645 4,645 Backward-traceable set 28 28 49 70 1,198 2,863 4,027 7,021 Doubly-reachable set 27 27 42 60 756 1,836 929 1,725 In-degree 0.8 5 0.8 15 1.0 323 0.9 1,056 Out-degree 1.1 4 1.2 24 1.7 326 1.4 1,167 s,t-hyperpaths, and compared the heuristic hyperpath to of these instances was cyclic due to a self-loop. In gen- the shortest hyperpath found by this enumeration. 
(Enu- eral, Reactome is much sparser than NCI-PID, and 432 of merating all s,t-hyperpaths for one source-sink instance the 433 self-loops in Reactome are never used in a heu- takes on average around 20 hours in practice—so it is not ristic hyperpath. feasible to perform this enumeration on all acyclic target The abundance of cyclic hyperpaths in the NCI-PID instances.) and Reactome datasets demonstrates the importance of a shortest hyperpath algorithm that properly han- dles cycles. We discuss concrete examples of biological Abundance of cyclic hyperpaths cyclic shortest hyperpaths in a later section on biological Cyclic shortest hyperpaths appear in all four datasets. To examples. take just one example, in the Small and Medium data- sets, the only hyperpath from ubiquitinated β-catenin to APC is cyclic, so for this target instance the acyclic Quality of the hyperpath heuristic shortest-hyperpath MILP fails to find a solution. Admit - To determine the quality of our hyperpath heuristic, tedly this particular source-target pair is specially chosen, we compared the length of the heuristic hyperpath to as ubiquitinated β-catenin has an in-edge and APC has an optimal shortest hyperpath. In general, no practical an out-edge so they would not normally be considered exact algorithm is currently known for finding a short - under our definition of sources and targets. Nevertheless, est source-sink hyperpath. Consequently, on the target this pair demonstrates there do exist cyclic hyperpaths instances where the heuristic found a cyclic hyperpath, in the NCI-PID database—even in the union of just two we determined the optimum by generating all source- pathways—that are missed by the current state-of-the-art sink hyperpaths and retaining the shortest one, using when computing only acyclic shortest hyperpaths. our algorithm for hyperpath enumeration. 
On the target In the Large dataset, 38 target instances have cyclic instances where the heuristic found an acyclic hyperpath, heuristic hyperpaths. Of these, 22 were cyclic because we compared its length just to the optimal hyperpath of a self-loop, and 16 were cyclic due to a non-trivial returned by the MILP for shortest acyclic hyperpaths. An cycle. For all these instances, no acyclic hyperpath exists even shorter cyclic hyperpath could exist for these latter between supersource s and sink t. It is likely that even instances, but finding it by enumerating all hyperpaths is more cycles exist within the hypergraph from the Large simply too time-consuming to carry out for every such dataset, as there were 8 self-loops that were not on any instance. hyperpath found by the heuristic. Table 2 summarizes the quality of the heuristic on In the Reactome dataset, the heuristic found a cyclic acyclic instances. On the Small, Medium, and Reac- shortest hyperpath on 22 target instances, and only one tome datasets, the heuristic hyperpath is optimal on all K rieger and Kececioglu Algorithms for Molecular Biology (2022) 17:12 Page 19 of 24 target instances, meaning the heuristic hyperpath and the of these suboptimal instances, where the heuristic must shortest acyclic hyperpath from the MILP have the same now discriminate among a much higher number of length. On the Large dataset, the heuristic is optimal on hyperpaths that have much greater path-length variance. over 99% of the instances, demonstrating the quality of The fraction of all hyperpaths that are optimal is fairly the heuristic on these biological datasets. The small frac - small, with only around 3% being optimal for the median tion of instances where our heuristic was suboptimal are instance. Even faced with many alternate solutions, the discussed in more detail in the next subsection. 
heuristic still found a hyperpath that was nearly optimal: Table 3 summarizes the quality of the heuristic on the median difference between the length of the heuris - instances where it output a cyclic hyperpath. On all these tic hyperpath and the shortest hyperpath was 1 hyper- cyclic instances, the acyclic MILP failed to find a solu - edge, the maximum difference was 6 hyperedges, and tion, so we could not compare the heuristic to an optimal the median ratio of the length of the heuristic hyperpath hyperpath other than by exhaustively enumerating all to the shortest hyperpath was 1.1 (so it was only 10% hyperpaths and picking the shortest one—which verified longer). Next we investigate what could be causing this that the heuristic on these instances in fact found an opti- suboptimality. mal solution. Cyclic instances from the Reactome (and The suboptimality of the heuristic is likely coming from Large) datasets contain many distinct hyperpaths, with the repeated calls to the function RecoverShortHy- a median of 22 (respectively 3) hyperpaths, and a maxi- perpath, which proceeds in two phases. In phase (I), mum of 136 (respectively 364) hyperpaths. The hyper - this function recovers an s,e-superpath S, relying on in- paths tend to vary in length, with a maximum difference edge lists to hyperedges f, where the in-edge list for f con- between the length of the longest and shortest hyperpath tains only hyperedges removed from the heap prior to f, of 15 (respectively 43) hyperedges, and a median dif- which may exclude hyperedges in a shortest s,e-hyper- ference of [2, 3] (respectively 1) hyperedges. This dem - path. In phase (II), this function trims superpath S to a onstrates that the heuristic is discriminating between hyperpath by greedily considering hyperedges in S for hyperpaths of different lengths and choosing the best removal, which may also remove a hyperedge in an opti- hyperpath over worse hyperpaths, further indicating the mal s,e-hyperpath. 
quality of the heuristic. In every cyclic target instance, all s,t-hyperpaths were cyclic, and many shared a common cycle; most of the hyperedges occurring in one hyperpath but not another appeared outside this shared cycle.

Studying the suboptimality of the heuristic

We call the small number of target instances in these experiments where the heuristic found a known suboptimal hyperpath its suboptimal instances. Table 4 summarizes these 23 suboptimal instances, which are all from the Large NCI-PID dataset, and are all acyclic instances. (The heuristic was optimal on all cyclic instances, and on all Reactome, Small, and Medium instances. We mention as well that the maximum values across the table occur in distinct target instances.) To gain insight into why the heuristic found a suboptimal solution on these instances, we enumerated all source-sink hyperpaths for every suboptimal instance. (This enumeration also verified that on all suboptimal instances, the acyclic MILP in fact found a shortest hyperpath, as there was no shorter cyclic hyperpath.)

Hyperpath enumeration confirmed that these suboptimal instances are much harder than the cyclic instances. The median number of hyperpaths is nearly 140 times higher for suboptimal NCI-PID instances than for cyclic NCI-PID instances, and the length difference between the longest and shortest hyperpaths is 30 times larger. This stark contrast indicates the inherent difficulty of these instances.

To determine whether the recovery or trimming phase was responsible for suboptimality, we ran the following experiment. After the heuristic determined its estimated path length for every hyperedge in the hypergraph, we called RecoverShortHyperpath on each in-edge to the target, running its recovery phase but stopping before its trimming phase, and unioned together the resulting s,t-superpaths from each in-edge to create one large s,t-superpath F. We then took an optimal s,t-hyperpath P and examined whether P ⊆ F: in other words, whether the recovery phase permitted the heuristic to potentially find an optimal hyperpath. We discovered that for all 23 suboptimal instances P ⊈ F, indicating that phase (I) of RecoverShortHyperpath, which recovers an s,e-superpath, was forcing the heuristic to be suboptimal on every such instance.

On the other hand, the trimming phase of RecoverShortHyperpath could also be leading to suboptimality, which we investigated as follows. For each suboptimal instance, we modified the recovery phase of RecoverShortHyperpath to use all in-edges in the hypergraph to each hyperedge, rather than the in-edge lists collected by the heuristic. (In this situation, the recovered superpath F definitely contains a shortest hyperpath P.) Phase (II) then trimmed this superpath as normal. We discovered that the trimming phase often fails to find a shortest hyperpath within this larger superpath (which was the entire doubly-reachable subgraph). This indicates that while phase (I) is definitely causing suboptimality, simply changing phase (I) to recover a larger superpath may in turn lead to suboptimality in phase (II).

Table 2 Acyclic Instance Summaries

                          NCI-PID
Measure                   Small      Medium      Large          Reactome
Target instances          10         102         2,636          5,066
Reachable instances       10         90          2,220          2,432
Acyclic instances         9          89          2,182          2,410
Heuristic was optimal     100% (9)   100% (89)   99% (2,159)    100% (2,410)

Table 3 Cyclic Instance Summaries

                          NCI-PID
Measure                   Small          Medium         Large          Reactome
Target instances          10             102            2,636          5,066
Reachable instances       10             90             2,220          2,432
Cyclic instances          1              1              38             22
Heuristic was optimal     100%           100%           100%           100%
Non-trivial cycles        1              1              22             21
                          median  max    median  max    median  max    median  max
Number of hyperpaths*     1       1      1       1      3       364    22      136
Path length range†        0       0      0       0      1       43     [2,3]   15

* Total number of hyperpaths for a cyclic target instance
† Difference between the length of the longest and shortest hyperpaths

Table 4 Suboptimal Instance Summaries

Suboptimal instances      Reactome   0 / 2,432
                          NCI-PID    23 / 2,220

Measure                             median    max
Number of hyperpaths*               418       1,470
Path length range†                  30        50
Heuristic path-length difference    1         6
Heuristic path-length ratio         1.1       1.3
Number of shortest hyperpaths       7         110
Fraction of shortest hyperpaths     3.1%      26.7%

* Total number of hyperpaths for a target instance
† Difference between the length of the longest and shortest hyperpaths

Implementation and running time

The heuristic is implemented in Python 2.7.3, comprising around 500 lines of code. The parser used to convert the BioPAX format into hypergraphs is from [22]. For directed hypergraph representation and reachability we used Halp (github.com/Murali-group/halp/). All heuristic and hyperpath enumeration source code is available at http://hhugin.cs.arizona.edu.

Experiments were run on a laptop with a 2.9 GHz Intel Core i5 CPU and 16 GB of RAM. The running time of the hyperpath heuristic was 55 seconds on average for the instances from the Large and Reactome datasets, which have just under 1000 doubly-reachable hyperedges on average. Memory usage was low, with the heuristic using less than 2 GB of memory.

Enumerating all hyperpaths for the instances is time-consuming, taking 20.4 hours on average for the suboptimal instances with a maximum time of 53.8 hours, which is not practical to carry out for all 4600 target instances.

Biological examples

We now discuss three instances with cyclic shortest hyperpaths from the Large and Reactome datasets. The hyperpath found by our heuristic for these three instances is optimal (as was the case for all instances where the heuristic found a cyclic path), and is drawn in Figs. 7, 8, and 9. We describe the hypergraph structure and constituent reactions for each instance.

Assembly of the JUP/DSP complex

The first example captures the assembly of the JUP/DSP complex from the Large dataset. Figure 7 shows the shortest hyperpath returned by our heuristic with the JUP/DSP complex as the target. All vertices at the top of the figure are connected to the supersource.

This hyperpath includes seven hyperedges from four different NCI-PID pathways: “E-cadherin signaling in the nascent adherens junction” (two hyperedges, one of them e_1), “Posttranslational regulation of adherens junction stability and dissassembly” (hyperedges e_2, e_6, and e_7), “Signaling events mediated by PRL” (hyperedge e_3), and “Signaling events mediated by hepatocyte growth factor receptor (c-Met)” (one hyperedge). We briefly describe the key events in this hyperpath. Protein γ-catenin (also known as junction plakoglobin or JUP) is initially complexed with Cadherin 1 (CDH1) in the tail of hyperedge e_1. In hyperedge e_2, the metalloprotease meprinβ cleaves E-cadherin (CDH1), releasing it from its complex with α-catenin (CTNNA1) and δ-catenin (CTNND1) [32]. The CDH1/JUP complex adds α-catenin (CTNNA1) in one hyperedge, and CTNND1 and Ca2+ in another, to form a five-member complex. Hepatocyte growth factor (HGF) activates the proto-oncogene tyrosine-protein kinase Src [33]. Src regulates the breakup of this complex into its individual components [34], freeing JUP to bind with DSP and creating the two cycles in this hyperpath via CTNNA1 and CTNNB1. The hyperpath culminates in the formation of a complex between desmoplakin (DSP) and JUP.

The hypergraph for this instance is large, with 6168 forward-reachable hyperedges, 2642 backward-traceable hyperedges, and 1665 doubly-reachable hyperedges. There is no acyclic hyperpath from the supersource to JUP/DSP. When enumerating all s,t-hyperpaths for this instance, there were 16 alternate hyperpaths, and the longest hyperpath had 3 more hyperedges than the heuristic path, which was verified to be optimal.

Fig. 7 Cyclic shortest hyperpath to the JUP/DSP complex in the Large dataset. All vertices in the hyperpath connected to the supersource are shown at the top of the figure. The hyperedges in this hyperpath come from four different pathways, and show the different complexes JUP participates in until finally being free to bind with desmoplakin (DSP). Positive regulators of reactions are shown by dashed lines ending in a disc. Hyperedges e_1, e_5, and e_6, shown in red, create two separate cycles back to α-catenin and δ-catenin

Phosphorylation of p53

The second example captures the phosphorylation of p53 by NUAK1 (ARK5) from the Reactome dataset. The heuristic hyperpath, which is optimal, is shown in Fig. 8. All of the vertices at the top are connected to the supersource.

Hyperedge e_1 shows the complex formation of FOXO3 and FOXO4 with the STK11 gene, allowing for the transcription of the gene in hyperedge e_2. Hyperedges e_3 and e_4 deal with the transcription of protein p53 (TP53), and its formation into a homotetramer. The p53 tetramer then forms a complex with NUAK1 (ARK5) and STK11 in hyperedge e_5, allowing for the phosphorylation of NUAK1 via ATP in hyperedge e_6. Once NUAK1 is phosphorylated, it directly phosphorylates p53 [35], activating it and allowing it to assist in DNA damage repair. The final hyperedge e_7, shown in red, breaks apart the p53 tetramer/NUAK1/STK11 complex, resulting in a cycle of free STK11. This hyperpath features two transcriptional hyperedges e_2 and e_3, shown dotted.

This example from Reactome is slightly smaller than the example from the Large dataset, with only 4645 forward-reachable edges, 7021 backward-traceable edges, and 1632 hyperedges in the doubly-reachable set. There was no acyclic hyperpath for this instance. In contrast to the first example, no alternate hyperpaths to the target exist in the hypergraph.

Fig. 8 Cyclic shortest hyperpath to phosphorylated p53 in the Reactome dataset. All vertices in the hyperpath connected to the supersource are shown at the top of the figure. The hyperedges in this hyperpath show the transcription of STK11 and p53 (TP53) before NUAK1 (ARK5) participates in the phosphorylation of the p53 tetramer. Hyperedges e_5, e_6, and e_7, shown in red, create a cycle when the phosphorylation of p53 breaks up a complex, returning STK11 to its solitary state. Hyperedges e_2 and e_3 show transcription, and are drawn dotted

HEY2/ARNT complex assembly

The final example we discuss is the formation of the HEY2/ARNT complex from the Large dataset. The shortest hyperpath from the supersource to HEY2/ARNT, which was found by the heuristic, is shown in Fig. 9. Once again, the sources are at the top of the figure, with the hyperedge from the supersource not shown.

This hyperpath with eleven edges spans three pathways: “Notch signaling pathway” (hyperedges e_1–e_7), “Hypoxic and oxygen homeostasis regulation of HIF-1-α” (hyperedges e_9, e_10), and “Notch-mediated HES/HEY network” (hyperedges e_8, e_11). Hypoxia-inducible factor 1 (HIF-1) is a heterodimeric transcription factor that regulates genes that are induced by hypoxia [36]. It is a complex of HIF-1α (HIF1A) and HIF-1β (aryl hydrocarbon receptor nuclear translocator or ARNT). “Hairy/enhancer-of-split related with YRPW motif protein 2” (HEY2) is a transcriptional repressor [37] that physically interacts with ARNT (hyperedge e_11). The hyperedges e_9 and e_11 show a pair of reactions where HIF1 is formed and then repressed by HEY2. Hyperedges e_1–e_7 capture events in the Notch signaling pathway that occur upstream of the formation of the transcriptional activator formed by the complex of the nuclear protein “Recombining binding protein suppressor of hairless” (RBPJ) and Notch intracellular domain (NICD). The expression of protein HEY2 is up-regulated by the NICD/RBPJ complex [38].

This signaling hypergraph was markedly smaller than the other two examples. The hypergraph had 6169 forward-reachable hyperedges, but only 23 hyperedges were backward-traceable, hence only 23 hyperedges were doubly-reachable, due to the poor connectivity of the HEY2/ARNT complex to other vertices in the graph. Even though the hypergraph is small, the hyperpath shown is not the only shortest hyperpath to the target, as e_2 and e_3 can be replaced by hyperedges containing Jagged2 instead of Jagged1.

Fig. 9 Cyclic shortest hyperpath to the HEY2/ARNT complex in the Large dataset. All vertices from the hyperpath connected to the supersource are shown at the top of the figure. Positive regulators of reactions are shown by dashed lines ending in a disc. The eleven hyperedges span three different NCI-PID pathways, and show the events upstream of HEY2 transcription, ultimately culminating in its repression of ARNT. The cycle between hyperedges e_9 and e_11, shown in red in the figure, recreates nuclear HIF1A. Edge e_8, shown dotted, is a template reaction, where the NOTCH1/RBPJ complex upregulates the transcription of the protein HEY2

Conclusions

We have presented the first heuristic for Shortest Hyperpaths in general directed hypergraphs with positive edge weights, where the length of a hyperpath is the sum of the weights of its hyperedges. The heuristic handles cycles, is guaranteed to be efficient, finds optimal hyperpaths for singleton-tail hypergraphs, and is highly accurate in practice. It matches the state-of-the-art mixed-integer linear program for shortest acyclic hyperpaths on over 99% of all instances from the NCI-PID and Reactome databases, and surpasses the state-of-the-art on all instances where no acyclic hyperpath exists. Moreover, exhaustively enumerating all source-sink hyperpaths using our hyperpath enumeration algorithm demonstrates that on every cyclic instance from these databases, the heuristic was provably optimal.

Further research

Given that we can quickly find hyperpaths that are close to optimal in real cell-signaling hypergraphs, several research directions beckon. While the inapproximability of Shortest Hyperpaths [16] rules out a constant-factor approximation unless P=NP, is there an approximation algorithm whose approximation ratio on hypergraphs with n vertices matches the theoretical lower bound of ln n? More practically, given that in our experiments our heuristic was suboptimal only on acyclic instances, is there a fast method for acyclic hyperpaths that outperforms our heuristic? Since a user would like to know how close to optimal a computed hyperpath is for their particular input graph, is there an efficient heuristic that, as well as giving an upper bound on the optimum through its hyperpath, also outputs a lower bound on the length of the shortest hyperpath? Many intriguing research avenues are open.

Acknowledgements

We especially wish to thank T.M. Murali for introducing us to the problem of shortest hyperpaths in cell-signaling hypergraphs, for orienting us to the biology literature, and for discussing the JUP/DSP biological example. In addition, we thank Anna Ritz for discussing the NCI-PID and Reactome datasets, and for providing the BioPAX parser. We also thank the anonymous reviewers for their helpful comments.

This paper is an extended journal version of a prior conference paper by the coauthors [39].

Author contributions

SK and JK designed and analyzed the hyperpath heuristic and hyperpath enumeration algorithm. SK implemented the hyperpath heuristic and hyperpath enumeration algorithm, and performed all experiments. All authors read and approved the final manuscript.

Funding

This research was supported by the US National Science Foundation through grants CCF-1617192 and IIS-2041613 to JK.

Availability of data and materials

Source code for the hyperpath heuristic and the hyperpath enumeration algorithm, as well as the hypergraphs from the parsed Reactome, Small, Medium, and Large datasets, is available free for non-commercial use at http://hhugin.cs.arizona.edu.

Declarations

Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Competing interests: The authors declare that they have no competing interests.

Received: 10 January 2022 Accepted: 1 February 2022 Published: 26 May 2022

References

1. Li Y, McGrail DJ, Latysheva N, Yi S, Babu MM, Sahni N. Pathway perturbations in signaling networks: linking genotype to phenotype. Semin Cell Dev Biol. 2020;99:3–11.
2. Sharan R, Ideker T. Modeling cellular machinery through biological network comparison. Nat Biotechnol. 2006;24(4):427–33.
3. Vidal M, Cusick ME, Barabási A-L. Interactome networks and human disease. Cell. 2011;144(6):986–98.
4. Klamt S, Haus U-U, Theis F. Hypergraphs and cellular networks. PLoS Comput Biol. 2009;5(5):1000385.
5. Ritz A, Tegge AN, Kim H, Poirel CL, Murali TM. Signaling hypergraphs. Trends Biotechnol. 2014;32(7):356–62.
6. Ramadan E, Tarafdar A, Pothen A. A hypergraph model for the yeast protein complex network. In: Proceedings of the 18th Parallel and Distributed Processing Symposium. 2004. p. 189–196.
7. Hu Z, Mellor J, Wu J, Kanehisa M, Stuart JM, DeLisi C. Towards zoomable multidimensional maps of the cell. Nat Biotechnol. 2007;25(5):547–54.
8. Christensen TS, Oliveira AP, Nielsen J. Reconstruction and logical modeling of glucose repression signaling pathways in Saccharomyces cerevisiae. BMC Syst Biol. 2009;3(1):7.
9. Heath LS, Sioson AA. Semantics of multimodal network models. IEEE/ACM Trans Comput Biol Bioinform. 2009;6(2):271–80.
10. Ramadan E, Perincheri S, Tuck D. A hyper-graph approach for analyzing transcriptional networks in breast cancer. In: Proceedings of the 1st ACM Conference on Bioinformatics and Computational Biology (ACM-BCB). 2010. p. 556–562.
11. Zhou W, Nakhleh L. Properties of metabolic graphs: biological organization or representation artifacts? BMC Bioinform. 2011;12(1):132.
12. Ritz A, Murali TM. Pathway analysis with signaling hypergraphs. In: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB). 2014. p. 249–258.
13. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Molecular Biology of the Cell. New York: Garland Science; 2007.
14. Italiano GF, Nanni U. Online maintenance of minimal directed hypergraphs. Technical Report, Department of Computer Science, Columbia University. 1989.
15. Gallo G, Longo G, Pallottino S, Nguyen S. Directed hypergraphs and applications. Discret Appl Math. 1993;42(2–3):177–201.
16. Ausiello G, Laura L. Directed hypergraphs: introduction and fundamental algorithms - a survey. Theor Comput Sci. 2017;658:293–306.
17. Cottret L, Vieira Milreu P, Acuña V, Marchetti-Spaccamela A, Viduani Martinez F, Sagot M-F, Stougie L. Enumerating precursor sets of target metabolites in a metabolic network. In: Proceedings of the 8th Workshop on Algorithms in Bioinformatics (WABI). 2008. p. 233–244.
18. Acuña V, Milreu PV, Cottret L, Marchetti-Spaccamela A, Stougie L, Sagot M-F. Algorithms and complexity of enumerating minimal precursor sets in genome-wide metabolic networks. Bioinformatics. 2012;28(19):2474–83.
19. Andrade R, Wannagat M, Klein CC, Acuña V, Marchetti-Spaccamela A, Milreu PV, Stougie L, Sagot M-F. Enumeration of minimal stoichiometric precursor sets in metabolic networks. Algorithm Mol Biol. 2016;11(1):25.
20. Carbonell P, Fichera D, Pandit SB, Faulon J-L. Enumerating metabolic pathways for the production of heterologous target chemicals in chassis organisms. BMC Syst Biol. 2012;6(1):10.
21. Ritz A, Avent B, Murali TM. Pathway analysis with signaling hypergraphs. IEEE/ACM Trans Comput Biol Bioinform. 2017;14(5):1042–55.
22. Franzese N, Groce A, Murali TM, Ritz A. Hypergraph-based connectivity measures for signaling pathway topologies. PLoS Comput Biol. 2019;15(10):1–26.
23. Schwob MR, Zhan J, Dempsey A. Modeling cell communication with time-dependent signaling hypergraphs. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(3):1151–63.
24. Nielsen LR, Pretolani D. A remark on the definition of a B-hyperpath. Technical Report, Department of Operations Research, University of Aarhus.
25. Krieger S, Kececioglu J. Hhugin: hypergraph heuristic for general shortest source-sink hyperpaths, version 1.0. 2021. http://hhugin.cs.arizona.edu
26. Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to Algorithms. 3rd ed. Cambridge, Massachusetts: MIT Press; 2009.
27. Hamacher HW, Queyranne M. K best solutions to combinatorial optimization problems. Annal Oper Res. 1985;4:123–43.
28. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. PID: the pathway interaction database. Nucl Acids Res. 2009;37:674–9.
29. Joshi-Tope G, Gillespie M, Vastrik I, D’Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, Lewis S, Birney E, Stein L. Reactome: a knowledgebase of biological pathways. Nucl Acids Res. 2005;33:428–32.
30. Demir E, Cary MP, Paley S, et al. The BioPAX community standard for pathway data sharing. Nat Biotechnol. 2010;28(9):935–42.
31. Demir E, Babur Ö, Rodchenkov I, Aksoy BA, Fukuda KI, Gross B, Sümer OS, Bader GD, Sander C. Using biological pathway data with Paxtools. PLoS Comput Biol. 2013;9(9):1003194.
32. Huguenin M, Müller EJ, Trachsel-Rösmann S, Oneda B, Ambort D, Sterchi EE, Lottaz D. The metalloprotease meprinbeta processes E-cadherin and weakens intercellular adhesion. PLoS One. 2008;3(5):2153.
33. Palacios F, Tushir JS, Fujita Y, D’Souza-Schorey C. Lysosomal targeting of E-cadherin: a unique mechanism for the down-regulation of cell-cell adhesion during epithelial to mesenchymal transitions. Mol Cell Biol. 2005;25(1):389–402.
34. Miravet S, Piedra J, Castaño J, Raurell I, Francì C, Duñach M, García de Herreros A. Tyrosine phosphorylation of plakoglobin causes contrary effects on its association with desmosomes and adherens junction components and modulates β-catenin-mediated transcription. Mol Cell Biol. 2003;23(20):7391–402.
35. Hou X, Liu J-E, Liu W, Liu C-Y, Liu Z-Y, Sun Z-Y. A new role of NUAK1: directly phosphorylating p53 and regulating cell proliferation. Oncogene. 2011;30(26):2933–42.
36. Jiang BH, Rue E, Wang GL, Roe R, Semenza GL. Dimerization, DNA binding, and transactivation properties of hypoxia-inducible factor 1. J Biol Chem. 1996;271(30):17771–8.
37. Chin MT, Maemura K, Fukumoto S, Jain MK, Layne MD, Watanabe M, Hsieh CM, Lee ME. Cardiovascular basic helix loop helix factor 1, a novel transcriptional repressor expressed preferentially in the developing and adult cardiovascular system. J Biol Chem. 2000;275(9):6381–7.
38. Iso T, Chung G, Hamamori Y, Kedes L. HERP1 is a cell type-specific primary target of Notch. J Biol Chem. 2002;277(8):6598–607.
39. Krieger S, Kececioglu J. Fast approximate shortest hyperpaths for inferring pathways in cell signaling hypergraphs. In: Proceedings of the 21st ISCB Workshop on Algorithms in Bioinformatics (WABI). Leibniz International Proceedings in Informatics, vol 201. 2021. p. 1–20.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
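The biological examples above are characterized by their counts of forward-reachable, backward-traceable, and doubly-reachable hyperedges. The following is a minimal sketch of how such sets might be computed. It assumes definitions that this part of the text does not restate: a hyperedge is taken to be forward-reachable when every vertex in its tail can be reached from the sources, backward-traceable when it is encountered while following in-edges back from the target, and doubly-reachable when it is both. The function names and the dictionary representation are illustrative only; this is not the authors' code and does not use the Halp API. It is written for Python 3.

```python
from collections import deque

# A hypergraph is a dict: hyperedge name -> (tail vertex set, head vertex set).

def forward_reachable(edges, sources):
    """Hyperedges whose entire tail becomes reachable from the sources."""
    reached = set(sources)
    # Tail vertices of each hyperedge that have not been reached yet.
    missing = {e: set(tail) - reached for e, (tail, _) in edges.items()}
    by_tail = {}
    for e, (tail, _) in edges.items():
        for v in tail:
            by_tail.setdefault(v, []).append(e)
    queue = deque(e for e, m in missing.items() if not m)
    usable = set(queue)
    while queue:
        e = queue.popleft()
        for v in edges[e][1]:              # every head vertex becomes reached
            if v in reached:
                continue
            reached.add(v)
            for f in by_tail.get(v, []):
                missing[f].discard(v)
                if not missing[f] and f not in usable:
                    usable.add(f)
                    queue.append(f)
    return usable

def backward_traceable(edges, target):
    """Hyperedges met when following in-edges backward from the target."""
    by_head = {}
    for e, (_, head) in edges.items():
        for v in head:
            by_head.setdefault(v, []).append(e)
    seen_vertices, traced = {target}, set()
    stack = [target]
    while stack:
        v = stack.pop()
        for e in by_head.get(v, []):
            if e in traced:
                continue
            traced.add(e)
            for u in edges[e][0]:          # continue tracing from the tail
                if u not in seen_vertices:
                    seen_vertices.add(u)
                    stack.append(u)
    return traced

# Toy instance (hypothetical): x is never produced, c does not lead to t.
edges = {
    "e1": ({"s"}, {"a"}),
    "e2": ({"s"}, {"b"}),
    "e3": ({"a", "b"}, {"t"}),
    "e4": ({"a", "x"}, {"t"}),   # tail vertex x is unreachable
    "e5": ({"t"}, {"c"}),        # reachable, but not on any route into t
}
forward = forward_reachable(edges, {"s"})
backward = backward_traceable(edges, "t")
doubly = forward & backward
print(sorted(doubly))
```

In this toy instance the doubly-reachable set excludes e4, which can never fire, and e5, which lies downstream of the target. The same intersection explains why the HEY2/ARNT instance has thousands of forward-reachable hyperedges but only 23 doubly-reachable ones.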
Algorithms for Molecular Biology – Springer Journals
Published: May 26, 2022
Keywords: Systems biology; cell signaling networks; reaction pathways; directed hypergraphs; shortest hyperpaths; efficient heuristics; hyperpath enumeration
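The per-instance measures reported in Table 4 can be derived from the lengths of all enumerated hyperpaths together with the length of the heuristic's hyperpath. The sketch below shows one way to compute them, under the assumption that each measure has its plain meaning (difference and ratio are taken relative to the shortest enumerated length). The function name, the dictionary keys, and the sample lengths are hypothetical and are not data from the paper.

```python
def suboptimality_measures(all_lengths, heuristic_length):
    """Summarize one target instance from its enumerated hyperpath lengths."""
    shortest = min(all_lengths)
    longest = max(all_lengths)
    num_shortest = sum(1 for length in all_lengths if length == shortest)
    return {
        "number_of_hyperpaths": len(all_lengths),
        # Difference between the longest and shortest hyperpath lengths.
        "path_length_range": longest - shortest,
        # How far the heuristic's hyperpath is from optimal.
        "heuristic_difference": heuristic_length - shortest,
        "heuristic_ratio": heuristic_length / shortest,
        "number_of_shortest": num_shortest,
        "fraction_of_shortest": num_shortest / len(all_lengths),
    }

# Hypothetical instance with five hyperpaths; the heuristic returned length 11.
measures = suboptimality_measures([10, 10, 11, 12, 15], heuristic_length=11)
for name, value in measures.items():
    print(name, value)
```

Table 4 then reports the median and maximum of each measure across the 23 suboptimal instances, which is why its maximum values can come from different target instances.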