Comprehensive overview and assessment of miRNA target prediction tools in human and drosophila melanogaster

Muniba Faiza, Khushnuma Tanveer, Saman Fatihi, Yonghua Wang, Khalid Raza

MicroRNAs (miRNAs) are small non-coding RNAs that control gene expression at the post-transcriptional level through complementary base pairing with target mRNAs, leading to mRNA degradation or blocking of translation. Dysfunction of these small regulatory molecules has been linked to the development and progression of several diseases, so reliable prediction of potential miRNA targets is necessary. A large number of computational prediction tools have been developed that provide a faster way to find putative miRNA targets, but their results are often inconsistent. Hence, finding a reliable, functional miRNA target is still a challenging task. Moreover, each tool implements a different algorithm, and it is difficult for biologists to know which tool is the best choice for their study. This paper briefly describes the fundamentals of miRNA target prediction algorithms, discusses frequently used prediction tools, and assesses their performance using experimentally validated, high-confidence mature miRNAs and their targets for two organisms, human and Drosophila melanogaster. Tools supporting each organism were evaluated separately to identify the best-performing tool for each. In the human dataset, TargetScan showed the best results, followed by miRmap and microT, whereas in the D. melanogaster dataset, microT showed the best performance, followed by TargetScan.

Keywords: microRNA target prediction, target prediction algorithm, transcript prediction, computational tools, feature extraction.

1. Introduction

MicroRNAs (miRNAs) are short endogenous non-coding RNAs, nearly 22 nucleotides long (Bartel, 2004). miRNAs were first identified in Caenorhabditis elegans in 1993 using genetic methods (Lee et al., 1993). miRNAs are expressed from long transcripts produced in animals, plants, viruses, and single-celled eukaryotes (Liu et al., 2012). They have become the focus of much research because of their significant role in mRNA degradation and translational inhibition through complementary base pairing (He & Hannon, 2004), and their ability to control many biological processes such as homeostasis (Liu et al., 2012). A miRNA regulates its target mRNA and thereby adjusts the level of the corresponding protein; dysregulation of miRNA function can therefore lead to several human diseases (Bing et al., 2012). Cancer is the disease most commonly linked to miRNAs, and their differential expression is associated with different types of cancer such as lung cancer (Yanaihara et al., 2006), prostate cancer (Porkka et al., 2007), and ovarian cancer (Yang et al., 2008). miRNAs have also been implicated in neurological disorders such as Alzheimer’s disease (Hébert et al., 2009), schizophrenia (Beveridge et al., 2010), and multiple sclerosis (Cox et al., 2010).

A large amount of miRNA data has been generated in recent years through major efforts to identify miRNA targets and infer their functions, and these data are difficult to explore and assess using biological methods alone. Computational methods therefore provide statistical approaches to assess their quality and accuracy. In the last few years, several computational tools have been developed for the prediction of miRNA targets, but prediction results vary greatly among these tools owing to differences in their algorithms and training features. It is therefore difficult for a scientist to choose the best miRNA target prediction tool.
In this paper, we evaluate the performance of 11 miRNA target prediction tools on human and Drosophila melanogaster datasets. We provide a comprehensive summary of the considered tools and assess their target predictions using various metrics, including accuracy, number of targets predicted, sensitivity, specificity, true positive rate, and false positive rate.

Several attempts have been made in the past few years to assess the performance of existing miRNA target prediction tools. Mendes et al. (2009) evaluated methods for miRNA gene finding and target identification, reporting some problems in the existing methods. Bartel (2009) discussed the features of available miRNA target prediction tools, highlighting reasons for the differences in their performance, including how they treat the target nucleotide opposite the first nucleotide of the miRNA. According to Bartel (2009), TargetScan rewards an ‘A’ across from position 1, whereas other algorithms that use seed pairing reward a Watson-Crick (WC) match at this position (Krek et al., 2005; Lewis et al., 2005; Stark et al., 2005; Gaidatzis et al., 2007). Ruby et al. (2007) identified many conserved miRNAs through large-scale sequencing that had not been predicted by the tools. Alexiou et al. (2009) tested eight miRNA target predictors on small datasets of five and sixty-one miRNAs and proposed that targets predicted by more than one algorithm are more reliable than other targets. Fan and Kurgan (2014) studied seven miRNA predictors and found that TargetScan and miRmap showed the highest overall quality. They proposed that predicting target sites is more difficult than predicting target genes because of the lower predictive quality of the tools at the duplex level. Srivastava et al. (2014) evaluated the performance of eleven miRNA target predictors on plant datasets.
Akhtar et al. (2015) assessed the accuracy of miRNA predictors and reported the large number of false positives as the major flaw; the accuracy of these tools in predicting true targets remains questionable. This review is a comprehensive analysis of the performance of the existing miRNA target prediction tools.

2. Common features of miRNA target prediction tools

Computational methods are used to identify how miRNAs specifically target mRNAs. The following are a few common features on which most miRNA target prediction tools are based (Sarah et al., 2014).

2.1 Seed match

The seed sequence is the region at the 5’ end of the miRNA spanning nucleotides 2-8 (Lewis et al., 2005). Most prediction tools define a seed match as Watson-Crick (WC) pairing between this region of the miRNA and its target. An alignment in which the miRNA seed and its target are WC-paired without any gaps is considered a perfect seed match. Different algorithms consider different types of seed matches. The most commonly considered are as follows (Lewis et al., 2003; Krek et al., 2005; Brennecke et al., 2005):

a. 6-mer: a perfect seed match over six nucleotides between the miRNA seed and the mRNA.
b. 7mer-m8: a perfect seed match to nucleotides 2-8 of the miRNA.
c. 7mer-A1: a perfect seed match to nucleotides 2-7 of the miRNA, in addition to an A across from the first nucleotide of the miRNA.
d. 8-mer: a perfect seed match to nucleotides 2-8 of the miRNA, in addition to an A across from the first nucleotide of the miRNA.

2.2 Free energy

Gibbs free energy is used by many tools as a measure of the stability of the miRNA-target structure. When binding of a miRNA to a candidate mRNA results in a stable structure, that mRNA is considered a likely target of the miRNA. Structures with a more negative ΔG are less reactive and therefore more stable. The hybridization of a miRNA with its target mRNA provides information about regions of high and low free energy, and ΔG predicts the strength of binding between the miRNA and its target mRNA (Yue et al., 2009).

2.3 Conservation

Conservation is the occurrence of the same sequence across species. This feature analyzes regions such as the miRNA, the 3’-UTR, and the 5’-UTR. The seed region has been found to be more conserved than other regions (Lewis et al., 2003). A small portion of the miRNA outside the seed can show conserved pairing with the target mRNA that compensates for a mismatched seed; such sites are known as ‘3’-compensatory sites’ (Friedman et al., 2009). Conservation analysis helps to predict whether a predicted miRNA target is functional.

2.4 Site accessibility

Site accessibility is a measure of the ease with which a miRNA can locate its target mRNA and hybridize with it. The miRNA first binds to a short accessible region of the mRNA; the mRNA secondary structure then unfolds as the miRNA completes binding. Hence, to find the most probable target of a miRNA, the amount of energy required to make a site accessible is evaluated.

A few other features are used in many target prediction algorithms. G:U wobble in the seed match allows for a G pairing with a U instead of a C (Doench et al., 2004). Position contribution considers the position of a target site within the mRNA (Grimson et al., 2007). Seed pairing stability is the free energy change calculated for the predicted duplex (Garcia et al., 2011). Target-site abundance is the number of sites occurring in the 3’-UTR (Garcia et al., 2011). Local AU content is the concentration of A and U nucleotides flanking the site that corresponds to the seed region (Friedman et al., 2009; Betel et al., 2010).
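As an illustration of the seed-match classes listed in Section 2.1, the following minimal Python sketch scans a 3’-UTR for sites and labels each one. It is not the implementation of any tool discussed here. The function names are hypothetical, the 6-mer is assumed to pair with miRNA nucleotides 2-7, and the nucleotide across from miRNA position 1 is assumed to lie immediately 3’ of the seed match on the mRNA.

```python
# Watson-Crick complements for RNA (G:U wobble pairs are deliberately ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna, utr):
    """List (position, site type) for every perfect seed match in `utr`.

    Both sequences are read 5'->3'; DNA input is converted to RNA.
    Site types: 6mer, 7mer-m8, 7mer-A1, 8mer (assumed definitions, see text).
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    if len(mirna) < 8:
        raise ValueError("miRNA must have at least 8 nucleotides")

    core = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = COMPLEMENT[mirna[7]]              # pairs with miRNA nucleotide 8

    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8    # 5' of the core
        has_a1 = end < len(utr) and utr[end] == "A"    # 3' of the core
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a sequence
    print(classify_seed_sites(let7a, "GGCUACCUCAGG"))
```

Each site is reported once under its most specific class, so an 8-mer is not also counted as a 7-mer or 6-mer. Tools that allow G:U pairs or mismatches in the seed would need a more permissive matching step than the exact string search used here.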
3’-Compensatory pairing refers to the region (12-17 nts) in which target bases pair with miRNA nucleotides.

3. miRNA databases

Several online databases provide experimentally validated miRNAs and miRNA targets for different species, including miRBase (Griffiths-Jones et al., 2006; 2008), TarBase (Sethupathy et al., 2006; Vlachos et al., 2014), and miRTarBase (Chou et al., 2016).

miRBase, available at http://www.mirbase.org/ (Griffiths-Jones et al., 2006), is an online searchable archive of published miRNA sequences and annotation. Each record in miRBase represents a predicted hairpin portion of a miRNA transcript, termed ‘mir’, along with information on the location and sequence of the mature miRNA. miRBase also stores the sequences of all published mature miRNAs, along with their predicted source hairpin precursors and annotation relating to their discovery, structure, and function. miRBase applies a nomenclature scheme to its entries, for example hsa-miR-121, in which the first three letters signify the organism, the ‘R’ in miR denotes the mature miRNA sequence, and a number is added as a suffix.

TarBase is a manually curated collection of experimentally supported miRNA targets (Sethupathy et al., 2006). DIANA-TarBase v7.0 (Vlachos et al., 2014) stores more than half a million miRNA-gene interactions derived from 356 different cell types from 24 species, including human, mouse, fruit fly, zebrafish, and worms. miRTarBase is another manually curated database, which stores more than 360 thousand miRNA-target interactions (Chou et al., 2016). These interactions have been experimentally validated by reporter assays, western blots, microarrays, and next-generation sequencing experiments. miRTarBase release 6.0 (Chou et al., 2016) contains 3,786 miRNAs and 22,563 targets from 18 different species.
There are several other databases. miRDB (Wang, 2008) uses miRNA sequences from miRBase and mRNA 3’-UTR sequences imported from GenBank files using BioPerl (http://www.bioperl.org), and uses the MirTarget version 2 tool for genome-wide target prediction (Wang and Naqa, 2007). miRNAMap (Hsu et al., 2006) stores miRNA genes, putative miRNA genes, and known and putative miRNA targets of human, mouse, rat, and dog. The putative miRNA genes are obtained using RNAz (https://www.tbi.univie.ac.at/software/RNAz/), a tool for non-coding RNA prediction based on comparative sequence analysis (Washietl et al., 2005). The mature miRNAs of the putative miRNA genes are predicted using a machine learning approach called mmiRNA. The miRNA targets within the conserved regions of the 3’-UTRs of genes are predicted using the miRanda algorithm (Enright et al., 2003).

miRGate (Andrés-León et al., 2015) is another comprehensive database, consisting of miRNA-mRNA pairs computed using five target prediction algorithms: miRanda (Enright et al., 2003), TargetScan (Lewis et al., 2005; Bartel, 2009; Agarwal et al., 2015), RNAhybrid (Krüger & Rehmsmeier, 2006), microTar (Thadani & Tammi, 2006), and PITA (Kertesz et al., 2007). It also contains the complete sequences of miRNAs and mRNA 3’-UTRs of human (including human viruses), mouse, and rat, together with experimentally validated data. In miRGate, miRNA sequences are taken from miRBase 20 (Kozomara and Griffiths-Jones, 2013), as it contains more data than the other sources. miRGate obtained experimentally validated data from four databases: miRecords (Xiao et al., 2009), TarBase (Vergoulis et al., 2012), OncomirDB (Wang et al., 2014), and miRTarBase (Chou et al., 2016).

4. Materials and Methods

4.1 Datasets

For a comprehensive evaluation of the miRNA targets predicted by eleven different prediction tools, we considered miRNAs validated by experimental methods, taken from miRNA databases. In this study, we considered datasets from two species: Drosophila melanogaster and human. The high-confidence miRNAs were downloaded from miRBase and their validated targets were obtained from miRTarBase. The database consisted of 28,645 entries from around 110 species at the time of writing this manuscript. The two downloaded datasets served as the benchmark for the evaluation of the eleven miRNA target prediction tools. A detailed description of the two datasets follows.

Dataset-I: Drosophila melanogaster

Drosophila melanogaster [BDGP5.0] is a model organism with a total of 256 precursor miRNA sequences. We focused our study on miRNAs with a high probability of expression, annotated as high-confidence miRNAs. Drosophila melanogaster has 76 high-confidence miRNAs. These were searched for validated targets using miRBase and miRTarBase, a database of experimentally validated microRNA-target interactions. According to miRTarBase, Drosophila melanogaster shows 147 miRNA-target interactions between 45 miRNAs and 86 target genes. In this study, targets of Drosophila melanogaster miRNAs were predicted using seven different tools, namely TargetScan, microT-CDS, PicTar, miRror, microRNA, ComiR, and PITA, and their performance was evaluated.

Dataset-II: Human (Homo sapiens)

The names and sequences of high-confidence mature miRNAs were downloaded from miRBase. There are 2,588 high-confidence human miRNAs, of which 208 were randomly selected for this study.
The validated target genes of all mature miRNAs were downloaded from TarBase and miRTarBase. The targets of the human miRNAs were separated into another file using a program and used as the benchmark for testing ten target prediction tools, namely TargetScan, miRSystem, miRWalk, miRmap (Vejnar et al., 2012), miRSearch (Lewis et al., 2005; García et al., 2011), microT, microRNA, PITA, ComiR, and PicTar. The targets of each miRNA were predicted using these ten tools and then assessed for performance and accuracy. Table 1 presents a brief summary of the datasets considered in our assessment.

Table 1 Summary of datasets used for the assessment of miRNA target prediction tools

| Organism | Number of miRNAs | Number of targets | Data source | Tools used for assessment |
|---|---|---|---|---|
| Drosophila melanogaster | Out of 76 entries in miRBase, the 44 experimentally validated are considered | 140 | miRBase, TarBase, miRTarBase | PicTar (Krek et al., 2005); PITA (Kertesz et al., 2007); microRNA (Betel et al., 2008); ComiR (Coronnello & Benos, 2013); microT-CDS (Paraskevopoulou et al., 2013); miRror Suite (Friedman et al., 2014); TargetScan (Agarwal et al., 2015) |
| Human | Out of 2,588 high-confidence miRNAs in miRBase, 208 randomly selected are considered | 26,315 | miRBase, TarBase, miRTarBase | PITA (Kertesz et al., 2007); microRNA (Betel et al., 2008); miRSearch (Lewis et al., 2005; García et al., 2011); miRSystem (Lu et al., 2012); miRmap (Vejnar et al., 2012); microT-CDS (Paraskevopoulou et al., 2013); ComiR (Coronnello & Benos, 2013); miRWalk (Dweep et al., 2014); TargetScan (Agarwal et al., 2015); PicTar |

4.2 Target Prediction Tools

Categorizing the gene targets of miRNAs is essential for elucidating the biological mechanisms underlying these powerful regulatory molecules. Several miRNA target prediction algorithms exploit different approaches to predict the binding targets.
In animal genomes, miRNAs show only partial complementarity to their target mRNAs, in contrast to plants, where miRNAs bind their targets with almost perfect complementarity (Carrington and Ambros, 2003); this makes it difficult to predict target genes in animal genomes (Martin et al., 2007). These tools still need considerable improvement, and bioinformatics predictions require high-throughput experiments for validation. Existing miRNA target prediction tools apply machine learning methods and probabilistic learning algorithms to construct predictive models founded on experimentally verified miRNA targets. In the following section, we discuss miRNA target prediction techniques and compare the methodologies and features they use. All computer-based miRNA target prediction programs are built with specific features and parameters, and minor variations in these may give different results for the same input. The eleven prediction tools considered in this study are described briefly below.

4.2.1 ComiR

ComiR (Combinatorial microRNA) is a web server that predicts the targets of a set of miRNAs (Coronnello et al., 2012; Coronnello & Benos, 2013). It is easy to access and is reported to give results with higher accuracy than other tools. ComiR computes the potential of an mRNA being targeted by a set of miRNAs in a given species (human, mouse, fly, or worm). Target genes can be predicted in two ways: by entering a set of miRNAs along with their expression levels, or by entering a list of miRNA IDs.
In the former case, ComiR calculates the targeting potential in two steps. In the first step, it applies four different methods: (i) miRanda (Enright et al., 2003), for which the probability of mRNA:miRNA binding is calculated with Fermi-Dirac equations (Zhao et al., 2009; Coronnello et al., 2012) that take the miRNA expression into account, and the individual probabilities are summed over the mRNA for all miRNAs in the given set; (ii) a method similar to PITA (Kertesz et al., 2007), in which PITA energies are substituted for the standard energies in the equations; (iii) TargetScan (Lewis et al., 2005) scoring (without conservation), weighted by the expression level of each miRNA; and (iv) mirSVR (Betel et al., 2010), whose scores are combined with the weighted miRNA expression levels. In the second step, the predictions of the four methods are combined using a support vector machine (SVM) trained on a high-quality dataset. When miRNA IDs are entered without expression levels, ComiR assumes that all the miRNAs are expressed at the same level (Coronnello and Benos, 2013). If a single miRNA is selected, ComiR computes the targeting score of each gene for that miRNA in the selected species. An optional box accepts a single miRNA sequence in FASTA format, for which all target genes are predicted. All required mature miRNA sequences can be downloaded from miRBase. ComiR supports four species: H. sapiens, D. melanogaster, C. elegans, and M. musculus. Results can be evaluated by either rank or score.

4.2.2 microT-CDS

DIANA-microT-CDS (Paraskevopoulou et al., 2013) is the latest version of the microT algorithm. The algorithm uses positive and negative sets of miRNA recognition elements (MREs) located in both the 3'-UTR and CDS regions.
DIANA-microT-CDS achieves a major rise in sensitivity compared with previous versions. According to the available literature, the sensitivity of microT-CDS is ~65%, whereas it was only 52% for older versions of microT. In our study, miRNA targets were predicted using microT-CDS for the human and D. melanogaster (fruit fly) high-confidence miRNA datasets.

4.2.3 microRNA

microRNA (Betel et al., 2008) is an online target prediction tool that predicts candidate targets and their downregulation scores using the mirSVR (Betel et al., 2010) machine learning method. Targets are predicted with the miRanda algorithm (Enright et al., 2003), and observed miRNA expression levels are also provided. miRanda computes the complementarity between a given set of miRNAs and an mRNA using a weighted Smith-Waterman algorithm. As a secondary filter, the free energy of formation of the miRNA:mRNA duplex is estimated. The current version predicts targets for human, Drosophila melanogaster, roundworm, and mouse. Targets are queried by miRNA identifier and species, and all target genes are returned with their alignment sites.

4.2.4 miRror Suite

miRror Suite (Friedman et al., 2010; 2014) is an online tool that predicts likely targets for a set of miRNAs. It has two prediction protocols: gene to miRNA and miRNA to gene. miRror ranks a list of target genes according to their likelihood of being targeted by the given set of miRNAs. It takes miRNA IDs or gene accession IDs as input and requires a set of at least two valid miRNAs or genes. miRror supports several species and integrates many other resources, including the TargetScan database (Grimson et al., 2007), PITA (Kertesz et al., 2007), PicTar (Krek et al., 2005), MicroCosm (John et al., 2004), miRanda (Betel et al., 2008) (conserved and non-conserved), miRDB (Wang, 2008), RNA22 (Miranda et al., 2006), and MirZ (Hausser et al., 2009).
It gives scores based on the integrated databases.

4.2.5 PicTar

PicTar is an algorithm for the identification of miRNA targets (Krek et al., 2005). It supports several organisms, including Drosophila, and non-conserved co-expressed human miRNAs. After a query is entered (the nucleotide sequence of a mature miRNA or a multiple sequence alignment of RNA residues), PicTar first locates all possible sites, termed nuclei (length 7, starting at position 1 or 2 from the 5’ end of the miRNA), in the given sequence and then applies some filters. The optimal free energy of each nucleus is predicted, which narrows the list to fewer targets. Highly probable nuclei with optimal free energy that fall at overlapping positions in the alignment of the considered species are called anchors. If the 3’-UTR alignment has enough anchors, each UTR in the alignment is scored by the central PicTar maximum likelihood procedure, after which the scores of the orthologous transcripts are combined. Finally, a list of transcripts ranked by PicTar score is displayed (Krek et al., 2005).

4.2.6 PITA

PITA (Kertesz et al., 2007) incorporates a new approach to the prediction of miRNA targets. Its main hypothesis is that mRNA structure plays a significant role in target recognition by thermodynamically promoting or suppressing the interaction. The tool allows users to run the PITA algorithm on their choice of UTRs and miRNAs. PITA first scans the UTR for potential miRNA targets and then scores each site according to the parameter-free model described by Kertesz et al. (2007). This model computes the difference between the free energy gained by the formation of the miRNA-target duplex and the energetic cost of unpairing the target to make it accessible to the miRNA.
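In symbols, this score can be sketched as follows. The notation is assumed here for illustration and does not appear in the text above: both terms are written as free-energy changes, with the opening term taken as non-positive so that the subtraction penalizes sites that are costly to open.

```latex
\[
\Delta\Delta G = \Delta G_{\text{duplex}} - \Delta G_{\text{open}}
\]
```

With purely illustrative values of \Delta G_{\text{duplex}} = -20 kcal/mol and \Delta G_{\text{open}} = -8 kcal/mol, the score is \Delta\Delta G = -12 kcal/mol. Under this convention a more negative value indicates a more favorable, more accessible site.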
The PITA algorithm uses features such as seed match, free energy, site accessibility, target-site abundance, and G:U pairs allowed in the seed.

4.2.7 TargetScan

TargetScan (Lewis et al., 2005; Bartel, 2009; Agarwal et al., 2015) is a wide-ranging miRNA target prediction tool that supports human, mouse, fruit fly, worm, and fish. It has been upgraded several times and provides a wide range of information about predicted as well as validated miRNA binding sites on target genes. It estimates the cumulative weighted context++ score (CWCS) for each miRNA. Targets are ranked by predicted repression (CWCS) or by the aggregate probability of conserved targeting (PCT) score of the longest 3’-UTR isoform. First, the 6mer, 7mer-A1, 7mer-m8, and 8mer sites are filtered to remove overlapping sites for each miRNA family; the CWCS is then calculated for each member of a miRNA family, and the member with the greatest predicted repression is chosen to represent that family, while the reference 3’-UTR with the most 3P-seq tags represents the gene (Agarwal et al., 2015).

4.2.8 miRSystem

miRSystem (Lu et al., 2012) integrates seven tools to predict miRNA targets: DIANA-microT (Maragkakis et al., 2009), miRanda (Betel et al., 2008), miRBridge (Tsang et al., 2010), PicTar (Krek et al., 2005), PITA (Kertesz et al., 2007), RNA22 (Miranda et al., 2006), and TargetScan (Lewis et al., 2005; Bartel, 2009; Agarwal et al., 2015). Currently it supports human and mouse.
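Integrative tools such as miRSystem pool the output of several predictors. The text does not describe the scoring they use, so the following is only a minimal sketch of the simplest ways to combine predictions: the union, the intersection, and a vote threshold. It assumes each tool's output has already been reduced to a set of target gene symbols. The function name, tool labels, and gene symbols are hypothetical.

```python
from collections import Counter


def combine_predictions(predictions, min_tools=2):
    """Combine per-tool target sets for one miRNA.

    predictions: dict mapping a tool name to an iterable of gene symbols.
    Returns (union, intersection, consensus), where consensus holds the
    genes predicted by at least `min_tools` tools.
    """
    # Count, for every gene, how many tools predicted it (one vote per tool).
    votes = Counter(
        gene for targets in predictions.values() for gene in set(targets)
    )
    n_tools = len(predictions)
    union = set(votes)
    intersection = {gene for gene, n in votes.items() if n == n_tools}
    consensus = {gene for gene, n in votes.items() if n >= min_tools}
    return union, intersection, consensus


if __name__ == "__main__":
    # Hypothetical tool outputs for a single miRNA.
    example = {
        "tool_A": {"GENE1", "GENE2"},
        "tool_B": {"GENE2", "GENE3"},
        "tool_C": {"GENE1", "GENE2"},
    }
    union, intersection, consensus = combine_predictions(example)
    print(sorted(union), sorted(intersection), sorted(consensus))
```

The union favors sensitivity and the intersection favors precision. A vote threshold sits between the two, in line with the observation of Alexiou et al. (2009) that targets predicted by more than one algorithm tend to be more reliable.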
4.2.9 miRWalk

miRWalk (Dweep et al., 2014) is a comprehensive miRNA target prediction tool that integrates 12 existing target prediction tools, namely DIANA-microTv4.0, DIANA-microT-CDS, miRanda Release 2010, mirBridge (Tsang et al., 2010), miRDB4.0 (Wang, 2008), miRmap (Vejnar et al., 2012), miRNAMap (Hsu et al., 2006), PicTar2, PITA, RNA22 version 2, RNAhybrid2.1 (Krüger & Rehmsmeier, 2006), and TargetScan6.2. It provides miRNA binding sites within the complete sequence of a gene. It supports human, rat, dog, mouse, and cow.

4.2.10 miRmap

miRmap (Vejnar and Zdobnov, 2012) is a comprehensive prediction tool that implements eleven different features for target prediction. One of the eleven features evaluates the significance of negative selection and is based on PhyloP, a well-performing predictor of evolutionary conservation (Pollard et al., 2010). Currently, it supports human, mouse, rat, cow, opossum, chicken, chimpanzee, and zebrafish.

4.2.11 miRSearch

miRSearch (Lewis et al., 2005; García et al., 2011) is an online search tool for miRNA-target interactions. Its results are based on TargetScan (Lewis et al., 2005) and provide gene targets for human, mouse, and rat miRNAs. miRSearch uses an advanced algorithm to cross-reference the annotations found in the literature and displays a comprehensive list of miRNA-mRNA interactions. It uses the context++ score to predict results without considering site conservation.

The 11 target prediction tools are compared in terms of input requirements, features, supported species, URL, and citation in Table 2. The objective of this paper is to evaluate the performance of miRNA target prediction tools in human and Drosophila melanogaster; we therefore considered only tools that support either human or Drosophila melanogaster.

Table 2. List of tools for miRNA target prediction

| S.No. | Name of tool | Input | Tool features | Supported species | Tool URL | References |
|---|---|---|---|---|---|---|
| 1 | ComiR | miRNA name | seed match; conservation; free energy; site accessibility; target-site abundance; machine learning; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content; miRNA expression level | Human, Mouse, Fly, Worm | http://www.benoslab.pitt.edu/comir/ | Coronnello et al. (2012) |
| 2 | microT-CDS | miRNA name, gene name, Ensembl ID | seed match; conservation; free energy; site accessibility; target-site abundance; machine learning; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content | Human, Mouse, Fly, Worm | http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=microT_CDS/index | Paraskevopoulou et al. (2013) |
| 3 | microRNA | miRNA name | local AU content; seed match; conservation; secondary structure accessibility | Human, Mouse, Fly, Rat | http://www.microrna.org/microrna/home.do | Betel et al. (2008) |
| 4 | miRror Suite | miRNA name | seed match; conservation; free energy; site accessibility; target-site abundance; machine learning; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content | Human, Mouse, Fly, Rat, Worm, Fish | http://www.proto.cs.huji.ac.il/mirror/index.php | Friedman et al. (2014) |
| 5 | PicTar | miRNA name, gene name | seed match; pairing stability | Vertebrate, Fly, Worm, Mouse | http://pictar.mdc-berlin.de/ | Krek et al. (2005) |
| 6 | PITA | miRNA name | seed match; free energy; site accessibility; target-site abundance; G:U pairs allowed in the seed | Human, Mouse, Fly, Worm | https://genie.weizmann.ac.il/pubs/mir07/index.html | Kertesz et al. (2007) |
| 7 | TargetScan | miRNA name, miRNA family, gene name | seed match; conservation; free energy; site accessibility; target-site abundance; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content | Human, Mouse, Fly, Worm, Fish | http://www.targetscan.org/vert_71/ | Lewis et al. (2005); Bartel (2009); Agarwal et al. (2015) |
| 8 | miRmap | miRNA name | seed match; conservation; free energy; site accessibility; local AU content; site over-representation probability | Human, Mouse, Rat, Cow, Chicken, Zebrafish | http://mirmap.ezlab.org/ | Vejnar et al. (2012) |
| 9 | miRSearch | miRNA name | Uses an advanced cross-referencing system to identify validated and predicted miRs for any target | Human, Mouse, Rat | https://www.exiqon.com/miRSearch | Lewis et al. (2005); García et al. (2011) |
| 10 | miRSystem | miRNA name | Integrated system using seven prediction tools: DIANA, miRanda, miRBridge, PicTar, PITA, rna22, and TargetScan | Human, Mouse | http://mirsystem.cgm.ntu.edu.tw/ | Lu et al. (2012) |
| 11 | miRWalk | miRNA name | Combines 12 existing miRNA target prediction algorithms: DIANA-microTv4.0, DIANA-microT-CDS, miRanda-rel2010, mirBridge, miRDB4.0, miRmap, miRNAMap, doRiNA i.e., PicTar2, PITA, RNA22v2, RNAhybrid2.1, and Targetscan6.2 | Human, Mouse, Rat | http://129.206.7.150/ | Dweep et al. (2014) |

4.3 Empirical evaluation

We used comprehensive evaluation metrics to analyze the performance and accuracy of the eleven miRNA target prediction tools. The predicted targets were classified into four categories: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP and TN are the counts of correctly predicted functional and non-functional targets, respectively, whereas FP is the count of predicted targets that are not supported by the experimentally validated set, and FN is the count of experimentally validated targets that were not predicted.
The predictions were assessed using the following measures:

\[ \mathrm{TPR} = \frac{TP}{TP + FN} \tag{1} \]

\[ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{2} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{3} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{4} \]

\[ F\text{-measure} = 2 \ast \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5} \]

5. Results and Discussions

We evaluated the performance of eleven different miRNA target predictors on the datasets of two different species (human and D. melanogaster), assessing them on different parameters such as sensitivity, specificity, precision, recall, and other similar empirical measures. The strategy applied to evaluate the performance of the miRNA target predictors is shown in Fig. 1.
The performance of the prediction tools was evaluated in terms of several metrics, namely average predictions per miRNA, TPR, F-measure, and the combined results of the best performing tools (union and intersection), on experimentally validated Drosophila melanogaster and human miRNA targets.

Fig. 1 Representation of the strategy applied to evaluate miRNA target prediction tools

5.1 Dataset-I (Drosophila)

(a) Average predictions per miRNA

A single miRNA can target multiple (6-7) mRNAs, and a single mRNA can be targeted by several (4-5) miRNAs (German et al., 2008; Beauclair et al., 2010). Our results shown in Fig. 2 are consistent with this hypothesis. The average number of targets predicted for D. melanogaster ranged between 100 and 1400 per miRNA (Fig. 2). The tool microRNA predicted the highest number of targets, followed by ComiR, microT, miRor, PITA, PicTar, and TargetScan.

Fig. 2 Average number of targets predicted per miRNA by different tools in Drosophila.

(b) True positive rate (TPR) & false positive rate (FPR)

The plot of TPR and the total number of predicted targets is shown in Fig.
3, which suggests that the seven tools considered for the Drosophila dataset predicted a large number of targets and that the majority of the tools followed a similar pattern. The highest TPR was achieved by microT, followed by ComiR with a very slight difference. All the tools considered predicted a large number of false positives, which drives the FPR to ~99%.

Fig. 3 Evaluation of tools on the basis of true positive rate (TPR) and total number of targets predicted for Drosophila melanogaster.

(c) Precision-recall and F-measure

Precision and recall are important parameters for evaluating the accuracy and sensitivity of predictions. According to the results obtained from the analysis of the Drosophila dataset, TargetScan showed the highest precision, 0.0097 (though small on the scale), with a recall of 0.5214 (Supplementary file). The precision of these seven tools ranged between 0.005 and 0.009 and the recall between 0.2 and 0. The highest recall was shown by microT, whose precision comes next to that of TargetScan (Fig. 4). The tools were evaluated at an optimal score of 0.0, and the F-measure, the harmonic mean of precision and recall (also known as the F1-score), was calculated. The F-measure is high only when both precision and recall are high. The highest F-measure among the seven tools used to predict targets for Drosophila melanogaster was shown by TargetScan, followed by microT.

Fig. 4 Line graph showing the F-measure and precision-recall calculated for the Drosophila dataset. TargetScan showed the highest F-measure, while microT showed the highest recall.
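As a worked illustration of the harmonic mean described above, the precision and recall reported for TargetScan on the Drosophila dataset can be substituted into the standard F1 formula. The resulting value is computed here for illustration only and is not taken from the supplementary file.

```latex
\[
F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}
         {\mathrm{Precision} + \mathrm{Recall}}
  = \frac{2 \times 0.0097 \times 0.5214}{0.0097 + 0.5214}
  = \frac{0.010115}{0.5311}
  \approx 0.019
\]
```

Because the recall is far larger than the precision, F is close to twice the precision (2 × 0.0097 = 0.0194), so under these numbers the F-measure is governed almost entirely by the precision.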
(d) Combining results to improve accuracy and reliability

It is hypothesized that unions of predicted results achieve higher recall than the outcomes of individual tools. Similarly, the intersections may achieve higher precision. The results of the best performing tools for the Drosophila dataset were combined as unions and intersections to improve their recall and precision (Fig. 5). The union of microT and microRNA showed a two-fold increase in the TPs, whereas the intersection of microRNA and ComiR showed a decrease of similar magnitude.

Fig. 5 Combined outputs of dataset-I (D. melanogaster) to improve accuracy & reliability.
A) Comparison of the total miRNA targets predicted by the individual tools with the union and intersection of the tools to improve their performance. B) Union of microRNA and ComiR, and C) intersection of microT, ComiR and TargetScan. Abbreviations: Mt–microT, R–microRNA, C–ComiR, T–TargetScan.

5.2 Dataset-II (human)

(a) Average predictions per miRNA

In the human dataset, the average number of predicted targets ranged between 100 and 8000 per miRNA (Fig. 6). The results suggest that miRmap predicted the highest number of targets, followed by TargetScan, ComiR, miRWalk, microT, PITA, miRSearch, microRNA, and miRSystem.

Fig. 6 Average number of targets predicted per miRNA by different tools in the human dataset

(b) True positive rate (TPR)

For the human dataset, the plot of TPR, FPR, and the total number of predicted targets is shown in Fig. 7, which suggests that the seven tools predicted a large number of targets. According to the results, the highest TPR was achieved by TargetScan, followed by miRmap with a small difference. However, miRmap predicted a higher number of targets than TargetScan (Fig. 7A). This shows that the relationship between the TPR and the total number of predicted targets (Fig. 7B) is inconsistent, and that the two are independent. In other words, whether a tool predicts a larger or a smaller number of targets, the TPR remains unaffected.
The lowest FPR was achieved by ComiR, followed by microT; TargetScan, miRSystem, miRWalk, miRmap, microRNA, miRSearch, and PITA attained FPRs close to each other. According to the results, TargetScan showed both a higher TPR and a higher FPR.

Fig. 7 TPRs and total number of targets predicted by different tools

(c) Precision-recall and F-measure

The precision and recall for the human dataset were also calculated. According to the results, miRSearch showed the highest precision, 0.03 (though small on the scale), while the highest recall was shown by TargetScan (0.77) (Supplementary file). The precision and recall of these seven tools ranged from 0.009 to 0.03 and from 0.05 to 0.77, respectively. Since the precision and recall values were very low, we calculated the F-measure for the tools (Fig. 8). The highest F-score was shown by microT, followed by TargetScan, ComiR, miRSystem, miRSearch, PITA, PicTar, miRmap, and miRWalk.

Fig. 8 F-measure and precision-recall of the human dataset

(d) Combining results to improve accuracy and reliability

The results of the best performing tools for the human dataset were combined as unions and intersections to increase their performance and accuracy, as done for the first dataset (D. melanogaster). After score optimization, the combination of TargetScan and miRmap resulted in 31914 TPs, which is higher than the number of TPs predicted by TargetScan alone (Fig. 9). The combination of miRmap and ComiR yielded fewer TPs than the former, but more false positives than TargetScan. The intersection of miRmap, ComiR, and miRSearch was very small, with only 162 TPs shared by these three tools.

Fig.
9 Combined results on dataset-II (human) to improve accuracy. TargetScan showed the highest number of TPs and FPs. The union of PITA and miRmap is smaller than the predictions of TargetScan alone. The intersection of MCS yielded very few predicted targets. Abbreviations: M–miRmap, R–microRNA, C–ComiR, T–TargetScan.

6. Conclusion

We analyzed eleven miRNA target predictors on two benchmark datasets, applying empirical measures to evaluate their accuracy and performance. The best performing tools for each dataset, evaluated on the basis of these metrics, are shown in Table 3. According to our results, microT, microRNA, and ComiR showed the highest performance on dataset-I (Drosophila melanogaster), and TargetScan and miRmap showed the best performance on dataset-II (human). The predicted results were combined to improve the performance of the tools on both datasets, but no relevant improvement was observed in the TPs. It was also observed that the TPR is independent of the number of targets predicted by a tool. For example, for dataset-II (human), miRWalk predicted a large number of targets, but its TPR was very low.

Table 3 Best performing tools for both datasets according to each evaluation method.

Evaluation method      Dataset-I (D. melanogaster)    Dataset-II (human)
True positive rate     microT                         TargetScan
F-measure              TargetScan                     microT
Precision-Recall       microT                         TargetScan
Union of results       microT–microRNA                TargetScan–miRmap

On the basis of previous analyses and our own, we can say that the existing tools have many limitations and drawbacks, which underlines the need for more accurate and precise miRNA target predictors. The current tools generate a large number of false positives and work on different algorithms, which makes it difficult to compare them.
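The union and intersection combinations evaluated in this work can be sketched with Python sets. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `combine` and `precision_recall` are hypothetical helper names, and the tools, miRNAs, and genes in the toy data are invented placeholders rather than results from either dataset.

```python
def combine(predictions, tools, mode):
    """Combine the predicted (miRNA, gene) pairs of the named tools."""
    sets = [predictions[name] for name in tools]
    if mode == "union":
        return set().union(*sets)
    if mode == "intersection":
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


def precision_recall(predicted, validated):
    """Return (TP count, precision, recall) against the validated pairs."""
    tp = len(predicted & validated)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(validated) if validated else 0.0
    return tp, precision, recall


# Toy data: every identifier below is a made-up placeholder.
validated = {("miR-a", "g1"), ("miR-a", "g2"), ("miR-a", "g3"), ("miR-b", "g4")}
predictions = {
    "tool_x": {("miR-a", "g1"), ("miR-a", "g2"), ("miR-a", "g9"), ("miR-b", "g8")},
    "tool_y": {("miR-a", "g2"), ("miR-a", "g3"), ("miR-b", "g7")},
}

for mode in ("union", "intersection"):
    combined = combine(predictions, ["tool_x", "tool_y"], mode)
    tp, precision, recall = precision_recall(combined, validated)
    print(f"{mode}: TP={tp} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the union recovers more validated pairs at lower precision, while the intersection keeps fewer pairs at higher precision. Whether real tools behave this way depends on how much their false positives overlap, which is what the combined results reported here examine.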
Although several algorithms and models have been developed to predict miRNAs in silico, prediction of significant targets with high statistical confidence is still a challenging task.

Conflict of Interest

The authors declare that there is no conflict of interest in the publication of the manuscript.

Acknowledgement

The author Khalid Raza is thankful to the Indian Council for Cultural Relations (ICCR), India for his appointment as Visiting Professor (ICCR Chair) at Ain Shams University, Cairo, Egypt. Part of the manuscript was written during this period.

References

Agarwal, V., Bell, G. W., Nam, J. W., & Bartel, D. P. (2015). Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4, e05005.
Akhtar, M. M., Micolucci, L., Islam, M. S., Olivieri, F., & Procopio, A. D. (2015). Bioinformatic tools for microRNA dissection. Nucleic acids research, 44(1), 24-44.
Alexiou, P., Maragkakis, M., Papadopoulos, G. L., Reczko, M., & Hatzigeorgiou, A. G. (2009). Lost in translation: an assessment and perspective for computational microRNA target identification. Bioinformatics, 25(23), 3049-3055.
Andrés-León, E., González Peña, D., Gómez-López, G., & Pisano, D. G. (2015). miRGate: a curated database of human, mouse and rat miRNA–mRNA targets. Database, 2015, bav035.
Backes, C., Meese, E., Lenhof, H. P., & Keller, A. (2010). A dictionary on microRNAs and their putative target pathways. Nucleic acids research, 38(13), 4476-4486.
Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116(2), 281-297.
Bartel, D. P. (2009). MicroRNAs: target recognition and regulatory functions. Cell, 136(2), 215-233.
Beauclair, L., Yu, A., & Bouché, N. (2010). microRNA-directed cleavage and translational repression of the copper chaperone for superoxide dismutase mRNA in Arabidopsis. The Plant Journal, 62(3), 454-462.
Betel, D., Koppal, A., Agius, P., Sander, C., & Leslie, C. (2010).
Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome biology, 11(8), R90.
Betel, D., Wilson, M., Gabow, A., Marks, D. S., & Sander, C. (2008). The microRNA.org resource: targets and expression. Nucleic acids research, 36(suppl_1), D149-D153.
Beveridge, N. J., Gardiner, E., Carroll, A. P., Tooney, P. A., & Cairns, M. J. (2010). Schizophrenia is associated with an increase in cortical microRNA biogenesis. Molecular psychiatry, 15(12), 1176-1189.
Brennecke, J., Stark, A., Russell, R. B., & Cohen, S. M. (2005). Principles of microRNA-target recognition. PLoS Biol. 3:e85. doi: 10.1371/journal.pbio.0030085.
Carrington, J. C., & Ambros, V. (2003). Role of microRNAs in plant and animal development. Science, 301(5631), 336-338.
Chou, C. H., Chang, N. W., Shrestha, S., Hsu, S. D., Lin, Y. L., Lee, W. H., ... & Tsai, T. R. (2016). miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic acids research, 44(D1), D239-D247.
Coronnello, C., & Benos, P. V. (2013). ComiR: combinatorial microRNA target prediction tool. Nucleic acids research, 41(W1), W159-W164.
Coronnello, C., Hartmaier, R., Arora, A., Huleihel, L., Pandit, K. V., Bais, A. S., ... & Benos, P. V. (2012). Novel modeling of combinatorial miRNA targeting identifies SNP with potential role in bone density. PLoS Comput Biol, 8(12), e1002830.
Cox, M. B., Cairns, M. J., Gandhi, K. S., Carroll, A. P., Moscovis, S., Stewart, G. J., ... & ANZgene Multiple Sclerosis Genetics Consortium. (2010). MicroRNAs miR-17 and miR-20a inhibit T cell activation genes and are under-expressed in MS whole blood. PloS one, 5(8), e12132.
Doench, J. G., & Sharp, P. A. (2004). Specificity of microRNA target selection in translational repression. Genes Dev. 18, 504–511. doi: 10.1101/gad.1184404
Dweep, H., Gretz, N., & Sticht, C. (2014). miRWalk database for miRNA–target interactions. RNA Mapping: Methods and Protocols, 289-305.
Enright, A.
J., John, B., Gaul, U., Tuschl, T., Sander, C., & Marks, D. S. (2003). MicroRNA targets in Drosophila. Genome biology, 5(1), R1.
Fan, X., & Kurgan, L. (2014). Comprehensive overview and assessment of computational prediction of microRNA targets in animals. Briefings in bioinformatics, 16(5), 780-794.
Friedman, R. C., Farh, K. K., Burge, C. B., & Bartel, D. P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92–105. doi: 10.1101/gr.082701.108.
Gaidatzis, D., van Nimwegen, E., Hausser, J., & Zavolan, M. (2007). Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC bioinformatics, 8(1), 69.
Garcia, D. M., Baek, D., Shin, C., Bell, G. W., Grimson, A., & Bartel, D. P. (2011). Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat. Struct. Mol. Biol. 18, 1139–1146. doi: 10.1038/nsmb.2115
German, M. A., Pillay, M., Jeong, D. H., Hetawal, A., Luo, S., Janardhanan, P., ... & De Paoli, E. (2008). Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nature biotechnology, 26(8), 941.
Griffiths-Jones, S., Saini, H. K., van Dongen, S., & Enright, A. J. (2008). miRBase: tools for microRNA genomics. Nucleic Acids Research, 36(Database issue), D154–D158. doi: 10.1093/nar/gkm952
Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A., & Enright, A. J. (2006). miRBase: microRNA sequences, targets, and gene nomenclature. Nucleic Acids Research, 34(Database issue), D140–D144. doi: 10.1093/nar/gkj112
Grimson, A., Farh, K. K., Johnston, W. K., Garrett-Engele, P., Lim, L. P., & Bartel, D. P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91–105. doi: 10.1016/j.molcel.2007.06.017
Grün, D., Wang, Y. L., Langenberger, D., Gunsalus, K. C., & Rajewsky, N. (2005).
microRNA target predictions across seven Drosophila species and comparison to mammalian targets. PLoS computational biology, 1(1), e13.
Hausser, J., Berninger, P., Rodak, C., Jantscher, Y., Wirth, S., & Zavolan, M. (2009). MirZ: an integrated microRNA expression atlas and target prediction resource. Nucleic acids research, 37(suppl_2), W266-W272.
He, L., & Hannon, G. J. (2004). MicroRNAs: small RNAs with a big role in gene regulation. Nature Reviews Genetics, 5(7), 522-531.
Hébert, S. S., Horré, K., Nicolaï, L., Bergmans, B., Papadopoulou, A. S., Delacourte, A., & De Strooper, B. (2009). MicroRNA regulation of Alzheimer's Amyloid precursor protein expression. Neurobiology of disease, 33(3), 422-428.
Hsu, P. W., Huang, H. D., Hsu, S. D., Lin, L. Z., Tsou, A. P., Tseng, C. P., ... & Hofacker, I. L. (2006). miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes. Nucleic acids research, 34(suppl_1), D135-D139.
John, B., Enright, A. J., Aravin, A., Tuschl, T., Sander, C., & Marks, D. S. (2004). Human microRNA targets. PLoS biology, 2(11), e363.
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., & Segal, E. (2007). The role of site accessibility in microRNA target recognition. Nature genetics, 39(10), 1278-1284.
Kiriakidou, M., Nelson, P. T., Kouranov, A., Fitziev, P., Bouyioukos, C., Mourelatos, Z., & Hatzigeorgiou, A. (2004). A combined computational-experimental approach predicts human microRNA targets. Genes & development, 18(10), 1165-1178.
Kozomara, A., & Griffiths-Jones, S. (2013). miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic acids research, 42(D1), D68-D73.
Krek, A., Grün, D., Poy, M. N., Wolf, R., Rosenberg, L., Epstein, E. J., & Rajewsky, N. (2005). Combinatorial microRNA target predictions. Nature genetics, 37(5), 495-500. doi: 10.1038/ng1536.
Krüger, J., & Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic acids research, 34(suppl_2), W451-W454.
Lee, R. C., Feinbaum, R. L., & Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell, 75(5), 843-854.
Lewis, B. P., Burge, C. B., & Bartel, D. P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15–20. doi: 10.1016/j.cell.2004.12.035.
Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P., & Burge, C. B. (2003). Prediction of mammalian microRNA targets. Cell 115, 787–798. doi: 10.1016/S0092-8674(03)01018-3
Liu, B., Li, J., & Cairns, M. J. (2012). Identifying miRNAs, targets and functions. Briefings in bioinformatics, 15(1), 1-19.
Lu, T. P., Lee, C. Y., Tsai, M. H., Chiu, Y. C., Hsiao, C. K., Lai, L. C., & Chuang, E. Y. (2012). miRSystem: an integrated system for characterizing enriched functions and pathways of microRNA targets. PloS one, 7(8), e42390.
Martin, G., Schouest, K., Kovvuru, P., & Spillane, C. (2007). Prediction and validation of microRNA targets in animal genomes. Journal of biosciences, 32, 1049-1052.
Mendes, N. D., Freitas, A. T., & Sagot, M. F. (2009). Current tools for the identification of miRNA genes and their targets. Nucleic acids research, 37(8), 2419-2433.
Miranda, K. C., Huynh, T., Tay, Y., Ang, Y. S., Tam, W. L., Thomson, A. M., ... & Rigoutsos, I. (2006). A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell, 126(6), 1203-1217.
Paraskevopoulou, M. D., Georgakilas, G., Kostoulas, N., Vlachos, I. S., Vergoulis, T., Reczko, M., ... & Hatzigeorgiou, A. G. (2013). DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic acids research, 41(W1), W169-W173.
Peterson, S. M., Thompson, J. A., Ufkin, M. L., Sathyanarayana, P., Liaw, L., & Congdon, C. B. (2014). Common features of miRNA prediction tools. Frontiers in Genetics. doi: 10.3389/fgene.2014.00023.
Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R., & Siepel, A. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res., 20, 110–121.
Porkka, K. P., Pfeiffer, M. J., Waltering, K. K., Vessella, R. L., Tammela, T. L., & Visakorpi, T. (2007). MicroRNA expression profiling in prostate cancer. Cancer research, 67(13), 6130-6135.
Rehmsmeier, M., Steffen, P., Höchsmann, M., & Giegerich, R. (2004). Fast and effective prediction of microRNA/target duplexes. Rna, 10(10), 1507-1517.
Ruby, J. G., Stark, A., Johnston, W. K., Kellis, M., Bartel, D. P., & Lai, E. C. (2007). Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome research, 17(12), 1850-1864.
Sethupathy, P., Corda, B., & Hatzigeorgiou, A. G. (2006). TarBase: A comprehensive database of experimentally supported animal microRNA targets. Rna, 12(2), 192-197.
Srivastava, P. K., Moturu, T. R., Pandey, P., Baldwin, I. T., & Pandey, S. P. (2014). A comparison of performance of plant miRNA target prediction tools and the characterization of features for genome-wide target prediction. BMC Genomics, 15(1), 348.
Stark, A., Brennecke, J., Bushati, N., Russell, R. B., & Cohen, S. M. (2005). Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3′ UTR evolution. Cell, 123(6), 1133-1146.
Thadani, R., & Tammi, M. T. (2006). MicroTar: predicting microRNA targets from RNA duplexes. BMC bioinformatics, 7(5), S20.
Tsang, J. S., Ebert, M. S., & van Oudenaarden, A. (2010).
Genome-wide dissection of microRNA functions and cotargeting networks using gene set signatures. Molecular cell, 38(1), 140-153.
Vejnar, C. E., & Zdobnov, E. M. (2012). MiRmap: comprehensive prediction of microRNA target repression strength. Nucleic acids research, 40(22), 11673-11683.
Vergoulis, T., Vlachos, I. S., Alexiou, P. et al. (2012). TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucleic Acids Res., 40, D222–D229.
Vlachos, I. S., Paraskevopoulou, M. D., Karagkouni, D., Georgakilas, G., Vergoulis, T., Kanellos, I., ... & Fevgas, A. (2014). DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic acids research, 43(D1), D153-D159.
Wang, X. (2008). miRDB: a microRNA target prediction and functional annotation database with a wiki interface. Rna, 14(6), 1012-1017.
Wang, X., & El Naqa, I. M. (2007). Prediction of both conserved and nonconserved microRNA targets in animals. Bioinformatics, 24(3), 325-332.
Wang, D., Gu, J., Wang, T. et al. (2014). OncomiRDB: a database for the experimentally verified oncogenic and tumor-suppressive microRNAs. Bioinformatics, 30, 2237–2238.
Washietl, S., Hofacker, I. L., & Stadler, P. F. (2005). Fast and reliable prediction of noncoding RNAs. Proceedings of the National Academy of Sciences of the United States of America, 102(7), 2454-2459.
Xiao, F., Zuo, Z., Cai, G. et al. (2009). miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res., 37, D105–D110.
Yanaihara, N., Caplen, N., Bowman, E., Seike, M., Kumamoto, K., Yi, M., ... & Calin, G. A. (2006). Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer cell, 9(3), 189-198.
Yang, H., Kong, W., He, L., Zhao, J. J., O'Donnell, J. D., Wang, J., ... & Cheng, J. Q. (2008).
MicroRNA expression profiling in human ovarian cancer: miR-214 induces cell survival and cisplatin resistance by targeting PTEN. Cancer research, 68(2), 425-433.
Yue, D., Liu, H., & Huang, Y. (2009). Survey of computational algorithms for MicroRNA target prediction. Curr. Genomics 10, 478–492. doi: 10.2174/138920209789208219.
Zhao, Y., Granas, D., & Stormo, G. D. (2009). Inferring binding energies from selected binding sites. PLoS computational biology, 5(12), e1000590.

Quantitative Biology, arXiv (Cornell University)

Comprehensive overview and assessment of miRNA target prediction tools in human and drosophila melanogaster


ISSN
1574-8936
eISSN
ARCH-3345
DOI
10.2174/1574893614666190103101033

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs that control gene expression at the post-transcriptional level through complementary base pairing with the target mRNA, leading to mRNA degradation and blocking of translation. Dysfunctions of these small regulatory molecules have been linked with the development and progression of several diseases. Therefore, it is necessary to reliably predict potential miRNA targets. A large number of computational prediction tools have been developed, which provide a faster way to find putative miRNA targets, but their results are often inconsistent. Hence, finding a reliable, functional miRNA target is still a challenging task. Also, each tool uses a different algorithm, and it is difficult for biologists to know which tool is the best choice for their study. This paper briefly describes the fundamentals of miRNA target prediction algorithms, discusses frequently used prediction tools, and assesses the performance of these tools using experimentally validated, high-confidence mature miRNAs and their targets for two organisms, human and Drosophila melanogaster. The tools supporting Drosophila melanogaster and those supporting human were evaluated separately to find the best performing tool for each organism. In the human dataset, TargetScan showed the best results among the predictors, followed by miRmap and microT, whereas in the D. melanogaster dataset, microT showed the best performance, followed by TargetScan.

Keywords: microRNA target prediction, target prediction algorithm, transcript prediction, computational tools, feature extraction.

Muniba Faiza, Khushnuma Tanveer, Saman Fatihi, Yonghua Wang, Khalid Raza

1. Introduction

MicroRNAs (miRNAs) are short endogenous RNAs, approximately 22 nucleotides long, that belong to the non-coding RNAs (Bartel, 2004).
miRNAs were first identified in Caenorhabditis elegans in 1993 using genetic methods (Lee et al., 1993). miRNAs are expressed from long transcripts produced in animals, plants, viruses, and single-celled eukaryotes (Liu et al., 2012). miRNAs have become the focus of much research because of their significant role in mRNA degradation and translational inhibition through complementary base pairing (He & Hannon, 2004), and their ability to control many biological processes such as homeostasis (Liu et al., 2012). A miRNA regulates its target mRNA and thereby adjusts the amount of the corresponding protein that is formed; dysregulation of miRNA function therefore leads to several human diseases (Liu et al., 2012). Cancer is the disease most commonly associated with miRNAs, and their differential expression has been linked to different types of cancer such as lung cancer (Yanaihara et al., 2006), prostate cancer (Porkka et al., 2007), and ovarian cancer (Yang et al., 2008). miRNAs have also been implicated in neurological disorders such as Alzheimer's disease (Hébert et al., 2009), schizophrenia (Beveridge et al., 2010), and multiple sclerosis (Cox et al., 2010). A large amount of miRNA data has been generated in recent years through major efforts to identify miRNA targets and infer their functions, and these data are difficult to explore and assess using biological methods alone. Computational methods therefore provide statistical approaches to assess their quality and accuracy. In the last few years, several computational tools have been developed for the prediction of miRNA targets, but prediction results vary greatly among these tools because of differences in their algorithms and training features. It is therefore difficult for a scientist to choose the best miRNA target prediction tool.
In this paper, we evaluate the performance of 11 miRNA target prediction tools on human and Drosophila melanogaster datasets and provide a comprehensive summary of the tools considered, together with an assessment of their target predictions based on various metrics, including accuracy, number of targets predicted, sensitivity, specificity, true positive rate, and false positive rate. Several attempts have been made in the past few years to assess and evaluate the performance of existing miRNA target prediction tools. Mendes et al. (2009) evaluated methods for miRNA gene finding and target identification, reporting some problems in the existing methods. Bartel (2009) discussed the features of available miRNA target prediction tools, highlighting reasons for the differences in their performance, including how they treat the target nucleotide opposite the first nucleotide of the miRNA. According to Bartel (2009), TargetScan rewards an 'A' across from position 1, whereas other algorithms that use seed pairing reward a Watson-Crick (WC) match at this position (Krek et al., 2005; Lewis et al., 2005; Stark et al., 2005; Gaidatzis et al., 2007). Ruby et al. (2007) identified many conserved miRNAs through large-scale sequencing that had not been predicted by the tools. Alexiou et al. (2009) tested eight miRNA target predictors on small datasets of five and sixty-one miRNAs and proposed that targets predicted by more than one algorithm are more reliable than those predicted by a single algorithm. Fan and Kurgan (2014) studied seven miRNA predictors, among which TargetScan and miRmap showed the highest overall quality. They proposed that predicting target sites is more difficult than predicting target genes, owing to the lower predictive quality of the tools at the duplex level. Srivastava et al. (2014) evaluated the performance of eleven miRNA target predictors on plant datasets.
Akhtar et al. (2015) assessed the accuracy of miRNA predictors and reported the prediction of a large number of false positives as the major flaw; their accuracy in the prediction of true targets is also still questionable. This review is a comprehensive analysis of the performance of the existing miRNA target prediction tools.

2. Common features of miRNA target prediction tools

Computational methods are used to identify how miRNAs specifically target mRNAs. The following are a few common features on which most miRNA target prediction tools are based (Sarah et al., 2014).

2.1 Seed match

The seed sequence is the region comprising nucleotides 2-8 of the miRNA, counted from the 5' end (Lewis et al., 2005). Most prediction tools require Watson-Crick (WC) pairing between the miRNA seed and its target. An alignment in which the miRNA seed and its target are WC-paired without any gaps is considered a perfect seed match. Different algorithms consider different types of seed matches. The most commonly considered seed matches are as follows (Lewis et al., 2003; Krek et al., 2005; Brennecke et al., 2005):
a. 6mer: a perfect seed match over six nucleotides (typically nucleotides 2-7) between the miRNA seed and the mRNA.
b. 7mer-m8: a perfect seed match to nucleotides 2-8 of the miRNA seed sequence.
c. 7mer-A1: a perfect seed match to nucleotides 2-7 of the miRNA seed sequence, in addition to an A across from the first nucleotide of the miRNA.
d. 8mer: a perfect seed match to nucleotides 2-8 of the miRNA seed sequence, in addition to an A across from the first nucleotide of the miRNA.

2.2 Free energy

Gibbs free energy (delta-G) is used by many tools as a measure of the stability of the miRNA:mRNA structure. When a miRNA binds to a candidate mRNA and the result is a stable structure, that mRNA is considered a likely target of the miRNA. Structures with a more negative delta-G are less reactive and therefore more stable.
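As an illustration of the seed-match classes listed in Section 2.1, the following is a minimal Python sketch, not the implementation of any particular tool. It assumes the common convention that a 6mer pairs with miRNA nucleotides 2-7, and the function name `classify_seed_sites` and the short UTR fragments in the usage note are hypothetical. Only perfect Watson-Crick matches are considered; G:U wobble pairs and mismatches are ignored.

```python
# Complement table for RNA Watson-Crick pairs.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna, utr):
    """List (position, site_type) for every perfect seed site in a UTR.

    Both sequences are read 5'->3'. Positions are 0-based offsets of the
    6-nt core (the bases pairing with miRNA nucleotides 2-7) in the UTR.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    m8 = COMPLEMENT[mirna[7]]              # pairs with miRNA nt 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + 6
        # The base pairing with nt 8 lies immediately 5' of the core.
        has_m8 = start > 0 and utr[start - 1] == m8
        # The base opposite miRNA nt 1 lies immediately 3' of the core.
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


# Mature let-7a sequence, used here only as a familiar example.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"

if __name__ == "__main__":
    for fragment in ("AAACUACCUCAGGG", "GGCUACCUCGG", "GGGUACCUCAGG", "GGGUACCUCGGG"):
        print(fragment, classify_seed_sites(LET7A, fragment))
```

Real predictors add further filters on top of this step (conservation, free energy, site accessibility), so a seed hit from such a scan is only a candidate site.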
The hybridization of a miRNA with its target mRNA provides information about the regions of high and low free energy, and delta-G predicts the strength of binding between the miRNA and its target mRNA (Yue et al., 2009).

2.3 Conservation

Conservation is the occurrence of the same sequence across species. This feature analyzes regions such as the miRNA, the 3'-UTR, and the 5'-UTR. The seed region has been found to be more conserved than the other regions (Lewis et al., 2003). A small portion of the miRNA that interacts with the target mRNA can have conserved pairing that compensates for a mismatched seed; such sites are known as '3'-compensatory sites' (Friedman et al., 2009). Conservation analysis helps to predict whether a predicted miRNA target is functional or not.

2.4 Site accessibility

Site accessibility is a measure of the ease with which a miRNA can locate its target mRNA and hybridize with it. The miRNA first binds to a short accessible region of the mRNA, and the mRNA secondary structure then unfolds as the binding of the miRNA is completed. Hence, to find the most probable target of a miRNA, the amount of energy required to make a site accessible is evaluated.

A few other features are also used in many target prediction algorithms. G:U wobble in the seed match accounts for the possibility of a G pairing with a U instead of a C (Doench et al., 2004). Position contribution considers the position of a target sequence within the mRNA (Grimson et al., 2007). Seed pairing stability is the free energy change calculated for a predicted duplex (Garcia et al., 2011). Target-site abundance is the number of sites occurring in the 3'-UTR (Garcia et al., 2011). Local AU content is the concentration of A and U nucleotides flanking the corresponding seed region (Friedman et al., 2009; Betel et al., 2010).
3'-compensatory pairing refers to the pairing region (nucleotides 12-17) in which the target base pairs with the miRNA nucleotides.

3. miRNA databases

A few online miRNA databases provide experimentally validated miRNAs belonging to different species, including miRBase (Griffiths-Jones et al., 2006; 2008), TarBase (Sethupathy et al., 2006; Vlachos et al., 2014), and miRTarBase (Chou et al., 2016). miRBase is an online searchable archive of published miRNA sequences and annotation, available at http://www.mirbase.org/ (Griffiths-Jones et al., 2006). Each record in miRBase represents a predicted hairpin portion of a miRNA transcript, called 'mir', along with information on the location and sequence of the mature miRNA. miRBase also stores the sequences of all published mature miRNAs, along with their predicted source hairpin precursors and annotation relating to their discovery, structure, and function. miRBase has a nomenclature scheme for miRNAs, for example hsa-miR-121, in which the first three letters signify the organism, the capital 'R' in miR denotes the mature miRNA sequence, and a number is added as a suffix.

TarBase is a manually curated and experimentally supported collection of miRNA targets (Sethupathy et al., 2006). DIANA-TarBase v7.0 (Vlachos et al., 2014) stores more than half a million miRNA-gene interactions derived from 356 different cell types from 24 species, including human, mouse, fruit fly, zebrafish, and worms. miRTarBase is another manually curated database, which stores more than 360 thousand miRNA-target interactions (Chou et al., 2016). These accumulated target interactions have been experimentally validated by reporter assay, western blot, microarray, and next-generation sequencing experiments. miRTarBase release 6.0 (Chou et al., 2016) contains 3,786 miRNAs and 22,563 targets from 18 different species.
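The naming scheme described above for miRBase identifiers can be illustrated with a short parser. This is a minimal sketch under stated assumptions: it handles only names of the form species-miR-number or species-mir-number with an optional suffix (such as a letter or an arm label), the function name `parse_mirna_name` is hypothetical, and names outside this pattern (for example the let-7 family) are rejected rather than interpreted.

```python
import re

# Three-letter species prefix, "miR" (mature) or "mir" (precursor),
# a number, and an optional trailing suffix such as "a" or "-5p".
MIRNA_NAME = re.compile(
    r"^(?P<species>[a-z]{3})-(?P<kind>miR|mir)-(?P<number>\d+)(?P<suffix>.*)$"
)


def parse_mirna_name(name):
    """Split a miRNA identifier into its naming components."""
    match = MIRNA_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized miRNA name: {name!r}")
    return {
        "species": match["species"],
        # Capital 'R' marks the mature sequence, lowercase 'r' the hairpin.
        "is_mature": match["kind"] == "miR",
        "number": int(match["number"]),
        "suffix": match["suffix"],
    }


if __name__ == "__main__":
    print(parse_mirna_name("hsa-miR-121"))
    print(parse_mirna_name("dme-mir-1"))
```

A parser of this kind is mainly useful for grouping predictions by species prefix before they are compared with validated targets.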
There are several other databases. miRDB (Wang, 2008) uses miRNA sequences from miRBase and mRNA 3'-UTR sequences imported from GenBank files using BioPerl (http://www.bioperl.org), and applies the MirTarget version 2 tool for genome-wide target prediction (Wang and Naqa, 2007). miRNAMap (Hsu et al., 2006) is another database, which stores miRNA genes, putative miRNA genes, and known and putative miRNA targets of human, mouse, rat, and dog. The putative miRNA genes are obtained using RNAz (https://www.tbi.univie.ac.at/software/RNAz/), a tool for non-coding RNA prediction based on comparative sequence analysis (Washietl et al., 2005). The mature miRNAs of the putative miRNA genes are predicted using a machine learning approach called mmiRNA. The miRNA targets within the conserved regions of the 3'-UTRs of genes are predicted using the miRanda algorithm (Enright et al., 2003).

miRGate (Andrés-León et al., 2015) is another comprehensive database, consisting of miRNA-mRNA pairs calculated using five target prediction algorithms: miRanda (Enright et al., 2003), TargetScan (Lewis et al., 2005; Bartel, 2009; Agarwal et al., 2015), RNAhybrid (Krüger & Rehmsmeier, 2006), microTar (Thadani & Tammi, 2006), and PITA (Kertesz et al., 2007). It also contains the complete sequences of miRNAs and mRNA 3'-UTRs of human (including human viruses), mouse, and rat, together with experimentally validated data. In miRGate, miRNA sequences are taken from miRBase 20 (Kozomara and Griffiths-Jones, 2013), as it contains more data than the other sources. miRGate obtains experimentally validated data from four databases: miRecords (Xiao et al., 2009), TarBase (Vergoulis et al., 2012), OncomirDB (Wang et al., 2014), and miRTarBase (Chou et al., 2016).

4. Materials and Methods

4.1 Datasets

For a comprehensive evaluation of the miRNA targets predicted by eleven different prediction tools, we considered miRNAs that were validated by experimental methods and taken from miRNA databases. In this study, we considered datasets from two species: Drosophila melanogaster and human. The high confidence miRNAs were downloaded from miRBase and their validated targets were obtained from miRTarBase. The database consisted of 28,645 entries from around 110 species at the time of writing this manuscript. The two downloaded datasets were used as the benchmark for the evaluation of the eleven considered miRNA target prediction tools. A detailed description of these two datasets is as follows.

Dataset-I: Drosophila melanogaster

Drosophila melanogaster [BDGP5.0] is a model organism with a total of 256 precursor miRNA sequences. We focused our study on the miRNAs with a high probability of expression, annotated as high confidence miRNAs. Drosophila melanogaster has 76 high confidence miRNAs. These high confidence miRNAs were then searched for their validated targets using miRBase and miRTarBase, which is a database of experimentally validated miRNA-target interactions. According to miRTarBase, Drosophila melanogaster shows 147 miRNA-target interactions between 45 miRNAs and 86 target genes. In this study, targets of Drosophila melanogaster miRNAs were predicted using seven different tools, namely TargetScan, microT-CDS, PicTar, miRror, microRNA, ComiR, and PITA, and their performance was evaluated.

Dataset-II: Human (Homo sapiens)

The names and sequences of high confidence mature miRNAs were downloaded from miRBase. There are 2588 high confidence human miRNAs, out of which 208 miRNAs were randomly selected for this study.
The validated target genes of all mature miRNAs were downloaded from TarBase and miRTarBase. The targets of the human miRNAs were extracted into a separate file using a program and were used as the benchmark for testing ten target prediction tools, namely TargetScan, miRSystem, miRWalk, miRmap (Vejnar et al., 2012), miRSearch (Lewis et al., 2005; García et al., 2011), microT, microRNA, PITA, ComiR, and PicTar. The targets of each miRNA were predicted using the ten considered tools and then assessed for performance and accuracy. Table 1 presents a brief summary of the datasets considered for our comprehensive assessment.

Table 1. Summary of datasets used for the assessment of miRNA target prediction tools

Organism: Drosophila melanogaster
Number of miRNAs: out of 76 entries in miRBase, 44 experimentally validated miRNAs are considered
Number of targets: 140
Data source: miRBase, TarBase, miRTarBase
Tools used for assessment:
• PicTar (Krek et al., 2005)
• PITA (Kertesz et al., 2007)
• microRNA (Betel et al., 2008)
• ComiR (Coronnello & Benos, 2013)
• microT-CDS (Paraskevopoulou et al., 2013)
• miRror Suite (Friedman et al., 2014)
• TargetScan (Agarwal et al., 2015)

Organism: Human
Number of miRNAs: out of 2,588 high confidence miRNAs in miRBase, 208 randomly selected miRNAs are considered
Number of targets: 26,315
Data source: miRBase, TarBase, miRTarBase
Tools used for assessment:
• PITA (Kertesz et al., 2007)
• microRNA (Betel et al., 2008)
• miRSearch (Lewis et al., 2005; García et al., 2011)
• miRSystem (Lu et al., 2012)
• miRmap (Vejnar et al., 2012)
• microT-CDS (Paraskevopoulou et al., 2013)
• ComiR (Coronnello & Benos, 2013)
• miRWalk (Dweep et al., 2014)
• TargetScan (Agarwal et al., 2015)
• PicTar

4.2 Target Prediction Tools

Categorizing the gene targets of miRNAs is essential for elucidating the biological mechanisms underlying these powerful regulatory molecules. Several miRNA target prediction algorithms exploit different approaches to predict the binding targets.
In animal genomes, miRNAs show only partial complementarity to their target mRNAs, in contrast to plants, where miRNAs can bind to their targets with almost perfect complementarity (Carrington and Ambros, 2003); this makes it more difficult to predict target genes in animal genomes (Martin et al., 2007). These tools still need considerable improvement, and bioinformatics techniques require high-throughput experiments in order to validate their predictions. Existing miRNA target prediction tools apply machine learning methods and probabilistic learning algorithms to construct predictive models that are founded on experimentally verified miRNA targets. In the following section, we discuss miRNA target prediction techniques and summarize a detailed comparison of the methodologies and features they use. All computer-based miRNA target prediction programs are built with specific features and parameters, and minor variations in these may give different results for the same input. The eleven prediction tools considered in this study are described briefly below.

4.2.1 ComiR

ComiR (Combinatorial microRNA) is a web server for predicting the targets of a set of miRNAs (Coronnello et al., 2012; Coronnello & Benos, 2013). It is easy to access and returns the expected results with higher accuracy in comparison to other tools. ComiR computes the potential of an mRNA to be targeted by a miRNA in the supported species (human, mouse, fly, and worm genomes). The target genes can be predicted in two ways: either by entering a set of miRNAs along with their expression levels, or by entering a list of miRNA IDs.
In the former case, ComiR calculates the targeting potential in two steps. In the first step, four different methods are applied: (i) miRanda (Enright et al., 2003), for which the probability of mRNA:miRNA binding is calculated from Fermi-Dirac equations (Zhao et al., 2009; Coronnello et al., 2012) that take the miRNA expression into account, with the individual probabilities summed over the mRNA for all miRNAs in the given set; (ii) a second method, similar to PITA (Kertesz et al., 2007), in which the standard energies are substituted in the equations; (iii) TargetScan (Lewis et al., 2005) scoring (without conservation), weighted by the expression level of each miRNA; and (iv) mirSVR (Betel et al., 2010), whose scores are combined with the weighted miRNA expression levels. In the second step, the predictions of the four methods applied in the first step are combined using a support vector machine (SVM) trained on a high quality dataset. On the other hand, when miRNA IDs are entered without expression levels, ComiR assumes that all the miRNAs are expressed at the same level (Coronnello and Benos, 2013). If a single miRNA is selected, ComiR computes the targeting score of each gene for that miRNA in the selected species. It has an optional box in which a single miRNA sequence can be entered in FASTA format, and it will predict all target genes for that miRNA. All required mature miRNA sequences can be downloaded from the miRBase database. ComiR supports four species: H. sapiens, D. melanogaster, C. elegans, and M. musculus. The results can be evaluated in two ways, based either on rank or on score.

4.2.2 microT-CDS

DIANA-microT-CDS (Paraskevopoulou et al., 2013) is the latest version of the microT algorithm. The algorithm is based on positive and negative sets of miRNA recognition elements (MREs) located in both the 3'-UTR and CDS regions.
DIANA-microT-CDS achieves a major rise in sensitivity compared to previous versions. According to the available review literature, the sensitivity of microT-CDS is ~65%, whereas it was only 52% in the older versions of microT. In our study, miRNA target prediction was performed using microT-CDS on the human and D. melanogaster (fruit fly) high confidence miRNA datasets.

4.2.3 MicroRNA

MicroRNA (Betel et al., 2008) is an online target prediction tool that predicts candidate targets and their downregulation scores using the mirSVR machine learning method (Betel et al., 2010). The miRanda algorithm (Enright et al., 2003) is used to predict the targets, together with the observed miRNA expression level. It computes the complementarity between a given set of miRNAs and an mRNA on the basis of a weighted Smith-Waterman algorithm. As a secondary filter, the tool estimates the free energy of formation of the miRNA:mRNA duplex. The current version predicts targets for human, Drosophila melanogaster, roundworm, and mouse. Targets are predicted from miRNA identifiers and species, and the tool returns all target genes with their alignment sites.

4.2.4 miRror Suite

miRror Suite (Friedman et al., 2010; 2014) is an online tool for predicting likely targets of a set of miRNAs. It has two protocols for prediction: gene to miRNA and miRNA to gene. miRror ranks a list of target genes according to their likelihood of being targeted by the given set of miRNAs. It requires miRNA IDs or gene accession IDs as input and accepts a set of at least two valid miRNAs or genes. miRror supports several species and integrates many other resources, including the TargetScan database (Grimson et al., 2007), PITA (Kertesz et al., 2007), PicTar (Krek et al., 2005), Microcosm (John et al., 2004), miRanda (Betel et al., 2008) (conserved and non-conserved), miRDB (Wang, 2008), RNA22 (Miranda et al., 2006), and Mirz (Hausser et al., 2009).
It gives scores based on the integrated databases.

4.2.5 PicTar

PicTar is an algorithm for the identification of miRNA targets (Krek et al., 2005). It supports several organisms, including Drosophila, and non-conserved co-expressed human miRNAs. After a query is entered (the nucleotide sequence of a mature miRNA or a multiple sequence alignment of RNA residues), PicTar first locates all the possible sites, termed nuclei (length 7, starting at position 1 or 2 of the 5' end of the miRNA), in the given sequence, and then applies some filters. The optimal free energy of each nucleus is predicted, which narrows down the number of targets. The highly probable nuclei with optimal free energy that fall into overlapping positions in the alignment of the considered species are called anchors. If the 3'-UTR alignment has enough anchors, each UTR in the alignment is scored by the central PicTar maximum likelihood procedure, after which the scores of the orthologous transcripts are combined. Finally, a list of transcripts ranked by PicTar score is displayed (Krek et al., 2005).

4.2.6 PITA

PITA (Kertesz et al., 2007) incorporates a new approach to the prediction of miRNA targets. Its main hypothesis is that mRNA structure plays a significant role in target recognition by thermodynamically promoting or suppressing the interaction. The tool allows users to run the PITA algorithm on their own choice of UTRs and miRNAs. PITA first scans the UTR for potential miRNA target sites and then scores each site according to the parameter-free model explained by Kertesz et al. (2007). This model computes the difference between the free energy gained by the formation of the miRNA-target duplex and the energetic cost of unpairing the target to make it accessible to the miRNA.
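The energy difference described above can be written compactly. The notation below is a sketch of how the PITA score is usually presented; the symbols and the sign convention (the opening energy is taken as a non-positive number whose magnitude is the cost of unpairing the site) are assumptions made here for illustration, and the numerical values are hypothetical rather than taken from any dataset.

$$\Delta\Delta G = \Delta G_{\mathrm{duplex}} - \Delta G_{\mathrm{open}}$$

For example, with $\Delta G_{\mathrm{duplex}} = -20$ kcal/mol and $\Delta G_{\mathrm{open}} = -8$ kcal/mol:

$$\Delta\Delta G = -20 - (-8) = -12 \ \text{kcal/mol}$$

Under this convention, a more negative $\Delta\Delta G$ indicates a more favorable interaction, and a site buried in stable secondary structure (a large opening cost) is penalized even if its duplex energy is strong.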
The PITA algorithm uses features such as seed match, free energy, site accessibility, target-site abundance, and G:U pairs allowed in the seed.

4.2.7 TargetScan

TargetScan (Lewis et al., 2005; Bartel, 2009; Agarwal et al., 2015) is a wide-ranging miRNA target prediction tool that supports human, mouse, fruit fly, worm, and fish. It has been upgraded several times and provides a wide range of information about predicted as well as validated binding sites on target genes. It estimates the cumulative weighted context++ score (CWCS) for each miRNA. Targets are ranked either by the predicted repression (CWCS) or by the aggregated PCT (probability of conserved targeting) score of the longest 3'-UTR isoform. First, the 6mer, 7mer-A1, 7mer-m8, and 8mer sites are filtered to remove overlapping sites for each miRNA family. The CWCS is then calculated for each member of a miRNA family, the member with the greatest predicted repression is chosen to represent that family, and the reference 3'-UTR with the most 3P-seq tags represents the gene (Agarwal et al., 2015).

4.2.8 miRSystem

miRSystem (Lu et al., 2012) integrates seven tools to predict the targets of miRNAs: DIANA-microT (Maragkakis et al., 2009), miRanda (Betel et al., 2008), miRBridge (Tsang et al., 2010), PicTar (Krek et al., 2005), PITA (Kertesz et al., 2007), RNA22 (Miranda et al., 2006), and TargetScan (Lewis et al., 2005; Bartel, 2009; Agarwal et al., 2015). Currently it supports human and mouse.
4.2.9 miRWalk

miRWalk (Dweep et al., 2014) is a comprehensive miRNA target prediction tool that integrates 12 existing target prediction tools, namely DIANA-microTv4.0, DIANA-microT-CDS, miRanda Release 2010, mirBridge (Tsang et al., 2010), miRDB4.0 (Wang, 2008), miRmap (Vejnar et al., 2012), miRNAMap (Hsu et al., 2006), PicTar2, PITA, RNA22 version 2, RNAhybrid2.1 (Krüger & Rehmsmeier, 2006), and TargetScan6.2. It provides the miRNA binding sites within the complete sequence of a gene. It supports human, rat, dog, mouse, and cow.

4.2.10 miRmap

miRmap (Vejnar and Zdobnov, 2012) is a comprehensive prediction tool that implements eleven different features for target prediction. One of the eleven features evaluates the significance of negative selection and is based on PhyloP, a well-performing predictor of evolution (Pollard et al., 2010). Currently, it supports human, mouse, rat, cow, opossum, chicken, chimpanzee, and zebrafish.

4.2.11 miRSearch

miRSearch (Lewis et al., 2005; García et al., 2011) is an online search tool for miRNA-target interactions. The results are based on TargetScan (Lewis et al., 2005) and provide gene targets for human, mouse, and rat miRNAs. miRSearch uses an advanced algorithm to cross-reference all the annotations found in the literature and displays a comprehensive list of miRNA-mRNA interactions. It uses the context++ score to predict results without considering site conservation.

All 11 considered target prediction tools are compared in terms of input requirement, tool features, supported species, tool URL, and citation, as shown in Table 2. The objective of this paper is to evaluate and assess the performance of miRNA target prediction tools in human and Drosophila melanogaster. Therefore, we have considered those tools which support either human or Drosophila melanogaster.

Table 2. List of tools for miRNA target prediction

1. ComiR
Input: miRNA name
Tool features: seed match; conservation; free energy; site accessibility; target-site abundance; machine learning; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content; miRNA expression level
Supported species: Human, Mouse, Fly, Worm
Tool URL: http://www.benoslab.pitt.edu/comir/
References: Coronnello et al. (2012)

2. microT-CDS
Input: miRNA name, gene name, Ensembl ID
Tool features: seed match; conservation; free energy; site accessibility; target-site abundance; machine learning; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content
Supported species: Human, Mouse, Fly, Worm
Tool URL: http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=microT_CDS/index
References: Paraskevopoulou et al. (2013)

3. microRNA
Input: miRNA name
Tool features: local AU content; seed match; conservation; secondary structure accessibility
Supported species: Human, Mouse, Fly, Rat
Tool URL: http://www.microrna.org/microrna/home.do
References: Betel et al. (2008)

4. miRror Suite
Input: miRNA name
Tool features: seed match; conservation; free energy; site accessibility; target-site abundance; machine learning; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content
Supported species: Human, Mouse, Fly, Rat, Worm, Fish
Tool URL: http://www.proto.cs.huji.ac.il/mirror/index.php
References: Friedman et al. (2014)

5. PicTar
Input: miRNA name, gene name
Tool features: seed match; pairing stability
Supported species: Vertebrate, Fly, Worm, Mouse
Tool URL: http://pictar.mdc-berlin.de/
References: Krek et al. (2005)

6. PITA
Input: miRNA name
Tool features: seed match; free energy; site accessibility; target-site abundance; G:U pairs allowed in the seed
Supported species: Human, Mouse, Fly, Worm
Tool URL: https://genie.weizmann.ac.il/pubs/mir07/index.html
References: Kertesz et al. (2007)

7. TargetScan
Input: miRNA name, miRNA family, gene name
Tool features: seed match; conservation; free energy; site accessibility; target-site abundance; 3' compensatory pairing; G:U pairs allowed in the seed; local AU content
Supported species: Human, Mouse, Fly, Worm, Fish
Tool URL: http://www.targetscan.org/vert_71/
References: Lewis et al. (2005); Bartel (2009); Agarwal et al. (2015)

8. miRmap
Input: miRNA name
Tool features: seed match; conservation; free energy; site accessibility; local AU content; site over-representation probability
Supported species: Human, Mouse, Rat, Cow, Chicken, Zebrafish
Tool URL: http://mirmap.ezlab.org/
References: Vejnar et al. (2012)

9. miRSearch
Input: miRNA name
Tool features: miRSearch uses an advanced cross-referencing system to identify validated and predicted miRs for any target.
Supported species: Human, Mouse, Rat
Tool URL: https://www.exiqon.com/miRSearch
References: Lewis et al. (2005); García et al. (2011)

10. miRSystem
Input: miRNA name
Tool features: integrated system using seven prediction tools: DIANA, miRanda, miRBridge, PicTar, PITA, rna22, and TargetScan
Supported species: Human, Mouse
Tool URL: http://mirsystem.cgm.ntu.edu.tw/
References: Lu et al. (2012)

11. miRWalk
Input: miRNA name
Tool features: combines 12 existing miRNA target prediction algorithms: DIANA-microTv4.0, DIANA-microT-CDS, miRanda-rel2010, mirBridge, miRDB4.0, miRmap, miRNAMap, doRiNA i.e., PicTar2, PITA, RNA22v2, RNAhybrid2.1, and Targetscan6.2
Supported species: Human, Mouse, Rat
Tool URL: http://129.206.7.150/
References: Dweep et al. (2014)

4.3 Empirical evaluation

We used comprehensive evaluation metrics to analyze the performance and accuracy of the eleven considered miRNA target prediction tools. The predicted targets were categorized into four categories: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP and TN refer to the counts of correctly predicted functional and non-functional targets, respectively, whereas FP is the count of predicted targets that are not supported by the experimentally proven targets, and FN is the count of experimentally proven targets that were not predicted.
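The four categories can be illustrated with a small Python sketch that derives the counts from sets of miRNA-target pairs and turns them into the rates used later in this paper. It is a minimal sketch under stated assumptions, not the evaluation pipeline used in this study: the function names, the toy identifiers, and the choice to treat every candidate pair without experimental support as a negative are all assumptions made here, and the rate formulas are the standard textbook definitions.

```python
def confusion_counts(predicted, validated, universe):
    """Count TP, FP, TN, FN for one tool.

    predicted: set of (miRNA, gene) pairs returned by the tool
    validated: set of experimentally supported pairs (the benchmark)
    universe:  set of all candidate pairs considered (assumption: any
               pair in the universe that is not validated is a negative)
    """
    tp = len(predicted & validated)
    fp = len(predicted - validated)
    fn = len(validated - predicted)
    tn = len(universe - predicted - validated)
    return tp, fp, tn, fn


def evaluation_measures(tp, fp, tn, fn):
    """Standard rates computed from the four counts."""
    def ratio(numerator, denominator):
        # Guard against empty denominators in tiny examples.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to the true positive rate
    return {
        "TPR": recall,
        "FPR": ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "F-measure": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Toy data: ten hypothetical candidate pairs for a single miRNA.
    universe = {("miR-x", f"gene{i}") for i in range(1, 11)}
    predicted = {("miR-x", g) for g in ("gene1", "gene2", "gene3", "gene4")}
    validated = {("miR-x", g) for g in ("gene1", "gene2", "gene5")}
    counts = confusion_counts(predicted, validated, universe)
    print(counts)
    print(evaluation_measures(*counts))
```

Because validated interactions are a small fraction of all candidate pairs, the number of predicted pairs dominates the denominator of precision, which is one reason a tool that predicts many targets per miRNA can have very low precision.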
The predictions were assessed using the following measures:

$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (1)$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad (2)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (3)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (4)$$

$$F\text{-measure} = 2 * \frac{\mathrm{Precision} * \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5)$$

5. Results and Discussions

We evaluated the performance of eleven different miRNA target predictors on the datasets of two different species (human and D. melanogaster), scaling them on different parameters such as sensitivity, specificity, precision, recall, and other similar empirical methods. The strategy applied to evaluate the performance of miRNA target predictors is shown in Fig. 1.
The performance evaluation of the prediction tools was performed in terms of different metrics, namely average predictions per miRNA, TPR, F-measure, and the combined results of the best performing tools (union and intersection), on experimentally validated Drosophila melanogaster and human miRNA targets.

Fig. 1 Representation of the strategy applied to evaluate miRNA target prediction tools

5.1 Dataset-I (Drosophila)

(a) Average predictions per miRNA

A single miRNA could target multiple (6-7) mRNAs and a single mRNA could be targeted by several (4-5) miRNAs (German et al., 2008; Beauclair et al., 2010). Our results shown in Fig. 2 are consistent with this hypothesis. The average number of targets predicted for D. melanogaster ranged between 100 and 1400 per miRNA (Fig. 2). The results suggest that the tool microRNA predicted the highest number of targets, followed by ComiR, microT, miRror, PITA, PicTar, and TargetScan.

Fig. 2 Average number of target predictions per miRNA by different tools in Drosophila.

(b) True positive rate (TPR) & false positive rate (FPR)

The plot of TPR and the total number of predicted targets is shown in Fig. 3, which suggests that the seven tools considered for the Drosophila dataset predicted a large number of targets and that the majority of the tools followed a similar pattern. The highest TPR was achieved by microT, followed by ComiR with a very slight difference. All the considered tools predicted a large number of false positives, and as a result the FPR goes to ~99%.

Fig. 3 Evaluation of tools on the basis of true positive rate (TPR) and total number of targets predicted for Drosophila melanogaster.

(c) Precision-recall and F-measure

Precision and recall are important parameters for evaluating the accuracy and sensitivity of predictions. According to the results obtained from the analysis of the Drosophila dataset, TargetScan showed the highest precision, 0.0097 (though small according to the scale), with a recall of 0.5214 (Supplementary file). The precision of these seven tools ranged between 0.005 and 0.009, and recall ranged between 0.2 and 0. The highest recall was shown by microT, whose precision is second only to that of TargetScan (Fig. 4). The tools were evaluated at an optimal score of 0.0, and the F-measure, which is the harmonic mean of precision and recall and is also known as the F1-score, was calculated. The F-measure is high only when both precision and recall are high. The highest F-measure amongst the seven tools used to predict targets for Drosophila melanogaster was shown by TargetScan, followed by microT.

Fig. 4 A line graph showing the F-measure and precision-recall calculated for the Drosophila dataset. TargetScan showed the highest F-measure, while microT showed the highest recall.
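As a worked check of how the F-measure behaves when precision is very low, the TargetScan values reported above for the Drosophila dataset (precision 0.0097, recall 0.5214) can be substituted into the harmonic-mean definition. The resulting value is arithmetic on these rounded figures and is not a number taken from the supplementary file; the symbols $P$ and $R$ are shorthand introduced here for precision and recall.

$$F = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.0097 \times 0.5214}{0.0097 + 0.5214} = \frac{0.010115}{0.5311} \approx 0.019$$

The result stays close to twice the precision because the harmonic mean is dominated by the smaller of the two terms, so a high recall alone cannot raise the F-measure much.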
Muniba Faiza, K Kh hu us sh hn nu um ma a T Ta an nv ve ee er r,, S Sa am ma an n F Fa at ti ih hi i,, Y Yo on ng gh hu ua a W Wa an ng g,, K Kh ha al li id d R Ra az za a (d) C Co om mb bi in ni in ng g r re es su ul lt ts s t to o i im mp pr ro ov ve e a ac cc cu ur ra ac cy y a an nd d r re el li ia ab bi il li it ty y It is h h hy y yp p po o ot t th h he e es s si i iz z ze e ed d d t t th h ha a at t t u u un n ni i io o on n ns s s o o of f f p p pr r re e ed d di i ic c ct t te e ed d d r r re e es s su u ul l lt t ts s s a a ar r re e e s s su u up p pp p po o os s se e ed d d t t to o o a a ac c ch h hi i ie e ev v ve e e h h hi i ig g gh h he e er r r r r re e ec c ca a al l ll l ls s s w w wh h he e en n n compared to the outcomes of i in nd di iv vi id du ua al l t to oo ol ls s. Similarly, the intersections m ma ay y a ac ch hi ie ev ve e higher precisions. The results of the b be es st t p pe er rf fo or rm mi in ng g tools for drosophila d da at ta as se et t w we er re e c co om mb bi in ne ed d a as s u un ni io on ns s and intersections to improve their r re ec ca al ll l a an nd d precision ( (F Fi ig g.. 5 5) ).. T Th he e u un ni io on n o of f M Mi ic cr ro oT T a an nd d m mi ic cr ro oR RN NA A s s sh h ho o ow w we e ed d d a a a t t tw w wo o o f f fo o ol l ld d d i i in n nc c cr r re e ea a as s se e e i i in n n t t th h he e e T T TP P Ps s s,,, w w wh h hi i ic c ch h h w w wa a as s s a a as s s m m mu u uc c ch h h a a as s s f f fo o ou u un n nd d d t t to o o b b be e e d d de e ec c cr r re e ea a as s se e ed d d i i in n n t t th h he e e c c ca a as s se e e o o of f f t t th h he e e intersection of microRNA and C Co oM Mi ir r. Fig. 5 C Co om mb bi in ne ed d o ou ut tp pu ut ts s o of f t th he e d da at ta as se et t-I (D. melanogaster) to improve accuracy & reliability. 
(A) Comparison of the total miRNA targets predicted by the tools with the union and intersection of the tools to improve their performance; (B) union of microRNA and ComiR; and (C) intersection of MicroT, ComiR, and TargetScan. Abbreviations: Mt–MicroT, R–microRNA, C–ComiR, T–TargetScan.

Dataset-II (human)

(a) Average predictions per miRNA

In the human dataset, the average number of predicted targets ranged from 100 to 8000 per miRNA (Fig. 6). The results suggest that miRmap predicted the highest number of targets, followed by TargetScan, ComiR, miRWalk, MicroT, PITA, miRSearch, microRNA, and miRSystem.

Fig. 6 Average number of targets predicted per miRNA by different tools in the human dataset.

(b) True positive rate (TPR)

The plot of TPR, FPR, and the total number of predicted targets for the human dataset is shown in Fig. 7, which suggests that the seven tools predicted a large number of targets. The highest TPR was achieved by TargetScan, followed by miRmap with a small difference. However, miRmap predicted a higher number of targets than TargetScan (Fig. 7A). This shows that the TPR and the total number of predicted targets (Fig. 7B) vary inconsistently and are independent of each other. In other words, whether a tool predicts a larger or smaller number of targets, the TPR remains unaffected.
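The independence of the TPR from the number of predictions can be illustrated with a minimal Python sketch. It assumes that predictions and validated interactions are represented as sets of (miRNA, gene) pairs and that the TPR is the share of validated pairs a tool recovers. The tool names, the pairs, and both helper functions are hypothetical toy material, not results from this study.

```python
from collections import defaultdict


def true_positive_rate(predicted: set, validated: set) -> float:
    """Share of validated miRNA-target pairs that appear among the predictions."""
    return len(predicted & validated) / len(validated) if validated else 0.0


def average_targets_per_mirna(predicted: set) -> float:
    """Mean number of predicted targets per miRNA."""
    targets = defaultdict(set)
    for mirna, gene in predicted:
        targets[mirna].add(gene)
    return len(predicted) / len(targets) if targets else 0.0


# Hypothetical toy data, for illustration only.
validated = {("miR-1", "geneA"), ("miR-1", "geneB"),
             ("miR-2", "geneC"), ("miR-2", "geneD")}
tool_x = {("miR-1", "geneA"), ("miR-1", "geneB"), ("miR-2", "geneC")}
tool_y = {("miR-1", "geneA"), ("miR-1", "geneE"), ("miR-1", "geneF"),
          ("miR-2", "geneG"), ("miR-2", "geneH"), ("miR-2", "geneI")}

for name, predicted in (("tool_x", tool_x), ("tool_y", tool_y)):
    print(name, len(predicted), average_targets_per_mirna(predicted),
          true_positive_rate(predicted, validated))
```

In this toy case the tool with twice as many predictions recovers fewer validated pairs, so a larger output does not by itself imply a higher TPR.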
The lowest FPR was achieved by ComiR, followed by MicroT; TargetScan, miRSystem, miRWalk, miRmap, microRNA, miRSearch, and PITA attained FPRs close to one another. According to the results, TargetScan showed a higher TPR as well as a higher FPR.

Fig. 7 TPRs and total number of targets predicted by different tools.

(c) Precision-recall and F-measure

The precision and recall for the human dataset were also calculated. miRSearch showed the highest precision, 0.03 (though small on the scale), while the highest recall was shown by TargetScan (0.77) (Supplementary file). The precision and recall of these seven tools ranged from 0.009 to 0.03 and from 0.05 to 0.77, respectively. Since the precision and recall values were very low, we calculated the F-measure for the tools (Fig. 8). The highest F-score was shown by MicroT, followed by TargetScan, ComiR, miRSystem, miRSearch, PITA, PicTar, miRmap, and miRWalk.

Fig. 8 F-measure, precision, and recall for the human dataset.

(d) Combining results to improve accuracy and reliability

The results of the best-performing tools for the human dataset were combined as unions and intersections to increase their performance and accuracy, as was done for the first dataset (D. melanogaster). After score optimization, the combination of TargetScan and miRmap resulted in 31914 TPs, which is higher than the TPs predicted by TargetScan alone (Fig. 9). The combination of miRmap and ComiR yielded fewer TPs than the former, but more false positives than TargetScan. The intersection of miRmap, ComiR, and miRSearch was very small, with only 162 TPs shared by these three tools.

Fig. 9 Combined results on dataset-II (human) to improve accuracy. TargetScan showed the highest number of TPs and FPs. The union of PITA and miRmap is smaller than the predictions of TargetScan alone. The intersection of MCS showed very few predicted targets. Abbreviations: M–miRmap, R–microRNA, C–ComiR, T–TargetScan.

6. Conclusion

We analyzed eleven miRNA target predictors on two benchmark datasets by applying significant empirical methods to evaluate and assess their accuracy and performance. The best-performing tools for each dataset, evaluated on the basis of these metrics, are shown in Table 3. According to our results, MicroT, microRNA, and ComiR showed the highest performance in dataset-I (Drosophila melanogaster), and TargetScan and miRmap showed the best performance in dataset-II (human). The predicted results were combined to improve the performance of the tools in both datasets, but no relevant improvement in the TPs was observed. It was also observed that the TPR is independent of the number of targets predicted by a tool. For example, in dataset-II (human), miRWalk predicted a large number of targets, but its TPR was very low.

Table 3 Best-performing tools for both datasets by evaluation metric.

Evaluation method      Dataset-I (D. melanogaster)    Dataset-II (human)
True positive rate     MicroT                         TargetScan
F-measure              TargetScan                     MicroT
Precision-recall       MicroT                         TargetScan
Union of results       MicroT-microRNA                TargetScan-miRmap

On the basis of previous analyses and our own, we can say that the existing tools have many limitations and drawbacks, which underline the need for more accurate and precise miRNA target predictors. The current tools generate a large number of false positives and work on different algorithms, which makes it difficult to compare them.
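The union and intersection step used to combine tool outputs for both datasets can be sketched in Python as follows. This is a minimal illustration that assumes each tool's output is a set of (miRNA, gene) pairs; the tools, the pairs, and the helper `precision_recall` are hypothetical and are not taken from the datasets analyzed here.

```python
def precision_recall(predicted: set, validated: set) -> tuple:
    """Return (precision, recall) of a prediction set against validated pairs."""
    tp = len(predicted & validated)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(validated) if validated else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
validated = {("miR-1", "geneA"), ("miR-1", "geneB"),
             ("miR-1", "geneC"), ("miR-1", "geneD")}
tool_1 = {("miR-1", "geneA"), ("miR-1", "geneB"), ("miR-1", "geneX")}
tool_2 = {("miR-1", "geneB"), ("miR-1", "geneC"),
          ("miR-1", "geneY"), ("miR-1", "geneZ")}

combined = {
    "tool_1": tool_1,
    "tool_2": tool_2,
    "union": tool_1 | tool_2,         # pairs predicted by either tool
    "intersection": tool_1 & tool_2,  # pairs predicted by both tools
}
scores = {name: precision_recall(pairs, validated)
          for name, pairs in combined.items()}
for name, (p, r) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

The recall of a union can never be lower than that of its members, and the recall of an intersection can never be higher. Precision has no such guarantee in either direction, so the benefit of a combination has to be checked empirically.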
Although several algorithms and models have been developed to predict miRNAs in silico, prediction of significant targets with high statistical confidence is still a challenging task.

Conflict of Interest

The authors declare that there is no conflict of interest in the publication of the manuscript.

Acknowledgement

The author Khalid Raza is thankful to the Indian Council for Cultural Relations (ICCR), India, for his appointment as Visiting Professor (ICCR Chair) at Ain Shams University, Cairo, Egypt. Some part of the manuscript was written during this period.

Journal

Quantitative Biology, arXiv (Cornell University)

Published: Nov 5, 2017
