
TRM: A Powerful Two‐Stage Machine Learning Approach for Identifying SNP‐SNP Interactions



Publisher: Wiley
Copyright: © 2012 Wiley Subscription Services
ISSN: 0003-4800
eISSN: 1469-1809
DOI: 10.1111/j.1469-1809.2011.00692.x
PMID: 22150548

Abstract

Studies have shown that interactions among single nucleotide polymorphisms (SNPs) may be important for understanding the causes of complex disease. We have proposed an integrated approach that combines two machine-learning methods, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and detect interaction patterns more effectively and efficiently. In this two-stage RF-MARS (TRM) approach, RF is first applied to select a predictive subset of SNPs, and MARS is then used to identify the interaction patterns. We evaluated TRM performance under four models. RF variable selection was based on either the out-of-bag (OOB) classification error rate or the variable importance spectrum (IS); we refer to these variants as RF_OOB and RF_IS. Our results indicate that RF_OOB performed better than MARS and RF_IS in detecting important variables. TRM_OOB, which is RF_OOB followed by MARS, combines the strengths of RF and MARS in identifying SNP-SNP interactions in a scenario of 100 candidate SNPs. TRM_OOB had a higher true positive rate and a lower false positive rate than MARS, particularly for interactions strongly associated with the outcome. The use of TRM_OOB is therefore favored for exploring SNP-SNP interactions in large-scale genetic variation studies.
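
As a rough illustration of two ideas the abstract relies on, the sketch below shows (a) how a MARS-style model can represent a SNP-SNP interaction as a product of hinge basis functions, and (b) how true and false positive rates of a selected SNP set can be computed. It is not the authors' implementation. The genotype coding (minor-allele counts 0, 1, 2), the knot positions, the SNP indices, and the function names `hinge`, `interaction_term`, and `tpr_fpr` are all assumptions made for this example; only the figure of 100 candidate SNPs comes from the abstract.

```python
from itertools import product


def hinge(x, knot, direction=1):
    """MARS-style hinge basis: max(0, x - knot) for direction=+1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def interaction_term(g1, g2, knot1=0, knot2=0):
    """Product of two hinge functions; nonzero only when both
    genotypes exceed their (assumed) knots."""
    return hinge(g1, knot1) * hinge(g2, knot2)


def tpr_fpr(selected, causal, n_candidates):
    """True and false positive rates of a selected SNP set,
    measured against a known set of causal SNPs."""
    selected, causal = set(selected), set(causal)
    tp = len(selected & causal)   # causal SNPs that were selected
    fp = len(selected - causal)   # non-causal SNPs that were selected
    return tp / len(causal), fp / (n_candidates - len(causal))


# Interaction term for every genotype pair, with genotypes coded as
# minor-allele counts 0, 1, 2 (an assumed additive coding).
table = {(g1, g2): interaction_term(g1, g2)
         for g1, g2 in product(range(3), repeat=2)}

# Hypothetical stage-1 result: 4 SNPs kept out of 100 candidates,
# of which 2 are truly causal. The indices are made up.
rates = tpr_fpr(selected=[3, 17, 42, 88], causal=[3, 17], n_candidates=100)
```

With knots at 0, the interaction term is zero whenever either SNP has genotype 0, so it contributes to the model only for carriers of both minor alleles. In the hypothetical selection, both causal SNPs are recovered and 2 of the 98 non-causal SNPs are kept.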

Journal: Annals of Human Genetics (Wiley)

Published: Jan 1, 2012

