Dynamic Distributed and Parallel Machine Learning algorithms for big data mining processing
Dynamic Distributed and Parallel Machine Learning algorithms for big data mining processing
Djafri, Laouni
2022-08-23 00:00:00
This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, clouds computing or other technologies.Design/methodology/approachIn the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can be used for prediction later. Thus, this knowledge becomes a great asset in companies' hands. This is precisely the objective of data mining. But with the production of a large amount of data and knowledge at a faster pace, the authors are now talking about Big Data mining. For this reason, the authors’ proposed works mainly aim at solving the problem of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. So, the problem that the authors are raising in this work is how the authors can make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML) algorithms. To build it, the authors divided their work into two parts. In the first, the authors propose a distributed architecture that is controlled by Map-Reduce algorithm which in turn depends on random sampling technique. So, the distributed architecture that the authors designed is specially directed to handle big data processing that operates in a coherent and efficient manner with the sampling strategy proposed in this work. This architecture also helps the authors to actually verify the classification results obtained using the representative learning base (RLB). In the second part, the authors have extracted the representative learning base by sampling at two levels using the stratified random sampling method. This sampling method is also applied to extract the shared learning base (SLB) and the partial learning base for the first level (PLBL1) and the partial learning base for the second level (PLBL2). The experimental results show the efficiency of our solution that the authors provided without significant loss of the classification results. Thus, in practical terms, the system DDPML is generally dedicated to big data mining processing, and works effectively in distributed systems with a simple structure, such as client-server networks.FindingsThe authors got very satisfactory classification results.Originality/valueDDPML system is specially designed to smoothly handle big data mining classification.
http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.pngData Technologies and ApplicationsEmerald Publishinghttp://www.deepdyve.com/lp/emerald-publishing/dynamic-distributed-and-parallel-machine-learning-algorithms-for-big-IbtVs0j8fj
Dynamic Distributed and Parallel Machine Learning algorithms for big data mining processing
This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, clouds computing or other technologies.Design/methodology/approachIn the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can be used for prediction later. Thus, this knowledge becomes a great asset in companies' hands. This is precisely the objective of data mining. But with the production of a large amount of data and knowledge at a faster pace, the authors are now talking about Big Data mining. For this reason, the authors’ proposed works mainly aim at solving the problem of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. So, the problem that the authors are raising in this work is how the authors can make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML) algorithms. To build it, the authors divided their work into two parts. In the first, the authors propose a distributed architecture that is controlled by Map-Reduce algorithm which in turn depends on random sampling technique. So, the distributed architecture that the authors designed is specially directed to handle big data processing that operates in a coherent and efficient manner with the sampling strategy proposed in this work. This architecture also helps the authors to actually verify the classification results obtained using the representative learning base (RLB). In the second part, the authors have extracted the representative learning base by sampling at two levels using the stratified random sampling method. This sampling method is also applied to extract the shared learning base (SLB) and the partial learning base for the first level (PLBL1) and the partial learning base for the second level (PLBL2). The experimental results show the efficiency of our solution that the authors provided without significant loss of the classification results. Thus, in practical terms, the system DDPML is generally dedicated to big data mining processing, and works effectively in distributed systems with a simple structure, such as client-server networks.FindingsThe authors got very satisfactory classification results.Originality/valueDDPML system is specially designed to smoothly handle big data mining classification.
Journal
Data Technologies and Applications
– Emerald Publishing
Published: Aug 23, 2022
Keywords: Big data mining; Statistical sampling; Map-reduce; Machine learning; Distributed and parallel processing; Big data platforms
Recommended Articles
Loading...
References
Hadoop architecture and its issues
Alam, A.; Ahmed, J.
Solution approach to big data regarding parameter estimation problems in predictive analytics model
Alim, A.; Shukla, D.
Machine learning prediction for mortality of patients diagnosed with covid-19: a nationwide Korean cohort study
An, C.; Lim, H.; Kim, D.
Sample size and its importance in research
Andrade, C.
Simple random sampling with over-replacement
Antal, E.; Tille, Y.
Application of deep learning technique to manage covid-19 in routine clinical practice using ct images: Re- sults of 10 convolutional neural networks
Ardakani, A.A.; Kanafi, A.; Acharya, U.R.; Khadem, N.; Mohammadi, A.
Big data computing and clouds: trends and future directions
Assunção, M.D.; Calheiros, R.N.; Bianchi, S.; Netto, M.A.; Buyya, R.
What's new in icu in 2050: big data and machine learning
Bailly, S.; Meyfroidt, G.; Timsit, J.
Configuring in-memory cluster computing using random forest
To get new article updates from a journal on your personalized homepage, please log in first, or sign up for a DeepDyve account if you don’t already have one.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.