Privacy-preserving training of tree ensembles over continuous data

Samuel Adams; Chaitali Choudhary; Martine de Cock; Rafael Dowsley; David Melanson; Anderson Nascimento; Davis Railsback; Jianwei Shen

doi:10.2478/popets-2022-0042

Publisher: de Gruyter
Copyright: © 2022 Samuel Adams et al., published by Sciendo
ISSN: 2299-0984
eISSN: 2299-0984
DOI: 10.2478/popets-2022-0042

Abstract

Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard “in the clear” algorithm for growing decision trees on data with continuous values requires sorting the training examples for each feature in order to find an optimal cut-point in the range of feature values at each node. Sorting is an expensive operation in MPC, so finding secure protocols that avoid this step is a relevant problem in privacy-preserving machine learning. In this paper we propose three more efficient alternatives for secure training of decision-tree-based models on data with continuous features: (1) secure discretization of the data, followed by secure training of a decision tree over the discretized data; (2) secure discretization of the data, followed by secure training of a random forest over the discretized data; and (3) secure training of extremely randomized trees (“extra-trees”) on the original data. Approaches (2) and (3) both involve randomizing feature choices. In addition, in approach (3) cut-points are chosen randomly as well, which avoids the need to sort or discretize the data up front. We implemented all proposed solutions in the semi-honest setting with MPC based on additive secret sharing. In addition to mathematically proving that all proposed approaches are correct and secure, we experimentally evaluated and compared them in terms of classification accuracy and runtime. We privately train tree ensembles over data sets with thousands of instances or features in a few minutes, with accuracies that are on par with those obtained in the clear. This makes our solution more efficient than the existing approaches, which are based on oblivious sorting.
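
To illustrate why approaches (1) and (3) avoid sorting, the following is a minimal sketch written in the clear, not the authors' MPC protocols. It assumes equal-width binning as the discretization scheme (the abstract does not say which scheme is used) and the usual extra-trees rule of drawing a cut-point uniformly between the minimum and maximum feature value. All function names are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(values, labels, cut):
    """Weighted Gini impurity after splitting on `value <= cut`."""
    left = [y for v, y in zip(values, labels) if v <= cut]
    right = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def equal_width_bins(values, num_bins):
    """Approach (1)/(2) flavour: map continuous values to bin indices.

    Assumption: equal-width bins; only min and max are needed, no sort.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def random_cut_point(values, rng):
    """Approach (3) flavour: draw a cut-point uniformly in [min, max]."""
    return rng.uniform(min(values), max(values))


def pick_split(columns, labels, k, rng):
    """Sample k features, draw one random cut per feature, keep the best.

    `columns` is a list of feature columns; returns (index, cut, impurity).
    """
    candidates = rng.sample(range(len(columns)), k)
    best = None
    for j in candidates:
        cut = random_cut_point(columns[j], rng)
        score = split_impurity(columns[j], labels, cut)
        if best is None or score < best[2]:
            best = (j, cut, score)
    return best
```

Both helpers need only a minimum, a maximum, and comparisons against a single threshold. A secure version would have to compute these on secret-shared values, which this sketch does not attempt.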

Journal

Proceedings on Privacy Enhancing Technologies – de Gruyter

Published: Apr 1, 2022

Keywords: Machine Learning; Privacy; Secure Multi-Party Computation; Decision Tree Ensembles; Random Forest; Training
