Using Word Embeddings to Deter Intellectual Property Theft through Automated Generation of Fake Documents


References (25)

Publisher
Association for Computing Machinery
Copyright
Copyright © 2021 ACM
ISSN
2158-656X
eISSN
2158-6578
DOI
10.1145/3418289

Abstract

Theft of intellectual property is a growing problem, one that is exacerbated by the fact that a successful compromise of an enterprise might only become known months after the hack. A recent solution called FORGE addresses this problem by automatically generating N “fake” versions of any real document, so that the attacker has to determine which of the N + 1 documents exfiltrated from a compromised network is real. In this article, we remove two major drawbacks of FORGE. (i) FORGE requires ontologies in order to generate fake documents; in the real world, however, ontologies, and especially good ones, are infrequently available. The WE-FORGE system proposed in this article completely eliminates the need for ontologies by using distance metrics on word embeddings instead. (ii) FORGE generates fake documents by first identifying “target” concepts in the original document and then substituting “replacement” concepts for them. We show that this can lead to sub-optimal results, because target concepts are selected without knowing the availability or quality of their replacement concepts. Our WE-FORGE system addresses this problem, in two possible ways, by performing a joint optimization that selects concepts and replacements simultaneously. We conduct a human study involving both computer science and chemistry documents and show that WE-FORGE successfully deceives adversaries.
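
The two ideas in the abstract (embedding distance in place of an ontology, and choosing targets together with their replacements) can be illustrated with a small sketch. This is not the WE-FORGE algorithm: the toy 3-dimensional vectors, the `max_dist` threshold, the function names, and the greedy selection (a simple stand-in for the joint optimization) are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional "embeddings"; the values are invented for illustration.
EMBEDDINGS = {
    "sodium": (0.9, 0.1, 0.0),
    "potassium": (0.8, 0.2, 0.1),
    "lithium": (0.7, 0.3, 0.0),
    "ethanol": (0.1, 0.9, 0.2),
    "methanol": (0.2, 0.8, 0.3),
    "reactor": (0.0, 0.1, 0.9),
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_jointly(doc_concepts, k, max_dist=0.2):
    """Greedily pick up to k (target, replacement) pairs at once.

    A concept becomes a target only if some replacement lies within
    max_dist of it, so targets without a good replacement are never chosen.
    """
    pairs = []
    for target in doc_concepts:
        for candidate, vec in EMBEDDINGS.items():
            if candidate in doc_concepts:
                continue  # do not replace a concept with one already in the document
            dist = cosine_distance(EMBEDDINGS[target], vec)
            if dist <= max_dist:
                pairs.append((dist, target, candidate))

    mapping, used = {}, set()
    for dist, target, candidate in sorted(pairs):  # closest pairs first
        if len(mapping) == k:
            break
        if target in mapping or candidate in used:
            continue
        mapping[target] = candidate
        used.add(candidate)
    return mapping


def make_fake(text, mapping):
    """Substitute each selected target word with its replacement."""
    return " ".join(mapping.get(word, word) for word in text.split())


mapping = select_jointly(["sodium", "ethanol", "reactor"], k=3)
fake = make_fake("sodium in ethanol reactor", mapping)
```

With these toy vectors, "reactor" has no candidate within the threshold, so it is left untouched even though three substitutions were requested. A target-first procedure that had committed to "reactor" in advance would have been forced into a poor replacement, which is the failure mode the abstract attributes to FORGE.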

Journal

ACM Transactions on Management Information Systems (TMIS), Association for Computing Machinery

Published: Feb 2, 2021

Keywords: AI security
