Access the full text.
Sign up today, get DeepDyve free for 14 days.
Abstract Stylometry is a form of authorship attribution that relies on the linguistic information to attribute documents of unknown authorship based on the writing styles of a suspect set of authors. This paper focuses on the cross-domain subproblem where the known and suspect documents differ in the setting in which they were created. Three distinct domains, Twitter feeds, blog entries, and Reddit comments, are explored in this work. We determine that state-of-the-art methods in stylometry do not perform as well in cross-domain situations (34.3% accuracy) as they do in in-domain situations (83.5% accuracy) and propose methods that improve performance in the cross-domain setting with both feature and classification level techniques which can increase accuracy to up to 70%. In addition to testing these approaches on a large real world dataset, we also examine real world adversarial cases where an author is actively attempting to hide their identity. Being able to identify authors across domains facilitates linking identities across the Internet making this a key security and privacy concern; users can take other measures to ensure their anonymity, but due to their unique writing style, they may not be as anonymous as they believe.
Proceedings on Privacy Enhancing Technologies – de Gruyter
Published: Jul 1, 2016
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.