On the (In)effectiveness of Mosaicing and Blurring as Tools for Document Redaction




Publisher: de Gruyter
Copyright © 2016 by the
ISSN: 2299-0984
eISSN: 2299-0984
DOI: 10.1515/popets-2016-0047

Abstract

In many online communities, it is the norm to redact names and other sensitive text from posted screenshots. Sometimes solid bars are used; sometimes a blur or other image transform is used. We consider the effectiveness of two popular image transforms - mosaicing (also known as pixelization) and blurring - for redaction of text. Our main finding is that we can use a simple but powerful class of statistical models - so-called hidden Markov models (HMMs) - to recover both short and indefinitely long instances of redacted text. Our approach borrows from the success of HMMs for automatic speech recognition, where they are used to recover sequences of phonemes from utterances of speech. Here we use HMMs in an analogous way to recover sequences of characters from images of redacted text. We evaluate an implementation of our system against multiple typefaces, font sizes, grid sizes, pixel offsets, and levels of noise. We also decode numerous real-world examples of redacted text. We conclude that mosaicing and blurring, despite their widespread usage, are not viable approaches for text redaction.
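
The abstract describes recovering a sequence of characters from redacted pixels with an HMM. The sketch below is not the authors' system; it only illustrates the standard Viterbi decoding step on a toy model. The two "characters" (a, b), the two observation symbols (dark, light, standing in for quantized mosaic block values), and every probability are invented for illustration and do not come from the paper.

```python
import math


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        key: to_log(value) if isinstance(value, dict) else math.log(value)
        for key, value in table.items()
    }


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to the first step.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (all values hypothetical): hidden states are characters,
# observations are coarse brightness classes of a mosaic block.
STATES = ["a", "b"]
LOG_START = to_log({"a": 0.5, "b": 0.5})
LOG_TRANS = to_log({"a": {"a": 0.2, "b": 0.8}, "b": {"a": 0.8, "b": 0.2}})
LOG_EMIT = to_log({
    "a": {"dark": 0.9, "light": 0.1},
    "b": {"dark": 0.2, "light": 0.8},
})

if __name__ == "__main__":
    observed = ["dark", "light", "dark"]
    print(viterbi(observed, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
    # prints ['a', 'b', 'a']
```

Working in log-probabilities avoids numerical underflow on long sequences, which matters if the same idea is applied to long runs of text.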

Journal

Proceedings on Privacy Enhancing Technologies, de Gruyter

Published: Oct 1, 2016
