Access the full text.
Sign up today, get DeepDyve free for 14 days.
Abstract Studies of Internet censorship rely on an experimental technique called probing. From a client within each country under investigation, the experimenter attempts to access network resources that are suspected to be censored, and records what happens. The set of resources to be probed is a crucial, but often neglected, element of the experimental design. We analyze the content and longevity of 758,191 webpages drawn from 22 different probe lists, of which 15 are alleged to be actual blacklists of censored webpages in particular countries, three were compiled using a priori criteria for selecting pages with an elevated chance of being censored, and four are controls. We find that the lists have very little overlap in terms of specific pages. Mechanically assigning a topic to each page, however, reveals common themes, and suggests that handcurated probe lists may be neglecting certain frequently censored topics. We also find that pages on controversial topics tend to have much shorter lifetimes than pages on uncontroversial topics. Hence, probe lists need to be continuously updated to be useful. To carry out this analysis, we have developed automated infrastructure for collecting snapshots of webpages, weeding out irrelevant material (e.g. site “boilerplate” and parked domains), translating text, assigning topics, and detecting topic changes. The system scales to hundreds of thousands of pages collected.
Proceedings on Privacy Enhancing Technologies – de Gruyter
Published: Jan 1, 2017
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.