Internet Filtering in China in 2004-2005
requested page did not exist (known as a 404 error),[152] even though we could successfully, and
simultaneously, access the page from outside China. Furthermore, our testing of control sites known to
be accessible in China (for instance, Web sites of state agencies) through proxy servers returned errors for
approximately 20% of our requests. In our proxy testing analysis, we classify as blocked any site that we
could access from outside China but could not access from inside China in the majority of our tests. We also
attempted to identify overblocking, where China's filtering prevented access to pages with unrelated
content at domain names and URLs similar to those of sites identified by news reports and previous
testing as containing sensitive content.
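The majority rule described above can be illustrated with a minimal sketch. The function name, the label strings, and the handling of ties and of sites unreachable from outside China are assumptions made for illustration; the report does not specify them. Because control sites failed on roughly 20% of proxy requests, a single failed request is weak evidence, which is why the sketch requires a strict majority of failures.

```python
def classify_site(outside_ok, inside_results):
    """Classify one site from repeated access attempts.

    outside_ok: True if the site was reachable from outside China.
    inside_results: list of booleans, one per proxy request made from
        inside China (True means the request succeeded).

    The labels and the tie-breaking rule are assumptions for this sketch.
    """
    if not outside_ok:
        # The site may simply be down, so a failure inside China
        # cannot be attributed to filtering.
        return "unavailable"
    if not inside_results:
        return "untested"
    failures = sum(1 for ok in inside_results if not ok)
    # Require a strict majority of failed requests before calling a site blocked.
    if failures * 2 > len(inside_results):
        return "blocked"
    return "accessible"
```

Under these assumptions, a site that fails two of three in-China requests is labeled blocked, while a site that fails one of three is treated as accessible despite the intermittent proxy error.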
Moreover, China's filtering system presents a risk of false positive results during testing. When
a user attempts to access a blocked site, the filtering system resets the user's connection to that site using
a TCP RST packet; subsequently, the system advertises a TCP window size of zero for that Web server.
The user is unable to connect to that server's IP address until the system advertises a window size greater
than zero. Our in-state testing included an instance of this condition: a domain that had been accessible
became unreachable after we requested a URL on that site (www.poets4peace.com/peacehall.htm) containing a
potentially targeted string. Even though subsequent URL requests were for unrelated content, the
disruption caused by the zero window size condition persisted and prevented the tester from reaching
those pages.
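One way to guard against this kind of false positive is to flag failed requests that closely follow a keyword-triggered failure on the same server, so that they can be retested later rather than counted as blocks. The sketch below is an assumption-laden illustration, not the procedure used in this study: the Attempt record, the has_keyword flag, and the cooldown_seconds value are hypothetical, and the code groups requests by hostname as a simplification even though the condition described above applies to the server's IP address.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Attempt:
    url: str            # full URL including scheme
    timestamp: float    # seconds since the start of the test run
    reachable: bool     # True if the page was retrieved
    has_keyword: bool   # tester's judgment: URL contains a potentially targeted string


def flag_suspect_failures(attempts, cooldown_seconds=600.0):
    """Return URLs whose failure may be a side effect of an earlier reset.

    cooldown_seconds is a hypothetical value; the duration of the
    zero-window condition is not stated in the report.
    """
    last_trigger = {}   # hostname -> time of the most recent failed keyword request
    suspect = []
    for a in sorted(attempts, key=lambda a: a.timestamp):
        host = urlsplit(a.url).hostname
        if a.reachable:
            # A successful request shows the server is reachable again.
            last_trigger.pop(host, None)
            continue
        if a.has_keyword:
            last_trigger[host] = a.timestamp
            continue
        triggered_at = last_trigger.get(host)
        if triggered_at is not None and a.timestamp - triggered_at <= cooldown_seconds:
            suspect.append(a.url)
    return suspect
```

URLs returned by this function would be candidates for retesting after a delay, before any request containing a potentially targeted string is sent to the same server.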
We conducted testing in several stages and analyze the results by topic below. The first component
was similar to our high-impact list tests in other states we examined, checking a collection of URLs whose
content addressed, or whose domain names included, sensitive topics (for example, a page with Falun Gong material, or
a URL such as www.falungong.com). The second testing phase built on our past data on filtering in China
by checking a set of domains known to be of concern. Additionally, we tested URLs containing words or
strings of letters similar to those found in pages on blocked topics (for example, testing URLs containing
the strings falu or flg to probe filtering of Falun Gong content), but that hosted pages with content
unrelated to the sensitive subject. Finally, we tested a long list composed of the top Google search
results, in English and in Chinese, for keywords known to be sensitive and filtered.[153]
D. Comparison of Testing Methods
Due to the complications of testing filtering in China, we sought to combine proxy testing
and in-state testing for this report. We found a 78% correlation between the two methods, with almost all
the sites we were able to access through the proxy also accessible in-state. However, only 60% of the sites
identified as blocked during proxy testing were confirmed as such during the in-state portion. The
discrepancies were concentrated in the sites tested to determine overblocking (filtering of URLs similar to
those containing targeted content, but with unrelated subject matter): 82% of the blocks during proxy
testing of sites with sensitive content were confirmed during in-state testing, but only 22% of the
corresponding blocks of unrelated content were thus verified. Potential explanations include the difficulty
[152] See 404 error, Wikipedia, at http://en.wikipedia.org/wiki/404_error.
[153] ONI compiled these keywords from a list of terms blocked by the instant messaging software QQ. See The Words
You Never See in Chinese Cyberspace, China Digital News, at
http://journalism.berkeley.edu/projects/chinadn/en/archives/002885.html#more (Aug. 30, 2004).