Equation (5.1) is equivalent to $\log(y) = -\alpha \log(r)$. Thus, Zipf's law can be drawn
as a straight line with a slope of $-\alpha$ in a log-log plot. Several studies show that Zipf's
law can capture the skewed distribution of Web requests from a user community [18, 5, 40].
Therefore, the probability that a request is destined for page $i$, denoted as $p(i)$, can be
obtained from Equation (5.2):

$$p(i) = \frac{C}{i^{\alpha}}, \qquad 1 \le i \le N, \tag{5.2}$$

where $C = \left( \sum_{j=1}^{N} \frac{1}{j^{\alpha}} \right)^{-1}$ is a normalizing constant, $N$ is the total number of Web data items, and all the Web data items are
ordered by their popularity rank $i$. For example, $p(1)$ is the probability that a request is
for the most popular Web data item. Here $\alpha$ is a control parameter that captures
the popularity skewness of Web content: the larger $\alpha$ is, the more skewed the access to Web
content.
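To make Equation (5.2) concrete, the following minimal sketch computes $p(1), \ldots, p(N)$ for a given $\alpha$, including the normalizing constant $C$. The function name `zipf_pmf` and the sample values of $N$ and $\alpha$ in the demo are hypothetical, chosen only for illustration; they are not taken from the traces in this study.

```python
def zipf_pmf(n: int, alpha: float) -> list[float]:
    """Return [p(1), ..., p(n)] for the Zipf-like distribution p(i) = C / i**alpha."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Unnormalized weights 1 / i**alpha for ranks i = 1..n.
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    # C is the normalizing constant that makes the probabilities sum to 1.
    c = 1.0 / sum(weights)
    return [c * w for w in weights]


if __name__ == "__main__":
    # Hypothetical parameters: 1000 items, a few illustrative alpha values.
    for alpha in (0.6, 0.8, 1.0):
        p = zipf_pmf(1000, alpha)
        print(f"alpha={alpha}: p(1)={p[0]:.4f}, top-10 share={sum(p[:10]):.3f}")
```

Running the demo shows the effect described above: as $\alpha$ grows, more of the probability mass concentrates on the highest-ranked items.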
Figure 5.5 shows the log-log plots of the number of references to the Web data
items for the three traces examined in our study. The Web data items on
the X axis are ordered by the number of accesses; that is, the most frequently
accessed data item is placed first on the X axis. Table 5.4 shows the $\alpha$
values for those traces, calculated by Principal Component Analysis (PCA). Figure 5.5
and Table 5.4 show that the PSU trace has the highest skewness among the three trace
files, while the UCB trace has very low popularity skewness.
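The text does not spell out how PCA yields $\alpha$. The sketch below assumes one plausible reading: rank the items by reference count as in Figure 5.5, then fit the straight line of the log-log plot as the first principal component of the centered (log rank, log count) points (a total-least-squares fit), and take $\alpha$ as minus its slope. The function names `rank_frequency` and `fit_alpha_pca` and the toy trace are hypothetical and are not the study's actual tooling.

```python
import math
from collections import Counter


def rank_frequency(requests):
    """Count references per data item and sort so that rank 1 is the most accessed."""
    counts = Counter(requests)
    return sorted(counts.values(), reverse=True)


def fit_alpha_pca(counts):
    """Estimate alpha as minus the slope of the first principal component
    of the (log rank, log count) points (assumed reading of 'PCA' here)."""
    if len(counts) < 2:
        raise ValueError("need at least two ranked items")
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the 2x2 covariance matrix of the centered log-log points.
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Angle of the major (largest-eigenvalue) axis: tan(2*theta) = 2*sxy / (sxx - syy).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # The log-log line has slope -alpha, so alpha = -tan(theta).
    return -math.tan(theta)


if __name__ == "__main__":
    # Hypothetical toy trace: item k is requested round(1000 / k) times.
    trace = [f"/item{k}" for k in range(1, 51) for _ in range(round(1000 / k))]
    print(f"estimated alpha: {fit_alpha_pca(rank_frequency(trace)):.3f}")
```

Unlike an ordinary least-squares regression of log count on log rank, the principal-component fit treats both axes symmetrically; on real traces the two estimates can differ, especially in the long, noisy tail of rarely accessed items.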
Since we are studying the performance implications of caching schemes, we are
interested in obtaining the expected cache hit ratios corresponding to different access patterns
or traces. From Equation (5.2), we can obtain the cache hit ratio using the Zipf distribution