p(i), where N is the number of the most frequently accessed data items (e.g.,
files) in the cache. Thus,
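$$R(N) = \sum_{i=1}^{N} p(i) = \sum_{i=1}^{N} \frac{C}{i^{\alpha}},$$
where $C = \left(\sum_{i=1}^{M} \frac{1}{i^{\alpha}}\right)^{-1}$ is the normalization constant and $M$ is the total number of distinct data items in the trace. The temporal locality, $R(N)$, grows asymptotically in proportion to $\ln N$ when $\alpha = 1$ and to $N^{1-\alpha}$ when $0 < \alpha < 1$. The above equation implies that the cache hit ratio is determined by two parameters: 1) the number of web pages in the cache, $N$, and 2) the slope of the log-log popularity plot of the trace file (i.e., $\alpha$).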
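The model hit ratio can be evaluated directly from its definition. The following is a minimal Python sketch, assuming the Zipf form $p(i) = C/i^{\alpha}$ with $C$ normalizing over all $M$ distinct items. The function name `zipf_hit_ratio` and the parameter `n_total` (standing for $M$) are illustrative and do not come from the original. The sketch computes the exact finite sums rather than the asymptotic forms.

```python
import math


def zipf_hit_ratio(n_cached: int, n_total: int, alpha: float) -> float:
    """Model temporal locality R(N): the fraction of requests served when the
    n_cached most popular of n_total items are cached, assuming Zipf
    popularity p(i) = C / i**alpha (hypothetical helper, not the author's code)."""
    if not 0 <= n_cached <= n_total:
        raise ValueError("need 0 <= n_cached <= n_total")
    # Unnormalized Zipf weights 1 / i**alpha for ranks 1..M.
    weights = [1.0 / i ** alpha for i in range(1, n_total + 1)]
    c = 1.0 / sum(weights)  # normalization constant C
    # R(N) = C * sum_{i=1}^{N} 1 / i**alpha
    return c * sum(weights[:n_cached])


if __name__ == "__main__":
    # Compare the exact sum with the alpha = 1 asymptotic ratio
    # (ln N + gamma) / (ln M + gamma), where gamma is Euler's constant.
    gamma = 0.5772156649
    n, m = 1000, 100_000
    print(zipf_hit_ratio(n, m, 1.0),
          (math.log(n) + gamma) / (math.log(m) + gamma))
```

For $\alpha = 1$ the exact value and the $\ln N$-based approximation agree closely once $N$ is moderately large. For $0 < \alpha < 1$ the ratio instead approaches $(N/M)^{1-\alpha}$, which shows how a smaller $\alpha$ (a flatter log-log slope) lowers the hit ratio for the same cache size.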
Figure 5.6 plots the cache hit ratio as a function of cache size, obtained both from the traces and from Equation (5.3). The figure shows that the temporal locality measured from the trace files closely matches the temporal locality predicted by Equation (5.3). In other words, the temporal locality of the traces follows a Zipf distribution (using the $\alpha$ values listed in Table 5.4). The PSU trace shows the highest locality among the three traces.
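The $\alpha$ of a trace is the magnitude of the slope of its log-log rank-frequency plot. The original does not say how the values in Table 5.4 were fitted. One common approach is an ordinary least-squares line through $(\ln \text{rank}, \ln \text{count})$, and the minimal Python sketch below uses that approach as an assumption. `estimate_zipf_alpha` is a hypothetical name, and the input is taken to be a sequence of requested item identifiers (e.g., file names).

```python
import math
from collections import Counter
from collections.abc import Hashable, Iterable


def estimate_zipf_alpha(requests: Iterable[Hashable]) -> float:
    """Estimate alpha as minus the least-squares slope of ln(count) vs ln(rank).
    (Hypothetical helper; the thesis's actual fitting method may differ.)"""
    # Request counts per distinct item, most popular first (rank 1, 2, ...).
    counts = sorted(Counter(requests).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct items to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Ordinary least-squares slope = cov(x, y) / var(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # Zipf slope is -alpha, so negate it
```

A plain least-squares fit weights the long tail of rarely requested items heavily. In practice the fit is often restricted to the head of the distribution, or a maximum-likelihood estimator is used instead.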
Figure 5.6(c) shows that, for the PSU trace, a cache of less than 512 MBytes is enough to achieve a hit ratio above 95%. This is because the frequently accessed data items are concentrated in a small part of the whole data set, which shows very little sign of the heavy-tail property, even though the PSU data set is larger than 2 GBytes. Figure 5.6(a) shows that the CSE trace achieves hit ratios of about 87% and 98% when the cache size is set to 1 GByte and 2 GBytes, respectively. In contrast, the UCB trace shows much lower temporal locality than the other traces: a 1-GByte data cache achieves a hit ratio of only 70%.
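The trace-based curves are expressed in bytes of cache rather than in a number of items $N$. The original does not spell out how they were computed. One plausible reading, used here as an assumption, is a static cache that holds the most frequently requested files in popularity order until the byte budget is exhausted. The minimal Python sketch below implements that reading. `trace_hit_ratio`, its parameters, and the per-file `sizes` mapping are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Mapping, Sequence


def trace_hit_ratio(requests: Sequence[Hashable],
                    sizes: Mapping[Hashable, int],
                    cache_bytes: int) -> float:
    """Hit ratio of a static cache filled with the most popular files, in rank
    order, until the next file no longer fits (hypothetical helper)."""
    if not requests:
        raise ValueError("empty trace")
    counts = Counter(requests)
    used = hits = 0
    # most_common() yields files by descending request count; ties keep
    # first-seen order, so the result is deterministic.
    for item, count in counts.most_common():
        if used + sizes[item] > cache_bytes:
            break  # stop at the first file that does not fit (top-N semantics)
        used += sizes[item]
        hits += count  # every request for a cached file is a hit
    return hits / len(requests)
```

For example, a 512-MByte cache corresponds to `cache_bytes=512 * 2**20`. Stopping at the first file that does not fit keeps the cached set equal to a popularity prefix, which mirrors the "top $N$ items" assumption behind $R(N)$. Skipping oversized files and continuing down the list would give a somewhat higher hit ratio for the same budget.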