Workload Characterization
To study the performance implications of NIC caching schemes on cluster-based
Web servers, we characterize the workloads of three traces: those of
the CSE department at Penn State University (CSE) [63], UC Berkeley (UCB) [71], and
Penn State University (PSU) [62]. Unlike prior experimental studies [23, 78], we did
not use traces such as Clarknet [11], WorldCup98 [10], and NASA [11]. These
workloads have total data set sizes that are much smaller than the main memory
size of a single node in a modern cluster. In this situation, the entire data set of
each workload can reside in the main memory of each node, leaving little intra-cluster
communication that could benefit from NIC caching. The CSE, UCB, and PSU traces were
chosen for their large data set sizes (the data set sizes of the PSU, CSE, and UCB traces
are about 2.04 GBytes, 5 GBytes, and 4.6 GBytes, respectively). Moreover, these traces
(except for the UCB trace) were gathered recently.
Trace Analysis
Table 5.1 summarizes the statistical characteristics of the three traces, after
unsuccessful requests and requests for dynamic data items were eliminated from the
traces in advance. It reports the data set size, the number of requests, the number of
files, the average file size, the median file size, the standard deviation of the file size, the
total bytes of the transferred files, and the average transferred file size. The CSE trace
1 The previous experiments used a smaller main memory to run these traces.
2 The CSE trace is the most recent; it was gathered from August 2, 2004 to September
30, 2004. The PSU trace was obtained from March 31, 2002 to April 7, 2002. Although the UCB
trace was gathered from November 1, 1996 to November 17, 1996, we chose it because of its large
data set size.
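As an illustration of how the statistics listed for Table 5.1 relate to one another, the following minimal sketch computes them from a list of request records. It is not the tool used in this study. The record layout, the function names, the filter for successful static requests (2xx status, no query string, no cgi-bin path), and the rule of taking the largest observed size as a file's size are all assumptions made for the example.

```python
from statistics import mean, median, pstdev


def is_static_success(path, status):
    # Assumption: 2xx responses count as successful, and a query string or
    # a cgi-bin path marks a dynamic item. The study's exact filter is not
    # given in the text.
    return 200 <= status < 300 and "?" not in path and "/cgi-bin/" not in path


def trace_stats(records):
    """records: iterable of (path, status, size_in_bytes) tuples."""
    kept = [(path, size) for path, status, size in records
            if is_static_success(path, status)]
    if not kept:
        raise ValueError("no successful static requests in trace")

    file_size = {}
    for path, size in kept:
        # Assumption: the largest size observed for a path is the file size.
        file_size[path] = max(size, file_size.get(path, 0))

    sizes = list(file_size.values())
    total_bytes = sum(size for _, size in kept)
    return {
        "data_set_size": sum(sizes),            # each distinct file counted once
        "num_requests": len(kept),
        "num_files": len(sizes),
        "avg_file_size": mean(sizes),           # averaged over distinct files
        "median_file_size": median(sizes),
        "stddev_file_size": pstdev(sizes),      # population standard deviation
        "total_bytes_transferred": total_bytes, # each request counted
        "avg_transferred_file_size": total_bytes / len(kept),
    }


# Hypothetical records, for illustration only.
sample = [
    ("/index.html", 200, 1000),
    ("/index.html", 200, 1000),
    ("/logo.png", 200, 3000),
    ("/missing.html", 404, 0),
    ("/cgi-bin/search?q=x", 200, 500),
]
stats = trace_stats(sample)
```

In this sketch the data set size counts every distinct file once, whereas the total bytes transferred counts every request. The average transferred file size therefore falls below the average file size whenever small files are requested more often than large ones, as in the sample records.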



