because the average file size of the UCB trace is the smallest among the three traces, and
thus the PCI traffic reduction achieved by the inclusive scheme is small. In addition,
since the UCB trace has low locality, as shown in Figures 5.5 and 5.6, the NIC cache
yields only a very low cache hit ratio. Because of the overhead of NIC cache replacement,
Figure 5.11 shows 1% and 2% lower throughput with 2 and 3 nodes, respectively, which is
consistent with a previous study. For the PSU trace, the inclusive scheme shows a
marginal performance improvement (up to 7%), since the trace has the highest locality
and therefore incurs fewer remote node accesses.
Impact of Intra-Cluster Communication Bandwidth
The I/O bus, which connects the different components of the cluster-based Web server,
is considered a major performance bottleneck. Intel proposed the PCI bus as the I/O
architecture in 1992. At its inception, the 32-bit, 33 MHz PCI bus, with a peak bandwidth
of 133 MBytes/sec, was expected to meet the I/O bandwidth requirement. However, with rapid
advances in processor and memory technology, the industry has increased the bandwidth
of the PCI bus from 32 bits at 33 MHz to 64 bits at 33 MHz (266 MBytes/sec).
Moreover, PCI-X has been proposed with 532 MBytes/sec and 800 MBytes/sec. To meet the
requirement for even more I/O bandwidth, PCI Express has recently been proposed to
provide higher bandwidth and shorter latency than PCI-X and PCI. A PCI Express x1 link
provides a 5 Gbps peak raw data transfer rate, and the link can be scaled up to 32 lanes
(x32). Additionally, as another effort to alleviate the I/O bottleneck, the
future InfiniBand-based NICs are designed for direct connection to the memory bus,
although most of them are currently connected to the PCI bus.
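The bandwidth figures above follow from simple arithmetic: a parallel bus that moves one word per clock peaks at (width in bytes) × (clock rate), and a PCI Express link's raw rate can be estimated by multiplying the per-lane rate by the lane count. The Python sketch below reproduces these numbers under a few assumptions that are not in the original text: the PCI clock is taken as its nominal 33.33 MHz, the two PCI-X figures are assumed to correspond to 64-bit buses at roughly 66 MHz and 100 MHz, and PCI Express raw rate is assumed to scale linearly from the 5 Gbps quoted for an x1 link. The helper names `parallel_bus_mbytes_per_sec` and `pcie_raw_gbps` are hypothetical, for illustration only.

```python
# Minimal sketch: peak theoretical I/O bus bandwidth.
# Assumptions (not from the original text): the nominal clock values below,
# and linear scaling of PCI Express raw rate with lane count.

def parallel_bus_mbytes_per_sec(width_bits: int, clock_mhz: float) -> float:
    """Peak MBytes/sec for a parallel bus that transfers one word per clock."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz  # MHz = million transfers per second


def pcie_raw_gbps(lanes: int, per_lane_gbps: float = 5.0) -> float:
    """Peak raw Gbps for a PCI Express link of the given width (x1 ... x32).

    per_lane_gbps defaults to the 5 Gbps raw rate quoted for an x1 link.
    """
    return lanes * per_lane_gbps


# Hypothetical table of bus configurations: (data width in bits, clock in MHz).
PARALLEL_BUSES = {
    "PCI 32-bit, 33 MHz": (32, 33.33),
    "PCI 64-bit, 33 MHz": (64, 33.33),
    "PCI-X 64-bit, 66 MHz": (64, 66.66),
    "PCI-X 64-bit, 100 MHz": (64, 100.0),
}

if __name__ == "__main__":
    for name, (width, clock) in PARALLEL_BUSES.items():
        print(f"{name:<22} {parallel_bus_mbytes_per_sec(width, clock):7.1f} MBytes/sec")
    for lanes in (1, 2, 4, 8, 16, 32):
        print(f"PCI Express x{lanes:<2} {pcie_raw_gbps(lanes):6.1f} Gbps raw")
```

With these assumed clocks, the 64-bit, 66.66 MHz case comes out at about 533 MBytes/sec, slightly above the 532 MBytes/sec quoted in the text; the gap comes only from how the clock is rounded. All of these are peak raw figures; protocol and encoding overhead reduce the rate a NIC can actually sustain.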