results of dcs block soar at the point k = 8. Because the sending and receiving processes
of dcs block always block themselves to yield the CPU to other processes, the response
time of remote cache read requests becomes very long.
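This block-and-yield behavior can be illustrated with a minimal sketch (this is not the simulator's actual code; the class and method names are hypothetical). The waiting process sleeps on a condition variable, consuming no CPU while the remote reply is in flight, which is precisely why its response time grows when replies are slow to be scheduled:

```python
import threading
import time

class RemoteReadChannel:
    """Illustrative model of a remote cache read in a dcs_block-style protocol."""

    def __init__(self):
        self._cv = threading.Condition()
        self._reply = None

    def deliver(self, value):
        # Invoked by the "network" side when the remote reply arrives.
        with self._cv:
            self._reply = value
            self._cv.notify_all()

    def read_blocking(self):
        # dcs_block style: block and yield the CPU until the reply arrives,
        # rather than spinning and holding the processor.
        with self._cv:
            while self._reply is None:
                self._cv.wait()
            return self._reply

def simulate():
    ch = RemoteReadChannel()
    result = {}
    waiter = threading.Thread(
        target=lambda: result.setdefault("value", ch.read_blocking()))
    waiter.start()
    time.sleep(0.05)        # simulated network latency; waiter uses no CPU here
    ch.deliver("cache line")
    waiter.join()
    return result["value"]
```

The trade-off shown here is the one the text describes: yielding the CPU avoids wasted cycles under light load, but every remote read now pays a scheduling delay before the blocked process runs again.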
Figure 3.6 (b) shows the throughput results for the 32 node configuration. The
first observation from this figure is that the dcs block model yields the worst performance,
although for light loads, all four models have similar throughput. This is because
remote cache reads become the performance bottleneck in the dcs block model due
to the frequent blocking of the communicating processes. The three other models, dcs,
press via and adaptive, show almost the same throughput over the entire workload
except at k = 3, at which the dcs model experiences throughput deterioration. At this
point, many remote cache read requests in dcs compete for the CPU by preempting running
processes, so CPU time is wasted on frequent interrupts and context switches. In contrast,
the adaptive model yields better throughput over the entire workload range.
Table 3.4. Average Response Time of Each Request
Next, we further analyze the impact of the coscheduling technique by breaking
down the latency and throughput results. Table 3.4 shows the average response time