
Benchmark

A simple benchmark, similar to those used in Refs. [2,5,23,24], was used to measure the overall concurrent write performance of this parallel file system. Figure 5 gives the pseudocode of a simplified MPI program implementing this benchmark. Two metrics are calculated: the overall write throughput, which includes the overhead of contacting the metadata server, and the raw write throughput, which excludes the open and close times and therefore measures the aggregate throughput of the data servers alone. In both cases, the completion time of the slowest client is taken as the completion time of the benchmark. While this benchmark may not capture the complete workload patterns of real applications, it allows a detailed and fair comparison of PVFS and the four duplication protocols.
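Concretely, in notation not used explicitly above: if N client nodes each write D bytes, and ct1_i and ct2_i are the per-client completion times defined in Figure 5, then

    overall write throughput = (N × D) / max_i ct1_i
    raw write throughput     = (N × D) / max_i ct2_i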

Figure 5: Pseudocode of the benchmark
for all clients:
    synchronize with all clients using MPI barrier;
    t1 = current time;
    open a file;
    synchronize with all clients using MPI barrier;
    t2 = current time;
    loop to write data;
    t3 = current time;
    close the file;
    t4 = current time;
    ct1 = t4 - t1; /* overall completion time */
    ct2 = t3 - t2; /* raw completion time */
    send ct1 and ct2 to client 0;

for client 0:
    /* find the slowest client */
    find the maximum ct1 and the maximum ct2 over all clients;
    calculate overall write throughput using maximum ct1;
    calculate raw write throughput using maximum ct2;
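
For concreteness, the following is a minimal, self-contained MPI C sketch of the benchmark in Figure 5; it is not the exact code used in the experiments. The mount point /mnt/pvfs, the 64KB write chunk size, and the assumption that all clients write disjoint regions of a single shared file through the kernel-level POSIX interface are illustrative choices.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

#define BYTES_TOTAL (16L * 1024 * 1024)  /* 16MB per client, as in the tests */
#define CHUNK       (64 * 1024)          /* size of each individual write()  */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *buf = malloc(CHUNK);
    memset(buf, 'a', CHUNK);

    MPI_Barrier(MPI_COMM_WORLD);                     /* synchronize all clients */
    double t1 = MPI_Wtime();
    int fd = open("/mnt/pvfs/testfile", O_WRONLY | O_CREAT, 0644);  /* open a file */
    lseek(fd, (off_t)rank * BYTES_TOTAL, SEEK_SET);  /* each client writes its own region */
    MPI_Barrier(MPI_COMM_WORLD);
    double t2 = MPI_Wtime();
    for (long done = 0; done < BYTES_TOTAL; done += CHUNK)  /* loop to write data */
        (void)write(fd, buf, CHUNK);
    double t3 = MPI_Wtime();
    close(fd);                                       /* close the file */
    double t4 = MPI_Wtime();

    double ct1 = t4 - t1;                            /* overall completion time */
    double ct2 = t3 - t2;                            /* raw completion time */

    /* client 0 finds the slowest client for each metric */
    double max_ct1, max_ct2;
    MPI_Reduce(&ct1, &max_ct1, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&ct2, &max_ct2, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double total_mb = (double)nprocs * BYTES_TOTAL / (1024.0 * 1024.0);
        printf("overall write throughput: %.2f MB/s\n", total_mb / max_ct1);
        printf("raw write throughput:     %.2f MB/s\n", total_mb / max_ct2);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with one process per client node, rank 0 reports both throughput figures, each determined by the slowest client.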

The aggregate write performance is measured under three server configurations: 8 data servers mirroring 8, 16 data servers mirroring 16, and 32 data servers mirroring 32. With the metadata servers included, the total numbers of servers in the three configurations are 18, 34 and 66, respectively. In the three sets of tests, each client node writes a total of 16MB to the servers, i.e., 2MB, 1MB and 0.5MB to each server node, respectively, which approximates the amount of data written by a node during the checkpointing of a real astrophysics code [25]. During the measurements, other computational applications were running on our cluster and shared node resources, such as the network, memory, processors and I/O devices, with CEFT-PVFS, so the aggregate write performance was likely degraded. To reduce the influence of these applications on the measured performance of the protocols, each measurement was repeated many times at different times, and the average was calculated after discarding the 5 highest and 5 lowest values, as sketched below.
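The averaging of the repeated measurements can be illustrated by the following sketch; the function name trimmed_mean is hypothetical and simply implements the rule stated above of discarding the 5 highest and 5 lowest values.

#include <stdlib.h>

/* Comparison helper for qsort() over doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Mean of the n measured values after dropping the 5 lowest and the
 * 5 highest, as described above.  Assumes n > 10; returns 0.0 otherwise. */
double trimmed_mean(double *samples, int n)
{
    if (n <= 10)
        return 0.0;
    qsort(samples, n, sizeof(double), cmp_double);
    double sum = 0.0;
    for (int i = 5; i < n - 5; i++)
        sum += samples[i];
    return sum / (n - 10);
}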

