Design, Implementation, and Performance Evaluation of a Cost-Effective Fault-Tolerant Parallel Virtual File System. Yifeng Zhu, Hong Jiang, Xiao Qin, Dan Feng, and David Swanson. In Proceedings of the International Workshop on Storage Network Architecture and Parallel I/Os, in conjunction with the 12th International Conference on Parallel Architectures and Compilation Techniques, New Orleans, LA, Sept. 27 - Oct. 1, 2003.

Abstract
Fault tolerance is one of the most important issues for parallel file systems. This paper presents the design, implementation, and performance evaluation of a cost-effective, fault-tolerant parallel virtual file system (CEFT-PVFS). CEFT-PVFS provides parallel I/O service without requiring any additional hardware by using the existing commodity disks on cluster nodes, and it incorporates fault tolerance in the form of disk mirroring. While mirroring is a straightforward idea, we have implemented this open-source system and conducted extensive experiments to evaluate the feasibility, efficiency, and scalability of this fault-tolerant approach on one of the largest current clusters, where the issues of data consistency and recovery are also investigated. Four mirroring protocols are proposed, reflecting whether the fault-tolerant operations are client-driven or server-driven, and synchronous or asynchronous. Their relative merits are assessed by comparing their write performance, measured on the real system, and their reliability and availability, obtained through analytical modeling. The results indicate that, in cluster environments, mirroring can improve reliability by a factor of over 40 (4000%) while sacrificing 33-58% of the peak write performance when both systems are of identical size (i.e., counting the mirroring disks, which make up 50% of the mirrored system). In addition, the protocols with higher peak write performance are less reliable: those with lower peak write performance achieve higher reliability and availability at the expense of some write bandwidth. A hybrid protocol is proposed to optimize this tradeoff.

BibTeX Entry
  @InProceedings{yzhu:snapi03,
author = "Yifeng Zhu and Hong Jiang and Xiao Qin and Dan Feng and David Swanson",
title = "{Design}, Implementation, and Performance Evaluation of a Cost-Effective Fault-Tolerant Parallel Virtual File System",
booktitle = "Proceedings of the International Workshop on Storage Network Architecture and Parallel {I/O}s, in conjunction with the 12th International Conference on Parallel Architectures and Compilation Techniques",
location = "New Orleans, LA",
year = "2003",
month = sep
}

Last modified on October 16, 2003