LIBRA: a library for reliable distributed applications

Download files
Access & Terms of Use
open access
Altmetric
Abstract
This paper describes libra, a library to support efficient reliable distributed applications. Libra is designed to meet two objectives: to simplify the development of reliable distributed applications, and to achieve fault-tolerance at low run-time cost. The first objective is met by the provision of fault-tolerance transparency and a simple, easy to use high-level message passing interface. Fault-tolerance is provided to applications transparently by libra and is based on distributed consistent checkpointing and rollback-recovery integrated with a user-level network communication protocol. The second objective is met by the use of protocols which minimise communication overhead for taking a consistent distributed checkpoint and catching messages in transit, and impose low overhead in terms of running times. The paper presents measurements backing up these claims.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Author(s)
Ouyang, Jinsong
;
Heiser, Gernot
Supervisor(s)
Creator(s)
Editor(s)
Translator(s)
Curator(s)
Designer(s)
Arranger(s)
Composer(s)
Recordist(s)
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
1996
Resource Type
Conference Paper
Degree Type
UNSW Faculty
Files
download pdpta96.pdf 36.27 KB Adobe Portable Document Format
Related dataset(s)