On Energy-efficient Checkpointing in High-throughput Cycle-stealing Distributed Systems

Matthew Forshaw, A. Stephen McGough, Nigel Thomas

2014

Abstract

Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware and software failures and interruptions from resource owners. With increasing scrutiny of the energy consumption of IT infrastructures, it is important to understand the impact of checkpointing on the energy consumption of HTC environments. In this paper we demonstrate through trace-driven simulation on real-world datasets that existing checkpointing strategies are inadequate at maintaining an acceptable level of energy consumption whilst reducing the makespan of tasks. Furthermore, we identify factors important in deciding whether to employ checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption of checkpointing approaches.

Paper Citation


in Harvard Style

Forshaw M., McGough A. and Thomas N. (2014). On Energy-efficient Checkpointing in High-throughput Cycle-stealing Distributed Systems. In Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems - Volume 1: SMARTGREENS, ISBN 978-989-758-025-3, pages 262-267. DOI: 10.5220/0004958302620267

in Bibtex Style

@conference{smartgreens14,
author={Matthew Forshaw and A. Stephen McGough and Nigel Thomas},
title={On Energy-efficient Checkpointing in High-throughput Cycle-stealing Distributed Systems},
booktitle={Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems - Volume 1: SMARTGREENS},
year={2014},
pages={262-267},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004958302620267},
isbn={978-989-758-025-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems - Volume 1: SMARTGREENS
TI - On Energy-efficient Checkpointing in High-throughput Cycle-stealing Distributed Systems
SN - 978-989-758-025-3
AU - Forshaw M.
AU - McGough A.
AU - Thomas N.
PY - 2014
SP - 262
EP - 267
DO - 10.5220/0004958302620267