Bio-backfill: A Scheduling Policy Enhancing the Performance of Bioinformatics Workflows in Shared Clusters

Ferran Badosa, Antonio Espinosa, Gonzalo Vera, Ana Ripoll

2018

Abstract

In this work we present the bio-backfill scheduler, a backfill scheduler for bioinformatics workflow applications running on shared, heterogeneous clusters. Backfill techniques advance low-priority jobs in cluster queues if doing so does not delay higher-priority jobs. They improve the resource utilization and turnaround achieved with classical policies such as First Come First Served or Longest Job First. When attempting to apply backfill techniques such as Firstfit or Bestfit to bioinformatics workflows, we found several issues. Backfill requires runtime predictions, which are particularly difficult to obtain for bioinformatics applications: their performance varies substantially depending on the input datasets and the values of their many configuration parameters. Furthermore, backfill approaches are mainly intended to schedule independent tasks, rather than dependent tasks such as those forming workflows. Backfilled jobs are chosen based on their number of processors and runtime length, but without considering the slowdown they suffer when the Degree of Multiprogramming of the nodes is greater than 1. To tackle these issues, we developed the bio-backfill scheduler. Based on a predictor that generates performance predictions of each job with multiple resources, and a resource-sharing model that minimizes slowdown, we designed a scheduling algorithm capable of backfilling bioinformatics workflow applications. Our experiments show that our proposal can improve average workflow turnaround by roughly 9% and resource utilization by almost 4%, compared to popular backfill strategies such as Firstfit or Bestfit.
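As a rough illustration of the backfill condition described in the abstract (a lower-priority job may be advanced only if it does not delay a higher-priority one), the following is a minimal sketch of a generic first-fit backfill pass with a single reservation for the blocked head job. It is not the bio-backfill algorithm: it assumes a single pool of identical processors, independent jobs, and exact runtime predictions, and it ignores slowdown from node sharing. The names `Job` and `firstfit_backfill` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    procs: int       # processors requested
    runtime: float   # predicted runtime (assumed exact in this sketch)


def firstfit_backfill(total_procs, running, queue, now=0.0):
    """Return the names of queued jobs that may start at time `now`.

    running: list of (procs, end_time) pairs for jobs already executing.
    queue:   list of Job, highest priority first.
    """
    running = list(running)
    pending = list(queue)
    free = total_procs - sum(p for p, _ in running)
    started = []

    # Start jobs in priority order for as long as the head of the queue fits.
    while pending and pending[0].procs <= free:
        job = pending.pop(0)
        free -= job.procs
        running.append((job.procs, now + job.runtime))
        started.append(job.name)
    if not pending:
        return started

    # Reserve for the blocked head job: find the earliest time ("shadow time")
    # at which enough processors will have been released for it to start.
    head = pending[0]
    if head.procs > total_procs:
        raise ValueError(f"{head.name} requests more processors than exist")
    avail = free
    shadow = now
    for procs, end in sorted(running, key=lambda r: r[1]):
        avail += procs
        if avail >= head.procs:
            shadow = end
            break
    extra = avail - head.procs  # processors the head job will not need

    # First fit: take each later job, in queue order, that fits now and either
    # finishes before the shadow time or only uses the spare processors.
    for job in pending[1:]:
        if job.procs > free:
            continue
        ends_in_time = now + job.runtime <= shadow
        if ends_in_time or job.procs <= extra:
            free -= job.procs
            if not ends_in_time:
                extra -= job.procs
            started.append(job.name)
    return started


queue = [Job("A", 8, 5.0), Job("B", 2, 20.0), Job("C", 2, 8.0), Job("D", 1, 1.0)]
print(firstfit_backfill(8, [(6, 10.0)], queue))  # ['C']
```

In this usage example, job A needs all 8 processors and must wait until time 10. Job B fits in the 2 free processors but would run past time 10 and delay A, so it is skipped; job C finishes at time 8 and is backfilled; job D no longer fits once C has started.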


Paper Citation


in Harvard Style

Badosa F., Espinosa A., Vera G. and Ripoll A. (2018). Bio-backfill: A Scheduling Policy Enhancing the Performance of Bioinformatics Workflows in Shared Clusters. In Proceedings of the 3rd International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS, ISBN 978-989-758-297-4, pages 148-156. DOI: 10.5220/0006812901480156

in Bibtex Style

@conference{complexis18,
author={Ferran Badosa and Antonio Espinosa and Gonzalo Vera and Ana Ripoll},
title={Bio-backfill: A Scheduling Policy Enhancing the Performance of Bioinformatics Workflows in Shared Clusters},
booktitle={Proceedings of the 3rd International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS},
year={2018},
pages={148-156},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006812901480156},
isbn={978-989-758-297-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 3rd International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS
TI - Bio-backfill: A Scheduling Policy Enhancing the Performance of Bioinformatics Workflows in Shared Clusters
SN - 978-989-758-297-4
AU - Badosa F.
AU - Espinosa A.
AU - Vera G.
AU - Ripoll A.
PY - 2018
SP - 148
EP - 156
DO - 10.5220/0006812901480156