On using Longer RNA-seq Reads to Improve Transcript Prediction Accuracy

Anna Kuosmanen, Ahmed Sobih, Romeo Rizzi, Veli Mäkinen, Alexandru I. Tomescu

2016

Abstract

Over the past decade, sequencing read length has increased from tens to hundreds and then to thousands of bases. Current cDNA synthesis methods prevent RNA-seq reads from being long enough to capture entire RNA transcripts, but long reads can still provide connectivity information on chains of multiple exons included in a transcript. We demonstrate that exploiting this full connectivity information leads to significantly higher prediction accuracy, as measured by the F-score. For this purpose, we implemented the solution to the Minimum Path Cover with Subpath Constraints problem introduced in (Rizzi et al., 2014), which extends the classical Minimum Path Cover problem and was shown to be solvable by min-cost flows. We show that, under hypothetical conditions of perfect sequencing, our approach uses long reads more effectively than two state-of-the-art tools, StringTie and FlipFlop. Even in this setting the problem is not trivial, and errors in the underlying flow graph introduced by sequencing and alignment errors complicate it further. As such, our work also demonstrates the need for a good spliced read aligner for long reads. Our proof-of-concept implementation is available at http://www.cs.helsinki.fi/en/gsa/traphlor.
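To make the classical Minimum Path Cover problem more concrete, here is a minimal sketch. It is not the authors' method: it ignores subpath constraints (the long-read information the paper exploits) and does not use min-cost flows. Instead it uses a different textbook technique for the unconstrained problem on a DAG, where paths may share vertices (as transcripts may share exons): take the transitive closure of the graph, then the minimum number of covering paths equals the number of vertices minus the size of a maximum bipartite matching on the closure. The function names, the 0..n-1 vertex numbering, and the reading of vertices as exons and edges as splice connections are hypothetical choices for illustration.

```python
from typing import Dict, List, Set, Tuple


def transitive_closure(n: int, edges: List[Tuple[int, int]]) -> List[Set[int]]:
    """Return reach[u]: vertices reachable from u by a path of length >= 1.

    Assumes (hypothetically) vertices 0..n-1 and an acyclic graph.
    """
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    reach: List[Set[int]] = [set() for _ in range(n)]
    # Plain DFS from every vertex; adequate for small illustrative graphs.
    for s in range(n):
        stack = list(adj[s])
        while stack:
            v = stack.pop()
            if v not in reach[s]:
                reach[s].add(v)
                stack.extend(adj[v])
    return reach


def min_path_cover(n: int, edges: List[Tuple[int, int]]) -> int:
    """Minimum number of paths covering every vertex of a DAG.

    Paths may share vertices. No subpath constraints are enforced here.
    """
    reach = transitive_closure(n, edges)
    match_right: Dict[int, int] = {}  # right copy of v -> left vertex matched to it

    def try_augment(u: int, seen: Set[int]) -> bool:
        # Kuhn's augmenting-path search starting from the left copy of u.
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    matching = sum(1 for u in range(n) if try_augment(u, set()))
    # Each matched closure edge merges two chains, so n - |matching| paths remain.
    return n - matching
```

For example, `min_path_cover(5, [(0, 2), (1, 2), (2, 3), (2, 4)])` returns 2: the paths 0-2-3 and 1-2-4 both pass through vertex 2, whereas a cover with vertex-disjoint paths would need 3. Note that this unconstrained cover cannot tell which of the two possible pairings (0-2-3/1-2-4 or 0-2-4/1-2-3) is correct; resolving that ambiguity is exactly the role that subpath constraints from long reads play in the problem studied in the paper.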


Paper Citation


in Harvard Style

Kuosmanen A., Sobih A., Rizzi R., Mäkinen V. and Tomescu A. (2016). On using Longer RNA-seq Reads to Improve Transcript Prediction Accuracy. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016) ISBN 978-989-758-170-0, pages 272-277. DOI: 10.5220/0005819702720277

in Bibtex Style

@conference{bioinformatics16,
author={Anna Kuosmanen and Ahmed Sobih and Romeo Rizzi and Veli Mäkinen and Alexandru I. Tomescu},
title={On using Longer RNA-seq Reads to Improve Transcript Prediction Accuracy},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)},
year={2016},
pages={272-277},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005819702720277},
isbn={978-989-758-170-0},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)
TI  - On using Longer RNA-seq Reads to Improve Transcript Prediction Accuracy
SN  - 978-989-758-170-0
AU  - Kuosmanen A.
AU  - Sobih A.
AU  - Rizzi R.
AU  - Mäkinen V.
AU  - Tomescu A.
PY  - 2016
SP  - 272
EP  - 277
DO  - 10.5220/0005819702720277
ER  - 