Metrics for Clustering Comparison in Bioinformatics

Giovanni Rossi

2016

Abstract

Motivated by a problem in bioinformatics, this work analyses alternative metrics between partitions. From both theoretical and applied perspectives, a useful distance between any two partitions is HD, which counts the number of atoms finer than either one but not both. While faithfully reproducing the traditional Hamming distance between subsets, HD is highly sensitive and computable through scalar products between Boolean vectors. It deals properly with complements and axiomatically resembles the entropy-based variation of information (VI) distance. Entire families of metrics (including HD and VI) are obtained as minimal paths in the weighted graph given by the Hasse diagram: submodular weighting functions yield path-based distances visiting the join (of any two partitions), whereas supermodular ones lead to paths visiting the meet. This yields an exact (rather than heuristic) approach to the consensus partition (combinatorial optimization) problem.
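The HD computation described in the abstract can be illustrated with a minimal sketch. It assumes that an atom of the partition lattice is identified with an unordered pair of elements, and that an atom is finer than a partition exactly when both elements of the pair lie in the same block. Under that reading, each partition becomes a Boolean vector indexed by pairs, and HD is the size of the symmetric difference of the two atom sets, obtained from three scalar products. The function names `atom_vector`, `dot` and `hd`, and the encoding of a partition as a list of block labels, are illustrative choices and are not taken from the paper.

```python
from itertools import combinations


def atom_vector(labels):
    """Boolean vector indexed by unordered pairs (i, j) with i < j.

    An entry is 1 when elements i and j carry the same block label,
    i.e. when the atom {i, j} is finer than the partition (assumed reading).
    """
    n = len(labels)
    return [int(labels[i] == labels[j]) for i, j in combinations(range(n), 2)]


def dot(u, v):
    """Plain scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hd(labels_p, labels_q):
    """Number of atoms finer than exactly one of the two partitions."""
    if len(labels_p) != len(labels_q):
        raise ValueError("both partitions must be over the same ground set")
    p = atom_vector(labels_p)
    q = atom_vector(labels_q)
    # |P xor Q| = <p, p> + <q, q> - 2 <p, q> for 0/1 vectors
    return dot(p, p) + dot(q, q) - 2 * dot(p, q)


# Partitions of {0, 1, 2, 3}: P = {0,1}|{2,3} and Q = {0,1,2}|{3}
print(hd([0, 0, 1, 1], [0, 0, 0, 1]))  # 3
```

In the example, P lies above the atoms {0,1} and {2,3}, while Q lies above {0,1}, {0,2} and {1,2}; the atoms covered by exactly one of them are {2,3}, {0,2} and {1,2}, giving a distance of 3.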

Paper Citation


in Harvard Style

Rossi G. (2016). Metrics for Clustering Comparison in Bioinformatics. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 299-308. DOI: 10.5220/0005707102990308

in Bibtex Style

@conference{icpram16,
author={Giovanni Rossi},
title={Metrics for Clustering Comparison in Bioinformatics},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={299-308},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005707102990308},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Metrics for Clustering Comparison in Bioinformatics
SN - 978-989-758-173-1
AU - Rossi G.
PY - 2016
SP - 299
EP - 308
DO - 10.5220/0005707102990308