Metrics for Clustering Comparison in Bioinformatics
Giovanni Rossi
2016
Abstract
Motivated by a problem in bioinformatics, this work analyses alternative metrics between partitions. From both theoretical and applied perspectives, a useful and interesting distance between any two partitions is HD, which counts the atoms finer than exactly one of the two. While faithfully reproducing the traditional Hamming distance between subsets, HD is highly sensitive and computable through scalar products between Boolean vectors. It deals properly with complements and axiomatically resembles the entropy-based variation of information (VI) distance. Entire families of metrics (including HD and VI) are obtained as minimal paths in the weighted graph given by the Hasse diagram: submodular weighting functions yield path-based distances that visit the join (of any two partitions), whereas supermodular weighting functions yield distances that visit the meet. This yields an exact (rather than heuristic) approach to the consensus partition (combinatorial optimization) problem.
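As a rough illustration of the scalar-product computation mentioned in the abstract, the sketch below assumes that the atoms of the partition lattice are the partitions with a single two-element block {i, j}, so that an atom is finer than a partition exactly when i and j share a block. Under that assumption, HD is the Hamming distance between two Boolean vectors indexed by pairs. The names atom_vector, dot and hd are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def atom_vector(partition, n):
    """Boolean vector indexed by the pairs {i, j}, i < j, of elements 0..n-1.

    An entry is 1 when i and j lie in the same block, i.e. (by assumption)
    when the atom with single 2-block {i, j} is finer than the partition.
    """
    label = {}
    for block_id, block in enumerate(partition):
        for element in block:
            label[element] = block_id
    return [1 if label[i] == label[j] else 0
            for i, j in combinations(range(n), 2)]


def dot(u, v):
    """Plain scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hd(p, q, n):
    """Number of atoms finer than exactly one of p and q.

    For Boolean vectors u, v this equals <u,u> + <v,v> - 2<u,v>.
    """
    u, v = atom_vector(p, n), atom_vector(q, n)
    return dot(u, u) + dot(v, v) - 2 * dot(u, v)


# Two partitions of {0, 1, 2, 3}, chosen only for illustration.
P = [{0, 1, 2}, {3}]
Q = [{0, 1}, {2, 3}]
print(hd(P, Q, 4))  # prints 3
```

Here P lies above the atoms {0,1}, {0,2}, {1,2} and Q above {0,1}, {2,3}; the three atoms covered by exactly one of them give the distance 3.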
Paper Citation
in Harvard Style
Rossi G. (2016). Metrics for Clustering Comparison in Bioinformatics. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 299-308. DOI: 10.5220/0005707102990308
in Bibtex Style
@conference{icpram16,
author={Giovanni Rossi},
title={Metrics for Clustering Comparison in Bioinformatics},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={299-308},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005707102990308},
isbn={978-989-758-173-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Metrics for Clustering Comparison in Bioinformatics
SN - 978-989-758-173-1
AU - Rossi G.
PY - 2016
SP - 299
EP - 308
DO - 10.5220/0005707102990308