ASSESSING PROGRESSIVE FILTERING TO PERFORM HIERARCHICAL TEXT CATEGORIZATION IN PRESENCE OF INPUT IMBALANCE

Andrea Addis, Giuliano Armano, Eloisa Vargiu

2010

Abstract

The larger the amount of available data (e.g., in digital libraries), the greater the need for high-performance text categorization algorithms. So far, work on text categorization has focused mostly on “flat” approaches, i.e., algorithms that operate on non-hierarchical classification schemes. Hierarchical approaches are expected to perform better when a subsumption ordering holds among categories. Following the “divide et impera” strategy, they partition the problem into smaller subproblems, each of which is expected to be simpler to solve. In this paper, we illustrate and discuss the results obtained by assessing the “Progressive Filtering” (PF) technique for text categorization. Experiments on the Reuters Corpus (RCV1-v2) and on DMOZ datasets focus on the ability of PF to deal with input imbalance. In particular, the assessment consists of: (i) comparing the results with those obtained by the corresponding flat approach; (ii) measuring the performance improvement as the pipeline depth increases; and (iii) measuring performance in terms of generalization, specialization, and misclassification error, as well as unknown ratio. Experimental results show that, for the adopted datasets, PF is able to counteract large imbalances between negative and positive examples.
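The abstract describes PF only at a high level. The following is a minimal sketch, assuming that PF unfolds the taxonomy into pipelines of binary classifiers, one pipeline per root-to-node path, and that a document is passed to the next classifier only while every classifier so far accepts it. The toy taxonomy, the keyword "classifiers", the 0.3 threshold, and the deepest-accepted decision rule are all hypothetical placeholders, not the paper's trained classifiers or evaluation protocol. The error labels also reflect an assumed hierarchical reading: an ancestor of the true category gives a generalization error, a descendant gives a specialization error, any other category gives a misclassification, and no category at all counts toward the unknown ratio.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy: category -> parent (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "sports": None,
    "soccer": "sports",
    "tennis": "sports",
    "economics": None,
    "markets": "economics",
}

# Hypothetical keyword lists standing in for trained binary classifiers.
KEYWORDS: Dict[str, Set[str]] = {
    "sports": {"match", "team", "player"},
    "soccer": {"goal", "striker"},
    "tennis": {"racket", "serve"},
    "economics": {"inflation", "gdp", "market"},
    "markets": {"stocks", "shares"},
}


def path_from_root(category: str) -> List[str]:
    """Categories on the path from the top level down to `category`."""
    path: List[str] = []
    node: Optional[str] = category
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def score(category: str, doc: str) -> float:
    """Toy binary classifier: fraction of the category's keywords found in `doc`."""
    words = set(doc.lower().split())
    return len(words & KEYWORDS[category]) / len(KEYWORDS[category])


def pipeline_accepts(category: str, doc: str, threshold: float = 0.3) -> bool:
    """Progressive filtering along one root-to-node pipeline.

    `all` stops at the first rejecting classifier, so deeper classifiers
    only ever see documents that every ancestor has already accepted.
    """
    return all(score(c, doc) >= threshold for c in path_from_root(category))


def predict(doc: str) -> Optional[str]:
    """Hypothetical decision rule: the deepest category whose pipeline accepts `doc`."""
    accepted = [c for c in PARENT if pipeline_accepts(c, doc)]
    if not accepted:
        return None  # rejected by every pipeline: counted as "unknown"
    return max(accepted, key=lambda c: len(path_from_root(c)))


def outcome(predicted: Optional[str], actual: str) -> str:
    """Label a prediction with the (assumed) error types named in the abstract."""
    if predicted is None:
        return "unknown"
    if predicted == actual:
        return "correct"
    if predicted in path_from_root(actual):
        return "generalization"  # stopped at an ancestor of the true category
    if actual in path_from_root(predicted):
        return "specialization"  # went deeper than the true category
    return "misclassification"


if __name__ == "__main__":
    for text in ["the striker scored a goal for the team", "the team lost the match",
                 "stocks fell as inflation rose", "stocks surged"]:
        print(text, "->", predict(text), "/", outcome(predict(text), "soccer"))
```

With these toy keywords, "the team lost the match" passes the sports classifier but is stopped by the soccer one, so for a true soccer document it counts as a generalization error. "stocks surged" would satisfy the markets classifier on its own, yet it is rejected at the economics level first. This early rejection is the filtering effect that, per the abstract, helps PF cope with many more negative than positive examples at the deeper levels.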



Paper Citation


in Harvard Style

Addis A., Armano G. and Vargiu E. (2010). ASSESSING PROGRESSIVE FILTERING TO PERFORM HIERARCHICAL TEXT CATEGORIZATION IN PRESENCE OF INPUT IMBALANCE. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 14-23. DOI: 10.5220/0003066300140023

in Bibtex Style

@conference{kdir10,
author={Andrea Addis and Giuliano Armano and Eloisa Vargiu},
title={ASSESSING PROGRESSIVE FILTERING TO PERFORM HIERARCHICAL TEXT CATEGORIZATION IN PRESENCE OF INPUT IMBALANCE},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={14-23},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003066300140023},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI  - ASSESSING PROGRESSIVE FILTERING TO PERFORM HIERARCHICAL TEXT CATEGORIZATION IN PRESENCE OF INPUT IMBALANCE
SN  - 978-989-8425-28-7
AU  - Addis A.
AU  - Armano G.
AU  - Vargiu E.
PY  - 2010
SP  - 14
EP  - 23
DO  - 10.5220/0003066300140023
ER  - 