EFFICIENT RSS FEED GENERATION FROM HTML PAGES

Jun Wang, Kanji Uchino

2005

Abstract

Although RSS offers a promising way to track and personalize the flow of new Web information, many current Web sites do not yet provide RSS feeds. Convenient approaches to “RSSify” suitable existing Web content have therefore become a pressing need. This paper presents EHTML2RSS, an efficient system that translates semi-structured HTML pages into structured RSS feeds, using different approaches depending on the features of the HTML page. For information items that carry a release time, the system provides an automatic approach based on time pattern discovery. For regular pages without a time pattern, a second automatic approach based on repeated tag pattern discovery is applied. A semi-automatic approach based on labelling is available for irregular pages or for specific sections of a Web page, according to the user’s requirements. Experimental results show that the system is efficient and effective in facilitating RSS feed generation.
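To make the time-pattern idea concrete, the following is a minimal sketch and not the EHTML2RSS implementation: it assumes a single fixed date format (a hypothetical `DATE_RE` regular expression) instead of discovering time patterns, and it pairs each date with the next linked text on the page. The names `LinkTextParser`, `extract_items`, `build_rss`, and the sample HTML are all illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser

# Assumption: dates look like 2005-03-14 or 2005/3/14. A real system would
# discover the page's time pattern rather than hard-code one.
DATE_RE = re.compile(r"\b(\d{4})[-/](\d{1,2})[-/](\d{1,2})\b")


class LinkTextParser(HTMLParser):
    """Collects text chunks together with the href of the enclosing <a>, if any."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.chunks = []  # list of (text, href or None)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.href = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((text, self.href))


def extract_items(html):
    """Pair each date found in the page text with the next linked text chunk."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    items, pending = [], None
    for text, href in parser.chunks:
        match = DATE_RE.search(text)
        if match:
            year, month, day = (int(g) for g in match.groups())
            pending = datetime(year, month, day, tzinfo=timezone.utc)
        elif pending is not None and href:
            items.append({"date": pending, "title": text, "link": href})
            pending = None
    return items


def build_rss(channel_title, channel_link, items):
    """Serialize the extracted items as an RSS 2.0 document (returned as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # RSS 2.0 uses RFC 822 style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["date"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical news list used only for illustration.
SAMPLE_HTML = (
    '<ul><li>2005-03-14 <a href="/news/1">System released</a></li>'
    '<li>2005/3/2 <a href="/news/2">Call for papers</a></li></ul>'
)

if __name__ == "__main__":
    print(build_rss("Example news", "http://example.com/", extract_items(SAMPLE_HTML)))
```

The sketch covers only pages whose items carry a release time. Pages without dates would need a different strategy, such as the repeated tag pattern discovery or the labelling approach that the abstract describes.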

Paper Citation


in Harvard Style

Wang J. and Uchino K. (2005). EFFICIENT RSS FEED GENERATION FROM HTML PAGES. In Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 972-8865-20-1, pages 311-318. DOI: 10.5220/0001230103110318

in Bibtex Style

@conference{webist05,
author={Jun Wang and Kanji Uchino},
title={EFFICIENT RSS FEED GENERATION FROM HTML PAGES},
booktitle={Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2005},
pages={311-318},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001230103110318},
isbn={972-8865-20-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - EFFICIENT RSS FEED GENERATION FROM HTML PAGES
SN - 972-8865-20-1
AU - Wang J.
AU - Uchino K.
PY - 2005
SP - 311
EP - 318
DO - 10.5220/0001230103110318