ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing

Jeffrey Ellen

2011

Abstract

This paper defines a new term, ‘Microtext’, and surveys the most recent and promising research that falls under this definition. Microtext has three distinct attributes that differentiate it from the traditional free text or unstructured text considered within the AI and NLP communities: it is generally very short, semi-structured, and characterized by amorphous or informal grammar and language. Examples of microtext include chatrooms (such as IM, XMPP, and IRC), SMS, voice transcriptions, and micro-blogging such as Twitter(tm). This paper expands on this definition and provides some characterizations of typical microtext data. Microtext is becoming more prevalent. It is the thesis of this paper that the three distinct attributes of microtext yield different results and require different techniques than traditional AI and NLP techniques applied to long-form free text. By creating a working definition for microtext and providing a survey of the current state of research in the area, this paper aims to create an understanding of microtext within the AI and NLP communities.


Paper Citation


in Harvard Style

Ellen J. (2011). ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 329-336. DOI: 10.5220/0003179903290336

in Bibtex Style

@conference{icaart11,
author={Jeffrey Ellen},
title={ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={329-336},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003179903290336},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - ALL ABOUT MICROTEXT - A Working Definition and a Survey of Current Microtext Research within Artificial Intelligence and Natural Language Processing
SN - 978-989-8425-40-9
AU - Ellen J.
PY - 2011
SP - 329
EP - 336
DO - 10.5220/0003179903290336