Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning
Minato Sato, Ryohei Orihara, Yuichi Sei, Yasuyuki Tahara, Akihiko Ohsuga
2017
Abstract
Temporal (one-dimensional) Convolutional Neural Networks (Temporal CNNs, ConvNets) are an emerging technology for text understanding. The input to a ConvNet can be either a sequence of words or a sequence of characters. In the latter case, no language-dependent natural language processing, such as morphological analysis, is needed. Past studies showed that character-level ConvNets work well for news category classification and sentiment analysis/classification tasks on English and romanized Chinese text corpora. In this article, we apply character-level ConvNets to Japanese text understanding. Inspired by the success of transfer learning in image recognition, we also attempt to reuse meaningful representations that the ConvNets learn from a large-scale dataset. On news category classification and sentiment analysis/classification tasks on Japanese text corpora, the ConvNets outperformed N-gram-based classifiers. In addition, our ConvNet transfer learning frameworks worked well for a task similar to the one used for pre-training.
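To make the character-level idea concrete, the following is a minimal sketch in plain Python of how raw Japanese text can be fed to a temporal convolution without any morphological analysis. It is an illustration only, not the authors' architecture: the helper names (`build_vocab`, `encode`, `one_hot`, `temporal_conv`, `max_over_time`), the one-hot encoding, the hand-set kernel, and the toy sentences are all assumptions introduced here, and a real model would learn many kernels from data.

```python
def build_vocab(texts):
    """Map each distinct character to an integer id; 0 is reserved for padding/unknown."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(text, vocab, max_len):
    """Turn a string into a fixed-length list of character ids (truncate or pad with 0)."""
    ids = [vocab.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))

def one_hot(ids, vocab_size):
    """One row per character; padding/unknown (id 0) becomes an all-zero row."""
    rows = []
    for i in ids:
        row = [0.0] * vocab_size
        if i > 0:
            row[i - 1] = 1.0
        rows.append(row)
    return rows

def temporal_conv(x, kernel, bias=0.0):
    """1-D convolution over time with a ReLU: slide a K-step kernel along the sequence."""
    width = len(kernel)
    feature_map = []
    for t in range(len(x) - width + 1):
        s = bias
        for k in range(width):
            s += sum(a * b for a, b in zip(x[t + k], kernel[k]))
        feature_map.append(max(0.0, s))
    return feature_map

def max_over_time(feature_map):
    """Keep the strongest response anywhere in the sequence."""
    return max(feature_map)

# Toy corpus (hypothetical): no word segmentation is performed at any point.
texts = ["猫が好き", "犬が好き"]
vocab = build_vocab(texts)

# A hand-set width-2 kernel that responds to the character bigram "好き".
kernel = [one_hot([vocab["好"]], len(vocab))[0],
          one_hot([vocab["き"]], len(vocab))[0]]

x = one_hot(encode("猫が好き", vocab, max_len=6), len(vocab))
response = max_over_time(temporal_conv(x, kernel))
```

Here `response` is large when the bigram occurs anywhere in the input and zero when it does not, which is the sense in which a character-level filter acts as a learned, position-independent character n-gram detector.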
Paper Citation
in Harvard Style
Sato M., Orihara R., Sei Y., Tahara Y. and Ohsuga A. (2017). Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 175-184. DOI: 10.5220/0006193401750184
in Bibtex Style
@conference{icaart17,
author={Minato Sato and Ryohei Orihara and Yuichi Sei and Yasuyuki Tahara and Akihiko Ohsuga},
title={Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2017},
pages={175-184},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006193401750184},
isbn={978-989-758-220-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning
SN - 978-989-758-220-2
AU - Sato M.
AU - Orihara R.
AU - Sei Y.
AU - Tahara Y.
AU - Ohsuga A.
PY - 2017
SP - 175
EP - 184
DO - 10.5220/0006193401750184