Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning

Minato Sato, Ryohei Orihara, Yuichi Sei, Yasuyuki Tahara, Akihiko Ohsuga

2017

Abstract

Temporal (one-dimensional) convolutional neural networks (temporal CNNs, ConvNets) are an emerging technology for text understanding. The input to a ConvNet can be either a sequence of words or a sequence of characters. In the latter case, no language-dependent natural language processing, such as morphological analysis, is needed. Past studies showed that character-level ConvNets work well for news category classification and sentiment analysis/classification tasks on English and romanized Chinese text corpora. In this article we apply character-level ConvNets to Japanese text understanding. Inspired by the success of transfer learning in image recognition, we also attempt to reuse meaningful representations that the ConvNets learn from a large-scale dataset. On news category classification and sentiment analysis and classification tasks on Japanese text corpora, the ConvNets outperformed N-gram-based classifiers. In addition, our ConvNet transfer learning frameworks worked well for a task similar to the one used for pre-training.
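To make the character-level input concrete, the following is a minimal sketch of how raw text could be turned into a fixed-length index sequence without morphological analysis, and how a single temporal (one-dimensional) convolution filter slides over an embedded sequence. It is an illustration only and not the authors' architecture: the names `build_vocab`, `encode`, `one_hot`, and `temporal_conv`, the reserved indices, and the toy corpus are all assumptions introduced here.

```python
from typing import Dict, List

PAD = 0  # assumed index reserved for padding
UNK = 1  # assumed index for characters missing from the vocabulary


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every distinct character in the corpus to an integer index."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 2  # indices 0 and 1 are reserved
    return vocab


def encode(text: str, vocab: Dict[str, int], length: int) -> List[int]:
    """Turn a string into a fixed-length index sequence (truncate or pad)."""
    ids = [vocab.get(ch, UNK) for ch in text[:length]]
    return ids + [PAD] * (length - len(ids))


def one_hot(ids: List[int], size: int) -> List[List[float]]:
    """Embed each index as a one-hot vector of the given size."""
    return [[1.0 if j == i else 0.0 for j in range(size)] for i in ids]


def temporal_conv(seq: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Valid 1-D convolution (cross-correlation) over time for one filter.

    seq is T x D (time steps by features); kernel is K x D.
    """
    k, d = len(kernel), len(kernel[0])
    return [
        sum(seq[t + i][j] * kernel[i][j] for i in range(k) for j in range(d))
        for t in range(len(seq) - k + 1)
    ]


# Toy corpus; characters are used directly, with no word segmentation.
corpus = ["今日は晴れ", "明日は雨"]
vocab = build_vocab(corpus)
ids = encode("今日は雨", vocab, 6)        # [2, 3, 4, 8, 0, 0]
features = one_hot(ids, len(vocab) + 2)   # 6 time steps x 9 features
```

A classifier in this style would stack several such filters, apply a nonlinearity and pooling over time, and feed the result to fully connected layers. In a transfer learning setting, the filter weights learned on a large dataset would be copied into a new model before training on the target task.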

Paper Citation


in Harvard Style

Sato M., Orihara R., Sei Y., Tahara Y. and Ohsuga A. (2017). Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 175-184. DOI: 10.5220/0006193401750184

in Bibtex Style

@conference{icaart17,
author={Minato Sato and Ryohei Orihara and Yuichi Sei and Yasuyuki Tahara and Akihiko Ohsuga},
title={Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2017},
pages={175-184},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006193401750184},
isbn={978-989-758-220-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Japanese Text Classification by Character-level Deep ConvNets and Transfer Learning
SN - 978-989-758-220-2
AU - Sato M.
AU - Orihara R.
AU - Sei Y.
AU - Tahara Y.
AU - Ohsuga A.
PY - 2017
SP - 175
EP - 184
DO - 10.5220/0006193401750184