AUTHOR ATTRIBUTION EVALUATION WITH NOVEL TOPIC CROSS-VALIDATION

Andrew I. Schein, Johnnie F. Caver, Randale J. Honaker, Craig H. Martell

2010

Abstract

The practice of using statistical models to predict authorship (so-called author attribution models) is long established. Several recent authorship attribution studies have indicated that topic-specific cues impact author attribution machine learning models. An author attribution evaluation methodology should anticipate the arrival of new topics rather than ignore it; a model that relies heavily on topic cues will be problematic in deployment settings where novel topics are common. We develop a protocol and test bed for measuring sensitivity to topic cues using a methodology called novel topic cross-validation. Our methodology performs cross-validation in which the test portion of each fold contains only topics unseen in the training data. Analysis of the testing framework suggests that corpora with large numbers of topics lead to more powerful hypothesis testing in novel topic evaluation studies. To implement the evaluation methodology, we developed two subsets of the New York Times Annotated Corpus, one of which contains 15 authors and 23 topics. We evaluated a maximum entropy classifier under both standard and novel topic cross-validation to compare the mechanics of the two procedures. Our novel topic evaluation framework supports automatic learning of stylometric cues that are topic neutral, and our test bed is reproducible using document identifiers available from the authors.
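The core idea of the protocol is to hold out whole topics rather than individual documents. The Python sketch below illustrates it. It is not the authors' implementation. The `Doc` record, the function names, and the round-robin assignment of topics to folds are hypothetical choices made here for illustration, and a real evaluation would normally shuffle before assigning folds. The sketch contrasts two kinds of split. Ordinary document-level folds let a topic appear on both sides of the split. Topic-level folds guarantee that every test topic is unseen in training.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Doc(NamedTuple):
    """Hypothetical document record: identifier, author label, topic label."""
    doc_id: str
    author: str
    topic: str


Fold = Tuple[List[Doc], List[Doc]]  # (training documents, test documents)


def standard_folds(docs: List[Doc], k: int) -> List[Fold]:
    """Ordinary k-fold CV: documents are dealt to folds round-robin,
    so the same topic can appear in both training and test data."""
    return [
        ([d for i, d in enumerate(docs) if i % k != f],
         [d for i, d in enumerate(docs) if i % k == f])
        for f in range(k)
    ]


def novel_topic_folds(docs: List[Doc], k: int) -> List[Fold]:
    """Novel-topic k-fold CV: whole topics are assigned to folds, so no
    test document's topic occurs in that fold's training data."""
    topics = sorted({d.topic for d in docs})
    if not 2 <= k <= len(topics):
        raise ValueError("k must be between 2 and the number of distinct topics")
    # Illustrative assumption: deal the sorted topics to folds round-robin.
    fold_of_topic: Dict[str, int] = {t: i % k for i, t in enumerate(topics)}
    return [
        ([d for d in docs if fold_of_topic[d.topic] != f],
         [d for d in docs if fold_of_topic[d.topic] == f])
        for f in range(k)
    ]


def topics_of(docs: List[Doc]) -> Set[str]:
    """Set of distinct topic labels in a list of documents."""
    return {d.topic for d in docs}


if __name__ == "__main__":
    # Toy corpus (invented for illustration): two authors, three topics.
    corpus = [
        Doc("d1", "alice", "sports"),
        Doc("d2", "alice", "politics"),
        Doc("d3", "bob", "sports"),
        Doc("d4", "bob", "science"),
        Doc("d5", "alice", "science"),
        Doc("d6", "bob", "politics"),
    ]
    for name, folds in (("standard", standard_folds(corpus, 3)),
                        ("novel-topic", novel_topic_folds(corpus, 3))):
        # Count the topics each fold shares between training and test data.
        shared = [len(topics_of(train) & topics_of(test)) for train, test in folds]
        print(name, shared)
    # Prints:
    # standard [2, 2, 2]
    # novel-topic [0, 0, 0]
```

In this toy corpus every author writes on every topic, so each author still has training documents in every fold. With real data, a topic-level split can leave an author with no training examples. It is worth checking for this before training a closed-set classifier such as the maximum entropy model mentioned above.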

Paper Citation


in Harvard Style

Schein A. I., Caver J. F., Honaker R. J. and Martell C. H. (2010). AUTHOR ATTRIBUTION EVALUATION WITH NOVEL TOPIC CROSS-VALIDATION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 206-215. DOI: 10.5220/0003088402060215

in Bibtex Style

@conference{kdir10,
author={Andrew I. Schein and Johnnie F. Caver and Randale J. Honaker and Craig H. Martell},
title={AUTHOR ATTRIBUTION EVALUATION WITH NOVEL TOPIC CROSS-VALIDATION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={206-215},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003088402060215},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - AUTHOR ATTRIBUTION EVALUATION WITH NOVEL TOPIC CROSS-VALIDATION
SN - 978-989-8425-28-7
AU - Schein, Andrew I.
AU - Caver, Johnnie F.
AU - Honaker, Randale J.
AU - Martell, Craig H.
PY - 2010
SP - 206
EP - 215
DO - 10.5220/0003088402060215