Automatically Determining Versions of Scholarly Articles
DOI:
https://doi.org/10.22230/src.2017v8n1a268
Keywords:
Article versions, Document classification, Open access, Tools, Workflow management
Abstract
Background: Repositories of scholarly articles should provide authoritative information about the materials they distribute and should distribute those materials in keeping with pertinent laws. To do so, it is important to have accurate information about the versions of articles in a collection.
Analysis: This article presents a simple statistical model to classify articles as author manuscripts or versions of record, with parameters trained on a collection of articles that have been hand-annotated for version. The algorithm achieves about 94 percent accuracy on average (cross-validated).
Conclusion and implications: The average pairwise annotator agreement among a group of experts was 94 percent, showing that the method developed in this article displays performance competitive with human experts.
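As an illustration of the agreement figure mentioned above, the following is a minimal sketch of how an average pairwise annotator agreement could be computed. It assumes agreement means the fraction of articles on which two annotators assign the same label, averaged over all pairs of annotators; the abstract does not spell out the exact definition. The function names, the annotator names, and the labels "AM" and "VoR" are hypothetical and the data are invented for the example.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators gave the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def average_pairwise_agreement(annotations):
    """Mean agreement over every unordered pair of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical labels (assumption, not data from the article):
# "AM" = author manuscript, "VoR" = version of record.
annotations = {
    "annotator_1": ["AM", "VoR", "VoR", "AM"],
    "annotator_2": ["AM", "VoR", "AM", "AM"],
    "annotator_3": ["AM", "VoR", "VoR", "AM"],
}
score = average_pairwise_agreement(annotations)
print(f"average pairwise agreement: {score:.3f}")
```

A classifier's cross-validated accuracy against the annotated labels can then be read on the same scale as this score, which is the comparison the conclusion draws.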
License
SRC embraces online publishing and open access to back issues under the Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License. This license allows users to download an article and share it with others as long as authorship and original publication are acknowledged and a link is made (in electronic media) to the original article. The article may be quoted, but it may not be changed or presented differently.
