Russian Error-Annotated
Learner English Corpus

Essays written in English
by Russian native speakers

800 pieces of students' writing

More than 200 thousand word tokens

Ten thousand annotated errors

4-level annotation scheme

Errors are annotated:

by the grammatical rule violated by the error;

by the supposed cause of the error;

by linguistic and pragmatic damage caused by the error

Distribution of different types of errors in the corpus

The corpus is available under a Creative Commons Attribution-ShareAlike 4.0 International License.

Feel free to download the whole dataset!

Read more in this paper:

Elizaveta Kuzmenko, Andrey Kutuzov (2014)

Russian Error-Annotated Learner English Corpus: a Tool for Computer-Assisted Language Learning

(Proceedings of the Third Workshop on NLP for Computer-Assisted Language Learning at SLTC 2014, Uppsala University)