800 pieces of students' writing
More than 200,000 word tokens
Ten thousand annotated errors
4-level annotation scheme
by the grammatical rule violated by the error;
by the supposed cause of the error;
by the linguistic and pragmatic damage caused by the error.
Distribution of different types of errors in the corpus
The corpus is available under a Creative Commons Attribution-ShareAlike 4.0 International License.
Feel free to download the whole dataset!
Read more in this paper:
Elizaveta Kuzmenko, Andrey Kutuzov (2014)
(Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University)