CoNLL-U format tests
This file contains an informal mixture of tests for various aspects of the CoNLL-U format.
Valid examples:
Multiword token ("haven't")
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2-3 haven't _ _ _ _ _ _ _ _
2 have have VERB VB Tens=Pres 0 root _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
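As a rough illustration of how a reader might tell the multiword-token line (`2-3`) apart from ordinary word lines, here is a minimal Python sketch. The helper names `parse_id` and `is_multiword` are hypothetical and do not come from any CoNLL-U tool.

```python
def parse_id(id_field):
    """Return (first, last) for an ID field; first == last for a plain word.

    Hypothetical helper for illustration; raises ValueError on malformed input.
    """
    first, sep, last = id_field.partition("-")
    if not sep:
        n = int(first)
        return (n, n)
    return (int(first), int(last))


def is_multiword(id_field):
    """True if the ID field denotes a multiword token range such as '2-3'."""
    return "-" in id_field
```

With these, `parse_id("2-3")` yields the range covered by "haven't", while `parse_id("4")` yields a one-word range.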
Additional dependencies (DEPS field)
1 They they PRON PRN Case=Nom|Num=Plur 2 nsubj 4:nsubj _
2 buy buy VERB VBP Num=Plur|Per=3|Tense=Pres 0 root _ _
3 and and CONJ CC _ 2 cc _ _
4 sell sell VERB VBP Num=Plur|Per=3|Tense=Pres 2 conj _ _
5 books book NOUN NNS Num=Plur 2 dobj 4:dobj _
6 . . PUNCT . _ 2 punct _ _
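The DEPS column in the example above adds a second head for "They" and "books". A minimal sketch of collecting all heads of one token, assuming well-formed input and using a hypothetical function name `token_edges`:

```python
def token_edges(head, deprel, deps):
    """Collect (head, relation) pairs from HEAD/DEPREL plus the DEPS field.

    `deps` is the raw DEPS string: '_' or 'HEAD:REL' pairs joined by '|'.
    This sketch assumes the input is already valid.
    """
    edges = [(int(head), deprel)]
    if deps != "_":
        for pair in deps.split("|"):
            dep_head, _, rel = pair.partition(":")
            edges.append((int(dep_head), rel))
    return edges
```

For the first token, `token_edges("2", "nsubj", "4:nsubj")` gives one edge to "buy" and one to "sell".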
Multiple sentences
1 LONDRA Londra NOUN _ _ 0 root _ _
2 . . . _ _ 1 punct _ _
# This is a comment
1 Gas gas NOUN _ Gen=M|Num=N 0 root _ _
2-3 dalla _ _ _ _ _ _ _ _
2 da da ADP _ _ 1 adpmod _ _
3 la la DET _ Gen=F|Num=S 4 det _ _
4 statua statua NOUN _ Gen=F|Num=S 2 adpobj _ _
5 . . . _ _ 1 punct _ _
1 Evacuata evacuare VERB _ Gen=F|Mod=P|Num=S 3 partmod _ _
2 la il DET _ Gen=F|Num=S 3 det _ _
3 Tate Tate NOUN _ _ 0 root _ _
4 Gallery Gallery NOUN _ _ 3 mwe _ _
5 . . PUNCT _ _ 3 punct _ _
Multiple sentences and multiword token
# give the toys to the children
1 donner donner VERB _ VerbForm=Inf 0 root _ give
2 les le DET _ Definite=Def|Number=Plur 3 det _ the
3 jouets jouet NOUN _ Gender=Masc|Number=Plur 1 dobj _ toys
4-5 aux _ _ _ _ _ _ _ _
4 à à ADP _ _ 6 case _ to
5 les le DET _ Definite=Def|Number=Plur 6 det _ the
6 enfants enfant NOUN _ Gender=Masc|Number=Plur 1 nmod _ children
# now the parallel English tree
1 give donner VERB _ VerbForm=Inf 0 root _ give
2 the le DET _ Definite=Def|Number=Plur 3 det _ the
3 toys jouet NOUN _ Gender=Masc|Number=Plur 1 dobj _ toys
4 to à ADP _ _ 6 case _ to
5 the le DET _ Definite=Def|Number=Plur 6 det _ the
6 children enfant NOUN _ Gender=Masc|Number=Plur 1 nmod _ children
Sentence labels
# sentence-label 1
1 LONDRA Londra NOUN _ _ 0 root _ _
2 . . . _ _ 1 punct _ _
# sentence-label A
1 Gas gas NOUN _ Gen=M|Num=N 0 root _ _
2 . . . _ _ 1 punct _ _
# sentence-label B4
1 Tate Tate NOUN _ _ 0 root _ _
2 Gallery Gallery NOUN _ _ 1 mwe _ _
3 . . PUNCT _ _ 1 punct _ _
Custom styles
1 They they PRON PRN Case=Nom|Num=Plur 2 nsubj 4:nsubj _
2 buy buy VERB VBP Num=Plur|Per=3|Tense=Pres 0 root _ _
3 and and CONJ CC _ 2 cc _ _
4 sell sell VERB VBP Num=Plur|Per=3|Tense=Pres 2 conj _ _
5 books book NOUN NNS Num=Plur 2 dobj 4:dobj _
6 . . PUNCT . _ 2 punct _ _
Acceptable examples with loose parsing
Otherwise valid, but with two spaces instead of a single tab as the field separator, and with no terminal newline:
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2-3 haven't _ _ _ _ _ _ _ _
2 have have VERB VB Tens=Pres 0 root _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
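One way a loose parser might accept this input is sketched below. The `loose` flag and the choice to treat runs of two or more spaces as a separator are assumptions of this sketch, not documented behaviour of any particular tool.

```python
import re


def split_fields(line, loose=False):
    """Split one CoNLL-U line into fields.

    Strict mode splits on TAB only. Loose mode (an assumption of this
    sketch) also accepts runs of two or more spaces as a separator.
    A missing terminal newline is tolerated in both modes.
    """
    line = line.rstrip("\n")
    if loose:
        return re.split(r"\t| {2,}", line)
    return line.split("\t")
```

In strict mode the space-separated line stays a single field, which a field-count check would then reject.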
Non-valid examples:
Non-valid examples from UD tools test cases.
ambiguous-feature.conll
# not valid: feature definition is malformed / ambiguous (two "=" characters)
1 non-valid non-valid NOUN SP A=B=C 0 ROOT _ _
duplicate-feature.conll
# not valid: feature name occurs twice
1 non-valid non-valid NOUN SP Gen=M|Gen=M 0 ROOT _ _
duplicate-id.conll
# not valid: IDs must be sequential integers (1, 2, ...)
1 valid valid NOUN SP _ 0 ROOT _ _
1 . . . FS _ 1 p _ _
duplicate-value.conll
# not valid: feature value occurs twice
1 non-valid non-valid NOUN SP Gen=M,M 0 ROOT _ _
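The three feature-related cases so far (ambiguous-feature, duplicate-feature, duplicate-value) could be detected along the following lines. This is a sketch only; `parse_feats` is a hypothetical name and the error messages are invented for illustration.

```python
def parse_feats(feats):
    """Parse a FEATS string into {name: [values]}.

    Raises ValueError for an item without exactly one '=', a repeated
    feature name, or a repeated value within one feature.
    """
    if feats == "_":
        return {}
    parsed = {}
    for item in feats.split("|"):
        parts = item.split("=")
        if len(parts) != 2:
            raise ValueError("malformed feature: %r" % item)
        name, value = parts
        if name in parsed:
            raise ValueError("duplicate feature name: %r" % name)
        values = value.split(",")
        if len(set(values)) != len(values):
            raise ValueError("duplicate feature value in: %r" % item)
        parsed[name] = values
    return parsed
```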
empty-head.conll
# not valid: HEAD must not be empty
1 have have VERB VB Tens=Pres root _ _
empty-field.conll
# not valid: no field can be empty.
1 valid NOUN SP _ 0 ROOT _ _
empty-sentence.conll
# not valid: sentences must contain at least one word.
# valid one-word sentence.
1 valid valid NOUN SP _ 0 ROOT _ _
extra-empty-line.conll
# valid one-word sentence.
1 valid valid NOUN SP _ 0 ROOT _ _
# format error: sentences must be separated by exactly one empty line
# valid one-word sentence.
1 valid valid NOUN SP _ 0 ROOT _ _
extra-field.conll
# not valid: 11 TAB-separated fields
1 non-valid non-valid NOUN SP _ 0 ROOT _ _ extra
id-starting-from-2.conll
# valid one-word sentence.
1 valid valid NOUN SP _ 0 ROOT _ _
# not valid: ID must start at 1 for each new sentence
2 valid valid NOUN SP _ 0 ROOT _ _
invalid-deps-id.conll
# not valid: HEAD in DEPS must reference a valid ID
1 have have VERB VB Tens=Pres 0 root 3:nsubj _
2 . . . FS _ 1 punct _ _
invalid-deps-order.conll
# not valid: DEPS must be sorted by HEAD index.
1 They they PRON PRN Case=Nom|Num=Plur 2 nsubj 4:nsubj|2:xsubj _
2 buy buy VERB VBP Num=Plur|Per=3|Tense=Pres 0 root _ _
3 and and CONJ CC _ 2 cc _ _
4 sell sell VERB VBP Num=Plur|Per=3|Tense=Pres 2 conj _ _
5 books book NOUN NNS Num=Plur 2 dobj 4:dobj _
6 . . PUNCT . _ 2 punct _ _
invalid-deps-syntax.conll
# not valid: DEPS must be 'HEAD:REL' pairs separated by bars ('|')
1 have have VERB VB Tens=Pres 0 root 2 _
2 . . . FS _ 1 punct _ _
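The three DEPS cases above (invalid-deps-id, invalid-deps-order, invalid-deps-syntax) can be checked together. The sketch below uses a hypothetical function `check_deps` and invented problem messages; it reports problems rather than raising.

```python
def check_deps(deps, valid_ids):
    """Return a list of problems found in a raw DEPS string.

    `valid_ids` is the set of word IDs in the sentence; head 0 (root)
    is always accepted.
    """
    if deps == "_":
        return []
    problems = []
    heads = []
    for pair in deps.split("|"):
        head, sep, rel = pair.partition(":")
        if not sep or not head.isdigit() or not rel:
            problems.append("not a HEAD:REL pair: %r" % pair)
            continue
        heads.append(int(head))
        if int(head) != 0 and int(head) not in valid_ids:
            problems.append("unknown head ID: %s" % head)
    if heads != sorted(heads):
        problems.append("DEPS not sorted by head index")
    return problems
```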
invalid-head.conll
# not valid: HEAD must reference a valid ID
1 have have VERB VB Tens=Pres 0 root _ _
2 . . . FS _ 3 punct _ _
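A HEAD check in the same spirit, covering both an empty HEAD and one that points outside the sentence, might look like this. `check_heads` is a hypothetical helper, and the input shape (a list of ID/HEAD string pairs) is an assumption of the sketch.

```python
def check_heads(rows):
    """rows: list of (id, head) string pairs for the words of one sentence.

    Returns the IDs whose HEAD is empty, non-numeric, or refers to an ID
    that does not occur in the sentence (0 is allowed as root).
    """
    ids = {int(i) for i, _ in rows}
    bad = []
    for i, head in rows:
        if not head.isdigit() or (int(head) != 0 and int(head) not in ids):
            bad.append(int(i))
    return bad
```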
invalid-range.conll
# not valid: (first-last) multiword ranges must have first <= last
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2-1 haven't _ _ _ _ _ _ _ _
2 have have VERB VB Tens=Pres 0 root _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
lowercase-feature.conll
# not valid: feature names must have format '[A-Z0-9][a-zA-Z0-9]*'
# (see http://universaldependencies.github.io/docs/features.html)
1 non-valid non-valid NOUN SP lower=Nonvalid 0 ROOT _ _
lowercase-value.conll
# not valid: feature values must have format '[A-Z0-9][a-zA-Z0-9]*'
# (see http://universaldependencies.github.io/docs/features.html)
1 non-valid non-valid NOUN SP Lower=nonvalid 0 ROOT _ _
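The pattern quoted in the two comments above can be applied directly with a regular expression. A minimal sketch, with `FEAT_PART` and `well_formed` as hypothetical names:

```python
import re

# Pattern taken from the comments above; applies to names and values alike.
FEAT_PART = re.compile(r"[A-Z0-9][a-zA-Z0-9]*")


def well_formed(name, value):
    """True if both the feature name and its value match the pattern."""
    return bool(FEAT_PART.fullmatch(name)) and bool(FEAT_PART.fullmatch(value))
```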
malformed_deps.conll
# This is a comment
1 Gas gas NOUN S Gen=M|Num=N 0 ROOT xxx _
misordered-feature.conll
# not valid: features must be ordered alphabetically (ignoring case)
# (see http://universaldependencies.github.io/docs/features.html)
1 non-valid non-valid NOUN SP XB=True|Xa=True 0 ROOT _ _
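The ordering rule can be sketched as a case-insensitive sort comparison. `feats_sorted` is a hypothetical name, and the sketch assumes the FEATS string is otherwise well-formed.

```python
def feats_sorted(feats):
    """True if feature names appear in alphabetical order, ignoring case."""
    names = [item.split("=", 1)[0] for item in feats.split("|")]
    return names == sorted(names, key=str.lower)
```

In the example above, "XB" compares as "xb" and "Xa" as "xa", so "Xa" should come first and the check fails.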
misordered-multiword.conll
# not valid: multiword tokens must appear before the first word in their
# range
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2 have have VERB VB Tens=Pres 0 root _ _
2-3 haven't _ _ _ _ _ _ _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
misplaced-comment-mid.conll
# not valid: comment lines inside sentences are disallowed.
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2-3 haven't _ _ _ _ _ _ _ _
# this comment should not be here
2 have have VERB VB Tens=Pres 0 root _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
misplaced-comment-end.conll
# not valid: comment lines should precede a sentence
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2-3 haven't _ _ _ _ _ _ _ _
2 have have VERB VB Tens=Pres 0 root _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
# this comment should not be here as it does not precede a sentence.
missing final newline
1 Gas gas NOUN S Gen=M|Num=N 0 root _ _
multiword-with-pos.conll
# not valid: multiword tokens must have underscore ("_") for all fields
# except FORM.
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2-3 haven't _ VERB _ _ _ _ _ _
2 have have VERB VB Tens=Pres 0 root _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
nonsequential-id.conll
# not valid: IDs must be sequential integers (1, 2, ...)
1 valid valid NOUN SP _ 0 ROOT _ _
3 . . . FS _ 1 p _ _
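The ID cases (duplicate-id, id-starting-from-2, nonsequential-id) reduce to one comparison against the expected sequence. A sketch with a hypothetical helper `check_word_ids`, which takes the ID column of a single sentence:

```python
def check_word_ids(id_fields):
    """id_fields: the ID column of one sentence, in file order.

    Multiword range lines (containing '-') are skipped; the remaining
    word IDs must be exactly 1, 2, 3, ...
    """
    word_ids = [int(f) for f in id_fields if "-" not in f]
    return word_ids == list(range(1, len(word_ids) + 1))
```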
overlapping-multiword.conll
# not valid: multiword token ranges may not overlap
1 I I PRON PRN Num=Sing|Per=1 2 nsubj _ _
2-3 haven't _ _ _ _ _ _ _ _
2 have have VERB VB Tens=Pres 0 root _ _
3-4 nota _ _ _ _ _ _ _ _
3 not not ADV RB _ 2 neg _ _
4 a a DET DT _ 5 det _ _
5 clue clue NOUN NN Num=Sing 2 dobj _ _
6 . . PUNCT . _ 2 punct _ _
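The multiword range cases (invalid-range, misordered-multiword, overlapping-multiword) can be sketched as a single pass over the ID column. `check_ranges` is a hypothetical helper with invented messages; it assumes each range has the form `first-last` with integer parts, and it interprets "before the first word" as "immediately before", which is an assumption of this sketch.

```python
def check_ranges(id_fields):
    """Return problems with multiword ranges in one sentence's ID column."""
    problems = []
    prev_last = 0  # highest word ID covered by any earlier range
    for pos, field in enumerate(id_fields):
        if "-" not in field:
            continue
        first, last = (int(p) for p in field.split("-"))
        if first > last:
            problems.append("%s: first > last" % field)
        if first <= prev_last:
            problems.append("%s: overlaps previous range" % field)
        # The range line is expected directly before its first word.
        nxt = id_fields[pos + 1] if pos + 1 < len(id_fields) else None
        if nxt != str(first):
            problems.append("%s: not placed before word %d" % (field, first))
        prev_last = max(prev_last, last)
    return problems
```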
space-in-field.conll
# not valid: no field can contain space.
1 not valid valid NOUN SP _ 0 ROOT _ _
token_with_cols_filled.conll
# (TODO: is this the same general case as multiword-with-pos.conll?)
# This is a comment
1 Gas gas NOUN S Gen=M|Num=N 0 ROOT _ _
2-3 dalla dalla _ _ _ 0 ROOT _ _
2 da da ADP EA _ 1 adpmod _ _
3 la la DET RD Gen=F|Num=S 4 det _ _
4 statua statua NOUN S Gen=F|Num=S 2 adpobj _ _
5 . . . FS _ 1 p _ _
trailing-tab.conll
# not valid: extra TAB before newline
1 non-valid non-valid NOUN SP _ 0 ROOT _ _