The French TreeTagger part-of-speech tagset is available in French corpora annotated with TreeTagger, a tool developed by Helmut Schmid in the Textual Corpora project at the Institute for Computational Linguistics of the University of Stuttgart.
A part-of-speech (POS) tagset for French defines the word categories used in sentences and how each word is labelled when text is processed by tools such as part-of-speech taggers. French tagsets can be quite detailed, and their labels vary with the linguistic framework or resource (e.g., Universal Dependencies, the French Treebank). The overview below illustrates the main categories using Penn Treebank-style labels applied to French. Real tagsets, including TreeTagger's, use their own labels:
1. Nouns (N)
- NN: Common noun (e.g., chat, maison).
- NNS: Plural noun (e.g., chats, maisons).
- NC: Common noun (nom commun), the label used in the French Treebank.
- NOMPROP: Proper noun (e.g., Paris, Marie).
2. Pronouns (PRON)
- PRP: Personal pronoun (e.g., je, tu, il).
- PRP$: Possessive (e.g., mon, ton, son; possessive determiners in French grammar).
- REFL: Reflexive pronoun (e.g., se, me).
3. Verbs (V)
- VB: Base form of a verb (infinitive, e.g., manger).
- VBD: Past tense verb (e.g., mangeait, mangea).
- VBG: Gerund/participle form (e.g., mangeant).
- VBN: Past participle (e.g., mangé).
- VBP: Present tense verb, non-3rd person singular (e.g., je mange).
- VBZ: 3rd person singular present (e.g., il mange).
4. Adjectives (ADJ)
- JJ: Adjective (e.g., grand, beau).
- JJR: Comparative adjective (e.g., plus grand).
- JJS: Superlative adjective (e.g., le plus grand).
5. Adverbs (ADV)
- RB: Adverb (e.g., rapidement, très).
- RBR: Comparative adverb (e.g., plus rapidement).
- RBS: Superlative adverb (e.g., le plus rapidement).
6. Determiners (DET)
- DT: Determiner (e.g., le, une).
- PDT: Predeterminer (e.g., tout in tous les jours).
- WDT: Wh-determiner (e.g., quel).
7. Prepositions (PREP)
- IN: Preposition (e.g., dans, avec).
8. Conjunctions (CONJ)
- CC: Coordinating conjunction (e.g., et, mais).
- IN: Subordinating conjunction (e.g., que, si).
9. Interjections (INTJ)
- UH: Interjection (e.g., oh, aïe).
10. Auxiliary Verbs (AUX)
- AUX: Auxiliary verb (e.g., être, avoir).
- AUXP: Auxiliary in the past participle (e.g., été, eu).
11. Numbers (NUM)
- CD: Cardinal number (e.g., un, deux).
- OD: Ordinal number (e.g., premier, deuxième).
12. Symbols and Punctuation (SYM, PUNCT)
- SYM: Symbol (e.g., &, %, $).
- PUNCT: Punctuation (e.g., ., !, ?).
13. Other
- X: Other, not classified (often used for words or expressions not fitting standard categories).
- FW: Foreign word, i.e. a word not integrated into French (e.g., English words in a quoted title).
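Because the labels above borrow Penn Treebank conventions, a single label can appear under more than one heading. For example, IN is listed under both prepositions and conjunctions. The following minimal Python sketch assumes a hand-copied subset of the list, stored in a hypothetical `CATEGORIES` dictionary, and shows how a reverse lookup exposes that ambiguity:

```python
# Hypothetical subset of the illustrative labels listed above,
# grouped under the broad category headings they appear in.
CATEGORIES = {
    "Nouns": ["NN", "NNS", "NC", "NOMPROP"],
    "Pronouns": ["PRP", "PRP$", "REFL"],
    "Verbs": ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"],
    "Prepositions": ["IN"],
    "Conjunctions": ["CC", "IN"],
}


def categories_for(tag: str) -> list[str]:
    """Return every broad category whose tag list contains `tag`."""
    return [name for name, tags in CATEGORIES.items() if tag in tags]


print(categories_for("VBZ"))  # ['Verbs']
print(categories_for("IN"))   # ['Prepositions', 'Conjunctions']
```

A tagger that uses such a tagset has to rely on context to decide which reading of IN applies. The label alone does not carry that information.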
Tagset Variations
Different linguistic resources might have slight variations in the POS tagset. For example:
- Universal Dependencies (UD): a cross-linguistically standardized tagset, used for French among many other languages, with simplified labels (e.g., NOUN, VERB, ADJ, ADV).
- French Treebank: a detailed POS tagset that distinguishes specific types of nouns, verbs, and modifiers.
- CLAWS (Constituent Likelihood Automatic Word-tagging System): a tagger and family of tagsets developed at Lancaster University, mainly for English, used for fine-grained tagging in corpus research and language technology development.
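To make the variation concrete, here is a rough Python sketch that maps the French TreeTagger tags (listed in the table at the end of this page) onto UD labels. The `TT_TO_UPOS` table and the `to_upos` function are hypothetical and approximate. They are not an official conversion. For example, TreeTagger's KON does not separate coordinating from subordinating conjunctions, so this sketch simply defaults to CCONJ.

```python
# Hypothetical, approximate mapping from French TreeTagger tags to
# Universal Dependencies UPOS tags; not an official conversion table.
TT_TO_UPOS = {
    "ABR": "X",        # fallback; UD would tag an abbreviation by its function
    "ADJ": "ADJ",
    "ADV": "ADV",
    "DET:ART": "DET",
    "DET:POS": "DET",
    "INT": "INTJ",
    "KON": "CCONJ",    # TreeTagger does not split coordinating/subordinating
    "NAM": "PROPN",
    "NOM": "NOUN",
    "NUM": "NUM",
    "PRP": "ADP",
    "PRP:det": "ADP",  # UD splits contractions such as "au" into "à" + "le"
    "PUN": "PUNCT",
    "PUN:cit": "PUNCT",
    "SENT": "PUNCT",
    "SYM": "SYM",
}


def to_upos(tag: str) -> str:
    """Return an approximate UPOS tag for a French TreeTagger tag."""
    if tag in TT_TO_UPOS:
        return TT_TO_UPOS[tag]
    main = tag.split(":", 1)[0]
    if main == "PRO":      # PRO, PRO:DEM, PRO:IND, PRO:PER, PRO:POS, PRO:REL
        return "PRON"
    if main == "VER":      # all VER:* tags; UD would use AUX for auxiliary uses
        return "VERB"
    return "X"             # tag not covered by this sketch


print([to_upos(t) for t in ["PRO:PER", "VER:cond", "NOM", "PRP:det"]])
# ['PRON', 'VERB', 'NOUN', 'ADP']
```

The mapping is lossy in both directions. UD's coarse labels drop TreeTagger's verb subcategories, and TreeTagger lacks UD's AUX/VERB and CCONJ/SCONJ distinctions.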
When working with French text, choose a tagset that matches your linguistic analysis needs or the tool you are using.
An example of a tag query in the CQL concordance search box: [tag="VER:cond"] finds all conditional verb forms, e.g. serait, pourrait. (Note: make sure to use straight double quotation marks.)
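The query above filters tokens by their tag attribute. The following Python sketch imitates that filter on a small hand-tagged sentence. The tokens and the `match_tag` helper are hypothetical. The sketch also assumes that attribute values are regular expressions matched against the whole tag, as CQP-style query languages generally treat them, so a pattern such as `VER:.*` would match every verb tag.

```python
import re

# Hypothetical hand-tagged sentence: (word, TreeTagger tag) pairs,
# not real corpus data.
TOKENS = [
    ("Elle", "PRO:PER"),
    ("pourrait", "VER:cond"),
    ("venir", "VER:infi"),
    ("si", "KON"),
    ("elle", "PRO:PER"),
    ("voulait", "VER:impf"),
    (".", "SENT"),
]


def match_tag(tokens, pattern):
    """Return the words whose tag fully matches the regex `pattern`,
    imitating a CQL query of the form [tag="pattern"]."""
    regex = re.compile(pattern)
    return [word for word, tag in tokens if regex.fullmatch(tag)]


print(match_tag(TOKENS, "VER:cond"))  # ['pourrait']
print(match_tag(TOKENS, "VER:.*"))    # ['pourrait', 'venir', 'voulait']
```

Using `fullmatch` rather than `search` mirrors whole-value matching, so `VER:cond` does not also match longer tags that merely contain it.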
French TreeTagger part-of-speech tagset
French Language Tagset
| Tag | Description |
|---|---|
| ABR | abbreviation |
| ADJ | adjective |
| ADV | adverb |
| DET:ART | article |
| DET:POS | possessive pronoun (ma, ta, …) |
| INT | interjection |
| KON | conjunction |
| NAM | proper name |
| NOM | noun |
| NUM | numeral |
| PRO | pronoun |
| PRO:DEM | demonstrative pronoun |
| PRO:IND | indefinite pronoun |
| PRO:PER | personal pronoun |
| PRO:POS | possessive pronoun (mien, tien, …) |
| PRO:REL | relative pronoun |
| PRP | preposition |
| PRP:det | preposition plus article (au, du, aux, des) |
| PUN | punctuation |
| PUN:cit | punctuation citation |
| SENT | sentence tag (sentence-final punctuation) |
| SYM | symbol |
| VER:cond | verb conditional |
| VER:futu | verb future |
| VER:impe | verb imperative |
| VER:impf | verb imperfect |
| VER:infi | verb infinitive |
| VER:pper | verb past participle |
| VER:ppre | verb present participle |
| VER:pres | verb present |
| VER:simp | verb simple past |
| VER:subi | verb subjunctive imperfect |
| VER:subp | verb subjunctive present |
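TreeTagger commonly writes its output one token per line, with the word, tag and lemma separated by tabs. The following minimal Python sketch parses such lines and counts tokens per main category, dropping the subcategory after the colon (so VER:impf becomes VER). It uses a hypothetical hand-written sample, not real tagger output.

```python
from collections import Counter

# Hypothetical sample in the tab-separated word/tag/lemma layout;
# tags follow the table above.
SAMPLE = """Le\tDET:ART\tle
chat\tNOM\tchat
dormait\tVER:impf\tdormir
sur\tPRP\tsur
le\tDET:ART\tle
canapé\tNOM\tcanapé
.\tSENT\t.
"""


def parse(output: str) -> list[tuple[str, str, str]]:
    """Split each non-empty line into a (word, tag, lemma) triple."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            word, tag, lemma = line.split("\t")
            rows.append((word, tag, lemma))
    return rows


def coarse(tag: str) -> str:
    """Keep only the main category, e.g. 'VER:impf' -> 'VER'."""
    return tag.split(":", 1)[0]


rows = parse(SAMPLE)
counts = Counter(coarse(tag) for _, tag, _ in rows)
print(counts)
```

Splitting on the first colon keeps the tagset's hierarchy usable. Queries can target a main category such as VER or a specific subcategory such as VER:cond.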
Source: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html