Part-of-speech tagset for the French language

Introduction

The French TreeTagger part-of-speech tagset is used in French corpora annotated with TreeTagger, a tool developed by Helmut Schmid in the Textual corpora project at the Institute for Computational Linguistics of the University of Stuttgart.

A part-of-speech (POS) tagset for French defines the categories of words that can occur in sentences and the labels that linguistic tools such as part-of-speech taggers assign to them. French tagsets can be quite detailed, and they vary with the linguistic framework used (e.g., Universal Dependencies, treebank POS tags). Here’s an overview of the main categories found in a typical French POS tagset. Many of the labels in this overview (NN, VBD, JJ, and so on) follow Penn Treebank-style naming and are illustrative; the tags actually assigned by TreeTagger are listed in the table further down:

1. Nouns (N)

  • NN: Common noun (e.g., chat, maison).
  • NNS: Plural noun (e.g., chats, maisons).
  • NC: Countable noun.
  • NOMPROP: Proper noun (e.g., Paris, Marie).

2. Pronouns (PRON)

  • PRP: Personal pronoun (e.g., je, tu, il).
  • PRP$: Possessive determiner (e.g., mon, ton, son).
  • REFL: Reflexive pronoun (e.g., se, me).

3. Verbs (V)

  • VB: Base form of a verb (infinitive, e.g., manger).
  • VBD: Past tense verb (e.g., mangeait).
  • VBG: Gerund/participle form (e.g., mangeant).
  • VBN: Past participle (e.g., mangé).
  • VBP: Present tense verb, non-3rd person singular (e.g., je mange).
  • VBZ: 3rd person singular present (e.g., il mange).

4. Adjectives (ADJ)

  • JJ: Adjective (e.g., grand, beau).
  • JJR: Comparative adjective (e.g., plus grand).
  • JJS: Superlative adjective (e.g., le plus grand).

5. Adverbs (ADV)

  • RB: Adverb (e.g., rapidement, très).
  • RBR: Comparative adverb (e.g., plus rapidement).
  • RBS: Superlative adverb (e.g., le plus rapidement).

6. Determiners (DET)

  • DT: Determiner (e.g., le, une).
  • PDT: Predeterminer (e.g., tout in tout le monde).
  • WDT: Wh-determiner (e.g., quel).

7. Prepositions (PREP)

  • IN: Preposition (e.g., dans, avec).

8. Conjunctions (CONJ)

  • CC: Coordinating conjunction (e.g., et, mais).
  • IN: Subordinating conjunction (e.g., que, si).

9. Interjections (INTJ)

  • UH: Interjection (e.g., oh, aïe).

10. Auxiliary Verbs (AUX)

  • AUX: Auxiliary verb (e.g., être, avoir).
  • AUXP: Auxiliary in the past participle (e.g., été, eu).

11. Numbers (NUM)

  • CD: Cardinal number (e.g., un, deux).
  • OD: Ordinal number (e.g., premier, deuxième).

12. Symbols and Punctuation (SYM, PUNCT)

  • SYM: Symbol (e.g., &, %, $).
  • PUNCT: Punctuation (e.g., ., !, ?).

13. Other

  • X: Other, not classified (often used for words or expressions not fitting standard categories).
  • FW: Foreign word (e.g., pizza, week-end).

Tagset Variations

Different linguistic resources might have slight variations in the POS tagset. For example:

  • Universal Dependencies (UD): A standardized tagset for many languages, including French, which simplifies POS labels (e.g., NOUN, VERB, ADJ, ADV).
  • French Treebank: A detailed POS tagset that includes specific distinctions between types of nouns, verbs, and modifiers.
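  • CLAWS (Constituent Likelihood Automatic Word-tagging System): A tagger and family of fine-grained tagsets developed for English; it is listed here as an example of more nuanced tagging used in research and language technology development, not as a French resource.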
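
To make the difference between tagsets concrete, here is a minimal Python sketch that collapses the French TreeTagger tags listed on this page into coarse Universal Dependencies tags. The mapping and the function name `treetagger_to_upos` are assumptions made for illustration, not an official conversion table. The conversion is lossy: KON covers both coordinating and subordinating conjunctions, VER tags do not separate auxiliaries from main verbs, and PRP:det marks contractions such as au or du, which UD typically analyses as two words.

```python
# Illustrative (unofficial) mapping from the main TreeTagger category
# (the part before ":") to a coarse Universal Dependencies tag.
PREFIX_TO_UPOS = {
    "ABR": "X",       # abbreviations: true category unknown, so fall back to X
    "ADJ": "ADJ",
    "ADV": "ADV",
    "DET": "DET",     # DET:ART, DET:POS
    "INT": "INTJ",
    "KON": "CCONJ",   # simplification: KON also covers subordinators (SCONJ)
    "NAM": "PROPN",
    "NOM": "NOUN",
    "NUM": "NUM",
    "PRO": "PRON",    # PRO, PRO:DEM, PRO:IND, PRO:PER, PRO:POS, PRO:REL
    "PRP": "ADP",     # PRP:det is a contraction; UD would split it
    "PUN": "PUNCT",   # PUN, PUN:cit
    "SENT": "PUNCT",
    "SYM": "SYM",
    "VER": "VERB",    # simplification: auxiliaries would be AUX in UD
}


def treetagger_to_upos(tag: str) -> str:
    """Return a coarse UD tag for a French TreeTagger tag, or "X" if unknown."""
    prefix = tag.split(":", 1)[0]
    return PREFIX_TO_UPOS.get(prefix, "X")


for tag in ["VER:cond", "PRP:det", "NAM", "DET:POS"]:
    print(tag, "->", treetagger_to_upos(tag))
```

Matching on the prefix keeps the table short, at the cost of ignoring the subtype information (tense, mood, pronoun type) that the TreeTagger tags carry.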

When working with French language processing, it’s essential to choose a tagset that matches your linguistic analysis needs or the specific tool you’re using.

Example of a tag used in the CQL concordance search box: [tag="VER:cond"] finds all verbs in the conditional, e.g. serait, pourrait. (Note: make sure that you use straight double quotation marks.)
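
Because the query only works with straight quotation marks, it can help to build or clean such queries programmatically. The following Python sketch is one way to do that; the helper names `cql_tag_query` and `straighten_quotes` are hypothetical and not part of any corpus tool.

```python
def cql_tag_query(tag: str) -> str:
    """Build a CQL token expression that matches one POS tag."""
    if '"' in tag:
        raise ValueError("tag must not contain a double quotation mark")
    return f'[tag="{tag}"]'


def straighten_quotes(query: str) -> str:
    """Replace curly double quotes (often introduced by copy and paste
    from word processors) with straight ones."""
    return query.replace("\u201c", '"').replace("\u201d", '"')


print(cql_tag_query("VER:cond"))
print(straighten_quotes("[tag=\u201cVER:cond\u201d]"))
```

Both calls print [tag="VER:cond"], the query shown above.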

French TreeTagger part-of-speech tagset

French Language Tagset

Tag Description
ABR abbreviation
ADJ adjective
ADV adverb
DET:ART article
DET:POS possessive pronoun (ma, ta, …)
INT interjection
KON conjunction
NAM proper name
NOM noun
NUM numeral
PRO pronoun
PRO:DEM demonstrative pronoun
PRO:IND indefinite pronoun
PRO:PER personal pronoun
PRO:POS possessive pronoun (mien, tien, …)
PRO:REL relative pronoun
PRP preposition
PRP:det preposition plus article (au, du, aux, des)
PUN punctuation
PUN:cit punctuation citation
SENT sentence tag
SYM symbol
VER:cond verb conditional
VER:futu verb future
VER:impe verb imperative
VER:impf verb imperfect
VER:infi verb infinitive
VER:pper verb past participle
VER:ppre verb present participle
VER:pres verb present
VER:simp verb simple past
VER:subi verb subjunctive imperfect
VER:subp verb subjunctive present
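
The table above can be used as a lookup when reading tagged text. The Python sketch below assumes the tagged text has one token per line with three tab-separated columns (token, tag, lemma); the sample lines are hand-written for illustration and are not real tagger output, and the function names are hypothetical.

```python
# Tag descriptions copied from the table above.
TAG_DESCRIPTIONS = {
    "ABR": "abbreviation", "ADJ": "adjective", "ADV": "adverb",
    "DET:ART": "article", "DET:POS": "possessive pronoun (ma, ta, ...)",
    "INT": "interjection", "KON": "conjunction", "NAM": "proper name",
    "NOM": "noun", "NUM": "numeral", "PRO": "pronoun",
    "PRO:DEM": "demonstrative pronoun", "PRO:IND": "indefinite pronoun",
    "PRO:PER": "personal pronoun",
    "PRO:POS": "possessive pronoun (mien, tien, ...)",
    "PRO:REL": "relative pronoun", "PRP": "preposition",
    "PRP:det": "preposition plus article (au, du, aux, des)",
    "PUN": "punctuation", "PUN:cit": "punctuation citation",
    "SENT": "sentence tag", "SYM": "symbol",
    "VER:cond": "verb conditional", "VER:futu": "verb future",
    "VER:impe": "verb imperative", "VER:impf": "verb imperfect",
    "VER:infi": "verb infinitive", "VER:pper": "verb past participle",
    "VER:ppre": "verb present participle", "VER:pres": "verb present",
    "VER:simp": "verb simple past",
    "VER:subi": "verb subjunctive imperfect",
    "VER:subp": "verb subjunctive present",
}


def parse_tagged_line(line: str) -> tuple[str, str, str]:
    """Split one 'token<TAB>tag<TAB>lemma' line into its three fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(parts)}")
    token, tag, lemma = parts
    return token, tag, lemma


def describe(tag: str) -> str:
    """Look up the description of a tag; unknown tags are reported as such."""
    return TAG_DESCRIPTIONS.get(tag, "unknown tag")


# Hand-written sample, assumed format: token, tag, lemma.
sample = "Il\tPRO:PER\til\nserait\tVER:cond\têtre\n.\tSENT\t."
for line in sample.splitlines():
    token, tag, lemma = parse_tagged_line(line)
    print(f"{token}: {tag} ({describe(tag)}), lemma {lemma}")
```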

Source: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html