Part-of-Speech Tagset for the French Language


The French TreeTagger part-of-speech tagset is used in French corpora annotated with TreeTagger, a tool developed by Helmut Schmid in the Textual Corpora project at the Institute for Computational Linguistics of the University of Stuttgart.

A part-of-speech (POS) tagset for French defines the categories of words that can occur in sentences and the labels that linguistic tools such as part-of-speech taggers assign to them. French tagsets can be quite detailed, and they vary with the linguistic framework used (e.g., Universal Dependencies, the French Treebank). The overview below lists the main categories, mostly using Penn Treebank-style tag names (originally designed for English) for illustration; dedicated French tagsets use their own labels, such as the TreeTagger tags listed further down:

1. Nouns (N)

  • NN: Common noun (e.g., chat, maison).
  • NNS: Plural noun (e.g., chats, maisons).
  • NC: Common noun (nom commun), the label used in French Treebank-style tagsets (e.g., chat).
  • NOMPROP: Proper noun (e.g., Paris, Marie).

2. Pronouns (PRON)

  • PRP: Personal pronoun (e.g., je, tu, il).
  • PRP$: Possessive determiner, called a possessive pronoun in Penn-style tagsets (e.g., mon, ton, son).
  • REFL: Reflexive pronoun (e.g., se, me).

3. Verbs (V)

  • VB: Base form of a verb (infinitive, e.g., manger).
  • VBD: Past tense verb (e.g., mangeait).
  • VBG: Gerund/participle form (e.g., mangeant).
  • VBN: Past participle (e.g., mangé).
  • VBP: Present tense verb, non-3rd person singular (e.g., je mange).
  • VBZ: 3rd person singular present (e.g., il mange).

4. Adjectives (ADJ)

  • JJ: Adjective (e.g., grand, beau).
  • JJR: Comparative adjective (e.g., plus grand).
  • JJS: Superlative adjective (e.g., le plus grand).

5. Adverbs (ADV)

  • RB: Adverb (e.g., rapidement, très).
  • RBR: Comparative adverb (e.g., plus rapidement).
  • RBS: Superlative adverb (e.g., le plus rapidement).

6. Determiners (DET)

  • DT: Determiner (e.g., le, une).
  • PDT: Predeterminer (e.g., tous in tous les enfants).
  • WDT: Wh-determiner (e.g., quel).

7. Prepositions (PREP)

  • IN: Preposition (e.g., dans, avec).

8. Conjunctions (CONJ)

  • CC: Coordinating conjunction (e.g., et, mais).
  • IN: Subordinating conjunction (e.g., que, si).

9. Interjections (INTJ)

  • UH: Interjection (e.g., oh, aïe).

10. Auxiliary Verbs (AUX)

  • AUX: Auxiliary verb (e.g., être, avoir).
  • AUXP: Auxiliary in the past participle (e.g., été, eu).

11. Numbers (NUM)

  • CD: Cardinal number (e.g., un, deux).
  • OD: Ordinal number (e.g., premier, deuxième).

12. Symbols and Punctuation (SYM, PUNCT)

  • SYM: Symbol (e.g., &, %, $).
  • PUNCT: Punctuation (e.g., ., !, ?).

13. Other

  • X: Other, not classified (often used for words or expressions not fitting standard categories).
  • FW: Foreign word (e.g., Weltanschauung, wishful thinking).

Tagset Variations

Different linguistic resources might have slight variations in the POS tagset. For example:

  • Universal Dependencies (UD): A standardized tagset for many languages, including French, which simplifies POS labels (e.g., NOUN, VERB, ADJ, ADV).
  • French Treebank: A detailed POS tagset that includes specific distinctions between types of nouns, verbs, and modifiers.
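  • CLAWS (Constituent Likelihood Automatic Word-tagging System): a tagger and family of fine-grained tagsets developed at Lancaster University, primarily for English (it was used to tag the British National Corpus), often used in research and language technology development.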
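To make the difference between a fine-grained tagset and UD's coarse labels concrete, here is a minimal sketch that maps the French TreeTagger tags (listed in the table further down) onto UD UPOS tags. The mapping is an illustrative approximation, not an official conversion table, and the names `EXACT`, `PREFIXES` and `treetagger_to_upos` are hypothetical. Some UD distinctions cannot be recovered from the TreeTagger tag alone, as the comments note.

```python
# Hypothetical, approximate mapping from French TreeTagger tags to UD UPOS tags.
# Not an official conversion table.
EXACT = {
    "ABR": "X",        # abbreviations can belong to any category
    "ADJ": "ADJ",
    "ADV": "ADV",
    "DET:ART": "DET",
    "DET:POS": "DET",
    "INT": "INTJ",
    "KON": "CCONJ",    # KON also covers subordinating conjunctions (SCONJ in UD)
    "NAM": "PROPN",
    "NOM": "NOUN",
    "NUM": "NUM",
    "PRP": "ADP",
    "PRP:det": "ADP",  # UD splits contractions such as "au" into "à" + "le"
    "PUN": "PUNCT",
    "PUN:cit": "PUNCT",
    "SENT": "PUNCT",
    "SYM": "SYM",
}

# Families of tags that share a main category before the colon.
PREFIXES = {
    "PRO": "PRON",     # PRO, PRO:DEM, PRO:IND, PRO:PER, PRO:POS, PRO:REL
    "VER": "VERB",     # auxiliaries (AUX in UD) are not distinguished by the tag
}


def treetagger_to_upos(tag: str) -> str:
    """Return an approximate UD UPOS tag for a French TreeTagger tag."""
    if tag in EXACT:
        return EXACT[tag]
    main = tag.split(":", 1)[0]
    return PREFIXES.get(main, "X")  # unknown tags fall back to X


if __name__ == "__main__":
    for t in ["NOM", "VER:cond", "PRO:PER", "PRP:det", "KON"]:
        print(t, "->", treetagger_to_upos(t))
```

The lossy cases (KON, PRP:det, auxiliaries) show why a conversion between tagsets usually needs more than a lookup table, for example the word form or lemma.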

When working on French language processing, it’s essential to choose a tagset that matches your analysis needs or the tool you are using. For example, entering [tag="VER:cond"] in a CQL concordance search box finds all verbs in the conditional, e.g. serait, pourrait. Make sure you type straight double quotation marks ("), not curly ones.
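The sketch below mimics what that query does on a tiny, hand-annotated token list, and shows why the quotation marks matter. It is not a CQL engine: the helper `find_tokens` and the sample `TOKENS` are hypothetical, only a single [attribute="value"] condition is supported, and the value is treated as a regular expression that must match the whole tag (an assumption modelled on how CQL attribute values usually behave).

```python
import re

# Curly quotes pasted from a word processor break CQL queries, so they are
# replaced with straight double quotes before parsing.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})

# Only the simplest CQL token query is supported: [attribute="regex"]
SIMPLE_QUERY = re.compile(r'^\[(\w+)="([^"]*)"\]$')

# Hand-annotated sample tokens (illustrative, not tagger output).
TOKENS = [
    {"word": "il", "tag": "PRO:PER"},
    {"word": "serait", "tag": "VER:cond"},
    {"word": "content", "tag": "ADJ"},
]


def find_tokens(query, tokens):
    """Return the tokens whose attribute fully matches the query's regex."""
    query = query.strip().translate(CURLY_TO_STRAIGHT)
    match = SIMPLE_QUERY.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    attr, pattern = match.groups()
    regex = re.compile(pattern)
    return [t for t in tokens if regex.fullmatch(t.get(attr, ""))]


if __name__ == "__main__":
    print([t["word"] for t in find_tokens('[tag="VER:cond"]', TOKENS)])
    # Curly quotes are normalized first, so this finds the same token.
    print([t["word"] for t in find_tokens("[tag=\u201cVER:cond\u201d]", TOKENS)])
```

Because the value is a regular expression, a query such as [tag="VER:.*"] would match every verb tag in the TreeTagger tagset, not just the conditional.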

French TreeTagger part-of-speech tagset

French Language Tagset

Tag         Description
ABR         abbreviation
ADJ         adjective
ADV         adverb
DET:ART     article
DET:POS     possessive determiner (ma, ta, …)
INT         interjection
KON         conjunction
NAM         proper name
NOM         noun
NUM         numeral
PRO         pronoun
PRO:DEM     demonstrative pronoun
PRO:IND     indefinite pronoun
PRO:PER     personal pronoun
PRO:POS     possessive pronoun (mien, tien, …)
PRO:REL     relative pronoun
PRP         preposition
PRP:det     preposition plus article (au, du, aux, des)
PUN         punctuation
PUN:cit     punctuation (quotation)
SENT        sentence tag (sentence-final punctuation)
SYM         symbol
VER:cond    verb, conditional
VER:futu    verb, future
VER:impe    verb, imperative
VER:impf    verb, imperfect
VER:infi    verb, infinitive
VER:pper    verb, past participle
VER:ppre    verb, present participle
VER:pres    verb, present
VER:simp    verb, simple past
VER:subi    verb, imperfect subjunctive
VER:subp    verb, present subjunctive
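Because these tags are hierarchical (a main category, optionally followed by a colon and a subcategory, as in VER:cond), they are easy to group after tagging. The sketch below assumes TreeTagger's usual tab-separated output with one token per line (token, tag, lemma); the sample text and the helper names `parse_treetagger`, `split_tag` and `main_category_counts` are hypothetical.

```python
from collections import Counter

# Hand-written sample in the three-column format (token, tag, lemma);
# illustrative only, not real tagger output.
SAMPLE = "Il\tPRO:PER\til\nserait\tVER:cond\têtre\ncontent\tADJ\tcontent\n.\tSENT\t.\n"


def parse_treetagger(output):
    """Parse tab-separated tagger output into (token, tag, lemma) triples."""
    triples = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got: {line!r}")
        triples.append(tuple(fields))
    return triples


def split_tag(tag):
    """Split a tag such as 'VER:cond' into its main category and subcategory."""
    main, _, sub = tag.partition(":")
    return main, (sub or None)


def main_category_counts(triples):
    """Count tokens per main category (VER, PRO, DET, ...)."""
    return Counter(split_tag(tag)[0] for _, tag, _ in triples)


if __name__ == "__main__":
    triples = parse_treetagger(SAMPLE)
    for token, tag, lemma in triples:
        print(token, split_tag(tag), lemma)
    print(main_category_counts(triples))
```

Splitting on the first colon keeps the subcategory available (e.g. cond, PER) while letting you count or filter by the broader category.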

Source: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html

chakir.mahjoubi https://lexsense.net

Knowledge engineer with expertise in natural language processing. Chakir's work experience spans language corpus creation, software localisation, data lineage, patent translation, glossary creation and statistical analysis of experimentally obtained results.
