Parts of Speech Tagging With NLTK

In corpus linguistics, POS tagging (part-of-speech tagging), also called grammatical tagging, is the process of marking up each word in a text with a particular part of speech, based on both its definition and its context. A tagger reads text in a language and assigns a specific tag (a part of speech) to each word. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill’s tagger, one of the first and most widely used English POS taggers, employs rule-based algorithms. Part-of-speech tagging can be important for syntactic and semantic analysis.

Rule-Based POS Tagging

One of the oldest tagging techniques is rule-based POS tagging. Rule-based taggers use a dictionary or lexicon to get the possible tags for each word. If a word has more than one possible tag, the tagger applies hand-written rules to identify the correct one. Disambiguation can also be performed by analyzing the linguistic features of a word together with the words that precede and follow it. For example, a rule might state that if the preceding word is an article, then the word must be a noun.
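The lexicon-plus-rules idea can be sketched in a few lines of plain Python. This is only an illustration: the tiny `LEXICON`, the `ARTICLES` set, and the function name `rule_based_tag` are assumptions made up for this example and are not part of NLTK. The first tag listed for a word is treated as its default, and the single rule implemented is the article rule described above.

```python
# Toy lexicon (hand-written for illustration): word -> possible tags,
# with the first tag acting as the default choice.
LEXICON = {
    "i": ["PRP"],
    "the": ["DT"],
    "a": ["DT"],
    "open": ["VB"],
    "can": ["MD", "NN"],
    "hatch": ["VB", "NN"],
}
ARTICLES = {"a", "an", "the"}


def rule_based_tag(words):
    """Tag words from the lexicon, applying one hand-written context rule."""
    tagged = []
    for i, word in enumerate(words):
        # Words missing from the lexicon fall back to a plain noun tag.
        candidates = LEXICON.get(word.lower(), ["NN"])
        tag = candidates[0]
        previous = words[i - 1].lower() if i > 0 else None
        # Rule: an ambiguous word that follows an article is tagged as a noun.
        if len(candidates) > 1 and previous in ARTICLES:
            noun_tags = [t for t in candidates if t.startswith("NN")]
            if noun_tags:
                tag = noun_tags[0]
        tagged.append((word, tag))
    return tagged


print(rule_based_tag("I can open the can".split()))
```

In this sketch the first "can" keeps its default modal tag, while the second one follows "the" and is therefore tagged as a noun. A real rule-based tagger would use a much larger lexicon and many more rules.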

Stochastic POS Tagging

Another tagging technique is stochastic POS tagging. A model is called stochastic if it relies on frequency or probability (statistics), so any number of different approaches to part-of-speech tagging can be described as stochastic taggers. The simplest stochastic taggers use one of the following approaches:

Word Frequency Approach

In this approach, the tagger disambiguates a word based on the probability that it occurs with a particular tag. In other words, the tag seen most frequently with the word in the training set is the one assigned to an ambiguous instance of that word.
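A minimal sketch of the word frequency approach, using only the Python standard library. The training data below is a hand-written toy set, and the names `train_most_frequent_tag` and `frequency_tag` are hypothetical helpers invented for this example.

```python
from collections import Counter, defaultdict

# Toy training set of (word, tag) pairs, hand-written for illustration.
TRAINING = [
    ("the", "DT"), ("dogs", "NNS"), ("bark", "VBP"),
    ("the", "DT"), ("sailor", "NN"), ("dogs", "VBZ"),
    ("the", "DT"), ("hatch", "NN"),
    ("two", "CD"), ("dogs", "NNS"), ("sleep", "VBP"),
]


def train_most_frequent_tag(tagged_words):
    """Return a dict mapping each word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def frequency_tag(words, model, default="NN"):
    """Tag each word with its most frequent training tag (default for unknowns)."""
    return [(word, model.get(word.lower(), default)) for word in words]


model = train_most_frequent_tag(TRAINING)
print(frequency_tag("The sailor dogs the hatch".split(), model))
```

Because "dogs" appears twice as a plural noun and only once as a verb in the toy data, this tagger labels it NNS everywhere, including in sentences where it is a verb. That limitation is what context-aware approaches try to address.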

Tag Sequence Probabilities

In this second approach to stochastic tagging, the tagger calculates the probability of a given sequence of tags occurring. It is also called the n-gram approach, because the best tag for a given word is determined by the probability with which it occurs after the n previous tags.
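The following sketch illustrates the idea for the simplest case, where only one previous tag is considered (a bigram model). It estimates P(tag | previous tag) from a toy hand-tagged corpus and then picks, greedily from left to right, the candidate tag with the highest transition probability. The corpus, the `<s>` start marker, and the function names are assumptions for this example; a production tagger would typically also weigh word-given-tag probabilities and search over whole tag sequences rather than deciding greedily.

```python
from collections import Counter, defaultdict

# Toy hand-tagged sentences, written for illustration only.
TAGGED_SENTENCES = [
    [("the", "DT"), ("dogs", "NNS"), ("bark", "VBP")],
    [("the", "DT"), ("sailor", "NN"), ("dogs", "VBZ"), ("the", "DT"), ("hatch", "NN")],
    [("the", "DT"), ("cook", "NN"), ("opens", "VBZ"), ("the", "DT"), ("can", "NN")],
]
START = "<s>"  # pseudo-tag marking the start of a sentence


def train_bigram(sentences):
    """Count tag-to-tag transitions and collect the tags seen for each word."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    lexicon = defaultdict(set)          # word -> set of tags seen with it
    for sentence in sentences:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            lexicon[word.lower()].add(tag)
            previous = tag
    return transitions, lexicon


def transition_prob(transitions, previous, tag):
    """Estimate P(tag | previous tag) by relative frequency."""
    total = sum(transitions[previous].values())
    return transitions[previous][tag] / total if total else 0.0


def greedy_bigram_tag(words, transitions, lexicon, default="NN"):
    """Choose for each word the candidate tag most likely after the previous tag."""
    tagged = []
    previous = START
    for word in words:
        candidates = sorted(lexicon.get(word.lower(), {default}))
        best = max(candidates, key=lambda t: transition_prob(transitions, previous, t))
        tagged.append((word, best))
        previous = best
    return tagged


transitions, lexicon = train_bigram(TAGGED_SENTENCES)
print(greedy_bigram_tag("The sailor dogs the hatch".split(), transitions, lexicon))
print(greedy_bigram_tag("The dogs bark".split(), transitions, lexicon))
```

With this toy data, "dogs" is tagged as a verb (VBZ) after the noun "sailor" and as a plural noun (NNS) after the determiner "the", because those are the transitions seen in training.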

Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, and because some parts of speech are complex or unspoken. As a result, a large percentage of word-forms are ambiguous. For example, even “dogs”, which is usually thought of as just a plural noun, can also be a verb:

The sailor dogs the hatch

Correct grammatical tagging will reflect that “dogs” is used here as a verb, not as the more common plural noun. Grammatical context is one way to determine this; semantic analysis can also be used to infer that “sailor” and “hatch” implicate “dogs” as 1) in the nautical context and 2) an action applied to the object “hatch”. In this context, “dogs” is a nautical term meaning “fastens (a watertight door) securely”.

The word “can” is ambiguous in a similar way. It can be a modal used in forming questions, a noun for a container that holds food or liquid, or a verb denoting the ability to do something.

Let’s look at the part-of-speech tags used in NLTK examples.

POS tag list (partial):

CC coordinating conjunction

CD cardinal number

DT determiner

IN preposition/subordinating conjunction

JJ adjective, e.g. ‘big’

JJR adjective, comparative, e.g. ‘bigger’

JJS adjective, superlative, e.g. ‘biggest’

MD modal, e.g. ‘could’, ‘will’

NN noun, singular, e.g. ‘desk’

NNS noun, plural, e.g. ‘desks’

NNP proper noun, singular, e.g. ‘Harrison’

NNPS proper noun, plural, e.g. ‘Americans’

PRP personal pronoun, e.g. ‘I’, ‘he’, ‘she’

PRP$ possessive pronoun, e.g. ‘my’, ‘his’, ‘hers’

RB adverb, e.g. ‘very’, ‘silently’

RBR adverb, comparative, e.g. ‘better’

RBS adverb, superlative, e.g. ‘best’

UH interjection, e.g. ‘errrrrrrrm’

VB verb, base form, e.g. ‘take’

WRB wh-adverb, e.g. ‘where’, ‘when’
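To make the tag list easier to use, the sketch below stores it in a dictionary and annotates tagged words with readable descriptions. NLTK's `nltk.pos_tag` returns a list of `(word, tag)` tuples; the `sample` list here is hand-written in that shape (not actual tagger output) so the example runs without NLTK or its data files. The names `TAG_DESCRIPTIONS` and `describe_tags` are hypothetical helpers for this illustration.

```python
# Descriptions for the tags listed above (a partial tag set).
TAG_DESCRIPTIONS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "DT": "determiner",
    "IN": "preposition/subordinating conjunction",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "JJS": "adjective, superlative",
    "MD": "modal",
    "NN": "noun, singular",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "PRP": "personal pronoun",
    "PRP$": "possessive pronoun",
    "RB": "adverb",
    "RBR": "adverb, comparative",
    "RBS": "adverb, superlative",
    "UH": "interjection",
    "VB": "verb, base form",
    "WRB": "wh-adverb",
}


def describe_tags(tagged_words):
    """Attach a readable description to each (word, tag) pair."""
    return [
        (word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag"))
        for word, tag in tagged_words
    ]


# Hand-written pairs in the (word, tag) shape a tagger would return.
sample = [
    ("Harrison", "NNP"), ("will", "MD"), ("take", "VB"),
    ("the", "DT"), ("biggest", "JJS"), ("desk", "NN"),
]
for word, tag, meaning in describe_tags(sample):
    print(f"{word:10} {tag:5} {meaning}")
```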
