How does POS tagging work?
from nltk import word_tokenize, pos_tag
print(pos_tag(word_tokenize("I love NLP")))
Output:
[('I', 'PRP'), ('love', 'VBP'), ('NLP', 'RB')]
For the sentence “I love NLP”, the NLTK POS tagger assigned the following tags:
- I as PRP (pronoun, personal)
- love as VBP (verb, present tense, not 3rd person singular)
- NLP as RB (adverb); note that this last tag is a tagger mistake, since NLP is a noun in this sentence
# Print every tag in the Penn Treebank tagset with its description and examples
import nltk
nltk.help.upenn_tagset()
Note: Don’t forget to download the help data/corpus from NLTK first.
Related Article: How to download NLTK corpus Manually
POS tag | Full form of POS tag | Examples |
---|---|---|
CC | conjunction, coordinating | &, ‘n, and, both etc. |
CD | numeral, cardinal | mid-1890, nine-thirty, zero, two etc. |
DT | determiner | all, an, another, any, both etc. |
EX | existential there | there |
FW | foreign word | gemeinschaft, quibusdam, fille etc. |
IN | preposition or conjunction, subordinating | astride, among, uppon, whether etc. |
JJ | adjective or numeral, ordinal | battery-powered, pre-war, multi-disciplinary etc. |
JJR | adjective, comparative | braver, cleaner, brighter etc. |
JJS | adjective, superlative | cheapest, closest, darkest etc. |
LS | list item marker | SP-44002, SP-4005 etc. |
MD | modal auxiliary | can, cannot, could, couldn’t, shouldn’t etc. |
NN | noun, common, singular or mass | cabbage, afghan, slick etc. |
NNP | noun, proper, singular | Ranzer, Shannon, CTCA, Light etc. |
NNPS | noun, proper, plural | Americans, Indians, Australians etc. |
NNS | noun, common, plural | undergraduates, scotches, bodyguards etc. |
PDT | pre-determiner | all, many, quite, such etc. |
POS | genitive marker | ', 's |
PRP | pronoun, personal | hers, herself, him, himself etc. |
PRP$ | pronoun, possessive | her, his, mine, my etc. |
RB | adverb | occasionally, adventurously, professedly etc. |
RBR | adverb, comparative | further, longer, louder etc. |
RBS | adverb, superlative | best, biggest, largest etc. |
RP | particle | about, along, apart etc. |
SYM | symbol | %, &, ', ", *, + etc. |
TO | “to” as preposition or infinitive marker | to |
UH | interjection | Goodbye, Gosh, Wow etc. |
VB | verb, base form | ask, assemble, assign etc. |
VBD | verb, past tense | dipped, halted, registered etc. |
VBG | verb, present participle or gerund | telegraphing, judging, erasing etc. |
VBN | verb, past participle | used, unsettled, dubbed etc. |
VBP | verb, present tense, not 3rd person singular | glisten, obtain, comprise etc. |
VBZ | verb, present tense, 3rd person singular | marks, mixes, seals etc. |
WDT | WH-determiner | that, what, whatever, which and whichever |
WP | WH-pronoun | that, what, whatever, whatsoever, which, who, whom and whosoever |
WP$ | WH-pronoun, possessive | whose |
WRB | WH-adverb | how, however, whence, whenever, where, whereby, whereever, wherein, whereof and why |
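To make the table a bit more concrete, here is a minimal sketch that attaches a description to each tag of an already tagged sentence. The `TAG_DESCRIPTIONS` dictionary and the `describe` helper are hypothetical names introduced only for this illustration; the dictionary holds just a handful of rows copied from the table, not the full tagset, and the tagged list is the output shown earlier for “I love NLP”.

```python
# A few Penn Treebank tags copied from the table above (not the full tagset).
TAG_DESCRIPTIONS = {
    "PRP": "pronoun, personal",
    "VBP": "verb, present tense, not 3rd person singular",
    "RB": "adverb",
    "NNP": "noun, proper, singular",
    "NN": "noun, common, singular or mass",
}


def describe(tagged_tokens):
    """Return (word, tag, description) triples for a POS-tagged sentence."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag"))
            for word, tag in tagged_tokens]


# Tagged output shown earlier for "I love NLP" (hard-coded, so NLTK is not needed here).
tagged = [("I", "PRP"), ("love", "VBP"), ("NLP", "RB")]
for word, tag, description in describe(tagged):
    print(f"{word}: {tag} ({description})")
```

Tags that are missing from the small dictionary simply come back as "unknown tag", so the helper can be extended row by row as needed.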
Application of POS:
Okay, now back to the topic.
In this post I will only explain how to extract custom keywords from a sentence using POS tagging.
Extract Custom keywords by POS tagging:
comment = ["I am using Mi note5 it is working great",
           "My Samsung s7 is hanging very often",
           "My friend is using Motorola g5 for last 5 years, he is happy with it"]
# Print the POS tags of every comment, followed by a blank line
for sentence in comment:
    print(pos_tag(word_tokenize(sentence)))
    print('\n')
Output:
[('I', 'PRP'), ('am', 'VBP'), ('using', 'VBG'), ('Mi', 'NNP'), ('note5', 'NN'), ('it', 'PRP'), ('is', 'VBZ'), ('working', 'VBG'), ('great', 'JJ')]
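[('My', 'PRP$'), ('Samsung', 'NNP'), ('s7', 'NN'), ('is', 'VBZ'), ('hanging', 'VBG'), ('very', 'RB'), ('often', 'RB')]
[('My', 'PRP$'), ('friend', 'NN'), ('is', 'VBZ'), ('using', 'VBG'), ('Motorola', 'NNP'), ('g5', 'NN'), ('for', 'IN'), ('last', 'JJ'), ('5', 'CD'), ('years', 'NNS'), (',', ','), ('he', 'PRP'), ('is', 'VBZ'), ('happy', 'JJ'), ('with', 'IN'), ('it', 'PRP')]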
# Extract the proper nouns (tag NNP) from each comment using nltk
for sentence in comment:
    token_comment = word_tokenize(sentence)
    tagged_comment = pos_tag(token_comment)
    print([(word, tag) for word, tag in tagged_comment if tag == 'NNP'])
Output:
[('Mi', 'NNP')]
[('Samsung', 'NNP')]
[('Motorola', 'NNP')]
Now I am able to extract the entities I was looking for (Mi, Samsung and Motorola).
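The same filter works for other tags as well. Below is a small sketch, assuming the tagged output printed above for the first comment; `words_with_tags` is a hypothetical helper, not part of NLTK. It matches on tag prefixes, so "NN" covers NN, NNS, NNP and NNPS, and "JJ" covers JJ, JJR and JJS.

```python
def words_with_tags(tagged_tokens, prefixes):
    """Return the words whose tag starts with any of the given prefixes."""
    return [word for word, tag in tagged_tokens
            if tag.startswith(tuple(prefixes))]


# Tagged output of the first comment, hard-coded so the sketch runs without NLTK.
tagged = [("I", "PRP"), ("am", "VBP"), ("using", "VBG"), ("Mi", "NNP"),
          ("note5", "NN"), ("it", "PRP"), ("is", "VBZ"), ("working", "VBG"),
          ("great", "JJ")]

print(words_with_tags(tagged, ["NN"]))  # all noun tags
print(words_with_tags(tagged, ["JJ"]))  # all adjective tags
```

Prefix matching is a design choice for convenience; if you need exactly one tag, the equality check used in the loop above is the stricter option.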
Extract patterns from lists of POS tagged words in NLTK:
[('I', 'PRP'), ('am', 'VBP'), ('using', 'VBG'), ('Mi', 'NNP'), ('note5', 'NN'), ('it', 'PRP'), ('is', 'VBZ'), ('working', 'VBG'), ('great', 'JJ')]
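[('My', 'PRP$'), ('Samsung', 'NNP'), ('s7', 'NN'), ('is', 'VBZ'), ('hanging', 'VBG'), ('very', 'RB'), ('often', 'RB')]
[('My', 'PRP$'), ('friend', 'NN'), ('is', 'VBZ'), ('using', 'VBG'), ('Motorola', 'NNP'), ('g5', 'NN'), ('for', 'IN'), ('last', 'JJ'), ('5', 'CD'), ('years', 'NNS'), (',', ','), ('he', 'PRP'), ('is', 'VBZ'), ('happy', 'JJ'), ('with', 'IN'), ('it', 'PRP')]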
# Function to extract pairs of adjacent words that match two POS tags
def match2(token_pos, pos1, pos2):
    for subsen in token_pos:
        # Pair every (word, tag) with its successor; zip stops at the last
        # pair, so no index can run past the end of the sentence
        for (word1, tag1), (word2, tag2) in zip(subsen, subsen[1:]):
            if tag1 == pos1 and tag2 == pos2:
                yield "{} {}".format(word1, word2)
# Print company and model no. for each sentence
for sentence in comment:
    tokens = word_tokenize(sentence)  # Generate list of tokens
    tokens_pos = pos_tag(tokens)
    a = [tokens_pos]
    print(list(match2(a, 'NNP', 'NN')))
Output:
['Mi note5']
['Samsung s7']
['Motorola g5']
Yes, now we have exactly what we wanted.
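If you need patterns longer than two tags, the idea can be generalized with a sliding window. The following is only a sketch under that assumption: `match_pattern` is a hypothetical helper, and the tagged list is a hard-coded part of the output shown earlier for the third comment.

```python
def match_pattern(tagged_tokens, pattern):
    """Yield the words of every consecutive run whose tags equal `pattern`."""
    n = len(pattern)
    if n == 0:
        return  # an empty pattern matches nothing
    for start in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[start:start + n]
        if [tag for _, tag in window] == list(pattern):
            yield " ".join(word for word, _ in window)


# Part of the tagged third comment, hard-coded so the sketch runs without NLTK.
tagged = [("My", "PRP$"), ("friend", "NN"), ("is", "VBZ"), ("using", "VBG"),
          ("Motorola", "NNP"), ("g5", "NN"), ("for", "IN"), ("last", "JJ"),
          ("5", "CD"), ("years", "NNS")]

print(list(match_pattern(tagged, ["NNP", "NN"])))        # company and model
print(list(match_pattern(tagged, ["JJ", "CD", "NNS"])))  # e.g. a duration phrase
```

With a two-tag pattern this behaves like the pair matching above; longer patterns just widen the window.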
Full code:
# Import the tokenizer and the POS tagger
from nltk import word_tokenize, pos_tag
# Define an array of comments to test.
comment = ["I am using Mi note5 it is working great",
           "My Samsung s7 is hanging very often",
           "My friend is using Motorola g5 for last 5 years, he is happy with it"]
# Print POS tags of all sentences.
for sentence in comment:
    print(pos_tag(word_tokenize(sentence)))
    print('\n')
# Extract only company name from all sentences
for sentence in comment:
    token_comment = word_tokenize(sentence)
    tagged_comment = pos_tag(token_comment)
    print([(word, tag) for word, tag in tagged_comment if tag == 'NNP'])
# Extract company name and model no. both.
# Function to extract pairs of adjacent words that match two POS tags
def match2(token_pos, pos1, pos2):
    for subsen in token_pos:
        # Pair every (word, tag) with its successor; zip stops at the last
        # pair, so no index can run past the end of the sentence
        for (word1, tag1), (word2, tag2) in zip(subsen, subsen[1:]):
            if tag1 == pos1 and tag2 == pos2:
                yield "{} {}".format(word1, word2)
# Print company and model no. for each sentence
for sentence in comment:
    tokens = word_tokenize(sentence)  # Generate list of tokens
    tokens_pos = pos_tag(tokens)
    a = [tokens_pos]
    print(list(match2(a, 'NNP', 'NN')))
Conclusion:
In this post you learned:
- How to tokenize a sentence
- How to tag parts of speech
- How to extract only nouns (you can apply the same approach to any other tag, such as CD or JJ)
- How to extract patterns from a list of POS-tagged words
Do you have any questions?