Interest in Natural Language Processing (NLP) is growing because of the increasing number of interesting applications such as machine translation, chatbots, and image captioning.
Many tools are available for working with NLP. Popular ones include:
- TextBlob
What is TextBlob?
- Noun phrase extraction
- Classification (by using Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Spelling correction
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Add new models or languages through extensions
- WordNet integration
Install and setup TextBlob for Python
- Convert any string to TextBlob object
- Call functions of TextBlob to do a specific task
Tokenization with TextBlob
Tokenization refers to dividing text into a sequence of tokens, such as sentences or words.
from textblob import TextBlob

text = '''
TextBlob is a Python (2 and 3) library for processing textual data.
It provides API to do natural language processing (NLP) such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.
'''
blob_obj = TextBlob(text)

# Divide into sentences
blob_obj.sentences

# Print tokens/words
blob_obj.tokens
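To make the idea of tokenization concrete, here is a minimal sketch that splits text into sentences and tokens using only regular expressions. It is not how TextBlob tokenizes internally; the function names `split_sentences` and `split_tokens` are hypothetical, and the rules are deliberately naive (abbreviations such as "e.g." would be split incorrectly).

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_tokens(sentence):
    """Return runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sample = "TextBlob is a Python library. It provides a simple API!"
sentences = split_sentences(sample)
tokens = split_tokens(sentences[0])
print(sentences)
print(tokens)
```

A library tokenizer handles many more edge cases, which is the main reason to prefer `blob_obj.sentences` and `blob_obj.tokens` in practice.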
POS tagging with TextBlob
- PatternTagger (uses the same implementation as the pattern library)
- NLTKTagger which uses NLTK’s TreeBank tagger
# Using the TreeBank tagger
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob_obj = TextBlob(text, pos_tagger=nltk_tagger)
blob_obj.pos_tags

# Using the Pattern tagger
from textblob.taggers import PatternTagger
pattern_tagger = PatternTagger()
blob_obj = TextBlob(text, pos_tagger=pattern_tagger)
blob_obj.pos_tags
Noun Phrase Extraction using TextBlob
for np in blob_obj.noun_phrases:
    print(np)
Word Inflection and Lemmatization using TextBlob
# Singular form
print('Previous: ', blob_obj.words[13], ' After: ', blob_obj.words[13].singularize())
# Plural form
print('Previous: ', blob_obj.words[7], ' After: ', blob_obj.words[7].pluralize())
Previous: library After: libraries
N-grams using TextBlob
# 3-grams
blob_obj.ngrams(n=3)
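An n-gram is simply a run of n consecutive words. The following pure-Python sketch shows the sliding-window idea behind the call above. The helper name `simple_ngrams` is hypothetical, and it returns plain lists rather than TextBlob's own word-list objects.

```python
def simple_ngrams(words, n=3):
    """Return every run of n consecutive words as a list."""
    if n <= 0:
        return []
    # The window starts at each index that still leaves n words to its right.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = "TextBlob is a Python library".split()
trigrams = simple_ngrams(words, n=3)
for gram in trigrams:
    print(gram)
```

With five words and n=3 the window fits in three positions, so three trigrams are produced; a text shorter than n yields an empty list.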
Sentiment Analysis using TextBlob
- Polarity (range -1 to 1)
- Subjectivity (range 0 to 1)
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment

text = "I love this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment
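To build intuition for the two scores, here is a toy lexicon-based sketch: each known word carries a polarity and a subjectivity value, and the text's score is the average over the words found. The lexicon `TOY_LEXICON`, its values, and the function `toy_sentiment` are all hypothetical and chosen for illustration only; TextBlob's real analyzer uses a much larger lexicon and also handles things like negation and intensifiers.

```python
# Hypothetical toy lexicon: word -> (polarity, subjectivity).
# The values are illustrative, not taken from any real resource.
TOY_LEXICON = {
    "love": (0.5, 0.6),
    "hate": (-0.8, 0.9),
    "great": (0.8, 0.75),
}


def toy_sentiment(text):
    """Average the polarity and subjectivity of the lexicon words in text."""
    scores = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not scores:
        # No opinion words found: treat the text as neutral and objective.
        return (0.0, 0.0)
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)


print(toy_sentiment("I hate this phone"))
print(toy_sentiment("I love this phone"))
```

The sign of the polarity tells you whether the text is negative or positive, and a subjectivity near 1 suggests opinion rather than fact.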
Spelling Correction using TextBlob
blob_obj = TextBlob("speling")
blob_obj.words[0].spellcheck()
[('spelling', 1.0)]
How spelling corrector works in TextBlob
# def P(word, N=sum(WORDS.values())):
#     "Probability of `word`."
#     return WORDS[word] / N
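The probability function above comes from the approach described in Peter Norvig's essay "How to Write a Spelling Corrector", which TextBlob's documentation cites as the basis of its spelling correction. The sketch below is a reduced version of that idea, not TextBlob's actual code: the tiny `CORPUS` string is a stand-in assumption for a large word-frequency source, and only candidates one edit away are considered.

```python
import re
from collections import Counter

# Tiny stand-in corpus (an assumption); a real corrector counts words
# in a large body of text.
CORPUS = "spelling is hard and spelling mistakes are common in writing"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))


def P(word, N=sum(WORDS.values())):
    """Probability of `word` in the corpus."""
    return WORDS[word] / N


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that appear in the corpus."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Pick the most probable known candidate, falling back to the word itself."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=P)


print(correction("speling"))
```

A word already in the corpus is returned unchanged, and a word with no known neighbour one edit away is also returned as is.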
Language detection and Translation using TextBlob
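# Detect language
# Note: detect_language() and translate() were deprecated in TextBlob 0.16.0
# and removed in 0.18.0, so this code needs an older release.
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()
>> 'en'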
If you run this code on an office computer that sits behind a proxy, you may get a timeout error like this:
URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond>
In that case, set your proxy address before running the code above, because the result is fetched from a web API:
# Detect language with a proxy
import nltk

# Set up your proxy address
nltk.set_proxy('http://111.199.236.103:8080')

text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()
# Translate to Bengali
blob_obj.translate(to="bn")
| Language Name | Code | Language Name | Code |
|---|---|---|---|
| Afrikaans | af | Irish | ga |
| Albanian | sq | Italian | it |
| Arabic | ar | Japanese | ja |
| Azerbaijani | az | Kannada | kn |
| Basque | eu | Korean | ko |
| Bengali | bn | Latin | la |
| Belarusian | be | Latvian | lv |
| Bulgarian | bg | Lithuanian | lt |
| Catalan | ca | Macedonian | mk |
| Chinese Simplified | zh-CN | Malay | ms |
| Chinese Traditional | zh-TW | Maltese | mt |
| Croatian | hr | Norwegian | no |
| Czech | cs | Persian | fa |
| Danish | da | Polish | pl |
| Dutch | nl | Portuguese | pt |
| English | en | Romanian | ro |
| Esperanto | eo | Russian | ru |
| Estonian | et | Serbian | sr |
| Filipino | tl | Slovak | sk |
| Finnish | fi | Slovenian | sl |
| French | fr | Spanish | es |
| Galician | gl | Swahili | sw |
| Georgian | ka | Swedish | sv |
| German | de | Tamil | ta |
| Greek | el | Telugu | te |
| Gujarati | gu | Thai | th |
| Haitian Creole | ht | Turkish | tr |
| Hebrew | iw | Ukrainian | uk |
| Hindi | hi | Urdu | ur |
| Hungarian | hu | Vietnamese | vi |
| Icelandic | is | Welsh | cy |
| Indonesian | id | Yiddish | yi |
Conclusion
- What is TextBlob?
- Install and setup TextBlob for Python
- Tokenization with TextBlob
- POS tagging with TextBlob
- Noun Phrase Extraction using TextBlob
- Word Inflection and Lemmatization using TextBlob
- N-grams using TextBlob
- Sentiment Analysis using TextBlob
- Spelling Correction using TextBlob
- How spelling corrector works in TextBlob
- Language detection and Translation using TextBlob