Interest in Natural Language Processing (NLP) is growing because of an increasing number of applications such as machine translation, chatbots, and image captioning.
Many tools are available for NLP work. Some of the popular ones are:
- NLTK
- spaCy
- Stanford CoreNLP
- TextBlob
What is TextBlob?
TextBlob is a Python library for processing textual data. It provides an API for common NLP tasks, including:
- Part-of-speech tagging
- Noun phrase extraction
- Sentiment analysis
- Classification (using Naive Bayes or Decision Tree classifiers)
- Language translation and detection powered by Google Translate
- Spelling correction
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Add new models or languages through extensions
- WordNet integration
Install and setup TextBlob for Python
- Convert any string to a TextBlob object
- Call methods of the TextBlob object to perform a specific task
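Before running the examples, it can help to confirm that the library is available. The sketch below is one possible pre-flight check using only the standard library. It assumes the commonly documented setup commands (`pip install -U textblob`, then `python -m textblob.download_corpora`); the helper name `textblob_install_hint` is made up for illustration.

```python
import importlib.util


def textblob_install_hint():
    """Return None if textblob is importable, otherwise the usual setup commands."""
    if importlib.util.find_spec("textblob") is not None:
        return None
    # Assumed setup: install the package, then download the corpora it relies on.
    return "pip install -U textblob\npython -m textblob.download_corpora"


hint = textblob_install_hint()
print("TextBlob is installed" if hint is None else hint)
```

If the hint is printed, run the two commands in a terminal and then try the examples again.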
Tokenization with TextBlob
Tokenization means splitting text into a sequence of tokens, such as sentences or words.
from textblob import TextBlob
text = '''
TextBlob is a Python (2 and 3) library for processing textual data.
It provides API to do natural language processing (NLP)
such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.
'''
blob_obj = TextBlob(text)
# Split into sentences
blob_obj.sentences
# Print word-level tokens (blob_obj.tokens keeps punctuation; blob_obj.words drops it)
blob_obj.tokens
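To make the idea of tokenization concrete, here is a minimal sketch that uses only Python's `re` module. It is not how TextBlob tokenizes internally; the function names `split_sentences` and `split_words` and the regular expressions are assumptions chosen for illustration, and they will mishandle abbreviations such as "e.g.".

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Return runs of letters and digits, allowing inner hyphens or apostrophes."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)


sample = "TextBlob is a Python library. It provides an API for NLP!"
print(split_sentences(sample))
print(split_words(sample))
```

A library tokenizer handles many more edge cases, which is the main reason to prefer it over hand-written patterns like these.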
POS tagging with TextBlob
TextBlob provides two part-of-speech taggers:
- PatternTagger, which uses the same implementation as the pattern library
- NLTKTagger, which uses NLTK’s Treebank tagger
# Using the NLTK Treebank tagger
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob_obj = TextBlob(text, pos_tagger=nltk_tagger)
blob_obj.pos_tags
# Using the Pattern tagger
from textblob.taggers import PatternTagger
pattern_tagger = PatternTagger()
blob_obj = TextBlob(text, pos_tagger=pattern_tagger)
blob_obj.pos_tags
Noun Phrase Extraction using TextBlob
for phrase in blob_obj.noun_phrases:
    print(phrase)
Word Inflection and Lemmatization using TextBlob
# Singular form
print('Previous: ', blob_obj.words[13], ' After: ', blob_obj.words[13].singularize())
# Plural form
print('Previous: ', blob_obj.words[7], ' After: ', blob_obj.words[7].pluralize())
Previous: library After: libraries
N-grams using TextBlob
# 3-grams
blob_obj.ngrams(n=3)
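An n-gram is a run of n consecutive tokens. The sketch below shows the sliding-window idea in plain Python; the function `ngrams` here is a stand-alone illustration written for this article, not TextBlob's own method.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a list."""
    if n <= 0:
        return []
    # The window starts at each index that still leaves n words to its right.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = "TextBlob is a Python library".split()
for gram in ngrams(words, n=3):
    print(gram)
```

With five words and n=3 the window fits in three positions, so three 3-grams are printed.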
Sentiment Analysis using TextBlob
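The sentiment property returns two scores:
- Polarity (range -1 to 1, from negative to positive)
- Subjectivity (range 0 to 1, from objective to subjective)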
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment
text = "I love this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment
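To give a feel for how a lexicon-based analyzer can arrive at a polarity and a subjectivity score, here is a rough sketch that averages per-word scores. It is a simplification and not TextBlob's actual implementation: the lexicon `TOY_LEXICON`, its values, and the function `toy_sentiment` are all invented for this example.

```python
from collections import namedtuple

Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Hypothetical (polarity, subjectivity) scores, invented for this sketch.
TOY_LEXICON = {"hate": (-0.8, 0.9), "love": (0.5, 0.6), "great": (0.8, 0.75)}


def toy_sentiment(text):
    """Average the scores of the words that appear in the toy lexicon."""
    scored = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not scored:
        # No known words: treat the text as neutral and objective.
        return Sentiment(0.0, 0.0)
    polarity = sum(p for p, _ in scored) / len(scored)
    subjectivity = sum(s for _, s in scored) / len(scored)
    return Sentiment(polarity, subjectivity)


print(toy_sentiment("I hate this phone"))
print(toy_sentiment("I love this phone"))
```

Words missing from the lexicon contribute nothing, which is why a neutral sentence scores zero on both scales in this sketch.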
Spelling Correction using TextBlob
blob_obj = TextBlob("speling")
blob_obj.words[0].spellcheck()
[('spelling', 1.0)]
How spelling corrector works in TextBlob
# def P(word, N=sum(WORDS.values())):
#     "Probability of `word`."
#     return WORDS[word] / N
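The `P` function above matches the word-probability step of Peter Norvig's well-known spelling corrector. The sketch below puts that step into a small working corrector in the same style: generate every string one edit away, keep the ones seen in a corpus, and pick the most probable. The tiny `CORPUS` string is a stand-in invented for this example; a real corrector counts words from a large text file.

```python
import re
from collections import Counter

# Tiny stand-in corpus, invented for this sketch.
CORPUS = "spelling is hard but spelling checkers help with spelling and selling"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))


def P(word, N=sum(WORDS.values())):
    "Probability of `word`."
    return WORDS[word] / N


def edits1(word):
    "All strings one delete, transpose, replace or insert away from `word`."
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    "The subset of `words` that appears in the corpus."
    return {w for w in words if w in WORDS}


def correction(word):
    "Most probable known candidate; falls back to the word itself."
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=P)


print(correction("speling"))
```

Known words are returned unchanged, and unknown words with no known neighbor one edit away are also returned as-is.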
Language detection and Translation using TextBlob
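## Detect language (needs internet access; these Google Translate-backed methods were deprecated in TextBlob 0.16 and removed in 0.18)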
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()
'en'
If you run this code on an office computer that sits behind a proxy, you may get a timeout error such as:
URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond>
In that case, set your proxy address before running the code above, because the result is fetched from a web API:
## Detect Language with proxy
import nltk
# Set up your proxy address
nltk.set_proxy('http://111.199.236.103:8080')
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()
## Translate to Bengali
blob_obj.translate(to="bn")
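For readers curious about what a proxy setting involves, the sketch below shows roughly how an HTTP proxy can be configured with Python's standard `urllib.request` module. The address `proxy.example.com:8080` is a placeholder and the helper name `build_proxy_opener` is made up for illustration; the code builds the opener but sends no request.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address; substitute the proxy your network actually uses.
opener = build_proxy_opener("http://proxy.example.com:8080")
# urllib.request.install_opener(opener)  # uncomment to apply it process-wide
print(type(opener).__name__)
```

Installing the opener makes later `urllib.request.urlopen` calls in the same process go through the proxy.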
Language Name | Code | Language Name | Code
Afrikaans | af | Irish | ga
Albanian | sq | Italian | it
Arabic | ar | Japanese | ja
Azerbaijani | az | Kannada | kn
Basque | eu | Korean | ko
Bengali | bn | Latin | la
Belarusian | be | Latvian | lv
Bulgarian | bg | Lithuanian | lt
Catalan | ca | Macedonian | mk
Chinese Simplified | zh-CN | Malay | ms
Chinese Traditional | zh-TW | Maltese | mt
Croatian | hr | Norwegian | no
Czech | cs | Persian | fa
Danish | da | Polish | pl
Dutch | nl | Portuguese | pt
English | en | Romanian | ro
Esperanto | eo | Russian | ru
Estonian | et | Serbian | sr
Filipino | tl | Slovak | sk
Finnish | fi | Slovenian | sl
French | fr | Spanish | es
Galician | gl | Swahili | sw
Georgian | ka | Swedish | sv
German | de | Tamil | ta
Greek | el | Telugu | te
Gujarati | gu | Thai | th
Haitian Creole | ht | Turkish | tr
Hebrew | iw | Ukrainian | uk
Hindi | hi | Urdu | ur
Hungarian | hu | Vietnamese | vi
Icelandic | is | Welsh | cy
Indonesian | id | Yiddish | yi
Conclusion
This article covered the following topics:
- What is TextBlob?
- Install and setup TextBlob for Python
- Tokenization with TextBlob
- POS tagging with TextBlob
- Noun Phrase Extraction using TextBlob
- Word Inflection and Lemmatization using TextBlob
- N-grams using TextBlob
- Sentiment Analysis using TextBlob
- Spelling Correction using TextBlob
- How spelling corrector works in TextBlob
- Language detection and Translation using TextBlob