Master Text Processing in Django with Simple NLP Techniques
Django, a high-level Python web framework, and Natural Language Processing (NLP), the study of computational methods for analyzing and generating human language, might seem like an odd couple at first. However, when you build web applications that handle textual data, combining Django’s robustness with NLP techniques unlocks features such as entity extraction, automatic summarization, and sentiment analysis. In this blog post, we will explore some ways Django can work in tandem with NLP techniques to handle and process textual data.
Table of Contents
1. Setting Up
2. Text Extraction from User Input
3. Automatic Content Summarization
4. Content Tagging based on Named Entities
5. Sentiment Analysis
6. Search Optimization using Lemmatization
Conclusion
1. Setting Up
Before we dive in, ensure you have Django set up. If not, you can easily install it using `pip`:
```bash
pip install django
```
For our NLP tasks, we’ll use the `spaCy` library. Install it and its English model with:
```bash
pip install spacy
python -m spacy download en_core_web_sm
```
2. Text Extraction from User Input
Suppose you’re building a blog platform with Django. Your models might look something like this:
```python
from django.db import models


class Blog(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
```
Users will input text that you might want to process, for example to extract named entities such as person names, organizations, or places.
```python
import spacy

from .models import Blog

# Load the English pipeline once at module level so it is reused across requests
nlp = spacy.load("en_core_web_sm")


def extract_named_entities(blog_id):
    blog = Blog.objects.get(id=blog_id)
    doc = nlp(blog.content)
    # Each entity carries a label such as PERSON, ORG, or GPE
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities
```
This function takes a blog’s ID, retrieves its content, and then identifies the named entities.
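To expose this from your application, you could wrap the helper in a small JSON view. The sketch below is illustrative only: the `nlp_utils` module, the view name, and the URL pattern are assumptions, not part of the original code.

```python
# views.py -- a minimal sketch; the module, view, and URL names are hypothetical
from django.http import JsonResponse
from django.urls import path

from .nlp_utils import extract_named_entities  # assuming the helper lives in nlp_utils.py


def blog_entities(request, blog_id):
    entities = extract_named_entities(blog_id)
    return JsonResponse({"entities": entities})


# urls.py
urlpatterns = [
    path("blogs/<int:blog_id>/entities/", blog_entities, name="blog-entities"),
]
```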
3. Automatic Content Summarization
Let’s say you want to provide a concise summary for each blog post. One approach is extractive summarization, which selects the most salient sentences from the content.
```python
from spacy.lang.en.stop_words import STOP_WORDS


def summarize_content(blog_id):
    blog = Blog.objects.get(id=blog_id)
    doc = nlp(blog.content)
    # Keep only meaningful tokens: lowercase, no stopwords, no punctuation
    keywords = {token.text.lower() for token in doc
                if token.text.lower() not in STOP_WORDS and not token.is_punct}
    # Score each sentence by how many keyword occurrences it contains
    sentence_strength = {}
    for sent in doc.sents:
        for word in sent:
            if word.text.lower() in keywords:
                sentence_strength[sent] = sentence_strength.get(sent, 0) + 1
    # Take the five highest-scoring sentences, then restore document order
    top_sentences = sorted(sentence_strength, key=sentence_strength.get, reverse=True)[:5]
    top_sentences.sort(key=lambda sent: sent.start)
    return ' '.join(sent.text for sent in top_sentences)
```
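Running the full pipeline over a post is relatively expensive, so you would typically not recompute the summary on every page view. One option, assuming you add a hypothetical `summary = models.TextField(blank=True)` field to `Blog`, is to compute it once whenever the post changes:

```python
# A caching sketch; assumes Blog gains a `summary = models.TextField(blank=True)` field
def refresh_summary(blog_id):
    blog = Blog.objects.get(id=blog_id)
    blog.summary = summarize_content(blog.id)
    blog.save(update_fields=["summary"])
```

You could call this from the model’s `save()` method or a `post_save` signal handler so the stored summary stays in sync with the content.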
4. Content Tagging based on Named Entities
One can automatically tag blog content based on the named entities present in the text. For instance, if “Apple” is mentioned, it could be tagged as “Company”.
```python
def auto_tag(blog_id):
    entities = extract_named_entities(blog_id)
    tags = set()
    for entity, label in entities:
        if label == "ORG":
            tags.add("Company")
        elif label == "PERSON":
            tags.add("Person")
        elif label == "GPE":  # GPE = geopolitical entity (countries, cities, states)
            tags.add("Place")
    return tags
```
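To make these tags persistent rather than just a return value, one option is a dedicated `Tag` model with a many-to-many relation to `Blog`. Both the model and the relation below are hypothetical additions to the earlier models, shown here only as a sketch:

```python
# Hypothetical tagging model; assumes Blog gains a
# `tags = models.ManyToManyField("Tag", blank=True)` field.
class Tag(models.Model):
    name = models.CharField(max_length=50, unique=True)


def apply_auto_tags(blog_id):
    blog = Blog.objects.get(id=blog_id)
    for name in auto_tag(blog_id):
        tag, _ = Tag.objects.get_or_create(name=name)
        blog.tags.add(tag)
```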
5. Sentiment Analysis
To get a sense of the mood of the content, you can run sentiment analysis on your blog posts.
```python
from spacytextblob.spacytextblob import SpacyTextBlob  # noqa: F401 -- importing registers the pipeline factory


def get_sentiment(blog_id):
    # Add the component only once; calling add_pipe twice raises an error
    if "spacytextblob" not in nlp.pipe_names:
        nlp.add_pipe("spacytextblob")
    blog = Blog.objects.get(id=blog_id)
    doc = nlp(blog.content)
    # spacytextblob 4.x exposes a TextBlob at doc._.blob (older versions use doc._.polarity)
    polarity = doc._.blob.polarity  # ranges from -1 (negative) to 1 (positive)
    if polarity > 0.5:
        return "Positive"
    elif polarity < -0.5:
        return "Negative"
    else:
        return "Neutral"
```
Make sure to install `spacytextblob` with `pip install spacytextblob`.
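You can try the helper interactively with `python manage.py shell`; the module path below is an assumption about where the function lives, not something defined earlier in the post:

```python
# Run inside `python manage.py shell`; the `blog.nlp_utils` module path is hypothetical
from blog.nlp_utils import get_sentiment

mood = get_sentiment(1)  # returns "Positive", "Negative", or "Neutral"
print(mood)
```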
6. Search Optimization using Lemmatization
Enhance your blog platform’s search by using lemmatization, which reduces words to their base form.
```python
from django.db.models import Q


def enhanced_search(query):
    doc = nlp(query)
    # Reduce the query to its content-bearing lemmas
    lemmas = [token.lemma_ for token in doc if not token.is_stop and not token.is_punct]
    # Match posts containing any of the lemmatized terms
    search_filter = Q()
    for lemma in lemmas:
        search_filter |= Q(content__icontains=lemma)
    return Blog.objects.filter(search_filter) if lemmas else Blog.objects.none()
```
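Note that this only lemmatizes the query; the stored posts remain in their original inflected form, so “run” will still not match “running” in the database. A common next step, sketched below under the assumption that `Blog` gains a hypothetical `content_lemmas = models.TextField(blank=True)` field, is to store a lemmatized copy of each post and search against that instead:

```python
# Sketch assuming a hypothetical `content_lemmas` TextField on Blog
from django.db.models import Q


def store_lemmas(blog):
    doc = nlp(blog.content)
    blog.content_lemmas = ' '.join(token.lemma_.lower() for token in doc if not token.is_punct)
    blog.save(update_fields=["content_lemmas"])


def search_lemmas(query):
    doc = nlp(query)
    lemmas = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]
    search_filter = Q()
    for lemma in lemmas:
        search_filter |= Q(content_lemmas__icontains=lemma)
    return Blog.objects.filter(search_filter) if lemmas else Blog.objects.none()
```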
Conclusion
As we’ve seen, combining Django with NLP libraries like `spaCy` can enrich web applications by adding capabilities like named entity recognition, content summarization, sentiment analysis, and more. By integrating such features, you can make your Django applications more dynamic, insightful, and user-friendly. With the vast landscape of NLP techniques available, the possibilities are limitless!