Master Text Processing in Django with Simple NLP Techniques

Django, a high-level Python web framework, and Natural Language Processing (NLP), the study of computational methods for analyzing and generating human language, might seem like an odd couple at first. However, when a web application handles textual data, pairing Django with NLP lets you add features such as entity extraction, summarization, tagging, sentiment analysis, and better search. In this blog post, we will explore some ways Django can work in tandem with NLP techniques to handle and process textual data.

1. Setting Up

Before we dive in, ensure you have Django set up. If not, you can easily install it using `pip`:

```bash
pip install django
```

For our NLP tasks, we’ll use the `spaCy` library. Install it and its English model with:

```bash
pip install spacy
python -m spacy download en_core_web_sm
```

2. Text Extraction from User Input

Suppose you’re building a blog platform with Django. Your models might look something like this:

```python
from django.db import models

class Blog(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
```

Users will submit text that you may want to process, for example by extracting named entities such as people, organizations, or places.

```python
import spacy
from .models import Blog

nlp = spacy.load("en_core_web_sm")

def extract_named_entities(blog_id):
    blog = Blog.objects.get(id=blog_id)
    doc = nlp(blog.content)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities
```

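This function takes a blog’s ID, retrieves its content, and returns the named entities as a list of `(text, label)` tuples, such as `("Apple", "ORG")`. The model is loaded once at module level because loading it is relatively expensive and should not happen on every call.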
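The entity list is often easier to use once it is grouped by label. The following is a minimal sketch of one way to do that. `group_entities` is a hypothetical helper, not part of spaCy or Django, and it works on plain tuples so it can be tried without a database or a loaded model.

```python
from collections import defaultdict

def group_entities(entities):
    """Group (text, label) tuples by label, dropping duplicate texts."""
    grouped = defaultdict(set)
    for text, label in entities:
        grouped[label].add(text)
    # Sort each group so the output is stable and readable
    return {label: sorted(texts) for label, texts in grouped.items()}

# Hand-written sample in the same shape that extract_named_entities returns
sample = [
    ("Apple", "ORG"),
    ("Tim Cook", "PERSON"),
    ("Apple", "ORG"),
    ("Cupertino", "GPE"),
]
print(group_entities(sample))
```

In a view, you could pass the result of `extract_named_entities(blog_id)` straight into this helper and render one list per label.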

3. Automatic Content Summarization

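Let’s say you want to provide a concise summary for each blog post. One way is extractive summarization, which selects the most salient sentences from the content instead of generating new text. The function below scores each sentence by counting the words in it that are not stop words, then keeps the five highest-scoring sentences.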

```python
# Reuses `nlp` and `Blog` from the previous snippet

def summarize_content(blog_id, max_sentences=5):
    blog = Blog.objects.get(id=blog_id)
    doc = nlp(blog.content)

    # Score each sentence by how many content words it contains,
    # ignoring stop words, punctuation, and whitespace tokens.
    # token.is_stop is case-insensitive, unlike a raw STOP_WORDS lookup.
    sentence_strength = {}
    for sent in doc.sents:
        score = sum(
            1 for token in sent
            if not (token.is_stop or token.is_punct or token.is_space)
        )
        if score:
            sentence_strength[sent] = score

    # Keep the strongest sentences, then restore their original order
    strongest = sorted(sentence_strength, key=sentence_strength.get, reverse=True)[:max_sentences]
    strongest.sort(key=lambda sent: sent.start)
    return ' '.join(sent.text.strip() for sent in strongest)
```
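Counting content words tends to favor long sentences. A common alternative is to weight each word by how often it occurs in the whole text. The sketch below illustrates that variant on plain strings so it runs without spaCy. The regex-based sentence splitting and the tiny `STOP` set are simplifying assumptions for illustration, and `summarize_text` is a hypothetical name.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); spaCy's STOP_WORDS is far larger
STOP = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it", "for", "with"}

def content_words(sentence):
    """Lowercased words of a sentence, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP]

def summarize_text(text, max_sentences=2):
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # How often each content word occurs in the whole text
    frequencies = Counter(w for s in sentences for w in content_words(s))

    # A sentence's score is the summed frequency of its content words
    scores = [sum(frequencies[w] for w in content_words(s)) for s in sentences]

    # Rank by score (ties go to the earlier sentence), then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

text = "Django is a web framework. Django works well with spaCy. Cats sleep."
print(summarize_text(text, max_sentences=1))
```

The same scoring idea carries over to spaCy by iterating over `doc.sents` and using `token.is_stop` in place of the hand-written stop-word set.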

4. Content Tagging based on Named Entities

One can automatically tag blog content based on the named entities present in the text. For instance, if “Apple” is mentioned, it could be tagged as “Company”.

```python
def auto_tag(blog_id):
    entities = extract_named_entities(blog_id)
    tags = set()

    for entity, label in entities:
        if label == "ORG":
            tags.add("Company")
        elif label == "PERSON":
            tags.add("Person")
        elif label == "GPE":
            tags.add("Place")
    
    return tags
```
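As the number of labels grows, a lookup table can be easier to maintain than a chain of `elif` branches. Here is a minimal sketch of that idea. The labels are ones spaCy's English models emit, while the tag names and the `tags_from_entities` helper are illustrative assumptions. Note that `ORG` also covers universities, agencies, and similar bodies, so “Company” is only an approximation.

```python
# Map spaCy entity labels to display tags (tag names are illustrative choices)
LABEL_TO_TAG = {
    "ORG": "Company",
    "PERSON": "Person",
    "GPE": "Place",
    "LOC": "Place",
    "EVENT": "Event",
}

def tags_from_entities(entities):
    """Return the set of tags for a list of (text, label) tuples."""
    return {LABEL_TO_TAG[label] for _, label in entities if label in LABEL_TO_TAG}

# Hand-written sample; labels without a mapping (here CARDINAL) are ignored
sample_entities = [("Apple", "ORG"), ("Paris", "GPE"), ("the Alps", "LOC"), ("42", "CARDINAL")]
print(tags_from_entities(sample_entities))
```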

5. Sentiment Analysis

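To gauge the mood of a post, you can run sentiment analysis on its content. The example below uses the `spacytextblob` extension, which reports a polarity score between -1.0 (negative) and 1.0 (positive), and treats anything beyond ±0.5 as positive or negative.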

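```python
# Importing the module registers the 'spacytextblob' factory with spaCy
from spacytextblob.spacytextblob import SpacyTextBlob

# Add the component once at import time. Calling add_pipe inside the
# function would fail on the second call because the pipe already exists.
if not nlp.has_pipe('spacytextblob'):
    nlp.add_pipe('spacytextblob')

def get_sentiment(blog_id):
    blog = Blog.objects.get(id=blog_id)
    doc = nlp(blog.content)

    # Assumes spacytextblob 4.x, which exposes the TextBlob object as
    # doc._.blob; older releases used different attribute names.
    polarity = doc._.blob.polarity

    if polarity > 0.5:
        return "Positive"
    elif polarity < -0.5:
        return "Negative"
    else:
        return "Neutral"
```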

Make sure to install `spacytextblob` with `pip install spacytextblob`.
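With a cutoff of ±0.5, mildly worded posts may all end up labeled “Neutral”. Keeping the thresholding in its own function makes the cutoff easy to tune and to test without loading a model. The sketch below shows one way to do that. `label_polarity` and its `threshold` parameter are hypothetical names, not part of `spacytextblob`.

```python
def label_polarity(polarity, threshold=0.5):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

print(label_polarity(0.8))                  # Positive
print(label_polarity(0.2))                  # Neutral
print(label_polarity(0.2, threshold=0.1))   # Positive
```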

6. Search Optimization using Lemmatization

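Enhance your blog platform’s search by using lemmatization, which reduces words to their base form (for example, “running” becomes “run”), so that a query can match different inflections of the same word.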

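```python
from django.db.models import Q

def enhanced_search(query):
    doc = nlp(query)
    lemmas = {
        token.lemma_.lower() for token in doc
        if not (token.is_stop or token.is_punct or token.is_space)
    }
    if not lemmas:
        return Blog.objects.none()

    # Match each lemma separately: the joined phrase ("run shoe") would not
    # be found in unlemmatized content such as "running shoes".
    condition = Q()
    for lemma in lemmas:
        condition &= Q(content__icontains=lemma)
    return Blog.objects.filter(condition)
```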
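One caveat applies to any `icontains` lookup here: a lemmatized phrase rarely appears verbatim in stored content that has not been lemmatized itself. The sketch below uses plain strings instead of a database to compare matching the whole phrase with matching each lemma separately. The helper names are hypothetical, and the `lemmas` list is a hand-written assumption about what lemmatizing “running shoes” would typically produce.

```python
def phrase_match(content, lemmas):
    """True if the lemmas, joined into one phrase, appear verbatim in the content."""
    return " ".join(lemmas).lower() in content.lower()

def per_term_match(content, lemmas):
    """True if every lemma appears somewhere in the content as a substring."""
    text = content.lower()
    return all(lemma.lower() in text for lemma in lemmas)

content = "Reviews of the best running shoes"
lemmas = ["run", "shoe"]  # assumed lemmas for the query "running shoes"

print(phrase_match(content, lemmas))    # False
print(per_term_match(content, lemmas))  # True
```

Per-term substring matching still misses irregular forms: the lemma “mouse” does not occur inside “mice”. Storing a lemmatized copy of each post’s content and searching that copy is one way to close the gap.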

Conclusion

As we’ve seen, combining Django with NLP libraries like `spaCy` can enrich web applications by adding capabilities like named entity recognition, content summarization, sentiment analysis, and more. By integrating such features, you can make your Django applications more dynamic, insightful, and user-friendly. With the vast landscape of NLP techniques available, the possibilities are limitless!
