Elevate Your Django Projects with Web Scraping Techniques
Web scraping is the process of extracting information from websites. This is often done to collect data from websites that do not offer APIs or other forms of structured data access. With Python being a prominent language for web scraping, it’s no surprise that Django, a popular Python web framework, can be integrated with web scraping tools for more dynamic applications.
In this blog post, we will explore how to combine Django with web scraping, creating a simple application that fetches and stores data.
1. Tools and Libraries
To achieve our goal, we will use the following libraries:
– Django: Our web framework.
– BeautifulSoup: A library to parse HTML and XML documents.
– Requests: A library to make HTTP requests.
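If Requests and BeautifulSoup are not already available in your environment, install them with pip (BeautifulSoup is published on PyPI as beautifulsoup4):

```bash
pip install requests beautifulsoup4
```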
2. Setting Up the Project
Let’s start by creating a new Django project:
```bash
django-admin startproject webscraping_project
cd webscraping_project
```
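The model, views, URLs, and template that follow live inside a Django app. The app name used in the snippets below, `scraper`, is only an assumption; pick any name you like:

```bash
python manage.py startapp scraper
```

After creating the app, add `'scraper'` to `INSTALLED_APPS` in `webscraping_project/settings.py` so Django picks up its models and templates.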
3. Building the Model
We will scrape article titles and links from a sample blog. Our Django model will represent these articles:
```python
from django.db import models


class Article(models.Model):
    title = models.CharField(max_length=300)
    link = models.URLField()

    def __str__(self):
        return self.title
```
Run migrations after creating this model:
```bash
python manage.py makemigrations
python manage.py migrate
```
4. Web Scraping Logic
Let’s create a function to extract article titles and links. For this example, we’ll use a hypothetical blog.
```python
import requests
from bs4 import BeautifulSoup


def fetch_articles_from_blog():
    URL = "https://sample-blog-website.com/"
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, "html.parser")

    articles = []
    # Assuming articles are within 'article' tags and have 'a' tags for links
    for article in soup.find_all("article"):
        title = article.find("h2").text
        link = article.find("a")["href"]
        articles.append({
            "title": title,
            "link": link
        })

    return articles
```
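Real pages are messier than this sketch. A slightly more defensive version of the same function, still based on the assumed `article`/`h2`/`a` structure and the hypothetical URL, adds a timeout, raises on HTTP errors, and resolves relative links:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_articles_from_blog(url="https://sample-blog-website.com/"):
    # Fail fast on slow servers and HTTP error responses
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, "html.parser")
    articles = []
    for article in soup.find_all("article"):
        title_tag = article.find("h2")
        link_tag = article.find("a")
        # Skip entries that don't match the assumed structure
        if not title_tag or not link_tag or not link_tag.get("href"):
            continue
        articles.append({
            "title": title_tag.get_text(strip=True),
            # Resolve relative links against the blog's base URL
            "link": urljoin(url, link_tag["href"]),
        })
    return articles
```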
5. Integrating with Django
With the web scraping function ready, let’s integrate it with Django:
5.1. Create a View
This view will scrape the articles and save them to the database. It assumes the `fetch_articles_from_blog` helper is importable from within the app (in the snippet below it lives in a hypothetical `scraping.py` module).
```python
from django.shortcuts import render, redirect

from .models import Article
# The scraping helper defined earlier; its module path is an assumption
from .scraping import fetch_articles_from_blog


def fetch_and_store_articles(request):
    articles_data = fetch_articles_from_blog()
    for article_data in articles_data:
        # Avoid duplicate entries by checking whether the title already exists
        if not Article.objects.filter(title=article_data["title"]).exists():
            Article.objects.create(title=article_data["title"], link=article_data["link"])
    return redirect("/")
```
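An equivalent, slightly more idiomatic loop uses Django's `get_or_create`, which performs the existence check and the insert in one call; this sketch assumes the same `articles_data` structure:

```python
for article_data in articles_data:
    # Create the article only if no row with this title exists yet
    Article.objects.get_or_create(
        title=article_data["title"],
        defaults={"link": article_data["link"]},
    )
```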
5.2. URL Configuration
Create a URL for the view:
```python
from django.urls import path

from . import views

urlpatterns = [
    path('fetch/', views.fetch_and_store_articles, name="fetch_articles"),
]
```
5.3. Displaying the Articles
Create a template to display the fetched articles, saved in the app's templates directory under the name the view below references (`template_name.html`):
```html
{% for article in articles %}
    <h2><a href="{{ article.link }}">{{ article.title }}</a></h2>
{% endfor %}
```
And the corresponding view:
```python
def display_articles(request):
    articles = Article.objects.all()
    return render(request, "template_name.html", {"articles": articles})
```
Add the URL configuration:
```python
path('', views.display_articles, name="display_articles"),
```
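For both routes to resolve, the app's `urls.py` must be included from the project-level URL configuration. A minimal sketch, assuming the app is named `scraper`:

```python
# webscraping_project/urls.py
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path("admin/", admin.site.urls),
    path("", include("scraper.urls")),
]
```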
Conclusion
This example demonstrates how Django can be integrated with web scraping to fetch and store data. Combining Django’s ORM with Python’s web scraping libraries makes for a powerful toolkit for data extraction and presentation.
Remember to always respect the `robots.txt` file of websites when scraping and ensure you are not violating any terms of service.
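Python's standard library includes `urllib.robotparser` for exactly this check; a minimal sketch against the hypothetical blog used above:

```python
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://sample-blog-website.com/robots.txt")
robots.read()

# Only scrape if the site permits generic crawlers on the front page
if robots.can_fetch("*", "https://sample-blog-website.com/"):
    articles = fetch_articles_from_blog()
```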
With a growing number of applications requiring dynamic data access, combining Django and web scraping opens a realm of possibilities. Whether you’re building a personal project or a data-intensive application, this integration can be a valuable asset.