obtain all johnny depp movies from imdb and filter them by raiting in python

To obtain all Johnny Depp movies from IMDb and filter them by rating in Python, we can use web scraping techniques along with the IMDb website.

First, we need to import the necessary libraries such as requests and BeautifulSoup. We then need to send a GET request to IMDb's search page with Johnny Depp's name as the query parameter. We then parse the HTML content of the search page using BeautifulSoup to obtain the list of search results which contains all Johnny Depp movies.

Next, we need to filter out all irrelevant search results that are not movies. We can do this by checking the result type for each search result.

Finally, we can sort the list of Johnny Depp movies by ratings and print out the top-rated movies. Here is an example code snippet that achieves this:

main.py
import requests
from bs4 import BeautifulSoup


def get_johnny_depp_movies():
    url = 'https://www.imdb.com/find'
    params = {'q': 'Johnny Depp', 's': 'tt'}
    response = requests.get(url, params=params)
    soup = BeautifulSoup(response.content, 'html.parser')
    search_results = soup.find_all('div', {'class': 'findResult'})
    movies = []
    for result in search_results:
        result_type = result.find('span', {'class': 'findSearchType'}).text
        if 'Actor' not in result_type and 'Actress' not in result_type:
            continue
        link = result.find('a')['href']
        title = result.find('a').text
        movies.append({'title': title, 'link': link})
    return movies


def filter_by_rating(movies, min_rating):
    for movie in movies:
        link = 'https://www.imdb.com' + movie['link']
        response = requests.get(link)
        soup = BeautifulSoup(response.content, 'html.parser')
        rating = soup.find('span', {'itemprop': 'ratingValue'}).text
        if float(rating) >= min_rating:
            movie['rating'] = rating
        else:
            movies.remove(movie)
    return sorted(movies, key=lambda x: x['rating'], reverse=True)


johnny_depp_movies = get_johnny_depp_movies()
top_rated_movies = filter_by_rating(johnny_depp_movies, 8.0)

for movie in top_rated_movies:
    print(movie['title'], movie['rating'])
1370 chars
40 lines

In this example code snippet, we define two functions:

  • get_johnny_depp_movies: sends a GET request to IMDb's search page for Johnny Depp and obtains the list of all search results that are movies.
  • filter_by_rating: filters out all movies with a rating less than the given min_rating parameter.

We then call these two functions to obtain and filter the list of Johnny Depp movies. We then sort the filtered movies in descending order based on their rating and print out the top-rated movies along with their ratings.

gistlibby LogSnag