Code to wrangle data from the Johns Hopkins coronavirus website by country in Python

To wrangle data from the Johns Hopkins Coronavirus website in Python, we scrape the page, pull the table out of the HTML, and clean it up with pandas. The example below uses the state-level new-cases page. Here are the steps:

  1. First, install the required libraries: requests, beautifulsoup4 (imported as bs4), and pandas. Run the command in a terminal; inside a Jupyter notebook cell, prefix it with !.
terminal
pip install requests beautifulsoup4 pandas
  2. Next, send a GET request to the Johns Hopkins Coronavirus website with the requests library and keep the HTML content of the response.
main.py
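import requests

# State-level page used in this example; swap in another page as needed.
url = "https://coronavirus.jhu.edu/data/new-cases-50-states"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on a 4xx/5xx response
content = response.content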
  3. After that, parse the HTML with BeautifulSoup and extract the header cells and data rows of the first table on the page.
main.py
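from bs4 import BeautifulSoup

soup = BeautifulSoup(content, "html.parser")
table = soup.find("table")
if table is None:
    # Can happen, for example, when the table is rendered by JavaScript.
    raise ValueError("No <table> element found in the downloaded HTML")

# Column names come from the header cells.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Keep only rows that contain data cells; header rows have no <td>.
data = []
for tr in table.find_all("tr"):
    row = [td.get_text(strip=True) for td in tr.find_all("td")]
    if row:
        data.append(row)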
  4. Once the data is extracted, load it into a pandas DataFrame and wrangle it as needed. The column names "State" and "New Cases" used here must match the headers actually found in the table.
main.py
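import pandas as pd

df = pd.DataFrame(data, columns=headers)
df = df.dropna()
# Counts are scraped as strings such as "1,234"; strip the separators before casting.
df["New Cases"] = df["New Cases"].str.replace(",", "", regex=False).astype(int)
# Sum only the numeric column, then sort from most to fewest new cases.
df = df.groupby("State")[["New Cases"]].sum().sort_values(by="New Cases", ascending=False)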
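
The parsing step can be tried without any network access by wrapping it in a small function and feeding it a hand-written HTML string. This is a minimal sketch: the function name table_to_rows and the sample markup are hypothetical, and the figures in the sample are invented, not taken from the Johns Hopkins site.

```python
from bs4 import BeautifulSoup


def table_to_rows(html):
    """Return (headers, rows) for the first <table> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element in the HTML")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows have no <td> cells, so they are skipped
            rows.append(cells)
    return headers, rows


# Made-up sample markup; not real page content or real figures.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>New Cases</th></tr>
  <tr><td>Alpha</td><td>1,200</td></tr>
  <tr><td>Beta</td><td>300</td></tr>
</table>
"""

headers, rows = table_to_rows(SAMPLE_HTML)
print(headers)
print(rows)
```

Testing against a fixed string like this makes it easier to tell a parsing bug apart from a change in the live page.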

The four steps above produce a pandas DataFrame of total new cases per state, sorted from highest to lowest. The example URL covers US states; to wrangle by country instead, point url at a page whose table has a country column and group by that column rather than "State".
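
As a rough illustration of the by-country variant, the sketch below aggregates scraped-style rows per country. The helper name total_by_country, the column names "Country" and "New Cases", and the sample rows are all assumptions made for this example; the numbers are invented and the real table may use different headers.

```python
import pandas as pd


def total_by_country(rows, headers, country_col="Country", cases_col="New Cases"):
    """Sum case counts per country and return them largest first."""
    df = pd.DataFrame(rows, columns=headers)
    # Scraped cells are strings such as "1,200"; strip separators, then
    # turn anything non-numeric into NaN instead of raising.
    df[cases_col] = pd.to_numeric(
        df[cases_col].str.replace(",", "", regex=False), errors="coerce"
    )
    df = df.dropna(subset=[cases_col])
    totals = df.groupby(country_col)[cases_col].sum()
    return totals.sort_values(ascending=False).astype(int)


# Illustrative rows only; hypothetical countries and invented numbers.
sample_headers = ["Country", "Region", "New Cases"]
sample_rows = [
    ["Freedonia", "North", "1,200"],
    ["Freedonia", "South", "300"],
    ["Sylvania", "East", "2,000"],
    ["Sylvania", "West", "n/a"],
]

totals = total_by_country(sample_rows, sample_headers)
print(totals)
```

Selecting the case column before calling sum keeps text columns such as "Region" out of the aggregation, and errors="coerce" lets placeholder cells like "n/a" drop out rather than stop the script.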
