Extract statistical data from INEGI in Python

To extract statistical data from INEGI (Mexico's National Institute of Statistics and Geography) using Python, first identify the dataset you need and find the URL of the page that publishes it. With the URL in hand, download the page with the requests library, parse the HTML with BeautifulSoup, and load the table into pandas to clean, manipulate, and analyze the data.

Here is an example that extracts population data from INEGI's website:

main.py
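from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the population data from INEGI
url = "https://www.inegi.org.mx/app/tabulados/default.aspx?nc=est&pf=proy&ef=05&type=div&anio=2021&bas=1"

# Send a GET request to the webpage and stop early on an HTTP error
response = requests.get(url, timeout=30)
response.raise_for_status()

# Parse the HTML content of the page
soup = BeautifulSoup(response.content, 'html.parser')

# Take the first table on the page (raises IndexError if the page has no table)
table = soup.find_all('table')[0]

# Convert the table into a pandas DataFrame
# (recent pandas versions expect a file-like object rather than a raw HTML string)
df = pd.read_html(StringIO(str(table)))[0]

# Clean and manipulate the data
# The row and column positions below depend on the page layout; check them against the actual table
df = df.iloc[2:-2]                          # Remove the two leading and two trailing non-data rows
df = df.drop(df.columns[[1, 3]], axis=1)    # Drop the second and fourth columns by position
df.columns = ['State', 'Population']        # Rename columns
# Remove commas and convert to int (assumes the values were read as text such as "1,234,567")
df['Population'] = df['Population'].str.replace(",", "", regex=False).astype(int)

# Analyze the data
total_population = df['Population'].sum()
average_population = df['Population'].mean()
print("Total population:", total_population)
print("Average population per state:", average_population)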

In this example, we request the INEGI page with the population data for Mexican states in 2021 and use BeautifulSoup to locate the first table on the page. We then convert the table into a pandas DataFrame and clean it: we drop the rows and columns that hold no data, rename the remaining columns, and strip the thousands separators so that the population column can be converted from string to integer. Finally, pandas functions such as sum and mean give the total and average population. The row and column positions used in the cleaning step are tied to the layout of the page, so verify them against the table you actually receive.
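The cleaning step can also be tried offline. The sketch below is a minimal illustration and not INEGI's actual markup: SAMPLE_HTML is a made-up table with placeholder figures, and table_to_frame is a hypothetical helper name. Instead of slicing rows by position, it keeps only the rows whose second cell parses as a number, which may hold up better if title or footnote rows are added or removed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page.
# The figures are placeholders, not INEGI data.
SAMPLE_HTML = """
<table>
  <tr><td>Population by state (sample)</td><td></td></tr>
  <tr><td>State</td><td>Population</td></tr>
  <tr><td>State A</td><td>1,200,000</td></tr>
  <tr><td>State B</td><td>850,500</td></tr>
  <tr><td>Source: placeholder</td><td></td></tr>
</table>
"""


def table_to_frame(html: str) -> pd.DataFrame:
    """Parse the first <table> into a State/Population frame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the HTML")

    # Collect the text of the first two cells of every row.
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            rows.append(cells[:2])

    df = pd.DataFrame(rows, columns=["State", "Population"])

    # Strip thousands separators; non-numeric cells (titles, notes) become NaN.
    df["Population"] = pd.to_numeric(
        df["Population"].str.replace(",", "", regex=False), errors="coerce"
    )

    # Keep only the rows that actually hold a number.
    df = df.dropna(subset=["Population"]).reset_index(drop=True)
    df["Population"] = df["Population"].astype(int)
    return df


df = table_to_frame(SAMPLE_HTML)
print(df)
print("Total:", df["Population"].sum())
print("Mean:", df["Population"].mean())
```

To use the same helper on a live page, pass it the text of a requests response instead of SAMPLE_HTML. The two-column assumption would need adjusting for tables with a different shape.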
