extract statistical data from inegi in python

To extract statistical data from INEGI (Mexico's National Institute of Statistics and Geography) using Python, we first need to identify the specific data we are interested in and find the URL of the page that publishes it. Once we have the URL, we can download the page with the requests library, use the BeautifulSoup library to locate the table in the HTML, and use the pandas library to clean, manipulate, and analyze the data.

Here is example code that extracts population data from INEGI's website:


main.py
from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the population data from INEGI
url = "https://www.inegi.org.mx/app/tabulados/default.aspx?nc=est&pf=proy&ef=05&type=div&anio=2021&bas=1"

# Send a GET request to the webpage and stop early on an HTTP error
reqs = requests.get(url, timeout=30)
reqs.raise_for_status()

# Parse the HTML content of the page
soup = BeautifulSoup(reqs.content, 'html.parser')

# Take the first table on the page
table = soup.find_all('table')[0]

# Convert the table into a pandas DataFrame
# (newer pandas versions deprecate passing a literal HTML string, so wrap it in StringIO)
df = pd.read_html(StringIO(str(table)))[0]

# Clean and manipulate the data (row and column positions depend on the page layout)
df = df.iloc[2:]    # Drop the first two rows
df = df[:-2]        # Drop the last two rows
df = df.drop([1, 3], axis=1)     # Drop unnecessary columns
df.columns = ['State', 'Population']      # Rename columns
df['Population'] = df['Population'].str.replace(",", "").astype(int)     # Remove commas and convert to int

# Analyze the data
total_population = df['Population'].sum()
average_population = df['Population'].mean()
print(total_population, average_population)

In this example, we take the URL of the 2021 population data for Mexican states, send a GET request to the page, and use BeautifulSoup to pick out the table that holds the population figures. We then convert the table into a pandas DataFrame and clean it: we drop the rows and columns that carry no data, rename the remaining columns, and convert the population column from strings to integers after removing the comma thousands separators. The row and column positions in the cleaning step depend on the layout of the page, so they need to be checked against the table that is actually returned. Finally, we analyze the data with pandas functions such as sum and mean.
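The cleaning steps can be tried without a network connection by running them on a small hand-made table. The sketch below is only an illustration: the function name clean_population_table is hypothetical, the four-column layout with two leading and two trailing rows is an assumption that mirrors the positions used in main.py, and the values are placeholders, not INEGI figures.

```python
import pandas as pd

# Placeholder rows standing in for a scraped table; these are NOT INEGI figures.
# Assumed layout: 2 leading rows, 3 data rows, 2 trailing rows, 4 columns.
raw = pd.DataFrame(
    [
        ["Title row", None, None, None],
        ["Entity", "Code", "Total", "Note"],
        ["State A", "01", "1,000", ""],
        ["State B", "02", "2,500", ""],
        ["State C", "03", "4,000", ""],
        ["Source: placeholder", None, None, None],
        ["Footnote: placeholder", None, None, None],
    ]
)


def clean_population_table(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps from main.py to a raw table (hypothetical helper)."""
    df = df.iloc[2:-2]                      # drop the first two and last two rows
    df = df.drop(columns=[1, 3])            # keep only the name and total columns
    df.columns = ["State", "Population"]    # rename the remaining columns
    df = df.reset_index(drop=True)          # renumber rows starting from 0
    # Strip the comma thousands separators, then convert the strings to integers
    df["Population"] = (
        df["Population"].str.replace(",", "", regex=False).astype(int)
    )
    return df


cleaned = clean_population_table(raw)
print(cleaned["Population"].sum())    # 7500
print(cleaned["Population"].mean())   # 2500.0
```

Putting the cleaning logic in a function makes it easy to adjust the row and column positions if the real page turns out to have a different layout.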
