parse a table on a wikipedia page in python

To parse a table on a Wikipedia page in Python, you can fetch the page's HTML with the requests library, parse it with BeautifulSoup, and then locate the table element by its CSS class or other attributes. Here is an example of how to extract a table from a Wikipedia page:

main.py
import requests
from bs4 import BeautifulSoup

# the url of the wikipedia page with the table
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)'

# send a GET request to the page and get its HTML content
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
html_content = response.content

# parse the HTML content with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# locate the table element by its class attribute; a multi-class string like
# this matches only a class attribute that is exactly 'wikitable sortable'
table = soup.find('table', {'class': 'wikitable sortable'})

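# extract the data from the table rows (find() returns None if nothing matched)
if table is None:
    raise SystemExit('table not found')
for row in table.find_all('tr'):
    cells = row.find_all('td')
    # header rows have no <td> cells; skip any row too short to index safely
    if len(cells) >= 3:
        # extract the data from the cells and process it as needed
        country = cells[1].text.strip()
        population = cells[2].text.strip()
        print(country, population)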

In this example, we download the Wikipedia page that lists countries by population, locate its table, and loop through the table's rows to read each cell. BeautifulSoup's find() method locates the table element by its class attribute, wikitable sortable. Finally, we take the country name and population from the second and third cells of each row, respectively, and print them.
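
As one possible variation on the example above, the sketch below wraps the same steps in a function that works on an HTML string, so it can be tried without a network request. The function name parse_population_table, the sample markup, and the country names and figures in it are made up for illustration and are not taken from the real page. It uses select_one() with a CSS selector, which matches the wikitable class even when the table carries additional classes, and it assumes the "process it as needed" step means stripping bracketed footnote markers and converting the population to an integer.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded Wikipedia page.
# The names and numbers are invented for illustration only.
SAMPLE_HTML = """
<table class="wikitable sortable extra-class">
  <tr><th>Rank</th><th>Country</th><th>Population</th></tr>
  <tr><td>1</td><td>Exampleland</td><td>1,000,000</td></tr>
  <tr><td>2</td><td>Samplestan[a]</td><td>2,500,000</td></tr>
  <tr><td colspan="3">Notes row with too few cells</td></tr>
</table>
"""


def parse_population_table(html):
    """Return (country, population) pairs from the first wikitable in html."""
    soup = BeautifulSoup(html, 'html.parser')
    # The selector tests each class separately, so extra classes are fine.
    table = soup.select_one('table.wikitable')
    if table is None:
        return []
    results = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) < 3:
            continue  # header rows and rows with too few cells
        # Drop bracketed footnote markers such as "[a]" from the name.
        country = re.sub(r'\[[^\]]*\]', '', cells[1].get_text()).strip()
        # Keep only the digits so "1,000,000" becomes 1000000.
        digits = re.sub(r'[^\d]', '', cells[2].get_text())
        if digits:
            results.append((country, int(digits)))
    return results


print(parse_population_table(SAMPLE_HTML))
```

To run it against the live page, pass response.content from the earlier example instead of SAMPLE_HTML. The column positions (second and third cells) are carried over from that example and may need adjusting if the page layout differs.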
