find urls in a string using regex in python

One way to find URLs in a string in Python is to use the built-in re module. We define a regular expression pattern that matches URLs, then call the module's findall() function to collect every match in the string.

Here's an example code snippet:

main.py
import re

def find_urls(text):
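    # Pattern for http/https URLs. Note that "$-_" inside the brackets is a
    # character range ("$" through "_"), which is what lets "/", ":", "?" and "=" match.
    pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'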

    # search for the pattern in the string using 'findall'
    urls = re.findall(pattern, text)
    
    return urls

In this code, we define a regular expression pattern that matches URLs. The pattern starts with "http" or "https" (http[s]? makes the "s" optional), followed by a colon and two forward slashes. A non-capturing group (?:...) then lists the alternatives allowed for each subsequent character: letters, digits, a set of punctuation characters, and percent-encoded bytes such as %20. Finally, the + quantifier requires one or more of these characters, so the match runs until the first character outside the set, such as a space.
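To make the pieces of the pattern easier to read, here is a sketch of the same expression written with the re.VERBOSE flag, which allows whitespace and comments inside the pattern. The name URL_RE is only illustrative, and the two small rewrites (https? for http[s]?, and {2} for the repeated hex class) are assumed-equivalent shorthand rather than part of the original snippet.

```python
import re

# Illustrative name; the pattern is intended to behave like the one in find_urls.
URL_RE = re.compile(r'''
    https?://                  # scheme: "http://" or "https://"
    (?:                        # one URL character, any of:
        [a-zA-Z]               #   an ASCII letter
      | [0-9]                  #   a digit
      | [$-_@.&+]              #   the range "$" through "_", plus @ . & +
      | [!*\(\),]              #   ! * ( ) ,
      | %[0-9a-fA-F]{2}        #   a percent-encoded byte such as %20
    )+                         # repeated one or more times
''', re.VERBOSE)

print(URL_RE.findall('Read https://www.example.com/a%20b today'))
# prints ['https://www.example.com/a%20b']
```

Compiling the pattern once with re.compile() is optional here, but it keeps the flag and the pattern together if the expression is reused.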

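We then use the findall() function from the re module to find all matches of the pattern in the input text. This function returns a list of all non-overlapping matches, in the order they appear. Because the pattern contains no capturing groups, each item in the list is the full matched URL.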
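The choice of a non-capturing group matters for findall(): when a pattern contains a capturing group, findall() returns the group's contents instead of the whole match. The short sketch below illustrates the difference with a deliberately simplified URL pattern (not the one from find_urls); the sample text and variable names are made up for the example.

```python
import re

text = "Docs: https://docs.python.org/3/ and http://example.com/a"

# Non-capturing groups only: findall returns each whole match.
whole = re.findall(r'https?://(?:[\w.-]+)(?:/\S*)?', text)
print(whole)  # prints ['https://docs.python.org/3/', 'http://example.com/a']

# One capturing group around the host: findall returns only that group.
hosts = re.findall(r'https?://([\w.-]+)(?:/\S*)?', text)
print(hosts)  # prints ['docs.python.org', 'example.com']
```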

You can call this function with a string argument to find all URLs in the string. For example:

main.py
text = 'Check out this cool website: https://www.example.com'
urls = find_urls(text)
print(urls)  # prints ['https://www.example.com']

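Note that this pattern will not match every possible URL, but it covers the most common cases. For example, it only recognizes the http and https schemes, and it stops at characters outside its set, such as # and ~, so a fragment like #section is cut off. It can also match too much: punctuation that directly follows a URL in running text, such as a trailing period or comma, becomes part of the match. To improve accuracy, adapt the pattern to the specific URL format you are looking for.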
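One possible adjustment, sketched below, is to strip punctuation from the end of each match. The helper name find_urls_trimmed and the set of stripped characters are assumptions chosen for illustration; tune them to your input.

```python
import re

# Same pattern as in find_urls above.
URL_PATTERN = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

def find_urls_trimmed(text):
    """Hypothetical helper: find URLs, then drop trailing punctuation."""
    # rstrip removes any run of the listed characters from the end of each match.
    return [url.rstrip('.,;:!?)\'"') for url in re.findall(URL_PATTERN, text)]

text = 'See https://www.example.com/docs, or (http://example.org/a?b=1).'
print(find_urls_trimmed(text))
# prints ['https://www.example.com/docs', 'http://example.org/a?b=1']
```

This is a trade-off rather than a complete fix: a URL that legitimately ends in one of the stripped characters, such as a closing parenthesis, would be shortened as well.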
