find emojis in a string using regex in python

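To find emojis in a string using regex in Python, split the text into grapheme clusters with the \X pattern and keep the clusters that the emoji package recognizes as emojis. The \X pattern is supported by the third-party regex module (pip install regex), not by the standard-library re module. Here's an example code snippet that demonstrates how to do it: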

main.py
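import regex  # third-party module (pip install regex); the standard re module does not support \X
import emoji  # third-party module (pip install emoji)

text = "I love ❤️ emoji 😂"
# \X matches one extended grapheme cluster, so multi-code-point emojis stay whole
clusters = regex.findall(r'\X', text)
# emoji.is_emoji (emoji 2.x) replaces the emoji.UNICODE_EMOJI dictionary, which newer releases no longer provide
emojis_list = [c for c in clusters if emoji.is_emoji(c)]

print(emojis_list)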

Output:

['❤️', '😂']

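In the above code snippet, we first import the necessary modules: a regular expression module that supports \X (the third-party regex module; the standard-library re module does not) and emoji. We then define a test string containing some emojis.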

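To find all the emojis in the string, we first call findall with the pattern \X. This pattern matches one extended grapheme cluster, that is, one user-perceived character, which may consist of several Unicode code points. Every letter, space, and emoji in the string therefore comes back as a separate match.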
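To see why matching one code point at a time is not enough for emojis, the following minimal sketch uses only the standard library to show that a single visible emoji can span several code points. The sample string and variable names are illustrative assumptions, not part of the original snippet.

```python
import re
import unicodedata

# Red heart emoji written as escapes: a base symbol followed by a variation selector
heart = "\u2764\ufe0f"

# The standard re module has no \X, so '.' matches one code point at a time
code_points = re.findall(r'.', heart)

print(len(code_points))  # 2
for cp in code_points:
    # Print each code point with its official Unicode name
    print(f"U+{ord(cp):04X}", unicodedata.name(cp))
```

A pattern that works per code point would return the heart and its variation selector as two separate matches, which is why the snippet above matches whole clusters instead.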

However, this will include non-emoji characters, such as letters and spaces, that we need to filter out. For that, we use a list comprehension that keeps only the clusters the emoji package recognizes as emojis. Older releases of the package exposed the emoji.UNICODE_EMOJI['en'] dictionary for this check; in emoji 2.0 and later it is replaced by the emoji.EMOJI_DATA dictionary and the emoji.is_emoji function.

Finally, we print the list of emojis found in the string.

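Note that this method may not work for all emojis. Because \X matches whole grapheme clusters, emojis composed of multiple Unicode code points, such as skin-tone variants or sequences joined with a zero-width joiner, are normally matched as a single unit, but the result depends on the Unicode version that the regex module implements. The filter also recognizes only the emojis known to the installed version of the emoji package. Where \X is not available, you need a more complex regex pattern to match the entire emoji sequence.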
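As one possible shape for such a more complex pattern, here is a minimal sketch for the standard re module that matches a base emoji, optional modifiers, and any further emojis attached with a zero-width joiner. The code-point ranges and the names BASE, MODS, ZWJ, and EMOJI_SEQUENCE are assumptions chosen for illustration; the ranges are approximate and do not cover the full Unicode emoji set (flags and keycap sequences, for example, are left out).

```python
import re

# Approximate building blocks; these ranges are illustrative, not the full emoji set
BASE = r'[\U0001F300-\U0001FAFF\u2600-\u27BF]'  # most pictographs, misc symbols, dingbats
MODS = r'[\uFE0F\U0001F3FB-\U0001F3FF]*'        # optional variation selector / skin-tone modifier
ZWJ = '\u200D'                                  # zero-width joiner

# One emoji element, optionally followed by more elements joined with ZWJ
EMOJI_SEQUENCE = re.compile(f'{BASE}{MODS}(?:{ZWJ}{BASE}{MODS})*')

# "I love <red heart> emoji <face with tears of joy> and <family: man, woman, girl>"
text = "I love \u2764\ufe0f emoji \U0001F602 and \U0001F468\u200D\U0001F469\u200D\U0001F467"

matches = EMOJI_SEQUENCE.findall(text)
print(matches)       # three matches; the family sequence stays in one piece
print(len(matches))  # 3
```

A hand-written pattern like this needs updating whenever new emojis are added to Unicode, so matching clusters with \X and filtering with the emoji package is usually easier to maintain.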

gistlib by LogSnag