<sup>Date: September 18, 2024</sup>

It’s hard to remember why, but as a Computer Science student, I was scared of regular expressions. Something about the combination of symbols just made them look ugly and unapproachable to me, I guess. I got the sense that a lot of my peers shared this feeling.

I didn’t really take the time to understand them until I used them regularly in my job; I wish I had learned them earlier. Now that I understand them better, I find use cases all the time in my side projects.

There are also some really cool tools like [Regexr](https://regexr.com) that help you visualize your pattern as you write it, and explain why something does or doesn't match. Regexr is even [open-source](https://github.com/gskinner/regexr/), and you can self-host it if you want!

This post isn’t meant to be comprehensive, and it does assume you have some familiarity with regular expressions already. Rather, I’ll just share a bit about how I use them in my work and side projects.

> [!note] I’m no expert!
> I’m no expert in regular expressions, or anything that I write about on [gabbert.me](https://gabbert.me). I’m just a guy who enjoys writing code and writing about that experience. So if there’s a better or more efficient way of doing things, reach out. I’d love to hear your ideas!

## My general methodology

1. Find a use case for regular expressions. For me, this is most often one of two general use cases:
    - Checking whether strings match a pattern
    - Extracting strings from a larger block of text
2. Gather as much data as you can that you want to match.
3. Throw that data into [Regexr](https://regexr.com/) and start crafting a pattern.
    - This works best if you have true positive and false positive data to test against. For example, maybe you want to match `8.8.8.8` (a valid IP address) but not `123.456.789.123` (not a valid IP address).

Regular expressions *can* be pretty complicated, but they can also be simple and still be incredibly powerful.
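
Step 3 above can also be sketched outside Regexr. This is a minimal Python sketch, not the workflow from this post: the `IPV4` pattern, the `failures` helper, and every sample address other than `8.8.8.8` and `123.456.789.123` are assumptions made up for illustration. It checks a candidate pattern against data that should match and data that should not.

```python
import re

# Hypothetical pattern: four dot-separated octets, each limited to 0-255.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

# Made-up sample data: strings that should and should not match.
should_match = ["8.8.8.8", "192.168.1.1", "255.255.255.255"]
should_not_match = ["123.456.789.123", "256.1.1.1", "1.2.3"]


def failures(pattern, positives, negatives):
    """Return (missed positives, wrongly matched negatives) for a pattern."""
    missed = [s for s in positives if not pattern.fullmatch(s)]
    wrongly_matched = [s for s in negatives if pattern.fullmatch(s)]
    return missed, wrongly_matched


print(failures(IPV4, should_match, should_not_match))  # ([], [])
```

Two empty lists mean the pattern agrees with every sample. Anything that shows up in either list is a hint about which part of the pattern to revisit.
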
## Features I actually use, with examples

### Character classes

Character classes are defined between square brackets.

```regex
[\w-]+
```

This pattern will match one or more (`+`) **w**ord characters (letters, digits, or underscores) or dashes. The dash goes last in the brackets so that it is read as a literal dash rather than as part of a range.

### Negated character sets

Negated character sets are similar, but start with a caret `^` character between the brackets to negate the set.

```regex
[^\\]+
```

This pattern will match one or more (`+`) non-backslash characters. This example can be useful for matching any text between backslashes, such as within a Windows file path.

### Optional characters `?`

A question mark makes the previous character, set, or group optional.

```regex
(\d+\[?\.\]?){3}\d+
```

In this example, the square bracket characters are made optional. When matching IP addresses from a piece of text, they may be defanged so that they aren’t clickable. Then again, they may not be. The question marks make the square brackets, used to defang, optional. This pattern will match `8.8.8.8` (not defanged), `8.8.8[.]8`, `8[.]8[.]8[.]8`, and more.

<sub>Note: I’ve simplified the rest of the expression specifically to emphasize the optional characters. There are better ways to match a valid IP address.</sub>

## Additional practical examples

### Banking/transaction notifications

I’ve created an iOS shortcut that uses Optical Character Recognition (OCR) to get on-screen text and parse it for transaction data that can be entered into my budget spreadsheet. I wrote about this shortcut [[Automating budget spreadsheet entries with iOS Shortcuts|here]].

In my experience, PayPal's iOS purchase notification text can take several forms:

```
PayPal Cashback MasterCard
You made a purchase with your PayPal Cashback Mastercard for $10.00.
```

```
Your purchase with PayPal
You authorized a payment of $10.00. Tap for more details.
```

```
Your purchase at Obsidian.md
Your $10.00 payment was completed. Tap for more details.
```

Using capture groups, we can extract relevant information like the payee and amount of the transaction.

```regex
You(?:r\s+|\s+made\s+a\s+)purchase\s+(?:at|with(?:\s+your)?)\s+([^\n]+)\n*.*\n*.*(?:Your?|for)\s+.*(\$\d+\.\d+)
```

This is what it looks like when testing the pattern in Regexr:

![[PayPal Regexr.png]]

In the bottom pane, you can even see an example of how capture groups are used to extract relevant data from the matched text.

- Capture group `$1` gives us the payee. If the notification doesn’t contain that data, we just get “PayPal” so we can use that condition to prompt for manual entry in my Shortcut.
- Capture group `$2` gives us the amount spent.
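
The same capture groups can be read in code, too. Below is a minimal sketch in Python, assuming the `re` module stands in for the Shortcuts regex action; the `parse_notification` helper and the `strip()` call are hypothetical additions, and the notification text is split into title and body lines as an assumption about where the line break falls.

```python
import re
from typing import Optional, Tuple

# The pattern from the post, unchanged. Group 1 is the payee, group 2 the amount.
PAYPAL_PATTERN = re.compile(
    r"You(?:r\s+|\s+made\s+a\s+)purchase\s+(?:at|with(?:\s+your)?)\s+"
    r"([^\n]+)\n*.*\n*.*(?:Your?|for)\s+.*(\$\d+\.\d+)"
)


def parse_notification(text: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (payee, amount), or None if nothing matches."""
    match = PAYPAL_PATTERN.search(text)
    if match is None:
        return None
    # strip() is an addition here: group 1 can pick up a trailing space
    # when the payee and the amount share a line.
    payee = match.group(1).strip()
    amount = match.group(2)
    return payee, amount


notification = (
    "Your purchase at Obsidian.md\n"
    "Your $10.00 payment was completed. Tap for more details."
)
print(parse_notification(notification))  # ('Obsidian.md', '$10.00')
```

Returning `None` for text that doesn't match keeps the "no data, ask for manual entry" decision with the caller, which mirrors the fallback described for capture group `$1`.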