What the ^\s+$ does this mean?
If you thought that was swearing, then you haven't yet entered the world of coding with regular expressions.
TL;DR
Stay curious.
Keep learning.
There’s so much out there that we all just don’t know about.
Earning a Super Power
How many times have you said, “I never knew that was possible. That’s amazing!” after being shown a new trick or life hack?
That’s exactly what I said when I learnt about “regular expressions” over 20 years ago, and I’m still amazed by them today.
It’s like a super power.
Regular expressions, more commonly known as regex, are a way of searching for text using a special syntax that describes patterns.
Now, you might think that searching for text in a document is a trivial task that everyone knows how to do. On the whole you’d be correct, but you’ve probably only ever been taught one way to search a body of text.
If no one has ever shown you another way to search, then of course you’d be oblivious to other methods.
With regex, you move away from searching for exact text and towards matching a particular pattern in the text. This is the mindset shift I’m going to show you today.
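Here’s a tiny Python sketch of that shift, using the standard `re` module. The sentence and the four-digit-year pattern are my own illustration, not part of the email example that follows.

```python
import re

text = "The contract was signed in 1999 and renewed in 2021."

# Exact search: only finds the literal text you typed.
print("1999" in text)              # True
print(text.count("1999"))          # 1

# Pattern search: \d{4} means "any four digits in a row",
# so it finds every year without you knowing them in advance.
print(re.findall(r"\d{4}", text))  # ['1999', '2021']
```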
Regex in action
Let’s say you had a huge legal document, 900 pages long. You want to change every occurrence of the main witness’s name to something else. How would you do it?
The simplest way is to search for “THE NAME” and replace it with “REDACTED”. Nothing mind-blowing about that, right? “Where is this going?” I can hear you say.
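In Python, an exact swap like this is a one-liner with `str.replace`. This is just a sketch; “THE NAME” stands in for the witness’s real name, as above, and the sample sentence is made up.

```python
document = "THE NAME arrived at 9am. Later, THE NAME signed the statement."

# str.replace swaps every exact occurrence of the search text.
redacted = document.replace("THE NAME", "REDACTED")
print(redacted)
# REDACTED arrived at 9am. Later, REDACTED signed the statement.
```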
Let’s take the same huge document and remove all the email addresses from it. What do you do now? Not so straightforward, is it?
You could search for every occurrence of the “@” symbol and then manually change each address to “REDACTED”, but that could take hours, depending on how many times the symbol appears in the document.
It’s problems like this that programmers like me love to solve. Let me take you through the process.
What do you know about email addresses?
Well, they usually start with letters or numbers and sometimes contain punctuation such as -, . and _, to name a few.
Then there’s always an “@” symbol.
This is followed by more letters, numbers and sometimes dashes or dots, but never underscores.
Finally, there’s a . (dot) followed by some more letters: always two or more, and never numbers.
We have a pattern we can use to recognise a valid email address just by looking at it.
So can a computer do that?
Well, yes, of course it can. In fact, the mathematician Stephen Kleene came up with regular expressions back in the 1950s, and they were later built into text editors and programming languages. They’re still used everywhere today.
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
That’s not a typo above. This is a regex pattern you can use to search for anything that looks like a valid email address. It’s similar to what a website might use to check that an email field has been filled in correctly.
I’ll explain what it all means. Once you understand it, it makes sense and feels like a secret that only you and a handful of others share.
Let me break it down into five parts.
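If you’d like to see the pattern in action, here’s a minimal Python sketch using `re.findall`. The sample text and addresses are made up for illustration.

```python
import re

# The email pattern from above, written as a raw string (r"...")
# so the backslash in \. reaches the regex engine untouched.
EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

text = (
    "Contact jane.doe@example.com or j_smith+legal@law-firm.co.uk. "
    "The @ symbol alone, or user@localhost, should not match."
)

# findall returns every non-overlapping match, left to right.
print(re.findall(EMAIL_PATTERN, text))
# ['jane.doe@example.com', 'j_smith+legal@law-firm.co.uk']
```

Notice that the full stop after “.co.uk” isn’t swallowed: the pattern needs letters after the last dot, so the engine backs off until it finds a dot that is followed by at least two letters.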
[a-zA-Z0-9._%+-]+ : This first bit covers the range of letters, numbers and punctuation that are valid. The [ ] defines a list of allowed characters, any one of which can match. The + at the end means one or more of these characters are needed to match the pattern. So this will match “az-987” but not “#”.
@ : Then there is an actual “@” symbol.
[a-zA-Z0-9.-]+ : This is followed by another range of letters, numbers and punctuation. Notice that the only valid punctuation now is . and -. Do you remember what the + at the end means?
\. : Then there is an actual dot character. The backslash matters: a dot on its own means “any single character”, so \. is how you ask for a literal dot.
[a-zA-Z]{2,} : Finally, there’s another [ ] list of valid characters. Notice there are no numbers in the list this time. The {2,} means there must be two or more of these letters at the end.
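Here’s a small Python sketch that tests each of the five parts on its own. The helper `fits` and the sample strings are my own, for illustration only; `re.fullmatch` succeeds only when the whole string fits the pattern, so partial matches don’t count.

```python
import re

def fits(pattern, text):
    """True if the WHOLE of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Part 1: letters, numbers and ._%+- , one or more of them.
print(fits(r"[a-zA-Z0-9._%+-]+", "az-987"))    # True
print(fits(r"[a-zA-Z0-9._%+-]+", "#"))         # False

# Part 2: a literal @.
print(fits(r"@", "@"))                         # True

# Part 3: letters, numbers, . and - only, so no underscores.
print(fits(r"[a-zA-Z0-9.-]+", "law-firm.co"))  # True
print(fits(r"[a-zA-Z0-9.-]+", "my_firm"))      # False

# Part 4: \. only matches a real dot; a bare . matches any character.
print(fits(r"\.", "."))                        # True
print(fits(r"\.", "X"))                        # False
print(fits(r".", "X"))                         # True

# Part 5: two or more letters, no digits.
print(fits(r"[a-zA-Z]{2,}", "uk"))             # True
print(fits(r"[a-zA-Z]{2,}", "c"))              # False
print(fits(r"[a-zA-Z]{2,}", "c0m"))            # False
```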
So now you know. You’re a regex pro, or at least on your way to becoming one.
Can you think of other examples where you’d need to search for a pattern rather than an exact word?
What about searching for every occurrence of the $ symbol and making sure each amount has exactly two decimal places?
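One possible answer, as a hedged Python sketch: it assumes amounts are written like $1234.56, with no thousands separators, and the helper name `amounts_without_two_decimals` is just something I’ve made up for this example. It finds every dollar amount, then flags the ones that don’t have exactly two decimal places.

```python
import re

# Any dollar amount: $ is escaped because on its own it means
# "end of line" in regex, then digits and an optional decimal part.
ANY_AMOUNT = re.compile(r"\$\d+(?:\.\d+)?")
# A correctly formatted amount: exactly two digits after the point.
TWO_DECIMALS = re.compile(r"\$\d+\.\d{2}")

def amounts_without_two_decimals(text):
    """Return the dollar amounts in text that lack exactly two decimal places."""
    return [
        match.group()
        for match in ANY_AMOUNT.finditer(text)
        if TWO_DECIMALS.fullmatch(match.group()) is None
    ]

print(amounts_without_two_decimals("Paid $19.99, then $5.5, $7 and $12.345."))
# ['$5.5', '$7', '$12.345']
```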
I’ve given you a glimpse of how to search for a pattern, but not how to replace what it matches. That will have to be for another day. Comment below if you’d like me to take you through it.
I need more
Well, what do you say? I’m guessing it’s something similar to what I said 20-odd years ago.
“I never knew that was possible. That’s amazing!”
There’s so much to learn in tech that it can bamboozle even the sanest mind and feel overwhelming a lot of the time. But just knowing what’s possible is a huge leg up and opens many doors.