Introducing REGEX Excel Functions - Extract, Clean, and Format Data Easily! (NEW!)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Microsoft just announced new REGEX functions in  Excel, and they're going to make your life a lot   simpler. So, what exactly is REGEX? It stands  for Regular Expressions. It's a powerful way to   search and manipulate text. Before you say, "This  is probably for some geeky Excel users. I'm never   going to understand this. I'm never going to need  this," wait because you're probably going to come   across cases where you will need exactly this  function. So, just imagine your boss sends you   a list of data, and you need to find all the email  addresses in those cells, hidden in the middle of   all this text. Now, instead of going through this  manually and copying and pasting email addresses   for each single cell, you'll use the REGEXEXTRACT  Excel function to get all the instances of email   addresses in a dynamic way. So here, if I add "and  also info@xelplus.com," we see the second email   automatically extracted. You can use this function  to extract any text between brackets. It doesn't   matter if the brackets are in the middle or at  the end. You can use it to extract dates in the   middle of text, even multiple dates from a single  cell. You can extract website addresses from long   sentences. You can break up text that's stuck  together like we have here. So as you can see,   whenever we have a pattern—and a pattern  is anything that you can explain logically   to someone—you can use the REGEXEXTRACT  function to extract that pattern. Now,   sometimes you might want to replace text based on  a pattern. You might download data from an app,   and it comes with special characters, and you want  to get rid of those. You can use the REGEXREPLACE   function to replace the characters you don't want,  so you end up with clean data. You can use this to   get rid of leading zeros in your data. You can  also format numbers, like credit card numbers,   by adding a dash after every fourth digit.  Anything you can explain that has a logic to   it can be replaced or extracted using these two  new Excel functions. Now let's work through these   in Excel from scratch, and I'm going to show you  how the functions work. You can grab the file from   the link in the description of this video. If you  don't have these functions yet, don't worry. It's   a beta feature, and it's going to take some time  until it rolls out to everyone. I've also included   a REGEX cheat sheet to your downloadable file, so  in case you want to know how this function works,   you have all you need here. But let's just  quickly do some examples. We're going to   extract email addresses from text. In this case, I  have information about some projects and the next   steps, and sometimes I have one email address,  sometimes I have two email addresses. I want   to extract these email addresses and put them in  a separate column, and doing this using standard   Excel functions that we had access to until  now is not going to be easy. But from now on,   it is because we have the new REGEX functions. The  function that we need here is the REGEXEXTRACT.   Why? Because we want to extract a piece of  information from this text. Now, this function   just requires two things that are mandatory; the  rest are optional. So, the first thing it requires   is the text. This is where we want to extract our  information from, which is this one. Next is the   pattern. Now, the pattern is something you  provide inside quotation marks. If I just   put nothing there and press Enter, I get nothing  returned. Now optimally, what I'd love to be able   to do is to tell it, "Extract email addresses," or  just describe the pattern, like, "Match a sequence   of letters that's followed by the '@' sign and  extract everything until you get to the dot and   everything after that until you get to a space,"  or just some type of logic, right? But you can't   do that. What you need to do is give it a Regular  Expression pattern in the way, in the code,   that it understands. Now, Regular Expressions  have been around for a while. They're part of most   programming languages. So, the good news is that  AI knows REGEX pretty well, so what you can do is   get the help of AI. And that's what I did in this  example. You can use Copilot or ChatGPT. So here,   I'm in Copilot, and I told it, "Give me the  REGEX pattern to extract email from text."   So if I have "Great progress, please email Dave  at dave@thecompany.com for the latest update,"   I want to return "dave@thecompany.com." So it's  good practice to give it an example and what you   want returned. Then Copilot went ahead and said,  "To extract an email address from a given text,   you can use the following Regular Expression."  That's the expression. All I need to do is   copy it, go back to Excel, and put it inside  the quotation marks, press Enter, and voila,   it works. Let's just drag this down. What I get  here is the first email address, but I want all of   them. This is where we can take advantage of the  optional arguments. Here we get the first match,   which was by default, right? If we don't add any  optional arguments, we just always get the first   match. But we can switch this and get all matches.  So if I put a 1 here, it should return all   matches. If I drag this down, this spills. I get  "support@helpme.com" and also "joho@helpme.com,"   but it spills downwards, right? So the problem is  if I want to drag this formula down, I'm going to   get a #SPILL! error. I'm going to work around  that. So what I'm going to do is put this function   inside the TEXTJOIN function. With TEXTJOIN,  I can define the delimiter. So how do I want   the results to be separated? I want them to be  separated with a comma and space. Then if I want   to ignore empty cells or include empty cells, I'm  just going to go with ignore empty cells. Okay,   then for the text, that's going to be our REGEX  argument. Let's close the bracket and press Enter.   I'm just going to double-click to send this down  until here. Let's double-click again to send this   all the way down. So now everything looks good,  except whenever we don't have an email address,   we get the #N/A error. So I'll just wrap this  inside the IFERROR function. If we have an error,   we want to return nothing. Close the bracket,  double-click to send this down. The great thing   is because it's a function, everything is  dynamic, right? So if we add "and contact   info@xelplus.com," our results update immediately.  Now, the pattern does look crazy for anyone who   isn't familiar with REGEX. You can ask Copilot  or ChatGPT to step through this. You can also   reference our cheat sheet to become familiar  with what these letters and symbols mean. Now,   once you get the hang of it and you understand the  basics, it becomes easier to describe your pattern   in this way. But honestly, AI works really well  here, so why not take that shortcut? Next, let's   take a look at how we can extract text from inside  brackets. So here, for our project description,   we have parts of the description inside brackets,  and I want to grab that part and put it separately   here. So because we're going to extract something,  we need the REGEXEXTRACT function. The text is   here. The pattern, quotation, quotation, because  I have no idea, I'm going to go to ChatGPT this   time and ask it for help. Over here in ChatGPT, I  gave it the prompt to give me the REGEX pattern to   extract text from inside brackets, and I gave it  one example and what I want returned. The pattern   it gave me is this. Optimally, I would just ask  it for the entire Excel function, but ChatGPT,   until now, has no idea that Excel has gotten  these functions, so we're just going to go   with the pattern for now. I'm going to copy this,  and I'm going to put the pattern in here. Press   Enter, and I get the content that I want. Now, the  cool thing is instead of just referencing B2, I   can also make everything spill by referencing the  entire range. Now, you might be wondering, "But   I don't want these brackets. I just want the text  inside the brackets." Here, we can take advantage   of the other optional arguments. Here, we can  return the first match, which is the default,   or all matches, which we saw before. Now, we could  select 2 instead, which captures groups of the   first match. Now, when I put a 2 and press Enter,  I get the content that's inside the brackets. Now,   if you're confused about what this argument really  does, the capture groups of the first match,   let's take a look at our next example. We want  to extract dates from the middle of text. So,   for example, here we have different project  milestones, and we have some dates mentioned.   I want to grab these dates and split them into  separate year, month, and day columns. Notice   sometimes the months have one digit, sometimes  they have two digits. The same for days. The   pattern should account for that. Now, because  we want to extract something based on our logic,   we're going to use the REGEXEXTRACT. Text is  this. The pattern that I found is this. Close the   bracket, press Enter, and we get the dates back.  What this basically means is look for four digits,   then either one or two digits for the months, and  then one or two digits for the days. Now remember,   I want them separately though. This is where I  can use the optional argument, Capture groups   of first match. Each of these is a separate  group. So when I put a 2 there and press Enter,   I get each group separately. The problem is  they spill vertically. I want them to spill   horizontally. So what I'm going to do is put  this inside the TOROW function, press Enter,   and drag this down. And I have my dates split into  a separate year, month, and a day column. Now,   what if I have multiple dates and I want to  extract all of them? For example, here I have   three separate dates. I want each date separately  in its own cell. So what I'm going to do is use   the exact same function that we had before. So my  text is this. My pattern is exactly what we had   before, except for return mode, I'm going to go  with all matches. So I'm going to put a 1, close   the bracket, press Enter. Now as I start to pull  this down, notice we can see again that everything   is spilling vertically, right? So this is going  to give me the #SPILL! error. What I want to   do is spill these horizontally, so I'm going  to use the TOROW function. Close the bracket,   press Enter. Now I can easily drag this down.  If you wanted all the dates in the same cell,   you can use the TEXTJOIN function. Now, how about  extracting web addresses from text? So here I'm   dealing with an Excel table. We can also see how  these functions work in tables. I have a feedback   column. The feedback includes web addresses.  Sometimes these addresses start with "https,"   sometimes they start with "www.," and sometimes  they're just "itcrowd.com." I want to write a   function here that extracts these. So we're  going to go with REGEXEXTRACT. The text is here.   The pattern in quotations is this. Now, if I just  close the bracket and press Enter, I get the first   web address back. In general, I think I just have  one, except this one where I have "ERORR404.com"   and "google.com." But notice I'm not even  getting the "ERORR404." There, any uppercase   letters are missing. Why? Because my pattern  doesn't include uppercase. So I could go and   update this to include uppercase, or I can take  advantage of that last argument in the function,   which accounts for whether I want everything to be  case sensitive or case insensitive. So by default,   everything is case sensitive. I'm going to go with  case insensitive, and now I get everything back.   But again, notice I only get the first web address  back. If I want to get more than one and return   all matches, I have to put a 1 in this argument.  But the moment you do that, you're going to end up   with a #SPILL! error because everything is spilling  vertically. What you want to do is put this inside   a function that can bring everything together.  So I'm going to use the TEXTJOIN function,   split everything based on the comma delimiter,  and close the bracket and press Enter. And now   we get everything back. Now you might find that  you need to update your patterns because something   might not work. So, for example, here, if I  have "/courses," the entire web address isn't   extracted. I will need to update that pattern.  Okay, so let's just reverse that, and I've done   it over here. So now we get to extract the entire  web address with this pattern. So you're going to   find yourself adding a pattern, and it works on  the data set, but then when new data come, maybe   it doesn't work on that. Figure out how you can  update it so it works for all cases. And again,   you can use AI for this. Split text that's stuck  together into multiple words. That's another thing   you can do with the REGEXEXTRACT function. So here  I have employee names that are stuck together,   but notice there's a pattern. The first letter  of the last name is always a capital letter. I'm   going to use that pattern as my REGEX pattern  and use that to create proper names. So even   if I add another name to this, the name  of my dog, it ends up as a separate word.   Now, how about using REGEX to replace leading  zeros? So here we have SAP product code numbers,   and they have leading zeros. Sometimes it's three,  sometimes two, sometimes one, and we just want to   see the number. We want to get rid of these. So  this time, we can use the REGEXREPLACE function.   Here it is. The text is this. The pattern that  we want is this in this case. Now we have one   additional mandatory argument. We need to tell  the function what we want to replace it with. Now,   I just want to get rid of the zeros, so I want  to replace it with nothing. I'm going to put   quotation, quotation, close the bracket, and press  Enter. And these are my values. Now, of course,   you can make this spill if you want. I could  select the entire range and make this spill.   These are actually currently seen as text and not  numbers, so if I want them to be seen as numbers,   I can put these inside the VALUE function. And  now I have my product codes without the leading   zeros. Cleaning up messy text is another great use  case for the REGEXREPLACE function. So here, for   my project descriptions, sometimes I have emojis,  sometimes I have special characters that I want to   remove. I want to get the description without  the junk. I'm going to use the REGEXREPLACE   function. The text is this. The pattern I came  up with is this. I'm going to put it inside   quotations. The replacement is nothing. If I want  everything to spill, I can actually reference the   entire range. So now when I press Enter, I get  everything returned clean. For performance,   that star is gone. For report, these special  characters are removed. Now, if a cell doesn't   have any special character, I get a hash. So I  could put this inside the IFERROR function. So   if I happen to have an error, I want to get the  project description returned. Close the bracket   and Enter. Now, if at any point in time you decide  that you want to ignore something, so for example,   I actually want to keep this ampersand, all I  have to do is add it to my pattern. So right here,   inside the square brackets, I'm going to put in  the ampersand. When I press Enter, the ampersand   stays. Insert a character at specific positions.  This is great for formatting credit card numbers   or phone numbers. So here I have a list of  credit card numbers. It's a mess. Sometimes   the number is seen as text, sometimes it's  seen as numbers, but because it's so big,   Excel shows it in this format. I want to format  it so I get a dash after every fourth character   but not a dash at the very end. So what I'm going  to do is use the REGEXREPLACE function. Text is   this. The pattern is this right here. I'm going  to put it in quotations. Now for replacement,   we're not going to go with nothing because if I go  with nothing, I just get the last four characters.   Instead, what I want to do is reference the  group that's being captured, which is every   fourth digit. And to do that, I can use the symbol  $1. So this references the first capturing group,   which in this case is my only capturing group, and  then I want a dash. And then when I press Enter,   I get these properly formatted. To find out  more about these symbols and how they work,   take a look at the REGEX Cheat Sheet that we've  included in our downloadable file. The link to   it is in the description of the video. Okay, so  this was my review of the new REGEX functions.   Let me know what you think in the comments  below. If you want to learn more about Excel,   if you're new to Excel, I have lots of videos on  this channel. I have some playlists that you can   check out. If you want to learn in a structured  way, check out my courses over on XelPlus.com.   Thank you for being here. Thank you for watching,  and I'm going to catch you in the next video.
Info
Channel: Leila Gharani
Views: 131,504
Rating: undefined out of 5
Keywords: XelplusVis, Leila Gharani, Advanced Excel tricks, Excel online course, Excel tips and tricks, Excel for analysts, Microsoft Excel tutorials, Microsoft Excel, Excel 2016, Excel 2013, Excel 2019, XelPlus, Microsoft 365, Excel 365, regex, regular expression, excel extract email, excel extract web address, extract url, format credit card numbers, format numbers, new excel functions 2024, regexextract, regexreplace, replace leading zeros
Id: YFnXV2be9eg
Channel Id: undefined
Length: 19min 21sec (1161 seconds)
Published: Wed May 22 2024
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.