Removing HTML from a Cell in Google Sheets

Last updated on: Published by: Remedy667 0

So, I do a lot of editing website data and HTML in Google Sheets. A lot of the time it’s a lot of old stuff that someone else populated, sometimes it’s been extracted from a vendor website, and it’s almost always in need of some sort of repair.

Recently I came across this beauty:

Excessive HTML in a Google Sheets Cell

Which made me think about a way that I would first need to wipe all of the HTML. Usually I would just do a Find and Replace for <br /> or <p> tags and knock them out one at a time, but as you can see from the image about. This is a particularly special case, and I found this solution over on Stack Overflow that I’d like to share here for my own personal use, and to hopefully help out anyone else who experiences this problem.

In the Stack Overflow post, they mention 2 ways of accomplishing this, but I’m only going go over one of them (and modify it a bit):

=REGEXREPLACE(A1,"</?[^<>]*>","")

So looking at what Google says about this tag (REGEXREPLACE(text, regular_expression, replacement)), it’s pretty easy to see what we’re doing. Here, text refers to the cell we’re searching for HTML in (here we used A1, but the cell in my example image would have been K14625), and replacement is telling it to be empty, but what’s going on in the regular_expression?

  • </? = Finds all lesser than symbols to make them the starting point
  • [^<>] = Negates the character class, causing it to match a single character not listed in the character class (greater than & lesser than).
  • *> = Finds all greater than symbols to make that the stopping point (the asterisk helps it to ignore everything after the greater than)

If you go back into the original post, you’ll see that I removed the \S+ in the middle which i believe tells it to ignore whitespace, but that wasn’t relevant for what I was trying to accomplish, and my expression seemed to work just fine. If you’re experiencing problems, try it with the \S+ which can be found using the link above.

Finding and Fixing “Unwanted Dates” with Fractional Data in Sheets

Last updated on: Published by: Remedy667 0

So, I’ve been tasked with sorting a bunch of data that all has to do with measurements… the only problem is, some of it is fractional, some of it is decimal, and some of it is metric. Which creates a couple of separate issues but right now I’m just going to talk about checking for a specific character in a cell.

Decimal Conversion Chart

So, I’ve got this fancy chart for doing my conversions, and my plan is to use the FILTER function so that I can convert them all to decimal… because Sheets does some weird stuff with fractions. For this example I’m going to add the inch symbol to my fractional using this formula:

=A2&”””

This returns the contents of cell A2, and adds the inch marker to the right, like so.

Perfect. Right? What about the fractions of less that 31 that could be easily read as dates?

Highlighted Cells that give undesired result.

Now we’re just going to take the visually simplest route to figure out which cells need fixed, because they all sort of blend together. I’m going to use this formula:

=IF(SEARCH(“/”,CELL),””,””)

You don’t need the IF function, I just wanted any cells that contained the forward slash to return no result. Anything that doesn’t will error out, like so…

Use #VALUE! error to see cells that don’t fraction

So, now that we have those blocks sorted out we can use a different formula on them, to clear the year from their date formatting, and replace it with the inch mark.

=SUBSTITUTE(CELL,”/2020″,””)&””””

Giving us the desired result… and fixing the #VALUE! error…

The story goes on from here, as I begin sorting my blend of Fractional, Decimal and Metric numbers, but I won’t bore you with that for now.

Split & Trim with Google Sheets using ArrayFormula

Last updated on: Published by: Remedy667 0

The issue is white space.

=split(CELL,”DELIMITER”)

Using the standard SPLIT formula, to get an unwanted white space at the beginning of each cell. Now, I don’t know about you, but that drives me absolutely nuts, and I’m always forgetting the fix. Ultimately, running through this sequence… over and over and over…

In the following example, I’m attempting to use a formula to split the Category column (K), into its subcategories.

Using the standard SPLIT formula you can see, I’m getting an extra white space at the beginning of each cell in columns D, E & F.

=split(K8,”||”)

It seems that adding an extra space to the end of the delimiter would fix this, only that splits it at every space. I still try to do this sometimes, even though it always fails.

=split(K8,”|| “)

Using TRIM with the formula only returns the data up to the first SPLIT, though I frequently repeat this mistake.

=trim(split(K8,”||”))

We’re so close here, we’re just missing the ARRAYFORMULA function. (ArrayFormula) “Enables the display of values returned from an array formula into multiple rows and/or columns and the use of non-array functions with arrays.” So, in this case, it basically allows you to apply the TRIM function to each cell populated by your SPLIT.

=ArrayFormula(trim(split(K8,”||”)))