Removing HTML from a Cell in Google Sheets

Last updated on: Published by: Remedy667 0

So, I do a lot of editing website data and HTML in Google Sheets. A lot of the time it’s a lot of old stuff that someone else populated, sometimes it’s been extracted from a vendor website, and it’s almost always in need of some sort of repair.

Recently I came across this beauty:

Excessive HTML in a Google Sheets Cell

Which made me think about a way that I would first need to wipe all of the HTML. Usually I would just do a Find and Replace for <br /> or <p> tags and knock them out one at a time, but as you can see from the image about. This is a particularly special case, and I found this solution over on Stack Overflow that I’d like to share here for my own personal use, and to hopefully help out anyone else who experiences this problem.

In the Stack Overflow post, they mention 2 ways of accomplishing this, but I’m only going go over one of them (and modify it a bit):

=REGEXREPLACE(A1,"</?[^<>]*>","")

So looking at what Google says about this tag (REGEXREPLACE(text, regular_expression, replacement)), it’s pretty easy to see what we’re doing. Here, text refers to the cell we’re searching for HTML in (here we used A1, but the cell in my example image would have been K14625), and replacement is telling it to be empty, but what’s going on in the regular_expression?

  • </? = Finds all lesser than symbols to make them the starting point
  • [^<>] = Negates the character class, causing it to match a single character not listed in the character class (greater than & lesser than).
  • *> = Finds all greater than symbols to make that the stopping point (the asterisk helps it to ignore everything after the greater than)

If you go back into the original post, you’ll see that I removed the \S+ in the middle which i believe tells it to ignore whitespace, but that wasn’t relevant for what I was trying to accomplish, and my expression seemed to work just fine. If you’re experiencing problems, try it with the \S+ which can be found using the link above.

Counting in Google Sheets for SEO

Last updated on: Published by: Remedy667 0

So, I’m working on this project where I’m optimizing descriptions for a client using Google Sheets. I need these descriptions to follow a very specific format, and to be sure that they do… I’m using Google Sheets. I’m also using it because I’m going to be populating them through an import with a CSV file. It just makes sense to do it all in one place.

Here are my format restrictions:

  • Description #1 must be over 200 words (with other content being about 120 words, so 80)
  • Description #1 must contain a specific keyword 1%-3% times
  • and the meta description must be less than 160 characters

Pretty simple stuff, just the start of a description and a meta description to accompany it. Here is the information I have been provided:

  • Custom URL fragment
  • Unedited Description

Find the Keyword

So, first to figure out my keyword. To do this I’m going to take my URL fragment:

/tools/power-tools-and-accessories/drills-and-drivers/impact-drivers/

I know my keyword is going to be “Impact Driver”, but I need Google sheets to know that. So I’m going to use the following code to isolate this phrase. My URL fragment is in position A2, because A1 is the title I will need for my CSV import.

I’ll figure out a cleaner way to do this in the future, but for now I need 2 cells. In B2 I’ll add this formula to convert the dashes to spaces, and remove the final slash on the right…

=LEFT(SUBSTITUTE(A2,"-"," "),LEN(A2)-1)

It’s super important that we remove the final slash with LEFT, otherwise the formula in cell C2 would return a value of nothing. Because it is returning whatever information is provided after the final slash.

=trim(RIGHT(A2,LEN(A2)-FIND("@",SUBSTITUTE(A2,"/","@",LEN(A2)-LEN(SUBSTITUTE(A2,"/",""))),1)))

Size Up the Description

Now we should have a cell (C2) containing our keyword, impact drivers, and it’s time to check our unedited description to see if it’s formatting complies with the restrictions I mentioned earlier. So, to test this, I’m going to past the current description in position D2.

DEWALT offers a full line of cordless impact drivers and wrenches for the professional user. Ranging in voltage from 9.6 volts to 20 volts, including the 12V MAX* and 20V MAX* impacts, DEWALT has the right cordless impact driver or impact wrench for completing a wide range of fastening applications. When looking for an impact driver or wrench with innovation, performance and durability to withstand the rigors of the jobsite, DEWALT cordless has you covered.

This will ultimately become our description (D2) & meta description(H2) for this page in our website, but first we need to see how they stack up. In position E2, we’re going to see how many words are in our description, and to do this we need a pretty simple tag that will essentially just count up the spaces.

=COUNTA(SPLIT(D2, " "))

The next formula for position F2 is going to tell us how many times our specific keyword is used. I’m using SUBSTITUTE because I have 2 words combined as a phrase to use as my keyword, and I’m using LEFT to remove the plural, “s” so that I can get the singular version of the word(s) as well.

=COUNTA(SPLIT(SUBSTITUTE(D2,left(C2,len(C2)-1)," &&&&& "),"&&&&"))-1

Figure Out the Percentage

Then, G2 is where we will simply calculate the percentage. This column will need to be formatted as a percentage by either clicking the “%” button, or going through the Google menu to Format > Number > Percent.

=F2/E2

Meta Description Optimization

We know know all the information we need to modify this paragraph and make our description. So, the meta description should be pretty easy from here. Once you’ve got the description how you like it, copy and paste it (D2) into the meta description cell (H2). Then, in cell I2 place this formula:

=len(H2)

…which will tell you how long it is. Now all I have to do is chop up the copy until it’s just under 160 characters.

I know it’s a pretty roundabout way of explaining this, but this is how I’m going about it for this project, and like I said… I’ll likely figure out a better way of doing this in the future, and I’ll make sure to redirect people to that page when I do. Thanks for reading along, feel free to comment below with any hints or criticisms.