Total Recrawl: My Experience With the “Japanese Keyword Hack”

[Image: Arnold Schwarzenegger in Total Recall]

I had some fun towards the end of last year when a client of mine fell victim to what is often known as the “Japanese keyword hack”, whereby autogenerated content and fake/cloaked affiliate pages are spun up on the host website and are often quickly indexed in search.

I take some umbrage at the sole and somewhat biased use of the word “Japanese” to describe this event, as in my experience the hack came in a variety of character-based languages, though this seems to be the name used across the web.

The main goal of the hack is presumably to monetise any subsequent traffic to the target pages during the short period in which they are live. This may seem a bit old school and far-fetched compared with modern-day spam practices. Monetisation or not, however, the indexation of thousands of spammy pages and the short-term tanking of existing rankings can have a pretty traumatic impact SEO-wise. The hack also makes heavy use of cloaking in the SERPs, which, if left unchecked and unresolved for some time, could leave you in trouble.

There is already plenty of documentation out there detailing what is seemingly a rather common hack that appears to primarily target WordPress websites with security vulnerabilities. Aside from the web.dev article I linked to above, there are thorough and useful guides on what it is and how to fix it here and here which proved useful in the identification, solution and damage limitation phases of my one particularly fun week in September last year.

I’d like to take you through my experience with it though, which I hope may prove useful for other SEO consultants or WordPress webmasters potentially facing this issue.

How did I know something was wrong?

The proof in the pudding came relatively quickly. These hacks aren’t hard to identify if you’re carrying out your regular day-to-day SEO checks and assessments, particularly on WordPress sites. However, I can appreciate how they may be missed by some and left to fester for longer than desired.

With this client engagement in particular, I have full WordPress admin rights and a lot of my day to day involves jumping in and fixing/optimising things. The first sign that something was amiss was when I was presented with a 404 error upon attempting to log into WordPress. This has happened before, though the real confoundment came shortly after when I jumped into Google Search Console and saw a massive spike of indexed pages over the last day.

I’d share a screenshot of this, but the native page indexing report in Google Search Console only goes back a few months. Here’s the current “Not Indexed” report, however, to give you an idea of what things looked like in reverse back in September on what is a relatively small site:

Looking at the current “Crawled – currently not indexed” report also provides insight as to what things looked like, and indeed a view on the nature of the URLs that were being generated by the hack:

Ignoring my comically-bad blacking out of the client domain name (my artistry lies primarily in music and writing) you can see what I mean, with some patterns emerging under certain /search/ and /?s= subfolders and parameters that became important to the fixing stage of the exercise later on.
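For anyone who would rather script this pattern-spotting than eyeball a report, here is a minimal Python sketch. It assumes you have exported the affected URLs to a plain list; the function names and example.com URLs are hypothetical, and it simply tallies URLs by their first path segment or by their query parameter names.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def url_pattern(url: str) -> str:
    """Reduce a URL to a coarse pattern label.

    URLs with a query string are labelled by their parameter names
    (e.g. "?s="); everything else by its first path segment
    (e.g. "/search/").
    """
    parts = urlsplit(url)
    query_keys = sorted(parse_qs(parts.query, keep_blank_values=True))
    if query_keys:
        return "?" + "&".join(f"{key}=" for key in query_keys)
    segments = [segment for segment in parts.path.split("/") if segment]
    return f"/{segments[0]}/" if segments else "/"


def count_patterns(urls):
    """Count how many URLs fall under each coarse pattern."""
    return Counter(url_pattern(url) for url in urls)
```

Calling `count_patterns(exported_urls).most_common(10)` on an export should push any dominant subfolders or parameters to the top of the list.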

(Thankfully, despite the “Crawled – currently not indexed” report still returning some hefty numbers of URLs not indexed, the offending examples from the hack haven’t been re-crawled since September 2023, which is when the incident occurred and was subsequently fixed. Phew.)

I then turned to a third-party keyword reporting tool, which inevitably showed a huge spike in “organic traffic” over the previous day. I was tempted to share a no-context hockey stick graph on LinkedIn to highlight my SEO prowess, though the cynicism inevitably crept in. This spike was caused by the site quickly starting to rank for affiliate-type keywords in Japanese (and other character-based languages) selling luxury clothes, makeup and laser hair removal services. Despite the goldmine of fresh rankings, this wasn’t in line with the E-E-A-T mission of the site that we’d been pursuing so diligently.

A quick site search on the domain then presented me with further evidence in the form of cloaked content taking the guise of this affiliate content on the website’s top traffic URLs. I don’t have a saved example of what this looked like for this site in question (sparing you more clumsily blacked-out domain names), but here’s what it kind of looked like courtesy of this blog:

I was impressed at how quickly the spun-up content became indexed. We’re talking less than a day here, with the hack occurring overnight. Once I was able to get back into WordPress, however, the creation of several spammy sitemaps (among other things) usurping our previous sitemaps helped explain this. There was also the creation of several mysterious-looking user profiles that had played their part in the injection of the relevant malware that made this happen so quickly.

These files weren’t hard to find within the WordPress dashboard, and the client’s WordFence plugin was also in danger of going into meltdown. As if I needed further evidence, it was time to act.

What Did We Do To Resolve This?

After gathering all the confounding evidence, it was time to get on the phone with the hosting provider. Some of the aforementioned guides on this hack give you detailed instructions on how to replace and update the corrupted .htaccess file and manually remove the malware scripts.

Following a brief interaction with the hosting provider, however, I only really had to ensure that the latter of the two was properly done, as most of the fixes on the website side came from simply restoring the site to a backup taken a week earlier. We hadn’t made huge changes to the site in that period, so the solution was obvious. This removed most of the relevant malware from the site, bar a few miscellaneous lingering scripts which I had to purge manually, as mentioned.

There was a lot more clean-up work to be done from an SEO standpoint however, as the damage was already present in the SERPs and from Google’s point of view as a whole. So without further ado, here’s a quick-fire list of actions I took to fight back:

  • Refreshed the pre-existing sitemaps to replace the now-deleted spammy ones that sped up the indexation of the hacked content, then submitted to Google Search Console
  • Created rules within WordPress to 410 (Gone) the aforementioned URL parameters and subfolders (/search/ and /?s=). Almost all of the offending spammy pages fell under these patterns, and there was no preexisting internal site search so nothing was at risk.
  • These strings were also blocked in the client’s robots.txt file.
  • Daily monitoring of SERP updates to see what was still ranking in terms of the cloaked content. To help speed up re-indexation of the original content, I made some small updates to the URLs still suffering and then resubmitted them to Search Console to encourage recrawling.
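The matching logic behind the 410 rules in the list above can be sketched in a few lines of Python. This is an illustration only, not the actual WordPress rule set: the function and constant names are hypothetical, and it assumes, as was the case here, that the site has no legitimate internal search living under /search/ or the s parameter.

```python
from urllib.parse import parse_qs, urlsplit

# Patterns taken from the article; adjust if your spam URLs differ.
SPAM_PATH_PREFIXES = ("/search/",)
SPAM_QUERY_KEYS = ("s",)


def is_hack_url(url: str) -> bool:
    """Return True if the URL matches a spam pattern and should get a 410."""
    parts = urlsplit(url)
    # str.startswith accepts a tuple, so every prefix is checked in one call.
    if parts.path.startswith(SPAM_PATH_PREFIXES):
        return True
    # keep_blank_values=True so that a bare "?s=" still counts as a match.
    query = parse_qs(parts.query, keep_blank_values=True)
    return any(key in query for key in SPAM_QUERY_KEYS)
```

Running the site's genuine URLs through a check like this before going live is a cheap way to confirm that no real page would be caught by the rule.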

There were plenty more minor steps I had to take across this period, but these were the main actions that were most effective at clearing up the damage done in the SERPs. Some of these guides recommend using the URL removal tool in Google Search Console, but I found the 410 solution to be more effective here. My initial response was actually to deploy 404s rather than 410s, though I was dissatisfied with the time it took Google to drop these out of the index and stop crawling, so I moved to the harder solution.
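If you want to confirm that the spam URLs really are answering with a 410 rather than a 404, a small status checker helps. The sketch below uses only Python's standard library; the function names are hypothetical, it sends HEAD requests (which some servers handle differently from GET), and it does not catch network failures such as timeouts.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code a URL responds with."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is on the exception.
        return err.code


def summarise_statuses(urls, fetch=fetch_status):
    """Count how many URLs return each status code."""
    return Counter(fetch(url) for url in urls)
```

Feeding it a sample of the offending URLs should return a count dominated by 410 once the rules are in place. The `fetch` argument is there so the summary can be tried out with a stand-in function instead of live requests.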

Was There Any Tangible Long-Term Impact?

In this instance, Google was in general well-behaved in recognising the site refresh and the quick steps taken to send the right signals in terms of URL indexation. Things were back to normal within a week. Despite some scary short-term nosedives in keyword rankings and traffic, the damage wasn’t long-lasting, and SEO performance stabilised and returned to where it was prior to the hack. We were able to act quickly; however, I have heard stories of similar, more complex hacks hitting sites and doing real, permanent damage to businesses.

Somewhat ironically (and this is where I get to deploy the brilliant pun “Total Recrawl”, which I have funny SEO Twitter user ohgm to thank for), the site actually benefitted shortly after from the dramatic increase in crawl requests during the hack and shortly after while I was hurriedly fixing things.

As we made content updates to the hacked URLs to encourage Google to return the original uncloaked versions to the SERPs, the refreshed content picked up visibility bumps across a few keywords we were targeting. Bad luck, hackers.

How Can We Prevent This From Happening?

The solutions to stop this from happening are relatively obvious and well documented (see again the guides linked earlier). Notwithstanding my lack of hacking knowledge and solely non-black-hat SEO methods (😇), I believe the vulnerability in this case came from an out-of-date security plugin that didn’t have its plan renewed.

This was rectified quickly of course, alongside additional security measures such as IP blocking of the aforementioned nefarious users, the creation of fresh user profiles with password resets, and fresh two-factor login authentication across the board.

I suspect these hacks are becoming much less common these days, though you do sometimes come across the odd occasion, reflected largely in the continual updating and publishing of blogs and guides around the topic (like this one).

Thanks For Reading.

I hope this piece was useful or insightful in some way.

Now as you may know, I’m a Chinese speaker, and this allows me to be able to read roughly one third of the Japanese written language (kanji, one of three Japanese character types, which uses Chinese characters). I now feel a strong urge to purchase a laser hair removal kit from a dodgy affiliate page.

Stay safe out there.