A soft 404 error is one where the website shows a 404 Not Found error page BUT then completes with an HTTP status code 200 (OK). Here’s how to fix the issue.
Google doesn’t like soft 404s. It refers to them as crypto 404s. Google realises that most of these crypto 404s are the result of a cock-up rather than a deliberate attempt to mislead. But they do regard them as bad practice at best and black hat SEO at worst.
Evidently they treat the missing page as a 404 Not Found but then also potentially penalise the website for having the temerity to issue a 200 status code for the missing page.
404 Not Found Error on a Static site
The standard way on a static site to issue a customised 404 (without an HTTP 200 success code) is to use the .htaccess file to point the 404 at a custom 404 page.
This can then chastise the user for entering the wrong URL. This custom 404 page can also give the user some other links into your site to try.
Anyway, implementing this in .htaccess is simple: you just add this line to your .htaccess file.
ErrorDocument 404 /404.php
or if it is flat html
ErrorDocument 404 /404.html
Obviously you do also need to build a 404.php or 404.html page in your website root directory.
Simple eh? In fact it was so simple I thought I would “improve” it.
Sometimes even I despair at my own stupidity. Whatever you do, DO NOT do this:
# don't do this!
ErrorDocument 404 https://www.yoursite.com/404.php
But the error page is fully pathed! What could possibly go wrong?
Well, just about everything. If you give ErrorDocument a full URL, Apache does not serve the error page in place. Instead it tells the browser to redirect to that URL, and the 404 page is then re-accessed like a normal page. In other words it fetches the 404 page and returns the dreaded HTTP status code 200 (success).
Never ever use a full URL for a 404 page. It defeats the whole point of the exercise.
So what about 404 Not Found on WordPress sites?
In the past I believe WordPress sites were particularly bad for returning an HTTP 200 after a 404, but it looks like that is now history. With a number of different themes, and without any modification, the 404.php file in WordPress handles the 404 correctly for me (i.e. no HTTP 200 success).
If, though, you find differently with your site, you can add this to the very top of the 404.php file:
<?php header("HTTP/1.0 404 Not Found"); ?>
If your WordPress site doesn’t have a 404.php file then maybe it is so old you need to look at updating the theme. But in the interim you could build your own and then get .htaccess to route to it as if it were a static site.
404 Not Found – How to Test
You may well be asking how to test this beyond typing in a garbage URL and seeing whether it lands on your 404 page or not. How can you see the returned statuses easily?
Well, there is a very nice resource here http://httpstatus.io and it is free. It might be nice to chuck them a couple of Dollars/Quids/etc if they really help you out via their “donate” link.
Put your garbage URL into the field and hit “Check Status”.
If the 404 is returned correctly, the tool will report a 404 status code for the URL.
If, however, you have a soft 404 (or as Google calls them, a crypto 404), the result will include a 200 instead.
In my case it reported a 302 (temporary) redirect followed by the dreaded HTTP 200 success. This is in fact the result of the cock-up I described above, with the full URL in the .htaccess ErrorDocument line.
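If you would rather see the same hop-by-hop statuses from your own machine, here is a rough sketch in Python 3 using only the standard library. The function name status_chain and the max_hops limit are my own inventions for illustration, not anything httpstatus.io provides, and the sketch assumes a plain http or https URL with no login details in it.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes that mean "look at the Location header and go there".
REDIRECTS = {301, 302, 303, 307, 308}


def status_chain(url, max_hops=10):
    """Return a list of (status, url) pairs, one per hop.

    Redirects are followed by hand so that every intermediate
    status is recorded rather than hidden.
    """
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body; only the status matters here
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        chain.append((status, url))
        if status in REDIRECTS and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain
```

Calling status_chain("https://www.yoursite.com/some-garbage-url") and printing the result gives one (status, URL) pair per hop. A lone 404 is what you want to see. A 302 followed by a 200 is the soft 404 from the full URL mistake.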
This is a totally fictitious page. But if there had been a real page that Google had previously indexed (or that was listed in sitemap.xml) and it had gone missing, the cock-up would return exactly the same statuses as the duff page above did. In Google Search Console this page would be flagged as a soft 404 in the coverage section.
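Since pages listed in sitemap.xml are the ones most likely to get flagged, it can help to pull the URLs out of the sitemap and check each one. Here is a small sketch in Python 3, assuming a standard sitemap that uses the sitemaps.org namespace. The function name sitemap_urls is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL found in the sitemap XML text."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

Feed it the contents of your sitemap.xml and you get a plain list of URLs, each of which can then be run through a status checker. Any listed page that no longer exists should come back as a 404, not a 200.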
Anyway the http://httpstatus.io tool is rather nice eh? Give the guy(s) a hand!