{"id":"gevlkblefwtt8lu","title":"Google Called It a Soft 404 — What Crawlers See in Your Journal Prerender Shell","slug":"journal-soft-404-prerender-body","summary":"Google flagged my journal posts as Soft 404 because crawlers saw title and summary only — not the article body. Here is the curl test, the prerender shell fix, and the GSC validation workflow I use before requesting indexing.","imageUrl":"https://briancrabtree.me/images/journal-journal-soft-404-prerender-body.webp","category":"Web Development","date":"2026-06-09T18:00:00.000Z","featured":false,"likes":8,"author":"Brian Crabtree","content":"<h2>Soft 404 is not a real 404</h2>\n\n<p>A Soft 404 is Google’s label for a URL that returns HTTP 200 but looks empty, generic, or like an error page. Your server thinks everything is fine. Search Console disagrees. The page may sit in Page indexing under Soft 404 while URL Inspection’s live test sometimes says the page can be indexed. Both can be true at once — the index tab reflects an older crawl; the live test sees today’s HTML.</p>\n\n<p>That mismatch is why I stopped trusting green checkmarks in Chrome and started with <code>curl</code>. Real 404 means not found. Soft 404 means “you sent a success status but gave me nothing worth indexing.” For a React journal on a prerender shell, the failure mode is subtle: correct canonical, correct title, hero image, meta description — and almost no article prose in the first HTML response.</p>\n\n<h2>Why DevTools lies (again)</h2>\n\n<p>After hydration, journal posts on <a href=\"https://briancrabtree.me/\">briancrabtree.me</a> look complete. The router mounted <code>BlogPost</code>. The public API returned full <code>content</code>. Headings, code blocks, internal links — all there. View source in a normal browser session often shows the hydrated DOM, not the bytes nginx served on first request.</p>\n\n<p>Disable JavaScript or fetch the URL without executing the bundle:</p>\n\n<pre><code>curl -sL \"https://briancrabtree.me/journal/my-post/\" \\\n  | rg -i '&lt;title&gt;|prerender-post-summary|prerender-post-prose|&lt;h2'</code></pre>\n\n<p>Before this fix, that command showed title, hero, and a speakable summary paragraph — roughly 190 characters. No <code>&lt;h2&gt;</code>. No article body. Firestore and the public API still held six to seven kilobytes of HTML per post. Crawlers that do not wait for the client fetch saw a thin page. That is exactly the Soft 404 pattern Google warned me about in email when validation failed on earlier “fixes.”</p>\n\n<p>I wrote about the earlier failure mode — missing per-slug shells and homepage canonical on every URL — in <a href=\"/journal/react-spa-crawled-not-indexed-fix/\">React SPA Crawled, Not Indexed: The Fix That Actually Moved GSC</a>. Fixing canonicals and deploying one hundred two slug directories under <code>/journal/</code> was necessary. It was not sufficient. The shell still hid the body behind JavaScript.</p>\n\n<h2>What the shell had right</h2>\n\n<p>The prerender pipeline from <code>injectJournalMetaCli.js</code> already did the hard SEO work: per-post <code>&lt;title&gt;</code>, meta description from the summary file, canonical with trailing slash, Open Graph and JSON-LD, LCP hero preload, and a journal layout that mirrors the React article chrome. Live tests passed. Indexing requests went through — until Google re-crawled and decided the HTML was still too thin.</p>\n\n<p>Two posts showed up repeatedly in my GSC notes: <a href=\"/journal/everything-your-website-needs-to-actually-succeed/\">Everything Your Website Needs to Actually Succeed</a> and <a href=\"/journal/lab-vs-field-data-pagespeed/\">Lab vs Field Data — When PageSpeed Scores Mislead You</a>. Both returned 200 with correct head tags. Both had full content after hydration. Both triggered Soft 404 on stale crawl data, then failed validation when Google expected more substance in the initial response.</p>\n\n<h2>The fix: embed the article in the shell</h2>\n\n<p>The change is small and mechanical. At build and publish time, pass the same sanitized HTML stored in Firestore into <code>buildJournalPostShellBody</code>. Inject it after the speakable summary:</p>\n\n<pre><code>&lt;div class=\"page-post-body prerender-post-prose\"&gt;\n  …full post HTML from content/journal/{slug}.html…\n&lt;/div&gt;</code></pre>\n\n<p><code>sanitizeJournalShellHtml</code> strips scripts, inline event handlers, and <code>javascript:</code> URLs. The shell stays <code>aria-hidden=\"true\"</code> — it is for crawlers and first paint, not duplicate interactive UI. React still hydrates and owns navigation, likes, and admin-driven updates. The shell is the crawl contract.</p>\n\n<p>This is the same philosophy as <a href=\"/journal/static-prerender-shells-spa/\">Static Prerender Shells: SPAs That Paint Before JavaScript</a>, pushed one level deeper: not just meta and hero, but the actual article prose Google uses to judge whether a 200 response is a real document or a placeholder.</p>\n\n<h2>GSC workflow before Request indexing</h2>\n\n<p>When Search Console emails that Soft 404 fixes failed, do not spam Request indexing on thin URLs. That burns daily quota and trains nothing useful.</p>\n\n<ol>\n  <li>Deploy the shell fix and verify with <code>curl</code> — expect <code>prerender-post-prose</code> and multiple <code>&lt;h2&gt;</code> tags in the first response.</li>\n  <li>Run <code>npm run verify:edge</code> so Cloudflare challenge does not poison live tests.</li>\n  <li>URL Inspection → Test live URL on affected posts — confirm “Page can be indexed.”</li>\n  <li>Page indexing → Soft 404 → <strong>Validate fix</strong> on the issue row, not only per-URL nudges.</li>\n  <li>Wait for validation to move to Started or Passed before batch indexing requests.</li>\n</ol>\n\n<p>Validation can take days. The Page indexing report may still show a last-updated date from weeks ago while live tests are already clean. That lag is normal. The email about failed validation means Google re-crawled and still did not like what it saw — which, for me, meant summary-only shells.</p>\n\n<h2>Checklist I run on every journal deploy</h2>\n\n<pre><code>- [ ] curl slug URL — prerender-post-prose contains h2 + paragraphs\n- [ ] canonical and title match sitemap trailing-slash URL\n- [ ] public API /api/public/posts/{slug} returns content field\n- [ ] injectJournalMetaCli wrote dist/journal/{slug}/index.html\n- [ ] full deploy (not --no-build) — journal dir count >= 50\n- [ ] Soft 404 Validate fix submitted in GSC after prose deploy\n- [ ] indexing requests only after validation gate + verify:edge</code></pre>\n\n<p>For performance discipline on the same stack — head order, CSS before module JS, prerender hero — see <a href=\"/journal/website-performance-audit-2026-react/\">Website Performance Audit Checklist (2026)</a>, <a href=\"/journal/react-css-before-module-js-pagespeed/\">Why Your React Site Loads Unstyled</a>, and <a href=\"/journal/performance-checks-before-handoff/\">Performance Checks Before Handoff</a>. Soft 404 is an SEO crawl signal, not a Lighthouse score. Fix the HTML Google fetches first; then chase indexing quota.</p>\n\n<h2>When this is your bottleneck</h2>\n\n<p>If Search Console shows Soft 404 on URLs that look fine in Chrome, send the production slug and the output of the curl command above at <a href=\"/contact?ref=journal-soft-404\">/contact</a>. I will tell you whether you need prerender body injection, canonical repair, or something else entirely. See <a href=\"/services/\">services</a> for how I scope React SEO and deploy hygiene.</p>","tags":["google-search-console","soft-404","prerender","react-spa","technical-seo"],"views":36}