DTC Wayne

What actually caused Shopify product page fallback for Googlebot on locale subpaths

Wayne · 2026-05-04 · #shopify #i18n #seo #cloudflare #debugging

Site: KryoZon (www.kryozon.com), Shopify Horizon theme, Markets locale subpaths
Time spent: about 4 hours, from first detection to the worker going live with all checks green
Fix cost: 1 Cloudflare Worker (about 80 lines of JS) + 1 DNS proxy switch + $0 operating cost


1. The starting point: a monitoring script exposed something odd

I wanted end-to-end verification that the locale subpaths were actually rendering localized pages. Before that, I had only checked translation completeness at the Admin API layer with detect_hidden. So I wrote i18n_live_diff.js. It simulated two clients, a user browser and Googlebot. It tested each one with and without Accept-Language across 6 locales × 5 key URLs. It checked <html lang>, hreflang, canonical, and whether the English sentinel showed up.
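The core per-page check can be sketched roughly like this (the function name, the sentinel string, and the regexes are illustrative, not the real script, which also checked hreflang):

```javascript
// Rough sketch of the per-page check in i18n_live_diff.js (illustrative).
// Given a fetched HTML string, pull out the signals the script compared
// across the user and googlebot runs.
function inspectPage(html, expectedLocale) {
  const lang = (html.match(/<html[^>]*\blang="([^"]+)"/i) || [])[1] || null;
  const canonical =
    (html.match(/<link[^>]*rel="canonical"[^>]*href="([^"]+)"/i) || [])[1] || null;
  const sentinelLeaked = html.includes('Add to cart'); // English sentinel (assumed)
  return { lang, canonical, clean: lang === expectedLocale && !sentinelLeaked };
}
```

Run once per locale × URL × client combination, this produces the 30 results per mode.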

The first run already caught it:

Verdict
- user:      30/30 clean
- googlebot: 24/30 (6 red: all /products/...)

In Googlebot mode, every product page came back as <html lang="en">, with English content and a canonical pointing at the English URL. Homepage, blog list, collection, and pages were all fine. Only the product template was broken.

SEO impact: in Google’s stack, canonical has higher priority than hreflang. If the crawler gets a fallback response for a product page, it can treat /ja/products/X as a duplicate of /products/X instead. That means the 6 locale product URLs may never get indexed as separate pages. That would wipe out most of the multilingual SEO work.


2. Four rounds of wrong attribution

Round 1: I suspected the Googlebot UA was triggering bot protection

My first guess was that the Googlebot UA was sending Shopify down a different code path.

How I ruled it out: I manually tested 4 combinations: Chrome/Googlebot UA × with/without Accept-Language (AL). It turned out the UA had nothing to do with it. The only deciding factor was the Accept-Language header.
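The isolation matrix is trivial to generate; a sketch (UA strings abbreviated, variable names mine):

```javascript
// Sketch of the round-1 isolation matrix: UA × Accept-Language.
// The real test fetched /ja/products/... once per combination and
// compared <html lang>; only the AL axis changed the outcome.
const uas = {
  chrome: 'Mozilla/5.0 ... Chrome/124.0',   // abbreviated
  googlebot: 'Mozilla/5.0 (compatible; Googlebot/2.1)',
};
const als = ['ja', null];
const combos = [];
for (const [name, ua] of Object.entries(uas)) {
  for (const al of als) {
    const headers = { 'user-agent': ua };
    if (al) headers['accept-language'] = al;
    combos.push({ name: `${name},${al ? 'AL=' + al : 'noAL'}`, headers });
  }
}
```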

Round 2: I suspected the product translations were incomplete

The theory was simple. If meta_title / body_html / meta_description were missing in Translate & Adapt, Shopify might fail to find a translation and fall back to the source language.

What I did: I wrote a script and scanned all 9 products × 6 locales × 6 fields. The 4 main fields for the 8 active products (title, body_html, meta_title, and meta_description) were already translated. The only missing field was product_type (5 locales × 8 products = 40 rows). I filled those 40 rows while I was there, then tested again. The fallback was still there.

Conclusion: translation completeness had nothing to do with the fallback.

Round 3: I suspected a missing webPresence in Shopify Markets

I queried Admin GraphQL for Markets and found 3 markets in the store: US, International, and Japan. The odd part was that the 6 locale subpaths were only attached to Japan Market’s webPresence. US and International were both webPresence: null.

The theory was that US and International had no subpath configuration, so Cloudflare routing might send US and EU traffic to those markets by default. Those markets would not know what /ja/ meant, so they would fall back.

What I did: I suggested adding the 6 alternate locales to the US and International markets. The user went further and collapsed everything into one International Market, with webPresence for all 7 languages. I tested again. The fallback was still there.

Conclusion: the Markets config was not the key variable.

Round 4: I suspected a platform limitation in Shopify’s product template flow

At that point my theory changed again. Product pages need market-scoped price and inventory data. So maybe Shopify had to fully resolve Market first, and when the client had no AL it trusted only the market’s primary language. There was no Admin toggle to change that. I treated it as platform behavior.

That led me to a workaround: inject Accept-Language at the edge with a Cloudflare Worker.

I wrote worker.js, deployed it, and tested again. lang was still en. But the worker metrics showed 168 invocations, 0 errors, and about 1ms CPU time. The worker was running. It just was not fixing the problem.

So I added debug response headers, x-kryozon-worker and x-kryozon-path, to see the worker’s internal state:

ja-prod,noAL    worker=no-locale  path-seen=/products/...   ← expected /ja/products/...
ja-prod,AL=ja   worker=has-al:ja  path-seen=/ja/products/...
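The instrumentation itself was tiny. A sketch (the header names are the real ones from the post; the helper shape and scaffolding are assumed):

```javascript
// Sketch of the round-4 debug instrumentation. Cloning the upstream
// Response makes its headers mutable, so the worker's internal state
// can ride back to the test script on response headers.
function withDebugHeaders(upstream, state, pathname, alIn) {
  const res = new Response(upstream.body, upstream);
  res.headers.set('x-kryozon-worker', state);           // e.g. 'no-locale', 'has-al:ja'
  res.headers.set('x-kryozon-path', pathname);          // pathname as the worker saw it
  res.headers.set('x-kryozon-al-in', alIn ?? '(none)'); // added later in the session
  return res;
}
```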

That was weird. It was the same URL, but without AL the worker saw a pathname with no /ja/ prefix. With AL, the pathname was normal.

Then I added redirect: 'manual' and tested again:

noAL → status=302
       Location: https://www.kryozon.com/products/X (the English URL)

So Shopify really was returning a 302 from /ja/products/X to /products/X. Node fetch follows redirects by default. I had been looking at the second hop all along.
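With redirect: 'manual', the fallback has a testable signature: a 3xx whose Location drops the locale prefix. A minimal sketch (the helper name is mine):

```javascript
// Classify the first hop once redirect: 'manual' exposes it. A 3xx
// response whose Location no longer starts with the locale prefix is
// the Markets fallback signature. (Helper name is illustrative.)
function isLocaleFallback(status, location, locale) {
  if (status < 300 || status >= 400 || !location) return false;
  return !new URL(location).pathname.startsWith('/' + locale + '/');
}
```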

But the worker was also reporting has-al:ja, which meant it thought an AL header was present. I had not sent one.

So I added one more header, x-kryozon-al-in, to expose the actual AL value the worker received:

noAL request → al-seen=*    ← upstream injected the wildcard "*"

That was the actual root cause.

Shopify's Cloudflare for SaaS layer, which runs inside Shopify's own Cloudflare account and sits in front of the custom-domain SSL termination for KryoZon, was automatically injecting Accept-Language: * into every request that arrived without an AL header. Shopify's origin read * as "the client has no language preference", and product pages then triggered a 302 fallback to the source-language URL.


3. The fix

I changed the worker logic from “if AL exists, pass it through” to “check whether AL actually prefers the locale in the URL”:

function alPrefersLocale(al, locale) {
  if (!al) return false;
  const top = al.split(',')[0].split(';')[0].trim().toLowerCase();
  return top === locale || top.startsWith(locale + '-');
}

// main logic
const alIn = request.headers.get('accept-language');
const locale = detectLocale(url.pathname);

if (!locale) return passthrough();
if (alPrefersLocale(alIn, locale)) return passthrough();

// otherwise override: no AL, AL=*, AL=en-US, etc. all land here
const newHeaders = new Headers(request.headers);
newHeaders.set('accept-language', LOCALE_MAP[locale]);
return fetch(url, { method: request.method, headers: newHeaders, redirect: 'manual' });
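detectLocale and LOCALE_MAP are not shown above. A minimal shape consistent with the post (only the ja entry is confirmed; the other map entries and the exact Accept-Language values are placeholders):

```javascript
// Hypothetical companions to the worker logic above. Only /ja/ is
// confirmed by the post; the full map would hold all 6 locale subpaths.
const LOCALE_MAP = {
  ja: 'ja-JP,ja;q=0.9',
  // ...entries for the other five locale subpaths
};

function detectLocale(pathname) {
  const seg = pathname.split('/')[1]; // first path segment
  return Object.prototype.hasOwnProperty.call(LOCALE_MAP, seg) ? seg : null;
}
```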

Three deployment conditions mattered. All three were required:

  1. www CNAME had to be switched to orange cloud (Cloudflare Proxied). Otherwise the request would bypass the user’s Cloudflare and the worker would never run.
  2. SSL/TLS mode had to be Full. Flexible would loop forever.
  3. The worker route had to be www.kryozon.com/*, with failure mode set to fail open. If the worker broke, requests would still go straight to Shopify and the site would stay up.

Verification result:

30 pages × 2 modes (user / googlebot)
- user:      29 clean, 0 HIGH, 0 MED
- googlebot: 29 clean, 0 HIGH, 0 MED  ← was 24 clean, 12 HIGH before the fix
- ✓ PASS

4. What made this hard

1. Cloudflare for SaaS was an invisible layer

Shopify uses Cloudflare for SaaS to issue SSL certificates for custom domains. That makes the traffic path look like this:

Browser → Shopify's own CF (CF for SaaS) → [user's CF, if the user has enabled their own proxy] → Shopify origin

That middle CF for SaaS layer exists by default, is invisible, and its side effect of injecting AL: * is not called out in the docs. If you do not know the layer exists, you never think to blame it.

The easiest clue was the cf-ray response header. If a domain is DNS only in Cloudflare but the response still has cf-ray, that cf-ray came from Shopify’s Cloudflare account, not mine.

2. Accept-Language: * was an undocumented oddball

In the HTTP spec, * means “any language is acceptable”. Almost nobody sends that value on purpose. Shopify treated it as a “no preference” signal and triggered the product-page Markets fallback.

Normal browsers do not send *. Normal crawlers do not send it either. A middle layer like CF for SaaS does.
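Fed through the fix's alPrefersLocale check (restated here so the cases are runnable), * behaves exactly like a missing header:

```javascript
// alPrefersLocale from the fix, restated for illustration.
function alPrefersLocale(al, locale) {
  if (!al) return false;
  const top = al.split(',')[0].split(';')[0].trim().toLowerCase();
  return top === locale || top.startsWith(locale + '-');
}

alPrefersLocale('ja', 'ja');              // true  → pass through
alPrefersLocale('ja-JP,en;q=0.8', 'ja');  // true  → pass through
alPrefersLocale('*', 'ja');               // false → override
alPrefersLocale(null, 'ja');              // false → override
alPrefersLocale('en-US,en;q=0.9', 'ja');  // false → override
```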

3. Browser testing could not validate the SEO path

During the whole 4-hour debugging session, /ja/products/X always looked fine in an incognito browser. It stayed Japanese every time, because a real browser always sends an Accept-Language header.

Only clients with no AL could reproduce the fallback. That included some Googlebot crawl setups, some SEO tools, and plain curl. That is why i18n_live_diff had to simulate Googlebot mode.

4. Multi-layer systems need per-layer state checks

This problem touched several layers: DNS and proxy mode, Shopify's CF for SaaS edge, the user's own Cloudflare with the worker on it, and Shopify's origin with its Markets logic.

Each layer had its own state. I got the attribution wrong four times because I reasoned first and isolated state second.

The moment I added debug headers in round 4, the whole thing opened up. I was finally looking at the system’s actual state instead of the state I assumed.

5. Wrong attribution was expensive to unwind

From the user's point of view, the process was four confident theories in a row, each overturned.

Claude also updated memory after each round as if the current theory was already the answer, and each entry had to be walked back.

My takeaway for AI collaborators was simple: do not write to memory while attribution is still unstable. Memory is for stable knowledge. Unverified theories should stay in a handoff doc or some other staging area until they pass verification.


5. Regression prevention


6. Quick technical references

7. Expected SEO timeline
