The worst scraping failure is the one you don’t notice

A 403 error is obvious. A CAPTCHA page is obvious. But what about this?

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

<html>
<head><title></title></head>
<body></body>
</html>

Status code: 200. No error. Your scraper thinks it succeeded. Your pipeline processes the empty HTML. Your database gets populated with blank records. And you don’t notice until days later when someone asks why all the data is missing.
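That naive success path is easy to make concrete. A minimal sketch contrasting a status-only check with a content-aware one (helper names are mine, not any particular library's):

```python
EMPTY_SHELL = "<html>\n<head><title></title></head>\n<body></body>\n</html>"

def naive_is_success(status_code: int) -> bool:
    # What most pipelines check: the status code and nothing else.
    return 200 <= status_code < 300

def content_aware_is_success(status_code: int, body: str) -> bool:
    # Also demand that the body carries actual markup, not a bare shell.
    return naive_is_success(status_code) and len(body.strip()) > 200

assert naive_is_success(200)                           # the block "succeeds"
assert not content_aware_is_success(200, EMPTY_SHELL)  # ...until you look inside
```

The threshold is arbitrary; the point is that success must be defined by content, not by the status line.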

This is the silent anti-bot block. It’s the most dangerous failure mode in web scraping, and it’s becoming the default response from modern anti-bot systems like DataDome, Akamai, and Cloudflare.

Why anti-bot systems return empty HTML instead of 403

Anti-bot vendors got smart. They realized that scrapers check for 403 status codes and retry. So they changed their strategy:

  1. Return 200 OK so the scraper thinks it succeeded
  2. Serve an empty or minimal HTML shell with no actual content
  3. Embed a JavaScript challenge that only real browsers can execute
  4. Load the actual page content only after the challenge passes

From the anti-bot system’s perspective, this is genius. The scraper gets a “successful” response, processes it, and moves on — never realizing it got nothing. No retry logic triggers. No error alerts fire. The scraper is defeated and doesn’t even know it.

The three flavors of empty HTML blocks

1. JavaScript challenge pages

The most common variant. The server returns a page that looks empty but contains obfuscated JavaScript:

<html>
<head>
<script>
  // 500 lines of obfuscated JS that:
  // - Collects browser fingerprints
  // - Generates proof-of-work tokens
  // - Sets anti-bot cookies
  // - Redirects to the real page
</script>
</head>
<body>
<!-- Content loads after JS execution -->
</body>
</html>

If your scraper doesn’t execute JavaScript — or uses a headless browser that gets detected — you get the shell without the content.
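These shells are often recognizable without executing anything. A heuristic sketch in Python; the marker strings are illustrative examples of vendor artifacts, not a complete or stable list:

```python
import re

# Illustrative vendor artifacts; real challenge pages vary and change often.
CHALLENGE_MARKERS = ["_abck", "datadome", "cf_chl"]

def looks_like_challenge_shell(html: str) -> bool:
    """Heuristic: known markers, or a big inline script with a tiny body."""
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return True
    # The shape shown above: a large inline <script>, almost no visible body.
    scripts = re.findall(r"<script[^>]*>(.*?)</script>", html, re.S)
    body = re.search(r"<body[^>]*>(.*?)</body>", html, re.S)
    body_text = re.sub(r"<[^>]+>", " ", body.group(1)).strip() if body else ""
    return sum(len(s) for s in scripts) > 2000 and len(body_text) < 100
```

Run this on every response and log the hits; it turns the silent variant into a measurable one.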

2. Client-side rendered pages

Many modern websites use React, Vue, or Angular. The initial HTML is an empty container:

<html>
<head>...</head>
<body>
  <div id="root"></div>
  <script src="/app.bundle.js"></script>
</body>
</html>

The actual content renders only after the JavaScript bundle loads and fetches data from the site’s APIs. Standard HTTP clients (requests, urllib, even curl) will always get empty HTML from these sites. This isn’t anti-bot — it’s just modern web architecture. But the result is the same: empty data.
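You can flag this shape cheaply before deciding a site needs a browser. A sketch matching common mount-point ids (`root`, `app`, and `__next` are frequent conventions, not an exhaustive list):

```python
import re

# Common single-page-app mount points; real sites may use other ids.
MOUNT_POINT = re.compile(
    r'<div[^>]+id=["\'](?:root|app|__next)["\'][^>]*>\s*</div>', re.I
)

def is_client_rendered_shell(html: str) -> bool:
    """True when the HTML is an empty SPA container waiting on JS."""
    return bool(MOUNT_POINT.search(html))
```

A hit means plain HTTP fetching will never work for that site, blocked or not.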

3. Delayed content loading with bot detection

The most sophisticated variant combines both approaches:

  1. Page loads with empty shell
  2. JavaScript executes and performs anti-bot checks
  3. If checks pass, AJAX requests fetch the actual data
  4. If checks fail, the page stays empty — no error, no redirect, just silence

DataDome is particularly aggressive with this pattern. Their JavaScript SDK runs multiple detection checks before allowing any API calls to proceed. If their SDK detects automation, it simply never triggers the data-loading calls.
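One practical defense against this silent variant: never trust the first snapshot of the DOM. Poll until content appears, and treat an expired deadline as a block. A minimal sketch; the two callables are placeholders for whatever rendering layer and content check you use:

```python
import time

def wait_for_content(get_html, is_populated, timeout=15.0, interval=0.5):
    """Poll a page until real content appears.

    get_html: callable returning the current DOM as a string
    is_populated: callable deciding whether real content is present
    Silence past the deadline is treated as a block, not a success.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        html = get_html()
        if is_populated(html):
            return html
        time.sleep(interval)
    raise TimeoutError("page stayed empty — treating this as a silent block")
```

The raised error is the whole point: an empty page becomes a loud failure instead of a blank record.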

Bright Data returns empty HTML and charges you for it

This is where it gets infuriating. Bright Data’s Web Unlocker charges per request, not per successful request. Their definition of “success” is based on HTTP status codes, not content quality.

When Bright Data sends your request through their headless browser farm:

  1. The headless browser hits the target site
  2. The anti-bot system serves the JavaScript challenge
  3. Bright Data’s headless Chrome attempts to execute the challenge
  4. The anti-bot system detects the headless browser (because it always does)
  5. The JavaScript challenge never completes
  6. The page content never loads
  7. Bright Data returns the empty HTML shell to you
  8. Status code: 200. Bright Data calls it a success. You get charged $0.0251.

At $25.10 per 1,000 requests, paying for empty HTML adds up fast. Run 50,000 requests against a DataDome-protected site and get back 50,000 empty pages? That’s $1,255 for literally nothing.

ScraperAPI does the same thing. Their API returns whatever the target site gives them. If the site gives empty HTML with a 200 status, ScraperAPI reports success. You pay. You get nothing.

Oxylabs, ZenRows — same pattern. They all use headless browser farms that anti-bot systems have long since learned to detect. Empty HTML is the default result on protected sites.
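The arithmetic above generalizes to any per-request price and empty-response rate. A back-of-envelope sketch, not any vendor's billing formula:

```python
def wasted_spend(requests_sent: int, empty_rate: float, price_per_1k: float) -> float:
    """Dollars paid for responses that carried no usable content."""
    return requests_sent * empty_rate * (price_per_1k / 1000)

# The scenario above: 50,000 requests, every one an empty shell, at $25.10/1k.
print(f"${wasted_spend(50_000, 1.0, 25.10):,.2f}")  # prints $1,255.00
```

Even a 30% empty rate on the same volume is hundreds of dollars of nothing per run.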

Why headless browsers produce empty HTML

Every major headless browser framework has detectable signatures:

Puppeteer / Playwright

  • navigator.webdriver is true (and stealth patches that delete it are themselves detectable)
  • window.chrome object is missing or incomplete
  • CDP (Chrome DevTools Protocol) artifacts are detectable
  • navigator.plugins is empty or has automation-specific entries
  • Canvas and WebGL fingerprints differ from real Chrome

Headless Chrome (--headless=new)

  • GPU rendering is different (or absent)
  • User-Agent contains “HeadlessChrome” (if not spoofed)
  • Window dimensions and screen properties are wrong
  • Font rendering differences due to missing system fonts
  • Audio context fingerprint is missing or generic

Selenium

  • All of the above, plus:
  • $cdc_ and $wdc_ variables present in the DOM
  • The well-known document.$cdc_asdjflasutopfhvcZLmcfl_ variable injected by ChromeDriver
  • Selenium-specific attributes on elements

Anti-bot systems maintain databases of these signatures. When DataDome’s JavaScript SDK runs in a Puppeteer-controlled browser, it detects dozens of automation markers in milliseconds. The result: the challenge fails silently, and you get empty HTML.

Bright Data, ScraperAPI, and every other proxy service runs these same headless browsers. Different IPs, same detectable automation fingerprints.

How real Chrome browsers solve the empty HTML problem

The solution is straightforward in concept and brutal in execution: use real Chrome browsers that anti-bot systems cannot distinguish from human-operated browsers.

UltraWebScrapingAPI doesn’t use Puppeteer. Doesn’t use Playwright. Doesn’t use headless Chrome. We run real Chrome browser instances with:

  1. Genuine GPU rendering — not emulated, not software-rendered. Real GPU output that produces authentic canvas and WebGL fingerprints.

  2. Complete browser environments — real navigator.plugins, real window.chrome objects, real audio contexts, real font lists from actual OS installations.

  3. No automation frameworks — no CDP connections, no WebDriver protocol, no Selenium artifacts. The browser is controlled through methods that don’t leave automation traces.

  4. Full JavaScript execution — every script runs completely. Anti-bot challenges are solved because the browser environment is genuine. DataDome’s SDK runs, checks everything, finds nothing suspicious, and releases the content.

  5. Per-site optimization — we analyze each target site’s specific anti-bot configuration and tune our browser environment to pass every check. Akamai on an airline site has different checks than Akamai on a retail site. We handle both.

The result: actual content, not empty shells

When UltraWebScrapingAPI requests a DataDome-protected page:

  1. Real Chrome browser navigates to the URL
  2. DataDome’s JavaScript SDK loads and runs
  3. It checks browser fingerprints — everything looks genuine
  4. It checks behavioral signals — everything looks human
  5. DataDome clears the request
  6. The page’s JavaScript loads the actual content
  7. We wait for the content to fully render
  8. You get the complete HTML with all data present

Success rate: 99%+. Not 99% of requests returning 200 status codes — 99% of requests returning pages with actual content.

How to detect if you’re getting empty HTML

Before switching services, verify the problem. Check your scraped HTML for:

# Red flags that you're getting empty HTML.
# `html` is the response body string from whatever HTTP client you use.
def warn_if_silently_blocked(html: str) -> None:
    if len(html) < 1000:
        print("Suspiciously short response")

    if '<div id="root"></div>' in html and 'data' not in html:
        print("Empty React shell — JS didn't execute")

    if 'challenge' in html.lower() or '_abck' in html:
        print("Anti-bot challenge page, not real content")

    if html.count('<') < 20:
        print("Minimal HTML — likely blocked")

If any of these trigger, you’re getting blocked silently. Your data pipeline is processing empty pages. And if you’re paying Bright Data or ScraperAPI, you’re paying for every one of them.
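Once you can recognize an empty shell, fail loudly on it. One way to wire that into a pipeline, sketched with the same rough thresholds as above:

```python
class SilentBlockError(RuntimeError):
    """A 200 response that carried no real content."""

def validate_scrape(status_code: int, html: str,
                    min_length: int = 1000, min_tags: int = 20) -> str:
    # Gate every response before it reaches the pipeline. Raising turns
    # a silent block into a loud one: retries and alerts fire instead of
    # blank records landing in the database.
    if status_code != 200:
        raise SilentBlockError(f"non-200 status: {status_code}")
    if len(html) < min_length or html.count("<") < min_tags:
        raise SilentBlockError("200 OK but the body looks like an empty shell")
    return html
```

Call it on every response and let your normal retry/alerting machinery handle the exception.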

Stop paying for empty pages

Empty HTML responses are not a mystery. They’re a well-understood anti-bot technique. The fix is not more proxies, not more retries, not more IP rotation. The fix is a browser environment that anti-bot systems cannot detect.

Try UltraWebScrapingAPI in our free playground — paste any URL that’s returning empty HTML and see the full page content come back. Compare what you’re getting from Bright Data to what we deliver. The difference is immediately obvious.

You’ve been paying for empty pages long enough. Get actual data.