Yes—but not with plain HTTP::Tiny (or curl) alone. If the page’s final HTML is produced by JavaScript, you need a real browser engine (headless Chromium/Firefox) to execute that JS and then extract from the rendered DOM. Perl can absolutely orchestrate that; the usual pattern is:

  1. Drive a headless browser (Chromium) from Perl
  2. Wait for the page to render
  3. Pull the final HTML (or run JS in-page)
  4. Apply your regex / extract links

Practical options from Perl

Option A: Perl drives headless Chromium via Playwright (recommended)

Playwright is modern and reliable for JS-heavy sites. From Perl you can either:

  • use a Perl wrapper module (one exists, but quality varies), or
  • call the Playwright CLI / a tiny Node helper from Perl (most robust in practice).

High-level shape (no extra “relative URL” stuff, same idea as your script):

  • launch Chromium headless
  • goto($url)
  • wait for networkidle (or a selector)
  • content() to get rendered HTML
  • run your regex on that HTML

Option B: Selenium (WebDriver) from Perl

This is the “classic” way. You run chromedriver + Selenium from Perl (Selenium::Remote::Driver), navigate, then fetch page source.

Pros: widely known. Cons: more moving parts (matching driver versions, etc.).
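
A minimal sketch of that flow, assuming Selenium::Chrome (bundled with the Selenium::Remote::Driver distribution) and a chromedriver on your PATH; the headless flag and the crude wait are placeholders you'd tune for your site and Chrome version:

use strict;
use warnings;
use Selenium::Chrome;   # spawns a local chromedriver for you

my $url = shift or die "Usage: $0 <url>\n";

# '--headless=new' suits recent Chrome; older versions want plain '--headless'
my $driver = Selenium::Chrome->new(
    extra_capabilities => {
        'goog:chromeOptions' => { args => [ '--headless=new', '--disable-gpu' ] },
    },
);

$driver->get($url);                   # navigate and let the page load
sleep 2;                              # crude: give client-side JS time to render (or poll find_element for a known selector)
my $html = $driver->get_page_source;  # the rendered DOM, not the raw HTTP response
print $html;

$driver->shutdown_binary;             # stop the chromedriver we spawned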

Option C: “HTML snapshot” services / prerender endpoints

Sometimes a site already serves a “rendered” version (for bots) or has an API. This can avoid headless browsing entirely—but it’s site-specific.


A minimal, realistic approach: Perl + Playwright (Node helper)

If you’re okay with having Node installed, you can keep Perl as the main tool and use a tiny helper to render:

render.mjs (Node + Playwright):

import { chromium } from 'playwright';

const url = process.argv[2];
if (!url) {
  console.error('Usage: render.mjs <url>');
  process.exit(2);
}

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage({
  // A plain desktop UA; some sites serve stripped-down markup to obvious bots
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
});

// 'networkidle' waits until the network goes quiet, so client-side JS has had a chance to render
await page.goto(url, { waitUntil: 'networkidle' });

// Dump the rendered DOM as HTML on stdout for the Perl side to capture
const html = await page.content();
console.log(html);
await browser.close();

Then your Perl script just does:

  • my $html = qx(node render.mjs "$url");
  • apply regex and print matches

This gives you “curl-like” behavior but with a real JS runtime.
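
For concreteness, here's a rough sketch of that Perl side. The CLI shape (script <url> <regex>, unique matches printed) follows your original script; the list form of open is used instead of qx to avoid shell-quoting surprises with the URL:

#!/usr/bin/env perl
use strict;
use warnings;

my ( $url, $regex ) = @ARGV;
die "Usage: $0 <url> <regex>\n" unless defined $url && defined $regex;

# Let the Node helper do the JS rendering; read the final HTML from its stdout.
open my $fh, '-|', 'node', 'render.mjs', $url
    or die "Cannot run node: $!\n";
my $html = do { local $/; <$fh> };
close $fh or die "render.mjs failed for $url\n";

# Apply the user-supplied regex and print unique matches (capture group 1 if present).
my %seen;
while ( $html =~ /$regex/g ) {
    my $match = defined $1 ? $1 : $&;
    print "$match\n" unless $seen{$match}++;
}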


What you should know / tradeoffs

  • Speed: headless browsers are slower than HTTP GET.
  • Detection: some “fancy” sites try to detect automation; Playwright/Selenium can still work, but sometimes you need extra steps (viewport, delays, stealthy settings).
  • Auth / cookies / consent banners: if the content appears only after a click or login, you’ll need to script that interaction.
  • Legality/ToS: some sites forbid automated scraping; worth checking.

If you tell me which direction you prefer:

  • Perl + Selenium
  • Perl + Playwright (Node helper)
  • Pure Perl module approach (if you really want “all Perl”)

…I’ll give you a drop-in script that matches your original CLI style: script <url> <regex> and prints unique matches, but backed by a headless browser for JS-rendered pages.