Yes—but not with plain HTTP::Tiny (or curl) alone. If the page’s final HTML is produced by JavaScript, you need a real browser engine (headless Chromium/Firefox) to execute that JS and then extract from the rendered DOM. Perl can absolutely orchestrate that; the usual pattern is:
- Drive a headless browser (Chromium) from Perl
- Wait for the page to render
- Pull the final HTML (or run JS in-page)
- Apply your regex / extract links
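That last step is plain Perl. Here's a minimal sketch of it, assuming the rendered HTML is piped in on STDIN and you want unique href values (swap in your own regex as needed):

use strict;
use warnings;

# Read the rendered HTML (e.g. piped in from a headless-browser helper).
my $html = do { local $/; <STDIN> };

# Naive href extraction; fine for quick scraping, not a real HTML parser.
my %seen;
while ($html =~ /href\s*=\s*["']([^"']+)["']/gi) {
    print "$1\n" unless $seen{$1}++;
}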
Practical options from Perl
Option A: Perl drives headless Chromium via Playwright (recommended)
Playwright is modern and reliable for JS-heavy sites. From Perl you can either:
- use a Perl module wrapper (one exists on CPAN, but quality varies), or
- call the Playwright CLI / a tiny Node helper from Perl (most robust in practice).
High-level shape (no extra “relative URL” stuff, same idea as your script):
- launch Chromium headless
- goto($url)
- wait for networkidle (or a selector)
- content() to get the rendered HTML
- run your regex on that HTML
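If you want to try the Perl wrapper route, the CPAN Playwright module mirrors the JS API. A heavily hedged sketch based on its synopsis; treat the exact method names and launch arguments as assumptions and check the module's POD before relying on them:

use strict;
use warnings;
use Playwright;    # CPAN wrapper; mirrors the JS Playwright API

my $url = shift @ARGV or die "Usage: $0 <url>\n";

# Assumption: launch() takes headless/type per the module's synopsis.
my $handle  = Playwright->new();
my $browser = $handle->launch( headless => 1, type => 'chrome' );
my $page    = $browser->newPage();

$page->goto( $url, { waitUntil => 'networkidle' } );
print $page->content();

$browser->close();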
Option B: Selenium (WebDriver) from Perl
This is the “classic” way: you run chromedriver (or a Selenium server) and drive it from Perl with Selenium::Remote::Driver, navigate, then fetch the page source (a sketch follows below).
Pros: widely known.
Cons: more moving parts (driver versions, a separate driver process, etc.)
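A minimal sketch, assuming chromedriver (or a Selenium server) is already listening on localhost:4444; the fixed sleep is a placeholder for a real wait on a selector you expect in the rendered DOM:

use strict;
use warnings;
use Selenium::Remote::Driver;

my $url = shift @ARGV or die "Usage: $0 <url>\n";

# Assumes chromedriver/Selenium is already running on localhost:4444.
my $driver = Selenium::Remote::Driver->new(
    remote_server_addr => 'localhost',
    port               => 4444,
    browser_name       => 'chrome',
    extra_capabilities => {
        'goog:chromeOptions' => { args => ['--headless=new'] },
    },
);

$driver->get($url);
sleep 2;    # crude; better to poll for an element you expect after render
my $html = $driver->get_page_source();
$driver->quit();

print $html;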
Option C: “HTML snapshot” services / prerender endpoints
Sometimes a site already serves a “rendered” version (for bots) or has an API. This can avoid headless browsing entirely—but it’s site-specific.
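A quick way to check is a plain GET with a crawler-like User-Agent and a look at whether the interesting content is already in the HTML. A sketch with HTTP::Tiny; the Googlebot UA here is purely illustrative, and many sites won't change their response for it:

use strict;
use warnings;
use HTTP::Tiny;

my $url = shift @ARGV or die "Usage: $0 <url>\n";

# Some sites serve prerendered HTML to crawler user agents; no guarantee.
my $http = HTTP::Tiny->new(
    agent => 'Googlebot/2.1 (+http://www.google.com/bot.html)',
);
my $res = $http->get($url);

die "GET failed: $res->{status} $res->{reason}\n" unless $res->{success};
print $res->{content};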
A minimal, realistic approach: Perl + Playwright (Node helper)
If you’re okay with having Node installed, you can keep Perl as the main tool and use a tiny helper to render:
render.mjs (Node + Playwright):
import { chromium } from 'playwright';

const url = process.argv[2];
if (!url) {
  console.error('Usage: render.mjs <url>');
  process.exit(2);
}

const browser = await chromium.launch({ headless: true });
try {
  // A desktop user agent helps with sites that vary markup by client.
  const page = await browser.newPage({
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  });
  // 'networkidle' waits until network activity settles; for stubborn
  // pages, wait for a specific selector instead.
  await page.goto(url, { waitUntil: 'networkidle' });
  // content() returns the DOM serialized *after* JavaScript has run.
  console.log(await page.content());
} finally {
  await browser.close(); // always release the browser, even on errors
}
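To run the helper you need Node, the Playwright package, and a browser build:

npm install playwright
npx playwright install chromium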
Then your Perl script just does:
- my $html = qx(node render.mjs "$url");
- apply your regex and print the matches
This gives you “curl-like” behavior but with a real JS runtime.
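One caveat: interpolating $url into qx() passes it through the shell, so a URL containing shell metacharacters can break the command (or worse). A sketch of the full wrapper using list-form open to bypass the shell, matching the <url> <regex> CLI of your original script:

use strict;
use warnings;

my ($url, $pattern) = @ARGV;
die "Usage: $0 <url> <regex>\n" unless defined $url && defined $pattern;

# List-form open runs node directly; $url is never shell-interpreted.
open my $fh, '-|', 'node', 'render.mjs', $url
    or die "Cannot run node: $!\n";
my $html = do { local $/; <$fh> };
close $fh or die 'render.mjs failed (exit ' . ($? >> 8) . ")\n";

# Print unique matches in first-seen order; use $1 if the pattern
# has a capture group, otherwise the whole match.
my $re = qr/$pattern/;
my %seen;
while ($html =~ /$re/g) {
    my $m = defined $1 ? $1 : $&;
    print "$m\n" unless $seen{$m}++;
}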
What you should know / tradeoffs
- Speed: headless browsers are slower than HTTP GET.
- Detection: some “fancy” sites try to detect automation; Playwright/Selenium can still work, but sometimes you need extra steps (viewport, delays, stealthy settings).
- Auth / cookies / consent banners: if the content appears only after a click or login, you’ll need to script that interaction.
- Legality/ToS: some sites forbid automated scraping; worth checking.
If you tell me which direction you prefer:
- Perl + Selenium
- Perl + Playwright (Node helper)
- Pure Perl module approach (if you really want “all Perl”)
…I’ll give you a drop-in script that matches your original CLI style: script <url> <regex> and prints unique matches, but backed by a headless browser for JS-rendered pages.