robots.txt validator: checks if a website's robots.txt file is valid. in typescript

You can use the robots-parser package to fetch a website's robots.txt file, parse it, and check whether a given URL is allowed to be crawled. Here's an example code snippet in TypeScript:

index.ts
import robotsParser from 'robots-parser';

const robotsUrl = 'https://example.com/robots.txt';
const pageUrl = 'https://example.com/';
const userAgent = '*';

async function main() {
  // Fetch the robots.txt file and hand its contents to the parser
  const robotsContent = await fetch(robotsUrl).then(res => res.text());
  const robots = robotsParser(robotsUrl, robotsContent);

  // isAllowed() expects a full URL on the same host as the robots.txt file
  const canCrawl = robots.isAllowed(pageUrl, userAgent);

  if (canCrawl) {
    console.log('This page is allowed to be crawled.');
  } else {
    console.warn('This page is NOT allowed to be crawled.');
  }
}

main();

In this example, we use the robots-parser package to parse the robots.txt file fetched with fetch(). We then check whether the page URL is allowed to be crawled by the specified userAgent. Note that isAllowed() takes a full URL on the same host as the robots.txt file rather than a bare path.
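The parser also exposes a few related helpers. The calls below use getCrawlDelay() and getSitemaps() from the robots-parser API; this is a short sketch, so confirm the method names against the version you install:

// Assuming `robots` and `userAgent` are the values created in main() above
const crawlDelay = robots.getCrawlDelay(userAgent); // Crawl-delay in seconds, or undefined if not set
const sitemaps = robots.getSitemaps();              // Sitemap: URLs declared in the file

console.log('Crawl-delay:', crawlDelay ?? 'none specified');
console.log('Sitemaps:', sitemaps.length > 0 ? sitemaps.join(', ') : 'none declared');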

Note that this example assumes Node.js 18 or later, where fetch() is available globally; on older Node.js versions you would need a fetch polyfill such as node-fetch. If you run it in a browser instead, requests for another site's robots.txt are typically blocked by CORS, so you would usually fetch the file through your own server.
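Since robots-parser is lenient and will not reject malformed input, you may want a separate sanity check if your goal is actual validation. The validateRobotsTxt() helper and KNOWN_DIRECTIVES list below are illustrative assumptions, not part of robots-parser: a minimal sketch that verifies the file is reachable and that every non-empty, non-comment line starts with a recognized directive.

// Hypothetical helper: a minimal structural check of a robots.txt file
const KNOWN_DIRECTIVES = ['user-agent', 'allow', 'disallow', 'crawl-delay', 'sitemap', 'host'];

async function validateRobotsTxt(url: string): Promise<boolean> {
  const res = await fetch(url);
  if (!res.ok) {
    console.warn(`robots.txt request failed with status ${res.status}`);
    return false;
  }

  const text = await res.text();
  const badLines = text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0 && !line.startsWith('#'))
    .filter(line => {
      // Everything before the first colon should be a known directive name
      const directive = line.split(':')[0].trim().toLowerCase();
      return !KNOWN_DIRECTIVES.includes(directive);
    });

  if (badLines.length > 0) {
    console.warn('Unrecognized lines:', badLines);
    return false;
  }
  return true;
}

// Usage: validateRobotsTxt('https://example.com/robots.txt').then(ok => console.log('valid:', ok));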
