# Puppeteer Web Scraper

Puppeteer is a Node.js library that controls Chrome/Chromium through the DevTools Protocol and runs the browser in headless mode by default. When scraping websites, **always review and comply with the website's terms of service and policies to ensure ethical and legal use of the data**.

## Scrape One URL

1. *(Optional)* Connect a [**Text Splitter**](https://tailwindsdocs.innovativesol.com/readme/chatflows/langchain/text-splitters).
2. Enter the URL you want to scrape.
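
To make the optional Text Splitter step more concrete, here is a minimal sketch of fixed-size chunking with overlap. The `splitText` function and its `chunkSize` and `chunkOverlap` parameters are hypothetical names chosen for illustration; the actual Text Splitter nodes may split on separators or tokens rather than raw characters.

```typescript
// Hypothetical helper: split scraped text into overlapping character chunks.
// This only illustrates the idea; it is not the Text Splitter node's implementation.
function splitText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

For example, `splitText("abcdefghij", 4, 1)` returns `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the previous one, so context is preserved across chunk boundaries.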

## Crawl & Scrape Multiple URLs

See the [**Web Crawl**](https://github.com/innovativeSol/tailwinds-docs/blob/main/integrations/use-cases/web-crawl.md) guide to crawl and scrape multiple pages.

## Output

Loads the URL's content as a Document.
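
As a rough picture of the output, a Document in LangChain JS carries its text in `pageContent` and descriptive fields in `metadata`. The sketch below uses a local `ScrapedDocument` interface and a hypothetical `toDocument` helper instead of the library's own class, and the `source` metadata key is an assumption about what the loader records.

```typescript
// Local stand-in for the Document shape (pageContent + metadata).
interface ScrapedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: wrap scraped page content in a Document-like object.
function toDocument(content: string, url: string): ScrapedDocument {
  return {
    pageContent: content,
    metadata: { source: url }, // assumed key: the URL the content came from
  };
}

// Example: content scraped from a page becomes one Document-like object.
const doc = toDocument("<h1>Hello</h1>", "https://example.com");
```

Downstream nodes can then read `doc.pageContent` for the text and `doc.metadata` to trace where it came from.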

## Resources

* [LangChain JS Puppeteer](https://js.langchain.com/docs/integrations/document_loaders/web_loaders/web_puppeteer)
* [Puppeteer](https://pptr.dev/)
