Apify is a web scraping and data extraction platform that provides an app store with more than a thousand ready-made cloud tools called Actors.
The Website Content Crawler Actor can deeply crawl websites, clean their HTML by removing cookie modals, footers, and navigation, and then transform the HTML into Markdown. This Markdown can then be stored in a vector database for semantic search or Retrieval-Augmented Generation (RAG).
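
As a rough illustration of that flow, the sketch below runs the Actor through the `apify-client` Python package and splits the resulting Markdown into chunks that could be embedded and written to a vector database. It is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `crawl_to_markdown` and `chunk_items`, the `max_chars` limit, and the `url`/`markdown`/`text` fields read from each dataset item are assumptions made for this example, and the embedding and storage step is left out because it depends on the database you choose.

```python
from typing import Iterable, Iterator


def crawl_to_markdown(start_url: str, token: str) -> Iterator[dict]:
    """Run the Website Content Crawler Actor and yield its dataset items."""
    # Imported lazily so the chunking helper below works without the package.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor("apify/website-content-crawler").call(
        run_input={"startUrls": [{"url": start_url}]}
    )
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


def chunk_items(items: Iterable[dict], max_chars: int = 500) -> list[dict]:
    """Split each page's Markdown into paragraph-aligned chunks for embedding."""
    records = []
    for item in items:
        # Assumed item shape: {"url": ..., "markdown": ...}; fall back to "text".
        body = item.get("markdown") or item.get("text") or ""
        current = ""
        for para in body.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            # Start a new chunk when adding this paragraph would exceed the limit.
            if current and len(current) + len(para) + 2 > max_chars:
                records.append({"url": item.get("url"), "content": current})
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            records.append({"url": item.get("url"), "content": current})
    return records
```

With a valid API token, `chunk_items(crawl_to_markdown("https://docs.apify.com/", token))` would return a list of `{"url", "content"}` records. Splitting on blank lines keeps Markdown paragraphs and headings intact, although a single paragraph longer than `max_chars` is kept whole rather than cut.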