Bundled Components

Scraipe ships with a growing collection of scrapers and analyzers. If you need functionality beyond these, see custom components.

Scrapers

  • TextScraper: Extracts visible text from HTML content using aiohttp for fetching and BeautifulSoup for parsing.
  • RawScraper: Retrieves unmodified website content using aiohttp.
  • MultiScraper^: Uses ingress rules to determine the appropriate scraper for a given URL.
  • TelegramMessageScraper^: Scrapes Telegram messages using the telethon library.
  • NewsScraper^: Extracts article content from webpages using aiohttp and trafilatura.
  • TelegramNewsScraper^: A specialized MultiScraper for handling Telegram and news links, with a fallback to TextScraper.
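The idea of ingress rules can be illustrated with a small standalone sketch. This is not Scraipe's actual API: `IngressRule`, `pick_scraper`, and the URL patterns below are hypothetical names chosen for illustration, and the scrapers are represented only by their names. It assumes each rule pairs a URL pattern with a scraper, the first matching rule wins, and an unmatched URL goes to a fallback.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class IngressRule:
    """Hypothetical rule: a URL pattern paired with the name of a scraper."""
    pattern: re.Pattern
    scraper_name: str


def pick_scraper(url: str, rules: List[IngressRule],
                 fallback: str = "TextScraper") -> str:
    """Return the scraper name of the first rule whose pattern matches the URL.

    If no rule matches, return the fallback scraper name.
    """
    for rule in rules:
        if rule.pattern.search(url):
            return rule.scraper_name
    return fallback


# Illustrative rules only; the patterns are assumptions, not Scraipe defaults.
RULES = [
    IngressRule(re.compile(r"^https?://t\.me/"), "TelegramMessageScraper"),
    IngressRule(re.compile(r"^https?://(www\.)?example\.com/news/"), "NewsScraper"),
]

if __name__ == "__main__":
    for url in ("https://t.me/somechannel/123",
                "https://example.com/news/story",
                "https://example.org/about"):
        print(url, "->", pick_scraper(url, RULES))
```

Ordering the rules from most to least specific keeps the dispatch predictable, since the first match decides which scraper handles the link.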

Analyzers

  • TextStatsAnalyzer: Computes text statistics such as word count, character count, sentence count, and average word length.
  • OpenAiAnalyzer^: Uses OpenAI's API to analyze content based on a provided instruction and optional schema validation.
  • GeminiAnalyzer^: Uses Google Gemini's API to analyze content based on a provided instruction and mandatory schema validation.
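To make the statistics listed for TextStatsAnalyzer concrete, here is a minimal standard-library sketch of how such numbers could be computed. It is an assumption-laden illustration, not the library's implementation: the function name `text_stats`, the output keys, and the tokenization rules (words as runs of word characters, sentences split on `.`, `!`, and `?`) are all hypothetical.

```python
import re


def text_stats(text: str) -> dict:
    """Compute simple text statistics (illustrative, not Scraipe's code)."""
    # Words: runs of letters, digits, or underscores.
    words = re.findall(r"\b\w+\b", text)
    # Sentences: non-empty chunks between runs of ., !, or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Guard against division by zero for empty input.
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "word_count": len(words),
        "character_count": len(text),
        "sentence_count": len(sentences),
        "average_word_length": avg_len,
    }


if __name__ == "__main__":
    print(text_stats("Hello world. How are you?"))
```

The real analyzer may tokenize differently (for example around abbreviations or non-Latin scripts), so its counts can differ from this sketch.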

Components marked with a caret (^) require scraipe[extended].


Plug these components into your basic workflow.