Bundled Components
Scraipe ships with a growing collection of scrapers and analyzers. If you need functionality beyond these, see custom components.
Scrapers
- TextScraper: Extracts visible text from HTML content using aiohttp for fetching and BeautifulSoup for parsing.
- RawScraper: Retrieves unmodified website content using aiohttp.
- MultiScraper^: Uses ingress rules to determine the appropriate scraper for a given URL.
- TelegramMessageScraper^: Scrapes Telegram messages using the telethon library.
- NewsScraper^: Extracts article content from webpages using aiohttp and trafilatura.
- TelegramNewsScraper^: A specialized MultiScraper for handling Telegram and news links, with a fallback to TextScraper.
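The ingress-rule idea behind MultiScraper can be illustrated with a small standalone sketch. It does not use Scraipe's API: the `RuleBasedDispatcher` class, the three stand-in scraper functions, and the URL patterns are all hypothetical, chosen only to show how a first-match rule list with a fallback might route URLs.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for real scrapers: each takes a URL and returns a label.
Scraper = Callable[[str], str]


def telegram_scraper(url: str) -> str:
    return f"telegram:{url}"


def news_scraper(url: str) -> str:
    return f"news:{url}"


def text_scraper(url: str) -> str:
    return f"text:{url}"


class RuleBasedDispatcher:
    """Picks the first scraper whose pattern matches the URL, else a fallback."""

    def __init__(
        self,
        rules: List[Tuple[str, Scraper]],
        fallback: Optional[Scraper] = None,
    ) -> None:
        # Compile patterns once; rule order defines priority.
        self._rules = [(re.compile(pattern), scraper) for pattern, scraper in rules]
        self._fallback = fallback

    def select(self, url: str) -> Scraper:
        for pattern, scraper in self._rules:
            if pattern.search(url):
                return scraper
        if self._fallback is None:
            raise ValueError(f"no rule matches {url!r}")
        return self._fallback

    def scrape(self, url: str) -> str:
        return self.select(url)(url)


# Illustrative patterns only; real ingress rules may look quite different.
dispatcher = RuleBasedDispatcher(
    rules=[
        (r"^https?://t\.me/", telegram_scraper),
        (r"^https?://[^/]+/(news|article)s?/", news_scraper),
    ],
    fallback=text_scraper,
)
```

Calling `dispatcher.scrape(url)` routes Telegram links and news-style paths to their dedicated handlers and sends everything else to the text fallback, which mirrors the behavior described for TelegramNewsScraper.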
Analyzers
- TextStatsAnalyzer: Computes text statistics such as word count, character count, sentence count, and average word length.
- OpenAiAnalyzer^: Uses OpenAI's API to analyze content based on a provided instruction, with optional schema validation.
- GeminiAnalyzer^: Uses Google Gemini's API to analyze content based on a provided instruction, with mandatory schema validation.
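To make the statistics listed for TextStatsAnalyzer concrete, here is a minimal standalone sketch of how such values could be computed. It is not Scraipe's implementation: the `text_stats` function, its key names, and the tokenization rules (regex word matching, sentences split on runs of `.`, `!`, or `?`) are assumptions made for illustration.

```python
import re
from typing import Dict


def text_stats(text: str) -> Dict[str, float]:
    """Compute simple text statistics (illustrative, not Scraipe's code)."""
    # Words are runs of word characters; punctuation is ignored.
    words = re.findall(r"\b\w+\b", text)
    # Each run of ., ! or ? ends a sentence; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_count = len(words)
    # Guard against division by zero on empty input.
    average = sum(len(w) for w in words) / word_count if word_count else 0.0
    return {
        "word_count": word_count,
        "character_count": len(text),
        "sentence_count": len(sentences),
        "average_word_length": average,
    }


stats = text_stats("Hello world. This is a test!")
print(stats)
```

A real analyzer may count sentences or words differently, for example when handling abbreviations or non-Latin scripts, so treat these numbers as one reasonable definition rather than the library's exact output.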
Components marked with a caret (^) require scraipe[extended].
Plug these components into your basic workflow.