Scraipe

Scraipe is a high performance scraping and analysis framework.

Why use Scraipe?

Versatile Scraping: Scraipe integrates popular libraries such as aiohttp, trafilatura, and telethon to efficiently scrape diverse documents.f
Speed: Run batches of asynchronous IO-bound tasks in parallel through a simple synchronous API.
Customization: Scraipe's modular architecture allows you to easily extend or create new scrapers and analyzers.
LLM Analysis: Use LLM models to extract structured information from unstructured content.
Pydantic Validation: Ensure the format of component and LLM outputs with built-in Pydantic integration.
Workflow Management: Streamline scraping and analysis in one fault-tolerant workflow.
Jupyter Friendly: Scraipe workflows are ergonomic in the notebook environment.

Quick Start

Installation

Install Scraipe with full features:

pip install scraipe[extended]

Or install the core package to build your own components:

pip install scraipe

2. Example Usage

# Import components from scraipe
from scraipe.defaults import TextScraper
from scraipe.defaults import TextStatsAnalyzer
from scraipe import Workflow

# Initialize the scraper and analyzer
scraper = TextScraper()
analyzer = TextStatsAnalyzer()

# Create the workflow instance
workflow = Workflow(scraper, analyzer)

# List urls to scrape
urls = [
    "https://example.com",
    "https://rickandmortyapi.com/api/character/1",
    "https://ckaestne.github.io/seai/"
]

# Run the workflow
workflow.scrape(urls)
workflow.analyze()

# Print the results
results = workflow.export()
print(results)

Explore

Check out these awesome resources to get cooking.