raw_scraper

RawScraper

RawScraper(headers=None)

Bases: IAsyncScraper

Asynchronous scraper that retrieves webpage content in raw text format. The scraper performs no cleaning or parsing of the content.

Uses aiohttp to perform HTTP GET requests.
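As a rough illustration of what a raw GET with aiohttp looks like, here is a minimal sketch. It is not RawScraper's actual implementation: `fetch_raw_text` and `scrape_once` are hypothetical helper names, and the 30-second timeout is an arbitrary choice. The helper takes the session as a parameter so it can be exercised with a fake session in tests.

```python
import asyncio


async def fetch_raw_text(session, url: str) -> str:
    """Return the body of a GET response as text, with no cleaning or parsing.

    `session` is expected to behave like an aiohttp.ClientSession.
    """
    async with session.get(url) as response:
        # Raise on 4xx/5xx so the caller can record the failure.
        response.raise_for_status()
        return await response.text()


async def scrape_once(url: str, headers: dict) -> str:
    """Open a short-lived aiohttp session and fetch a single page."""
    # Third-party dependency, imported here so fetch_raw_text can be
    # used and tested without aiohttp installed.
    import aiohttp

    timeout = aiohttp.ClientTimeout(total=30)  # assumed value, not from the library
    async with aiohttp.ClientSession(headers=headers, timeout=timeout) as session:
        return await fetch_raw_text(session, url)


# Example call (performs a real network request):
# text = asyncio.run(scrape_once("https://example.com", {"User-Agent": "example-agent/1.0"}))
```

Passing the headers to the session rather than to each request means every request made through that session carries the same User-Agent.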

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| DEFAULT_USER_AGENT | str | Default User-Agent string for HTTP requests. |
| headers | dict | HTTP headers used during the requests. |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| headers | dict | Custom headers for HTTP requests. Defaults to None, which uses the class-defined headers. | None |

headers class-attribute instance-attribute

headers: dict = headers or headers

Headers sent with each HTTP request. If no custom headers are passed to the constructor, the class-defined headers are used, which set a standard User-Agent header.
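The fallback described above can be sketched with a stand-in class. This assumes the rendered expression `headers or headers` corresponds to `headers or self.headers` in the source, which is not confirmed here. `HeaderHolder` and the User-Agent value are hypothetical placeholders, not part of the library.

```python
class HeaderHolder:
    """Hypothetical stand-in for RawScraper's header handling."""

    DEFAULT_USER_AGENT = "example-agent/1.0"  # placeholder, not the library's value
    headers: dict = {"User-Agent": DEFAULT_USER_AGENT}  # class-level default

    def __init__(self, headers=None):
        # `or` falls back to the class-level dict for None
        # and also for any other falsy value, such as {}.
        self.headers = headers or self.headers


default = HeaderHolder()
custom = HeaderHolder({"User-Agent": "my-crawler/2.0", "Accept-Language": "en"})
empty = HeaderHolder({})

print(default.headers)
print(custom.headers)
print(empty.headers == HeaderHolder.headers)
```

Under this assumption, custom headers replace the defaults rather than merging with them, and passing an empty dict behaves the same as passing None.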

async_scrape async

async_scrape(url: str) -> ScrapeResult

Scrape a webpage asynchronously and return its raw text content.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| url | str | URL of the webpage to be scraped. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| ScrapeResult | ScrapeResult | Result containing the URL, raw text content, success flag, and error message if applicable. |
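A caller would typically await `async_scrape` and branch on the success flag. The sketch below shows that pattern with stand-ins so it runs without network access or the library itself: the `ScrapeResult` field names (`url`, `content`, `success`, `error`) are assumptions based on the description above, and `StubScraper` with its canned responses is hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    """Stand-in result type; the field names are assumptions."""
    url: str
    content: str
    success: bool
    error: Optional[str] = None


class StubScraper:
    """Stand-in for RawScraper that returns canned results without network access."""

    async def async_scrape(self, url: str) -> ScrapeResult:
        # Stub behaviour only: treat https URLs as successes, everything else as failures.
        if url.startswith("https://"):
            return ScrapeResult(url=url, content="<html>raw page</html>", success=True)
        return ScrapeResult(url=url, content="", success=False, error="unsupported URL scheme")


async def scrape_all(scraper, urls):
    # Run the requests concurrently; results come back in input order.
    return await asyncio.gather(*(scraper.async_scrape(u) for u in urls))


results = asyncio.run(scrape_all(StubScraper(), ["https://example.com", "ftp://example.com"]))
for r in results:
    if r.success:
        print(f"{r.url}: {len(r.content)} characters")
    else:
        print(f"{r.url}: failed ({r.error})")
```

Checking the flag instead of catching exceptions matches the documented return value, which carries an error message when a scrape fails.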