raw_scraper
RawScraper
Bases: IAsyncScraper
Asynchronous scraper that retrieves webpage content in raw text format. The scraper performs no cleaning or parsing of the content.
Uses aiohttp to perform HTTP GET requests.
Attributes:
| Name | Type | Description |
|---|---|---|
| DEFAULT_USER_AGENT | str | Default User-Agent string for HTTP requests. |
| headers | dict | HTTP headers used during the requests. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| headers | dict | Custom headers for HTTP requests. Defaults to None, which uses the class-defined headers. | None |
headers (class-attribute, instance-attribute)
Headers to be used in the HTTP requests. Defaults to a standard User-Agent header.
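The fallback from the `headers` parameter to the class-defined headers can be sketched as below. This is a minimal illustration, not the library's implementation: `HeaderDefaultsSketch` is a hypothetical stand-in for `RawScraper`, and the User-Agent value is a placeholder, since the real `DEFAULT_USER_AGENT` string is not shown on this page.

```python
from typing import Dict, Optional


class HeaderDefaultsSketch:
    """Hypothetical stand-in for RawScraper; only the header handling is modelled."""

    # Placeholder value (assumption): the real default string is not documented here.
    DEFAULT_USER_AGENT: str = "example-agent/0.0"

    # Class-level default headers built from the default User-Agent.
    headers: Dict[str, str] = {"User-Agent": DEFAULT_USER_AGENT}

    def __init__(self, headers: Optional[Dict[str, str]] = None) -> None:
        # None means "use the class-defined headers". Copy in both cases so an
        # instance never mutates the shared class-level dict or the caller's dict.
        self.headers = dict(headers) if headers is not None else dict(type(self).headers)


default_scraper = HeaderDefaultsSketch()
custom_scraper = HeaderDefaultsSketch(headers={"User-Agent": "my-bot/1.0"})

print(default_scraper.headers)
print(custom_scraper.headers)
```

Copying the dictionary is a design choice of this sketch; whether the real class copies or shares the class-level dict is not stated here.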
async_scrape (async)
Scrape a webpage asynchronously and return its raw text content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | URL of the webpage to be scraped. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| ScrapeResult | ScrapeResult | Result containing the URL, raw text content, success flag, and error message if applicable. |
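As a rough illustration of how calling code might consume the returned `ScrapeResult`, the sketch below scrapes several URLs concurrently and branches on the success flag. Everything here is an assumption made for the example: the field names `url`, `content`, `success`, and `error` are inferred from the description above, and `StubScraper` is a hypothetical stand-in that performs no HTTP, so the snippet runs without `aiohttp` or network access.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScrapeResult:
    # Assumed field names, based only on the description of the return value.
    url: str
    content: str = ""
    success: bool = False
    error: Optional[str] = None


class StubScraper:
    """Hypothetical stand-in exposing the documented async_scrape(url) signature."""

    async def async_scrape(self, url: str) -> ScrapeResult:
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        if not url.startswith(("http://", "https://")):
            # Report the failure in the result instead of raising.
            return ScrapeResult(url=url, error="unsupported URL scheme")
        return ScrapeResult(url=url, content=f"<html>stub body for {url}</html>", success=True)


async def scrape_all(scraper: StubScraper, urls: List[str]) -> List[ScrapeResult]:
    # Run all scrapes concurrently; gather returns results in input order.
    return await asyncio.gather(*(scraper.async_scrape(u) for u in urls))


results = asyncio.run(scrape_all(StubScraper(), ["https://example.com", "ftp://example.com"]))

for result in results:
    if result.success:
        print(result.url, "ok,", len(result.content), "characters")
    else:
        print(result.url, "failed:", result.error)
```

With the real `RawScraper`, the same calling pattern would presumably apply by swapping it in for `StubScraper`, provided the actual `ScrapeResult` attribute names match the ones assumed here.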