multi_scraper
IngressRule
A rule that defines how to handle a specific type of URL.
Attributes:
| Name | Type | Description |
|---|---|---|
| `match` | `Pattern` | A compiled regular expression used to match URLs. |
| `scraper` | `IScraper` | An instance of a scraper to be used when the URL matches. |
match (str|re.Pattern): The regex pattern to match against URLs.
scraper (IScraper): The scraper to use for this match.
exclusive (bool): If True, this rule is exclusive and no other rules will be processed if it matches.
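
As a rough illustration of the attributes above, the following sketch models a rule with a stand-in dataclass. `RuleSketch` and `EchoScraper` are hypothetical names and are not part of `multi_scraper`; the real `IngressRule` may normalize and apply its pattern differently (for example, using `search` rather than `match` is an assumption here).

```python
import re
from dataclasses import dataclass
from typing import Any, Union


class EchoScraper:
    """Hypothetical stand-in for an IScraper implementation."""

    def scrape(self, url: str) -> str:
        return f"scraped {url}"


@dataclass
class RuleSketch:
    """Hypothetical stand-in mirroring the documented attributes."""

    match: Union[str, re.Pattern]
    scraper: Any
    exclusive: bool = False

    def __post_init__(self) -> None:
        # Accept either a string or a compiled pattern; store a compiled one.
        if isinstance(self.match, str):
            self.match = re.compile(self.match)

    def applies_to(self, url: str) -> bool:
        # Assumption: the pattern may match anywhere in the URL.
        return self.match.search(url) is not None


rule = RuleSketch(match=r"^https?://example\.com/", scraper=EchoScraper())
```

Passing a string or a precompiled pattern yields the same stored attribute type, which matches the `str|re.Pattern` input and the `Pattern` attribute described above.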
from_scraper
staticmethod
Create an IngressRule from a scraper instance and its expected link format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `scraper` | `IScraper` | The scraper to use for this rule. | required |
| `exclusive` | `bool` | If True, this rule is exclusive and no other rules will be processed if it matches. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `IngressRule` | `IngressRule` | An IngressRule instance with a match that always returns True. |
MultiScraper
Bases: IAsyncScraper
A scraper that uses multiple ingress rules to determine how to scrape a link.
Attributes:
| Name | Type | Description |
|---|---|---|
| `DEFAULT_USER_AGENT` | `str` | Default User-Agent used for HTTP requests. |
| `ingress_rules` | `List[IngressRule]` | A list of ingress rule instances. |
| `debug` | `bool` | Indicates whether debug mode is enabled. |
| `debug_delimiter` | `str` | The delimiter used to join debug log messages. |
Methods:
| Name | Description |
|---|---|
| `async_scrape` | `async_scrape(url: str) -> ScrapeResult`: Asynchronously scrapes the given URL using the first matching ingress rule. Returns a ScrapeResult indicating success or failure. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ingress_rules` | `list[IngressRule]` | A list of IngressRule instances. `None` items are omitted. | required |
| `debug` | `bool` | Enable debug mode. Defaults to False. | `False` |
| `debug_delimiter` | `str` | Delimiter for joining debug log messages. Defaults to "; ". | `'; '` |
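
Because `None` items are omitted, a rule list can contain conditionally disabled entries. The sketch below shows only that filtering step in plain Python; `build_rules`, `drop_none`, and the tuple placeholders are hypothetical and stand in for real `IngressRule` instances.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder: (pattern, scraper name) instead of a real IngressRule.
Rule = Tuple[str, str]


def build_rules(enable_pdf: bool) -> List[Optional[Rule]]:
    """Return a rule list where a disabled entry is left as None."""
    return [
        (r"\.pdf$", "pdf_scraper") if enable_pdf else None,
        (r"^https?://", "html_scraper"),
    ]


def drop_none(rules: List[Optional[Rule]]) -> List[Rule]:
    # Mirrors the documented behavior: None items are omitted.
    return [rule for rule in rules if rule is not None]


active = drop_none(build_rules(enable_pdf=False))
```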
async_scrape
async
Scrape the given URL using the appropriate scraper based on ingress rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | The URL to scrape. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `ScrapeResult` | `ScrapeResult` | The result of the scrape. |
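
The first-match dispatch described for `async_scrape` can be sketched with standard-library pieces only. Everything named here (`ResultSketch`, `DispatchRule`, `dispatch`, and the two scraper coroutines) is a hypothetical stand-in, not the library's implementation; the sketch also ignores the `exclusive` flag and any debug logging.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for ScrapeResult."""

    ok: bool
    content: Optional[str] = None
    error: Optional[str] = None


@dataclass
class DispatchRule:
    """Hypothetical stand-in for IngressRule."""

    match: re.Pattern
    scrape: Callable[[str], Awaitable[ResultSketch]]


async def scrape_html(url: str) -> ResultSketch:
    return ResultSketch(ok=True, content=f"html:{url}")


async def scrape_pdf(url: str) -> ResultSketch:
    return ResultSketch(ok=True, content=f"pdf:{url}")


async def dispatch(rules: List[DispatchRule], url: str) -> ResultSketch:
    # Walk the rules in order and delegate to the first one that matches.
    for rule in rules:
        if rule.match.search(url):
            return await rule.scrape(url)
    # No rule matched: report failure instead of raising.
    return ResultSketch(ok=False, error=f"no ingress rule matched {url}")


rules = [
    DispatchRule(re.compile(r"\.pdf$"), scrape_pdf),
    DispatchRule(re.compile(r"^https?://"), scrape_html),
]
result = asyncio.run(dispatch(rules, "https://example.com/paper.pdf"))
```

Rule order matters in this sketch: the PDF rule is listed first, so a URL ending in `.pdf` reaches `scrape_pdf` even though the broader HTTP pattern would also match it.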