text_scraper
TextScraper
Bases: IAsyncScraper
Asynchronous text scraper that extracts visible text from HTML.
Fetches webpage content with aiohttp, parses the HTML with BeautifulSoup, and strips the tags so that only the text remains.
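The tag-stripping step can be illustrated with a minimal sketch. The class itself uses BeautifulSoup; this sketch swaps in the standard library's `html.parser` so that it runs without third-party dependencies. The names `_VisibleTextParser` and `extract_visible_text` and the set of hidden tags are assumptions made for illustration, not part of `TextScraper`.

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping elements whose content is not rendered."""

    # Assumed set of tags whose contents a browser does not show as page text.
    HIDDEN_TAGS = {"script", "style", "head", "title", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside any hidden element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits outside hidden elements.
        if self._hidden_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_visible_text(html):
    """Return the text of an HTML document with tags and hidden content removed."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `extract_visible_text("<h1>Hello</h1><script>var x = 1;</script><p>World</p>")` keeps the heading and paragraph text and drops the script body.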
Attributes:
| Name | Type | Description |
|---|---|---|
| `DEFAULT_USER_AGENT` | `str` | Default User-Agent string for HTTP requests. |
| `headers` | `dict` | HTTP headers used in fetching the webpage content. |
:param headers: A dictionary of HTTP headers to use in asynchronous requests. If not provided, defaults to a standard User-Agent header.
headers (class-attribute, instance-attribute)
Headers to be used in the HTTP requests. Defaults to a standard User-Agent header.
async_scrape (async)
Scrape a webpage asynchronously and extract visible text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | URL of the webpage to be scraped. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `ScrapeResult` | `ScrapeResult` | Result containing the URL, extracted text content, success flag, and error if any. |
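A hedged usage sketch of the calling pattern follows. The reference lists what a `ScrapeResult` contains but not its field names, so `url`, `content`, `success`, and `error` below are hypothetical, as are `StubScraper`, `scrape_all`, and `summarize`. The stub stands in for `TextScraper` so the example runs without network access, and it assumes that failures are reported through the success flag rather than raised.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    # Hypothetical field names; the reference only describes the contents.
    url: str
    content: str
    success: bool
    error: Optional[str] = None


class StubScraper:
    """Stand-in exposing the same async_scrape(url) shape, with no network I/O."""

    async def async_scrape(self, url):
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        if not url.startswith(("http://", "https://")):
            return ScrapeResult(url=url, content="", success=False,
                                error="unsupported URL scheme")
        return ScrapeResult(url=url, content=f"text from {url}", success=True)


async def scrape_all(scraper, urls):
    """Scrape several URLs concurrently; results keep the input order."""
    results = await asyncio.gather(*(scraper.async_scrape(u) for u in urls))
    return list(results)


def summarize(results):
    """Split results into successful URLs and a URL-to-error mapping."""
    ok = [r.url for r in results if r.success]
    failed = {r.url: r.error for r in results if not r.success}
    return ok, failed
```

A caller would run `asyncio.run(scrape_all(scraper, urls))` and inspect each result's success flag before reading its text.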