
text_scraper

TextScraper

TextScraper(headers=None)

Bases: IAsyncScraper

Asynchronous text scraper that extracts visible text from HTML.

Fetches webpage content with aiohttp, parses the HTML with BeautifulSoup, and strips the HTML tags so that only the text remains.
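The tag-stripping step can be illustrated without third-party dependencies. The sketch below uses the standard library's `html.parser.HTMLParser` as a stand-in for BeautifulSoup. This is a swapped-in technique, not this class's implementation, and `extract_visible_text` is a hypothetical helper name. The sketch also skips `<script>` and `<style>` contents on the assumption that they are not visible text; whether `TextScraper` does the same is not stated here.

```python
from html.parser import HTMLParser
from typing import List


class _VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_visible_text(html: str) -> str:
    """Hypothetical helper: strip tags and join the text with single spaces."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Joining the stripped text nodes with single spaces is one simple choice; a real scraper might preserve line breaks between block elements instead.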

Attributes:

Name                Type  Description
DEFAULT_USER_AGENT  str   Default User-Agent string for HTTP requests.
headers             dict  HTTP headers used in fetching the webpage content.

headers (dict, optional): HTTP headers to send with each asynchronous request. If not provided, a standard User-Agent header is used.

headers class-attribute instance-attribute

headers: dict = headers or headers

Headers to be used in the HTTP requests. Defaults to a standard User-Agent header.
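A minimal sketch of the header defaulting described above, assuming the fallback is a dictionary containing only a `User-Agent` entry. `HeaderConfigSketch` and its `DEFAULT_USER_AGENT` value are hypothetical; the real default string is not documented here.

```python
from typing import Optional


class HeaderConfigSketch:
    # Hypothetical placeholder value; the real DEFAULT_USER_AGENT is not shown in these docs.
    DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"

    def __init__(self, headers: Optional[dict] = None) -> None:
        # Fall back to a User-Agent-only header dict when no headers are given.
        # Note: an empty dict is falsy, so headers={} also triggers the fallback.
        self.headers = headers or {"User-Agent": self.DEFAULT_USER_AGENT}
```

Because the fallback uses `or`, passing an empty dictionary also produces the default header; a caller who wants to send no headers at all would need an explicit `is None` check instead.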

async_scrape async

async_scrape(url: str) -> ScrapeResult

Scrape a webpage asynchronously and extract visible text.

Parameters:

Name  Type  Description                        Default
url   str   URL of the webpage to be scraped.  required

Returns:

Type          Description
ScrapeResult  Result containing the URL, extracted text content, success flag, and error if any.
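The return contract of `async_scrape` can be sketched as follows. `ScrapeResultSketch`, its field names, and the injected `fetch` coroutine are hypothetical: the documented method takes only `url` and fetches with aiohttp itself. The sketch shows failures being reported through the result's success flag and error field rather than raised, assuming the real method behaves that way.

```python
import asyncio
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Awaitable, Callable, List, Optional


@dataclass
class ScrapeResultSketch:
    """Hypothetical stand-in for ScrapeResult; the real field names may differ."""

    url: str
    content: str
    success: bool
    error: Optional[str] = None


class _TextCollector(HTMLParser):
    """Collects non-blank text nodes; the tags themselves are dropped."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.parts.append(data.strip())


async def async_scrape_sketch(
    url: str, fetch: Callable[[str], Awaitable[str]]
) -> ScrapeResultSketch:
    """Fetch `url` via the injected `fetch` coroutine and return its text.

    Failures are reported through the result (success=False, error=...)
    instead of being raised to the caller.
    """
    try:
        html = await fetch(url)
        collector = _TextCollector()
        collector.feed(html)
        collector.close()
    except Exception as exc:  # e.g. connection or HTTP errors raised by fetch
        return ScrapeResultSketch(url=url, content="", success=False, error=str(exc))
    return ScrapeResultSketch(url=url, content=" ".join(collector.parts), success=True)


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:
        # Offline stand-in for a real HTTP GET.
        return "<html><body><p>Hello <b>world</b></p></body></html>"

    print(asyncio.run(async_scrape_sketch("https://example.com", fake_fetch)))
```

In practice `fetch` could wrap aiohttp: open an `aiohttp.ClientSession(headers=...)`, call `session.get(url)`, call `raise_for_status()` on the response, and return `await response.text()`. Injecting it keeps the sketch testable offline. The text-collecting step here is a stdlib stand-in for BeautifulSoup and does not skip `<script>` or `<style>` content.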