Using LLM Analyzers
Large language models can provide intelligent feature extraction with tailored prompts. This page covers how to configure and use LLM analyzers provided by Scraipe.
Overview
GeminiAnalyzer and OpenAiAnalyzer are included in the scraipe[extended] package. These analyzers extend LlmAnalyzerBase, an IAsyncScraper implementation that orchestrates LLM queries and validates responses.
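To make "orchestrates LLM queries and validates responses" more concrete, here is a minimal sketch of that general pattern using only the standard library. It is not Scraipe's implementation: `query_and_validate`, `fake_llm`, and `REQUIRED_FIELDS` are hypothetical names, and the LLM call is replaced by a stub that returns a fixed JSON string.

```python
import json
from typing import Callable

# Hypothetical field list standing in for a real schema.
REQUIRED_FIELDS = {"topic": str, "sentiment": str}


def query_and_validate(
    llm_call: Callable[[str, str], str], instruction: str, content: str
) -> dict:
    """Send content to an LLM callable, then parse and check the JSON reply."""
    raw = llm_call(instruction, content)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("LLM reply must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(
                f"Field {name!r} is missing or not a {expected_type.__name__}"
            )
    return data


def fake_llm(instruction: str, content: str) -> str:
    # Stub standing in for a real provider call; always returns the same reply.
    return '{"topic": "web scraping", "sentiment": "positive"}'


validated = query_and_validate(fake_llm, "Classify the article.", "Some article text")
print(validated)
```

In the real analyzers, the schema check is driven by the Pydantic model you pass in, rather than by a hand-written field list like the one in this sketch.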
Usage
Most LLM analyzers need to be configured with the following parameters:
- instruction: the system instruction to guide the LLM's behavior.
- api_key: the API key to access the LLM provider's service.
- pydantic_schema: a Pydantic model that defines the schema for the LLM's JSON response.
The specific configuration parameters will depend on the analyzer's implementation. Here is an example using GeminiAnalyzer. Note that you will need a Gemini API key.
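One common way to supply the key without hard-coding it is to read it from an environment variable. The sketch below assumes the key is stored in a variable named `GEMINI_API_KEY`; that name and the `load_api_key` helper are hypothetical choices for illustration, not something Scraipe requires.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

Calling `gemini_key = load_api_key()` before constructing the analyzer keeps the secret out of the script itself.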
-
Import dependencies and API key.
-
Configure the analyzer with a prompt and schema.
    # Craft an instruction for the LLM
    instruction = """
    Determine the article's topic and whether the sentiment is positive, negative, or neutral.
    Output the result in the following JSON format:
    {
        "topic": "<topic>",
        "sentiment": "<positive|negative|neutral>",
    }
    """

    # Define a pydantic schema for the expected output
    class ExpectedOutput(BaseModel):
        topic: str
        sentiment: str

    # Initialize the analyzer with the API key, instruction, and schema
    analyzer = GeminiAnalyzer(
        api_key=gemini_key,
        instruction=instruction,
        pydantic_schema=ExpectedOutput,
    )
-
Analyze and display results.
    # Analyze an article
    article = """
    Scraipe is a powerful tool for scraping and analyzing web data.
    It allows users to extract information from websites easily and efficiently.
    With Scraipe, users can automate the process of data collection and analysis, saving time and effort.
    """
    result = analyzer.analyze(article)

    # Output the analysis result
    print(result.output)
Running this script prints the extracted topic and sentiment of the article.
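The schema in the example types `sentiment` as a plain `str`, so any string passes validation. One possible tightening, sketched here with pydantic alone and independent of Scraipe, restricts the field to the three labels named in the instruction. `StrictOutput` and `parse_reply` are hypothetical names, and whether a given analyzer or provider accepts `Literal` fields in `pydantic_schema` is an assumption to verify before relying on it.

```python
import json
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictOutput(BaseModel):
    topic: str
    # Only the three labels named in the instruction are accepted.
    sentiment: Literal["positive", "negative", "neutral"]


def parse_reply(raw: str) -> StrictOutput:
    """Parse a JSON reply and validate it against StrictOutput."""
    return StrictOutput(**json.loads(raw))


good = parse_reply('{"topic": "web scraping", "sentiment": "positive"}')
print(good.topic, good.sentiment)

try:
    parse_reply('{"topic": "web scraping", "sentiment": "mixed"}')
except ValidationError:
    print("rejected: sentiment is not one of the allowed labels")
```

A stricter schema trades flexibility for predictability: downstream code can branch on the three known labels without handling free-form strings.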
Conclusion
The topic and sentiment analyzer we configured can be plugged into a basic workflow. Consider pairing this analyzer with NewsScraper to feed it the most relevant content from news sites.
Check out celebrities_example.ipynb for an advanced workflow using NewsScraper and OpenAiAnalyzer.
To integrate other LLMs, see Custom LLM Analyzers.