llm

package module

v0.0.0-...-768e335 Latest Latest Go to latest Published: Jun 14, 2025 License: MIT Imports: 18 Imported by: 1

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/starpkg/llm

Links

Open Source Insights

README ¶

🤖 `llm` - Starlark Module for AI and LLM Services

A powerful Starlark module for interacting with OpenAI and OpenAI-compatible API services. Easily generate text using chat completions and create images with DALL-E from your Starlark scripts.

Features

Chat completions using OpenAI GPT models
Image generation using DALL-E models
Support for OpenAI API
Support for Azure OpenAI services
Support for multimodal inputs (text and images)
Streaming mode for real-time responses
Customizable retry behavior and error handling

Installation

go get github.com/starpkg/llm

Usage in Go

The module provides two main constructors:

NewModule(): Creates a module with empty configurations
NewModuleWithConfig(serviceProvider, endpointURL, apiKey, gptModel, dalleModel, apiVersion string): Creates a module with preset configuration values

package main

import (
	"fmt"
	"os"

	"github.com/1set/starlet"
	"github.com/starpkg/llm"
)

func main() {
	// Create a new LLM module with API key
	apiKey := os.Getenv("OPENAI_API_KEY")
	mod := llm.NewModuleWithConfig("openai", "", apiKey, "gpt-4o", "dall-e-3", "")

	// Create a Starlet interpreter with the module
	interpreter := starlet.New(
		starlet.WithModuleLoader("llm", mod.LoadModule()),
	)

	// Run a Starlark script
	script := `
load("llm", "chat", "draw")

# Generate text using GPT
response = chat(
    text="Explain quantum computing in simple terms.",
    model="gpt-4o",
    max_tokens=100,
)
print("GPT response:", response)

# Generate an image using DALL-E
image_url = draw(
    prompt="A cute robot explaining quantum computing to children",
    model="dall-e-3",
    size="1024x1024",
)
print("Image URL:", image_url)
`

	// Execute the script
	if err := interpreter.ExecScript("example.star", script); err != nil {
		fmt.Println("Error:", err)
	}
}

Starlark API

Configuration

The module has the following configuration options:

openai_provider: The provider type ("openai", "azure", or "anthropic")
openai_endpoint_url: The API endpoint URL (required for Azure, optional for OpenAI)
openai_api_key: The API key (required)
openai_gpt_model: The default GPT model to use
openai_dalle_model: The default DALL-E model to use
api_version: The API version (for Azure, default: "2024-02-01")
legacy_mode: Use legacy mode for data conversion (default: true)

Functions

`message(role?, text?, image?, image_file?, image_url?)`

Creates a message object for the chat function. Parameters:

role: The role of the message (default: "user")
text: The text content of the message
image: Raw image data to include in the message
image_file: Path to an image file to include in the message
image_url: URL of an image to include in the message

Returns a dictionary representing the message.

`chat(text?, image?, image_file?, image_url?, messages?, model?, n?, max_tokens?, max_completion_tokens?, temperature?, top_p?, frequency_penalty?, presence_penalty?, stop?, response_format?, reasoning_effort?, retry?, full_response?, allow_error?, stream?, stream_callback?, kwargs?)`

Sends a chat completion request to the OpenAI API. Parameters:

text: Text content for a user message
image: Raw image data to include in the user message
image_file: Path to an image file to include in the user message
image_url: URL of an image to include in the user message
messages: List of message dictionaries (from message() function)
model: Model to use (defaults to openai_gpt_model config)
n: Number of completions to generate (default: 1)
max_tokens: Maximum number of tokens to generate (deprecated for o1 series models)
max_completion_tokens: Upper bound for generated completion tokens (for o1 series models)
temperature: Sampling temperature (default: 1.0)
top_p: Nucleus sampling parameter (default: 1.0)
frequency_penalty: Frequency penalty (default: 0.0)
presence_penalty: Presence penalty (default: 0.0)
stop: List of stop sequences
response_format: Format of the response ("text" or "json") (default: "text")
reasoning_effort: Controls reasoning effort for reasoning-capable models ("low", "medium", or "high")
kwargs: Dictionary of additional parameters to pass to the API (for custom or non-standard parameters)
retry: Number of retry attempts (default: 1)
full_response: Return the full API response (default: false)
allow_error: Return None instead of an error (default: false)
stream: Enable streaming mode (default: false)
stream_callback: Function to call for each chunk in streaming mode

Returns the generated text or a list of generated texts if n > 1. In streaming mode, the return value is constructed by combining all chunks. When full_response=True in streaming mode, the response includes token usage information accumulated from all stream chunks.

`draw(prompt, model?, n?, quality?, size?, style?, response_format?, background?, moderation?, output_format?, output_compression?, retry?, full_response?, allow_error?)`

Generates an image using DALL-E or GPT Image 1. Parameters:

prompt: Text prompt for the image generation (required)
- Max length: 32000 characters for gpt-image-1, 4000 for dall-e-3, 1000 for dall-e-2
model: Model to use (defaults to openai_dalle_model config)
- Supported models: "dall-e-2", "dall-e-3", "gpt-image-1"
n: Number of images to generate (default: 1)
- dall-e-2/gpt-image-1: 1-10 images, dall-e-3: only 1 image supported
quality: Image quality (default: "auto" for gpt-image-1, "standard" for DALL-E)
- GPT Image 1: "auto", "high", "medium", "low"
- DALL-E 3: "standard", "hd"
- DALL-E 2: "standard" only
size: Image size (default: "auto" for gpt-image-1, "1024x1024" for DALL-E)
- GPT Image 1: "auto", "1024x1024", "1536x1024" (landscape), "1024x1536" (portrait)
- DALL-E 3: "1024x1024", "1792x1024", "1024x1792"
- DALL-E 2: "256x256", "512x512", "1024x1024"
style: Image style (DALL-E 3 only, default: "vivid")
- DALL-E 3: "vivid", "natural"
response_format: Response format (DALL-E only, default: "url")
- DALL-E 2/3: "url", "b64_json"
- GPT Image 1: Always returns base64-encoded images (parameter ignored)
background: Background type (GPT Image 1 only, default: "auto")
- GPT Image 1: "auto", "transparent", "opaque"
moderation: Content moderation level (GPT Image 1 only, default: "auto")
- GPT Image 1: "auto", "low"
output_format: Output image format (GPT Image 1 only, default: "png")
- GPT Image 1: "png", "jpeg", "webp"
output_compression: Compression level 0-100 (GPT Image 1 only, default: 100)
- GPT Image 1: Only supported with "jpeg" or "webp" output formats
retry: Number of retry attempts (default: 1)
full_response: Return the full API response (default: false)
allow_error: Return None instead of an error (default: false)

Returns the image URL (DALL-E) or base64-encoded image data (GPT Image 1), or a list if n > 1.

Examples

Chat Completion

load("llm", "chat")

# Simple text generation
response = chat(
    text="What are the three laws of robotics?",
    max_tokens=200,
)
print(response)

# Using JSON mode
json_resp = chat(
    text="Generate a JSON object with the three laws of robotics. Include each law as a separate field.",
    response_format="json",
    max_tokens=200,
)
print(json_resp)

Image Generation

load("llm", "draw")

# Generate with DALL-E 3 (existing functionality)
dalle_image = draw(
    prompt="A futuristic city with flying cars and tall skyscrapers",
    model="dall-e-3",
    quality="hd",
    size="1024x1024",
    style="vivid"
)
print("DALL-E 3 image URL:", dalle_image)

# Generate with GPT Image 1 (new functionality)
gpt_image = draw(
    prompt="A realistic portrait of a robot scientist in a laboratory",
    model="gpt-image-1",
    quality="high",
    size="1024x1536",  # Portrait orientation
    background="transparent",
    output_format="png"
)
print("GPT Image 1 base64 data length:", len(gpt_image))

# Using content moderation with GPT Image 1
moderated_image = draw(
    prompt="A family-friendly cartoon character playing in a park",
    model="gpt-image-1",
    quality="medium",
    moderation="low",
    output_format="webp",
    output_compression=85
)
print("Moderated image data:", moderated_image[:100] + "...")

# Generate multiple images with GPT Image 1
multiple_images = draw(
    prompt="Abstract geometric patterns in bright colors",
    model="gpt-image-1",
    n=3,
    quality="medium",
    size="1024x1024",
    output_format="jpeg",
    output_compression=90
)
print(f"Generated {len(multiple_images)} images")

# Get full response with token usage information
full_resp = draw(
    prompt="An artistic illustration of AI concepts and neural networks",
    model="gpt-image-1",
    quality="high",
    full_response=True
)
print("Image data:", full_resp.data[0].b64_json[:100] + "...")
if hasattr(full_resp, "usage"):
    usage = full_resp.usage
    print(f"Total tokens: {usage.total_tokens}")
    print(f"Input tokens: {usage.input_tokens}")
    print(f"Output tokens: {usage.output_tokens}")

Streaming Mode

load("llm", "chat")

# Function to handle each chunk of the response
def handle_chunk(chunk):
    # Access the delta content from the first choice
    if len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            # Print each chunk as it arrives
            print(delta.content, end="", flush=True)

# Stream response with a callback
full_response = chat(
    text="Write a short poem about coding in Python, one line at a time.",
    max_tokens=200,
    stream=True,
    stream_callback=handle_chunk,
)

# The full_response contains the complete aggregated text
print("\n\nFull response:", full_response)

Streaming with Progress Tracking

load("llm", "chat")

# Create a simple progress tracker
tracker = {
    "tokens": 0,
    "started": False,
    "done": False
}

def process_chunk(chunk):
    if not tracker["started"]:
        print("Generating response...")
        tracker["started"] = True
    
    # Count tokens received
    if len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content:
            tracker["tokens"] += 1
            
            # Show progress
            if tracker["tokens"] % 10 == 0:
                print(".", end="", flush=True)

# Generate a longer response with progress tracking
response = chat(
    text="Explain how transformers work in machine learning, with detailed technical information.",
    max_tokens=500,
    stream=True,
    stream_callback=process_chunk
)

print("\nDone! Received", tracker["tokens"], "chunks.")
print("\nFinal response:\n", response)

Accessing Token Usage Information

load("llm", "chat")

# Get the full response with token usage information
full_resp = chat(
    text="Explain the concept of transfer learning in AI.",
    max_tokens=300,
    stream=True,
    full_response=True
)

# Access token usage information
if hasattr(full_resp, "usage"):
    usage = full_resp.usage
    print(f"Prompt tokens: {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens: {usage.total_tokens}")
    
    # Calculate approximate cost (example rates for gpt-4)
    # Note: Token counts are accumulated from all stream responses for better accuracy
    prompt_cost = usage.prompt_tokens * 0.00003  # $0.03 per 1000 tokens
    completion_cost = usage.completion_tokens * 0.00006  # $0.06 per 1000 tokens
    total_cost = prompt_cost + completion_cost
    
    print(f"Approximate cost: ${total_cost:.6f}")

Multimodal Interaction

load("llm", "chat")

# Ask about an image
response = chat(
    text="What's in this image?",
    image_url="https://example.com/image.jpg",
    max_tokens=150,
)
print(response)

Advanced Chat with Message History

load("llm", "message", "chat")

# Create a conversation history
messages = [
    message(role="user", text="Hello, who are you?"),
    message(role="assistant", text="I'm an AI assistant. How can I help you today?"),
    message(role="user", text="Can you explain what an LLM is?"),
]

# Continue the conversation
response = chat(
    messages=messages,
    max_tokens=200,
)
print(response)

Using Reasoning Models

load("llm", "chat")

# Set endpoint and API key for a reasoning-capable model provider
llm.set_openai_endpoint_url("https://reasoning-model-api-endpoint.com")
llm.set_openai_api_key("your-api-key-here")

# Call a reasoning-capable model with specific reasoning effort
response = chat(
    text="Solve this step by step: If 3x + 7 = 22, what is the value of x?",
    model="reasoning-model-name",
    reasoning_effort="high",  # Can be "low", "medium", or "high"
    max_tokens=300,
    full_response=True,
)

# If the model provides reasoning content, it will be available
if hasattr(response.choices[0].message, "reasoning_content"):
    print("Reasoning:")
    print(response.choices[0].message.reasoning_content)
    
print("\nFinal answer:")
print(response.choices[0].message.content)

Using Custom Parameters with kwargs

load("llm", "chat")

# Example 1: Using kwargs for custom or experimental parameters
# Some API providers or custom deployments may support additional parameters
response = chat(
    text="Generate a creative story about space exploration.",
    max_tokens=200,
    kwargs={
        "custom_parameter": "value",
        "experimental_feature": True,
        "custom_config": {
            "setting_a": "option1",
            "setting_b": 42
        }
    }
)
print(response)

# Example 2: Using kwargs for provider-specific parameters
# Different OpenAI-compatible providers may have unique parameters
response = chat(
    text="What are the benefits of renewable energy?",
    max_tokens=150,
    kwargs={
        "provider_specific_param": "custom_value",
        "optimization_level": "high",
        "cache_enabled": True
    }
)
print(response)

# Example 3: Combining standard parameters with custom kwargs
response = chat(
    text="Explain machine learning in simple terms.",
    model="gpt-4",
    temperature=0.7,
    max_tokens=200,
    kwargs={
        "safety_level": "strict",
        "response_style": "educational",
        "custom_instruction": "Use analogies when possible"
    },
    full_response=True
)

# Access both standard response and any custom fields
print("Content:", response.choices[0].message.content)
if hasattr(response, 'custom_fields'):
    print("Custom response fields:", response.custom_fields)

License

This package is licensed under the MIT License - see the LICENSE file for details.

Documentation ¶

Overview ¶

Package llm provides a Starlark module that calls OpenAI models.

Configuration options:

openai_provider: Provider type (openai, azure, anthropic)
openai_endpoint_url: API endpoint URL
openai_api_key: API key for authentication
openai_gpt_model: Default GPT model name
openai_dalle_model: Default DALL-E model name
api_version: API version (for Azure)
legacy_mode: Use legacy mode for data conversion (default: true)

The chat function supports both blocking and streaming modes:

In blocking mode (default), the function waits for the complete response
In streaming mode (stream=True), responses are received incrementally and can be processed via a callback
Streaming mode can improve responsiveness for large responses or when displaying partial results is desired
To use streaming mode, set stream=True and optionally provide stream_callback=function
The stream_callback receives each chunk of the response as it arrives
In both streaming and blocking modes, the function returns the same format: either the complete content or full response
For streaming, the content is automatically aggregated from all chunks

Custom parameters support:

The kwargs parameter allows passing custom or non-standard parameters to the API
Useful for provider-specific features, experimental parameters, or custom deployments
Any dictionary passed as kwargs will be included in the ChatTemplateKwargs field of the API request

Token parameters for different models:

max_tokens: Maximum number of tokens to generate (default: 64) - works with most models
max_completion_tokens: Upper bound for generated completion tokens - for o1 series models
For o1, o3, o4 series models, use max_completion_tokens instead of max_tokens

When legacy_mode is true (default), response objects are converted using direct struct access (ConvertJSONStruct). When false, JSON conversion is used (GoToStarlarkViaJSON).

Index ¶

Constants
type Module
- func NewModule() *Module
- func NewModuleWithConfig(serviceProvider, endpointURL, apiKey, gptModel, dalleModel, apiVersion string) *Module
- func (m *Module) LoadModule() starlet.ModuleLoader
- func (m *Module) SetClient(cli *oai.Client)

Constants ¶

View Source

const (
	// ProviderOpenAI represents the default OpenAI API provider
	ProviderOpenAI = "openai"
	// ProviderAzure represents the Azure OpenAI Service provider
	ProviderAzure = "azure"
	// ProviderAnthropic represents the Anthropic Claude API provider
	ProviderAnthropic = "anthropic"
)

Provider type constants

View Source

const ModuleName = "llm"

ModuleName defines the expected name for this module when used in Starlark's load() function, e.g., load('llm', 'chat')

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Module ¶

type Module struct {
	// contains filtered or unexported fields
}

Module wraps the ConfigurableModule with specific functionality for calling OpenAI models.

func NewModule ¶

func NewModule() *Module

NewModule creates a new instance of Module with default empty configurations.

func NewModuleWithConfig ¶

func NewModuleWithConfig(serviceProvider, endpointURL, apiKey, gptModel, dalleModel, apiVersion string) *Module

NewModuleWithConfig creates a new instance of Module with the given configuration values.

func (*Module) LoadModule ¶

func (m *Module) LoadModule() starlet.ModuleLoader

LoadModule returns the Starlark module loader with the email-specific functions.

func (*Module) SetClient ¶

func (m *Module) SetClient(cli *oai.Client)

SetClient sets the OpenAI client for this module.

Source Files ¶

View all Source files

openai.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL