silky

package module
v1.0.9
Published: Mar 3, 2026 License: AGPL-3.0 Imports: 25 Imported by: 0

Silky

Silky is a declarative, YAML-configured API crawler designed for complex and dynamic API data extraction. It allows developers to describe multi-step API interactions with support for nested operations, data transformations, and context-based processing.

The core functionality of Silky revolves around three main step types:

  • request: to perform API calls,
  • forEach: to iterate over arrays extracted from context data,
  • forValues: to iterate over literal values defined in the configuration.

Each step operates in its own context, allowing for precise manipulation and isolation of data. Contexts are organized in a hierarchical structure, with each forEach or forValues step creating new child contexts. This enables fine-grained control of nested operations and data scoping. After execution, results can be merged into parent or ancestor contexts using declarative merge rules.

Silky also supports:

  • Response transformation via jq expressions
  • Request templating with Go templates
  • Global and request-level authentication and headers
  • Multiple authentication mechanisms: OAuth2 (password and client_credentials flows), Bearer tokens, Basic auth, Cookie-based auth, JWT auth, and fully customizable authentication
  • Streaming of top-level entities when operating on array-based root contexts
  • Parallel execution of forEach iterations with configurable concurrency and rate limiting

To simplify development, Silky includes a configuration builder CLI tool, written in Go, that enables real-time execution and inspection of the configuration. This tool helps developers debug and refine their manifests by visualizing intermediate steps.

The library also comes with a developer IDE that helps you build, debug, and analyze crawl configurations.


Features

  • Declarative configuration using YAML
  • Supports nested data traversal and merging
  • Powerful context hierarchy system for scoped operations
  • Built-in support for jq and Go templates
  • Multiple authentication types (OAuth2, Basic, Bearer, Cookie, JWT, Custom)
  • Parallel execution support for forEach steps with rate limiting
  • Config builder with live evaluation and inspection
  • Streaming support for root-level arrays

Installation & Development Tools

Silky provides two development tools to help build, test, and debug configurations:

Terminal IDE (TUI)

The terminal-based IDE provides an interactive environment for developing Silky configurations with real-time execution feedback.

Installation:

# Build the IDE
cd cmd/ide && go build -o silky-ide

# Or run directly
cd cmd/ide && go run .

Features:

  • File watcher with auto-restart on configuration changes
  • Step-by-step execution visualization
  • Context inspection at each step
  • Export execution tree to /out folder for debugging
  • Keyboard shortcuts for navigation and control

Usage:

# Run the IDE and select a configuration file
./silky-ide

# Or specify a file directly
./silky-ide path/to/config.silky.yaml

Keyboard Shortcuts:

Key             Action
Enter           Select step / Expand details
j/k or Up/Down  Navigate steps
c               View context map
v               Set runtime variables (JSON)
r               Restart execution
s               Stop execution
d               Dump execution tree to /out
q               Quit
?               Show help

VS Code Extension

The VS Code extension provides IDE integration with syntax highlighting, validation, snippets, and execution capabilities.

Installation:

  1. From GitHub Release (Recommended):

    • Go to Releases
    • Download the latest silky-vscode-*.vsix file
    • Install via command line:
      code --install-extension silky-vscode-0.1.0.vsix
      
    • Or install via VS Code UI:
      1. Open VS Code
      2. Go to Extensions (Ctrl+Shift+X)
      3. Click the ... menu at the top of the Extensions panel
      4. Select "Install from VSIX..."
      5. Browse to the downloaded .vsix file
  2. Build from Source:

    # Build the extension
    cd vscode-extension
    npm install
    npm run package
    
    # Install in VS Code
    code --install-extension silky-vscode-*.vsix
    
  3. For Development:

    cd vscode-extension
    npm install
    npm run compile
    # Press F5 in VS Code to launch Extension Development Host
    

Prerequisites:

  • VS Code 1.80.0 or higher
  • YAML extension by Red Hat (will be prompted to install)

Build Prerequisites (only if building from source):

  • Go 1.21+ (for building binaries)
  • Node.js 18+ (for building extension)

Features:

  • JSON Schema validation for .silky.yaml files
  • 37 code snippets with silky- prefix
  • Execution with step-by-step timeline visualization
  • Context and result inspection
  • Integrated profiler view

Usage:

  1. Create a file with .silky.yaml extension
  2. Use snippets: type silky- and select from autocomplete
  3. Click the play button in the editor title bar to run
  4. View execution steps in the Silky sidebar panel
  5. Click on steps to inspect context and results

Configuration:

{
  "silky.autoValidate": true,
  "silky.autoRun": false,
  "silky.maxOutputSize": 10000,
  "silky.collapseSteps": false
}

Context System

Silky's context system is the foundation of its data processing capabilities. Understanding how contexts work is essential for building effective crawl configurations.

Context Hierarchy

When the crawler starts, it initializes a root context containing the initial data structure (either an empty array [] or empty object {}). As steps execute:

  1. ForEach steps create a new child context for each iteration, extracting items from a path in the current context
  2. ForValues steps create an overlay context for each literal value, preserving access to parent context variables
  3. Request steps create a working context with the response data; nested steps operate on the response
  4. Each context has a unique key and maintains a reference to its parent context
  5. All ancestor contexts remain accessible via the context map

Canonical vs Working Contexts

Silky distinguishes between two types of contexts:

  • Canonical contexts: Named contexts that persist throughout execution (e.g., "root", contexts created by forEach with as, contexts created by forValues with as). These are the targets for mergeWithContext operations.
  • Working contexts: Temporary contexts created by request steps to hold response data. When a request executes within a canonical context (like "root"), it creates a working context with a _response_ prefix internally, ensuring the canonical context remains available for merge operations.

This architecture ensures that mergeWithContext: {name: root} always merges into the actual root context, not a cloned copy.
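As an illustration of that guarantee, here is a minimal sketch of a request that runs directly under the root context and merges back into it by name. The step name, the URL, and the .items key are placeholders chosen for this example, not anything Silky defines.

```yaml
rootContext: {}

steps:
  - type: request
    name: Fetch Items                      # hypothetical step name
    request:
      url: https://api.example.com/items   # placeholder endpoint
      method: GET
    # The response is held in a temporary working context;
    # the rule below targets the canonical "root" context by name.
    mergeWithContext:
      name: root
      rule: ".items = $res"
```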

Context Variables

Within templates and jq expressions, you can reference:

  • Named contexts: Access any context by its as name (e.g., .language, .location, .data) in Go templates
  • Special variable $res: In merge rules, refers to the result being merged
  • Special variable $ctx: In transform and merge rules, provides access to the full context map as an object
  • Context map access: Use $ctx.contextName to access any named context from jq expressions
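The sketch below puts these references side by side: a named context read from a Go template, and $res and $ctx used inside a jq merge rule. The data, the URL, and the .freePlaces key are illustrative assumptions, and the example assumes that $ctx.facility exposes the current item the same way .facility does in templates.

```yaml
rootContext: {facilities: [{FacilityId: 1}, {FacilityId: 2}], freePlaces: {}}

steps:
  - type: forEach
    path: .facilities
    as: facility                 # named context: .facility in templates, $ctx.facility in jq
    steps:
      - type: request
        request:
          # Go template: read the current item through its "as" name
          url: "https://api.example.com/places?id={{ .facility.FacilityId }}"
          method: GET
        # jq merge rule: $res is the result being merged, $ctx is the context map
        mergeWithContext:
          name: root
          rule: ".freePlaces[$ctx.facility.FacilityId | tostring] = $res"
```

The tostring call is there because jq object keys must be strings.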

Understanding Parent Context in Merge Operations

It's important to understand what "parent" means for merge operations:

In forEach steps: The parent is the context the forEach is executing within. For example:

rootContext: {items: [{id: 1}, {id: 2}]}
steps:
  - type: forEach
    path: .items
    as: item
    steps:
      - type: request
        mergeWithParentOn: .result = $res  # Parent is ROOT, not .items!

Even though forEach operates on .items path, the parent context for merge is the root context, not the .items array.

In request steps with as: The parent is still the context the request is executing within:

forEach as: language
  request as: data
    mergeWithParentOn: .[$ctx.language.value] = $res  # Parent is "language" context

Merge Strategies

After a step executes, its result can be merged back into a context using several strategies:

  • mergeOn: Merge with current context using a jq expression (e.g., .items = $res)
  • mergeWithParentOn: Merge with immediate parent context using a jq expression (see above for what "parent" means)
  • mergeWithContext: Merge with a named ancestor context (e.g., {name: "facility", rule: ".details = $res"})
  • noopMerge: true: Skip merging entirely (useful when nested steps handle their own merging)
  • Default: If no merge option is specified, arrays are appended and objects are shallow-merged

These options are mutually exclusive - only one can be specified per step.
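As a sketch of how noopMerge combines with a nested merge, the fragment below lets the forEach step merge nothing and leaves the decision to the request inside it. The .items path, the URL, and the .details key are placeholders for this example.

```yaml
steps:
  - type: forEach
    path: .items
    as: item
    noopMerge: true              # the forEach itself merges nothing
    steps:
      - type: request
        request:
          url: "https://api.example.com/items/{{ .item.id }}"
          method: GET
        # The nested request decides where its result goes
        mergeWithContext:
          name: root
          rule: ".details += [$res]"
```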


Environment Variable Expansion

Silky expands environment variables in YAML configuration files before parsing. This allows you to reference secrets, URLs, and environment-specific values without hardcoding them.

Syntax

Syntax           Behavior
${VAR}           Replaced with the value of VAR, or empty string if unset
${VAR:-default}  Replaced with the value of VAR, or default if unset/empty

Example

auth:
  type: bearer
  token: "${API_TOKEN}"

steps:
  - type: request
    request:
      url: "${API_HOST:-https://api.example.com}/v1/data"
      method: GET
      headers:
        X-Custom-Key: "${CUSTOM_KEY}"

Expansion happens on the raw YAML text before parsing, so it works in any field (URLs, headers, body values, auth config, etc.).
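Because the substitution is textual, the same syntax can be used inside an auth block. The fragment below is a sketch with hypothetical variable names (TOKEN_URL, CLIENT_ID, CLIENT_SECRET). Quoting each value is a defensive habit rather than a documented requirement: an unquoted value that expands to an empty string would be read by YAML as null.

```yaml
auth:
  type: oauth
  method: client_credentials
  # Falls back to the default URL when TOKEN_URL is unset or empty
  tokenUrl: "${TOKEN_URL:-https://api.example.com/oauth/token}"
  clientId: "${CLIENT_ID}"
  clientSecret: "${CLIENT_SECRET}"
```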


Template Functions

All Go template fields (URLs, headers, body values) support the full Sprig function library, providing 100+ functions for string manipulation, math, date formatting, encoding, and more.

Practical Examples

Timestamp formatting (e.g., .NET /Date() pattern):

body:
  request:
    fromData: '{{ printf "/Date(%d000+0000)/" .from }}'
    toData: '{{ printf "/Date(%d000+0000)/" .to }}'

Math operations:

request:
  url: "https://api.example.com/items?offset={{ mul .page 50 }}"

Conditional defaults:

request:
  url: '{{ default "https://api.example.com" .baseUrl }}/data'
  headers:
    X-Format: '{{ default "json" .format | upper }}'

String manipulation:

body:
  name: '{{ .input | trim | upper }}'
  slug: '{{ .title | lower | replace " " "-" }}'

Base64 encoding:

headers:
  Authorization: 'Basic {{ printf "%s:%s" .user .pass | b64enc }}'

Function Reference

Category         Functions
String           upper, lower, trim, trimPrefix, trimSuffix, replace, contains, hasPrefix, hasSuffix, repeat, nospace, substr, quote
Math             add, sub, mul, div, mod, max, min, floor, ceil, round
Date/Time        now, date, dateModify, toDate, unixEpoch, dateInZone
Default/Logic    default, empty, ternary, coalesce
Type Conversion  toString, toInt, toFloat64, toJson, toPrettyJson
Lists            list, first, last, join, has, sortAlpha
Encoding         b64enc, b64dec
Formatting       printf (standard Go), snakecase, camelcase, kebabcase

For the complete function reference, see the Sprig documentation.


Runtime Variables

Silky supports runtime variables that can be injected into your configuration at execution time. This allows you to:

  • Pass API keys, tokens, or secrets without hardcoding them
  • Configure environment-specific values (production, staging, development)
  • Share configurations across different contexts

Variable Injection

Variables are passed as a map[string]any to the Run() method and are accessible in:

  1. URL templates: Use {{ .variableName }}
  2. Headers: Use {{ .variableName }}
  3. Request body: Use {{ .variableName }} (recursively in nested objects)
  4. jq expressions (transformers and merge rules): Use $ctx.variableName

Variables are injected at the root level of the template context with highest priority, meaning they override any context values with the same name.

CLI Usage

# Pass variables via JSON flag
silky -config my-config.silky.yaml -vars '{"apiKey":"abc123","env":"prod"}'

# Complex variables
silky -config config.yaml -vars '{"auth":{"user":"admin","pass":"secret"},"limit":100}'

Example Configuration

rootContext: {}
steps:
  - type: request
    name: Fetch Data
    request:
      url: "https://api.example.com/{{ .env }}/data"
      method: POST
      headers:
        Authorization: "Bearer {{ .apiKey }}"
        Content-Type: application/json
      body:
        query: "{{ .searchTerm }}"
        filters:
          startDate: "{{ .startFrom }}"
          endDate: "{{ .endTo }}"
    resultTransformer: |
      .items | map(select(.category == $ctx.category))

Run with:

silky -config example.yaml -vars '{"env":"production","apiKey":"secret123","searchTerm":"test","startFrom":"2024-01-01","endTo":"2024-12-31","category":"active"}'

Terminal IDE Usage

Press v to open the variables input modal. Enter JSON and press Save. Variables persist across restarts until modified.

VS Code Extension Usage

The VS Code extension provides a dedicated Variables panel in the Silky sidebar (similar to the Watch panel in debuggers):

  1. Variables Panel - Located under "Execution Steps" in the Silky sidebar

    • Click + to add a new variable
    • Click the edit icon on a variable to modify it
    • Click the trash icon to remove a variable
    • Click "Clear All" to remove all variables
  2. Bulk Edit - Use the command palette: Silky: Set Runtime Variables to enter all variables as JSON

  3. Variables persist across runs until cleared


Context Example

The following configuration:

rootContext: []

steps:
  - type: request
    name: Fetch Facilities
    request:
      url: https://www.foo-bar/GetFacilities
      method: GET
      headers:
        Accept: application/json
    resultTransformer: |
      [.Facilities[]
        | select(.ReceiptMerchant == "STA – Strutture Trasporto Alto Adige SpA Via dei Conciapelli, 60 39100  Bolzano UID: 00586190217")
      ]
    steps:
      - type: forEach
        path: .
        as: facility
        steps:
          - type: request
            name: Get Facility Free Places
            request:
              url: https://www.foo-bar/FacilityFreePlaces?FacilityID={{ .facility.FacilityId }}
              method: GET
              headers:
                Accept: application/json
            resultTransformer: '[.FreePlaces]'
            mergeOn: .FacilityDetails = $res

          - type: forEach
            path: .subFacilities
            as: sub
            steps:
              - type: request
                name: Get SubFacility Free Places
                request:
                  url: https://www.foo-bar/FacilityFreePlaces?FacilityID={{ .sub.FacilityId }}
                  method: GET
                  headers:
                    Accept: application/json
                resultTransformer: '[.FreePlaces]'
                mergeOn: .SubFacilityDetails = $res

              - type: forEach
                path: .locations
                as: loc
                steps:
                  - type: request
                    name: Get Location Details
                    request:
                      url: https://www.foo-bar/Locations/{{ .loc }}
                      method: GET
                      headers:
                        Accept: application/json
                    mergeWithContext:
                      name: sub
                      rule: ".locationDetails = (.locationDetails // {}) + {($res.id): $res}"

generates a context tree like this:

rootContext: []
│
└── Request: Fetch Facilities
    (result is filtered list of Facilities)
    │
    └── ForEach: facility in [.]
        (new child context per facility)
        │
        ├── Request: Get Facility Free Places
        │   (merges .FacilityDetails into facility context via mergeOn)
        │
        └── ForEach: sub in .subFacilities
            (new child context per sub-facility)
            │
            ├── Request: Get SubFacility Free Places
            │   (merges .SubFacilityDetails into sub context via mergeOn)
            │
            └── ForEach: loc in .locations
                (new child context per location ID)
                │
                └── Request: Get Location Details
                    (merges into sub context under .locationDetails via mergeWithContext)

Configuration Structure

Top-Level Fields

Field        Type                                          Description
rootContext  [] or {}                                      Required. Initial context for the crawler.
auth         AuthenticationStruct                          Optional. Global authentication configuration.
headers      map[string]string                             Optional. Global headers applied to all requests.
stream       boolean                                       Optional. Enable streaming; requires rootContext to be [].
steps        Array<ForeachStep|ForValuesStep|RequestStep>  Required. List of crawler steps.

AuthenticationStruct

Silky supports multiple authentication mechanisms to handle diverse API authentication patterns.

Common Fields

Field  Type    Description
type   string  Required. One of: basic, bearer, oauth, cookie, jwt, custom

Type: basic

HTTP Basic Authentication.

Field     Type    Required  Description
username  string  Yes       Basic auth username
password  string  Yes       Basic auth password

Example:

auth:
  type: basic
  username: myuser
  password: mypassword

Type: bearer

Bearer token authentication.

Field  Type    Required  Description
token  string  Yes       Bearer token

Example:

auth:
  type: bearer
  token: my-api-token-123

Type: oauth

OAuth2 authentication with password or client credentials flow.

Field         Type      Required When                    Description
method        string    Always                           password or client_credentials
tokenUrl      string    Always                           OAuth2 token endpoint URL
clientId      string    If method == client_credentials  OAuth2 client ID
clientSecret  string    If method == client_credentials  OAuth2 client secret
username      string    If method == password            User username
password      string    If method == password            User password
scopes        []string  Optional                         OAuth2 scopes

Example (Client Credentials):

auth:
  type: oauth
  method: client_credentials
  tokenUrl: https://api.example.com/oauth/token
  clientId: my-client-id
  clientSecret: my-client-secret
  scopes: [read, write]

Example (Password Flow):

auth:
  type: oauth
  method: password
  tokenUrl: https://api.example.com/oauth/token
  username: user@example.com
  password: userpass
  scopes: [api]

Type: cookie

Cookie-based authentication: performs a login request, extracts a cookie, and injects it into subsequent requests.

Field            Type           Required  Description
loginRequest     RequestConfig  Yes       Login request configuration
extractSelector  string         Yes       Cookie name to extract
maxAgeSeconds    int            Optional  Token refresh interval (0 = no refresh)

Example:

auth:
  type: cookie
  loginRequest:
    url: https://api.example.com/login
    method: POST
    headers:
      Content-Type: application/json
    body:
      username: myuser
      password: mypass
  extractSelector: session_id
  maxAgeSeconds: 3600

Type: jwt

JWT authentication - performs login request, extracts JWT from response, and injects as Bearer token.

Field            Type           Required  Description
loginRequest     RequestConfig  Yes       Login request configuration
extractFrom      string         Optional  header or body (default: body)
extractSelector  string         Yes       Header name or jq expression for token
maxAgeSeconds    int            Optional  Token refresh interval (0 = no refresh)

Example (Extract from Body):

auth:
  type: jwt
  loginRequest:
    url: https://api.example.com/auth/login
    method: POST
    headers:
      Content-Type: application/json
    body:
      email: user@example.com
      password: mypass
  extractFrom: body
  extractSelector: .token
  maxAgeSeconds: 3600

Example (Extract from Header):

auth:
  type: jwt
  loginRequest:
    url: https://api.example.com/auth/login
    method: POST
    headers:
      Content-Type: application/json
    body:
      username: myuser
      password: mypass
  extractFrom: header
  extractSelector: X-Auth-Token

Type: custom

Fully customizable authentication - specify where to extract credentials and where to inject them.

Field            Type           Required       Description
loginRequest     RequestConfig  Yes            Login request configuration
extractFrom      string         Yes            cookie, header, or body
extractSelector  string         Yes            Cookie/header name or jq expression
injectInto       string         Yes            cookie, header, bearer, query, or body
injectKey        string         If not bearer  Cookie/header/query/body field name
maxAgeSeconds    int            Optional       Token refresh interval (0 = no refresh)

Example (Cookie to Custom Header):

auth:
  type: custom
  loginRequest:
    url: https://api.example.com/login
    method: POST
    headers:
      Content-Type: application/json
    body:
      username: myuser
      password: mypass
  extractFrom: cookie
  extractSelector: auth_cookie
  injectInto: header
  injectKey: X-Custom-Auth

Example (Body JSON to Query Parameter):

auth:
  type: custom
  loginRequest:
    url: https://api.example.com/auth
    method: POST
    headers:
      Content-Type: application/json
  extractFrom: body
  extractSelector: .access_token
  injectInto: query
  injectKey: api_key
  maxAgeSeconds: 3600
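
Authentication can be declared once at the top level and overridden on an individual request through the auth field of its request block. The following sketch combines a global bearer token and global headers with one request that switches to basic auth. The host names, step names, and environment variable names are placeholders.

```yaml
rootContext: []

auth:                            # global authentication
  type: bearer
  token: "${API_TOKEN}"

headers:                         # global headers, applied to all requests
  Accept: application/json

steps:
  - type: request
    name: Main API call          # uses the global bearer token
    request:
      url: https://api.example.com/v1/items
      method: GET

  - type: request
    name: Legacy API call
    request:
      url: https://legacy.example.com/items
      method: GET
      auth:                      # request-level override
        type: basic
        username: "${LEGACY_USER}"
        password: "${LEGACY_PASS}"
```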

ForeachStep

Iterates over an array extracted from the current context, creating a new child context for each item.

Field              Type                  Description
type               string                Required. Must be forEach
name               string                Optional name for the step
path               jq expression         Required. Path to the array to iterate over
as                 string                Required. Context name for each item
parallelism        ParallelismConfig     Optional. Parallel execution configuration
steps              Array                 Optional. Nested steps to execute for each item
mergeWithParentOn  jq expression         Optional. Rule for merging with parent context
mergeOn            jq expression         Optional. Rule for merging with current context
mergeWithContext   MergeWithContextRule  Optional. Advanced merging rule
noopMerge          bool                  Optional. Skip merging (nested steps handle merging)

Note: Only one of mergeWithParentOn, mergeOn, mergeWithContext, or noopMerge can be specified.

Example with Parallel Execution:

- type: forEach
  path: .users
  as: user
  parallelism:
    maxConcurrency: 5
    requestsPerSecond: 10
    burst: 2
  steps:
    - type: request
      # ... fetch user details

ForValuesStep

Iterates over literal values defined in the configuration, creating an overlay context for each value. Unlike forEach, the context variable is set directly to the value (not wrapped in an object).

Field   Type        Description
type    string      Required. Must be forValues
name    string      Optional name for the step
values  array<any>  Required. Literal values to iterate over
as      string      Required. Context name for the current value
steps   Array       Optional. Nested steps to execute for each value

Note: forValues does not support merge options or parallelism. Nested steps handle their own merging. The context variable is accessible directly (e.g., {{ .language }} not {{ .language.value }}).

Example:

- type: forValues
  name: Iterate languages
  values: ["en", "de", "it"]
  as: language
  steps:
    - type: request
      request:
        url: "https://api.example.com/data?lang={{ .language }}"
        method: GET
      mergeWithContext:
        name: root
        rule: ".results += [$res]"

Use Cases:

  • Iterating over a predefined set of values (languages, regions, categories)
  • Matrix-style iteration when nested (e.g., regions × tiers)
  • Preserving parent context variables for nested requests
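
The matrix-style use case can be sketched by nesting one forValues inside another, so the inner request sees both values. The value lists, the URL, and the .plans key below are made up for illustration.

```yaml
steps:
  - type: forValues
    values: ["eu", "us"]
    as: region
    steps:
      - type: forValues
        values: ["free", "pro"]
        as: tier
        steps:
          - type: request
            request:
              # Both overlay values are read directly, without .value
              url: "https://api.example.com/plans?region={{ .region }}&tier={{ .tier }}"
              method: GET
            # forValues has no merge options, so the request merges explicitly
            mergeWithContext:
              name: root
              rule: ".plans += [$res]"
```

With two regions and two tiers, this would issue four requests, one per combination.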

ParallelismConfig

Controls parallel execution of forEach iterations.

Field              Type     Description
maxConcurrency     int      Optional. Maximum concurrent workers (default: 10)
requestsPerSecond  float64  Optional. Maximum requests per second for rate limiting
burst              int      Optional. Burst size that lets the rate be exceeded temporarily (default: 1)

When parallelism is present on a forEach step, iterations will be executed in parallel using a worker pool. The maxConcurrency setting limits how many iterations run concurrently. Rate limiting is applied if requestsPerSecond is specified.


MergeWithContextRule

Field  Type    Description
name   string  Required. Name of ancestor context
rule   string  Required. jq expression for merge logic

RequestStep

Performs an HTTP request and optionally transforms the response.

Field              Type                            Description
type               string                          Required. Must be request
name               string                          Optional step name
request            RequestStruct                   Required. Request configuration
resultTransformer  jq expression                   Optional transformation of the result
as                 string                          Optional. Context name for this request's result (see below)
steps              Array<ForeachStep|RequestStep>  Optional. Nested steps
mergeWithParentOn  jq expression                   Optional. Rule for merging with parent context
mergeOn            jq expression                   Optional. Rule for merging with current context
mergeWithContext   MergeWithContextRule            Optional. Advanced merging rule

Note: Only one of mergeWithParentOn, mergeOn, or mergeWithContext can be specified.

Understanding the as Property for Requests

The as property on request steps creates a new sibling context instead of replacing the current context. This is critical when you have nested forEach loops and need inner requests to access outer forEach variables.

The Problem: Context Replacement

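A request replaces the current context with its response data, so steps nested under it would otherwise lose sight of values from outer steps. Wrapping the request in forValues avoids this, because the overlay context keeps the value accessible: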

steps:
  - type: forValues
    values: ["en", "de", "it"]
    as: language                    # Creates "language" overlay context
    steps:
      - type: request               # Creates working context with response
        request:
          url: "https://api.example.com/data?lang={{ .language }}"
        steps:
          - type: forEach
            path: .items
            as: item
            steps:
              - type: request
                request:
                  # With forValues, .language IS accessible here!
                  url: "https://api.example.com/detail?lang={{ .language }}"

Alternative: Using forEach with path-based data

When iterating over data from the context (not literal values), use forEach:

steps:
  - type: forEach
    path: .languages              # Extract from context data
    as: language                  # Creates "language" context per item
    steps:
      - type: request
        request:
          url: "https://api.example.com/locations?lang={{ .language.code }}"
        steps:
          - type: forEach
            path: .
            as: location
            steps:
              - type: request
                request:
                  # Access both language and location contexts
                  url: "https://api.example.com/details?lang={{ .language.code }}&id={{ .location.id }}"

When to Use forValues vs forEach:

  1. forValues: For literal values defined in configuration (languages, regions, categories)
  2. forEach: For iterating over arrays extracted from context data via path

Key Points:

  • forValues creates an overlay context that preserves parent variables - nested steps can access the value directly (e.g., {{ .language }})
  • forEach creates a child context from extracted data - access item properties (e.g., {{ .language.code }})
  • Use $ctx.contextName in jq expressions to access any named context
  • Use mergeWithContext to merge results into canonical contexts like "root"
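
The examples in this section do not set as on a request step itself. A tentative sketch of that case follows, modeled on the outline in "Understanding Parent Context in Merge Operations". The .languages path, the URL, and the .data key are placeholders, and the exact shape of the resulting "data" context is an assumption that should be checked in the IDE.

```yaml
steps:
  - type: forEach
    path: .languages
    as: language
    steps:
      - type: request
        as: data                 # names the context holding this request's result
        request:
          url: "https://api.example.com/data?lang={{ .language.code }}"
          method: GET
        # The parent is the context the request runs in, here "language"
        mergeWithParentOn: ".data = $res"
```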

RequestStruct

Defines an HTTP request configuration.

Field       Type                  Description
url         go-template string    Required. Request URL with template support
method      string (GET | POST)   Required. HTTP method
headers     map<string, string>   Optional headers (use Content-Type here for POST body type)
body        map<string, any>      Optional request body
pagination  PaginationStruct      Optional pagination config
auth        AuthenticationStruct  Optional override authentication

Important: For POST requests with a body, specify Content-Type in the headers map:

request:
  url: https://api.example.com/data
  method: POST
  headers:
    Content-Type: application/json
  body:
    key: value

Supported Content-Types:

  • application/json - Body will be JSON-encoded
  • application/x-www-form-urlencoded - Body will be form-encoded
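
For comparison with the JSON example above, a form-encoded POST differs only in its Content-Type header. The URL and body fields here are placeholders.

```yaml
request:
  url: https://api.example.com/search
  method: POST
  headers:
    Content-Type: application/x-www-form-urlencoded
  body:                          # sent form-encoded rather than as JSON
    q: silky
    limit: 50
```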

PaginationStruct

Defines pagination behavior for requests.

Field                Type                           Description
nextPageUrlSelector  string                         Optional (either this or params). Selector for next page URL: body:<jq-expression> or header:<header-name>
params               array<PaginationParamsStruct>  Optional (either this or nextPageUrlSelector). Pagination parameters
stopOn               array<PaginationStopsStruct>   Required. Stop conditions

Note: Use either nextPageUrlSelector for next-URL-based pagination OR params for offset/cursor-based pagination.
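
A minimal sketch of next-URL pagination, assuming a hypothetical API that returns the following page's address in a next_url field of the response body and sets it to null on the last page:

```yaml
request:
  url: https://api.example.com/items
  method: GET
  pagination:
    # body:<jq-expression> form of the selector
    nextPageUrlSelector: "body:.next_url"
    stopOn:
      - type: responseBody
        expression: ".next_url == null"
```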


PaginationParamsStruct

Field      Type    Description
name       string  Required. Parameter name
location   string  Required. One of: query, body, header
type       string  Required. One of: int, float, datetime, dynamic
format     string  Required if type == datetime (Go time format)
default    string  Required. Initial value (must match the type)
increment  string  Optional. Increment expression (e.g., + 10, +1d)
source     string  Required if type == dynamic. Format: body:<jq-expr> or header:<name>

Examples:

Integer offset pagination:

pagination:
  params:
    - name: offset
      location: query
      type: int
      default: "0"
      increment: "+ 50"
  stopOn:
    - type: pageNum
      value: 10

Dynamic token pagination:

pagination:
  params:
    - name: cursor
      location: query
      type: dynamic
      source: "body:.pagination.next_cursor"
      default: ""
  stopOn:
    - type: responseBody
      expression: ".pagination.next_cursor == null"
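
Datetime pagination can be sketched the same way. The parameter name, start date, and page limit are illustrative. The format value is a Go time layout, and the +1d increment follows the form given in the field table.

```yaml
pagination:
  params:
    - name: from
      location: query
      type: datetime
      format: "2006-01-02"       # Go reference layout for YYYY-MM-DD
      default: "2024-01-01"      # initial value, written in the same layout
      increment: "+1d"           # advance one day per page
  stopOn:
    - type: pageNum
      value: 31
```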

PaginationStopsStruct

Field       Type           Description
type        string         Required. One of: responseBody, requestParam, pageNum
expression  jq expression  Required if type == responseBody. Boolean jq expression
param       string         Required if type == requestParam. Format: .<location>.<name>
compare     string         Required if type == requestParam. One of: lt, lte, eq, gt, gte
value       any            Required if type == requestParam or type == pageNum

Examples:

Stop when response indicates no more pages:

stopOn:
  - type: responseBody
    expression: ".data | length == 0"

Stop when offset reaches limit:

stopOn:
  - type: requestParam
    param: .query.offset
    compare: gte
    value: 1000

Stop after 5 pages:

stopOn:
  - type: pageNum
    value: 5
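
Since stopOn is an array, several conditions can be listed together. The sketch below pairs a data-driven stop with a hard page cap as a safety net. It assumes that pagination ends as soon as any one condition matches, which this README does not state explicitly and is worth confirming against the multi-stop test case.

```yaml
stopOn:
  # Stop when the API returns an empty page
  - type: responseBody
    expression: ".data | length == 0"
  # Safety net: never fetch more than 100 pages
  - type: pageNum
    value: 100
```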

Parallel Execution

Silky supports parallel execution of forEach iterations, significantly improving performance for I/O-bound operations.

Configuration

- type: forEach
  path: .items
  as: item
  parallelism:               # Presence of this block enables parallel execution
    maxConcurrency: 10       # Optional: max concurrent workers (default: 10)
    requestsPerSecond: 5.0   # Optional: rate limit
    burst: 2                 # Optional: burst size (default: 1)
  steps:
    - type: request
      # ... nested requests execute in parallel

Features

  • Thread-safe merging: All merge operations use mutexes for safe concurrent access
  • Worker pool: Limits concurrent operations to prevent overwhelming APIs
  • Rate limiting: Controls request rate across all workers
  • Deterministic results: Results maintain iteration order even with parallel execution
  • Nested parallelism: Each forEach level can have its own parallelism settings

Best Practices

  1. Use parallelism for I/O-bound operations (API calls, database queries)
  2. Set appropriate maxConcurrency based on target API limits
  3. Always configure requestsPerSecond to respect API rate limits
  4. Monitor for race conditions when merging to shared contexts
  5. Use noopMerge with nested step merges for predictable ordering
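
Nested parallelism might look like the following sketch, which uses the fields from ParallelismConfig. The paths, names, URL, and numbers are illustrative, and how the limits of the two levels combine at runtime is not specified here.

```yaml
- type: forEach
  path: .teams
  as: team
  parallelism:
    maxConcurrency: 2            # at most two teams in flight
  steps:
    - type: forEach
      path: .members
      as: member
      parallelism:
        maxConcurrency: 5        # up to five members per team at once
        requestsPerSecond: 10
        burst: 2
      steps:
        - type: request
          request:
            url: "https://api.example.com/users/{{ .member.id }}"
            method: GET
          mergeOn: ".profile = $res"
```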

Stream Mode

When stream: true is set at the top level, the crawler emits entities incrementally as it processes them. In this mode:

  • rootContext must be an empty array ([])
  • Each result from forEach or request is pushed to the output stream
  • Streaming happens at depth 0 or 1 in the context hierarchy
  • The final result will be an empty array (data is streamed, not accumulated)

Example:

rootContext: []
stream: true

steps:
  - type: request
    # ... fetches list
    steps:
      - type: forEach
        # Each item is streamed as it's processed
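
A fuller version of the same shape, filled in with placeholder URLs and field names, could look like this. It is a sketch assembled from fields documented above, not a copy of one of the bundled test configurations.

```yaml
rootContext: []
stream: true

steps:
  - type: request
    name: List Items
    request:
      url: https://api.example.com/items
      method: GET
    resultTransformer: ".items"  # keep only the array of items
    steps:
      - type: forEach
        path: .
        as: item
        steps:
          - type: request
            name: Item Details
            request:
              url: "https://api.example.com/items/{{ .item.id }}"
              method: GET
            # Enrich the current item before it is emitted
            mergeOn: ".details = $res"
```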

Configuration Builder

The CLI utility enables real-time execution of your manifest with step-by-step inspection. It helps:

  • Validate configuration
  • Execute each step and inspect intermediate results
  • Debug jq and template expressions interactively
  • Visualize context hierarchy and data flow
  • Profile execution performance

Examples

The package includes several tests and examples to better understand its usage. The configuration files listed below demonstrate various features.

Feel free to contribute by adding more examples or tests! 🚀


Test Cases

These files are used for automated testing of the paginator and crawler components.

Paginator Tests
Test                               Description
test1_int_increment.yaml           Pagination using simple integer increment
test2_datetime.yaml                Pagination based on datetime values
test3_next_token.yaml              Pagination using next token from response
test4_empty.yaml                   Handling empty response
test5_empty_array.yaml             Handling response with empty array
test6_now_datetime.yaml            Pagination using current datetime
test7_now_datetime_multistop.yaml  Multiple stop conditions with datetime
test8_example_pagination_url.yaml  Pagination using full next URL
test9_stop_on_iteration.yaml       Stop condition based on iteration count

Crawler Tests
Test                                      Description
example.yaml                              Baseline crawler configuration
example2.yaml                             Complex crawler with nested requests
example_single.yaml                       Single non-paginated request
example_foreach_value.yaml                ForEach iteration over response values
example_foreach_value_transform_ctx.yaml  ForEach with context in transformation
example_foreach_value_stream.yaml         ForEach iteration with streaming
example_pagination_next.yaml              Pagination using next_url from response
example_pagination_increment.yaml         Pagination with incrementing number
example_pagination_increment_stream.yaml  Pagination with streaming enabled
example_pagination_increment_nested.yaml  Pagination on nested request
post_json_body.yaml                       POST request with JSON body
post_form_urlencoded.yaml                 POST request with form-encoded body
post_body_merge_pagination.yaml           POST with body, pagination, and merging

Authentication Tests
Test                                Description
auth_basic.yaml                     Basic HTTP authentication
auth_bearer.yaml                    Bearer token authentication
auth_oauth_password.yaml            OAuth2 password grant flow
auth_oauth_client_credentials.yaml  OAuth2 client credentials flow
auth_cookie.yaml                    Cookie-based authentication
auth_jwt_body.yaml                  JWT auth with token from response body
auth_jwt_header.yaml                JWT auth with token from response header
auth_custom_cookie_to_header.yaml   Custom auth: extract cookie, inject as header
auth_custom_body_to_query.yaml      Custom auth: extract from body, inject as query param
auth_mixed_override.yaml            Global auth with request-level override

ForValues Tests
Test                                Description
forvalues_simple.yaml               Simple forValues iteration
forvalues_nested.yaml               Nested forValues iterations
forvalues_objects.yaml              ForValues with object values
request_as_dynamic_keys.yaml        Request with dynamic context keys
request_as_context_disconnect.yaml  Request context isolation
edge_case_multiple_forvalues.yaml   Multiple forValues at same level
edge_case_deep_nesting.yaml         Deep nesting edge cases

Parallel Execution Tests
Test                               Description
parallel/simple.yaml               Basic parallel forEach execution
parallel/ratelimited.yaml          Parallel execution with rate limiting
parallel/noop_merge.yaml           Parallel with noopMerge strategy
parallel/nested_parallel.yaml      Nested parallel forEach steps
parallel/multi_root_parallel.yaml  Multiple parallel roots
parallel/error_handling.yaml       Error handling in parallel execution

Usage Examples

These files provide practical, ready-to-use examples for common crawling patterns.

Example                                           Description
foreach-iteration-not-streamed.yaml               Iterating over a list without streaming
list-and-details-paginated-stopped-streamed.yaml  Pagination + stop conditions + streaming
pagination-url-not-stream.yaml                    Pagination using next URL without streaming

Debug & Development

Running Tests

# Run all tests
go test -v ./...

# Run specific test
go test -v -run TestExampleForeachValue

# Run paginator tests only
go test -v -run TestPaginator

Building

# Build all packages
go build -v ./...

# Build the IDE
cd cmd/ide && go build -o silky-ide

Debugging with Delve

# Debug IDE with headless Delve (attach from VS Code or other debugger)
cd cmd/ide && dlv debug ./... --headless=true --listen=:2345 --api-version=2

VS Code Debug Configuration

The repository includes VS Code debug configurations in .vscode/launch.json for attaching to the Delve debugger on port 2345.

Documentation

Constants

const JQ_CTX_KEY = "$ctx"
const JQ_RES_KEY = "$res"

Variables

This section is empty.

Functions

func ExpandEnv added in v1.0.9

func ExpandEnv(s string) string

ExpandEnv expands environment variables in a string. Supports ${VAR} and ${VAR:-default} syntax. If VAR is not set or empty:

  • With default (${VAR:-default}): returns the default value
  • Without default (${VAR}): returns empty string

func NewApiCrawler

func NewApiCrawler(configPath string) (*ApiCrawler, []ValidationError, error)

func NormalizeRawQuery added in v1.0.7

func NormalizeRawQuery(raw string) string

NormalizeRawQuery percent-encodes characters that are invalid in a URL query string (spaces, '#', control characters, etc.) while preserving everything that is valid — including literal '+' signs and existing '%XX' sequences.

This operates directly on the raw query string WITHOUT a decode/re-encode round-trip, so '+' is never confused with space.

func NormalizeURL added in v1.0.7

func NormalizeURL(rawURL string) string

NormalizeURL pre-encodes invalid characters in the query portion of a raw URL string, then parses it. This handles externally-sourced URLs (e.g., nextPageUrl from API responses) where query values may contain unencoded '#', '+', or spaces.

The key insight: url.Parse treats '#' as a fragment separator, silently dropping everything after it from the query. By running NormalizeRawQuery on the query portion BEFORE url.Parse, we encode '#' → '%23' so it is preserved.

Returns the original string unchanged if parsing fails.
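
The fragment problem described above is easy to reproduce with net/url alone. The sketch below is not NormalizeURL itself: parseNaive and parsePreEncoded are hypothetical helpers, and the pre-encoding step handles only '#', under the assumption that the URL carries no intentional fragment.

```go
package main

import (
	"net/url"
	"strings"
)

// parseNaive parses rawURL as-is and returns the query string url.Parse sees.
func parseNaive(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	return u.RawQuery
}

// parsePreEncoded encodes '#' in the query portion before parsing, so the
// parser no longer mistakes it for the start of a fragment.
// Assumption: everything after the first '?' belongs to the query.
func parsePreEncoded(rawURL string) string {
	base, query, found := strings.Cut(rawURL, "?")
	if !found {
		return parseNaive(rawURL)
	}
	u, err := url.Parse(base + "?" + strings.ReplaceAll(query, "#", "%23"))
	if err != nil {
		return ""
	}
	return u.RawQuery
}
```

For "https://api.example.com/items?tag=a#b&page=2", parseNaive returns "tag=a" (the rest is treated as a fragment), while parsePreEncoded returns "tag=a%23b&page=2".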

func QueryParamEncode added in v1.0.7

func QueryParamEncode(value string) string

QueryParamEncode percent-encodes a query parameter value per RFC 3986. Unlike url.QueryEscape (which uses application/x-www-form-urlencoded where space becomes '+'), this encodes space as '%20' and literal '+' as '%2B'.

func SetQueryParams added in v1.0.7

func SetQueryParams(u *url.URL, params map[string]string)

SetQueryParams appends query parameters to u using RFC 3986 encoding. Existing parameters in u.RawQuery are preserved exactly as-is — no decode/re-encode round-trip that would corrupt '+' signs.

func ValidateAndCompile added in v1.0.4

func ValidateAndCompile(cfg Config) (*CompiledConfig, []ValidationError, error)

ValidateAndCompile performs both structural validation and cold-start compilation. This is the preferred entry point for production use as it:

  1. Validates the configuration structure
  2. Pre-compiles all JQ expressions and templates (fail-fast)
  3. Builds the execution topology

Returns the compiled config if successful, or validation errors if any step fails.

Types

type ApiCrawler

type ApiCrawler struct {
	Config         Config
	CompiledConfig *CompiledConfig // Pre-compiled JQ/templates (nil for legacy mode)
	ContextMap     map[string]*Context

	DataStream chan any
	// contains filtered or unexported fields
}

func (*ApiCrawler) EnableProfiler

func (a *ApiCrawler) EnableProfiler() chan StepProfilerData

func (*ApiCrawler) ExecuteStep

func (c *ApiCrawler) ExecuteStep(ctx context.Context, exec *stepExecution) error

func (*ApiCrawler) GetData

func (a *ApiCrawler) GetData() interface{}

func (*ApiCrawler) GetDataStream

func (a *ApiCrawler) GetDataStream() chan interface{}

func (*ApiCrawler) Run

func (c *ApiCrawler) Run(ctx context.Context, vars map[string]any) error

func (*ApiCrawler) SetClient

func (a *ApiCrawler) SetClient(client HTTPClient)

func (*ApiCrawler) SetLogger

func (a *ApiCrawler) SetLogger(logger Logger)

type AuthProfiler

type AuthProfiler struct {
	// contains filtered or unexported fields
}

AuthProfiler is a helper for emitting authentication profiling events

type Authenticator

type Authenticator interface {
	PrepareRequest(req *http.Request, requestID string) error
	SetProfiler(profiler chan StepProfilerData)
}

func NewAuthenticator

func NewAuthenticator(config AuthenticatorConfig, httpClient HTTPClient) Authenticator

NewAuthenticator creates an authenticator based on the configuration

type AuthenticatorConfig

type AuthenticatorConfig struct {
	Type string `yaml:"type,omitempty" json:"type,omitempty"` // basic | bearer | oauth | cookie | jwt | custom

	// Basic auth
	Username string `yaml:"username,omitempty" json:"username,omitempty"`
	Password string `yaml:"password,omitempty" json:"password,omitempty"`

	// Bearer auth
	Token string `yaml:"token,omitempty" json:"token,omitempty"`

	// OAuth (inlined for backward compatibility)
	OAuthConfig `yaml:",inline" json:",inline"`

	// Cookie/JWT/Custom auth
	LoginRequest    *RequestConfig `yaml:"loginRequest,omitempty" json:"loginRequest,omitempty"`
	ExtractFrom     string         `yaml:"extractFrom,omitempty" json:"extractFrom,omitempty"`         // cookie | header | body
	ExtractSelector string         `yaml:"extractSelector,omitempty" json:"extractSelector,omitempty"` // jq for body, name for cookie/header
	InjectInto      string         `yaml:"injectInto,omitempty" json:"injectInto,omitempty"`           // cookie | header | bearer | body | query
	InjectKey       string         `yaml:"injectKey,omitempty" json:"injectKey,omitempty"`             // name for cookie/header/query/body field

	// Refresh settings
	MaxAgeSeconds int `yaml:"maxAgeSeconds,omitempty" json:"maxAgeSeconds,omitempty"` // 0 = no refresh
}

type BaseAuthenticator

type BaseAuthenticator struct {
	// contains filtered or unexported fields
}

func (*BaseAuthenticator) GetProfiler

func (a *BaseAuthenticator) GetProfiler() *AuthProfiler

func (*BaseAuthenticator) SetProfiler

func (a *BaseAuthenticator) SetProfiler(profiler chan StepProfilerData)

type BasicAuthenticator

type BasicAuthenticator struct {
	*BaseAuthenticator
	// contains filtered or unexported fields
}

BasicAuthenticator - HTTP Basic Authentication

func (*BasicAuthenticator) PrepareRequest

func (a *BasicAuthenticator) PrepareRequest(req *http.Request, requestID string) error

type BearerAuthenticator

type BearerAuthenticator struct {
	*BaseAuthenticator
	// contains filtered or unexported fields
}

BearerAuthenticator - Bearer token authentication

func (*BearerAuthenticator) PrepareRequest

func (a *BearerAuthenticator) PrepareRequest(req *http.Request, requestID string) error

func (*BearerAuthenticator) SetProfiler

func (a *BearerAuthenticator) SetProfiler(profiler chan StepProfilerData)

type CompiledBodyTemplates added in v1.0.4

type CompiledBodyTemplates struct {
	Templates map[string]*CompiledBodyValue
}

CompiledBodyTemplates holds compiled templates for a request body. Structure mirrors the body map structure with templates replacing string values.

type CompiledBodyValue added in v1.0.4

type CompiledBodyValue struct {
	// One of the following will be set:
	StringTemplate *CompiledTemplate             // For string values with templates
	Literal        any                           // For non-template values (numbers, bools, etc.)
	Map            map[string]*CompiledBodyValue // For nested objects
	Array          []*CompiledBodyValue          // For arrays
}

CompiledBodyValue represents a body value that may contain templates. It mirrors the recursive structure of request bodies.

func (*CompiledBodyValue) Execute added in v1.0.4

func (c *CompiledBodyValue) Execute(ctx map[string]any) (any, error)

Execute expands all templates in the body value recursively.

type CompiledConfig added in v1.0.4

type CompiledConfig struct {
	Config                Config                       // Original configuration
	Steps                 map[string]*CompiledStep     // Pre-compiled steps keyed by step path
	Topology              *StepTopology                // Step execution topology
	GlobalHeaderTemplates map[string]*CompiledTemplate // Pre-compiled global header templates
}

CompiledConfig holds the fully compiled configuration. This is the result of ValidateAndCompile and eliminates all runtime compilation.

func CompileConfig added in v1.0.4

func CompileConfig(cfg Config) (*CompiledConfig, error)

CompileConfig compiles all steps in a configuration. This is the main entry point for cold-start compilation.

func (*CompiledConfig) ExecuteGlobalHeaders added in v1.0.4

func (cc *CompiledConfig) ExecuteGlobalHeaders(ctx map[string]any) (map[string]string, error)

ExecuteGlobalHeaders renders all global header templates with the given context. Returns a map of header names to expanded values.

func (*CompiledConfig) GetCompiledStep added in v1.0.4

func (cc *CompiledConfig) GetCompiledStep(path string) *CompiledStep

GetCompiledStep retrieves a pre-compiled step by its path.

type CompiledJQ added in v1.0.4

type CompiledJQ struct {
	Code       *gojq.Code
	Expression string   // Original expression for error messages
	Variables  []string // Variable names (e.g., ["$res", "$ctx"])
	UsedPaths  []string // Paths referenced in the expression (for selective context)
}

CompiledJQ holds a pre-compiled JQ expression with its metadata. Created at validation time to enable fail-fast and avoid runtime mutex contention.

func (*CompiledJQ) Run added in v1.0.4

func (c *CompiledJQ) Run(input any, variables ...any) (any, error)

Run executes the compiled JQ expression against the input data.

func (*CompiledJQ) RunArray added in v1.0.4

func (c *CompiledJQ) RunArray(input any) ([]interface{}, error)

RunArray executes the JQ expression and returns results as an array. Handles jq expressions that emit items one-by-one or as arrays.

func (*CompiledJQ) RunSingle added in v1.0.4

func (c *CompiledJQ) RunSingle(input any, variables ...any) (any, error)

RunSingle executes the JQ expression and expects exactly one result.

type CompiledMerge added in v1.0.4

type CompiledMerge struct {
	Rule       *CompiledJQ // The JQ expression to apply
	Target     MergeTarget // Which context to merge into
	TargetName string      // Context name (only for MergeTargetNamed)
	SourceRule string      // Original rule string for profiling
}

CompiledMerge holds a unified merge operation compiled from any of:

  • mergeOn (target: current)
  • mergeWithParentOn (target: parent)
  • mergeWithContext (target: named)

type CompiledStep added in v1.0.4

type CompiledStep struct {
	StepPath string // Unique path to this step (e.g., "steps[0].steps[1]")

	// Request step compilations
	URLTemplate     *CompiledTemplate            // URL template
	HeaderTemplates map[string]*CompiledTemplate // Header value templates
	BodyTemplates   *CompiledBodyTemplates       // Body value templates

	// Transform and merge compilations
	ResultTransformer *CompiledJQ    // Response transformation (.resultTransformer)
	Merge             *CompiledMerge // Unified merge (from mergeOn/mergeWithParentOn/mergeWithContext)

	// ForEach compilations
	PathExtractor  *CompiledJQ // Path extraction for forEach (.path)
	SyntheticMerge *CompiledJQ // Default forEach merge: path + " = $new"

	// Nested steps (pre-compiled recursively)
	NestedSteps []*CompiledStep
}

CompiledStep holds all pre-compiled expressions for a single step. This eliminates runtime compilation and its associated mutex contention.

func CompileStep added in v1.0.4

func CompileStep(step *Step, stepPath string) (*CompiledStep, []string, error)

CompileStep compiles all expressions in a step and its nested steps. stepPath is used for error messages and step lookup.
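
Step paths of the form "steps[0].steps[1]" can be derived by walking the nested step tree. The sketch below uses a hypothetical, trimmed-down stepSketch type and helper name. It only illustrates how such paths could be generated and used as lookup keys, and does not reproduce CompileStep.

```go
package main

import "fmt"

// stepSketch is a trimmed-down, hypothetical version of Step.
type stepSketch struct {
	Type  string
	Steps []stepSketch
}

// collectPaths walks the tree depth-first and records each step's type under
// its path, e.g. "steps[0].steps[1]".
func collectPaths(steps []stepSketch, prefix string, out map[string]string) {
	for i, s := range steps {
		path := fmt.Sprintf("steps[%d]", i)
		if prefix != "" {
			path = prefix + "." + path
		}
		out[path] = s.Type
		collectPaths(s.Steps, path, out) // nested steps extend the parent's path
	}
}
```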

func (*CompiledStep) ExecuteBodyTemplates added in v1.0.4

func (cs *CompiledStep) ExecuteBodyTemplates(ctx map[string]any) (map[string]any, error)

ExecuteBodyTemplates expands all body templates with the given context.

func (*CompiledStep) ExecuteHeaderTemplates added in v1.0.4

func (cs *CompiledStep) ExecuteHeaderTemplates(ctx map[string]any) (map[string]string, error)

ExecuteHeaderTemplates renders all header templates with the given context.

func (*CompiledStep) ExecutePathExtractor added in v1.0.4

func (cs *CompiledStep) ExecutePathExtractor(data any) ([]interface{}, error)

ExecutePathExtractor extracts items from context data for forEach iteration.

func (*CompiledStep) ExecuteResultTransformer added in v1.0.4

func (cs *CompiledStep) ExecuteResultTransformer(input any, templateCtx map[string]any) (any, error)

ExecuteResultTransformer transforms the response using the pre-compiled JQ expression.

func (*CompiledStep) ExecuteSyntheticMerge added in v1.0.4

func (cs *CompiledStep) ExecuteSyntheticMerge(contextData any, newValue any) (any, error)

ExecuteSyntheticMerge applies the default forEach merge (path = $new).

func (*CompiledStep) ExecuteURLTemplate added in v1.0.4

func (cs *CompiledStep) ExecuteURLTemplate(ctx map[string]any, defaultUrl string) (string, error)

ExecuteURLTemplate renders the URL with the given context.

type CompiledTemplate added in v1.0.4

type CompiledTemplate struct {
	Template   *template.Template
	Source     string   // Original template string for error messages
	UsedFields []string // Fields referenced in the template (for selective context)
}

CompiledTemplate holds a pre-compiled Go template with its metadata. Created at validation time to enable fail-fast and avoid runtime mutex contention.

func (*CompiledTemplate) Execute added in v1.0.4

func (c *CompiledTemplate) Execute(ctx map[string]any) (string, error)

Execute renders the template with the given context.
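
The parse-once, execute-many pattern behind a pre-compiled template can be sketched as follows. The names compiledTemplateSketch and compileTemplate are hypothetical, and the use of text/template (as opposed to html/template) is an assumption. The point is that a malformed template fails at compile time, before any request is made.

```go
package main

import (
	"strings"
	"text/template"
)

// compiledTemplateSketch keeps the parsed template next to its source so
// error messages can quote the original string. Hypothetical type.
type compiledTemplateSketch struct {
	tmpl   *template.Template
	source string
}

// compileTemplate parses source once; a syntax error surfaces here (fail-fast).
func compileTemplate(source string) (*compiledTemplateSketch, error) {
	t, err := template.New("sketch").Parse(source)
	if err != nil {
		return nil, err
	}
	return &compiledTemplateSketch{tmpl: t, source: source}, nil
}

// execute renders the already-parsed template with the given context.
func (c *compiledTemplateSketch) execute(ctx map[string]any) (string, error) {
	var sb strings.Builder
	if err := c.tmpl.Execute(&sb, ctx); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```

The same compiled value can then be executed repeatedly with different contexts, for example once per forEach item.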

type Config

type Config struct {
	Steps          []Step               `yaml:"steps" json:"steps"`
	RootContext    interface{}          `yaml:"rootContext" json:"rootContext"`
	Authentication *AuthenticatorConfig `yaml:"auth,omitempty" json:"auth,omitempty"`
	Headers        map[string]string    `yaml:"headers,omitempty" json:"headers,omitempty"`
	Stream         bool                 `yaml:"stream,omitempty" json:"stream,omitempty"`
}

type ConfigP

type ConfigP struct {
	Pagination Pagination `yaml:"pagination"`
}

type Context

type Context struct {
	Data          interface{}
	ParentContext string
	// contains filtered or unexported fields
}

type ContextData

type ContextData struct {
	Data          any    `json:"data"`
	ParentContext string `json:"parentContext"`
	Depth         int    `json:"depth"`
	Key           string `json:"key"`
}

ContextData represents a single context in a snapshot

type CookieAuthenticator

type CookieAuthenticator struct {
	*BaseAuthenticator
	// contains filtered or unexported fields
}

CookieAuthenticator - performs login via POST, extracts cookie, injects it

func (*CookieAuthenticator) PrepareRequest

func (a *CookieAuthenticator) PrepareRequest(req *http.Request, requestID string) error

type CustomAuthenticator

type CustomAuthenticator struct {
	*BaseAuthenticator
	// contains filtered or unexported fields
}

CustomAuthenticator - fully configurable authenticator

func (*CustomAuthenticator) PrepareRequest

func (a *CustomAuthenticator) PrepareRequest(req *http.Request, requestID string) error

type HTTPClient

type HTTPClient interface {
	Do(req *http.Request) (*http.Response, error)
}

type JWTAuthenticator

type JWTAuthenticator struct {
	*BaseAuthenticator
	// contains filtered or unexported fields
}

JWTAuthenticator - performs login via POST, extracts JWT from response

func (*JWTAuthenticator) PrepareRequest

func (a *JWTAuthenticator) PrepareRequest(req *http.Request, requestID string) error

type Logger

type Logger interface {
	Debug(msg string, args ...any)
	Info(msg string, args ...any)
	Warning(msg string, args ...any)
	Error(msg string, args ...any)
}

func NewDefaultLogger

func NewDefaultLogger() Logger

func NewNoopLogger

func NewNoopLogger() Logger
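
A custom Logger only needs the four methods listed above. The sketch below redeclares the interface locally and implements it with a hypothetical recordingLogger that stores formatted lines, which can be handy in tests. How Silky interprets args (key/value pairs or format arguments) is not documented here, so the sketch simply joins them with spaces.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Logger is redeclared with the documented method set so the sketch is
// self-contained.
type Logger interface {
	Debug(msg string, args ...any)
	Info(msg string, args ...any)
	Warning(msg string, args ...any)
	Error(msg string, args ...any)
}

// recordingLogger is a hypothetical implementation that keeps every line in
// memory. The mutex guards against concurrent calls from parallel workers.
type recordingLogger struct {
	mu    sync.Mutex
	lines []string
}

// record joins level, message and args with single spaces.
func (l *recordingLogger) record(level, msg string, args ...any) {
	parts := append([]any{level, msg}, args...)
	line := strings.TrimSuffix(fmt.Sprintln(parts...), "\n")
	l.mu.Lock()
	defer l.mu.Unlock()
	l.lines = append(l.lines, line)
}

func (l *recordingLogger) Debug(msg string, args ...any)   { l.record("DEBUG", msg, args...) }
func (l *recordingLogger) Info(msg string, args ...any)    { l.record("INFO", msg, args...) }
func (l *recordingLogger) Warning(msg string, args ...any) { l.record("WARNING", msg, args...) }
func (l *recordingLogger) Error(msg string, args ...any)   { l.record("ERROR", msg, args...) }

// Compile-time check that the sketch satisfies the interface.
var _ Logger = (*recordingLogger)(nil)
```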

type MergeEventData

type MergeEventData struct {
	CurrentContextKey   string
	TargetContextKey    string
	MergeRule           string
	TargetContextBefore any
	TargetContextAfter  any
	ContextMap          map[string]*Context
}

MergeEventData holds data for context merge event

type MergeTarget added in v1.0.4

type MergeTarget int

MergeTarget indicates which context a merge operation should target.

const (
	// MergeTargetCurrent merges into the current/working context
	MergeTargetCurrent MergeTarget = iota
	// MergeTargetParent merges into the parent context
	MergeTargetParent
	// MergeTargetNamed merges into a named context (specified by TargetName)
	MergeTargetNamed
)

func (MergeTarget) String added in v1.0.4

func (t MergeTarget) String() string

String returns a human-readable name for the merge target.

type MergeWithContextRule

type MergeWithContextRule struct {
	Name string `yaml:"name"`
	Rule string `yaml:"rule"`
}

type NoopAuthenticator

type NoopAuthenticator struct {
	*BaseAuthenticator
}

NoopAuthenticator - no authentication

func (NoopAuthenticator) PrepareRequest

func (np NoopAuthenticator) PrepareRequest(req *http.Request, requestID string) error

type OAuthAuthenticator

type OAuthAuthenticator struct {
	*BaseAuthenticator
	// contains filtered or unexported fields
}

OAuthAuthenticator - OAuth2 authentication

func (*OAuthAuthenticator) GetToken

func (a *OAuthAuthenticator) GetToken(requestID string) (string, error)

GetToken retrieves a valid access token (refreshing if necessary)

func (*OAuthAuthenticator) GetTokenWithCache

func (a *OAuthAuthenticator) GetTokenWithCache(requestID string) (string, bool, error)

GetTokenWithCache retrieves a valid access token and returns whether it was cached

func (*OAuthAuthenticator) PrepareRequest

func (a *OAuthAuthenticator) PrepareRequest(req *http.Request, requestID string) error

type OAuthConfig

type OAuthConfig struct {
	Method       string `yaml:"method,omitempty" json:"method,omitempty"` // password | client_credentials
	TokenURL     string `yaml:"tokenUrl,omitempty" json:"tokenUrl,omitempty"`
	ClientID     string `yaml:"clientId,omitempty" json:"clientId,omitempty"`
	ClientSecret string `yaml:"clientSecret,omitempty" json:"clientSecret,omitempty"`
	// username and password inherited from AuthenticatorConfig
	Scopes []string `yaml:"scopes,omitempty" json:"scopes,omitempty"`
}

type Pagination

type Pagination struct {
	NextPageUrlSelector string          `yaml:"nextPageUrlSelector,omitempty" json:"nextPageUrlSelector,omitempty"` // jq selector to get nextPage url
	Params              []Param         `yaml:"params,omitempty" json:"params,omitempty"`
	StopOn              []StopCondition `yaml:"stopOn,omitempty" json:"stopOn,omitempty"`
}

type PaginationContext

type PaginationContext map[string]interface{}

type PaginationEvalData

type PaginationEvalData struct {
	PageNumber           int
	PaginationConfig     Pagination
	PreviousResponseBody any
	PreviousHeaders      map[string]string
	PreviousState        map[string]any
	AfterState           map[string]any
}

PaginationEvalData holds data for pagination evaluation event

type Paginator

type Paginator struct {
	// contains filtered or unexported fields
}

func NewPaginator

func NewPaginator(cfg ConfigP) (*Paginator, error)

NewPaginator creates a new paginator from YAML config

func NewPaginatorFromFile

func NewPaginatorFromFile(yamlData []byte) (*Paginator, error)

NewPaginatorFromFile creates a new paginator from raw YAML data (yamlData)

func (*Paginator) Ctx

func (p *Paginator) Ctx() PaginationContext

func (*Paginator) Next

func (p *Paginator) Next(resp *http.Response) (*RequestParts, bool, error)

Next advances the paginator and returns query/body/header params for the next request

func (*Paginator) NextFromCtx

func (p *Paginator) NextFromCtx() *RequestParts

func (*Paginator) PageNum

func (p *Paginator) PageNum() int

type ParallelismConfig

type ParallelismConfig struct {
	MaxConcurrency    int     `yaml:"maxConcurrency,omitempty" json:"maxConcurrency,omitempty"`
	RequestsPerSecond float64 `yaml:"requestsPerSecond,omitempty" json:"requestsPerSecond,omitempty"`
	Burst             int     `yaml:"burst,omitempty" json:"burst,omitempty"`
}

type ParallelismSetupData

type ParallelismSetupData struct {
	MaxConcurrency int
	WorkerPoolID   string
	WorkerIDs      []int
	RateLimit      float64
	Burst          int
}

ParallelismSetupData holds data for parallelism setup event

type Param

type Param struct {
	Name      string `yaml:"name" json:"name"`
	Location  string `yaml:"location" json:"location"` // "query", "body", "header"
	Type      string `yaml:"type" json:"type"`         // "int", "float", "datetime", "dynamic"
	Format    string `yaml:"format,omitempty" json:"format,omitempty"`
	Default   string `yaml:"default" json:"default"`
	Increment string `yaml:"increment,omitempty" json:"increment,omitempty"`
	Source    string `yaml:"source,omitempty" json:"source,omitempty"` // "body:selector" or "header:selector"
}

type ProfileEventType

type ProfileEventType int

const (
	// Root
	EVENT_ROOT_START ProfileEventType = iota

	// Request step container
	EVENT_REQUEST_STEP_START
	EVENT_REQUEST_STEP_END

	// Request step sub-events
	EVENT_CONTEXT_SELECTION
	EVENT_REQUEST_PAGE_START
	EVENT_REQUEST_PAGE_END
	EVENT_PAGINATION_EVAL
	EVENT_URL_COMPOSITION
	EVENT_REQUEST_DETAILS
	EVENT_REQUEST_RESPONSE
	EVENT_RESPONSE_TRANSFORM
	EVENT_CONTEXT_MERGE

	// ForEach step container
	EVENT_FOREACH_STEP_START
	EVENT_FOREACH_STEP_END

	// ForValues step container
	EVENT_FORVALUES_STEP_START
	EVENT_FORVALUES_STEP_END

	// ForEach/ForValues step sub-events
	EVENT_PARALLELISM_SETUP
	EVENT_ITEM_SELECTION

	// Authentication events
	EVENT_AUTH_START
	EVENT_AUTH_CACHED
	EVENT_AUTH_LOGIN_START
	EVENT_AUTH_LOGIN_END
	EVENT_AUTH_TOKEN_EXTRACT
	EVENT_AUTH_TOKEN_INJECT
	EVENT_AUTH_END

	// Result events
	EVENT_RESULT
	EVENT_STREAM_RESULT

	// Errors
	EVENT_ERROR
)

type Profiler

type Profiler struct {
	// contains filtered or unexported fields
}

Profiler wraps the profiler channel and provides methods for emitting events. All methods are safe to call even when the profiler is disabled (nil channel).

func NewProfiler

func NewProfiler(mergeMutex *sync.Mutex) *Profiler

NewProfiler creates a new Profiler instance

func (*Profiler) Channel

func (p *Profiler) Channel() chan StepProfilerData

Channel returns the underlying channel for consumers to read from

func (*Profiler) EmitContextMerge

func (p *Profiler) EmitContextMerge(pageID string, step Step, data MergeEventData)

EmitContextMerge emits a context merge event. NOT THREAD SAFE.

func (*Profiler) EmitContextSelection

func (p *Profiler) EmitContextSelection(parentID string, step Step, contextKey string, contextMap map[string]*Context)

EmitContextSelection emits context selection event

func (*Profiler) EmitContextSelectionWithWorker

func (p *Profiler) EmitContextSelectionWithWorker(parentID string, step Step, contextKey string, contextMap map[string]*Context, workerID int, workerPool string)

EmitContextSelectionWithWorker emits context selection event with worker tracking

func (*Profiler) EmitError

func (p *Profiler) EmitError(name string, parentID string, err string)

EmitError emits an error event

func (*Profiler) EmitFinalResult

func (p *Profiler) EmitFinalResult(rootID string, result any)

EmitFinalResult emits the final result event (non-streaming mode)

func (*Profiler) EmitForEachStepEnd

func (p *Profiler) EmitForEachStepEnd(stepID string, parentID string, step Step, startTime time.Time)

EmitForEachStepEnd emits forEach step end event

func (*Profiler) EmitForEachStepStart

func (p *Profiler) EmitForEachStepStart(step Step, parentID string) string

EmitForEachStepStart emits forEach step start and returns the step ID

func (*Profiler) EmitForValuesStepEnd

func (p *Profiler) EmitForValuesStepEnd(stepID string, parentID string, step Step, startTime time.Time)

EmitForValuesStepEnd emits forValues step end event

func (*Profiler) EmitForValuesStepStart

func (p *Profiler) EmitForValuesStepStart(step Step, parentID string) string

EmitForValuesStepStart emits forValues step start and returns the step ID

func (*Profiler) EmitItemSelection

func (p *Profiler) EmitItemSelection(parentID string, step Step, index int, item any, contextKey string, contextData any) string

EmitItemSelection emits item selection event and returns the item ID

func (*Profiler) EmitItemSelectionWithWorker

func (p *Profiler) EmitItemSelectionWithWorker(parentID string, step Step, index int, item any, contextKey string, contextData any, workerID int, workerPool string) string

EmitItemSelectionWithWorker emits item selection event with worker tracking

func (*Profiler) EmitPaginationEval

func (p *Profiler) EmitPaginationEval(pageID string, step Step, data PaginationEvalData)

EmitPaginationEval emits pagination evaluation event

func (*Profiler) EmitParallelismSetup

func (p *Profiler) EmitParallelismSetup(parentID string, step Step, data ParallelismSetupData)

EmitParallelismSetup emits parallelism setup event

func (*Profiler) EmitRequestDetails

func (p *Profiler) EmitRequestDetails(pageID string, step Step, data RequestDetailsData)

EmitRequestDetails emits request details event

func (*Profiler) EmitRequestPageEnd

func (p *Profiler) EmitRequestPageEnd(pageID string, stepID string, step Step, pageNum int, startTime time.Time)

EmitRequestPageEnd emits page end event

func (*Profiler) EmitRequestPageStart

func (p *Profiler) EmitRequestPageStart(stepID string, step Step, pageNum int) string

EmitRequestPageStart emits page start and returns the page ID

func (*Profiler) EmitRequestResponse

func (p *Profiler) EmitRequestResponse(pageID string, step Step, data ResponseData)

EmitRequestResponse emits request response event

func (*Profiler) EmitRequestStepEnd

func (p *Profiler) EmitRequestStepEnd(stepID string, parentID string, step Step, startTime time.Time)

EmitRequestStepEnd emits request step end event

func (*Profiler) EmitRequestStepStart

func (p *Profiler) EmitRequestStepStart(step Step, parentID string) string

EmitRequestStepStart emits request step start and returns the step ID

func (*Profiler) EmitResponseTransform

func (p *Profiler) EmitResponseTransform(pageID string, step Step, rule string, before any, after any)

EmitResponseTransform emits response transform event

func (*Profiler) EmitRootStart

func (p *Profiler) EmitRootStart(config Config, contextMap map[string]*Context) string

EmitRootStart emits the root start event and returns the root ID

func (*Profiler) EmitStreamResult

func (p *Profiler) EmitStreamResult(parentID string, step Step, entity any, index int)

EmitStreamResult emits stream result event

func (*Profiler) EmitURLComposition

func (p *Profiler) EmitURLComposition(pageID string, step Step, data URLCompositionData)

EmitURLComposition emits URL composition event

func (*Profiler) EmitValueSelection

func (p *Profiler) EmitValueSelection(parentID string, step Step, index int, value any, contextKey string) string

EmitValueSelection emits value selection event (for forValues) and returns the item ID

func (*Profiler) Enabled

func (p *Profiler) Enabled() bool

Enabled returns whether profiling is enabled
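
The "safe to call when disabled" guarantee matters because a send on a nil channel blocks forever in Go. The sketch below shows one common way to get that guarantee: every emit goes through an enabled check. profilerSketch and event are hypothetical types, not the library's Profiler or StepProfilerData.

```go
package main

// event is a hypothetical, minimal stand-in for a profiler event.
type event struct {
	name string
}

// profilerSketch wraps an optional channel. A nil receiver or nil channel
// means profiling is disabled.
type profilerSketch struct {
	ch chan event
}

// enabled reports whether events have somewhere to go.
func (p *profilerSketch) enabled() bool {
	return p != nil && p.ch != nil
}

// emit sends e when profiling is enabled and is a no-op otherwise, so callers
// never need their own nil checks.
func (p *profilerSketch) emit(e event) {
	if !p.enabled() {
		return // sending on a nil channel would block forever
	}
	p.ch <- e
}
```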

type RequestConfig

type RequestConfig struct {
	URL            string               `yaml:"url" json:"url"`
	Method         string               `yaml:"method" json:"method"`
	Headers        map[string]string    `yaml:"headers,omitempty" json:"headers,omitempty"`
	Body           map[string]any       `yaml:"body,omitempty" json:"body,omitempty"`
	Pagination     Pagination           `yaml:"pagination,omitempty" json:"pagination,omitempty"`
	Authentication *AuthenticatorConfig `yaml:"auth,omitempty" json:"auth,omitempty"`
}

type RequestDetailsData

type RequestDetailsData struct {
	CurlCommand string
	Method      string
	URL         string
	Headers     map[string]string
	Body        map[string]interface{}
}

RequestDetailsData holds data for request details event

type RequestParts

type RequestParts struct {
	QueryParams map[string]string      `yaml:"queryParams"`
	BodyParams  map[string]interface{} `yaml:"bodyParams"`
	Headers     map[string]string      `yaml:"headers"`
	NextPageUrl string                 `yaml:"nextPageUrl"`
}

type ResponseData

type ResponseData struct {
	StatusCode   int
	Headers      map[string]string
	Body         any
	ResponseSize int
	DurationMs   int64
}

ResponseData holds the data for a response event.

type Step

type Step struct {
	Type              string                `yaml:"type" json:"type"`
	Name              string                `yaml:"name,omitempty" json:"name,omitempty"`
	Path              string                `yaml:"path,omitempty" json:"path,omitempty"`
	As                string                `yaml:"as,omitempty" json:"as,omitempty"`
	Values            []interface{}         `yaml:"values,omitempty" json:"values,omitempty"`
	Steps             []Step                `yaml:"steps,omitempty" json:"steps,omitempty"`
	Request           *RequestConfig        `yaml:"request,omitempty" json:"request,omitempty"`
	ResultTransformer string                `yaml:"resultTransformer,omitempty" json:"resultTransformer,omitempty"`
	MergeWithParentOn string                `yaml:"mergeWithParentOn,omitempty" json:"mergeWithParentOn,omitempty"`
	MergeOn           string                `yaml:"mergeOn,omitempty" json:"mergeOn,omitempty"`
	MergeWithContext  *MergeWithContextRule `yaml:"mergeWithContext,omitempty" json:"mergeWithContext,omitempty"`
	NoopMerge         bool                  `yaml:"noopMerge,omitempty" json:"noopMerge,omitempty"`
	Parallelism       *ParallelismConfig    `yaml:"parallelism,omitempty" json:"parallelism,omitempty"`
}
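
The struct tags above determine how a step is serialized. As a minimal sketch, the following encodes a step-like value with the standard library's encoding/json. The miniStep type is a local stand-in that copies only a few of the fields and JSON tags listed above, and the ".items" path and "item" alias are placeholder values, not taken from the package documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// miniStep mirrors a small subset of the Step fields and JSON tags.
// It is a local stand-in for illustration, not the package's own type.
type miniStep struct {
	Type  string     `json:"type"`
	Name  string     `json:"name,omitempty"`
	Path  string     `json:"path,omitempty"`
	As    string     `json:"as,omitempty"`
	Steps []miniStep `json:"steps,omitempty"`
}

// encodeStep marshals a step to compact JSON.
func encodeStep(s miniStep) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Name and Steps are left empty, so omitempty drops them from the output.
	s := miniStep{Type: "forEach", Path: ".items", As: "item"}
	fmt.Println(encodeStep(s)) // {"type":"forEach","path":".items","as":"item"}
}
```

Because every field except type carries omitempty, an encoded step contains only the keys that were actually set.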

type StepNode added in v1.0.4

type StepNode struct {
	StepPath string // Unique path (e.g., "steps[0].steps[1]")
	StepRef  *Step  // Reference to the original step definition
	Parent   *StepNode
	Children []*StepNode
	Depth    int // Distance from root (0 = top-level step)

	// Execution characteristics
	IsParallel bool // forEach with parallelism config
	IsForEach  bool // forEach or forValues step
	HasMerge   bool // Step has explicit merge rule

	// Streaming characteristics
	MergesToRoot bool   // Merge target is root or depth <= 1 (triggers streaming)
	MergeTarget  string // Context name that this step merges to
}

StepNode represents a step in the execution topology tree. Used for planning execution order, determining parallelism, and identifying when to emit streamed results.

func (*StepNode) GetMergeTargetNode added in v1.0.4

func (n *StepNode) GetMergeTargetNode(topology *StepTopology) *StepNode

GetMergeTargetNode returns the node that this step merges to.

func (*StepNode) IsStreamingPoint added in v1.0.4

func (n *StepNode) IsStreamingPoint() bool

IsStreamingPoint returns true if this step should trigger streaming when it completes (merges to root at depth <= 1).

func (*StepNode) String added in v1.0.4

func (n *StepNode) String() string

String returns a human-readable representation of the node.

type StepProfilerData

type StepProfilerData struct {
	// Core identification
	ID       string           `json:"id"`
	ParentID string           `json:"parentId,omitempty"`
	Type     ProfileEventType `json:"type"`
	Name     string           `json:"name"`
	Step     Step             `json:"step"`

	// Timeline
	Timestamp time.Time `json:"timestamp"`
	Duration  int64     `json:"durationMs,omitempty"` // Only in END events

	// Worker tracking (for parallel execution)
	WorkerID   int    `json:"workerId,omitempty"`
	WorkerPool string `json:"workerPool,omitempty"`

	// Flexible event-specific data
	Data map[string]any `json:"data"`
}

type StepTopology added in v1.0.4

type StepTopology struct {
	// Root nodes (top-level steps)
	Roots []*StepNode

	// All nodes indexed by path for O(1) lookup
	ByPath map[string]*StepNode

	// Statistics
	MaxDepth      int
	TotalSteps    int
	ParallelSteps int
}

StepTopology holds the complete step execution topology. Built at validation time from the configuration.

func BuildTopology added in v1.0.4

func BuildTopology(cfg Config) *StepTopology

BuildTopology constructs the step execution topology from a configuration. This analyzes the step hierarchy and determines execution characteristics.
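
To illustrate the idea of a path-indexed topology, here is a rough sketch that builds a tree from nested steps and records each node under a path in the "steps[0].steps[1]" style shown in the StepNode comments. The step, node, and build names are hypothetical local stand-ins; the real BuildTopology may work differently and also derives parallelism and merge information that this sketch omits.

```go
package main

import "fmt"

// step and node are simplified local stand-ins for Step and StepNode.
type step struct {
	Type  string
	Steps []step
}

type node struct {
	Path     string
	Depth    int
	Children []*node
}

// build walks the nested steps, assigns each one a path such as
// "steps[0].steps[1]", and records every node in byPath.
func build(steps []step, prefix string, depth int, byPath map[string]*node) []*node {
	var nodes []*node
	for i := range steps {
		path := fmt.Sprintf("steps[%d]", i)
		if prefix != "" {
			path = prefix + "." + path
		}
		n := &node{Path: path, Depth: depth}
		byPath[path] = n
		n.Children = build(steps[i].Steps, path, depth+1, byPath)
		nodes = append(nodes, n)
	}
	return nodes
}

func main() {
	cfg := []step{{Type: "request", Steps: []step{{Type: "forEach"}, {Type: "request"}}}}
	byPath := map[string]*node{}
	roots := build(cfg, "", 0, byPath)
	// One root, three nodes in total, and the second child sits at depth 1.
	fmt.Println(len(roots), len(byPath), byPath["steps[0].steps[1]"].Depth) // 1 3 1
}
```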

func (*StepTopology) GetNode added in v1.0.4

func (t *StepTopology) GetNode(path string) *StepNode

GetNode retrieves a step node by its path.

func (*StepTopology) GetParallelBranches added in v1.0.4

func (t *StepTopology) GetParallelBranches() []*StepNode

GetParallelBranches returns all nodes that execute in parallel.

func (*StepTopology) GetStreamingPoints added in v1.0.4

func (t *StepTopology) GetStreamingPoints() []*StepNode

GetStreamingPoints returns all nodes that should trigger streaming.

func (*StepTopology) String added in v1.0.4

func (t *StepTopology) String() string

String returns a human-readable representation of the topology.

func (*StepTopology) WalkPostOrder added in v1.0.4

func (t *StepTopology) WalkPostOrder(fn func(*StepNode) bool)

WalkPostOrder visits all nodes in post-order (children before parent).

func (*StepTopology) WalkPreOrder added in v1.0.4

func (t *StepTopology) WalkPreOrder(fn func(*StepNode) bool)

WalkPreOrder visits all nodes in pre-order (parent before children).
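
The difference between the two visiting orders can be sketched on a small tree. The node type and walk functions below are local stand-ins, and the sketch assumes that a callback returning false stops the traversal; the documentation above does not state what the bool return value means for WalkPreOrder and WalkPostOrder.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a minimal local stand-in for StepNode.
type node struct {
	Name     string
	Children []*node
}

// walkPre calls fn on n before its children.
// It returns false once fn has asked to stop (assumed semantics).
func walkPre(n *node, fn func(*node) bool) bool {
	if !fn(n) {
		return false
	}
	for _, c := range n.Children {
		if !walkPre(c, fn) {
			return false
		}
	}
	return true
}

// walkPost calls fn on n after all of its children.
func walkPost(n *node, fn func(*node) bool) bool {
	for _, c := range n.Children {
		if !walkPost(c, fn) {
			return false
		}
	}
	return fn(n)
}

// order returns the names of the visited nodes, separated by spaces.
func order(walk func(*node, func(*node) bool) bool, root *node) string {
	var names []string
	walk(root, func(n *node) bool {
		names = append(names, n.Name)
		return true
	})
	return strings.Join(names, " ")
}

func main() {
	root := &node{Name: "a", Children: []*node{
		{Name: "b", Children: []*node{{Name: "c"}}},
		{Name: "d"},
	}}
	fmt.Println(order(walkPre, root))  // a b c d
	fmt.Println(order(walkPost, root)) // c b d a
}
```

Post-order is the natural fit when a parent depends on its children having finished, while pre-order suits top-down planning.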

type StopCondition

type StopCondition struct {
	Type       string `yaml:"type" json:"type"`             // "responseBody", "requestParam", "pageNum"
	Expression string `yaml:"expression" json:"expression"` // used by jq

	Param   string `yaml:"param,omitempty" json:"param,omitempty"`     // for requestParam
	Compare string `yaml:"compare,omitempty" json:"compare,omitempty"` // "lt", "lte", "eq", "gt", "gte"
	Value   any    `yaml:"value,omitempty" json:"value,omitempty"`     // value to compare against
}
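
The Compare field lists five operator names. As a sketch of how such names could map onto numeric comparisons, the hypothetical compare helper below takes two float64 operands. Which operand corresponds to the request parameter and which to Value, and how non-numeric values are handled, are not specified above and are left out here.

```go
package main

import "fmt"

// compare applies one of the operator names from the Compare field comment.
// ok is false when the operator name is not recognized.
// This helper is illustrative only and is not part of the package.
func compare(op string, left, right float64) (result, ok bool) {
	switch op {
	case "lt":
		return left < right, true
	case "lte":
		return left <= right, true
	case "eq":
		return left == right, true
	case "gt":
		return left > right, true
	case "gte":
		return left >= right, true
	}
	return false, false
}

func main() {
	stop, ok := compare("gte", 5, 5)
	fmt.Println(stop, ok) // true true

	_, ok = compare("neq", 1, 2)
	fmt.Println(ok) // false
}
```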

type URLCompositionData

type URLCompositionData struct {
	URLTemplate     string
	PageNumber      int
	QueryParams     map[string]string
	BodyParams      map[string]interface{}
	NextPageURL     string
	TemplateContext map[string]any
	ResultURL       string
	ResultHeaders   map[string]string
	ResultBody      interface{}
}

URLCompositionData holds the data for a URL composition event.

type ValidationError

type ValidationError struct {
	Message  string
	Location string // optional, e.g. "steps[0].request.url"
}

func ValidateConfig

func ValidateConfig(cfg Config) []ValidationError

func (ValidationError) Error

func (e ValidationError) Error() string
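
Since ValidationError has an Error method on a value receiver, each element of the slice returned by ValidateConfig can be used as an error. The sketch below shows one way to fold such a slice into a single error with errors.Join (Go 1.20 or later). The validationError type is a local stand-in, and the "Location: Message" format of its Error method and the sample message are assumptions, not the package's documented output.

```go
package main

import (
	"errors"
	"fmt"
)

// validationError is a local stand-in with the same fields as ValidationError.
type validationError struct {
	Message  string
	Location string // optional, e.g. "steps[0].request.url"
}

// Error formats the error as "Location: Message" (assumed format).
func (e validationError) Error() string {
	if e.Location == "" {
		return e.Message
	}
	return e.Location + ": " + e.Message
}

// asError folds a slice of validation errors into one error, or nil if empty.
func asError(errs []validationError) error {
	if len(errs) == 0 {
		return nil
	}
	joined := make([]error, len(errs))
	for i, e := range errs {
		joined[i] = e
	}
	return errors.Join(joined...)
}

func main() {
	errs := []validationError{
		{Message: "url is required", Location: "steps[0].request.url"},
	}
	if err := asError(errs); err != nil {
		fmt.Println(err)
	}
}
```

Returning nil for an empty slice keeps the usual `if err != nil` check working for callers that only care whether validation passed.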

Directories

Path Synopsis
cmd
cli command
