webinfo

v0.2.0 Latest
Published: May 12, 2026 License: Apache-2.0 Imports: 23 Imported by: 0

README

webinfo -- Extract metadata from web pages

webinfo extracts common metadata (title, description, canonical, image, etc.) from web pages and provides helpers to download images and generate thumbnails.

Design goals

  • Keep metadata extraction simple and deterministic.
  • Use clear precedence rules for HTML/meta parsing.
  • Provide practical image utilities with minimal API surface.
  • Keep context-aware network operations as the default style.

Development

Requirements
  • Go 1.25.10 or later
  • Task command (local tool for this repository)
Local validation
task test
task govulncheck

Run all maintenance tasks:

task

CI Workflows

  • ci: lint (golangci-lint with gosec), tests, and govulncheck
  • CodeQL: scheduled and push/PR static analysis

Usage

Install and import
go get github.com/goark/webinfo@latest
import "github.com/goark/webinfo"
Fetch metadata
ctx := context.Background()
info, err := webinfo.Fetch(ctx, "https://example.com", "")
if err != nil {
  return err
}
fmt.Println(info.Title, info.Description)
Download image and thumbnail
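// Download the representative image into "images" as a temporary file.
imgPath, err := info.DownloadImage(ctx, "images", true)
if err != nil {
  return err
}
fmt.Println(imgPath)

// Create a 150-pixel-wide thumbnail named after the original image.
thumbPath, err := info.DownloadThumbnail(ctx, "thumbnails", 150, false)
if err != nil {
  return err
}
fmt.Println(thumbPath)

// Read the image into memory without writing a file.
imgBytes, err := info.ImageBytes(ctx)
if err != nil {
  return err
}
fmt.Println(len(imgBytes))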
Public API
  • Fetch(ctx, rawURL, userAgent) extracts metadata from a page.
  • (*Webinfo).ImageBytes(ctx) downloads Webinfo.ImageURL into memory.
  • (*Webinfo).DownloadImage(ctx, destDir, temporary) downloads Webinfo.ImageURL.
  • (*Webinfo).DownloadThumbnail(ctx, destDir, width, temporary) creates a resized thumbnail.

Behavior notes

  • Fetch uses explicit precedence for metadata extraction:
    • title: title -> twitter:title -> og:title
    • description: meta[name=description] -> twitter:description -> og:description
    • image: twitter:image -> og:image
  • DownloadImage resolves extension in this order:
    1. URL path extension
    2. response Content-Type
    3. sniff first 512 bytes (http.DetectContentType)
    4. fallback .img
  • DownloadThumbnail uses width 150 when width <= 0.
  • ImageBytes reads the full response body into memory; very large images can increase memory usage.
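
The precedence rules for Fetch listed above can be read as a chain of fallbacks. The following is a minimal sketch, assuming the arrows mean "use the leftmost source that is non-empty"; the helper firstNonEmpty and the sample values are hypothetical and are not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first candidate that is non-empty after
// trimming whitespace, or "" if none qualifies.
// Assumption: precedence means the leftmost non-empty source wins.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if s := strings.TrimSpace(c); s != "" {
			return s
		}
	}
	return ""
}

func main() {
	// Hypothetical values as they might be read from a page's head section,
	// ordered as in the rules: title, twitter:title, og:title.
	title := firstNonEmpty("", "Twitter title", "OG title")
	// twitter:image is missing here, so og:image is used.
	image := firstNonEmpty("", "https://example.com/og.png")
	fmt.Println(title)
	fmt.Println(image)
}
```

Passing the candidates in the documented order keeps the rule visible at the call site, which is one way to keep extraction deterministic.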

Error handling

This package wraps errors with github.com/goark/errs and attaches context values such as url, path, and dir.

Modules Requirement Graph

[Figure: module requirement graph (dependency.png)]

Documentation

Index

Constants

This section is empty.

Variables

var (
	ErrNullPointer = errors.New("null reference instance")
	ErrNoImageURL  = errors.New("no image URL")
	ErrInvalidURL  = errors.New("invalid URL")
)

Functions

This section is empty.

Types

type Webinfo

type Webinfo struct {
	URL         string `json:"url,omitempty"`         // Original page URL
	Location    string `json:"location,omitempty"`    // Location
	Canonical   string `json:"canonical,omitempty"`   // Canonical URL
	Title       string `json:"title,omitempty"`       // Page title
	Description string `json:"description,omitempty"` // Meta description
	ImageURL    string `json:"image_url,omitempty"`   // Representative image URL
	UserAgent   string `json:"user_agent,omitempty"`  // User-Agent used to fetch the page
}

Webinfo stores metadata extracted from a web page and values used for follow-up image download operations.

func Fetch

func Fetch(ctx context.Context, urlStr, userAgent string) (info *Webinfo, err error)

Fetch retrieves metadata from a web page and returns it as Webinfo.

It fetches the page with the given context and User-Agent (or a default one when empty), peeks up to 1024 bytes to determine encoding, then parses the head section with goquery.

Extraction precedence is kept explicit:

  • title: title -> twitter:title -> og:title
  • description: meta[name=description] -> twitter:description -> og:description
  • image: twitter:image -> og:image

Returned errors are wrapped with context. Response close errors are joined.

func (*Webinfo) DownloadImage

func (w *Webinfo) DownloadImage(ctx context.Context, destDir string, temporary bool) (outPath string, err error)

DownloadImage downloads w.ImageURL and writes it under destDir.

If temporary is true, or if the URL path has no filename, a temporary file is created. Otherwise the output file name is derived from the URL path.

Extension resolution order is:

  1. URL path extension
  2. response Content-Type
  3. sniffed content type from up to 512 bytes
  4. fallback ".img"

Returned errors are wrapped with context and include cleanup failures.
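
The extension resolution order above can be sketched with the standard library alone. This is an illustration of the four steps, not the package's implementation: the function names and the small knownExts table are hypothetical, and the real mapping from content type to extension may differ.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path"
)

// knownExts is a deliberately small, hypothetical table for this sketch.
var knownExts = map[string]string{
	"image/png":  ".png",
	"image/jpeg": ".jpg",
	"image/gif":  ".gif",
	"image/webp": ".webp",
}

// extFromType maps a Content-Type value to an extension, or "" if unknown.
func extFromType(contentType string) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return knownExts[mediaType]
}

// resolveExt applies the four steps in order and stops at the first hit.
func resolveExt(urlPath, contentType string, body []byte) string {
	// 1. URL path extension
	if ext := path.Ext(urlPath); ext != "" {
		return ext
	}
	// 2. response Content-Type
	if ext := extFromType(contentType); ext != "" {
		return ext
	}
	// 3. sniffed content type from up to 512 bytes
	head := body
	if len(head) > 512 {
		head = head[:512]
	}
	if ext := extFromType(http.DetectContentType(head)); ext != "" {
		return ext
	}
	// 4. fallback
	return ".img"
}

func main() {
	gif := []byte("GIF89a")
	fmt.Println(resolveExt("/a/photo.jpeg", "image/png", nil))
	fmt.Println(resolveExt("/a/photo", "image/png", nil))
	fmt.Println(resolveExt("/a/photo", "", gif))
	fmt.Println(resolveExt("/a/photo", "", []byte("hello")))
}
```

Under these assumptions the four calls in main exercise steps 1 through 4 in turn.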

func (*Webinfo) DownloadThumbnail

func (w *Webinfo) DownloadThumbnail(ctx context.Context, destDir string, width int, temporary bool) (outPath string, err error)

DownloadThumbnail downloads the source image, resizes it to width while keeping aspect ratio, and writes the thumbnail to destDir.

width defaults to 150 when width <= 0.

The source image is downloaded to a temporary file first and removed on return. Output uses a temporary name when temporary is true; otherwise it uses "<base>-thumb<ext>" derived from the original image URL.

Returned errors are wrapped with context and include cleanup failures.

func (*Webinfo) ImageBytes (added in v0.2.0)

func (w *Webinfo) ImageBytes(ctx context.Context) (data []byte, err error)

ImageBytes downloads w.ImageURL and returns its contents in memory.

Risk: this method reads the entire response body into memory, so very large images can increase memory usage.

Returned errors are wrapped with context and include response close failures.
