webinfo

Version: v0.2.0 | Published: May 12, 2026 | License: Apache-2.0 | Imports: 23 | Imported by: 0

Repository

github.com/goark/webinfo

README ¶

webinfo -- Extract metadata from web pages

webinfo extracts common metadata (title, description, canonical, image, etc.) from web pages and provides helpers to download images and generate thumbnails.

Design goals

Keep metadata extraction simple and deterministic.
Use clear precedence rules for HTML/meta parsing.
Provide practical image utilities with minimal API surface.
Keep context-aware network operations as the default style.

Development

Requirements

Go 1.25.10 or later
Task command (local tool for this repository)

Local validation

task test
task govulncheck

Run all maintenance tasks:

task

CI Workflows

ci: lint (golangci-lint with gosec), tests, and govulncheck
CodeQL: scheduled and push/PR static analysis

Usage

Install and import

go get github.com/goark/webinfo@latest

import "github.com/goark/webinfo"

Fetch metadata

ctx := context.Background()
info, err := webinfo.Fetch(ctx, "https://example.com", "")
if err != nil {
  return err
}
fmt.Println(info.Title, info.Description)

Download image and thumbnail

imgPath, err := info.DownloadImage(ctx, "images", true)
if err != nil {
  return err
}

thumbPath, err := info.DownloadThumbnail(ctx, "thumbnails", 150, false)
if err != nil {
  return err
}

imgBytes, err := info.ImageBytes(ctx)
if err != nil {
  return err
}
fmt.Println(len(imgBytes))

Public API

Fetch(ctx, rawURL, userAgent) extracts metadata from a page.
(*Webinfo).ImageBytes(ctx) downloads Webinfo.ImageURL into memory.
(*Webinfo).DownloadImage(ctx, destDir, temporary) downloads Webinfo.ImageURL.
(*Webinfo).DownloadThumbnail(ctx, destDir, width, temporary) creates a resized thumbnail.

Behavior notes

Fetch uses explicit precedence for metadata extraction:
- title: title -> twitter:title -> og:title
- description: meta[name=description] -> twitter:description -> og:description
- image: twitter:image -> og:image
DownloadImage resolves extension in this order:
1. URL path extension
2. response Content-Type
3. sniff first 512 bytes (http.DetectContentType)
4. fallback .img
DownloadThumbnail uses width 150 when width <= 0.
ImageBytes reads the full response body into memory; very large images can increase memory usage.
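
The extension resolution order listed above for DownloadImage can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: resolveExt and the extByType table are hypothetical names, and the set of recognized media types is an assumption.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"net/url"
	"path"
)

// extByType is a small illustrative table (an assumption);
// the real package may recognize more or different media types.
var extByType = map[string]string{
	"image/png":  ".png",
	"image/jpeg": ".jpg",
	"image/gif":  ".gif",
	"image/webp": ".webp",
}

// resolveExt is a hypothetical helper that follows the documented order.
func resolveExt(rawURL, contentType string, head []byte) string {
	// 1. URL path extension (query string and fragment are not part of u.Path).
	if u, err := url.Parse(rawURL); err == nil {
		if ext := path.Ext(u.Path); ext != "" {
			return ext
		}
	}
	// 2. Response Content-Type; parameters such as charset are ignored.
	if mt, _, err := mime.ParseMediaType(contentType); err == nil {
		if ext, ok := extByType[mt]; ok {
			return ext
		}
	}
	// 3. Sniff the leading bytes; DetectContentType considers at most 512.
	if ext, ok := extByType[http.DetectContentType(head)]; ok {
		return ext
	}
	// 4. Fallback.
	return ".img"
}

func main() {
	pngMagic := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(resolveExt("https://example.com/a/photo.jpg", "", nil))
	fmt.Println(resolveExt("https://example.com/image", "image/gif; charset=binary", nil))
	fmt.Println(resolveExt("https://example.com/image", "", pngMagic))
	fmt.Println(resolveExt("https://example.com/image", "", []byte("plain text")))
}
```

Each later step runs only when the earlier ones yield nothing, so a URL that already carries an extension never needs the response headers or body to be inspected.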

Error handling

This package wraps errors with github.com/goark/errs and attaches context values such as url, path, and dir.

Documentation ¶

Index ¶

Variables
type Webinfo
- func Fetch(ctx context.Context, urlStr, userAgent string) (info *Webinfo, err error)

Constants ¶

This section is empty.

Variables ¶

var (
	ErrNullPointer = errors.New("null reference instance")
	ErrNoImageURL  = errors.New("no image URL")
	ErrInvalidURL  = errors.New("invalid URL")
)

Functions ¶

This section is empty.

Types ¶

type Webinfo ¶

type Webinfo struct {
	URL         string `json:"url,omitempty"`         // Original page URL
	Location    string `json:"location,omitempty"`    // Location
	Canonical   string `json:"canonical,omitempty"`   // Canonical URL
	Title       string `json:"title,omitempty"`       // Page title
	Description string `json:"description,omitempty"` // Meta description
	ImageURL    string `json:"image_url,omitempty"`   // Representative image URL
	UserAgent   string `json:"user_agent,omitempty"`  // User-Agent used to fetch the page
}

Webinfo stores metadata extracted from a web page and values used for follow-up image download operations.
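
The struct tags mean that empty fields are omitted when a value is encoded as JSON. A minimal sketch of that behavior follows; pageInfo is a local copy of the fields and tags shown above (so the example runs without the module), and encode is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pageInfo mirrors the Webinfo fields and JSON tags for illustration only.
type pageInfo struct {
	URL         string `json:"url,omitempty"`
	Location    string `json:"location,omitempty"`
	Canonical   string `json:"canonical,omitempty"`
	Title       string `json:"title,omitempty"`
	Description string `json:"description,omitempty"`
	ImageURL    string `json:"image_url,omitempty"`
	UserAgent   string `json:"user_agent,omitempty"`
}

// encode marshals p; fields left at their zero value are dropped by omitempty.
func encode(p pageInfo) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encode(pageInfo{URL: "https://example.com", Title: "Example Domain"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s) // {"url":"https://example.com","title":"Example Domain"}
}
```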

func Fetch ¶

func Fetch(ctx context.Context, urlStr, userAgent string) (info *Webinfo, err error)

Fetch retrieves metadata from a web page and returns it as Webinfo.

It fetches the page with the given context and User-Agent (or a default one when empty), peeks up to 1024 bytes to determine encoding, then parses the head section with goquery.

Extraction precedence is kept explicit:

title: title -> twitter:title -> og:title
description: meta[name=description] -> twitter:description -> og:description
image: twitter:image -> og:image
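
The precedence rule amounts to "first non-empty candidate wins". The sketch below applies it to values that have already been extracted. firstNonEmpty and pick are hypothetical helpers, and trimming whitespace before the emptiness check is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first candidate that is non-empty after trimming.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if s := strings.TrimSpace(c); s != "" {
			return s
		}
	}
	return ""
}

// pick resolves one field by looking up keys in precedence order.
func pick(values map[string]string, keys ...string) string {
	ordered := make([]string, 0, len(keys))
	for _, k := range keys {
		ordered = append(ordered, values[k])
	}
	return firstNonEmpty(ordered...)
}

func main() {
	// Values as they might be collected from a page's head section.
	values := map[string]string{
		"title":         "",
		"twitter:title": "Card title",
		"og:title":      "OG title",
		"og:image":      "https://example.com/og.png",
	}
	fmt.Println(pick(values, "title", "twitter:title", "og:title"))
	fmt.Println(pick(values, "twitter:image", "og:image"))
}
```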

Returned errors are wrapped with context. Response close errors are joined.

func (*Webinfo) DownloadImage ¶

func (w *Webinfo) DownloadImage(ctx context.Context, destDir string, temporary bool) (outPath string, err error)

DownloadImage downloads w.ImageURL and writes it under destDir.

If temporary is true, or if the URL path has no filename, a temporary file is created. Otherwise the output file name is derived from the URL path.
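
One way to decide whether a URL path "has a filename" is shown below. fileNameFromURL is a hypothetical helper, and treating a path that ends in "/" as having no filename is an assumption. The package may draw that line differently.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// fileNameFromURL returns the last path element of rawURL.
// ok is false when the path is empty or ends in a slash.
func fileNameFromURL(rawURL string) (name string, ok bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", false
	}
	if u.Path == "" || strings.HasSuffix(u.Path, "/") {
		return "", false
	}
	name = path.Base(u.Path)
	if name == "." || name == "/" {
		return "", false
	}
	return name, true
}

func main() {
	for _, raw := range []string{
		"https://example.com/assets/cover.png?v=3",
		"https://example.com/",
		"https://example.com",
	} {
		name, ok := fileNameFromURL(raw)
		fmt.Printf("%q -> %q (has filename: %v)\n", raw, name, ok)
	}
}
```

When ok is false, a caller following the documented behavior would fall back to a temporary file, for example one created with os.CreateTemp in destDir.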

Extension resolution order is:

URL path extension
response Content-Type
sniffed content type from up to 512 bytes
fallback ".img"

Returned errors are wrapped with context and include cleanup failures.

func (*Webinfo) DownloadThumbnail ¶

func (w *Webinfo) DownloadThumbnail(ctx context.Context, destDir string, width int, temporary bool) (outPath string, err error)

DownloadThumbnail downloads the source image, resizes it to width while keeping aspect ratio, and writes the thumbnail to destDir.

width defaults to 150 when width <= 0.
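
The width default and the aspect-ratio rule can be expressed as a small calculation. thumbSize is a hypothetical helper. Rounding to the nearest pixel and clamping the height to at least 1 are assumptions, since the rounding the package uses is not documented here.

```go
package main

import "fmt"

// thumbSize returns the output dimensions for a source image of
// srcW x srcH pixels scaled to the requested width.
func thumbSize(srcW, srcH, width int) (w, h int) {
	if width <= 0 {
		width = 150 // documented default
	}
	if srcW <= 0 || srcH <= 0 {
		return width, 0 // invalid source size; nothing sensible to compute
	}
	// Scale the height by width/srcW, rounding to the nearest integer.
	h = (srcH*width + srcW/2) / srcW
	if h < 1 {
		h = 1
	}
	return width, h
}

func main() {
	w, h := thumbSize(400, 200, 0)
	fmt.Println(w, h) // 150 75
	w, h = thumbSize(800, 600, 200)
	fmt.Println(w, h) // 200 150
}
```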

The source image is downloaded to a temporary file first and removed on return. Output uses a temporary name when temporary is true; otherwise it uses "<base>-thumb<ext>" derived from the original image URL.
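
The "<base>-thumb<ext>" naming can be sketched as follows. thumbName is a hypothetical helper that takes the extension straight from the URL path. The package may instead use the extension it resolved for the downloaded image, so the result for URLs without an extension is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// thumbName derives "<base>-thumb<ext>" from the last element of the URL path.
// ok is false when the URL cannot be parsed or has no usable filename.
func thumbName(imageURL string) (name string, ok bool) {
	u, err := url.Parse(imageURL)
	if err != nil || u.Path == "" || strings.HasSuffix(u.Path, "/") {
		return "", false
	}
	file := path.Base(u.Path)
	ext := path.Ext(file)
	base := strings.TrimSuffix(file, ext)
	if base == "" {
		return "", false
	}
	return base + "-thumb" + ext, true
}

func main() {
	name, ok := thumbName("https://example.com/img/cover.png")
	fmt.Println(name, ok) // cover-thumb.png true
}
```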

Returned errors are wrapped with context and include cleanup failures.

func (*Webinfo) ImageBytes ¶ added in v0.2.0

func (w *Webinfo) ImageBytes(ctx context.Context) (data []byte, err error)

ImageBytes downloads w.ImageURL and returns its contents in memory.

Risk: this method reads the entire response body into memory, so very large images can increase memory usage.

Returned errors are wrapped with context and include response close failures.
