fas

v0.0.0-...-2b29985

This package is not in the latest version of its module.

Published: May 20, 2025 License: Apache-2.0 Imports: 16 Imported by: 0

README

Fly Autoscaler

This project is a metrics-based autoscaler for Fly.io. It polls metrics from a Prometheus instance and computes the target number of machines from those metrics.

How it works

The Fly Autoscaler works by performing a reconciliation loop on a regular interval. By default, it runs every 15 seconds.

  1. Collect metrics from external systems (e.g. Prometheus).

  2. Compute the target number of machines based on a user-provided expression.

  3. Fetch a list of all Fly Machines for your application.

  4. If the number of started machines is less than the target, use the Fly Machines API to start more machines.
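Step 4 comes down to a small calculation. The Go sketch below shows one way to express it. machinesToStart is a hypothetical helper, not part of the fas API. It assumes that only existing stopped machines can be started, so the result is capped at the stopped count.

```go
package main

import "fmt"

// machinesToStart is a hypothetical helper (not part of the fas API) that
// mirrors step 4 of the loop: start machines only when fewer are started
// than the target. It assumes only existing, stopped machines can be
// started, so the result is capped at the stopped count.
func machinesToStart(target, started, stopped int) int {
	need := target - started
	if need <= 0 {
		return 0 // already at or above the target
	}
	if need > stopped {
		return stopped // cannot start more machines than exist
	}
	return need
}

func main() {
	fmt.Println(machinesToStart(5, 2, 10)) // 3
	fmt.Println(machinesToStart(5, 2, 1))  // 1
	fmt.Println(machinesToStart(3, 4, 10)) // 0
}
```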

                                     ┌────────────────────┐
fly-autoscaler ──────────┐           │                    │
│ ┌────────────────────┐ │    ┌──────│     Prometheus     │
│ │                    │ │    │      │                    │
│ │  Metric Collector  │◀┼────┘      └────────────────────┘
│ │                    │ │                                 
│ └──────┬─────────────┘ │                                 
│        │     △         │                                 
│        ▽     │         │                                 
│ ┌────────────┴───────┐ │                                 
│ │                    │ │                                 
│ │     Reconciler     │◀┼────┐                            
│ │                    │ │    │      ┌────────────────────┐
│ └────────────────────┘ │    │      │                    │
└────────────────────────┘    └─────▶│  Fly Machines API  │
                                     │                    │
                                     └────────────────────┘

Expressions

The autoscaler uses the Expr language to define the target number of machines. See the Expr Language Definition for syntax and a full list of built-in functions. The expression can reference any named metric that you collect, and it must always evaluate to a number.

For example, if you poll for queue depth and each machine can handle 10 queue items at a time, you can compute the number of machines as:

ceil(queue_depth / 10)
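If you are more familiar with Go than with Expr, the example expression corresponds to the arithmetic below. This is only an illustration, because fas evaluates the expression string at runtime. targetMachines is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// targetMachines restates the example expression ceil(queue_depth / 10) in Go.
// Hypothetical illustration only: fas evaluates an Expr string at runtime.
func targetMachines(queueDepth float64) int {
	return int(math.Ceil(queueDepth / 10))
}

func main() {
	for _, d := range []float64{0, 10, 25, 101} {
		fmt.Printf("queue_depth=%v -> %d machines\n", d, targetMachines(d))
	}
}
```

For instance, a queue depth of 25 gives ceil(2.5) = 3 machines.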

The autoscaler can only start existing machines, so it will never exceed the number of machines available for a Fly app.

Usage

Create an app for your autoscaler

First, create an app for your autoscaler:

$ fly apps create my-autoscaler

Then create a fly.toml for the deployment. Replace TARGET_APP_NAME with the name of the app that you want to scale, and replace MY_ORG with the slug of the organization where your Prometheus metrics live.

app = "my-autoscaler"

[build]
image = "flyio/fly-autoscaler:0.2"

[env]
FAS_APP_NAME = "TARGET_APP_NAME"
FAS_STARTED_MACHINE_COUNT = "ceil(queue_depth / 10)"
FAS_PROMETHEUS_ADDRESS = "https://api.fly.io/prometheus/MY_ORG"
FAS_PROMETHEUS_METRIC_NAME = "queue_depth"
FAS_PROMETHEUS_QUERY = "sum(queue_depth)"

[metrics]
port = 9090
path = "/metrics"

Create a deploy token

Next, set up a new deploy token for the application you want to scale:

$ fly tokens create deploy -a TARGET_APP_NAME

Set the token as a secret on your autoscaler app:

$ fly secrets set FAS_API_TOKEN="FlyV1 ..."

Create a read-only token

Create a token for reading your Prometheus data:

$ fly tokens create readonly

Set the token as a secret on your autoscaler app:

$ fly secrets set FAS_PROMETHEUS_TOKEN="FlyV1 ..."

Deploy the server

Finally, deploy your autoscaler application:

$ fly deploy

This should create a new machine and start it with the fly-autoscaler server running.

Testing your metrics & expression

You can perform a one-time run of metrics collection and expression evaluation for testing or debugging by using the eval command. This command does not scale any Fly Machines. It only prints the result of the expression evaluated against current metric values.

$ fly-autoscaler eval

You can change the evaluated expression by setting an environment variable:

$ FAS_STARTED_MACHINE_COUNT=queue_depth fly-autoscaler eval

Configuration

You can also configure fly-autoscaler with a YAML config file if you don't want to use environment variables or if you want to configure more than one metric collector.

See the reference fly-autoscaler.yml file for more details.

Documentation

Constants

const (
	DefaultConcurrency            = 1
	DefaultReconcileTimeout       = 30 * time.Second
	DefaultReconcileInterval      = 15 * time.Second
	DefaultAppListRefreshInterval = 60 * time.Second
)

Variables

var (
	ErrExprRequired = errors.New("expression required")
	ErrExprNaN      = errors.New("expression returned NaN")
	ErrExprInf      = errors.New("expression returned Inf")
)

Expression errors.

Functions

func ExpandMetricQuery

func ExpandMetricQuery(ctx context.Context, query, app string) string

ExpandMetricQuery replaces variables in query with their values.
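One ReconcilerPool can scale several apps that match a wildcard, so a shared query plausibly needs a per-app placeholder. The sketch below assumes a single $APP_NAME variable. Both that variable name and expandQuery are illustrative assumptions, so check the package source for the variables that ExpandMetricQuery actually recognizes.

```go
package main

import (
	"fmt"
	"strings"
)

// expandQuery is a hypothetical stand-in for ExpandMetricQuery. It assumes a
// single variable, $APP_NAME, replaced with the app being reconciled; the
// real function's variable names may differ.
func expandQuery(query, app string) string {
	return strings.ReplaceAll(query, "$APP_NAME", app)
}

func main() {
	q := `sum(queue_depth{app="$APP_NAME"})`
	fmt.Println(expandQuery(q, "my-app")) // sum(queue_depth{app="my-app"})
}
```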

func FormatWildcardAsRegexp

func FormatWildcardAsRegexp(s string) string

FormatWildcardAsRegexp returns a regexp for a given wildcard expression.

Types

type FlapsClient

type FlapsClient interface {
	List(ctx context.Context, state string) ([]*fly.Machine, error)
	Launch(ctx context.Context, input fly.LaunchMachineInput) (*fly.Machine, error)
	Destroy(ctx context.Context, input fly.RemoveMachineInput, nonce string) error
	Start(ctx context.Context, id, nonce string) (*fly.MachineStartResponse, error)
	Stop(ctx context.Context, in fly.StopMachineInput, nonce string) error
}

type FlyClient

type FlyClient interface {
	GetOrganizationBySlug(ctx context.Context, slug string) (*fly.Organization, error)
	GetAppsForOrganization(ctx context.Context, orgID string) ([]fly.App, error)
}

type MetricCollector

type MetricCollector interface {
	Name() string
	CollectMetric(ctx context.Context, app string) (float64, error)
}

MetricCollector represents a client for collecting metrics from an external source.

type NewFlapsClientFunc

type NewFlapsClientFunc func(ctx context.Context, appName string) (FlapsClient, error)

type Reconciler

type Reconciler struct {

	// Client to connect to Machines API to scale app. Required.
	Client FlapsClient

	// The name of the app currently being reconciled.
	AppName string

	// List of regions that machines can be created in.
	// The reconciler uses a round-robin approach to choosing next region.
	Regions []string

	// Expression used for calculating the number of created machines.
	// If current number is less than min, more machines will be created.
	// If current number is more than max, machines will be destroyed.
	MinCreatedMachineN string
	MaxCreatedMachineN string

	// Expression used for calculating the number of currently started machines.
	// If current number is less than min, more machines will be started.
	// If current number is more than max, machines will be stopped.
	MinStartedMachineN string
	MaxStartedMachineN string

	// Initial machine state (started or stopped)
	InitialMachineState string

	// List of collectors to fetch metric values from.
	Collectors []MetricCollector

	// Must also be registered in RegisterPromMetrics() for visibility.
	Stats *ReconcilerStats
	// contains filtered or unexported fields
}

Reconciler represents the central part of the autoscaler that stores metrics, computes the number of necessary machines, and performs scaling.
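The field comments on MinStartedMachineN and MaxStartedMachineN describe a floor-and-ceiling rule. The sketch below shows that rule, assuming both expressions have already been evaluated to integers with minN <= maxN. startStopDelta is hypothetical and ignores the bool and error results of the Calc* methods. The created-machine bounds would follow the same pattern, with create and destroy in place of start and stop.

```go
package main

import "fmt"

// startStopDelta is a hypothetical helper following the field comments on
// MinStartedMachineN and MaxStartedMachineN. A positive result is the number
// of machines to start, a negative result the number to stop, and zero means
// no change. minN and maxN stand for already-evaluated expression results.
func startStopDelta(started, minN, maxN int) int {
	switch {
	case started < minN:
		return minN - started // below the floor: start the shortfall
	case started > maxN:
		return maxN - started // above the ceiling: stop the excess (negative)
	default:
		return 0
	}
}

func main() {
	fmt.Println(startStopDelta(1, 3, 5)) // 2
	fmt.Println(startStopDelta(7, 3, 5)) // -2
	fmt.Println(startStopDelta(4, 3, 5)) // 0
}
```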

func NewReconciler

func NewReconciler() *Reconciler

func (*Reconciler) CalcMaxCreatedMachineN

func (r *Reconciler) CalcMaxCreatedMachineN() (int, bool, error)

CalcMaxCreatedMachineN returns the maximum number of created machines.

func (*Reconciler) CalcMaxStartedMachineN

func (r *Reconciler) CalcMaxStartedMachineN() (int, bool, error)

CalcMaxStartedMachineN returns the maximum number of started machines.

func (*Reconciler) CalcMinCreatedMachineN

func (r *Reconciler) CalcMinCreatedMachineN() (int, bool, error)

CalcMinCreatedMachineN returns the minimum number of created machines.

func (*Reconciler) CalcMinStartedMachineN

func (r *Reconciler) CalcMinStartedMachineN() (int, bool, error)

CalcMinStartedMachineN returns the minimum number of started machines.

func (*Reconciler) CollectMetrics

func (r *Reconciler) CollectMetrics(ctx context.Context) error

CollectMetrics fetches metrics from all collectors.

func (*Reconciler) NextRegion

func (r *Reconciler) NextRegion() string

NextRegion returns the next region to launch a machine in. If Regions is empty, returns a blank string.
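A round-robin picker that matches the NextRegion description can be as small as the sketch below. regionPicker is a hypothetical type that keeps its own cursor and is not safe for concurrent use. The region codes are examples only.

```go
package main

import "fmt"

// regionPicker is a hypothetical round-robin chooser modeled on the
// documented behavior of NextRegion. It is not safe for concurrent use.
type regionPicker struct {
	regions []string
	next    int
}

// NextRegion returns the next region in order, wrapping around at the end,
// or "" when no regions are configured.
func (p *regionPicker) NextRegion() string {
	if len(p.regions) == 0 {
		return ""
	}
	r := p.regions[p.next%len(p.regions)]
	p.next = (p.next + 1) % len(p.regions)
	return r
}

func main() {
	p := &regionPicker{regions: []string{"iad", "ord", "ams"}}
	for i := 0; i < 4; i++ {
		fmt.Print(p.NextRegion(), " ")
	}
	fmt.Println()
}
```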

func (*Reconciler) Reconcile

func (r *Reconciler) Reconcile(ctx context.Context) error

Reconcile scales the number of machines up, if needed. Machines should shut themselves down to scale down. It returns a non-nil error if reconciliation fails.

func (*Reconciler) SetValue

func (r *Reconciler) SetValue(name string, value float64)

SetValue sets the value of a named metric.

func (*Reconciler) Value

func (r *Reconciler) Value(name string) (float64, bool)

Value returns the value of a named metric and whether the metric has been set.

type ReconcilerPool

type ReconcilerPool struct {

	// Time allowed to perform reconciliation for a single app.
	ReconcileTimeout time.Duration

	// Frequency to run the reconciliation loop for each app.
	ReconcileInterval time.Duration

	// Frequency to update the list of matching apps when using wildcards.
	AppListRefreshInterval time.Duration

	// Name of application to scale. Supports wildcards for multiple apps.
	// All applications must be in the same org.
	AppName string

	// Organization slug. Required if app name is a wildcard.
	OrganizationSlug string

	// NewFlapsClient is a constructor for building a FLAPS client for a given app.
	NewFlapsClient NewFlapsClientFunc

	// NewReconciler is a constructor for building reconcilers.
	// Called one or more times on Open().
	NewReconciler func() *Reconciler

	// Shared stats for all reconcilers.
	Stats ReconcilerStats
	// contains filtered or unexported fields
}

ReconcilerPool represents a set of reconcilers that act as a worker pool.

This is used to distribute scaling across multiple applications while also limiting the maximum concurrency allowed by the scaler.

func NewReconcilerPool

func NewReconcilerPool(flyClient FlyClient, concurrency int) *ReconcilerPool

NewReconcilerPool returns a new instance of ReconcilerPool.

func (*ReconcilerPool) Close

func (p *ReconcilerPool) Close() error

Close stops all processing of the pool and underlying reconcilers. It returns only once all reconcilers have finished processing.

func (*ReconcilerPool) Open

func (p *ReconcilerPool) Open() error

func (*ReconcilerPool) RegisterPromMetrics

func (p *ReconcilerPool) RegisterPromMetrics(reg prometheus.Registerer)

type ReconcilerStats

type ReconcilerStats struct {
	// Outcomes, incremented for each reconciliation.
	BulkCreate  atomic.Int64
	BulkDestroy atomic.Int64
	BulkStart   atomic.Int64
	BulkStop    atomic.Int64
	NoScale     atomic.Int64

	// Individual machine stats.
	MachineCreated       atomic.Int64
	MachineCreateFailed  atomic.Int64
	MachineDestroyed     atomic.Int64
	MachineDestroyFailed atomic.Int64
	MachineStarted       atomic.Int64
	MachineStartFailed   atomic.Int64
	MachineStopped       atomic.Int64
	MachineStopFailed    atomic.Int64
}

Directories

Path                 Synopsis
cmd/fly-autoscaler   (command)
