
Pull sources

A pull source is a job where HeliosLogs reads logs itself, on a schedule, instead of waiting for a shipper to push them. Today that means tailing files on the local filesystem, which is useful when HeliosLogs runs alongside an application whose logs are on disk. (S3 and watch/event modes are reserved for the future.)

Manage sources — and the HTTP endpoint on/off switches — under Admin → Data Ingestion → Sources.

Screenshot: Admin → Data Ingestion → Sources, with the HTTP endpoint toggles on top and the source list below.

HTTP ingestion endpoints

The top of the Sources tab has the master on/off switches for push-based ingestion (separate from the pull sources below). Turning a class off makes HeliosLogs reject those requests with 403; pull sources and the syslog listener are unaffected.

  • HeliosLogs Ingest API: /api/ingest and /api/ingest/raw.
  • Drop-in compatibility APIs: Elasticsearch _bulk, OTLP, Loki, and HEC.

(Require-auth and ingest tokens are on the Tokens tab.)

How pull sources work

A background supervisor polls each enabled source on its interval, reads any new bytes, parses them, and writes the events to the source's env/index. Progress is checkpointed per file, so a restart resumes where it left off rather than re-ingesting.
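The poll-and-checkpoint cycle described above can be sketched as follows. This is a minimal illustration, not HeliosLogs's implementation: the function names, the JSON checkpoint file, and the choice to consume only complete lines are all assumptions made for the example.

```python
import json
import os


def read_new_bytes(path, checkpoints):
    """Return the bytes appended to `path` since the last poll and advance its checkpoint."""
    offset = checkpoints.get(path, {}).get("offset", 0)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Assumption: consume complete lines only; a partial trailing line
    # is left for the next poll.
    end = data.rfind(b"\n") + 1
    data = data[:end]
    checkpoints[path] = {
        "offset": offset + len(data),
        "mtime": os.path.getmtime(path),
    }
    return data


def poll_once(paths, checkpoints, checkpoint_file):
    """One poll: read new lines from each file, then persist progress to disk."""
    events = []
    for path in paths:
        for line in read_new_bytes(path, checkpoints).splitlines():
            events.append(line.decode("utf-8", errors="replace"))
    # Hypothetical checkpoint store: a JSON file reloaded after a restart.
    with open(checkpoint_file, "w") as f:
        json.dump(checkpoints, f)
    return events
```

Because the offsets are written out after every poll, a process that reloads the checkpoint file on startup continues from the saved offsets instead of re-reading each file.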

In a cluster, each run takes a short per-source lease so that only one node polls a given source at a time. A lease left stuck, for example by a crash mid-run, is reclaimed after 5 minutes.
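To illustrate the lease behaviour, here is a small in-memory sketch. The LeaseTable class and its methods are hypothetical; a real cluster would keep leases in a shared store, which this page does not describe. Only the 5-minute reclaim window comes from the text above.

```python
import time

LEASE_TTL_SECONDS = 5 * 60  # a stuck lease is reclaimed after 5 minutes


class LeaseTable:
    """In-memory stand-in for a shared lease store (an assumption for this sketch)."""

    def __init__(self, now=time.monotonic):
        self._leases = {}  # source_id -> (node_id, acquired_at)
        self._now = now

    def try_acquire(self, source_id, node_id):
        """Take the lease unless another run holds one that has not expired."""
        holder = self._leases.get(source_id)
        if holder is not None and self._now() - holder[1] < LEASE_TTL_SECONDS:
            return False  # a live lease exists: skip this poll
        self._leases[source_id] = (node_id, self._now())
        return True

    def release(self, source_id, node_id):
        """Drop the lease at the end of a run, but only if this node holds it."""
        if self._leases.get(source_id, (None,))[0] == node_id:
            del self._leases[source_id]
```

A node that crashes mid-run never calls release, so its lease simply ages out and the next node to call try_acquire after the TTL takes over.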

Configuring a source

Field          Notes
Name           A label.
Environment    The source ingests into your active environment.
Index          The storage partition events land in.
Path           A glob the source matches, e.g. /var/log/app/**/*.log (** recurses).
Exclude        Glob patterns to skip (matched against the absolute path).
Format         auto (sniff) or an explicit format: ndjson, json, text, syslog, and the other parser formats.
Compression    auto (sniff/extension), none, gzip, or zstd.
Multiline      A regex that opens a new event (folds stack traces), with an optional max-lines cap.
Grok pattern   A preset, %{...} grok, or named-capture regex (for format=grok).
Poll interval  Seconds between polls. Minimum 5, default 10.
Source tag     A default source value for events that don't carry one.
Enabled        Whether the supervisor runs it.
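The Multiline setting is easiest to see in code. The sketch below folds continuation lines into the event opened by the most recent line that matches the regex. The function name is hypothetical, and the cap behaviour shown (flush the event and start a new one once the cap is reached) is an assumption, since the table does not say what happens at the cap.

```python
import re


def fold_multiline(lines, start_pattern, max_lines=None):
    """Group lines into events; a line matching `start_pattern` opens a new event."""
    start = re.compile(start_pattern)
    events, current = [], []
    for line in lines:
        opens_new = bool(start.search(line))
        # Assumed cap behaviour: once an event holds max_lines lines, flush it.
        at_cap = max_lines is not None and len(current) >= max_lines
        if current and (opens_new or at_cap):
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events
```

With a pattern such as `^\d{4}-\d{2}-\d{2} `, a timestamped error line and the indented stack-trace lines after it become one event, and the next timestamped line starts another.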

Checkpoints (resume vs. re-ingest)

Each source tracks, per file, how many bytes it has consumed and the file's last modification time. This lets HeliosLogs resume cleanly after a restart, and detect rotation (a changed mtime/size re-reads the file from the start). Use the reset action to clear a source's checkpoint and re-ingest from the beginning.
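One plausible way to turn a stored checkpoint into a read decision is sketched below. The Checkpoint type, the function name, and the exact comparison rules are assumptions for illustration; the actual rotation heuristic may differ.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    offset: int   # bytes already consumed
    mtime: float  # modification time seen at the last poll


def next_read_offset(cp, size, mtime):
    """Return the offset to read from, or None when there is nothing new."""
    if cp is None:
        return 0  # no checkpoint (new file, or after a reset): start over
    if size < cp.offset:
        return 0  # file shrank: rotated or truncated, so re-read from the start
    if size == cp.offset:
        # Same length: unchanged if mtime matches, otherwise treat as replaced.
        return None if mtime == cp.mtime else 0
    return cp.offset  # file grew: resume where the last poll stopped
```

In this sketch, the reset action amounts to deleting the stored checkpoint, which puts the file back in the first case.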

Operating sources

The Sources list shows live status — last run, last status/error, total ingested, and the file currently being read. Actions:

Action   Endpoint                         What it does
Run now  POST /api/sources/:id/run        Trigger an immediate poll.
Reset    POST /api/sources/:id/reset      Clear the checkpoint; re-ingest from the start.
Browse   GET /api/sources/browse?path=…   Server-side directory listing (admin) to help build the path glob.

Full CRUD is available at GET/POST /api/sources and GET/PATCH/DELETE /api/sources/:id.
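For scripting, the endpoints above can be called with any HTTP client. The sketch below builds the requests with Python's standard library. The base URL, the source id, and the bearer-token header are assumptions; check your deployment for the actual address and authentication scheme.

```python
import json
import urllib.request


def build_request(base_url, method, path, body=None, token=None):
    """Build (but do not send) a request for a HeliosLogs source endpoint."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        # Assumed auth scheme; this page does not specify one for these endpoints.
        headers["Authorization"] = f"Bearer {token}"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def run_now(base_url, source_id, token=None):
    """POST /api/sources/:id/run: trigger an immediate poll."""
    return build_request(base_url, "POST", f"/api/sources/{source_id}/run", token=token)


def reset(base_url, source_id, token=None):
    """POST /api/sources/:id/reset: clear the checkpoint and re-ingest."""
    return build_request(base_url, "POST", f"/api/sources/{source_id}/reset", token=token)


def list_sources(base_url, token=None):
    """GET /api/sources: list configured sources."""
    return build_request(base_url, "GET", "/api/sources", token=token)
```

To send a request, pass it to urllib.request.urlopen, for example `urllib.request.urlopen(run_now("http://localhost:8080", "42"))`, where the host, port, and id are placeholders.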

Local filesystem only

A pull source reads the filesystem of the node running it. Make sure the log paths exist on that host (or are mounted into the container) and that the HeliosLogs process user can read them.
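Before creating a source, a quick preflight check on the host can confirm that the glob matches files and that they are readable. This is a standalone sketch and not part of HeliosLogs: check_readable is a hypothetical helper, and Python's glob rules may differ in detail from the matcher HeliosLogs uses. Run it as the same user as the HeliosLogs process for the result to be meaningful.

```python
import glob
import os


def check_readable(pattern):
    """Expand a glob and split the matching files into readable and unreadable."""
    readable, unreadable = [], []
    # recursive=True makes ** match any number of nested directories.
    for path in sorted(glob.glob(pattern, recursive=True)):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match
        if os.access(path, os.R_OK):
            readable.append(path)
        else:
            unreadable.append(path)
    return readable, unreadable
```

An empty readable list usually means the path does not exist on this node or is not mounted into the container; entries in the unreadable list point to a permissions problem.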