The Sawmills Collector exposes OpenTelemetry internal metrics in Prometheus format. Any Prometheus-compatible scraper (Prometheus, VictoriaMetrics, Grafana Agent, Datadog Agent, etc.) can collect them. This guide lists the endpoint, the metrics worth watching, and sample alert rules. A kube-prometheus-stack example is included at the end.

Scrape Endpoint

Scrape every collector pod at:
  • Port: 19465
  • Path: /metrics
  • Recommended interval: 15s
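For a plain Prometheus deployment (without the Operator), a minimal scrape_config might look like the following sketch. The job name and the pod-label relabeling are illustrative assumptions — adjust them to how the collector is deployed in your cluster:

```yaml
scrape_configs:
  - job_name: sawmills-collector   # illustrative; pick your own job name
    scrape_interval: 15s
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only collector pods — label value assumed from the Helm chart
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: sawmills-collector-chart
        action: keep
      # Point the scrape at the Prometheus telemetry port
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:19465
```

Whatever `job_name` you choose here is the value to substitute for `<job>` in the alert rules below.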

Collector Configuration

The Prometheus port is set by the Helm chart — no collector-config edits are needed. To override the default, set the value under managedChartsValues.sawmills-collector in your remote-operator values.yaml (see Updating Collector Values):
managedChartsValues:
  sawmills-collector:
    telemetry:
      prometheus:
        port: 19465  # default

Metrics to Monitor

The collector emits per-signal counters with the suffix _log_records_total, _metric_points_total, or _spans_total. Where this guide shows <sig>, substitute the signal you ingest. All counters carry the _total suffix in OpenTelemetry’s Prometheus exposition.
  1. Liveness
    • up{job="<job>"} — scrape target reachable (the job label depends on your scraper config)
    • otelcol_process_uptime_total — monotonic uptime; resets on restart
  2. Resource Usage
    • otelcol_process_cpu_seconds_total — cumulative CPU seconds; apply rate() for per-pod CPU usage
    • otelcol_process_memory_rss — resident memory per pod
    • otelcol_process_runtime_heap_alloc_bytes — Go heap (use to detect leaks)
  3. Ingestion (Receiver Side)
    • otelcol_receiver_accepted_<sig>_total — successful ingest, broken down by receiver
    • otelcol_receiver_refused_<sig>_total — input-side rejections; common causes: malformed data, incompatible protocol versions, rate limiting
  4. Egress (Exporter Side)
    • otelcol_exporter_sent_<sig>_total — successful sends, broken down by exporter
    • otelcol_exporter_send_failed_<sig>_total — downstream send errors; common causes: network issues, authentication failures, backend overload
    • otelcol_exporter_enqueue_failed_<sig>_total — drops at the queue boundary (data lost before it could be sent)
  5. Backpressure
    • otelcol_exporter_queue_size / otelcol_exporter_queue_capacity — pair these for utilization ratio. Sustained high utilization indicates the collector cannot keep up with the destination.
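As PromQL starting points, the counters above translate into rate expressions like the following. The job label is an illustrative assumption, and the examples use the log-record suffix — swap it for your signal:

```promql
# Log throughput per receiver
sum by (receiver) (rate(otelcol_receiver_accepted_log_records_total{job="sawmills-collector"}[5m]))

# Send failures per exporter
sum by (exporter) (rate(otelcol_exporter_send_failed_log_records_total{job="sawmills-collector"}[5m]))

# Queue utilization per exporter and signal (0.0–1.0)
max by (exporter, data_type) (
  otelcol_exporter_queue_size{job="sawmills-collector"}
  / clamp_min(otelcol_exporter_queue_capacity{job="sawmills-collector"}, 1)
)
```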

Label Conventions

Sawmills names pipeline component instances with the format <type>/<uuid>. Real receiver and exporter label values look like:
  • receiver="otlp/collector-backend/otlp/72cfa7f8-00ae-40b4-a4eb-dc6f59932c29"
  • exporter="datadog/ec03b655-369f-4273-b255-8ed7dd33366f"
  • exporter="awss3/sampling-destination-1876f24b-3bdb-4fe5-93e1-fa36cb89e864"
When writing alert rules, use regex matchers (exporter=~"datadog/.*") rather than literal equality. The otelcol_exporter_queue_size and _queue_capacity series also carry a data_type label (logs, metrics, traces) — if you export multiple signals, group or filter on it.
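For example, to sum send failures across all Datadog exporter instances regardless of UUID (the job label is an illustrative assumption):

```promql
sum(rate(otelcol_exporter_send_failed_log_records_total{job="sawmills-collector", exporter=~"datadog/.*"}[5m]))
```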

A Note on Lazy Emission

OpenTelemetry only exposes a counter after its first non-zero observation. otelcol_exporter_send_failed_<sig>_total and otelcol_exporter_enqueue_failed_<sig>_total will be entirely absent from the scrape until at least one failure has occurred — this is normal, not a misconfiguration. Alerts on these metrics evaluate as “no data” rather than zero, so use absent_over_time or OR vector(0) if you need to distinguish “healthy” from “metric missing because never failed”.
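Both techniques look like this in PromQL (job label illustrative, log-record suffix shown):

```promql
# Coerce "metric never emitted" to zero so the expression always returns a value
sum(rate(otelcol_exporter_enqueue_failed_log_records_total{job="sawmills-collector"}[5m])) or vector(0)

# Returns 1 only if the series has not appeared at all in the last hour
absent_over_time(otelcol_exporter_enqueue_failed_log_records_total{job="sawmills-collector"}[1h])
```

Note that `or vector(0)` only matches when the left-hand side is aggregated down to no labels; with a `by (exporter)` grouping, the zero vector cannot stand in for specific exporters.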

Sample Alert Rules

The following are starting points and should be adjusted based on your collector configuration, data volumes, and tolerance for refusals. Replace <job> with the actual job label your scraper assigns to the collector — open /targets in your Prometheus UI and copy the value shown on the collector rows.
groups:
- name: SawmillsCollector
  rules:
  - alert: SawmillsCollectorDown
    expr: absent(up{job="<job>"}) or up{job="<job>"} == 0
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Sawmills collector scrape target down or missing"
      description: "Prometheus has been unable to scrape the collector for 2 minutes (or the target has disappeared from service discovery)."

  - alert: SawmillsReceiverRefusingLogs
    expr: sum by (receiver) (rate(otelcol_receiver_refused_log_records_total{job="<job>",receiver=~"otlp/.*"}[5m])) > 100
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Sawmills collector refusing logs at receiver {{ $labels.receiver }}"
      description: "The receiver is refusing logs, possibly due to high load or malformed data."

  - alert: SawmillsExporterSendFailingLogs
    expr: sum by (exporter) (rate(otelcol_exporter_send_failed_log_records_total{job="<job>"}[5m])) > 100
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: "Sawmills exporter {{ $labels.exporter }} failing to send logs to destination"
      description: "The exporter is failing to send logs, possibly due to backend connectivity issues."

  - alert: SawmillsExporterEnqueueFailingLogs
    expr: sum by (exporter) (rate(otelcol_exporter_enqueue_failed_log_records_total{job="<job>"}[5m])) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Sawmills exporter {{ $labels.exporter }} dropping logs at the export queue"
      description: "Records are being dropped before they can be sent — the exporter queue is full or rejecting writes. This is data loss."

  - alert: SawmillsExporterQueueSaturated
    expr: max by (exporter, data_type) (otelcol_exporter_queue_size{job="<job>"} / clamp_min(otelcol_exporter_queue_capacity{job="<job>"}, 1)) > 0.8
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Sawmills exporter {{ $labels.exporter }} queue near capacity ({{ $labels.data_type }})"
      description: "Queue utilization above 80% sustained — the destination is slower than ingest. Enqueue failures (data loss) follow if this persists."

Appendix: Kube-Prometheus-Stack Example

If you run kube-prometheus-stack with Prometheus Operator, this ServiceMonitor wires up scraping directly. Adjust the release: label to match your Prometheus’s serviceMonitorSelector.

ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sawmills-collector-monitor
  namespace: sawmills
  labels:
    release: prometheus  # Match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sawmills-collector-chart
  namespaceSelector:
    matchNames:
      - sawmills
  endpoints:
    - port: prometheus
      interval: 15s
Apply:
kubectl apply -f sawmills-servicemonitor.yaml

Verification

Port-forward Prometheus and check /targets for the sawmills-collector-monitor scrape pool:
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n <prometheus-namespace>
# http://localhost:9090/targets
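Once the targets show as UP, a quick sanity query in the Prometheus Graph tab confirms the collector series are flowing. These are illustrative spot checks, not alert rules:

```promql
# One series per collector pod; value = seconds since process start
otelcol_process_uptime_total

# Confirm ingest is non-zero (logs shown; swap the suffix for your signal)
sum(rate(otelcol_receiver_accepted_log_records_total[5m]))
```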
For more information about OpenTelemetry collector metrics, visit the official documentation.