Overview

This guide explains how to customize your Sawmills collector deployment in Kubernetes. Follow the steps below based on whether you’re installing a new collector or updating an existing one.

NEW Sawmills Collector

  1. Log in to the Sawmills dashboard.
  2. Navigate to the Collectors section (the bottom tab of the left navigation menu).
  3. Click the New Collector button and follow the instructions to install the agent with your organization's API key.

EXISTING Sawmills Collector

  1. Log in to the Sawmills dashboard.
  2. Navigate to Collector Management.
  3. Click on an existing collector to access its management page.

Customizing the Sawmills Collector Installation

Install or Upgrade Collector Using Helm Values

To install a new collector or update an existing one in the sawmills namespace, use the following Helm command:
helm upgrade --install sawmills-remote-operator \
  oci://public.ecr.aws/s7a5m1b4/sawmills-remote-operator-chart \
  --version 0.174.0 \
  --namespace sawmills \
  --set apiKeyExistingSecret=sawmills-secret \
  --set operatorAddress=https://controller.ue1.prod.plat.sm-svc.com \
  --set collectorName="my-collector"
Note: For first-time installations, you must create the namespace and secret as described in Create a New Collector.
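If you haven't created them yet, a minimal sketch of those two steps looks like this (the secret name sawmills-secret and key api-key match the apiSecret defaults shown later in this guide; substitute your own API key):
kubectl create namespace sawmills
kubectl create secret generic sawmills-secret \
  --namespace sawmills \
  --from-literal=api-key=<YOUR_API_KEY>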

Updating Collector Values

The remote operator deploys the collector to your Kubernetes cluster. To modify collector values, update the managedChartsValues.sawmills-collector section in your values.yaml file and then upgrade the remote operator. The legacy key managedCharts continues to work as an alias, but managedChartsValues is the preferred field going forward. After you modify the remote operator values, the collector should redeploy automatically.

Example: to add a nodeSelector to the collector, update the managedChartsValues.sawmills-collector section:
managedChartsValues:
  sawmills-collector:
    nodeSelector:
      region: us-east-1

Overriding the Collector Chart (Private Registries)

If you need the operator to pull the collector chart from a private registry or pin a different chart version, set managedChartsOverrides:
managedChartsOverrides:
  sawmills-collector:
    chartName: oci://registry.example.com/sawmills-collector
    version: 1.2.3
Use managedChartsOverrides only for the chart reference and version; continue using managedChartsValues for regular values (replicas, image tags, tolerations, and so on). A sketch combining both keys follows the command below. After making changes, run the following command to update the remote operator:
helm upgrade --install sawmills-remote-operator \
  oci://public.ecr.aws/s7a5m1b4/sawmills-remote-operator-chart \
  --version 0.174.0 \
  --namespace sawmills \
  --set apiKeyExistingSecret=sawmills-secret \
  --set operatorAddress=https://controller.ue1.prod.plat.sm-svc.com \
  --set collectorName="sawmills-collector-v1" \
  -f values.yaml
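For reference, both keys can live side by side in the same values.yaml. A minimal sketch, reusing the placeholder registry from above and the earlier nodeSelector example:
managedChartsOverrides:
  sawmills-collector:
    chartName: oci://registry.example.com/sawmills-collector
    version: 1.2.3
managedChartsValues:
  sawmills-collector:
    nodeSelector:
      region: us-east-1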
Important: After adding new nodeSelector, podAntiAffinity, or similar configuration options, you must redeploy the pipeline for the changes to take effect.

Collector Values File Options

The following configuration options are available in your values file under the managedChartsValues.sawmills-collector section (or the legacy managedCharts.sawmills-collector alias):
managedChartsValues:
  sawmills-collector:
    # Number of replicas for the Sawmills collector
    replicaCount: 3

    apiSecret:
      name: sawmills-secret
      key: api-key

    prometheusremotewrite:
      endpoint: https://ingress.sawmills.ai

    # Additional Environment Variables
    # Example:
    # - name: DD_API_KEY
    #   valueFrom:
    #     secretKeyRef:
    #       name: datadog
    #       key: api-key
    extraEnv: []

    # The node selector for the Sawmills collector
    nodeSelector: {}

    # Pod annotations for the Sawmills collector pods (e.g. for Prometheus scraping)
    # Example:
    # podAnnotations:
    #   prometheus.io/scrape: "true"
    #   prometheus.io/path: "/metrics"
    #   prometheus.io/port: "metrics"
    podAnnotations: {}

    # Tolerations for the Sawmills collector pods
    # Example:
    # - key: "key1"
    #   operator: "Equal"
    #   value: "value1"
    #   effect: "NoSchedule"
    tolerations: []

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values:
                      - sawmills-collector-chart
              topologyKey: kubernetes.io/hostname

    # The resources for the Sawmills collector
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 500m

    # Telemetry Configuration
    telemetry:
      resources:
        requests:
          memory: 512Mi
          cpu: 250m
        limits:
          memory: 1Gi
          cpu: 500m
      # Prometheus port for the telemetry collector; customers can use it to scrape metrics
      prometheus:
        port: 19465

    # HPA configuration for the Sawmills collector
    autoscaling:
      enabled: false
      minReplicas: 3
      maxReplicas: 50
      targetCPUUtilizationPercentage: 80
      targetMemoryUtilizationPercentage: 80
      # Add behavior configuration for faster scaling
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 120
          policies:
            - type: Percent
              value: 50
              periodSeconds: 120
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 25
              periodSeconds: 120
    # KEDA-based autoscaling (separate from standard HPA)
    keda:
      enabled: false
      minReplicas: 1
      maxReplicas: 10
      # How often KEDA checks the metrics for scaling decisions (in seconds)
      pollingInterval: 30
      # How long to wait after the last scaling operation before scaling down (in seconds)
      cooldownPeriod: 300
      # Scaling configuration for different metric sources
      scaling:
        # Prometheus-based scaling configuration
        prometheus:
          enabled: false
          metricType: Value
          metadata:
            # Prometheus server address
            serverAddress: http://prometheus:9090
            # PromQL query to fetch metrics
            query: "histogram_quantile(0.95, sum(rate(http_server_duration_bucket[1m])) by (le))"
            # Target value to maintain (will scale up if metric exceeds this value)
            threshold: "2000"
            # Minimum value before scaling starts (will not scale if metric is below this value)
            activationThreshold: "1000"
        # External scaler configuration for OpenTelemetry metrics
        # This section configures the external scaler that works with the kedaScaler component
        # defined below. When enabled, it uses the OTLP metrics from the kedaScaler to make
        # scaling decisions based on the configured query and target value.
        external:
          enabled: false
          metricType: Value
          metadata:
            # Address of the KEDA OTLP scaler service
            scalerAddress: "sawmills-collector-keda-otel-scaler.sawmills.svc.cluster.local:4418"
            # PromQL query to fetch metrics from the OTLP scaler
            query: "histogram_quantile(0.95, sum(rate(http_server_duration_bucket[2m])) by (le))"
            # Target value to maintain (will scale up if metric exceeds this value)
            targetValue: "2000"
        # CPU-based scaling configuration
        cpu:
          enabled: true
          # Target CPU utilization percentage to maintain
          targetUtilization: 80
        # Memory-based scaling configuration
        memory:
          enabled: true
          # Target memory utilization percentage to maintain
          targetUtilization: 80

    # Service Configuration
    service:
      type: ClusterIP
      headless:
        enabled: true
      # internalTrafficPolicy controls how traffic from within the cluster is routed
      # Available values: "Cluster" (default) or "Local"
      # When set to "Local", traffic from within the cluster is routed to endpoints on the same node
      internalTrafficPolicy: ""
      # Service annotations
      # Example:
      # annotations:
      #   service.kubernetes.io/topology-mode: "Auto"  # Enables automatic topology-aware routing
      #   custom.annotation: "value"
      annotations: {}

    # Additional containers to be added to the pod
    # This allows you to add sidecar containers to the collector pod.
    # Each container follows the Kubernetes container spec format.
    # For detailed examples, see the examples directory.
    #
    # Example:
    # additionalContainers:
    #   sidecar:
    #     image: busybox:1.35
    #     command: ['sh', '-c', 'echo Hello && sleep 3600']
    #     resources:
    #       requests:
    #         memory: "64Mi"
    #         cpu: "100m"
    #       limits:
    #         memory: "128Mi"
    #         cpu: "200m"
    additionalContainers: {}

    # Additional volumes to be mounted by additional containers
    # This allows you to add volumes that can be mounted by sidecar containers.
    # For detailed examples, see the examples directory.
    #
    # Example:
    # additionalVolumes:
    #   - name: config-volume
    #     configMap:
    #       name: sidecar-config
    additionalVolumes: []

    # Set up a service account for the collector; use it to grant the collector
    # permissions to access resources (e.g. S3 buckets)
    serviceAccount:
      # -- Whether to create a service account for the Sawmills Collector deployment.
      create: false
      # -- Additional labels to add to the created service account.
      additionalLabels: {}
      # -- Annotations to add to the created service account.
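      # Hypothetical example (assumes EKS with IAM Roles for Service Accounts;
      # the role ARN below is a placeholder):
      # annotations:
      #   eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/sawmills-collector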
      annotations: {}
      # -- The name of the service account.
      # Defaults to "default" when serviceAccount.create is false,
      # and to "sa-collector-service" when serviceAccount.create is true.
      name: null

    # ServiceMonitor configuration for Prometheus Operator
    serviceMonitor:
      # Enable or disable ServiceMonitor resource creation
      enabled: false
      # Additional labels for the ServiceMonitor
      labels:
        release: prometheus
      # Metrics endpoints configuration
      metricsEndpoints:
        - port: prometheus
          interval: 15s

    # Ingress Configuration
    ingress:
      # Type of ingress controller to use
      # Available options - nginx or haproxy
      type: nginx

      # Nginx-specific ingress configuration
      nginx:
        # Enable/disable nginx ingress
        enabled: false

        # The ingress class name to use
        # This should match your nginx ingress controller's class name
        className: nginx

        # The port that the ingress will forward traffic to
        # This should match the port your collector is listening on
        port: 4318

        # Annotations for the ingress resource
        # These are passed directly to the Kubernetes ingress resource
        annotations:
          # Specify the ingress class
          kubernetes.io/ingress.class: nginx
          # Disable SSL redirect if you're not using HTTPS
          nginx.ingress.kubernetes.io/ssl-redirect: "false"

        # Host configuration for the ingress
        hosts:
          - host: sawmills-collector.local  # The hostname to use
            paths:
              - path: /                    # The URL path to match
                pathType: Prefix
        tls: []

      haproxy:
        # Enable/disable haproxy ingress
        enabled: false

        # The ingress class name to use
        # This should match your haproxy ingress controller's class name
        className: haproxy

        # The port that the ingress will forward traffic to
        # This should match the port your collector is listening on
        port: 4318

        # Annotations for the ingress resource
        # These are passed directly to the Kubernetes ingress resource
        annotations:
          # Specify the ingress class
          kubernetes.io/ingress.class: haproxy
          # Disable SSL redirect if you're not using HTTPS
          haproxy.ingress.kubernetes.io/ssl-redirect: "false"

        # Host configuration for the ingress
        hosts:
          - host: sawmills-collector.local  # The hostname to use
            paths:
              - path: /                    # The URL path to match
                pathType: Prefix
        # TLS configuration for secure connections
        # Uncomment and configure if you want to enable TLS
        tls: []
        # - secretName: collector-tls
        #   hosts:
        #     - sawmills-collector.local

    # KEDA external scaler configuration for OpenTelemetry integration
    # This component acts as an external scaler for KEDA (Kubernetes Event-driven Autoscaling)
    # that integrates with OpenTelemetry (OTel) collector. It enables dynamic scaling of your
    # Sawmills collector based on OTLP metrics, providing more precise and efficient
    # autoscaling capabilities compared to standard Kubernetes HPA.
    #
    # Note: To use this scaler, you must also enable the external scaling in the KEDA section above
    # by setting `keda.scaling.external.enabled: true` and configuring the appropriate metadata.
    kedaScaler:
      # Enable or disable the KEDA scaler component
      enabled: false
      # Container image pull policy (Always, IfNotPresent, Never)
      imagePullPolicy: Always
      podSecurityContext: {}
      probes:
        # Liveness probe settings to determine if the pod is healthy
        liveness:
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 5
        # Readiness probe settings to determine if the pod is ready to receive traffic
        readiness:
          initialDelaySeconds: 5
          periodSeconds: 5
      # Service configuration for the KEDA scaler
      service:
        # Type of Kubernetes service (ClusterIP, NodePort, LoadBalancer)
        type: ClusterIP
        # Port for receiving OTLP data
        otlpReceiverPort: 4518
        # Port for KEDA external scaler communication
        kedaExternalScalerPort: 4418
        # Port for health check endpoints
        healthcheckPort: 13134
        # Port for monitoring metrics
        monitoringPort: 19465
        otelPrometheusPort: 19876
      resources:
        limits:
          cpu: 500m
          memory: 256Mi
        requests:
          cpu: 500m
          memory: 128Mi
      # Sawmills telemetry configuration for sending metrics to KEDA scaler
      telemetryConfig:
        receivers:
          prometheus/keda_scaler:
            config:
              scrape_configs:
                - job_name: keda_scaler
                  scrape_interval: 15s
                  static_configs:
                    - targets: ["${env:MY_POD_IP}:${env:KEDA_SCALER_OTEL_PROMETHEUS_PORT}"]
        processors:
          filter/keda:
            error_mode: ignore
            metrics:
              metric:
                - name != "http.server.duration"
                # To scale on haproxy metrics instead, enable the prometheus receiver
                # and use a filter such as:
                # - name != "haproxy_server_http_responses_total" and name != "http.server.duration"
        exporters:
          otlp/keda:
            # Address of the KEDA scaler service
            endpoint: sawmills-collector-keda-otel-scaler.sawmills.svc.cluster.local:${env:KEDA_SCALER_OTLP_RECEIVER_PORT}
            compression: "none"
            tls:
              insecure: true
        service:
          pipelines:
            metrics/telemetry_keda_scaler:
              receivers: [prometheus/keda_scaler]
              processors:
                - memory_limiter
                - batch
              exporters: [routing, forward]
            metrics/keda:
              exporters:
                - otlp/keda
              processors:
                - filter/keda
              receivers:
                - forward
                # To build a KEDA scaler query on other metrics, enable the prometheus
                # receiver here and adjust the filter/keda filter above accordingly.
                # - prometheus
        config:
          # OTLP receiver configuration for collecting metrics
          receivers:
            otlp:
              protocols:
                grpc:
                  endpoint: ${env:MY_POD_IP}:${env:OTLP_RECEIVER_PORT}
          exporters:
            kedascaler:
              engine_settings:
                timeout: 30s
                max_samples: 500000
                lookback_delta: 5m
                retention_duration: 10m
              monitoring_http:
                endpoint: ${env:MY_POD_IP}:${env:MONITORING_PORT}
              scaler_grpc:
                endpoint: ${env:MY_POD_IP}:${env:KEDA_EXTERNAL_SCALER_PORT}
                transport: tcp
          # Processors for handling metrics data
          processors:
            memory_limiter:
              check_interval: 1s
              limit_mib: 0
              limit_percentage: 95
              spike_limit_mib: 0
              spike_limit_percentage: 10
            batch:
              metadata_cardinality_limit: 0
              metadata_keys: []
              send_batch_max_size: 2048
              send_batch_size: 1024
              timeout: 1s
          # Extensions for additional functionality
          extensions:
            cgroupruntime:
              gomaxprocs:
                enabled: true
              gomemlimit:
                enabled: true
                ratio: 0.95
            health_check:
              endpoint: ${env:MY_POD_IP}:${env:HEALTHCHECK_PORT}
          service:
            extensions:
              - health_check
              - cgroupruntime
            pipelines:
              metrics:
                receivers:
                  - otlp
                processors:
                  - memory_limiter
                  - batch
                exporters:
                  - kedascaler
            # Telemetry configuration for logging and metrics
            telemetry:
              resource:
                pod: ${env:MY_POD_NAME}
              logs:
                encoding: json
                error_output_paths:
                  - stdout
                level: info
                output_paths:
                  - stdout
              metrics:
                address: ${env:MY_POD_IP}:${env:OTEL_PROMETHEUS_PORT}
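
Putting the two autoscaling halves together: as the note above the kedaScaler section says, the external scaler only takes effect when keda.scaling.external is also enabled. A minimal values.yaml sketch that switches both on, reusing the default scaler address and query from the reference above (tune the query and targetValue for your workload):
managedChartsValues:
  sawmills-collector:
    keda:
      enabled: true
      scaling:
        external:
          enabled: true
          metricType: Value
          metadata:
            scalerAddress: "sawmills-collector-keda-otel-scaler.sawmills.svc.cluster.local:4418"
            query: "histogram_quantile(0.95, sum(rate(http_server_duration_bucket[2m])) by (le))"
            targetValue: "2000"
    kedaScaler:
      enabled: true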