This guide explains how to set up Prometheus and Grafana monitoring for your Sawmills Collector in a Kubernetes environment. While this setup is optional, it provides valuable insights into your collector’s performance and the telemetry data it processes.

Prerequisites

Before you begin, ensure you have:
  • The Sawmills Collector deployed and running in the sawmills namespace
  • A Kubernetes cluster with Prometheus Operator installed
  • Grafana installed in your cluster
In this guide we assume Prometheus and Grafana are installed in the observability namespace; adjust the commands below to match your actual installation namespace.

Setting Up the Monitoring Stack

Step 1: Set Up the ServiceMonitor

The Sawmills Collector service already exposes metrics on port 19465. Create a ServiceMonitor to configure Prometheus to scrape these metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sawmills-collector-monitor
  namespace: sawmills
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sawmills-collector-chart
  namespaceSelector:
    matchNames:
      - sawmills
  endpoints:
    - port: prometheus
      interval: 15s
Save this as sawmills-servicemonitor.yaml and apply it:
kubectl apply -f sawmills-servicemonitor.yaml
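If scraping later fails, a common cause is a label mismatch between the ServiceMonitor selector and the collector Service. You can confirm that a Service carries the selected label (the label above is assumed to be the one applied by your Sawmills chart):
kubectl get svc -n sawmills -l app.kubernetes.io/name=sawmills-collector-chart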

Step 2: Set Up the Grafana Dashboard

The Sawmills Collector is based on OpenTelemetry. You can use the official OpenTelemetry Collector dashboard from the Grafana dashboard catalog. There are two methods to install this dashboard:

Option 1: Import Directly in Grafana UI

  1. In your Grafana UI, go to the “+” icon in the sidebar and select “Import”
  2. Enter dashboard ID 15983 in the “Import via grafana.com” field
  3. Click “Load”
  4. Select your Prometheus data source
  5. Click “Import”

Option 2: Deploy via ConfigMap

Create a ConfigMap for the Grafana dashboard. This should be created in the same namespace where Grafana is installed (we’ll use observability as an example):
apiVersion: v1
kind: ConfigMap
metadata:
  name: sawmills-grafana-dashboard
  namespace: observability  # Change this to match your Grafana installation namespace
  labels:
    grafana_dashboard: "1"
data:
  sawmills-otel-collector.json: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "description": "Visualize OpenTelemetry (OTEL) collector metrics (tested with OTEL contrib v0.120.1)",
      "editable": true,
      "gnetId": 15983,
      "graphTooltip": 0,
      "id": 15983,
      "links": [],
      "panels": [],
      "schemaVersion": 27,
      "style": "dark",
      "tags": [
        "opentelemetry",
        "otel",
        "sawmills"
      ],
      "templating": {
        "list": []
      },
      "time": {
        "from": "now-1h",
        "to": "now"
      },
      "timepicker": {},
      "timezone": "",
      "title": "OpenTelemetry Collector",
      "uid": "otel-collector",
      "version": 1
    }
Note: The above JSON is a placeholder. Download the complete dashboard JSON from Grafana.com (dashboard ID 15983) or export it after importing via the UI method, and use it as the value of the sawmills-otel-collector.json key.
Save this as sawmills-grafana-dashboard.yaml and apply it:
kubectl apply -f sawmills-grafana-dashboard.yaml
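The ConfigMap approach relies on Grafana’s dashboard sidecar, which watches for ConfigMaps carrying the grafana_dashboard label and loads them automatically. If you installed Grafana through the kube-prometheus-stack Helm chart, a values snippet along these lines enables it (key names can differ between chart versions, so treat this as a sketch):
grafana:
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "1"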

Collector Configuration for Metrics

Ensure your Sawmills collector configuration includes proper telemetry settings to expose metrics. Here’s an example:
service:
  telemetry:
    metrics:
      address: 0.0.0.0:19465
      level: detailed
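The ServiceMonitor created in Step 1 scrapes a Service port named prometheus, so the collector Service needs a matching port entry. The Sawmills Helm chart normally creates this for you; the snippet below is only illustrative:
ports:
  - name: prometheus
    port: 19465
    targetPort: 19465
    protocol: TCP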

Verifying the Setup

1. Check ServiceMonitor Status

Verify that the ServiceMonitor is properly configured:
kubectl get servicemonitor -n sawmills
# Should show: sawmills-collector-monitor

2. Verify Prometheus Scraping

Check if Prometheus is successfully scraping the metrics. Replace observability with your Prometheus installation namespace:
# Port forward Prometheus (adjust namespace as needed)
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n observability

# Visit http://localhost:9090/targets in your browser
# Look for targets matching job="sawmills/sawmills-collector-monitor"
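If you prefer the command line, you can also query the Prometheus HTTP API through the same port-forward; a non-empty result for a collector metric such as otelcol_process_uptime confirms that scraping is working:
curl -s 'http://localhost:9090/api/v1/query?query=otelcol_process_uptime'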

3. Access the Grafana Dashboard

View your metrics in Grafana. Replace observability with your Grafana installation namespace:
# Port forward Grafana (adjust namespace and service name as needed)
kubectl port-forward svc/prometheus-grafana 3000:80 -n observability

# Visit http://localhost:3000 and search for "OpenTelemetry Collector"
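If you need the Grafana admin password and installed Grafana via the kube-prometheus-stack chart with its defaults, it is usually stored in a Secret (the Secret name and key below are chart defaults and may differ in your installation):
kubectl get secret prometheus-grafana -n observability -o jsonpath='{.data.admin-password}' | base64 --decode; echo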

Troubleshooting Guide

Dashboard Not Appearing in Grafana

  1. Verify Dashboard ConfigMap Location
    # Check if ConfigMap exists in your Grafana namespace
    kubectl get configmap -n <grafana-namespace> | grep sawmills
    
    Ensure the ConfigMap is in the same namespace as your Grafana installation.
  2. Check Dashboard ConfigMap Labels
    # The label should be exactly:
    grafana_dashboard: "1"
    
    Common issues:
    • Using "true" instead of "1"
    • Missing or incorrect labels
  3. Verify Dashboard JSON Format
    • Ensure the dashboard data key has a .json extension
    • Check that the JSON is properly formatted
    • Verify the dashboard has proper tags
  4. Restart Grafana to Pick Up Changes
    # Delete the Grafana pod to force a restart (adjust namespace and labels as needed)
    kubectl delete pod -n <grafana-namespace> -l app.kubernetes.io/name=grafana
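  5. Check the Dashboard Sidecar Logs
    If the dashboard still does not appear, the sidecar that loads dashboard ConfigMaps usually logs the reason. The container name below (grafana-sc-dashboard) is the Grafana Helm chart default and is an assumption; adjust it to your installation:
    kubectl logs -n <grafana-namespace> -l app.kubernetes.io/name=grafana -c grafana-sc-dashboard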
    

Metrics Not Appearing in Dashboard

  1. Verify Collector Service
    # Check if the collector service exposes the prometheus port
    kubectl get svc sawmills-collector -n sawmills
    # Should show port 19465 named "prometheus"
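    # Optionally, port-forward the Service and fetch metrics directly to confirm
    # the endpoint responds (the service name is taken from the examples in this guide)
    kubectl port-forward svc/sawmills-collector 19465:19465 -n sawmills
    # In another terminal:
    curl -s http://localhost:19465/metrics | head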
    
  2. Check ServiceMonitor Configuration
    # Verify ServiceMonitor exists
    kubectl get servicemonitor -n sawmills
    
    # Check its configuration
    kubectl describe servicemonitor sawmills-collector-monitor -n sawmills
    
    Common issues:
    • Incorrect label selectors
    • Wrong port name (should be “prometheus”)
    • Incorrect namespace selector
  3. Verify Prometheus Target Discovery
    # Port forward Prometheus (adjust namespace as needed)
    kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n <prometheus-namespace>
    
    # Check /targets page for:
    # - Target status (up/down)
    # - Last scrape time
    # - Error messages
    
  4. Check Metric Availability in the Prometheus UI
    • Try querying basic OpenTelemetry metrics:
      otelcol_process_uptime
      otelcol_receiver_accepted_spans
      otelcol_processor_refused_spans
      otelcol_exporter_queue_size
      
    • Use the Graph view to verify data points are being collected
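    • For example, to chart log throughput per exporter over the last five minutes (metric names vary with the collector version and the signals you process, so adjust as needed):
      sum by (exporter) (rate(otelcol_exporter_sent_log_records[5m]))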

Common Error Scenarios

  1. “No Data” in Grafana Panels
    • Verify Prometheus data source is configured in Grafana
    • Check metric names in panel queries match actual metrics
    • Ensure time range is appropriate (default: last 1 hour)
  2. ServiceMonitor Not Working
    • Verify Prometheus Operator is watching the sawmills namespace
    • Check if ServiceMonitor labels match Prometheus configuration (see the command after this list)
    • Ensure endpoints section matches service port configuration
  3. Dashboard Import Issues
    • Clear browser cache and reload Grafana
    • Check Grafana logs for JSON parsing errors
    • Verify dashboard version compatibility
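For the ServiceMonitor label check above, you can see which labels your Prometheus instance selects by inspecting the Prometheus custom resource directly (namespace and output depend on your installation):
kubectl get prometheus -n <prometheus-namespace> -o jsonpath='{.items[*].spec.serviceMonitorSelector}'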

Best Practices

Effective Monitoring Strategy

  1. Resource Monitoring
    • Monitor CPU and memory usage to ensure the collector operates efficiently
    • Watch for unusual resource consumption patterns that could indicate issues
  2. Data Flow Monitoring
    • Track data volume with receiver metrics (adjust based on your receiver type, e.g., OTLP for receiving from instrumented services, Kafka for consuming from message brokers)
    • Watch export performance with exporter metrics (adjust based on your destination, e.g., OTLP for forwarding to backends like Jaeger or Tempo, S3 for archiving)
  3. Refusal and Error Detection
    • Receiver Refusals: Monitor refused data at the receiver
      • Create alerts when refusals occur
      • Common causes: malformed data, incompatible protocol versions, rate limiting
    • Exporter Refusals: Track refusals at the exporter
      • Critical metric that indicates data isn’t reaching its final destination
      • Common causes: network issues, authentication failures, backend overload
  4. Queue Monitoring
    • Track queue sizes for each exporter (queue configuration will depend on your deployment environment and destination systems)
    • Watch for sustained high queue sizes, which indicate backpressure between the collector and downstream systems
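As a concrete starting point for queue monitoring, the ratio of queue size to capacity per exporter highlights exporters that are close to dropping data (both metrics are emitted by the OpenTelemetry collector when a sending queue is enabled; names may vary slightly by collector version):
max by (exporter) (otelcol_exporter_queue_size / otelcol_exporter_queue_capacity)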

Sample Prometheus Alert Rules

The following are sample alert rules that should be adjusted based on your specific collector configuration, data volumes, and tolerance for refusals:
groups:
- name: SawmillsCollector
  rules:
  - alert: SawmillsReceiverRefusingLogs
    expr: sum(rate(otelcol_receiver_refused_log_records{job="sawmills/sawmills-collector-monitor",receiver="otlp"}[5m])) > 100
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Sawmills collector refusing logs at receiver"
      description: "The receiver is refusing logs, possibly due to high load or malformed data."

  - alert: SawmillsExporterRefusingLogs
    expr: sum(rate(otelcol_exporter_refused_log_records{job="sawmills/sawmills-collector-monitor",exporter="otlp"}[5m])) > 100
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: "Sawmills collector refusing logs at exporter"
      description: "The exporter is refusing logs, possibly due to backend connectivity issues."
Note: These alert rules are starting points and should be tailored to your environment. The thresholds (>100) and evaluation periods (15m) should be adjusted based on your normal data volumes and acceptable error rates for your specific source and destination systems.
For assistance with configuring alerts specific to your Sawmills deployment, please contact the Sawmills team. For more information about OpenTelemetry collector metrics, visit the official documentation.