This guide explains how to set up Prometheus and Grafana monitoring for your Sawmills Collector in a Kubernetes environment. While this setup is optional, it provides valuable insights into your collector’s performance and the telemetry data it processes.

Prerequisites

Before you begin, ensure you have:
  • The Sawmills Collector deployed and running in the sawmills namespace
  • A Kubernetes cluster with Prometheus Operator installed
  • Grafana installed in your cluster
In this guide we assume Prometheus and Grafana are installed in the observability namespace; adjust the commands below to match your actual installation namespace.

Setting Up the Monitoring Stack

Step 1: Set Up the ServiceMonitor

The Sawmills Collector service already exposes metrics on port 19465. Create a ServiceMonitor to configure Prometheus to scrape these metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sawmills-collector-monitor
  namespace: sawmills
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sawmills-collector-chart
  namespaceSelector:
    matchNames:
      - sawmills
  endpoints:
    - port: prometheus
      interval: 15s
Save this as sawmills-servicemonitor.yaml and apply it:
kubectl apply -f sawmills-servicemonitor.yaml
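If scraping later fails, a common cause is a label mismatch between the ServiceMonitor selector and the collector Service. You can confirm that a Service carries the selected label (the label above is assumed to be the one applied by your Sawmills chart):
kubectl get svc -n sawmills -l app.kubernetes.io/name=sawmills-collector-chart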

Step 2: Set Up the Grafana Dashboard

The Sawmills Collector is based on OpenTelemetry. You can use the official OpenTelemetry Collector dashboard from the Grafana dashboard catalog. There are two methods to install this dashboard:

Option 1: Import Directly in Grafana UI

  1. In your Grafana UI, go to the “+” icon in the sidebar and select “Import”
  2. Enter dashboard ID 15983 in the “Import via grafana.com” field
  3. Click “Load”
  4. Select your Prometheus data source
  5. Click “Import”

Option 2: Deploy via ConfigMap

Create a ConfigMap for the Grafana dashboard. This should be created in the same namespace where Grafana is installed (we’ll use observability as an example):
apiVersion: v1
kind: ConfigMap
metadata:
  name: sawmills-grafana-dashboard
  namespace: observability  # Change this to match your Grafana installation namespace
  labels:
    grafana_dashboard: "1"
data:
  sawmills-otel-collector.json: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "description": "Visualize OpenTelemetry (OTEL) collector metrics (tested with OTEL contrib v0.120.1)",
      "editable": true,
      "gnetId": 15983,
      "graphTooltip": 0,
      "id": 15983,
      "links": [],
      "panels": [],
      "schemaVersion": 27,
      "style": "dark",
      "tags": [
        "opentelemetry",
        "otel",
        "sawmills"
      ],
      "templating": {
        "list": []
      },
      "time": {
        "from": "now-1h",
        "to": "now"
      },
      "timepicker": {},
      "timezone": "",
      "title": "OpenTelemetry Collector",
      "uid": "otel-collector",
      "version": 1
    }
Note: The above JSON is a placeholder. Download the complete dashboard JSON from Grafana.com (dashboard ID 15983) or export it after importing via the UI method, and use it as the value of the sawmills-otel-collector.json key.
Save this as sawmills-grafana-dashboard.yaml and apply it:
kubectl apply -f sawmills-grafana-dashboard.yaml
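The ConfigMap approach relies on Grafana’s dashboard sidecar, which watches for ConfigMaps carrying the grafana_dashboard label and loads them automatically. If you installed Grafana through the kube-prometheus-stack Helm chart, a values snippet along these lines enables it (key names can differ between chart versions, so treat this as a sketch):
grafana:
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "1"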

Collector Configuration for Metrics

Ensure your Sawmills collector configuration includes proper telemetry settings to expose metrics. Here’s an example:
service:
  telemetry:
    metrics:
      address: 0.0.0.0:19465
      level: detailed
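The ServiceMonitor created in Step 1 scrapes a Service port named prometheus, so the collector Service needs a matching port entry. The Sawmills Helm chart normally creates this for you; the snippet below is only illustrative:
ports:
  - name: prometheus
    port: 19465
    targetPort: 19465
    protocol: TCP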

Verifying the Setup

1. Check ServiceMonitor Status

Verify that the ServiceMonitor is properly configured:
kubectl get servicemonitor -n sawmills
# Should show: sawmills-collector-monitor

2. Verify Prometheus Scraping

Check if Prometheus is successfully scraping the metrics. Replace observability with your Prometheus installation namespace:
# Port forward Prometheus (adjust namespace as needed)
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n observability

# Visit http://localhost:9090/targets in your browser
# Look for targets matching job="sawmills/sawmills-collector-monitor"
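If you prefer the command line, you can also query the Prometheus HTTP API through the same port-forward; a non-empty result for a collector metric such as otelcol_process_uptime confirms that scraping is working:
curl -s 'http://localhost:9090/api/v1/query?query=otelcol_process_uptime'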

3. Access the Grafana Dashboard

View your metrics in Grafana. Replace observability with your Grafana installation namespace:
# Port forward Grafana (adjust namespace and service name as needed)
kubectl port-forward svc/prometheus-grafana 3000:80 -n observability

# Visit http://localhost:3000 and search for "OpenTelemetry Collector"
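If you need the Grafana admin password and installed Grafana via the kube-prometheus-stack chart with its defaults, it is usually stored in a Secret (the Secret name and key below are chart defaults and may differ in your installation):
kubectl get secret prometheus-grafana -n observability -o jsonpath='{.data.admin-password}' | base64 --decode; echo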

Troubleshooting Guide

Dashboard Not Appearing in Grafana

  1. Verify Dashboard ConfigMap Location
    # Check if ConfigMap exists in your Grafana namespace
    kubectl get configmap -n <grafana-namespace> | grep sawmills
    
    Ensure the ConfigMap is in the same namespace as your Grafana installation.
  2. Check Dashboard ConfigMap Labels
    # The label should be exactly:
    grafana_dashboard: "1"
    
    Common issues:
    • Using "true" instead of "1"
    • Missing or incorrect labels
  3. Verify Dashboard JSON Format
    • Ensure the dashboard data key has a .json extension
    • Check that the JSON is properly formatted
    • Verify the dashboard has proper tags
  4. Restart Grafana to Pick Up Changes
    # Delete the Grafana pod to force a restart (adjust namespace and labels as needed)
    kubectl delete pod -n <grafana-namespace> -l app.kubernetes.io/name=grafana
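  5. Check the Dashboard Sidecar Logs
    If the dashboard still does not appear, the sidecar that loads dashboard ConfigMaps usually logs the reason. The container name below (grafana-sc-dashboard) is the Grafana Helm chart default and is an assumption; adjust it to your installation:
    kubectl logs -n <grafana-namespace> -l app.kubernetes.io/name=grafana -c grafana-sc-dashboard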
    

Metrics Not Appearing in Dashboard

  1. Verify Collector Service
    # Check if the collector service exposes the prometheus port
    kubectl get svc sawmills-collector -n sawmills
    # Should show port 19465 named "prometheus"
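    # Optionally, port-forward the Service and fetch metrics directly to confirm
    # the endpoint responds (the service name is taken from the examples in this guide)
    kubectl port-forward svc/sawmills-collector 19465:19465 -n sawmills
    # In another terminal:
    curl -s http://localhost:19465/metrics | head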
    
  2. Check ServiceMonitor Configuration
    # Verify ServiceMonitor exists
    kubectl get servicemonitor -n sawmills
    
    # Check its configuration
    kubectl describe servicemonitor sawmills-collector-monitor -n sawmills
    
    Common issues:
    • Incorrect label selectors
    • Wrong port name (should be “prometheus”)
    • Incorrect namespace selector
  3. Verify Prometheus Target Discovery
    # Port forward Prometheus (adjust namespace as needed)
    kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n <prometheus-namespace>
    
    # Check /targets page for:
    # - Target status (up/down)
    # - Last scrape time
    # - Error messages
    
  4. Check Metric Availability in the Prometheus UI
    • Try querying basic OpenTelemetry metrics:
      otelcol_process_uptime
      otelcol_receiver_accepted_spans
      otelcol_processor_refused_spans
      otelcol_exporter_queue_size
      
    • Use the Graph view to verify data points are being collected
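    • For example, to chart log throughput per exporter over the last five minutes (metric names vary with the collector version and the signals you process, so adjust as needed):
      sum by (exporter) (rate(otelcol_exporter_sent_log_records[5m]))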

Common Error Scenarios

  1. “No Data” in Grafana Panels
    • Verify Prometheus data source is configured in Grafana
    • Check metric names in panel queries match actual metrics
    • Ensure time range is appropriate (default: last 1 hour)
  2. ServiceMonitor Not Working
    • Verify Prometheus Operator is watching the sawmills namespace
    • Check if ServiceMonitor labels match Prometheus configuration (see the command after this list)
    • Ensure endpoints section matches service port configuration
  3. Dashboard Import Issues
    • Clear browser cache and reload Grafana
    • Check Grafana logs for JSON parsing errors
    • Verify dashboard version compatibility
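For the ServiceMonitor label check above, you can see which labels your Prometheus instance selects by inspecting the Prometheus custom resource directly (namespace and output depend on your installation):
kubectl get prometheus -n <prometheus-namespace> -o jsonpath='{.items[*].spec.serviceMonitorSelector}'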

Best Practices

Effective Monitoring Strategy

  1. Resource Monitoring
    • Monitor CPU and memory usage to ensure the collector operates efficiently
    • Watch for unusual resource consumption patterns that could indicate issues
  2. Data Flow Monitoring
    • Track data volume with receiver metrics (adjust based on your receiver type, e.g., OTLP for receiving from instrumented services, Kafka for consuming from message brokers)
    • Watch export performance with exporter metrics (adjust based on your destination, e.g., OTLP for forwarding to backends like Jaeger or Tempo, S3 for archiving)
  3. Refusal and Error Detection
    • Receiver Refusals: Monitor refused data at the receiver
      • Create alerts when refusals occur
      • Common causes: malformed data, incompatible protocol versions, rate limiting
    • Exporter Refusals: Track refusals at the exporter
      • Critical metric that indicates data isn’t reaching its final destination
      • Common causes: network issues, authentication failures, backend overload
  4. Queue Monitoring
    • Track queue sizes for each exporter (queue configuration will depend on your deployment environment and destination systems)
    • Watch for sustained high queue sizes, which indicate backpressure between the collector and downstream systems
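As a concrete starting point for queue monitoring, the ratio of queue size to capacity per exporter highlights exporters that are close to dropping data (both metrics are emitted by the OpenTelemetry collector when a sending queue is enabled; names may vary slightly by collector version):
max by (exporter) (otelcol_exporter_queue_size / otelcol_exporter_queue_capacity)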

Sample Prometheus Alert Rules

The following are sample alert rules that should be adjusted based on your specific collector configuration, data volumes, and tolerance for refusals:
groups:
- name: SawmillsCollector
  rules:
  - alert: SawmillsReceiverRefusingLogs
    expr: sum(rate(otelcol_receiver_refused_log_records{job="sawmills/sawmills-collector-monitor",receiver="otlp"}[5m])) > 100
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Sawmills collector refusing logs at receiver"
      description: "The receiver is refusing logs, possibly due to high load or malformed data."

  - alert: SawmillsExporterRefusingLogs
    expr: sum(rate(otelcol_exporter_refused_log_records{job="sawmills/sawmills-collector-monitor",exporter="otlp"}[5m])) > 100
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: "Sawmills collector refusing logs at exporter"
      description: "The exporter is refusing logs, possibly due to backend connectivity issues."
Note: These alert rules are starting points and should be tailored to your environment. The thresholds (>100) and evaluation periods (15m) should be adjusted based on your normal data volumes and acceptable error rates for your specific source and destination systems.
For assistance with configuring alerts specific to your Sawmills deployment, please contact the Sawmills team. For more information about OpenTelemetry collector metrics, visit the official documentation.