Prerequisites
Before you begin, ensure you have:

- The Sawmills Collector deployed and running in the sawmills namespace
- A Kubernetes cluster with Prometheus Operator installed
- Grafana installed in your cluster

The examples in this guide use the observability namespace for Prometheus and Grafana, but adjust the commands to match your actual installation namespace.
Setting Up the Monitoring Stack
Step 1: Set Up the ServiceMonitor
The Sawmills Collector service already exposes metrics on port 19465. Create a ServiceMonitor to configure Prometheus to scrape these metrics. Save the following as sawmills-servicemonitor.yaml and apply it:
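A minimal ServiceMonitor sketch is shown below. The selector labels (app.kubernetes.io/name: sawmills-collector) and the release: prometheus label are assumptions — match them to the labels on your Sawmills Collector Service and to your Prometheus Operator's serviceMonitorSelector. The port name prometheus matches the port-naming note in the troubleshooting section.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sawmills-collector
  namespace: sawmills
  labels:
    release: prometheus          # must match your Prometheus Operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sawmills-collector   # adjust to the labels on your collector Service
  namespaceSelector:
    matchNames:
      - sawmills
  endpoints:
    - port: prometheus           # the named service port that exposes metrics on 19465
      interval: 30s
      path: /metrics
```

```bash
kubectl apply -f sawmills-servicemonitor.yaml
```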
Step 2: Set Up the Grafana Dashboard
The Sawmills Collector is based on OpenTelemetry. You can use the official OpenTelemetry Collector dashboard from the Grafana dashboard catalog. There are two methods to install this dashboard:

Option 1: Import Directly in Grafana UI
- In your Grafana UI, go to the “+” icon in the sidebar and select “Import”
- Enter dashboard ID 15983 in the “Import via grafana.com” field
- Click “Load”
- Select your Prometheus data source
- Click “Import”
Option 2: Deploy via ConfigMap
Create a ConfigMap for the Grafana dashboard. This should be created in the same namespace where Grafana is installed (we’ll use observability as an example). Save the following as sawmills-grafana-dashboard.yaml and apply it:
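A sketch of the ConfigMap layout, assuming the Grafana dashboard sidecar is enabled and watches for ConfigMaps labeled grafana_dashboard: "1" (a common default, and the value the troubleshooting section below expects). The ConfigMap name and data key are illustrative, but the key must end in .json; paste the exported dashboard JSON in place of the placeholder:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sawmills-grafana-dashboard
  namespace: observability        # must match your Grafana installation namespace
  labels:
    grafana_dashboard: "1"        # sidecar label; the value must be the string "1", not "true"
data:
  sawmills-collector.json: |
    <paste the dashboard JSON here, e.g. the export of grafana.com dashboard 15983>
```

```bash
kubectl apply -f sawmills-grafana-dashboard.yaml
```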
Collector Configuration for Metrics
Ensure your Sawmills collector configuration includes proper telemetry settings to expose metrics. Here’s an example:
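A sketch of the relevant service.telemetry section, assuming the collector should expose its internal metrics on port 19465 as noted above. The exact keys depend on your collector version; newer OpenTelemetry Collector releases configure this through a readers block instead of address:

```yaml
service:
  telemetry:
    metrics:
      level: detailed            # emit detailed internal metrics
      address: 0.0.0.0:19465     # Prometheus-style metrics endpoint scraped by the ServiceMonitor
```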
Verifying the Setup
1. Check ServiceMonitor Status
Verify that the ServiceMonitor is properly configured:
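For example (the ServiceMonitor name follows the sketch earlier in this guide and may differ in your setup):

```bash
kubectl get servicemonitor -n sawmills
kubectl describe servicemonitor sawmills-collector -n sawmills
```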
2. Verify Prometheus Scraping
Check if Prometheus is successfully scraping the metrics. Replace observability with your Prometheus installation namespace:
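One way to check, assuming the Prometheus Operator's default prometheus-operated service; on the Targets page, look for an entry corresponding to the sawmills ServiceMonitor and confirm it is UP:

```bash
kubectl port-forward -n observability svc/prometheus-operated 9090:9090
# then open http://localhost:9090/targets in your browser
```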
3. Access the Grafana Dashboard
View your metrics in Grafana. Replace observability with your Grafana installation namespace:
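For example, assuming a Service named grafana listening on port 80 (service names and ports vary by installation method):

```bash
kubectl port-forward -n observability svc/grafana 3000:80
# then open http://localhost:3000 and look for the OpenTelemetry Collector dashboard
```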
Troubleshooting Guide
Dashboard Not Appearing in Grafana
- Verify Dashboard ConfigMap Location
  Ensure the ConfigMap is in the same namespace as your Grafana installation.
- Check Dashboard ConfigMap Labels
  Common issues:
  - Using "true" instead of "1"
  - Missing or incorrect labels
- Verify Dashboard JSON Format
  - Ensure the dashboard data key has a .json extension
  - Check that the JSON is properly formatted
  - Verify the dashboard has proper tags
- Restart Grafana to Pick Up Changes (see the command sketch below)
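If Grafana has not picked up a newly created dashboard ConfigMap, restarting Grafana forces the sidecar to reload. A sketch, assuming Grafana runs as deployment/grafana in the observability namespace:

```bash
kubectl rollout restart deployment/grafana -n observability
kubectl rollout status deployment/grafana -n observability
```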
Metrics Not Appearing in Dashboard
- Verify Collector Service (see the commands below)
- Check ServiceMonitor Configuration
  Common issues:
  - Incorrect label selectors
  - Wrong port name (should be “prometheus”)
  - Incorrect namespace selector
- Verify Prometheus Target Discovery
- Check Metric Availability
  In Prometheus UI:
  - Try querying basic OpenTelemetry metrics (see the example queries below)
  - Use the Graph view to verify data points are being collected
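A sketch of the checks above; the service and namespace names follow the examples earlier in this guide, and the metric names are standard OpenTelemetry Collector internal metrics that may carry a _total suffix or differ slightly between collector versions:

```bash
# Confirm the collector Service exists and has endpoints on the metrics port (19465)
kubectl get svc -n sawmills
kubectl get endpoints -n sawmills

# Example queries to try in the Prometheus UI:
#   otelcol_process_uptime
#   otelcol_receiver_accepted_spans
#   otelcol_exporter_sent_spans
```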
Common Error Scenarios
- “No Data” in Grafana Panels
  - Verify Prometheus data source is configured in Grafana
  - Check metric names in panel queries match actual metrics
  - Ensure time range is appropriate (default: last 1 hour)
- ServiceMonitor Not Working
  - Verify Prometheus Operator is watching the sawmills namespace (see the check after this list)
  - Check if ServiceMonitor labels match Prometheus configuration
  - Ensure endpoints section matches service port configuration
- Dashboard Import Issues
  - Clear browser cache and reload Grafana
  - Check Grafana logs for JSON parsing errors
  - Verify dashboard version compatibility
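To confirm that the Prometheus Operator is selecting ServiceMonitors from the sawmills namespace and matching your labels, you can inspect the Prometheus custom resource. A rough check, assuming Prometheus is installed in observability:

```bash
# Show which namespaces and ServiceMonitor labels this Prometheus instance selects
kubectl get prometheus -n observability -o yaml | grep -A 5 -E "serviceMonitorSelector|serviceMonitorNamespaceSelector"
```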
Best Practices
Effective Monitoring Strategy
- Resource Monitoring
  - Monitor CPU and memory usage to ensure the collector operates efficiently
  - Watch for unusual resource consumption patterns that could indicate issues
- Data Flow Monitoring
  - Track data volume with receiver metrics (adjust based on your receiver type, e.g., OTLP for receiving from instrumented services, Kafka for consuming from message brokers)
  - Watch export performance with exporter metrics (adjust based on your destination, e.g., OTLP for forwarding to backends like Jaeger or Tempo, S3 for archiving)
- Refusal and Error Detection
  - Receiver Refusals: Monitor refused data at the receiver
    - Create alerts when refusals occur (an example PrometheusRule sketch follows this list)
    - Common causes: malformed data, incompatible protocol versions, rate limiting
  - Exporter Refusals: Track refusals at the exporter
    - Critical metric that indicates data isn’t reaching its final destination
    - Common causes: network issues, authentication failures, backend overload
- Queue Monitoring
  - Track queue sizes for each exporter (queue configuration will depend on your deployment environment and destination systems)
  - Watch for sustained high queue sizes, which indicate backpressure between the collector and downstream systems
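To act on the refusal and queue guidance above, alerts can be codified as a PrometheusRule. The sketch below is a starting point under several assumptions: the release label must match your Prometheus Operator's ruleSelector, the metric names may carry a _total suffix or per-signal variants depending on your collector version, and the thresholds should be tuned to your environment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sawmills-collector-alerts
  namespace: sawmills
  labels:
    release: prometheus            # must match your Prometheus Operator's ruleSelector
spec:
  groups:
    - name: sawmills-collector
      rules:
        - alert: CollectorReceiverRefusals
          expr: sum(rate(otelcol_receiver_refused_spans[5m])) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: The Sawmills Collector is refusing incoming data at the receiver
        - alert: CollectorExporterQueueNearFull
          expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: An exporter queue has stayed above 80% capacity, indicating backpressure
```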