What is Sawmills.ai?
Sawmills.ai is an AI-powered telemetry (logs, metrics, traces) management platform built on open standards (OpenTelemetry) that helps engineering teams regain control of observability data before it becomes an expensive, noisy, and fragile firehose. Instead of treating logs, metrics, and traces as “ship everything and pray,” Sawmills continuously analyzes telemetry as it flows through your pipelines, identifies waste and risk (for example, verbose logging, duplicate attributes, or high-cardinality metrics), and helps you apply optimizations that reduce cost while preserving the visibility you actually need.
Sawmills sits in front of your existing observability tools and works with them rather than replacing them. You can design and manage pipelines powered by the OpenTelemetry Collector, route data to multiple destinations, and keep things vendor-agnostic. After you understand the concepts below, you’ll be ready to deploy Collectors, define pipelines, and start tightening control over volume, quality, routing, and cost.
Key Concepts
This section introduces the key concepts essential to understanding how Sawmills works. These components are the building blocks of all Sawmills deployments and collectively define how data flows through the system.
1. Collector
The central component of Sawmills is the Sawmills Collector, which is based on the OpenTelemetry Collector. It is responsible for gathering, processing, and exporting telemetry: logs, metrics, and traces. The Collector decouples the generation of observability data from its analysis, allowing teams to manage data collection, transformation, and forwarding in a streamlined, scalable manner.
Key Collector Features
- Vendor-agnostic: Integrates with many sources and destinations, reducing vendor lock-in.
- Scalable: Supports distributed deployments to handle high-throughput telemetry data.
- Flexible: Enables custom processing, transformation, and enrichment before data is sent downstream.
Each Collector runs one or more pipelines that specify how telemetry data flows from sources to destinations.
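For example, an instrumented application can hand its telemetry to a Collector over OTLP and let the pipeline take it from there. Below is a minimal sketch, assuming the Python OpenTelemetry SDK and a Collector listening on the default OTLP gRPC port (4317); the endpoint, service name, and span names are placeholders, not part of Sawmills itself.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Send spans to the Collector instead of directly to a vendor backend.
provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example.instrumentation")
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "1234")
```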
2. Remote Operator
The Sawmills Remote Operator is the control plane component that manages Collector behavior in the field. It coordinates configuration deployments, reports on Collector health, and provides access to the live data stream.
3. Pipelines
A Pipeline is a sequence of sources, processors, and destinations that defines how data moves through Sawmills. Each component of a pipeline can be configured to specify how telemetry data is received, transformed, enriched, and forwarded.
A pipeline can run on many Collectors, and Collectors can run many pipelines.
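Conceptually, a pipeline ties together named sources, an ordered list of processors, and one or more destinations. The sketch below is purely illustrative of that shape (expressed as a Python dict); it is not the actual Sawmills configuration syntax, and all names are hypothetical.

```python
# Purely illustrative: the conceptual shape of a pipeline,
# not the real Sawmills configuration format.
pipeline = {
    "name": "prod-logs",
    "sources": ["kubernetes-logs", "app-otlp"],        # where telemetry enters
    "processors": ["drop-debug-logs", "mask-emails"],  # ordered transformations
    "destinations": ["datadog", "s3-archive"],         # where processed data is sent
}
```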
Sources
Sources are the entry points for telemetry data in Sawmills. They specify the origin of your data (application logs, infrastructure metrics, traces, and other telemetry signals). Under the hood, sources typically map to OpenTelemetry Collector receivers, supporting a variety of protocols and formats. Common source patterns:
- Log sources: Ingest logs from applications, servers, or services.
- Metric sources: Collect metrics from infrastructure or applications.
- Trace sources: Gather traces from distributed applications.
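As one concrete illustration of a metric source, an application can push a counter to the Collector over OTLP. This is a minimal sketch, assuming the Python OpenTelemetry SDK; the endpoint, meter, and metric names are placeholders.

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Periodically export metrics to the Collector's OTLP gRPC endpoint.
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="localhost:4317", insecure=True)
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("example.instrumentation")
requests = meter.create_counter(
    "http.server.requests", unit="1", description="Count of handled HTTP requests"
)
requests.add(1, {"http.route": "/checkout"})
```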
Destinations
Destinations are endpoints where telemetry data is sent after processing. Under the hood, destinations typically map to OpenTelemetry Collector exporters. Sawmills supports many destination types, providing flexibility in routing data to the tools and storage systems your teams already use. Common destination patterns:
- Monitoring tools: Datadog, Prometheus, New Relic, and other platforms for real-time monitoring and analysis.
- Logging and storage systems: Elasticsearch or object storage (for example, Amazon S3) for longer-term retention.
- Analytics platforms: Systems used for aggregation, reporting, and analysis.
Processors
Processors define how telemetry data is transformed as it flows through the pipeline (for example, filtering, enrichment, aggregation, routing logic). See the Processors section below for deeper detail.
4. Processors
Processors modify and/or enrich telemetry data as it flows through a pipeline. They apply transformations that clean, filter, aggregate, or add context to the data, making it more meaningful and easier (and cheaper) to use downstream.
Processor Functions
Here are five common examples:
- Filtering: Remove unwanted data to reduce noise and cost.
- Transformation: Rename/copy/drop/normalize fields and attributes to standardize data.
- Sampling: Keep a representative subset of high-volume telemetry while preserving key signals.
- Routing: Send different subsets of data to different destinations based on rules.
- Redaction/masking: Remove or obfuscate sensitive values before exporting data.
These are just examples; many other processors and functions are available depending on the signal type (logs, metrics, traces) and the type of transformation.
By default, data in a pipeline flows from all sources to all destinations; routing processors can send subsets of data to specific destinations instead.
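To make one of these functions concrete, here is a small, purely illustrative sketch in plain Python of what a redaction/masking-style processor does to a single log record before it is exported; it is not a Sawmills API, and the record shape and field names are hypothetical.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_emails(record: dict) -> dict:
    """Illustrative redaction: replace e-mail addresses in the log body
    and in string attributes before the record leaves the pipeline."""
    masked = dict(record)
    masked["body"] = EMAIL.sub("[REDACTED]", record.get("body", ""))
    masked["attributes"] = {
        key: EMAIL.sub("[REDACTED]", value) if isinstance(value, str) else value
        for key, value in record.get("attributes", {}).items()
    }
    return masked

print(mask_emails({
    "body": "signup failed for jane@example.com",
    "attributes": {"user.email": "jane@example.com", "retry": 2},
}))
```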
5. Live-tail
Live-tail provides a real-time streaming view of your telemetry as it flows through the pipeline. It is designed for rapid debugging and validation, allowing teams to confirm that data is arriving, being transformed, and routed correctly.
Live-tail Use Cases
- Pipeline validation: Verify that a new source or processor is emitting the expected output.
- Troubleshooting: Spot anomalies, errors, or unexpected attributes in near real time.
- Ad-hoc exploration: Filter and inspect current data without waiting for downstream storage or indexing.
6. Processor Simulation
Processor simulation lets you test processor logic against sample payloads before applying changes to production data. This helps teams iterate on filters, transforms, or enrichments with confidence.
Why Simulation Matters
- Safe experimentation: Try transformations without impacting live traffic.
- Faster iteration: Validate outputs immediately and refine logic.
- Predictable outcomes: Understand how new rules affect logs, metrics, and traces.
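The idea is essentially “run the processor against fixtures first.” Below is a minimal, hypothetical sketch of that workflow in plain Python (not the Sawmills simulation interface): a candidate filter is applied to sample records and the output is checked before any live traffic is touched.

```python
# Hypothetical filter logic and sample payloads, illustrating the idea of
# simulating a processor against fixtures before deploying it.
def drop_debug_logs(records):
    return [r for r in records if r.get("severity") != "DEBUG"]

sample = [
    {"severity": "DEBUG", "body": "cache hit"},
    {"severity": "ERROR", "body": "payment failed"},
]

result = drop_debug_logs(sample)
assert len(result) == 1 and result[0]["severity"] == "ERROR"
print("simulation passed:", result)
```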
7. Dashboard
The Dashboard is the operational command center for Sawmills. It provides visibility into pipeline health, data volume, and processing effectiveness, all in a single view.
Dashboard Highlights
- Pipeline status: See which pipelines are active and where they run.
- Throughput metrics: Monitor data ingestion and export rates.
- Value provided: See total data reduction over time and by attribute.
8. Insights
Insights turn raw telemetry operations into actionable intelligence. They help you understand how data moves, where it is enriched or filtered, and what impact processors have on downstream costs and quality.
Insights Benefits
- Data optimization: Detect noisy sources and opportunities to reduce volume.
- Quality improvements: Track how enrichment changes data usefulness.
9. OTLP in the Collector
Sawmills uses OTLP (OpenTelemetry Protocol) to standardize telemetry data as it passes through the Collector. OTLP defines a clear data model for logs, metrics, and traces, which the Collector uses internally while processing pipeline traffic. For details, see the OTLP data model in the OpenTelemetry docs.
OTLP Structure by Signal
- Logs: Log records are grouped in ResourceLogs, and each log record includes attributes, timestamps, severity, and optional trace context for correlation.
- Metrics: Metric data is organized in ResourceMetrics, containing metric instruments (for example, counters, gauges, histograms) and their data points.
- Traces: Traces are represented as ResourceSpans, which contain spans with links, events, and attributes that describe distributed operations.
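To make the nesting concrete, here is a trimmed sketch of how a single log record sits inside ResourceLogs, written as a Python dict that mirrors the OTLP/JSON encoding; all values are placeholders.

```python
# Rough shape of one log record in the OTLP/JSON encoding (placeholder values).
logs_payload = {
    "resourceLogs": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "checkout"}},
        ]},
        "scopeLogs": [{
            "scope": {"name": "app.logger"},
            "logRecords": [{
                "timeUnixNano": "1700000000000000000",
                "severityText": "ERROR",
                "body": {"stringValue": "payment failed"},
                "attributes": [
                    {"key": "order.id", "value": {"stringValue": "1234"}},
                ],
                # Optional trace context enables log/trace correlation.
                "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
                "spanId": "00f067aa0ba902b7",
            }],
        }],
    }],
}
```

ResourceMetrics and ResourceSpans nest analogously, grouping data points and spans under a resource and an instrumentation scope.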
10. Deployment Model for Processors and Pipelines
Sawmills separates processor delivery from pipeline deployment to keep operational changes fast and safe:
- Deploy processors without Collector redeployment: Processor logic can be added or updated centrally and delivered to Collectors without rebuilding or redeploying the Collector binary.
- Pipeline changes still require deployment: Changes to pipeline structure (for example, adding a new source, destination, or reordering processors) require a pipeline deployment so the Collector loads the updated configuration.
Summary
- Collector: The core component that manages data collection, processing, and forwarding.
- Remote Operator: Control plane for managing Collector configuration.
- Pipelines: Define the flow of telemetry data through sources, processors, and destinations.
- Processors: Transform and enrich data as it moves through a pipeline.
- Live-tail: Real-time streaming visibility into active telemetry traffic.
- Processor simulation: Safe testing of processor logic before live deployment.
- Dashboard: Central view of pipeline health, throughput, and operations.
- Insights: Analytics about data quality, volume, and processing impact.
- OTLP model: Standardized internal structure for logs, metrics, and traces.
- Deployment model: Rapid processor updates with deliberate pipeline deployments.