
Overview

When deploying a Sawmills Collector, it is crucial to plan for capacity to ensure optimal performance and reliability. Proper capacity planning helps prevent bottlenecks, ensures consistent data processing, and enables cost-effective scaling as your telemetry volume grows. The Sawmills Collector is architected for horizontal scaling to handle enterprise-scale data volumes efficiently. Under the default configuration, with 4 CPU cores and no pipeline processors configured, a single collector instance can process approximately 14 million logs per day, providing a solid foundation for most production workloads.

Kubernetes Resource Requirements

We recommend allocating at least 4 vCPUs and 3 GiB of memory per node as the Kubernetes limit, with a minimum (Kubernetes request) of 1 vCPU and 1 GiB of memory.
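
Expressed as a standard Kubernetes container `resources` block, these recommendations look roughly like the sketch below; where this block lives (pod spec, Helm values, etc.) depends on how you deploy the collector:

```yaml
# Sketch of the recommended collector resource settings,
# using the standard Kubernetes container resources fields.
resources:
  requests:
    cpu: "1"       # minimum guaranteed CPU
    memory: 1Gi    # minimum guaranteed memory
  limits:
    cpu: "4"       # recommended CPU ceiling
    memory: 3Gi    # recommended memory ceiling
```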

AWS Instance Types

The collector runs best on Graviton-based (arm64) instances, preferably M7g or M6g. We also support amd64/x86_64 architectures (less performant), such as M7i, M6i, and M6a.
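
If your cluster mixes architectures, one way to keep collector pods on Graviton nodes is a node selector on the well-known `kubernetes.io/arch` label. This is a generic Kubernetes sketch, not part of any Sawmills chart:

```yaml
# Sketch: pin collector pods to arm64 (Graviton) nodes in a mixed cluster.
# kubernetes.io/arch is a standard node label set by the kubelet.
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
```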

Azure Instance Types

For Azure deployments, we recommend the following instance types based on workload requirements:
  • D4s_v5: 4 vCPUs, 16 GB RAM - Ideal for moderate telemetry volumes
  • D8s_v5: 8 vCPUs, 32 GB RAM - Suitable for high telemetry volumes

Google Cloud Platform (GCP) Instance Types

For GCP deployments, we recommend the following machine types:
  • n2-standard-4: 4 vCPUs, 16 GB RAM - Ideal for moderate telemetry volumes
  • n2-standard-8: 8 vCPUs, 32 GB RAM - Suitable for high telemetry volumes

Collector Sizing Table

You can use this table as a starting point for collector system requirements. The resource recommendations do not account for multiple exporters or processors.
| Telemetry Throughput | Logs / second | Collectors |
| --- | --- | --- |
| 5 GiB/min | 130,000 | 1 |
| 10 GiB/min | 260,000 | 2 |
| 20 GiB/min | 520,000 | 4 |
| 40 GiB/min | 1,040,000 | 8 |
| 80 GiB/min | 2,080,000 | 16 |
| 160 GiB/min | 4,160,000 | 32 |
| 320 GiB/min | 8,320,000 | 64 |
| 640 GiB/min | 16,640,000 | 128 |
| 1,280 GiB/min | 33,280,000 | 256 |
| 2,560 GiB/min | 66,560,000 | 512 |
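
As a worked example, the 20 GiB/min row corresponds to 4 collectors. In a Helm- or Deployment-based install this usually translates into a replica count; the exact key name depends on your chart and is assumed here:

```yaml
# Hypothetical values snippet: 4 collector replicas to handle ~20 GiB/min
# (520,000 logs/second) at the default per-collector sizing.
replicaCount: 4
```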

Fault Tolerance and High Availability

Over-provisioning is essential for maintaining service continuity. We recommend maintaining 20-30% additional capacity beyond your baseline requirements to ensure your collector fleet can handle:
  • Node failures: When one or more collector instances fail unexpectedly
  • Maintenance windows: Planned downtime for updates, patches, or infrastructure changes
  • Traffic spikes: Sudden increases in telemetry volume during peak usage periods
  • Rolling deployments: Zero-downtime updates that temporarily reduce available capacity

This buffer ensures that the remaining collectors can absorb the workload without dropping data or degrading performance. If you are working with a fixed number of collectors, you can instead scale their CPU and memory vertically to increase throughput; see the collector sizing table above.
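
As a rough sketch of what this looks like in practice: a fleet sized for 40 GiB/min (8 collectors in the table above) with a ~25% buffer would run 10 replicas, and a PodDisruptionBudget keeps voluntary disruptions such as node maintenance from draining too many pods at once. The resource names, labels, and image reference below are illustrative:

```yaml
# Illustrative over-provisioned fleet: baseline of 8 collectors plus ~25%
# headroom (10 replicas), with a PodDisruptionBudget so voluntary
# disruptions never take out more than 2 pods at a time.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sawmills-collector              # hypothetical name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: sawmills-collector
  template:
    metadata:
      labels:
        app: sawmills-collector
    spec:
      containers:
        - name: collector
          image: sawmills/collector:latest   # placeholder image reference
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "4"
              memory: 3Gi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sawmills-collector-pdb
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: sawmills-collector
```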