Supported Data Types

📘 Logs

Configuring the Sawmills Aggregation Processor

The Sawmills Aggregation Processor allows you to group similar log events together over a specified time window, reducing noise and storage costs by collapsing repeated events into summary records.

Configuration Components

1. Name

  • Description: A unique identifier for your aggregation processor. Choose a descriptive name that helps you easily identify this processor in your pipeline.

2. Conditions (Optional)

  • Description: Specify attribute-based conditions to determine which events should be processed.
  • Functionality: Only events that match (or don’t match) these conditions are aggregated.
  • Logic Options:
    • Match all (AND): All conditions must be met for the event to be aggregated.
    • Match any (OR): At least one condition must be met for the event to be aggregated.
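
The difference between the two logic options can be sketched in a few lines of Python. This is only an illustration, not the processor's implementation: the `should_aggregate` helper, the `mode` values, and the `severity` and `service` attribute names are all hypothetical, and treating an empty condition list as "every event is eligible" is an assumption based on conditions being optional.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Condition = Callable[[Event], bool]

def should_aggregate(event: Event, conditions: List[Condition], mode: str = "all") -> bool:
    """Return True if the event passes the configured conditions."""
    if not conditions:
        # Assumption: with no conditions configured, every event is eligible.
        return True
    results = (cond(event) for cond in conditions)
    # "all" models Match all (AND); anything else models Match any (OR).
    return all(results) if mode == "all" else any(results)

# Hypothetical attribute-based conditions.
conditions = [
    lambda e: e.get("severity") == "ERROR",
    lambda e: e.get("service") == "checkout",
]

event = {"severity": "ERROR", "service": "billing"}
match_all = should_aggregate(event, conditions, mode="all")  # only one condition holds
match_any = should_aggregate(event, conditions, mode="any")  # one condition is enough
```

With this sample event, only the severity condition holds, so the event would be aggregated under Match any but not under Match all.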

3. Interval (seconds)

  • Description: The time window (in seconds) over which matching records are aggregated.
  • Functionality: Events occurring within this time window will be grouped together if they’re considered similar.

4. Dedup Threshold

  • Description: The number of identical events required before collapsing them into one summary record.
  • Functionality: Only when this threshold is reached will events be aggregated into a summary record.
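
To make the interplay between the interval and the dedup threshold concrete, here is a minimal sketch. It assumes fixed, non-overlapping windows and exact-match grouping on the message body; the processor's real windowing and similarity logic may differ, and the `aggregate` function and its output field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def aggregate(events: List[Tuple[float, str]], interval: int, threshold: int) -> List[dict]:
    """Group (timestamp, body) events per window and collapse groups that reach the threshold."""
    groups: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for ts, body in events:
        window = int(ts // interval)  # index of the fixed window this event falls into
        groups[(window, body)].append(ts)

    out: List[dict] = []
    for (_, body), stamps in groups.items():
        if len(stamps) >= threshold:
            # Threshold reached: emit one summary record for the whole group.
            out.append({
                "body": body,
                "log_count": len(stamps),
                "first_seen": min(stamps),
                "last_seen": max(stamps),
            })
        else:
            # Below the threshold: forward each event unchanged.
            out.extend({"body": body, "timestamp": ts} for ts in stamps)
    return out

events = [
    (0.0, "db timeout"), (5.0, "db timeout"), (9.0, "db timeout"),
    (12.0, "cache miss"),
    (65.0, "db timeout"),  # lands in the next 60-second window
]
result = aggregate(events, interval=60, threshold=3)
```

In this run the three "db timeout" events in the first window collapse into one summary, while the single "cache miss" and the later "db timeout" are forwarded individually.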

Advanced Options

1. Aggregate Max Size

  • Description: Maximum number of distinct event patterns that can be tracked simultaneously for aggregation.
  • Functionality: When this limit is reached, the processor will flush all currently tracked aggregations, even if their time intervals haven’t elapsed or deduplication thresholds haven’t been met.
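
The flush-on-limit behavior can be pictured with a small tracker. This is a sketch under assumptions: the `PatternTracker` class is hypothetical, and it flushes at the moment a new pattern arrives while the table is already full, which is one plausible reading of "when this limit is reached".

```python
from typing import Dict, List, Tuple

class PatternTracker:
    """Tracks counts per distinct pattern, up to max_size patterns at once."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.counts: Dict[str, int] = {}

    def add(self, pattern: str) -> List[Tuple[str, int]]:
        """Record one event; return any flushed (pattern, count) pairs."""
        flushed: List[Tuple[str, int]] = []
        if pattern not in self.counts and len(self.counts) >= self.max_size:
            # Limit reached: emit everything tracked so far, regardless of
            # interval or threshold, and start tracking from scratch.
            flushed = list(self.counts.items())
            self.counts.clear()
        self.counts[pattern] = self.counts.get(pattern, 0) + 1
        return flushed

tracker = PatternTracker(max_size=2)
flushes = [tracker.add(p) for p in ["a", "b", "a", "c"]]
```

Here the fourth event introduces a third distinct pattern, so the tracker flushes the two existing aggregations before it starts counting the new one. A small limit therefore trades memory for earlier, smaller summaries.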

2. Timezone

  • Description: The timezone to apply to the timestamp on the new summary record.
  • Functionality: Ensures that timestamp information on aggregated records uses a consistent timezone.
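
As an illustration of what a consistent timezone means for the summary timestamps, the sketch below renders the same instant as RFC 3339 text in two zones. The `format_in_timezone` helper is hypothetical, and a fixed UTC offset stands in for the configured timezone; the format the processor accepts for this setting is not specified here.

```python
from datetime import datetime, timedelta, timezone

def format_in_timezone(ts: datetime, tz: timezone) -> str:
    """Render an aware datetime as RFC 3339 text in the given timezone."""
    text = ts.astimezone(tz).isoformat(timespec="seconds")
    # RFC 3339 allows "Z" as shorthand for a zero UTC offset.
    return text.replace("+00:00", "Z")

event_time = datetime(2025, 4, 10, 10, 19, 52, tzinfo=timezone.utc)

utc_text = format_in_timezone(event_time, timezone.utc)
offset_text = format_in_timezone(event_time, timezone(timedelta(hours=2)))
```

Both strings describe the same moment; only the rendering changes, which is why a single configured timezone keeps summary records comparable.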

3. Masks

  • Description: Regex patterns for stripping or masking dynamic content before deduplication.
  • Functionality: Helps identify similar events even when they contain variable elements like timestamps, IDs, or other dynamic content.
  • Components:
    • Name: An identifier for the mask pattern.
    • Pattern: The regular expression pattern used to identify the dynamic content to mask.
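
The effect of masks on grouping can be sketched as follows. The mask names, the patterns, and the `<name>` placeholder format are illustrative assumptions rather than the processor's defaults. The point is only that two messages that differ in dynamic content become identical once masked.

```python
import re
from typing import Dict

# Hypothetical masks: name -> compiled pattern. Order matters, so the more
# specific UUID mask runs before the generic number mask.
MASKS: Dict[str, re.Pattern] = {
    "uuid": re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"),
    "number": re.compile(r"\d+"),
}

def apply_masks(message: str) -> str:
    """Replace dynamic content with a placeholder named after each mask."""
    for name, pattern in MASKS.items():
        message = pattern.sub(f"<{name}>", message)
    return message

a = apply_masks("user 1042 failed login after 3 attempts")
b = apply_masks("user 77 failed login after 5 attempts")
c = apply_masks("request 123e4567-e89b-12d3-a456-426614174000 took 15 ms")
```

After masking, `a` and `b` are the same string, so the two events would count toward the same group. Overly broad patterns can merge events that should stay separate, so masks are best kept as specific as possible.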

Aggregation Attributes

When an event group meets the dedup threshold (i.e., more than one matching log is found within the interval), the processor enriches the summary record with the following attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| sawmills.log_count | Integer | The total number of similar events that were collapsed into this summary record. |
| sawmills.first_seen | String (RFC 3339) | Timestamp of the earliest event in the group, formatted in the configured timezone. |
| sawmills.last_seen | String (RFC 3339) | Timestamp of the latest event in the group, formatted in the configured timezone. |

These attributes are nested under a sawmills key in the log record attributes. For example:
{
  "sawmills": {
    "log_count": 5,
    "first_seen": "2025-04-10T10:19:52Z",
    "last_seen": "2025-04-10T10:21:07Z"
  }
}
Events that do not meet the dedup threshold are forwarded as-is, without any aggregation attributes.
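
On the consuming side, the presence of the sawmills key is what distinguishes a summary record from a forwarded event. The sketch below reads the example attributes shown above and derives the time span they cover. The `summarize` and `parse_rfc3339` helpers are hypothetical and only illustrate one way a downstream script might use these fields.

```python
import json
from datetime import datetime

def parse_rfc3339(text: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2025-04-10T10:19:52Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat, so normalize it.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))

def summarize(attributes: dict) -> str:
    """Describe a record based on its aggregation attributes, if any."""
    info = attributes.get("sawmills")
    if info is None:
        # Forwarded as-is: no aggregation attributes were added.
        return "single event (not aggregated)"
    span = parse_rfc3339(info["last_seen"]) - parse_rfc3339(info["first_seen"])
    return f"{info['log_count']} events over {int(span.total_seconds())}s"

attributes = json.loads("""
{
  "sawmills": {
    "log_count": 5,
    "first_seen": "2025-04-10T10:19:52Z",
    "last_seen": "2025-04-10T10:21:07Z"
  }
}
""")
text = summarize(attributes)
print(text)  # 5 events over 75s
```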

Use Cases

  • Error Aggregation: Collapse repeated error messages into a single summary, showing the count and time range.
  • Noise Reduction: Reduce the volume of high-frequency, low-value logs while preserving information about their occurrence.
  • Alert Deduplication: Prevent alert fatigue by combining similar alerts that occur within the specified time window.

Implementation Notes

  • Set an appropriate interval based on your application’s log patterns and monitoring needs.
  • Use masks to improve grouping accuracy by removing variable parts of log messages.
  • The dedup threshold helps prevent premature aggregation of events that may not be part of a true pattern.