
Supported Data Types: Logs

Configuring the Sawmills Lookup Processor for Logs

The Sawmills Lookup Processor enriches log records with data from a CSV file by matching a log attribute value against a CSV column and adding all other CSV columns to the log record. This is useful for adding contextual information, such as service metadata or environment details, to your logs.

Configuration Components

1. Name

  • Description: Identifier for your processor. Use a unique and descriptive name to differentiate between multiple processors.

2. Attribute Filters

  • Conditions: Specify conditions to filter events before processing. Events must satisfy all conditions (AND) or at least one (OR) based on the selected logic. Each condition follows this sequence:
    1. Choose the condition type:
      • Log Level (Severity)
      • Body as String
    2. Select a comparison operator:
      • Equals / Not Equals
    3. Provide a value:
      • Log Level: Select from a dropdown (INFO, WARN, ERROR, etc.).
      • Body as String: Enter a free-text value for matching.

3. Source Type

  • Shared Path: Absolute path to a CSV file accessible by the collector. The file must be readable by the collector process.
    • Example: /etc/sawmills/lookup.csv
    • Requirements: Must be an absolute path and end with .csv extension.

4. Lookup Key

  • Log Attribute: Select the log attribute whose value will be used to look up matching rows in the CSV file. Supports nested attribute paths using dot notation (e.g., sawmills.service).
  • CSV Column: The CSV header name used as the lookup key. Must match the file header exactly. All CSV columns except this lookup column will be added to the log record.

5. Enriched Fields

  • Target Scope: Where the enriched fields will be written.
    • Resources: Adds fields to resource attributes
    • Attributes: Adds fields to log record attributes
    • Body: Adds fields to the log body. Fields are only written if the body is already a structured object. Plain-text bodies are left unchanged.
  • Prefix: Optional namespace added in front of all enriched fields. For example, a prefix of enrichment will write fields as enrichment.<column_name>. Trailing dots are automatically removed. Slashes (/) are not accepted, and backslashes (\) are accepted only when escaping dots.

6. Field Conflicts

  • On Conflict: Controls what happens if a target field already exists on the record.
    • Skip existing fields: Leaves existing fields unchanged and skips enrichment for those fields.
    • Override existing fields: Replaces existing field values with values from the CSV file.
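For illustration, a hypothetical before/after view of a record that already carries a region attribute, under each strategy (rendered as YAML; the attribute names and values are invented for this sketch):

# Record before enrichment (region already set):
attributes:
  service_name: checkout
  region: local

# After enrichment with "Skip existing fields":
attributes:
  service_name: checkout
  region: local        # existing value kept
  owner: team-a        # new field added from the CSV

# After enrichment with "Override existing fields":
attributes:
  service_name: checkout
  region: us-east-1    # replaced by the CSV value
  owner: team-a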

Dot Notation for Nested Attributes

The Lookup Processor supports dot notation for accessing nested attributes in both lookup keys and enrichment prefixes. This allows you to work with structured data where attributes are organized in nested maps.

Lookup Key with Dot Notation

When specifying a lookup key with dot notation, the processor traverses nested maps to find the value. For example, if you specify sawmills.service as the lookup key:
  • The processor looks for attributes["sawmills"] (which must be a map).
  • Then looks for attributes["sawmills"]["service"] to get the value.
  • This value is used to match against the CSV lookup column.
Example: If a log record has sawmills.service = "foo", the processor will:
  1. Access attributes["sawmills"] (a map).
  2. Read attributes["sawmills"]["service"] which equals "foo".
  3. Use "foo" to look up matching rows in the CSV file.
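Rendered as YAML, the record's attribute structure in this example looks like:

attributes:
  sawmills:          # map traversed by the first path segment
    service: foo     # value matched against the CSV lookup column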

Enrichment Prefix with Dot Notation

When specifying an enrichment prefix with dot notation, the processor creates nested maps to organize the enriched fields. For example, if you specify enrichment.bar as the prefix:
  • The processor creates attributes["enrichment"] (a map) if it doesn’t exist.
  • Then creates attributes["enrichment"]["bar"] (a map) if it doesn’t exist.
  • All enriched key-value pairs from the CSV are inserted under attributes["enrichment"]["bar"].
Example: With prefix enrichment.bar and CSV columns region and environment, the enriched data will be structured as:
  • attributes["enrichment"]["bar"]["region"] = CSV value for region
  • attributes["enrichment"]["bar"]["environment"] = CSV value for environment
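Rendered as YAML (the CSV values here are illustrative), the resulting structure is:

attributes:
  enrichment:
    bar:
      region: us-east-1          # CSV value for region (illustrative)
      environment: production    # CSV value for environment (illustrative)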

Escaping Dots

If you need to reference a flat attribute key that contains a literal dot (not a nested path), escape it with a backslash: sawmills\.service will look for attributes["sawmills.service"] as a single key rather than traversing nested maps.
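Sketched as YAML, the two interpretations differ as follows:

# Lookup key sawmills.service traverses nested maps:
attributes:
  sawmills:
    service: foo

# Lookup key sawmills\.service matches a single flat key:
attributes:
  sawmills.service: foo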

Processor Operations

The Lookup Processor operates in the following sequence for each log record:
  1. Condition Evaluation: If conditions are configured, the processor evaluates them. Logs that don’t match the conditions are skipped.
  2. Lookup Key Extraction: The processor extracts the lookup key value from the specified log attribute (or resource attribute/body) using the configured lookup key path.
  3. CSV Lookup: The extracted lookup key value is used to search the CSV file’s lookup column for a matching row.
  4. Enrichment: If a match is found, all CSV columns (except the lookup column) are added to the log record at the specified target scope with the optional prefix.
  5. Conflict Handling: If target fields already exist, the processor applies the configured conflict resolution strategy (skip or overwrite).
The processor handles each log record independently and keeps running even when individual records fail to enrich, so a single failed enrichment never blocks the rest of the stream.

CSV Structure Requirements and Limitations

Requirements

  • Header Row: The CSV file must have a header row as the first line containing column names.
  • Lookup Key Column: One column must match the configured CSV column name exactly (case-sensitive).
  • Column Count Consistency: All data rows must have the same number of columns as the header row.
  • File Format: The file must be a valid CSV file with proper formatting.

Limitations

  • Maximum Columns: 50 columns per CSV file
  • Maximum Rows: 5,000 rows per CSV file
  • Duplicate Keys: If multiple rows have the same lookup key value, the first matching row is used (subsequent duplicates are ignored).
  • Empty Values: Empty CSV cell values are allowed but are skipped during enrichment (not added to log records).
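A quick way to sanity-check a file against these limits (a shell sketch using standard tools; the filename lookup.csv is a placeholder, and the commands assume fields contain no quoted commas):

# Count header columns (must be <= 50)
head -n 1 lookup.csv | awk -F',' '{print NF " columns"}'

# Count data rows, excluding the header (must be <= 5,000)
tail -n +2 lookup.csv | wc -l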

CSV File Example

service_name,region,environment,owner
checkout,us-east-1,production,team-a
payments,us-west-2,staging,team-b
auth,eu-central-1,production,team-c
In this example:
  • service_name is the lookup key column.
  • region, environment, and owner will be added as enrichment attributes.
  • The file has 4 columns and 3 data rows (within limits).
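Assuming service_name is also the log attribute configured as the lookup key, a log with service_name = checkout would be enriched as follows (no prefix, target scope Attributes):

# Before
attributes:
  service_name: checkout

# After (all columns except the lookup column are added)
attributes:
  service_name: checkout
  region: us-east-1
  environment: production
  owner: team-a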

Edge Cases and Processor Behavior

Lookup Key Not Found in Log

Scenario: The log record doesn’t contain the specified lookup key attribute.
Behavior: The processor skips enrichment for that log record and continues processing. No error is raised, and the log passes through unchanged.

CSV Lookup Key Not Found

Scenario: The lookup key value from the log doesn’t match any row in the CSV file.
Behavior: The processor skips enrichment for that log record and continues processing. No error is raised, and the log passes through unchanged.

Empty CSV Values

Scenario: A CSV cell contains an empty value.
Behavior: Empty values are skipped during enrichment. The attribute is not added to the log record.

Duplicate Lookup Keys in CSV

Scenario: Multiple rows in the CSV file have the same lookup key value.
Behavior: The first matching row is used. Subsequent rows with the same key are ignored.

Body Target with Non-Map Body

Scenario: Enrichment target is set to “Body” but the log body is plain text (not a structured object).
Behavior: The processor skips enrichment for that log record. Enrichment only works when the body is already a structured map/object.
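For illustration (the field names are invented), a structured body can receive enriched fields while a plain-text body cannot:

# Structured body: enriched fields can be added
body:
  message: user logged in
  region: us-east-1    # added by the processor

# Plain-text body: enrichment is skipped, body is unchanged
body: "user logged in"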

Prefix Path Conflicts

Scenario: The enrichment prefix path conflicts with an existing non-map value (e.g., enrichment exists as a string but the prefix is enrichment.bar).
Behavior: Depends on the conflict resolution setting:
  • Skip: The processor skips enrichment and logs a debug message.
  • Overwrite: The processor creates a new map at the conflicting path, replacing the existing value.

Condition Evaluation Errors

Scenario: An error occurs while evaluating conditions for a log record.
Behavior: The processor logs a debug message and skips that log record (does not enrich it).

Target Attribute Access Errors

Scenario: The processor cannot access the specified target (attributes, resource, or body).
Behavior: The processor skips enrichment for that log record, logs a debug message, and continues processing other logs.

Shared Path Mount

The Lookup Processor requires access to CSV files via a shared path that is accessible by all collector pod instances. The shared path must be an absolute path on the filesystem where the collector is running.

Path Requirements

  • Absolute Path: Must start with / (e.g., /mnt/efs/lookup.csv).
  • File Extension: Must end with .csv extension.
  • Readable: The collector process must have read permissions for the file.
  • Persistent: The file should be available across collector pod restarts.
The CSV file is loaded once during processor initialization. If the file changes, the collector must be restarted to reload the updated CSV data. For setting up the shared path mount using EFS, see the EFS Volume Setup section below.
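On Kubernetes, one way to trigger a reload is a rollout restart of the collector workload (the deployment name below matches the examples later in this guide; substitute your own):

kubectl rollout restart deployment/sawmills-collector -n <your-namespace>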

EFS Volume Setup for Sawmills Collector

This guide explains how to configure EFS (Elastic File System) volume mounts for the Sawmills Collector using the remote-operator Helm chart.

Overview

The Lookup Processor requires access to lookup files stored on an EFS volume. This setup mounts the EFS filesystem to /mnt/efs in the collector pods.

Prerequisites

Before installing the Helm chart with EFS support, you must create the StorageClass and PersistentVolumeClaim in your cluster.

1. Create the StorageClass

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-rwx-enrichment-sawmills
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxxxxx
  basePath: "/k8s/sawmills-production-enrichment"
  directoryPerms: "0777"
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
Note: Replace fileSystemId with your actual EFS filesystem ID.

2. Create the PersistentVolumeClaim

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: enrichment-files
  namespace: <your-namespace>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-rwx-enrichment-sawmills
  resources:
    requests:
      storage: 500Mi
EOF

3. Verify Resources

# Check StorageClass
kubectl get storageclass efs-rwx-enrichment-sawmills

# Check PVC
kubectl get pvc enrichment-files -n <your-namespace>

Helm Installation

Option A: Using --set flags

helm upgrade --install sawmills-remote-operator \
  oci://public.ecr.aws/s7a5m1b4/sawmills-remote-operator-chart \
  --version 2.0.18 \
  --namespace <your-namespace> \
  --set apiKeyExistingSecret=sawmills-secret \
  --set operatorAddress=https://controller.ue1.prod.plat.sm-svc.com \
  --set collectorName="my-collector" \
  --set image.tag="0.189.0" \
  --set remoteWriteUrl="https://ingress.sawmills.ai" \
  --set selfManaged=false \
  --set 'managedChartsValues.sawmills-collector.additionalVolumes[0].name=enrichment-files' \
  --set 'managedChartsValues.sawmills-collector.additionalVolumes[0].persistentVolumeClaim.claimName=enrichment-files' \
  --set 'managedChartsValues.sawmills-collector.additionalVolumeMounts[0].name=enrichment-files' \
  --set 'managedChartsValues.sawmills-collector.additionalVolumeMounts[0].mountPath=/mnt/efs'
Option B: Using a values file

Create a values file values-efs.yaml:
# values-efs.yaml
managedChartsValues:
  sawmills-collector:
    additionalVolumes:
      - name: enrichment-files
        persistentVolumeClaim:
          claimName: enrichment-files
    additionalVolumeMounts:
      - name: enrichment-files
        mountPath: /mnt/efs
        readOnly: true  # Optional: mount read-only; the processor only needs read access
Then run:
helm upgrade --install sawmills-remote-operator \
  oci://public.ecr.aws/s7a5m1b4/sawmills-remote-operator-chart \
  --version 2.0.18 \
  --namespace <your-namespace> \
  --set apiKeyExistingSecret=sawmills-secret \
  --set operatorAddress=https://controller.ue1.prod.plat.sm-svc.com \
  --set collectorName="my-collector" \
  --set image.tag="0.189.0" \
  --set remoteWriteUrl="https://ingress.sawmills.ai" \
  --set selfManaged=false \
  -f values-efs.yaml

Verification

After deployment, verify the EFS mount is working:
# Check if pods are running
kubectl get pods -n <your-namespace> -l app.kubernetes.io/name=sawmills-collector

# Verify the volume mount in a pod
kubectl exec -it deployment/sawmills-collector -n <your-namespace> -c main-collector -- ls -la /mnt/efs

# Check pod describe for volume mounts
kubectl describe pod -n <your-namespace> -l app.kubernetes.io/name=sawmills-collector | grep -A5 "Mounts:"

Lookup Processor Configuration

Once the EFS volume is mounted, configure the csvenrichmentprocessor in your collector config to use the mounted files:
processors:
  csvenrichmentprocessor:
    source:
      file:
        path: /mnt/efs/enrichment-files-service.csv
    lookup:
      key: sawmills.service  # Supports nested paths (e.g., attributes["sawmills"]["service"])
      target: attributes
      csv_key_column: service_name
    enrichment:
      target: attributes
      prefix: ""  # Optional prefix for enriched attributes
      on_conflict: overwrite  # or "skip"
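As with any collector processor, csvenrichmentprocessor must also be listed in the logs pipeline. A minimal sketch (the receiver and exporter names are placeholders for whatever your pipeline already uses):

service:
  pipelines:
    logs:
      receivers: [otlp]                       # placeholder
      processors: [csvenrichmentprocessor]
      exporters: [otlphttp]                   # placeholder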

Troubleshooting

PVC Not Bound

kubectl describe pvc enrichment-files -n <your-namespace>
Check that:
  • The StorageClass exists
  • The EFS CSI driver is installed
  • The fileSystemId is correct

Mount Permission Denied

Ensure the EFS access point has correct permissions:
  • directoryPerms: "0777" in StorageClass
  • Security groups allow NFS traffic (port 2049)

File Not Found in Pod

# List files in the EFS mount
kubectl exec -it deployment/sawmills-collector -n <your-namespace> -c main-collector -- ls -la /mnt/efs

# Preview the first lines of the file
kubectl exec -it deployment/sawmills-collector -n <your-namespace> -c main-collector -- head -n 5 /mnt/efs/enrichment-files-service.csv

Architecture

+-------------------------------------------------------------+
|                  Kubernetes Cluster                         |
|                                                             |
|  +------------------+     +------------------------------+  |
|  |   StorageClass   |     |        EFS (AWS)             |  |
|  |  efs-rwx-...     |---->|  fs-xxxxxxxx                 |  |
|  +------------------+     |  /k8s/sawmills-production-   |  |
|         |                 |   enrichment/                |  |
|         |                 +------------------------------+  |
|         |                            |                      |
|         v                            |                      |
|  +------------------+                |                      |
|  |       PVC        |                |                      |
|  | enrichment-files |                |                      |
|  +------------------+                |                      |
|         |                            |                      |
|         |                            |                      |
|         v                            v                      |
|  +-------------------------------------------------------+  |
|  |            Sawmills Collector Pod                     |  |
|  |  +--------------------------------------------------+ |  |
|  |  |      main-collector container                    | |  |
|  |  |                                                  | |  |
|  |  |  /mnt/efs/ <--- EFS Volume Mount                 | |  |
|  |  |    +-- enrichment-files-service.csv              | |  |
|  |  |                                                  | |  |
|  |  |  csvenrichmentprocessor reads from:              | |  |
|  |  |    /mnt/efs/enrichment-files-service.csv         | |  |
|  |  +--------------------------------------------------+ |  |
|  +-------------------------------------------------------+  |
|                                                             |
+-------------------------------------------------------------+