# Lookup Processor

> Enrich log records by matching attributes against a CSV file using the Sawmills Lookup Processor. Supports dot notation and EFS-mounted lookup files.

## Supported Data Types

📘 **Logs**

### Configuring the Sawmills Lookup Processor for Logs

The Sawmills Lookup Processor enriches log records with data from a CSV file by matching a log attribute value against a CSV column and adding all other CSV columns to the log record. This is useful for adding contextual information, such as service metadata or environment details, to your logs.

### Configuration Components

#### 1. Name

* **Description**: Identifier for your processor. Use a unique and descriptive name to differentiate between multiple processors.

#### 2. Attribute Filters

* **Conditions**: Specify conditions to filter events before processing. Events must satisfy all conditions (AND) or at least one (OR) based on the selected logic.

  Each condition follows this sequence:

  1. **Choose the condition type:**
     * **Log Level (Severity)**
     * **Body as String**

  2. **Select a comparison operator:**
     * **Equals** / **Not Equals**

  3. **Provide a value:**
     * **Log Level:** Select from a dropdown (INFO, WARN, ERROR, etc.).
     * **Body as String:** Enter a free-text value for matching.
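The AND/OR logic above can be pictured with a small sketch. This is an illustration only, not the processor's implementation: the record layout (`severity` and `body` keys), the operator strings, and the helper names `make_condition` and `passes` are all hypothetical, and exact case-sensitive string equality is an assumption.

```python
from typing import Callable

def make_condition(field: str, operator: str, value: str) -> Callable[[dict], bool]:
    """Build one Equals / Not Equals check against a record field (hypothetical helper)."""
    def check(record: dict) -> bool:
        matches = str(record.get(field, "")) == value  # assumed: exact string comparison
        return matches if operator == "equals" else not matches
    return check

def passes(record: dict, conditions: list, logic: str = "AND") -> bool:
    """AND requires every condition to hold; OR requires at least one."""
    if not conditions:
        return True  # no filters configured: every record is processed
    results = [condition(record) for condition in conditions]
    return all(results) if logic == "AND" else any(results)

conditions = [
    make_condition("severity", "equals", "ERROR"),
    make_condition("body", "not_equals", "healthcheck"),
]
print(passes({"severity": "ERROR", "body": "timeout"}, conditions, "AND"))  # True
print(passes({"severity": "INFO", "body": "timeout"}, conditions, "OR"))    # True
```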

#### 3. Source Type

* **Shared Path**: Absolute path to a CSV file accessible by the collector. The file must be readable by the collector process.
  * **Example**: `/etc/sawmills/lookup.csv`
  * **Requirements**: Must be an absolute path and end with `.csv` extension.
* **S3 Remote Path**: URI pointing to a CSV file stored in an S3 bucket. The collector downloads the file at initialization.
  * **S3 Path**: Full S3 URI to the CSV file (e.g., `s3://my-bucket/enrichment/lookup.csv`).
  * **Region**: Required. Select the AWS region where the S3 bucket is located (e.g., `us-east-1`).
  * **Requirements**: The collector's service account must have `s3:GetObject` permission on the target bucket. On EKS, this is typically granted via [IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).

#### 4. Lookup Key

* **Log Attribute**: Select the log attribute whose value will be used to look up matching rows in the CSV file. Supports nested attribute paths using dot notation (e.g., `sawmills.service`).
* **CSV Column**: The CSV header name used as the lookup key. Must match the file header exactly. All CSV columns except this lookup column will be added to the log record.

#### 5. Enriched Fields

* **Target Scope**: Where the enriched fields will be written.
  * **Resources**: Adds fields to resource attributes
  * **Attributes**: Adds fields to log record attributes
  * **Body**: Adds fields to the log body. Fields are only written if the body is already a structured object. Plain-text bodies are left unchanged.
* **Prefix**: Optional namespace added in front of all enriched fields. For example, a prefix of `enrichment` will write fields as `enrichment.<column_name>`. Trailing dots are automatically removed. Slashes (`/`) and backslashes (`\`, except for escaping dots) are not accepted.

#### 6. Field Conflicts

* **On Conflict**: Controls what happens if a target field already exists on the record.
  * **Skip existing fields**: Leaves existing fields unchanged and skips enrichment for those fields.
  * **Override existing fields**: Replaces existing field values with values from the CSV file.

***

## Dot Notation for Nested Attributes

The Lookup Processor supports dot notation for accessing nested attributes in both lookup keys and enrichment prefixes. This allows you to work with structured data where attributes are organized in nested maps.

### Lookup Key with Dot Notation

When specifying a lookup key with dot notation, the processor traverses nested maps to find the value. For example, if you specify `sawmills.service` as the lookup key:

* The processor looks for `attributes["sawmills"]` (which must be a map).
* Then looks for `attributes["sawmills"]["service"]` to get the value.
* This value is used to match against the CSV lookup column.

**Example**: If a log record has `sawmills.service = "foo"`, the processor will:

1. Access `attributes["sawmills"]` (a map).
2. Read `attributes["sawmills"]["service"]` which equals `"foo"`.
3. Use `"foo"` to look up matching rows in the CSV file.

### Enrichment Prefix with Dot Notation

When specifying an enrichment prefix with dot notation, the processor creates nested maps to organize the enriched fields. For example, if you specify `enrichment.bar` as the prefix:

* The processor creates `attributes["enrichment"]` (a map) if it doesn't exist.
* Then creates `attributes["enrichment"]["bar"]` (a map) if it doesn't exist.
* All enriched key-value pairs from the CSV are inserted under `attributes["enrichment"]["bar"]`.

**Example**: With prefix `enrichment.bar` and CSV columns `region` and `environment`, the enriched data will be structured as:

* `attributes["enrichment"]["bar"]["region"]` = CSV value for region
* `attributes["enrichment"]["bar"]["environment"]` = CSV value for environment
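A minimal sketch of how a dotted prefix could expand into nested maps, assuming the record's attributes behave like a plain Python dictionary. The function name `write_under_prefix` is hypothetical, and the sketch deliberately ignores escaped dots and the On Conflict setting.

```python
def write_under_prefix(attributes: dict, prefix: str, fields: dict) -> dict:
    """Insert fields under the nested maps named by a dotted prefix."""
    # Trailing dots are dropped, then each remaining segment names one nested map.
    segments = [s for s in prefix.rstrip(".").split(".") if s]
    target = attributes
    for segment in segments:
        target = target.setdefault(segment, {})  # create the map only if it is missing
    target.update(fields)
    return attributes

record = {"level": "info"}
write_under_prefix(
    record,
    "enrichment.bar",
    {"region": "us-east-1", "environment": "production"},
)
print(record)
# {'level': 'info', 'enrichment': {'bar': {'region': 'us-east-1', 'environment': 'production'}}}
```

With an empty prefix the segment list is empty, so the fields land directly on the top-level map.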

### Escaping Dots

If you need to reference a flat attribute key that contains a literal dot (not a nested path), escape it with a backslash: `sawmills\.service` will look for `attributes["sawmills.service"]` as a single key rather than traversing nested maps.
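The difference between a nested path and an escaped flat key can be sketched as follows. This is an illustrative model under the assumption that attributes are plain nested dictionaries; `split_path` and `get_nested` are hypothetical names, not part of the processor.

```python
from typing import Any, Optional

def split_path(path: str) -> list[str]:
    """Split a path on unescaped dots; a backslash-escaped dot stays in the key."""
    parts, current, i = [], [], 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path) and path[i + 1] == ".":
            current.append(".")             # escaped dot: literal character in the key
            i += 2
        elif ch == ".":
            parts.append("".join(current))  # unescaped dot: segment boundary
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

def get_nested(attributes: dict, path: str) -> Optional[Any]:
    """Walk nested maps segment by segment; return None if any step is missing."""
    node: Any = attributes
    for key in split_path(path):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

nested = {"sawmills": {"service": "foo"}}
flat = {"sawmills.service": "bar"}
print(get_nested(nested, "sawmills.service"))   # foo
print(get_nested(flat, r"sawmills\.service"))   # bar
print(get_nested(flat, "sawmills.service"))     # None
```

The last call returns `None` because the unescaped path expects a `sawmills` map, which the flat record does not have.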

***

## Processor Operations

The Lookup Processor operates in the following sequence for each log record:

1. **Condition Evaluation**: If conditions are configured, the processor evaluates them. Logs that don't match the conditions are skipped.

2. **Lookup Key Extraction**: The processor extracts the lookup key value from the specified log attribute (or resource attribute/body) using the configured lookup key path.

3. **CSV Lookup**: The extracted lookup key value is used to search the CSV file's lookup column for a matching row.

4. **Enrichment**: If a match is found, all CSV columns (except the lookup column) are added to the log record at the specified target scope with the optional prefix.

5. **Conflict Handling**: If target fields already exist, the processor applies the configured conflict resolution strategy (skip or overwrite).

Each log record is processed independently. If enrichment fails for one record, the processor continues with the remaining records, so a single failure does not interrupt the pipeline.
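Steps 2 through 5 can be modeled in a few lines. The sketch below is a simplified illustration rather than the processor's code: it assumes a flat attribute name, no prefix, and the attributes target scope, and it leaves out condition evaluation. The names `load_table` and `enrich` and the sample rows are hypothetical.

```python
import csv
import io

def load_table(csv_text: str, key_column: str) -> dict:
    """Index rows by the lookup column; the first row for a key wins."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row.pop(key_column)
        table.setdefault(key, row)  # later duplicates are ignored
    return table

def enrich(attributes: dict, lookup_attr: str, table: dict, on_conflict: str = "skip") -> dict:
    key = attributes.get(lookup_attr)   # step 2: extract the lookup key value
    row = table.get(key)                # step 3: find the matching row
    if row is None:
        return attributes               # missing key or no match: pass through unchanged
    for column, value in row.items():   # step 4: add every non-key column
        if value == "":
            continue                    # empty cells are not written
        if column in attributes and on_conflict == "skip":
            continue                    # step 5: keep the existing value
        attributes[column] = value
    return attributes

# Sample data for illustration; the last row duplicates the "checkout" key.
CSV_TEXT = """service_name,region,environment,owner
checkout,us-east-1,production,team-a
payments,us-west-2,staging,team-b
checkout,eu-west-1,dev,team-z
"""
table = load_table(CSV_TEXT, "service_name")
log = {"service": "checkout", "region": "unknown"}
print(enrich(dict(log), "service", table, on_conflict="skip"))
# {'service': 'checkout', 'region': 'unknown', 'environment': 'production', 'owner': 'team-a'}
print(enrich(dict(log), "service", table, on_conflict="overwrite"))
# {'service': 'checkout', 'region': 'us-east-1', 'environment': 'production', 'owner': 'team-a'}
```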

***

## CSV Structure Requirements and Limitations

### Requirements

* **Header Row**: The CSV file must have a header row as the first line containing column names.
* **Lookup Key Column**: One column must match the configured CSV column name exactly (case-sensitive).
* **Column Count Consistency**: All data rows must have the same number of columns as the header row.
* **File Format**: The file must be a valid CSV file with proper formatting.

### Limitations

* **Maximum Columns**: 50 columns per CSV file
* **Maximum Rows**: 5,000 rows per CSV file
* **Duplicate Keys**: If multiple rows have the same lookup key value, the first matching row is used (subsequent duplicates are ignored).
* **Empty Values**: Empty CSV cell values are allowed but are skipped during enrichment (not added to log records).

### CSV File Example

```csv theme={null}
service_name,region,environment,owner
checkout,us-east-1,production,team-a
payments,us-west-2,staging,team-b
auth,eu-central-1,production,team-c
```

In this example:

* `service_name` is the lookup key column.
* `region`, `environment`, and `owner` will be added as enrichment attributes.
* The file has 4 columns and 3 data rows (within limits).
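Because the file is only read by the collector at initialization, it can help to check it against the requirements and limits above before deploying it. The following is a hedged sketch of such a pre-flight check, not a Sawmills tool: `validate_lookup_csv` is a hypothetical helper, and it assumes the 5,000-row limit counts data rows without the header.

```python
import csv
import io

MAX_COLUMNS = 50   # limit stated in this section
MAX_ROWS = 5000    # limit stated in this section (assumed to exclude the header row)

def validate_lookup_csv(csv_text: str, key_column: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty: a header row is required"]
    header, data = rows[0], rows[1:]
    if key_column not in header:  # exact, case-sensitive comparison
        problems.append(f"lookup column {key_column!r} not found in header")
    if len(header) > MAX_COLUMNS:
        problems.append(f"{len(header)} columns exceeds the limit of {MAX_COLUMNS}")
    if len(data) > MAX_ROWS:
        problems.append(f"{len(data)} rows exceeds the limit of {MAX_ROWS}")
    for row_no, row in enumerate(data, start=1):
        if len(row) != len(header):
            problems.append(
                f"data row {row_no}: expected {len(header)} columns, found {len(row)}"
            )
    return problems

SAMPLE = """service_name,region,environment,owner
checkout,us-east-1,production,team-a
payments,us-west-2,staging,team-b
auth,eu-central-1,production,team-c
"""
print(validate_lookup_csv(SAMPLE, "service_name"))  # []
```

To check a file on disk, read it with `open(path, newline="")` and pass the contents to the function.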

***

## Edge Cases and Processor Behavior

### Lookup Key Not Found in Log

**Scenario**: The log record doesn't contain the specified lookup key attribute.

**Behavior**: The processor skips enrichment for that log record and continues processing. No error is raised, and the log passes through unchanged.

### CSV Lookup Key Not Found

**Scenario**: The lookup key value from the log doesn't match any row in the CSV file.

**Behavior**: The processor skips enrichment for that log record and continues processing. No error is raised, and the log passes through unchanged.

### Empty CSV Values

**Scenario**: A CSV cell contains an empty value.

**Behavior**: Empty values are skipped during enrichment. The attribute is not added to the log record.

### Duplicate Lookup Keys in CSV

**Scenario**: Multiple rows in the CSV file have the same lookup key value.

**Behavior**: The first matching row is used. Subsequent rows with the same key are ignored.

### Body Target with Non-Map Body

**Scenario**: Enrichment target is set to "Body" but the log body is plain text (not a structured object).

**Behavior**: The processor skips enrichment for that log record. Enrichment only works when the body is already a structured map/object.

### Prefix Path Conflicts

**Scenario**: The enrichment prefix path conflicts with an existing non-map value (e.g., `enrichment` exists as a string but prefix is `enrichment.bar`).

**Behavior**: Depends on the conflict resolution setting:

* **Skip**: Processor skips enrichment and logs a debug message
* **Overwrite**: Processor creates a new map at the conflicting path, replacing the existing value
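The two outcomes can be illustrated with a short sketch, assuming attributes are plain nested dictionaries and the prefix has already been split into segments. `resolve_prefix` is a hypothetical name, and the debug logging mentioned above is omitted.

```python
from typing import Optional

def resolve_prefix(attributes: dict, segments: list[str], on_conflict: str) -> Optional[dict]:
    """Return the map to write into, or None when enrichment should be skipped."""
    node = attributes
    for segment in segments:
        if segment not in node:
            node[segment] = {}       # missing segment: create a new map
        elif not isinstance(node[segment], dict):
            if on_conflict == "skip":
                return None          # leave the record untouched
            node[segment] = {}       # overwrite: replace the non-map value with a map
        node = node[segment]
    return node

record = {"enrichment": "legacy"}
print(resolve_prefix(record, ["enrichment", "bar"], "skip"))  # None
print(record)                                                 # {'enrichment': 'legacy'}

target = resolve_prefix(record, ["enrichment", "bar"], "overwrite")
target["region"] = "us-east-1"
print(record)  # {'enrichment': {'bar': {'region': 'us-east-1'}}}
```

In this model a conflict can only appear before any new map has been created, so the skip path never leaves a partially modified record.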

### Condition Evaluation Errors

**Scenario**: An error occurs while evaluating conditions for a log record.

**Behavior**: The processor logs a debug message and skips that log record (does not enrich it).

### Target Attribute Access Errors

**Scenario**: The processor cannot access the specified target (attributes, resource, or body).

**Behavior**: The processor skips enrichment for that log record, logs a debug message, and continues processing other logs.

***

## Shared Path Mount

When the source type is **Shared Path**, the Lookup Processor reads the CSV file from a path that every collector pod instance can access. The path must be absolute on the filesystem where the collector runs.

### Path Requirements

* **Absolute Path**: Must start with `/` (e.g., `/mnt/efs/lookup.csv`).
* **File Extension**: Must end with `.csv` extension.
* **Readable**: The collector process must have read permissions for the file.
* **Persistent**: The file should be available across collector pod restarts.

The CSV file is loaded once during processor initialization. If the file changes, the collector must be restarted to reload the updated CSV data.

For setting up the shared path mount using EFS, see the [EFS Volume Setup](#efs-volume-setup-for-sawmills-collector) section below.

***

## EFS Volume Setup for Sawmills Collector

This guide explains how to configure EFS (Elastic File System) volume mounts for the Sawmills Collector using the remote-operator Helm chart.

### Overview

With the Shared Path source type, lookup files can be stored on an EFS volume. This setup mounts the EFS filesystem at `/mnt/efs` in the collector pods.

### Prerequisites

Before installing the Helm chart with EFS support, you must create the StorageClass and PersistentVolumeClaim in your cluster.

#### 1. Create the StorageClass

```bash theme={null}
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-rwx-enrichment-sawmills
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxxxxx
  basePath: "/k8s/sawmills-production-enrichment"
  directoryPerms: "0777"
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
```

> **Note:** Replace the placeholder `fs-xxxxxxxx` in `fileSystemId` with your actual EFS filesystem ID.

#### 2. Create the PersistentVolumeClaim

```bash theme={null}
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: enrichment-files
  namespace: <your-namespace>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-rwx-enrichment-sawmills
  resources:
    requests:
      storage: 500Mi
EOF
```

#### 3. Verify Resources

```bash theme={null}
# Check StorageClass
kubectl get storageclass efs-rwx-enrichment-sawmills

# Check PVC
kubectl get pvc enrichment-files -n <your-namespace>
```

### Helm Installation

#### Option A: Using `--set` flags

```bash theme={null}
helm upgrade --install sawmills-remote-operator \
  oci://public.ecr.aws/s7a5m1b4/sawmills-remote-operator-chart \
  --version 2.0.18 \
  --namespace <your-namespace> \
  --set apiKeyExistingSecret=sawmills-secret \
  --set operatorAddress=https://controller.ue1.prod.plat.sm-svc.com \
  --set collectorName="my-collector" \
  --set image.tag="0.189.0" \
  --set remoteWriteUrl="https://ingress.sawmills.ai" \
  --set selfManaged=false \
  --set 'managedChartsValues.sawmills-collector.additionalVolumes[0].name=enrichment-files' \
  --set 'managedChartsValues.sawmills-collector.additionalVolumes[0].persistentVolumeClaim.claimName=enrichment-files' \
  --set 'managedChartsValues.sawmills-collector.additionalVolumeMounts[0].name=enrichment-files' \
  --set 'managedChartsValues.sawmills-collector.additionalVolumeMounts[0].mountPath=/mnt/efs'
```

#### Option B: Using a Values File (Recommended)

Create a values file `values-efs.yaml`:

```yaml theme={null}
# values-efs.yaml
managedChartsValues:
  sawmills-collector:
    additionalVolumes:
      - name: enrichment-files
        persistentVolumeClaim:
          claimName: enrichment-files
    additionalVolumeMounts:
      - name: enrichment-files
        mountPath: /mnt/efs
        readOnly: true  # Optional: set to true if files should be read-only
```

Then run:

```bash theme={null}
helm upgrade --install sawmills-remote-operator \
  oci://public.ecr.aws/s7a5m1b4/sawmills-remote-operator-chart \
  --version 2.0.18 \
  --namespace <your-namespace> \
  --set apiKeyExistingSecret=sawmills-secret \
  --set operatorAddress=https://controller.ue1.prod.plat.sm-svc.com \
  --set collectorName="my-collector" \
  --set image.tag="0.189.0" \
  --set remoteWriteUrl="https://ingress.sawmills.ai" \
  --set selfManaged=false \
  -f values-efs.yaml
```

### Verification

After deployment, verify the EFS mount is working:

```bash theme={null}
# Check if pods are running
kubectl get pods -n <your-namespace> -l app.kubernetes.io/name=sawmills-collector

# Verify the volume mount in a pod
kubectl exec -it deployment/sawmills-collector -n <your-namespace> -c main-collector -- ls -la /mnt/efs

# Check pod describe for volume mounts
kubectl describe pod -n <your-namespace> -l app.kubernetes.io/name=sawmills-collector | grep -A5 "Mounts:"
```

### Lookup Processor Configuration

Once the EFS volume is mounted, configure the `csvenrichmentprocessor` in your collector config to use the mounted files:

```yaml theme={null}
processors:
  csvenrichmentprocessor:
    source:
      file:
        path: /mnt/efs/enrichment-files-service.csv
    lookup:
      key: sawmills.service  # Supports nested paths (e.g., attributes["sawmills"]["service"])
      target: attributes
      csv_key_column: service_name
    enrichment:
      target: attributes
      prefix: ""  # Optional prefix for enriched attributes
      on_conflict: overwrite  # or "skip"
```

### Troubleshooting

#### PVC Not Bound

```bash theme={null}
kubectl describe pvc enrichment-files -n <your-namespace>
```

Check that:

* The StorageClass exists
* The EFS CSI driver is installed
* The fileSystemId is correct

#### Mount Permission Denied

Ensure the EFS access point has correct permissions:

* `directoryPerms: "0777"` in StorageClass
* Security groups allow NFS traffic (port 2049)

#### File Not Found in Pod

```bash theme={null}
# List files in the EFS mount
kubectl exec -it deployment/sawmills-collector -n <your-namespace> -c main-collector -- ls -la /mnt/efs

# Check if the file exists
kubectl exec -it deployment/sawmills-collector -n <your-namespace> -c main-collector -- cat /mnt/efs/enrichment-files-service.csv | head -5
```

### Architecture

```
+-------------------------------------------------------------+
|                  Kubernetes Cluster                         |
|                                                             |
|  +------------------+     +------------------------------+  |
|  |   StorageClass   |     |        EFS (AWS)             |  |
|  |  efs-rwx-...     |---->|  fs-xxxxxxxx                 |  |
|  +------------------+     |  /k8s/sawmills-production-   |  |
|         |                 |   enrichment/                |  |
|         |                 +------------------------------+  |
|         |                            |                      |
|         v                            |                      |
|  +------------------+                |                      |
|  |       PVC        |                |                      |
|  | enrichment-files |                |                      |
|  +------------------+                |                      |
|         |                            |                      |
|         |                            |                      |
|         v                            v                      |
|  +-------------------------------------------------------+  |
|  |            Sawmills Collector Pod                     |  |
|  |  +--------------------------------------------------+ |  |
|  |  |      main-collector container                    | |  |
|  |  |                                                  | |  |
|  |  |  /mnt/efs/ <--- EFS Volume Mount                 | |  |
|  |  |    +-- enrichment-files-service.csv              | |  |
|  |  |                                                  | |  |
|  |  |  csvenrichmentprocessor reads from:              | |  |
|  |  |    /mnt/efs/enrichment-files-service.csv         | |  |
|  |  +--------------------------------------------------+ |  |
|  +-------------------------------------------------------+  |
|                                                             |
+-------------------------------------------------------------+
```
