
This guide walks through setting up a “send a copy to S3, query it later in Splunk when needed” flow. The shape is identical to the Datadog Logs Rehydration pattern: the Sawmills collector dual-routes logs — live traffic continues to your Splunk HEC destination for alerting and dashboards, while a full-fidelity copy lands in your own S3 bucket as hourly-partitioned .json.gz. When you need older data in Splunk, the Splunk Add-on for AWS pulls those S3 objects back into a Splunk index. This is the simplest way to support Splunk customers who want to reduce live-ingest cost without losing access to historical logs.
Sawmills writes the same archive format that Datadog’s Logs Rehydration consumes (newline-delimited JSON, gzipped, hourly partitions). The format works with Splunk Cloud and Splunk Enterprise; the only thing that changes per-customer is which Splunk read mechanism you wire up at the end.
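To see what that format looks like in practice, you can build a tiny sample archive locally and read it back the way Splunk's Generic S3 input would. The field names below are illustrative examples, not a guaranteed Sawmills schema:

```shell
# Build a two-record NDJSON archive (one JSON object per line, gzipped),
# then decompress and read it back line by line.
# Field names are illustrative examples, not a fixed schema.
printf '%s\n' \
  '{"date":"2026-05-04T00:00:01Z","service":"my-service","host":"web-1","message":"request handled"}' \
  '{"date":"2026-05-04T00:00:02Z","service":"my-service","host":"web-2","message":"request handled"}' \
  | gzip > sample.json.gz

gunzip -c sample.json.gz | wc -l     # one record per line
gunzip -c sample.json.gz | head -n 1
```

Because each line is a standalone JSON object, Splunk can extract fields from every record with no props/transforms configuration.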

Prerequisites

  • A Sawmills pipeline already shipping to a Splunk HEC destination.
  • An AWS account where you can create an S3 bucket and an IAM user.
  • Splunk Cloud or Splunk Enterprise admin access.

1. Add the AWS S3 destination in Sawmills

This is the archive side — Sawmills will write a copy of every log record to your S3 bucket alongside the live Splunk feed.
  1. Open your pipeline in Sawmills and click + Add Destination.
  2. Pick AWS S3.
  3. Fill in the form:
  • Name: e.g., splunk-rehydration-archive
  • Region: the AWS region of your bucket (e.g., us-east-1)
  • S3 Bucket: the bucket name you created (e.g., acme-splunk-archive)
  • S3 Prefix: a path inside the bucket, e.g., /splunk/archive
  • Role ARN: leave blank for static IAM keys, or set if you use cross-account assume-role
  • Output Format: NDJSON (.json.gz)
When you pick Output Format = NDJSON (.json.gz), Sawmills automatically:
  • Writes one JSON object per line (newline-delimited JSON), gzipped — the format Splunk’s Generic S3 input handles cleanly with no parsing config.
  • Partitions objects by hour with a key like dt=YYYYMMDD/hour=HH/archive_HHMMSS.NNNN.<random>.json.gz.
  • Locks Enabled Data Types to Logs only.
For the AWS credentials the collector uses to write to S3, follow the credentials section in the AWS S3 destination reference.
  4. Click Save, then deploy the pipeline.
After deploy, list the bucket to confirm objects are arriving:
aws s3 ls s3://acme-splunk-archive/splunk/archive/dt=$(date -u +%Y%m%d)/ --recursive
You should see hourly-partitioned archive_*.json.gz files appearing within a minute or two.
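Because the partition layout is deterministic, you can also compute the prefix for a specific hour and list just that slice of the bucket. A small sketch, assuming the example bucket and prefix used above:

```shell
# Compute the dt=YYYYMMDD/hour=HH prefix for the current UTC hour.
# Bucket and prefix are the example values from the table above.
BUCKET=acme-splunk-archive
PREFIX=splunk/archive
HOUR_PREFIX="${PREFIX}/dt=$(date -u +%Y%m%d)/hour=$(date -u +%H)/"
echo "s3://${BUCKET}/${HOUR_PREFIX}"

# List only that hour's archive objects:
# aws s3 ls "s3://${BUCKET}/${HOUR_PREFIX}"
```

The same prefix arithmetic is useful later when you want to scope the Splunk S3 input to a narrow time range.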

2. Create an IAM user for Splunk to read the archive

Splunk needs read access to the S3 bucket. The Splunk Add-on for AWS only accepts long-term IAM access keys (not STS temporary credentials), so create a dedicated IAM user with read-only access. The policy below grants Splunk just enough to (a) discover the bucket in its UI dropdown and (b) read objects from your archive bucket — nothing else.
aws iam create-user --user-name splunk-archive-reader

aws iam put-user-policy --user-name splunk-archive-reader \
  --policy-name S3ArchiveReadOnly \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "ListBucketsForSplunkUI",
        "Effect": "Allow",
        "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
        "Resource": "*"
      },
      {
        "Sid": "ReadArchiveBucket",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
          "arn:aws:s3:::acme-splunk-archive",
          "arn:aws:s3:::acme-splunk-archive/*"
        ]
      }
    ]
  }'

aws iam create-access-key --user-name splunk-archive-reader
The last command prints AccessKeyId (starts with AKIA…) and SecretAccessKey. Save both — AWS only shows the secret once. Store them in a secret manager.
The ListAllMyBuckets permission lets Splunk see every bucket name in the account, but data access is still scoped to the archive bucket only. If that’s not acceptable, you can omit ListAllMyBuckets and type the bucket name manually in the Splunk input form instead of selecting it from the dropdown.
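If you do drop ListAllMyBuckets, the policy reduces to the second statement alone. A sketch that templates it over the bucket name, so you can reuse the step above for other archive buckets (BUCKET is a parameter; the example value matches the rest of this guide):

```shell
# Generate the bucket-scoped read-only policy without ListAllMyBuckets.
BUCKET=acme-splunk-archive
POLICY=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadArchiveBucket",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BUCKET}", "arn:aws:s3:::${BUCKET}/*"]
    }
  ]
}
EOF
)
echo "$POLICY"
# aws iam put-user-policy --user-name splunk-archive-reader \
#   --policy-name S3ArchiveReadOnly --policy-document "$POLICY"
```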

3. Install the Splunk Add-on for AWS

In Splunk:
  1. Go to Apps → Find more apps.
  2. Search for Splunk Add-on for Amazon Web Services.
  3. Click Install and follow the prompts. (On Splunk Cloud you may need to log in with your splunk.com account when prompted.)

4. Add the AWS account in the add-on

  1. Go to Apps → Splunk Add-on for AWS → Configuration → Account → Add.
  2. Fill in:
  • Name: a label (e.g., acme-archive)
  • Key ID: the AKIA… access key from step 2
  • Secret Key: the matching secret access key
  • Region Category: Global (unless your bucket is in GovCloud or China)
  3. Click Add. Splunk validates the credentials before saving.

5. Create the Generic S3 input

This is the read side — Splunk crawls the archive bucket and ingests files into a chosen index.
  1. Go to Apps → Splunk Add-on for AWS → Inputs → Create New Input → Custom Data Type → Generic S3.
  2. Fill in:
  • Name: e.g., splunk-rehydration-archive
  • AWS Account: the account you added in step 4
  • Assume Role: leave empty
  • AWS Region: your bucket’s region
  • S3 Bucket: the archive bucket (e.g., acme-splunk-archive)
  • S3 Key Prefix: a narrow prefix for the first run (e.g., splunk/archive/dt=20260504/); broaden it later
  • Polling Interval: 30 (default)
  • Initial Scan Datetime: a timestamp slightly before the oldest object you want to ingest, e.g., 2026-05-04T00:00:00Z
  • Sourcetype: _json (auto-extracts JSON fields with no parsing config)
  • Index: the Splunk index to ingest into (e.g., main)
  3. Click Add. The input begins polling; the first scan typically starts within a minute.

6. Verify the data in Splunk

Open Search & Reporting and run:
index=main sourcetype=_json earliest=-1h
| head 20
Within a couple of minutes you should see records with auto-extracted fields: date, service, host, message, tags{}, and any attributes.* keys you had on the originating logs. To confirm a specific service or host:
index=main sourcetype=_json service="my-service" earliest=-1h
| stats count by host
If you see zero events:
  • Check Apps → Splunk Add-on for AWS → Inputs, click the input name, and look for status / errors.
  • Search the add-on’s internal logs:
    index=_internal source=*splunk_ta_aws* log_level=ERROR earliest=-1h
    
  • Verify the IAM keys are still valid by checking the Configuration → Account tab.

Other ways to read S3 archives from Splunk

The Generic S3 input is the most universal option — it works on every Splunk tier, needs no special licensing, and accepts the format Sawmills writes. The trade-off is that it physically ingests S3 objects into a Splunk index, counting against your daily ingest license. A few alternatives, in case they fit your environment better:
  • Federated Search for Amazon S3 (FSS3): Splunk Cloud on AWS only. Queries the archive in place via an AWS Glue table, never ingests it. Closest UX to Datadog Logs Rehydration. Requires a Data Scan Unit license entitlement and a Glue table over the bucket. Best for ad-hoc forensic searches over large archives where you don’t want to pay for ingest.
  • Splunk Cloud Data Manager → “Promote data from AWS S3”: Splunk Cloud on AWS only. A first-party “true rehydration” feature that ingests a chosen prefix and time range into a dedicated infinite-retention index in one click. Per-job, not continuous.
  • SQS-based S3 input: same Splunk Add-on for AWS, different input type. Replaces polling with S3 → SNS → SQS event notifications, giving you live-tail latency on new objects. Splunk publishes an open-source crawler that emits synthetic SQS messages so you can backfill historical objects through the same path. Recommended over the Generic S3 input for ongoing tail at scale.
  • AWS Lambda → Splunk HEC: for environments where neither the Splunk Add-on nor FSS3 is available. S3 → SQS → Lambda function that POSTs to the HEC endpoint. Most code you’d own, but it works against any Splunk instance that exposes HEC.
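The core of that Lambda is a small transform: wrap each archived NDJSON line in a HEC event envelope, then POST the batch to the collector endpoint. A shell sketch of the same transform for illustration; the HEC URL and token are placeholders, and the S3 fetch is shown commented out:

```shell
# Stand-in for the archived object; normally this would come from e.g.
#   aws s3 cp "s3://acme-splunk-archive/<key>" - | gunzip > batch.ndjson
printf '%s\n' \
  '{"date":"2026-05-04T00:00:01Z","service":"my-service","message":"hello"}' \
  > batch.ndjson

# Wrap each record in a HEC event envelope. HEC accepts a batch of
# newline-separated event objects in a single POST body.
awk '{printf "{\"event\": %s, \"sourcetype\": \"_json\"}\n", $0}' \
  batch.ndjson > hec_batch.ndjson
cat hec_batch.ndjson

# POST the batch (URL and token are placeholders):
# curl -sS "https://splunk.example.com:8088/services/collector/event" \
#   -H "Authorization: Splunk ${HEC_TOKEN}" \
#   --data-binary @hec_batch.ndjson
```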
Pick the Generic S3 input first to validate the pipeline end-to-end. You can always layer FSS3 or SQS-based input on top of the same archive bucket later — the Sawmills-side configuration doesn’t change.