This guide walks through setting up a “send a copy to S3, query it later in Splunk when needed” flow. The shape is identical to the Datadog Logs Rehydration pattern: the Sawmills collector dual-routes logs — live traffic continues to your Splunk HEC destination for alerting and dashboards, while a full-fidelity copy lands in your own S3 bucket as hourly-partitioned .json.gz objects. When you need older data in Splunk, the Splunk Add-on for AWS pulls those S3 objects back into a Splunk index.
This is the simplest way to support Splunk customers who want to reduce live-ingest cost without losing access to historical logs.
Sawmills writes the same archive format that Datadog’s Logs Rehydration consumes (newline-delimited JSON, gzipped, hourly partitions). The format works with Splunk Cloud and Splunk Enterprise; the only thing that changes per-customer is which Splunk read mechanism you wire up at the end.
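For orientation, each line in an archive object is one JSON log record. The example below is purely illustrative; the real set of tags and attributes.* keys depends on what your pipeline sends:

```json
{"date":"2026-05-04T14:03:27Z","service":"checkout","host":"ip-10-0-3-17","message":"payment authorized","tags":["env:prod"],"attributes":{"http.status_code":200,"duration_ms":143}}
```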
Prerequisites
- A Sawmills pipeline already shipping to a Splunk HEC destination.
- An AWS account where you can create an S3 bucket and an IAM user.
- Splunk Cloud or Splunk Enterprise admin access.
1. Add the AWS S3 destination in Sawmills
This is the archive side — Sawmills will write a copy of every log record to your S3 bucket alongside the live Splunk feed.
- Open your pipeline in Sawmills and click + Add Destination.
- Pick AWS S3.
- Fill in the form:
| Field | Value |
|---|---|
| Name | e.g., splunk-rehydration-archive |
| Region | The AWS region of your bucket (e.g., us-east-1) |
| S3 Bucket | The bucket name you created (e.g., acme-splunk-archive) |
| S3 Prefix | A path inside the bucket, e.g., /splunk/archive |
| Role ARN | Leave blank for static IAM keys, or set if you use cross-account assume-role |
| Output Format | NDJSON (.json.gz) |
With Output Format set to NDJSON (.json.gz), Sawmills automatically:
- Writes one JSON object per line (newline-delimited JSON), gzipped — the format Splunk’s Generic S3 input handles cleanly with no parsing config.
- Partitions objects by hour with a key like dt=YYYYMMDD/hour=HH/archive_HHMMSS.NNNN.<random>.json.gz.
- Locks Enabled Data Types to Logs only.
- Click Save, then deploy the pipeline.
Once the pipeline is deployed, check the bucket under your prefix: you should see archive_*.json.gz files appearing within a minute or two.
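If you have the AWS CLI available, a quick spot check (the bucket and prefix are the example values from the table above; substitute your own):

```bash
# List the most recent archive objects under the configured prefix
aws s3 ls s3://acme-splunk-archive/splunk/archive/ --recursive | tail -5
```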
2. Create an IAM user for Splunk to read the archive
Splunk needs read access to the S3 bucket. The Splunk Add-on for AWS only accepts long-term IAM access keys (not STS temporary credentials), so create a dedicated IAM user with read-only access. The policy below grants Splunk just enough to (a) discover the bucket in its UI dropdown and (b) read objects from your archive bucket — nothing else. After attaching the policy, create an access key for the user: AWS returns an AccessKeyId (starts with AKIA…) and a SecretAccessKey. Save both — AWS only shows the secret once. Store them in a secret manager.
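A minimal sketch of that read-only policy. The bucket name acme-splunk-archive is the example from step 1; adjust it to your bucket and tighten further if your security review requires it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DiscoverBuckets",
      "Effect": "Allow",
      "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
      "Resource": "*"
    },
    {
      "Sid": "ReadArchive",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::acme-splunk-archive",
        "arn:aws:s3:::acme-splunk-archive/*"
      ]
    }
  ]
}
```

Attach it to the new IAM user as an inline or customer-managed policy before creating the access key.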
3. Install the Splunk Add-on for AWS
In Splunk:
- Go to Apps → Find more apps.
- Search for Splunk Add-on for Amazon Web Services.
- Click Install and follow the prompts. (On Splunk Cloud you may need to log in with your splunk.com account when prompted.)
4. Add the AWS account in the add-on
- Go to Apps → Splunk Add-on for AWS → Configuration → Account → Add.
- Fill in:
| Field | Value |
|---|---|
| Name | A label (e.g., acme-archive) |
| Key ID | The AKIA… access key from step 2 |
| Secret Key | The matching secret access key |
| Region Category | Global (unless your bucket is in GovCloud or China) |
- Click Add. Splunk validates the credentials before saving.
5. Create the Generic S3 input
This is the read side — Splunk crawls the archive bucket and ingests files into a chosen index.
- Go to Apps → Splunk Add-on for AWS → Inputs → Create New Input → Custom Data Type → Generic S3.
- Fill in:
| Field | Value |
|---|---|
| Name | e.g., splunk-rehydration-archive |
| AWS Account | The account you added in step 4 |
| Assume Role | Leave empty |
| AWS Region | Your bucket’s region |
| S3 Bucket | The archive bucket (e.g., acme-splunk-archive) |
| S3 Key Prefix | A narrow prefix for the first run, e.g., splunk/archive/dt=20260504/ — broaden later |
| Polling Interval | 30 (default) |
| Initial Scan Datetime | A timestamp slightly before the oldest object you want to ingest, e.g., 2026-05-04T00:00:00Z |
| Sourcetype | _json (auto-extracts JSON fields with no parsing config) |
| Index | The Splunk index to ingest into (e.g., main) |
- Click Add. The input begins polling; the first scan typically starts within a minute.
6. Verify rehydration in Splunk Search
Open Search & Reporting and run a search over the index you chose in step 5. Rehydrated events should carry the same fields as the originating logs: date, service, host, message, tags{}, and any attributes.* keys. To confirm a specific service or host, add a filter on those fields to the example search below.
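A hedged example using the values from step 5 (index main, sourcetype _json); widen earliest to cover whatever time range you archived:

```
index=main sourcetype=_json earliest=-30d
| table _time, service, host, message
```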
If events don't show up:
- Check Apps → Splunk Add-on for AWS → Inputs, click the input name, and look for status / errors.
- Search the add-on's internal logs in index=_internal for S3 input errors.
- Verify the IAM keys are still valid by checking the Configuration → Account tab.
Other ways to read S3 archives from Splunk
The Generic S3 input is the most universal option — it works on every Splunk tier, needs no special licensing, and accepts the format Sawmills writes. The trade-off is that it physically ingests S3 objects into a Splunk index, counting against your daily ingest license. A few alternatives, in case they fit your environment better:
- Federated Search for Amazon S3 (FSS3) — Splunk Cloud on AWS only. Queries the archive in place via an AWS Glue table, never ingests it. Closest UX to Datadog Logs Rehydration. Requires a Data Scan Unit license entitlement and a Glue table over the bucket. Best for ad-hoc forensic searches over large archives where you don’t want to pay for ingest.
- Splunk Cloud Data Manager → “Promote data from AWS S3” — Splunk Cloud on AWS only. A first-party “true rehydration” feature that ingests a chosen prefix and time range into a dedicated infinite-retention index in one click. Per-job, not continuous.
- SQS-based S3 input — same Splunk Add-on for AWS, different input type. Replaces polling with S3 → SNS → SQS event notifications, giving you live-tail latency on new objects. Splunk publishes an open-source crawler that emits synthetic SQS messages so you can backfill historical objects through the same path. Recommended over the Generic S3 input for ongoing tail at scale.
- AWS Lambda → Splunk HEC — for environments where neither the Splunk Add-on nor FSS3 is available. S3 → SQS → Lambda function that POSTs to the HEC endpoint. Most code you’d own, but it works against any Splunk instance that exposes HEC; a minimal sketch follows.
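To make that last option concrete, here is a minimal Python sketch of such a Lambda. It assumes S3 event notifications delivered through SQS and two environment variables, HEC_URL (e.g. https://splunk.example.com:8088/services/collector/event) and HEC_TOKEN; every name in it is illustrative rather than something Sawmills or Splunk provides:

```python
import gzip
import json
import os
import urllib.request
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
HEC_URL = os.environ["HEC_URL"]      # e.g. https://splunk.example.com:8088/services/collector/event
HEC_TOKEN = os.environ["HEC_TOKEN"]  # a HEC token allowed to write to the target index


def handler(event, context):
    """Triggered by SQS messages that carry S3 event notifications for new archive objects."""
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])  # the SQS body wraps the S3 notification JSON
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = unquote_plus(s3_record["s3"]["object"]["key"])  # object keys arrive URL-encoded

            # Fetch and decompress one hourly archive object (NDJSON, gzipped).
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            lines = gzip.decompress(body).decode("utf-8").splitlines()

            # HEC accepts multiple events per request, each wrapped as {"event": ...}.
            payload = "\n".join(
                json.dumps({"event": json.loads(line)}) for line in lines if line.strip()
            )
            req = urllib.request.Request(
                HEC_URL,
                data=payload.encode("utf-8"),
                headers={"Authorization": f"Splunk {HEC_TOKEN}"},
                method="POST",
            )
            with urllib.request.urlopen(req) as resp:
                resp.read()  # non-2xx responses raise urllib.error.HTTPError
```

A production version would add retries, a dead-letter queue, and payload chunking to stay under HEC's configured maximum request size, but the core loop is just: read the object, gunzip, wrap each NDJSON line in {"event": …}, POST.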