
Supported Data Types

📘 Logs

Configuration

| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| Name | String | none | true | Unique identifier within Sawmills. |
| Region | String | us-east-1 | true | AWS region for the table bucket. |
| Table Bucket ARN | String | none | true | ARN of the existing S3 Tables table bucket. |
| Namespace | String | none | true | Namespace inside the table bucket. |
| Table Name | String | none | true | Existing table name inside the namespace. |

Prerequisites

  • An existing S3 Tables table. Sawmills does not create the table for you in v1.
  • A Sawmills Collector instance with AWS access configured.
  • S3 Tables REST control-plane access for the target table bucket.
  • Table-scoped S3 Tables write access for the target table or table bucket, plus data-plane access to the managed warehouse bucket used by that table.
  • If you will query the same table from Snowflake, create the Snowflake catalog integration and AWS IAM role before you depend on AUTO_REFRESH.

Notes

  • The current S3 Tables destination is logs-only.
  • Snowflake reads from the same Iceberg table through the AWS Glue catalog path after the table is provisioned correctly.

Create The Table

For the current native writer, create an Iceberg table that matches the fixed Sawmills v1 contract exactly. Recommended naming:
  • Namespace: telemetry_logs
  • Table name: logs_service_hour_v1

Required schema

Use these 16 columns:
| Column | Type | Notes |
|---|---|---|
| ts | timestamp | Event timestamp in microseconds. |
| ts_raw | string | Raw timestamp as text. |
| service | string | Lowercased service name. |
| status | string | Log status / severity. |
| message_text | string | Rendered message body. |
| host | string | Host name. |
| source | string | Log source. |
| trace_id | string | Trace ID when present. |
| span_id | string | Span ID when present. |
| schema_version | string | Current value is snowflake_v1. |
| body_json_text | string | Optional JSON body payload. |
| attributes_hot_text | string | Canonical JSON text. |
| attributes_cold_text | string | Canonical JSON text. |
| tags_hot_text | string | Canonical JSON text. |
| tags_cold_text | string | Canonical JSON text. |
| source_file | string | File / source marker. |
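The four attributes/tags columns all carry canonical JSON text. The exact canonical form Sawmills emits is not specified here, so treat the following as an assumed sketch of one common canonicalization (sorted keys, compact separators), useful for producing comparable values in your own tooling:

```python
import json

def canonical_json_text(obj: dict) -> str:
    """Serialize a dict deterministically: sorted keys, no extra whitespace.

    NOTE: this is an assumed canonical form for illustration, not the
    documented Sawmills canonicalization.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

print(canonical_json_text({"env": "prod", "az": "us-east-1a"}))
# → {"az":"us-east-1a","env":"prod"}
```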

Required partition spec

Partition the table by:
  • service using the identity transform
  • ts using the hour transform, with partition field name ts_hour
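Iceberg's hour transform maps a timestamp to the number of whole hours since the Unix epoch. A minimal sketch of the partition value a writer computes for ts_hour, assuming ts is stored in microseconds as the schema above states:

```python
from datetime import datetime, timezone

def hour_partition(ts_micros: int) -> int:
    """Iceberg hour transform: whole hours since 1970-01-01T00:00:00Z."""
    return ts_micros // (3_600 * 1_000_000)

# Example: an event at 2024-05-01T13:45:00Z lands in the 13:00 hour bucket.
ts = int(datetime(2024, 5, 1, 13, 45, tzinfo=timezone.utc).timestamp() * 1_000_000)
print(hour_partition(ts))
```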

Example AWS CLI create flow

The AWS CLI supports creating S3 Tables Iceberg tables directly. Save the metadata below as s3tables-schema.json and create the table with:
aws s3tables create-table \
  --region us-east-1 \
  --table-bucket-arn arn:aws:s3tables:us-east-1:123456789012:bucket/your-table-bucket \
  --namespace telemetry_logs \
  --name logs_service_hour_v1 \
  --format ICEBERG \
  --metadata file://s3tables-schema.json
This command requires at least:
  • s3tables:CreateTable
  • s3tables:PutTableData when you pass --metadata
Example metadata file:
{
  "iceberg": {
    "schema": {
      "fields": [
        { "id": 1, "name": "ts", "type": "timestamp", "required": false },
        { "id": 2, "name": "ts_raw", "type": "string", "required": false },
        { "id": 3, "name": "service", "type": "string", "required": false },
        { "id": 4, "name": "status", "type": "string", "required": false },
        { "id": 5, "name": "message_text", "type": "string", "required": false },
        { "id": 6, "name": "host", "type": "string", "required": false },
        { "id": 7, "name": "source", "type": "string", "required": false },
        { "id": 8, "name": "trace_id", "type": "string", "required": false },
        { "id": 9, "name": "span_id", "type": "string", "required": false },
        { "id": 10, "name": "schema_version", "type": "string", "required": false },
        { "id": 11, "name": "body_json_text", "type": "string", "required": false },
        { "id": 12, "name": "attributes_hot_text", "type": "string", "required": false },
        { "id": 13, "name": "attributes_cold_text", "type": "string", "required": false },
        { "id": 14, "name": "tags_hot_text", "type": "string", "required": false },
        { "id": 15, "name": "tags_cold_text", "type": "string", "required": false },
        { "id": 16, "name": "source_file", "type": "string", "required": false }
      ]
    },
    "partitionSpec": {
      "specId": 0,
      "fields": [
        { "sourceId": 3, "transform": "identity", "name": "service", "fieldId": 1000 },
        { "sourceId": 1, "transform": "hour", "name": "ts_hour", "fieldId": 1001 }
      ]
    }
  }
}
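Because the native writer expects the fixed v1 contract exactly, it can be worth sanity-checking the metadata file locally before running create-table. A small checker sketch (the checks mirror the contract above; the file name matches the CLI example):

```python
import json

def check_v1_contract(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    fields = meta["iceberg"]["schema"]["fields"]
    if len(fields) != 16:
        problems.append(f"expected 16 columns, found {len(fields)}")
    by_name = {f["name"]: f["id"] for f in fields}
    spec = {(p["sourceId"], p["transform"])
            for p in meta["iceberg"]["partitionSpec"]["fields"]}
    if (by_name.get("service"), "identity") not in spec:
        problems.append("missing identity(service) partition")
    if (by_name.get("ts"), "hour") not in spec:
        problems.append("missing hour(ts) partition")
    return problems

if __name__ == "__main__":
    try:
        with open("s3tables-schema.json") as f:
            print(check_v1_contract(json.load(f)) or "metadata matches the v1 contract")
    except FileNotFoundError:
        print("save the metadata as s3tables-schema.json first")
```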

AWS-side permissions

The collector needs:
  • S3 Tables REST access to the table bucket
  • table-scoped write permissions such as s3tables:PutTableData on the target table or table bucket
  • any additional S3 Tables actions your provisioning path needs, such as s3tables:GetTable during validation or planning
  • direct S3 access to the managed warehouse bucket behind the table
If you plan to query the same table from Snowflake through the Glue catalog path, you also need to grant the Snowflake side access to:
  • the AWS Glue catalog entry for the table
  • Lake Formation DESCRIBE and SELECT on the target table if Lake Formation is enforcing permissions

Example collector IAM

The collector needs both S3 Tables control-plane access and direct S3 data-plane access to the managed warehouse bucket for the table. Attach the AWS-managed policy AmazonS3TablesFullAccess, whose permissions are equivalent to:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3tables:*"
      ],
      "Resource": "*"
    }
  ]
}
Then add an inline policy for the managed warehouse bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WarehouseBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<managed-warehouse-bucket-name>"
      ]
    },
    {
      "Sid": "WarehouseObjects",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::<managed-warehouse-bucket-name>/*"
      ]
    }
  ]
}
You can discover the managed warehouse bucket after table creation:
aws s3tables get-table \
  --region us-east-1 \
  --table-bucket-arn arn:aws:s3tables:us-east-1:<aws-account-id>:bucket/<table-bucket-name> \
  --namespace telemetry_logs \
  --name logs_service_hour_v1 \
  --query '{tableARN:tableARN,metadataLocation:metadataLocation,warehouseLocation:warehouseLocation}' \
  --output json
The warehouseLocation value looks like:
s3://<managed-warehouse-bucket-name>
Use that bucket name in the inline IAM policy above.
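The only part of warehouseLocation the inline policy needs is the bucket name. A small helper sketch that extracts it and renders the two Resource ARNs used above (the bucket name in the example is a placeholder):

```python
from urllib.parse import urlparse

def warehouse_bucket(warehouse_location: str) -> str:
    """Extract the bucket name from a warehouseLocation URI like s3://<bucket>."""
    parsed = urlparse(warehouse_location)
    if parsed.scheme != "s3":
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    return parsed.netloc

bucket = warehouse_bucket("s3://example-warehouse-bucket")  # placeholder value
print(f"arn:aws:s3:::{bucket}")    # bucket-level Resource (ListBucket, GetBucketLocation)
print(f"arn:aws:s3:::{bucket}/*")  # object-level Resource (Get/Put/DeleteObject)
```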

Query From Snowflake

Snowflake reads S3 Tables through the AWS Glue Iceberg REST catalog endpoint. The path validated end to end uses:
  • AWS IAM role for Snowflake SigV4 auth
  • Snowflake catalog integration with CATALOG_SOURCE = ICEBERG_REST
  • Snowflake Iceberg table object pointing at CATALOG_TABLE_NAME and CATALOG_NAMESPACE
  • explicit GRANT USAGE ON INTEGRATION
  • explicit ALTER ICEBERG TABLE ... SET AUTO_REFRESH = TRUE

1. Create the AWS IAM role Snowflake will assume

Create a role in your AWS account with permissions for Glue and, if Lake Formation is enabled, the matching Lake Formation table permissions. Permissions policy example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueRead",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalog",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersion",
        "glue:GetTableVersions",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
You can create the role first with a placeholder trust relationship, then tighten it after Snowflake returns the final API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID. Initial trust policy example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<snowflake-aws-account-id>:user/<snowflake-api-user>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<snowflake-external-id>"
        }
      }
    }
  ]
}

2. Create the Snowflake catalog integration

Use ICEBERG_REST, not the older GLUE catalog integration shape.
USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE CATALOG INTEGRATION MY_S3TABLES_GLUE_REST_INT
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'telemetry_logs'
  REFRESH_INTERVAL_SECONDS = 30
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-east-1.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    WAREHOUSE = '<aws-account-id>:s3tablescatalog/<table-bucket-name>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::<aws-account-id>:role/<snowflake-s3tables-role>'
    SIGV4_SIGNING_REGION = 'us-east-1'
  )
  ENABLED = TRUE;
Important:
  • In REST_CONFIG, specify WAREHOUSE and do not also specify CATALOG_NAME.
  • The WAREHOUSE format is <aws-account-id>:s3tablescatalog/<table-bucket-name>.
Immediately inspect the integration:
DESCRIBE INTEGRATION MY_S3TABLES_GLUE_REST_INT;
Record these properties:
  • API_AWS_IAM_USER_ARN
  • API_AWS_EXTERNAL_ID
  • REST_CONFIG
  • REST_AUTHENTICATION
Then update the AWS IAM role trust policy so the principal and external ID exactly match the values returned by DESCRIBE INTEGRATION.
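The tightened trust policy is the same shape as the initial one, with the two DESCRIBE INTEGRATION values substituted in. A small sketch that renders it ready for `aws iam update-assume-role-policy` (the ARN and external ID below are hypothetical placeholders):

```python
import json

def trust_policy(api_aws_iam_user_arn: str, api_aws_external_id: str) -> str:
    """Render the role trust policy from DESCRIBE INTEGRATION output."""
    doc = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": api_aws_iam_user_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": api_aws_external_id}},
        }],
    }
    return json.dumps(doc, indent=2)

# Hypothetical values; substitute what DESCRIBE INTEGRATION actually returned.
print(trust_policy("arn:aws:iam::111122223333:user/example-api-user",
                   "EXAMPLE_EXTERNAL_ID"))
```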

3. Grant Lake Formation permissions

If Lake Formation is enabled, grant DESCRIBE and SELECT on the table to the Snowflake AWS role you used in SIGV4_IAM_ROLE.
aws lakeformation grant-permissions \
  --region us-east-1 \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::<aws-account-id>:role/<snowflake-s3tables-role> \
  --resource '{
    "Table": {
      "CatalogId": "<aws-account-id>",
      "DatabaseName": "telemetry_logs",
      "Name": "logs_service_hour_v1"
    }
  }' \
  --permissions DESCRIBE SELECT
If your Snowflake environment uses a Snowflake-managed AWS IAM user for the integration handshake, grant the same permissions to that user too:
aws lakeformation grant-permissions \
  --region us-east-1 \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::<snowflake-aws-account-id>:user/<snowflake-api-user> \
  --resource '{
    "Table": {
      "CatalogId": "<aws-account-id>",
      "DatabaseName": "telemetry_logs",
      "Name": "logs_service_hour_v1"
    }
  }' \
  --permissions DESCRIBE SELECT

4. Create the Snowflake Iceberg table object

Create the database, schema, and query role first if they do not already exist:
USE ROLE ACCOUNTADMIN;

CREATE DATABASE IF NOT EXISTS OBSERVABILITY;
CREATE SCHEMA IF NOT EXISTS OBSERVABILITY.ICEBERG;
CREATE ROLE IF NOT EXISTS OBSERVABILITY_ICEBERG_READER;

GRANT USAGE ON DATABASE OBSERVABILITY TO ROLE OBSERVABILITY_ICEBERG_READER;
GRANT USAGE ON SCHEMA OBSERVABILITY.ICEBERG TO ROLE OBSERVABILITY_ICEBERG_READER;
Then create the Iceberg table object:
USE ROLE ACCOUNTADMIN;
USE DATABASE OBSERVABILITY;
USE SCHEMA ICEBERG;

CREATE OR REPLACE ICEBERG TABLE LOGS_SERVICE_HOUR_V1
  CATALOG = 'MY_S3TABLES_GLUE_REST_INT'
  CATALOG_TABLE_NAME = 'logs_service_hour_v1'
  CATALOG_NAMESPACE = 'telemetry_logs';
If another role will own or query this table, grant it integration usage explicitly:
GRANT USAGE ON INTEGRATION MY_S3TABLES_GLUE_REST_INT TO ROLE OBSERVABILITY_ICEBERG_READER;
GRANT SELECT ON TABLE OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1 TO ROLE OBSERVABILITY_ICEBERG_READER;
Then switch to the role that will operate on the table and enable refresh:
USE ROLE OBSERVABILITY_ICEBERG_READER;

ALTER ICEBERG TABLE OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1
  SET AUTO_REFRESH = TRUE;
Why document it this way:
  • the table object can exist while AUTO_REFRESH is still off
  • the reader can stop on an integration-permission issue
  • the exact fixes were GRANT USAGE ON INTEGRATION ... and ALTER ICEBERG TABLE ... SET AUTO_REFRESH = TRUE

5. Validate Snowflake refresh health

Run this before debugging the collector:
SELECT SYSTEM$AUTO_REFRESH_STATUS(
  'OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1'
);
Read these fields carefully:
  • executionState
  • currentSnapshotId
  • lastSnapshotTime
  • pendingSnapshotCount
  • invalidExecutionStateReason
Healthy example:
  • executionState = RUNNING
Common failure patterns:
  • GENERALIZED_PIPE_STOPPED
  • invalidExecutionStateReason mentioning integration access control
  • pendingSnapshotCount growing faster than lastSnapshotTime advances
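SYSTEM$AUTO_REFRESH_STATUS returns a JSON string, so the checks above are easy to automate. A triage sketch using the field names listed above; the sample payload is illustrative (not captured output), and the backlog threshold of 100 is an arbitrary assumption to tune for your environment:

```python
import json

def triage(status_json: str) -> str:
    """Classify SYSTEM$AUTO_REFRESH_STATUS output into a one-line verdict."""
    s = json.loads(status_json)
    if s.get("executionState") != "RUNNING":
        # invalidExecutionStateReason is usually the fastest path to the real issue.
        return f"unhealthy: {s.get('invalidExecutionStateReason', 'no reason reported')}"
    if s.get("pendingSnapshotCount", 0) > 100:  # assumed threshold, tune as needed
        return "running but falling behind: reduce writer commit frequency"
    return "healthy"

sample = json.dumps({"executionState": "RUNNING", "pendingSnapshotCount": 0})
print(triage(sample))  # → healthy
```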

6. Query it

Example:
USE ROLE OBSERVABILITY_ICEBERG_READER;

SELECT
  service,
  message_text,
  schema_version,
  ts_raw
FROM OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1
WHERE service = 'your_service_name'
ORDER BY ts DESC
LIMIT 100;
If this query fails while the collector is writing successfully, re-check:
  • the table schema matches the 16-column contract above
  • the partition spec is service + hour(ts)
  • AUTO_REFRESH is enabled on the Snowflake table object
  • GRANT USAGE ON INTEGRATION ... was applied to the role you are using
  • Lake Formation grants are present
  • the Snowflake catalog integration trust relationship is valid
  • SYSTEM$AUTO_REFRESH_STATUS(...) is healthy

7. Production operating notes

For a production table, the main operational risk is usually Snowflake refresh lag rather than collector write correctness. Watch these regularly:
  • SYSTEM$AUTO_REFRESH_STATUS(...)
  • executionState
  • lastSnapshotTime
  • pendingSnapshotCount
If executionState is healthy but pendingSnapshotCount keeps growing, Snowflake is falling behind the table’s snapshot churn. In that case:
  • reduce commit frequency from writers if possible
  • avoid unnecessary tiny commits
  • verify the Snowflake reader role still has USAGE on the integration
  • verify Lake Formation access has not drifted
If executionState is not healthy, inspect invalidExecutionStateReason first. That field is usually the fastest path to the real issue.

AWS Credential Configuration

Sawmills Collector runs on Kubernetes. Prefer IAM-based access, such as IRSA on EKS, so the collector can reach both the S3 Tables API and the managed warehouse bucket. If you use a pod role or node role, make sure that role includes:
  • AmazonS3TablesFullAccess
  • the inline managed-warehouse-bucket policy shown above
Only use static credentials if your environment cannot provide a workload identity.
extraEnvs:
  - name: AWS_ACCESS_KEY_ID
    value: "<YOUR AWS ACCESS KEY>"
  - name: AWS_SECRET_ACCESS_KEY
    value: "<YOUR AWS SECRET ACCESS KEY>"
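If static credentials are unavoidable, prefer referencing a Kubernetes Secret rather than inlining values in the Helm values file. A sketch assuming a Secret named aws-credentials (hypothetical name and keys) in the collector's namespace:

```yaml
extraEnvs:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: secret-access-key
```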