
Supported Data Types

📘 Logs

Configuration

| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| Name | String | none | true | Unique identifier within Sawmills. |
| Region | String | us-east-1 | true | AWS region for the table bucket. |
| Table Bucket ARN | String | none | true | ARN of the existing S3 Tables table bucket. |
| Namespace | String | none | true | Namespace inside the table bucket. |
| Table Name | String | none | true | Existing table name inside the namespace. |

Prerequisites

  • An existing S3 Tables table. Sawmills does not create the table for you in v1.
  • A Sawmills Collector instance with AWS access configured.
  • S3 Tables REST control-plane access for the target table bucket.
  • Table-scoped S3 Tables write access for the target table or table bucket, plus data-plane access to the managed warehouse bucket used by that table.
  • If you will query the same table from Snowflake, create the Snowflake catalog integration and AWS IAM role before you depend on AUTO_REFRESH.

Notes

  • The current S3 Tables destination is logs-only.
  • Snowflake reads from the same Iceberg table through the AWS Glue catalog path after the table is provisioned correctly.

Create The Table

For the current native writer, create an Iceberg table that matches the fixed Sawmills v1 contract exactly. Recommended naming:
  • Namespace: telemetry_logs
  • Table name: logs_service_hour_v1

Required schema

Use these 16 columns:
| Column | Type | Notes |
|---|---|---|
| ts | timestamp | Event timestamp in microseconds. |
| ts_raw | string | Raw timestamp as text. |
| service | string | Lowercased service name. |
| status | string | Log status / severity. |
| message_text | string | Rendered message body. |
| host | string | Host name. |
| source | string | Log source. |
| trace_id | string | Trace ID when present. |
| span_id | string | Span ID when present. |
| schema_version | string | Current value is snowflake_v1. |
| body_json_text | string | Optional JSON body payload. |
| attributes_hot_text | string | Canonical JSON text. |
| attributes_cold_text | string | Canonical JSON text. |
| tags_hot_text | string | Canonical JSON text. |
| tags_cold_text | string | Canonical JSON text. |
| source_file | string | File / source marker. |
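The four attributes/tags columns all carry canonical JSON text. The exact canonical form Sawmills emits is not specified here, so treat the following as an assumed sketch of one common canonicalization (sorted keys, compact separators), useful for producing comparable values in your own tooling:

```python
import json

def canonical_json_text(obj: dict) -> str:
    """Serialize a dict deterministically: sorted keys, no extra whitespace.

    NOTE: this is an assumed canonical form for illustration, not the
    documented Sawmills canonicalization.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

print(canonical_json_text({"env": "prod", "az": "us-east-1a"}))
# → {"az":"us-east-1a","env":"prod"}
```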

Required partition spec

Partition the table by:
  • service using the identity transform
  • ts using the hour transform, with partition field name ts_hour
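Iceberg's hour transform maps a timestamp to the number of whole hours since the Unix epoch. A minimal sketch of the partition value a writer computes for ts_hour, assuming ts is stored in microseconds as the schema above states:

```python
from datetime import datetime, timezone

def hour_partition(ts_micros: int) -> int:
    """Iceberg hour transform: whole hours since 1970-01-01T00:00:00Z."""
    return ts_micros // (3_600 * 1_000_000)

# Example: an event at 2024-05-01T13:45:00Z lands in the 13:00 hour bucket.
ts = int(datetime(2024, 5, 1, 13, 45, tzinfo=timezone.utc).timestamp() * 1_000_000)
print(hour_partition(ts))
```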

Example AWS CLI create flow

The AWS CLI supports creating S3 Tables Iceberg tables directly. Save the metadata below as s3tables-schema.json and create the table with:
aws s3tables create-table \
  --region us-east-1 \
  --table-bucket-arn arn:aws:s3tables:us-east-1:123456789012:bucket/your-table-bucket \
  --namespace telemetry_logs \
  --name logs_service_hour_v1 \
  --format ICEBERG \
  --metadata file://s3tables-schema.json
This command requires at least:
  • s3tables:CreateTable
  • s3tables:PutTableData when you pass --metadata
Example metadata file:
{
  "iceberg": {
    "schema": {
      "fields": [
        { "id": 1, "name": "ts", "type": "timestamp", "required": false },
        { "id": 2, "name": "ts_raw", "type": "string", "required": false },
        { "id": 3, "name": "service", "type": "string", "required": false },
        { "id": 4, "name": "status", "type": "string", "required": false },
        { "id": 5, "name": "message_text", "type": "string", "required": false },
        { "id": 6, "name": "host", "type": "string", "required": false },
        { "id": 7, "name": "source", "type": "string", "required": false },
        { "id": 8, "name": "trace_id", "type": "string", "required": false },
        { "id": 9, "name": "span_id", "type": "string", "required": false },
        { "id": 10, "name": "schema_version", "type": "string", "required": false },
        { "id": 11, "name": "body_json_text", "type": "string", "required": false },
        { "id": 12, "name": "attributes_hot_text", "type": "string", "required": false },
        { "id": 13, "name": "attributes_cold_text", "type": "string", "required": false },
        { "id": 14, "name": "tags_hot_text", "type": "string", "required": false },
        { "id": 15, "name": "tags_cold_text", "type": "string", "required": false },
        { "id": 16, "name": "source_file", "type": "string", "required": false }
      ]
    },
    "partitionSpec": {
      "specId": 0,
      "fields": [
        { "sourceId": 3, "transform": "identity", "name": "service", "fieldId": 1000 },
        { "sourceId": 1, "transform": "hour", "name": "ts_hour", "fieldId": 1001 }
      ]
    }
  }
}
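Because the native writer expects the fixed v1 contract exactly, it can be worth sanity-checking the metadata file locally before running create-table. A small checker sketch (the checks mirror the contract above; the file name matches the CLI example):

```python
import json

def check_v1_contract(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    fields = meta["iceberg"]["schema"]["fields"]
    if len(fields) != 16:
        problems.append(f"expected 16 columns, found {len(fields)}")
    by_name = {f["name"]: f["id"] for f in fields}
    spec = {(p["sourceId"], p["transform"])
            for p in meta["iceberg"]["partitionSpec"]["fields"]}
    if (by_name.get("service"), "identity") not in spec:
        problems.append("missing identity(service) partition")
    if (by_name.get("ts"), "hour") not in spec:
        problems.append("missing hour(ts) partition")
    return problems

if __name__ == "__main__":
    try:
        with open("s3tables-schema.json") as f:
            print(check_v1_contract(json.load(f)) or "metadata matches the v1 contract")
    except FileNotFoundError:
        print("save the metadata as s3tables-schema.json first")
```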

AWS-side permissions

The collector needs:
  • S3 Tables REST access to the table bucket
  • table-scoped write permissions such as s3tables:PutTableData on the target table or table bucket
  • any additional S3 Tables actions your provisioning path needs, such as s3tables:GetTable during validation or planning
  • direct S3 access to the managed warehouse bucket behind the table
If you plan to query the same table from Snowflake through the Glue catalog path, you also need to grant the Snowflake side access to:
  • the AWS Glue catalog entry for the table
  • Lake Formation DESCRIBE and SELECT on the target table if Lake Formation is enforcing permissions

Example collector IAM

The collector needs both S3 Tables control-plane access and direct S3 data-plane access to the managed warehouse bucket for the table. Attach the AWS-managed policy AmazonS3TablesFullAccess, whose permissions are equivalent to:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3tables:*"
      ],
      "Resource": "*"
    }
  ]
}
Then add an inline policy for the managed warehouse bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WarehouseBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<managed-warehouse-bucket-name>"
      ]
    },
    {
      "Sid": "WarehouseObjects",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::<managed-warehouse-bucket-name>/*"
      ]
    }
  ]
}
You can discover the managed warehouse bucket after table creation:
aws s3tables get-table \
  --region us-east-1 \
  --table-bucket-arn arn:aws:s3tables:us-east-1:<aws-account-id>:bucket/<table-bucket-name> \
  --namespace telemetry_logs \
  --name logs_service_hour_v1 \
  --query '{tableARN:tableARN,metadataLocation:metadataLocation,warehouseLocation:warehouseLocation}' \
  --output json
The warehouseLocation value looks like:
s3://<managed-warehouse-bucket-name>
Use that bucket name in the inline IAM policy above.
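The only part of warehouseLocation the inline policy needs is the bucket name. A small helper sketch that extracts it and renders the two Resource ARNs used above (the bucket name in the example is a placeholder):

```python
from urllib.parse import urlparse

def warehouse_bucket(warehouse_location: str) -> str:
    """Extract the bucket name from a warehouseLocation URI like s3://<bucket>."""
    parsed = urlparse(warehouse_location)
    if parsed.scheme != "s3":
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    return parsed.netloc

bucket = warehouse_bucket("s3://example-warehouse-bucket")  # placeholder value
print(f"arn:aws:s3:::{bucket}")    # bucket-level Resource (ListBucket, GetBucketLocation)
print(f"arn:aws:s3:::{bucket}/*")  # object-level Resource (Get/Put/DeleteObject)
```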

Query From Snowflake

Snowflake reads S3 Tables through the AWS Glue Iceberg REST catalog endpoint. The path validated end to end uses:
  • AWS IAM role for Snowflake SigV4 auth
  • Snowflake catalog integration with CATALOG_SOURCE = ICEBERG_REST
  • Snowflake Iceberg table object pointing at CATALOG_TABLE_NAME and CATALOG_NAMESPACE
  • explicit GRANT USAGE ON INTEGRATION
  • explicit ALTER ICEBERG TABLE ... SET AUTO_REFRESH = TRUE

1. Create the AWS IAM role Snowflake will assume

Create a role in your AWS account with permissions for Glue and, if Lake Formation is enabled, the matching Lake Formation table permissions. Permissions policy example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueRead",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalog",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersion",
        "glue:GetTableVersions",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
You can create the role first with a placeholder trust relationship, then tighten it after Snowflake returns the final API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID. Initial trust policy example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<snowflake-aws-account-id>:user/<snowflake-api-user>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<snowflake-external-id>"
        }
      }
    }
  ]
}

2. Create the Snowflake catalog integration

Use ICEBERG_REST, not the older GLUE catalog integration shape.
USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE CATALOG INTEGRATION MY_S3TABLES_GLUE_REST_INT
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'telemetry_logs'
  REFRESH_INTERVAL_SECONDS = 30
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-east-1.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    WAREHOUSE = '<aws-account-id>:s3tablescatalog/<table-bucket-name>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::<aws-account-id>:role/<snowflake-s3tables-role>'
    SIGV4_SIGNING_REGION = 'us-east-1'
  )
  ENABLED = TRUE;
Important:
  • In REST_CONFIG, specify WAREHOUSE and do not also specify CATALOG_NAME.
  • The WAREHOUSE format is <aws-account-id>:s3tablescatalog/<table-bucket-name>.
Immediately inspect the integration:
DESCRIBE INTEGRATION MY_S3TABLES_GLUE_REST_INT;
Record these properties:
  • API_AWS_IAM_USER_ARN
  • API_AWS_EXTERNAL_ID
  • REST_CONFIG
  • REST_AUTHENTICATION
Then update the AWS IAM role trust policy so the principal and external ID exactly match the values returned by DESCRIBE INTEGRATION.
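The tightened trust policy is the same shape as the initial one, with the two DESCRIBE INTEGRATION values substituted in. A small sketch that renders it ready for `aws iam update-assume-role-policy` (the ARN and external ID below are hypothetical placeholders):

```python
import json

def trust_policy(api_aws_iam_user_arn: str, api_aws_external_id: str) -> str:
    """Render the role trust policy from DESCRIBE INTEGRATION output."""
    doc = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": api_aws_iam_user_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": api_aws_external_id}},
        }],
    }
    return json.dumps(doc, indent=2)

# Hypothetical values; substitute what DESCRIBE INTEGRATION actually returned.
print(trust_policy("arn:aws:iam::111122223333:user/example-api-user",
                   "EXAMPLE_EXTERNAL_ID"))
```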

3. Grant Lake Formation permissions

If Lake Formation is enabled, grant DESCRIBE and SELECT on the table to the Snowflake AWS role you used in SIGV4_IAM_ROLE.
aws lakeformation grant-permissions \
  --region us-east-1 \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::<aws-account-id>:role/<snowflake-s3tables-role> \
  --resource '{
    "Table": {
      "CatalogId": "<aws-account-id>",
      "DatabaseName": "telemetry_logs",
      "Name": "logs_service_hour_v1"
    }
  }' \
  --permissions DESCRIBE SELECT
If your Snowflake environment uses a Snowflake-managed AWS IAM user for the integration handshake, grant the same permissions to that user too:
aws lakeformation grant-permissions \
  --region us-east-1 \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::<snowflake-aws-account-id>:user/<snowflake-api-user> \
  --resource '{
    "Table": {
      "CatalogId": "<aws-account-id>",
      "DatabaseName": "telemetry_logs",
      "Name": "logs_service_hour_v1"
    }
  }' \
  --permissions DESCRIBE SELECT

4. Create the Snowflake Iceberg table object

Create the database, schema, and query role first if they do not already exist:
USE ROLE ACCOUNTADMIN;

CREATE DATABASE IF NOT EXISTS OBSERVABILITY;
CREATE SCHEMA IF NOT EXISTS OBSERVABILITY.ICEBERG;
CREATE ROLE IF NOT EXISTS OBSERVABILITY_ICEBERG_READER;

GRANT USAGE ON DATABASE OBSERVABILITY TO ROLE OBSERVABILITY_ICEBERG_READER;
GRANT USAGE ON SCHEMA OBSERVABILITY.ICEBERG TO ROLE OBSERVABILITY_ICEBERG_READER;
Then create the Iceberg table object:
USE ROLE ACCOUNTADMIN;
USE DATABASE OBSERVABILITY;
USE SCHEMA ICEBERG;

CREATE OR REPLACE ICEBERG TABLE LOGS_SERVICE_HOUR_V1
  CATALOG = 'MY_S3TABLES_GLUE_REST_INT'
  CATALOG_TABLE_NAME = 'logs_service_hour_v1'
  CATALOG_NAMESPACE = 'telemetry_logs';
If another role will own or query this table, grant it integration usage explicitly:
GRANT USAGE ON INTEGRATION MY_S3TABLES_GLUE_REST_INT TO ROLE OBSERVABILITY_ICEBERG_READER;
GRANT SELECT ON TABLE OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1 TO ROLE OBSERVABILITY_ICEBERG_READER;
Then switch to the role that will operate on the table and enable refresh:
USE ROLE OBSERVABILITY_ICEBERG_READER;

ALTER ICEBERG TABLE OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1
  SET AUTO_REFRESH = TRUE;
Why document it this way:
  • the table object can exist while AUTO_REFRESH is still off
  • the reader can stop on an integration-permission issue
  • the exact fixes were GRANT USAGE ON INTEGRATION ... and ALTER ICEBERG TABLE ... SET AUTO_REFRESH = TRUE

5. Validate Snowflake refresh health

Run this before debugging the collector:
SELECT SYSTEM$AUTO_REFRESH_STATUS(
  'OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1'
);
Read these fields carefully:
  • executionState
  • currentSnapshotId
  • lastSnapshotTime
  • pendingSnapshotCount
  • invalidExecutionStateReason
Healthy example:
  • executionState = RUNNING
Common failure patterns:
  • GENERALIZED_PIPE_STOPPED
  • invalidExecutionStateReason mentioning integration access control
  • pendingSnapshotCount growing faster than lastSnapshotTime advances
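SYSTEM$AUTO_REFRESH_STATUS returns a JSON string, so the checks above are easy to automate. A triage sketch using the field names listed above; the sample payload is illustrative (not captured output), and the backlog threshold of 100 is an arbitrary assumption to tune for your environment:

```python
import json

def triage(status_json: str) -> str:
    """Classify SYSTEM$AUTO_REFRESH_STATUS output into a one-line verdict."""
    s = json.loads(status_json)
    if s.get("executionState") != "RUNNING":
        # invalidExecutionStateReason is usually the fastest path to the real issue.
        return f"unhealthy: {s.get('invalidExecutionStateReason', 'no reason reported')}"
    if s.get("pendingSnapshotCount", 0) > 100:  # assumed threshold, tune as needed
        return "running but falling behind: reduce writer commit frequency"
    return "healthy"

sample = json.dumps({"executionState": "RUNNING", "pendingSnapshotCount": 0})
print(triage(sample))  # → healthy
```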

6. Query it

Example:
USE ROLE OBSERVABILITY_ICEBERG_READER;

SELECT
  service,
  message_text,
  schema_version,
  ts_raw
FROM OBSERVABILITY.ICEBERG.LOGS_SERVICE_HOUR_V1
WHERE service = 'your_service_name'
ORDER BY ts DESC
LIMIT 100;
If this query fails while the collector is writing successfully, re-check:
  • the table schema matches the 16-column contract above
  • the partition spec is service + hour(ts)
  • AUTO_REFRESH is enabled on the Snowflake table object
  • GRANT USAGE ON INTEGRATION ... was applied to the role you are using
  • Lake Formation grants are present
  • the Snowflake catalog integration trust relationship is valid
  • SYSTEM$AUTO_REFRESH_STATUS(...) is healthy

7. Production operating notes

For a production table, the main operational risk is usually Snowflake refresh lag rather than collector write correctness. Watch these regularly:
  • SYSTEM$AUTO_REFRESH_STATUS(...)
  • executionState
  • lastSnapshotTime
  • pendingSnapshotCount
If executionState is healthy but pendingSnapshotCount keeps growing, Snowflake is falling behind the table’s snapshot churn. In that case:
  • reduce commit frequency from writers if possible
  • avoid unnecessary tiny commits
  • verify the Snowflake reader role still has USAGE on the integration
  • verify Lake Formation access has not drifted
If executionState is not healthy, inspect invalidExecutionStateReason first. That field is usually the fastest path to the real issue.

AWS Credential Configuration

Sawmills Collector runs on Kubernetes. Prefer IAM-based access, such as IRSA on EKS, so the collector can reach both the S3 Tables API and the managed warehouse bucket. If you use a pod role or node role, make sure that role includes:
  • AmazonS3TablesFullAccess
  • the inline managed-warehouse-bucket policy shown above
Only use static credentials if your environment cannot provide a workload identity.
extraEnvs:
  - name: AWS_ACCESS_KEY_ID
    value: "<YOUR AWS ACCESS KEY>"
  - name: AWS_SECRET_ACCESS_KEY
    value: "<YOUR AWS SECRET ACCESS KEY>"
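If static credentials are unavoidable, prefer referencing a Kubernetes Secret rather than inlining values in the Helm values file. A sketch assuming a Secret named aws-credentials (hypothetical name and keys) in the collector's namespace:

```yaml
extraEnvs:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: secret-access-key
```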