Supported Data Types
📘 LogsConfiguration
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| Name | String | none | true | Unique identifier within Sawmills. |
| Region | String | us-east-1 | true | AWS region for the table bucket. |
| Table Bucket ARN | String | none | true | ARN of the existing S3 Tables table bucket. |
| Namespace | String | none | true | Namespace inside the table bucket. |
| Table Name | String | none | true | Existing table name inside the namespace. |
Prerequisites
- An existing S3 Tables table. Sawmills does not create the table for you in v1.
- A Sawmills Collector instance with AWS access configured.
- S3 Tables REST control-plane access for the target table bucket.
- Table-scoped S3 Tables write access for the target table or table bucket, plus data-plane access to the managed warehouse bucket used by that table.
- If you will query the same table from Snowflake, create the Snowflake catalog integration and AWS IAM role before you depend on `AUTO_REFRESH`.
Notes
- The current S3 Tables destination is logs-only.
- Snowflake reads from the same Iceberg table through the AWS Glue catalog path after the table is provisioned correctly.
Create The Table
For the current native writer, create an Iceberg table that matches the fixed Sawmills v1 contract exactly. Recommended naming:
- Namespace: `telemetry_logs`
- Table name: `logs_service_hour_v1`
Required schema
Use these 16 columns:

| Column | Type | Notes |
|---|---|---|
| ts | timestamp | Event timestamp in microseconds. |
| ts_raw | string | Raw timestamp as text. |
| service | string | Lowercased service name. |
| status | string | Log status / severity. |
| message_text | string | Rendered message body. |
| host | string | Host name. |
| source | string | Log source. |
| trace_id | string | Trace ID when present. |
| span_id | string | Span ID when present. |
| schema_version | string | Current value is snowflake_v1. |
| body_json_text | string | Optional JSON body payload. |
| attributes_hot_text | string | Canonical JSON text. |
| attributes_cold_text | string | Canonical JSON text. |
| tags_hot_text | string | Canonical JSON text. |
| tags_cold_text | string | Canonical JSON text. |
| source_file | string | File / source marker. |
Required partition spec
Partition the table by:
- `service` using the `identity` transform
- `ts` using the `hour` transform, with partition field name `ts_hour`
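One way to express this spec is Spark SQL against the S3 Tables catalog integration for Apache Spark. This is a sketch only: the catalog name `s3tablesbucket` depends on your Spark session configuration, and the column list is abbreviated; use all 16 contract columns in practice.

```sql
-- Sketch: assumes a Spark session with the S3 Tables catalog
-- registered under the name "s3tablesbucket".
CREATE TABLE s3tablesbucket.telemetry_logs.logs_service_hour_v1 (
    ts           TIMESTAMP,
    ts_raw       STRING,
    service      STRING,
    status       STRING,
    message_text STRING
    -- ... remaining contract columns ...
)
USING iceberg
-- identity transform on service, hour transform on ts;
-- Iceberg names the hour partition field ts_hour by default.
PARTITIONED BY (service, hour(ts));
```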
Example AWS CLI create flow
The AWS CLI supports creating S3 Tables Iceberg tables directly. Save the schema metadata as s3tables-schema.json and create the table with `aws s3tables create-table`. The caller needs `s3tables:CreateTable`, plus `s3tables:PutTableData` when you pass `--metadata`.
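A hedged sketch of that flow; the bucket ARN, region, and account ID are placeholders, and the exact `--metadata` JSON shape should be checked against the current `aws s3tables create-table` reference:

```shell
# Sketch only. s3tables-schema.json holds the Iceberg schema for the
# 16-column contract, roughly:
# {"iceberg": {"schema": {"fields": [
#   {"name": "ts", "type": "timestamp", "required": true},
#   {"name": "service", "type": "string", "required": true},
#   ... remaining contract columns ...
# ]}}}
aws s3tables create-table \
  --table-bucket-arn arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket \
  --namespace telemetry_logs \
  --name logs_service_hour_v1 \
  --format ICEBERG \
  --metadata file://s3tables-schema.json
```

The `create-table` metadata may not let you set the partition spec directly; if not, apply the `service` + `hour(ts)` spec afterwards through an Iceberg-aware engine.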
AWS-side permissions
The collector needs:
- S3 Tables REST access to the table bucket
- table-scoped write permissions such as `s3tables:PutTableData` on the target table or table bucket
- any additional S3 Tables actions your provisioning path needs, such as `s3tables:GetTable` during validation or planning
- direct S3 access to the managed warehouse bucket behind the table
- access to the AWS Glue catalog entry for the table
- Lake Formation `DESCRIBE` and `SELECT` on the target table if Lake Formation is enforcing permissions
Example collector IAM
The collector needs both S3 Tables control-plane access and direct S3 data-plane access to the managed warehouse bucket for the table. Attach the AWS-managed policy `AmazonS3TablesFullAccess`, plus an inline policy scoped to the managed warehouse bucket. You can identify that bucket from the table's `warehouseLocation` value, which `aws s3tables get-table` returns as an S3 URI.
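A sketch of the inline warehouse-bucket policy. The bucket name is a placeholder taken from your table's `warehouseLocation`, and the action list is an assumption to trim or extend for your writer:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ManagedWarehouseBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<warehouse-bucket-from-warehouseLocation>",
        "arn:aws:s3:::<warehouse-bucket-from-warehouseLocation>/*"
      ]
    }
  ]
}
```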
Query From Snowflake
S3 Tables uses the AWS Glue Iceberg REST catalog path. The validation path that worked end to end used:
- an AWS IAM role for Snowflake SigV4 auth
- a Snowflake catalog integration with `CATALOG_SOURCE = ICEBERG_REST`
- a Snowflake Iceberg table object pointing at `CATALOG_TABLE_NAME` and `CATALOG_NAMESPACE`
- an explicit `GRANT USAGE ON INTEGRATION`
- an explicit `ALTER ICEBERG TABLE ... SET AUTO_REFRESH = TRUE`
1. Create the AWS IAM role Snowflake will assume
Create a role in your AWS account with permissions for Glue and, if Lake Formation is enabled, the matching Lake Formation table permissions. Start with a trust policy you control; after you create the Snowflake catalog integration, update the trust policy with the `API_AWS_IAM_USER_ARN` and `API_AWS_EXTERNAL_ID` values Snowflake reports.
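A sketch of the two policies this step describes. The account ID, wildcard Glue scope, and Lake Formation action are placeholders and assumptions to tighten for your environment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["glue:GetCatalog", "glue:GetDatabase", "glue:GetDatabases",
                 "glue:GetTable", "glue:GetTables"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "lakeformation:GetDataAccess",
      "Resource": "*"
    }
  ]
}
```

Initial trust policy, using your own account root until Snowflake reports `API_AWS_IAM_USER_ARN` and `API_AWS_EXTERNAL_ID`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```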
2. Create the Snowflake catalog integration
Use `CATALOG_SOURCE = ICEBERG_REST`, not the older GLUE catalog integration shape.
- In `REST_CONFIG`, specify `WAREHOUSE` and do not also specify `CATALOG_NAME`.
- The `WAREHOUSE` format is `<aws-account-id>:s3tablescatalog/<table-bucket-name>`.
After creating the integration, read `API_AWS_IAM_USER_ARN` and `API_AWS_EXTERNAL_ID`, along with the effective `REST_CONFIG` and `REST_AUTHENTICATION`, from `DESCRIBE INTEGRATION`.
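A sketch of the integration DDL under those constraints. The integration name, role ARN, region, account ID, and bucket name are placeholders to replace with your own:

```sql
CREATE CATALOG INTEGRATION s3tables_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'telemetry_logs'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-east-1.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    WAREHOUSE = '<aws-account-id>:s3tablescatalog/<table-bucket-name>'
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::<aws-account-id>:role/snowflake-s3tables-role'
    SIGV4_SIGNING_REGION = 'us-east-1'
  )
  ENABLED = TRUE;

-- Then read the values to paste into the AWS role trust policy:
DESCRIBE INTEGRATION s3tables_int;
```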
3. Grant Lake Formation permissions
If Lake Formation is enabled, grant `DESCRIBE` and `SELECT` on the table to the Snowflake AWS role you used in `SIGV4_IAM_ROLE`.
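One way to apply those grants, sketched with the AWS CLI. The ARNs and names are placeholders, and the `CatalogId` form for S3 Tables federated catalogs should be verified for your setup:

```shell
# Sketch: grant the Snowflake role read access to the table.
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/snowflake-s3tables-role \
  --permissions DESCRIBE SELECT \
  --resource '{"Table": {"CatalogId": "111122223333:s3tablescatalog/my-table-bucket", "DatabaseName": "telemetry_logs", "Name": "logs_service_hour_v1"}}'
```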
4. Create the Snowflake Iceberg table object
Create the database, schema, and query role first if they do not already exist, then create the Iceberg table object. Keep in mind:
- the table object can exist while `AUTO_REFRESH` is still off
- the reader can stop on an integration-permission issue
- the exact fixes were `GRANT USAGE ON INTEGRATION ...` and `ALTER ICEBERG TABLE ... SET AUTO_REFRESH = TRUE`
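A sketch of the objects and the two fixes, assuming placeholder names (`logs_db`, `public`, `logs_reader`, `s3tables_int`) that you should replace with your own:

```sql
CREATE DATABASE IF NOT EXISTS logs_db;
CREATE SCHEMA IF NOT EXISTS logs_db.public;

-- If your account requires an EXTERNAL_VOLUME for externally managed
-- Iceberg tables, add that parameter here.
CREATE ICEBERG TABLE IF NOT EXISTS logs_db.public.logs_service_hour_v1
  CATALOG = 's3tables_int'              -- catalog integration from step 2
  CATALOG_NAMESPACE = 'telemetry_logs'
  CATALOG_TABLE_NAME = 'logs_service_hour_v1';

-- The two fixes called out above:
GRANT USAGE ON INTEGRATION s3tables_int TO ROLE logs_reader;
ALTER ICEBERG TABLE logs_db.public.logs_service_hour_v1 SET AUTO_REFRESH = TRUE;
```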
5. Validate Snowflake refresh health
Before debugging the collector, check the auto-refresh status and inspect these fields: `executionState`, `currentSnapshotId`, `lastSnapshotTime`, `pendingSnapshotCount`, and `invalidExecutionStateReason`.
Healthy: `executionState = RUNNING`.
Unhealthy signals:
- `executionState = GENERALIZED_PIPE_STOPPED`
- `invalidExecutionStateReason` mentioning integration access control
- `pendingSnapshotCount` growing faster than `lastSnapshotTime` advances
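The check itself can be sketched as follows; the table name is the placeholder from step 4:

```sql
SELECT SYSTEM$AUTO_REFRESH_STATUS('logs_db.public.logs_service_hour_v1');
```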
6. Query it
If queries return no rows, verify that:
- the table schema matches the 16-column contract above
- the partition spec is `service` + `hour(ts)`
- `AUTO_REFRESH` is enabled on the Snowflake table object
- `GRANT USAGE ON INTEGRATION ...` was applied to the role you are using
- Lake Formation grants are present
- the Snowflake catalog integration trust relationship is valid
- `SYSTEM$AUTO_REFRESH_STATUS(...)` is healthy
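A minimal query sketch, using the placeholder object names from step 4 and a hypothetical service name `checkout`:

```sql
SELECT ts, service, status, message_text
FROM logs_db.public.logs_service_hour_v1
WHERE service = 'checkout'
  AND ts >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
ORDER BY ts DESC
LIMIT 100;
```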
7. Production operating notes
For a production table, the main operational risk is usually Snowflake refresh lag rather than collector write correctness. Watch the `SYSTEM$AUTO_REFRESH_STATUS(...)` output regularly, in particular `executionState`, `lastSnapshotTime`, and `pendingSnapshotCount`.
If `executionState` is healthy but `pendingSnapshotCount` keeps growing, Snowflake is falling behind the table's snapshot churn. In that case:
- reduce commit frequency from writers if possible
- avoid unnecessary tiny commits
- verify the Snowflake reader role still has `USAGE` on the integration
- verify Lake Formation access has not drifted
If `executionState` is not healthy, inspect `invalidExecutionStateReason` first. That field is usually the fastest path to the real issue.
AWS Credential Configuration
Sawmills Collector runs on Kubernetes. Prefer IAM-based access, such as IRSA on EKS, so the collector can reach both the S3 Tables API and the managed warehouse bucket. If you use a pod role or node role, make sure that role includes:
- the AWS-managed `AmazonS3TablesFullAccess` policy
- the inline managed-warehouse-bucket policy shown above
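With IRSA, the binding is an annotation on the collector's Kubernetes service account. The role ARN, account ID, and names below are placeholders for your deployment:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sawmills-collector
  namespace: sawmills
  annotations:
    # IAM role carrying AmazonS3TablesFullAccess plus the inline
    # managed-warehouse-bucket policy.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/sawmills-collector-s3tables
```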