Exploring Sym Logs Using AWS Athena

If you're using Kinesis Firehose and S3 to store your Sym event logs, Athena is a helpful tool for parsing and viewing.

Overview

Sym lets you easily configure an S3 Log Destination for archiving Sym activity for audit and compliance purposes in a reliable, cost-effective manner. This guide walks you through how to set up AWS Athena to report on your Sym data in S3.

Sym Configuration

  1. Refer to our docs for configuring an AWS Kinesis Firehose and S3 bucket to add an S3 Log Destination to your Sym Environment.
  2. Once the log destination is set up, you should start to see log data in the bucket in this format:
    s3://bucket/prefix/yyyy/MM/dd/HH/SymS3ReportingLogsMain-date-uuid

Athena configuration

  1. If this is your first time working with Athena in this AWS account, create a database using the “Getting started” instructions: https://docs.aws.amazon.com/athena/latest/ug/getting-started.html
  2. If you have an existing Athena setup, you can use the query console and run: CREATE DATABASE sym_audit
  3. Now to create the table. This DDL depends on the name of the S3 bucket where your Firehose data is being output to:
CREATE EXTERNAL TABLE IF NOT EXISTS sym_events (
      ID STRING,
      SRN STRING,
      Flow STRING,
      Event STRING,
      Status STRING,
      Reason STRING,
      Requester STRING,
      Approver String,
      Denier String,
      Duration STRING,
      Target STRING,
      ErrorCode STRING
    )
    PARTITIONED BY (
     day STRING,
     hour INT
    )
    ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
    WITH SERDEPROPERTIES (
     'ion.srn.path_extractor' = '(run srn)',
     'ion.flow.path_extractor' = '(run flow)',
     'ion.event.path_extractor' = '(event type)',
     'ion.status.path_extractor' = '(state status)',
     'ion.reason.path_extractor' = '(fields reason)',
     'ion.requester.path_extractor' = '(run actors request username)',
     'ion.approver.path_extractor' = '(run actors approve username)',
     'ion.denier.path_extractor' = '(run actors deny username)',
     'ion.duration.path_extractor' = '(fields duration)',
     'ion.target.path_extractor' = '(fields target name)',
     'ion.errorcode.path_extractor' = '(state errors 0 code)'
    )
    STORED AS ION
    LOCATION 's3://sym-firehose-logs-main-ACCOUNTID/'
    TBLPROPERTIES (
     'projection.enabled' = 'true',
     'projection.day.type' = 'date',
     'projection.day.format' = 'yyyy/MM/dd',
     'projection.day.range' = '2021/01/01,NOW',
     'projection.day.interval' = '1',
     'projection.day.interval.unit' = 'DAYS',
     'projection.hour.type' = 'integer',
     'projection.hour.range' = '0,23',
     'projection.hour.digits' = '2',
     'storage.location.template' = 's3://sym-firehose-logs-main-ACCOUNTID/${day}/${hour}/'
    )

More details on the sym_events Athena Table

ION SerDe
The Athena table we create uses a SerDe (Serializer/Deserializer) to parse the JSON data in our S3 bucket and turn it into tabular data that is suitable for reporting.

Kinesis does not output JSON log objects on separate lines, so our table uses the Apache ION SerDE. ION is a superset of the JSON SerDe options that supports files where there is no newline separator between JSON objects.

In our example, we are using ION Path Extractors to map nested JSON properties to columns of our table.

Partition Projection
AWS provides guidance on how to partition data by date using the Kinesis Firehose output prefixes. Our table uses separate columns for Day and Time. You can query on date ranges such as WHERE DAY >= '2022/02/01'.

Running Reports

Once your table is set up, you can run queries for specific date ranges or users, such as:

SELECT *
FROM sym_events
WHERE DAY >= '2023/01/01'
AND DAY < '2023/02/01' 
AND APPROVER = '[email protected]'

Next steps

Downloading results

If you need local copies of your queries for compliance or other long-term use, you can download your query results from the Athena console in CSV format.

Manage your query in Terraform

If you'd like to manage your Athena query in Terraform, AWS provides a resource for a named Athena query, which you can use to manage your reporting queries as well as the create DDL if you want to:

resource "aws_athena_named_query" "foo" {
  name      = "bar"
  workgroup = aws_athena_workgroup.test.id
  database  = aws_athena_database.hoge.name
  query     = "<YOUR QUERY>"
}

Adding more fields to your query

Refer to our Reporting Overview for reference to any other fields you might want to include in your query.