Athena Glue Service Logs

AWS service logs come in many different formats. Ideally they could all be queried in place by Athena and, while some can, for cost and performance reasons it can be better to convert the logs into partitioned Parquet files. The general approach is that, for any given type of service log, we have Glue Jobs that can do the following: create source tables in the Data Catalog; create destination tables in the Data Catalog; convert the source data to partitioned Parquet files; and maintain new partitions for both tables. This library was created as part of my role as a Big Data Architect and is available at awslabs/athena-glue-service-logs....


Athena SQLite 💾

Athena SQLite is a project that allows you to query SQLite databases in S3 using Athena’s Federated Query functionality. Install it from the Serverless Application Repository: AthenaSQLiteConnector. Wait, what?! SQLite in S3? Yeah! Since SQLite is quite possibly the most prevalent database in the world, it’s not unusual for me to have various SQLite files lying around. This Athena data connector allows you to query those databases directly from Athena. Cool! Right?! One of the fun things about this project is that SQLite is not intended to be a network database....


Demo Code

I’m a fan of making the code I run in presentations publicly available. Most of the demo code I use is available in my demo-code repository on GitHub. In there, you’ll find such interesting things as EMR CloudFormation templates, EMR on EKS notes, EMR Studio notes, and even a big data CDK stack for easy deployment of my demos ...
