AWS Blog Posts by Damon Cortesi

I’ve written or collaborated on many blog posts while at AWS. This is a list of a few of them. Easily query AWS service logs using Amazon Athena - One of the first tools I built at AWS was a set of Glue scripts to parse, process, and convert AWS service logs for VPC Flow Logs, CloudTrail, AWS load balancers, CloudFront, and S3 access logs. Announcing Amazon EMR Serverless (Preview) - Launch post for a new serverless service for EMR....

 · 1 min

Developer Experience tools for Amazon EMR

As part of my role as a developer advocate, I’ve created several open source tools and integrations to make working with Amazon EMR easier for data engineers and other data wranglers. Amazon EMR CLI: The EMR CLI is an open source command-line interface that makes packaging, deploying, and running jobs across all EMR deployment models as simple as running emr run. The tool supports PySpark projects and automatically bundles the required dependencies in a consistent manner, whether you’re using EMR on EC2, EMR on EKS (coming soon), or EMR Serverless....
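Dependency packaging is the fiddly part a tool like the EMR CLI hides. As a rough illustration of what has to be assembled for one deployment model, here is a minimal Python sketch that builds the request for the EMR Serverless StartJobRun API. It assumes the project's Python dependencies were already packed into a virtualenv tarball (for example with venv-pack) and uploaded to S3. The spark.archives and PYSPARK_PYTHON settings follow the pattern AWS documents for custom Python environments on EMR Serverless. The helper name and all IDs, ARNs, and S3 paths are hypothetical, and this is not how the EMR CLI itself is implemented.

```python
from typing import List, Optional

# Hypothetical helper: assembles keyword arguments for the EMR Serverless
# StartJobRun API (boto3: client("emr-serverless").start_job_run).


def build_pyspark_job_run(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    deps_archive: str,
    args: Optional[List[str]] = None,
) -> dict:
    """Return StartJobRun arguments for a PySpark script plus a packed venv."""
    python_bin = "./environment/bin/python"
    # Assumption: deps_archive is a venv tarball in S3. Spark unpacks it as
    # ./environment on the driver and executors, and PySpark is pointed at
    # the bundled interpreter so the job sees the same dependencies everywhere.
    spark_conf = [
        f"spark.archives={deps_archive}#environment",
        f"spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON={python_bin}",
        f"spark.emr-serverless.driverEnv.PYSPARK_PYTHON={python_bin}",
        f"spark.executorEnv.PYSPARK_PYTHON={python_bin}",
    ]
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args or []),
                "sparkSubmitParameters": " ".join(f"--conf {c}" for c in spark_conf),
            }
        },
    }


if __name__ == "__main__":
    # All IDs, ARNs, and S3 paths below are placeholders.
    request = build_pyspark_job_run(
        application_id="00example123",
        execution_role_arn="arn:aws:iam::123456789012:role/example-job-role",
        entry_point="s3://example-bucket/code/main.py",
        deps_archive="s3://example-bucket/code/pyspark_deps.tar.gz",
        args=["--date", "2023-01-01"],
    )
    print(request["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"])
```

The returned dictionary could be passed as keyword arguments to boto3's emr-serverless client, as in client.start_job_run(**request). That call needs AWS credentials and an existing application, so it is left out of the sketch.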

 · 2 min

Athena Glue Service Logs

AWS service logs come in all different formats. Ideally, they could all be queried in place by Athena, and while some can, for cost and performance reasons it can be better to convert the logs into partitioned Parquet files. The general approach is that, for any given type of service log, we have Glue jobs that can: create source tables in the Data Catalog; create destination tables in the Data Catalog; convert the source data to partitioned Parquet files; and maintain new partitions for both tables. This library was created as part of my role as a Big Data Architect and is available at awslabs/athena-glue-service-logs....
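To make the convert-and-partition steps concrete for one log type, here is a minimal Python sketch for VPC Flow Logs in the default version 2 record format. It parses a space-delimited record into typed columns, derives year/month/day partition values from the record's start time, and builds the Athena ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement that registers a new partition. The Hive-style year=/month=/day= layout, the helper names, the table name, and the S3 paths are assumptions for illustration. The actual Glue jobs in awslabs/athena-glue-service-logs run on Spark and may partition differently.

```python
from datetime import datetime, timezone

# Field order of the default (version 2) VPC Flow Logs record format.
FLOW_LOG_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_flow_log_line(line: str) -> dict:
    """Split one space-delimited flow log record into typed columns."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(FLOW_LOG_FIELDS, values):
        # NODATA/SKIPDATA records use "-" for missing fields; store None.
        if value == "-":
            record[name] = None
        elif name in INT_FIELDS:
            record[name] = int(value)
        else:
            record[name] = value
    return record


def partition_values(record: dict) -> dict:
    """Hypothetical scheme: year/month/day (UTC) from the record's start time."""
    ts = datetime.fromtimestamp(record["start"], tz=timezone.utc)
    return {"year": f"{ts.year:04d}", "month": f"{ts.month:02d}", "day": f"{ts.day:02d}"}


def add_partition_sql(table: str, base_uri: str, parts: dict) -> str:
    """Build Athena DDL that registers one Hive-style partition directory."""
    spec = ", ".join(f"{k}='{v}'" for k, v in parts.items())
    path = "/".join(f"{k}={v}" for k, v in parts.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{base_uri.rstrip('/')}/{path}/'"
    )


if __name__ == "__main__":
    line = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
            "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
    rec = parse_flow_log_line(line)
    parts = partition_values(rec)
    print(parts)  # {'year': '2014', 'month': '12', 'day': '14'}
    # Table name and bucket are placeholders.
    print(add_partition_sql("vpc_flow_logs_parquet", "s3://example-bucket/converted/vpc", parts))
```

Keeping the partition values as zero-padded strings means the same values work both in the S3 path and in the PARTITION clause, so the converted Parquet files and the Data Catalog stay in agreement.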

 · 1 min

Athena SQLite 💾

Athena SQLite is a project that allows you to query SQLite databases in S3 using Athena’s Federated Query functionality. Install it from the Serverless Application Repository: AthenaSQLiteConnector. Wait, what?! SQLite in S3? Yeah! SQLite is quite possibly the most prevalent database in the world, so it’s not unusual for me to have various SQLite files lying around. This Athena data connector allows you to query those databases directly from Athena. Cool, right?! One of the fun things about this project is that SQLite is not intended to be a network database....
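SQLite expects to read a local file rather than talk over a network. One straightforward approach, assumed here and not necessarily what the connector does, is to work from a local copy of the database. That copy is opened read-only, and rows are handed back in fixed-size batches, much as a federated connector returns results in pages. The sketch below uses only Python's built-in sqlite3 module. The function name and batch size are hypothetical, and fetching the file from S3 is left out.

```python
import sqlite3
from pathlib import Path
from typing import Iterator, List, Tuple


def query_sqlite_readonly(
    db_path: str, sql: str, params: tuple = (), batch_size: int = 1000
) -> Iterator[List[Tuple]]:
    """Yield query results in batches from a SQLite file opened read-only."""
    # mode=ro makes SQLite reject writes; uri=True enables the file: URI form.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cursor = conn.execute(sql, params)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            yield rows
    finally:
        # Runs when the results are exhausted or the generator is closed early.
        conn.close()


if __name__ == "__main__":
    import os
    import tempfile

    # Build a throwaway database standing in for a file copied down from S3.
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE records (id INTEGER, name TEXT)")
    setup.executemany("INSERT INTO records VALUES (?, ?)", [(i, f"row-{i}") for i in range(5)])
    setup.commit()
    setup.close()

    for batch in query_sqlite_readonly(path, "SELECT id, name FROM records ORDER BY id", batch_size=2):
        print(batch)
    os.remove(path)
```

Run as a script, this prints three batches: the first two rows, the next two, and then the last one. Opening the copy read-only guards against a query accidentally modifying a file that is meant to be treated as a snapshot.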

 · 1 min

Demo Code

I’m a fan of making the code I run in presentations publicly available. Most of the demo code I use is available in my demo-code repository on GitHub. In there, you’ll find such interesting things as EMR CloudFormation templates, EMR on EKS notes, EMR Studio notes, and even a big data CDK stack for easy deployment of my demos...

 · 1 min