AWS

I’ve had two roles at AWS – Big Data Architect && Developer Advocate. As part of both of those roles, I’ve created a few different projects on top of EMR, Glue, and Athena....

 · 1 min

Personal

I love building. I often whip up quick little utilities to make my life easier or sometimes just to have fun with a project....

 · 1 min

Athena Glue Service Logs

AWS Service Logs come in all different formats. Ideally they could all be queried in place by Athena and, while some can, for cost and performance reasons it can be better to convert the logs into partitioned Parquet files. The general approach is that for any given type of service log, we have Glue Jobs that can do the following: create source tables in the Data Catalog; create destination tables in the Data Catalog; know how to convert the source data to partitioned Parquet files; and maintain new partitions for both tables. This library was created as part of my role as a Big Data Architect and is available at awslabs/athena-glue-service-logs....

 · 1 min

Athena SQLite 💾

Athena SQLite is a project that allows you to query SQLite databases in S3 using Athena’s Federated Query functionality. Install it from the Serverless Application Repository: AthenaSQLiteConnector. Wait, what?! SQLite in S3? Yea! As quite possibly the most prevalent database in the world, it’s not unusual for me to have various SQLite files lying around. This Athena data connector allows you to query those databases directly from Athena. Cool! Right?! One of the fun things about this project is that SQLite is not intended to be a network database....

 · 1 min

Demo Code

I’m a fan of making the code I run in presentations publicly available. Most of the demo code I use is available in my demo-code repository on GitHub. In there, you’ll find such interesting things as EMR CloudFormation templates, EMR on EKS notes, EMR Studio notes, and even a big data CDK stack for easy deployment of my demos ...

 · 1 min

Ideas 💡

I love building. I often whip up quick little utilities to make my life easier or sometimes just to have fun with a project. I believe ideas should be freely available - it’s the execution of an idea that turns it into something larger. As such, I try to document all the random ideas I have in my personal GitHub. Feel free to go check them out and let me know if you find anything interesting....

 · 1 min

BingDaily 🌄

If you weren’t aware, Bing.com has an awesome image of the day. Even better, they have a daily quiz(!) for every image. I like both a/ beautiful wallpapers and b/ mini-quizzes, so I wrote a little macOS menubar app that updates the wallpaper daily with the Bing image of the day and also gives you a little link where you can take the quiz. Installation: either download from my GitHub releases or use Homebrew....

 · 1 min

Byteable Calc 🧮

What is it? A simple calculator geared towards converting data sizes. For example, I often need to do some conversions from bytes to something more human-readable: 585828 👇 572.10 KB You can even do math! Say we transferred 233033728 bytes in 7 days, we can divide by 7 to get the per-day number. 233033728/7 👇 31.75 MB Where can I find it? I’ve got a version anybody can use at dacort....
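As a rough illustration of the kind of conversion described above, here is a minimal Python sketch. It is not the calculator's actual code: the `humanize` function, the `UNITS` list, and the choice of 1024-based units with two decimal places are all assumptions made for this example.

```python
# Hypothetical helper, not taken from Byteable Calc itself.
# Assumes 1024-based units and two decimal places.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def humanize(num_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value under 1024."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if abs(value) < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(humanize(585828))         # 572.10 KB
print(humanize(233033728 / 7))  # per-day figure for the 7-day transfer
```

Doing the division before formatting keeps the math in plain bytes, so the unit is only chosen once, at display time.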

 · 1 min

Cargo Crates 🚚

All good ideas start with a Tweet. 😁 “Perhaps a silly question, but does anybody have an ETL framework where they just spin up docker containers with different params? pic.twitter.com/rvDRVTD3gY” (Damon Cortesi, @dacort, February 10, 2021) I’m often hopping around to different APIs and only want a slice of data. And I get frustrated with having to find the right API client, reimplement authentication, store credentials in different places, maintain it, and so on, and so on....

 · 3 min

Log4j.us 🪵

I made this while working at Metabase. As the Head of Success Engineering, I was often tasked with helping customers reproduce various edge cases. As part of this, we often needed more logs. The classes I wanted to enable DEBUG logs for also often differed depending on the specific situation the customer was encountering. 💁 Did you know that log4j supports loading a configuration from an https URL? I didn’t....
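To make the idea concrete, here is a small Python sketch of the kind of configuration such a URL could serve, assuming Log4j 2 and its XML configuration format. The `debug_config` function and the example logger names are hypothetical and are not taken from Log4j.us itself.

```python
import xml.etree.ElementTree as ET


def debug_config(logger_names):
    """Build a minimal Log4j 2 XML config that sets the given loggers to DEBUG.

    Hypothetical helper for illustration; the layout pattern and the
    INFO root level are assumptions.
    """
    root = ET.Element("Configuration", status="WARN")
    appenders = ET.SubElement(root, "Appenders")
    console = ET.SubElement(appenders, "Console", name="Console", target="SYSTEM_OUT")
    ET.SubElement(console, "PatternLayout", pattern="%d %-5p %c - %m%n")

    loggers = ET.SubElement(root, "Loggers")
    for name in logger_names:
        # One <Logger> element per class or package that needs DEBUG output.
        ET.SubElement(loggers, "Logger", name=name, level="debug")
    root_logger = ET.SubElement(loggers, "Root", level="info")
    ET.SubElement(root_logger, "AppenderRef", ref="Console")
    return ET.tostring(root, encoding="unicode")


# Example logger names are placeholders.
print(debug_config(["com.example.sync", "com.example.driver"]))
```

With Log4j 2, the application is then typically pointed at the hosted file through the `log4j.configurationFile` system property, which accepts a URL as well as a local path.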

 · 1 min