
Start Data Engineering

Build resilient & optimized pipelines to fast-track your DE career!


Hello Reader,

It's a tough market for data engineers right now. Companies expect a lot from their data engineers, and the hiring bar is exceptionally high. Take a look at some job postings, and you will find a variation of "advanced distributed data systems knowledge" as a requirement. The catch is that there is no single definition of what this means. On top of that, there is the perceived impact of LLMs on the job market.

However, all is not lost. Let's take a step back and consider what companies (more specifically, your leaders and interviewers) reward. The people you will report to and work with want you to make their lives easier.

> Want to impress your manager? Make them look good in front of their manager.

> Want to impress your colleagues? Build systems that make their lives easier.

> Want to impress your interviewer? Show them that having you on their team will lower their workload.

And so on.

But how do you do that? The most straightforward approach is to design easy-to-maintain pipelines.

Imagine your (current or future) colleagues' joy when they no longer have to toil away fixing broken pipelines. What if you could show an interviewer that you will help their team by stabilizing their pipelines?

To do this, start from first principles:

  1. Make complex pipelines easy to understand
  2. Create idempotent pipelines
  3. Understand how distributed data processing and storage work

If this resonates with you, I have something big coming up.

"Advanced Spark SQL for Data Engineers"

📅 Date: June 28th

⌚ Time: 1 PM - 5 PM EST (10 AM - 2 PM PST)

🏫 Format: Workshop style with exercises and an assignment

We'll cover when and how to use window functions, create idempotent pipelines, write clean and maintainable code with modern SQL functions, optimize Spark queries using the Spark UI and query planner, and explore data storage patterns for efficient data processing.

Registration opens on June 21st (I will send out an email). Only a limited number of seats are available because the workshop is in person.


I am also hosting a free workshop on Advanced JOIN and GROUP BY techniques in Spark SQL!

📅 Date: June 21st, 2025

⌚ Time: 1:00 PM - 2:00 PM EST (10:00 AM - 11:00 AM PST)

💻 Where: YouTube Live Link

💰 Cost: FREE

🏫 Format: Hands-on coding workshop with live Q&A


Regards,

Joseph Machado

startdataengineering.com


Over the last decade, I've built highly scalable distributed data platforms and helped companies scale to processing multiple exabytes of data. My mission is to bring the software practices followed by top tech companies to data engineering and help data engineers level up. I help data engineers land high-paying tech jobs and significantly upskill themselves.
