Browsing Tag
dataengineering
20 posts
Engineer’s Diary: Leaving Windows Behind and Building the ETL Engine I Always Dreamed Of, PardoX v0.1
Introduction: The Calm Before the Storm. I write these lines as the hum of my laptop fades for…
🔥 Day 7: PySpark Joins, Unions, and GroupBy Guide
Welcome to Day 7 of your Spark Mastery journey! Today is one of the most practical days because…
Why Your Enterprise Data Platform Is No Longer Just for Analytics
Key Takeaways: The relationship between data and applications is undergoing a fundamental shift. For decades, we’ve moved data…
Introducing ReelTrust: What if data engineering could solve our AI deepfakes problem?
TL;DR: This weekend I built ReelTrust, a new type of video authentication software, which I hope others will…
Decommissioning the Dinosaur: A 4-Phase Playbook for Migrating Your Legacy Data Warehouse to Databricks
Let’s talk about the dinosaur in your server room. It’s not a fossil, but it might as well…
AI-Powered Data Engineering Pipelines: Smarter, Faster, Scalable
Ever wondered what happens when Artificial Intelligence meets Data Engineering? Answer: The pipeline gets a brain. In today’s…
What I Learned Cleaning 1 Million Rows of CSV Data Without Pandas
Cleaning a small CSV? Pandas is perfect. Cleaning up a million rows on a limited machine or using…
Introduction to Data Engineering Concepts |18| The Power of Dremio in the Modern Lakehouse
Free Resources: Free Apache Iceberg Course; Free Copy of “Apache Iceberg: The Definitive Guide”; Free Copy of “Apache…
Pair and Transpose Adjacent Records within the Group - From SQL to SPL #13
Problem description & analysis: A certain table stores records of personnel from external sources entering and leaving a…
Study Notes 6.3-4: What is Kafka & Confluent Cloud
1. Introduction to Kafka in Stream Processing. Context of Stream Processing: Stream processing involves continuously handling data as…