Browsing Tag
dataengineering
28 posts
Stop Naming Your Healthcare Columns Wrong — ISO-11179 Explained
If you’ve ever inherited a healthcare database with columns named DOB, PatientID, or CLAIM_NUMBER — this guide is…
How I cut Python JSON memory overhead from 1.9GB to ~0MB (11x Speedup)
The Problem: The “PyObject” Tax. We all love Python for its developer velocity, but for high-scale data engineering, the…
The data engineer’s Cortex Code cheat sheet
A practical guide to the commands, prompts, patterns, and habits that make Cortex Code useful in real data…
How I built a 39x compression pipeline with AES-256-GCM in Python (and why the dictionary is everything)
I store LLM training data. Every tool I found either compresses it or encrypts it — nothing did…
From Silent None to Insight: Debugging PySpark UDFs on AWS Glue with Decorators
Last month I was debugging a PySpark UDF that was silently returning None for about 2% of rows…
Building a Real-Time Data Pipeline: Streaming TCP Socket Data to PostgreSQL with Node.js
Real-time data streams are the lifeblood of many modern applications, ranging from financial market tickers to IoT sensor…
How I Redesigned a Failing Data Pipeline to Eliminate Cascading Failures
My client’s activity tracking system was breaking under load. During peak hours, employee activity submissions would time out,…
Data Engineering vs Data Science: What’s the Difference? (And Which Career Should You Choose?)
Understanding the distinction between these two crucial tech roles. Data Engineers build and maintain the infrastructure that makes…
Engineer’s Diary: Leaving Windows Behind and Building the ETL Engine I Always Dreamed Of, PardoX v0.1
Introduction: The Calm Before the Storm. I write these lines as the hum of my laptop fades for…
🔥 Day 7: PySpark Joins, Unions, and GroupBy Guide
Welcome to Day 7 of your Spark Mastery journey! Today is one of the most practical days because…