Browsing Tag
dataengineering
14 posts
What I Learned Cleaning 1 Million Rows of CSV Data Without Pandas
Cleaning a small CSV? Pandas is perfect. Cleaning up a million rows on a limited machine or using…
Introduction to Data Engineering Concepts |18| The Power of Dremio in the Modern Lakehouse
Free Resources: Free Apache Iceberg Course; Free Copy of “Apache Iceberg: The Definitive Guide”; Free Copy of “Apache…
Pair and Transpose Adjacent Records within the Group - From SQL to SPL #13
Problem description & analysis: A certain table stores records of personnel from external sources entering and leaving a…
Study Notes 6.3-4: What is Kafka & Confluent Cloud
1. Introduction to Kafka in Stream Processing. Context of Stream Processing: Stream processing involves continuously handling data as…
🤯 #NODES24: a practical path to Cloud-Native Knowledge Graph Automation & AI Agents
🤔 About IT system cartography: Four years ago, I spoke about how to map an information system with…
🚀 Beyond Data Ingestion: Advanced Strategies for Optimizing API Data Pipelines
In my previous blog post, we explored how to build a dynamic and robust data ingestion pipeline using Azure…
OLAP (Online Analytical Processing)
OLAP (Online Analytical Processing) is a technology that enables analysts to extract and query data interactively from multidimensional…
Different file formats, a benchmark doing basic operations
Recently, I’ve been designing a data lake to store different types of data from various sources, catering to…
Introduction to Data Science
Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and…
Big data models 📊 vs. Computer memory 💾
Data pipelines are the backbone of any data-intensive project. As datasets grow beyond memory size (“out-of-core”), handling them…