DuckDB Beyond the Hype | A Powerful Addition to the Data Scientist’s and Data Engineer’s Toolbox | Sep 18
Open Source Data Engineering Landscape 2024 | Exploration of the open source software in data engineering ecosystem | Feb 4
Building a High-Performance Data Pipeline Using DuckDB | Using DuckDB to Serialise, Transform, and Aggregate Data in Data Lakes | Oct 20
The History and Evolution of Open Table Formats | From Hive to High Performance: A Journey Through the Evolution of Data Management on Data Lakes | Aug 23
How to build a dual Incremental + snapshot data ingestion pipeline | A useful batch data ingestion pattern for maximum data correctness and reliability as well as providing low latency access | Oct 1, 2023
Techniques For Periodically Extracting Data From Relational Databases | Presenting techniques for extracting data from relational databases when building ETL pipelines for a data lake, DWH or data lakehouse | Sep 19, 2023
Techniques for Managing Dependency Between Data Pipelines | It’s a common challenge to manage dependency between data pipelines on data-driven systems and analytical platforms which having data… | Aug 29, 2023
Internal Storage Design of Modern Key-value Database Engines [Part 1] | Deep dive into physical storage design implemented by many modern popular key-value stores such as Amazon Dynamo DB, Apache Cassandra, Riak | Aug 14, 2023
Airflow callbacks to Slack notifications for DAG monitoring and alerting | In this post I’ll demonstrate the step by step guide to integrate Airflow workflows with Slack for notification and monitoring purpose. The… | Jul 23, 2023
Adding Custom Country Map to Apache Superset | In this post I demonstrate the steps followed to add a custom country map to superset repository and rebuild the app. | Published in Towards Dev | Jul 12, 2023