Posts

ELT approach for Data Pipelines

Sachin Sunkle

Introduction

While gathering data for analytics, one often has to pull data from multiple sources. Traditionally, the approach has been ETL (Extract-Transform-Load), where:

  • Extract - Retrieve data from the source systems; this could also happen via streaming.
  • Transform - Apply transformations to the extracted data.
  • Load - Load the data into an Operational Data Store (ODS) or a data warehouse.

Refer here for more details on ETL. ETL has been made easy by tools like Talend, SSIS and so on.
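
To make the three steps concrete, below is a minimal, illustrative sketch of an ETL run in Python; the CSV source, field names, and SQLite target are hypothetical stand-ins for whatever source systems and warehouse you actually use.

```python
import csv
import sqlite3

# Extract: read raw rows from a (hypothetical) CSV export of the source system.
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Transform: apply simple transformations before loading (normalize, derive a field).
def transform(rows):
    return [
        {"customer": r["customer"].strip().lower(),
         "amount_cents": int(float(r["amount"]) * 100)}
        for r in rows
    ]

# Load: write the transformed rows into the target store (SQLite stands in for an ODS/warehouse).
def load(rows, db_path="warehouse.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO orders (customer, amount_cents) VALUES (:customer, :amount_cents)", rows
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))
```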

However, there has been a shift away from this approach due to,

Learnings from Jeff Richter's Designing and Versioning HTTP REST APIs Video Course

Sachin Sunkle

Background

Recently, I went through the excellent video series on Designing & Versioning HTTP REST APIs presented by Jeffrey Richter. It is available here. In the past, I had read Jeff's books on the CLR and found his writing very clear and understandable; my experience with this video series was the same. Below is a summary of my learnings from it. I do not claim that every aspect is covered here, so please do check out the videos.

Resiliency Testing with Toxiproxy

Sachin Sunkle

Background

In a typical software development workflow, a developer implements a unit/component, tests it, and pushes the changes to the source control repository. The change then goes through continuous integration, automated testing, provisioning, and deployment. Given the high availability expected (or should I say assumed) nowadays, it is as important to test how a unit/component handles failures, delays, etc. in a distributed environment as it is to test its functional correctness. Often, such behavior is first observed in production itself, unless the project team follows chaos engineering practices.
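
To illustrate what such a failure test can look like, here is a small sketch that drives Toxiproxy's HTTP admin API (default port 8474) to put a latency toxic in front of an upstream service; the proxy name, ports, and latency values are assumptions chosen for the example.

```python
import requests

TOXIPROXY = "http://localhost:8474"  # Toxiproxy admin API (default port)

# Create a proxy that listens on 26379 and forwards to the real service on 6379.
requests.post(f"{TOXIPROXY}/proxies", json={
    "name": "redis_proxy",            # hypothetical proxy name
    "listen": "127.0.0.1:26379",
    "upstream": "127.0.0.1:6379",
}).raise_for_status()

# Add a latency toxic: downstream responses are delayed by ~1s (+/- 100ms jitter).
requests.post(f"{TOXIPROXY}/proxies/redis_proxy/toxics", json={
    "name": "slow_response",
    "type": "latency",
    "stream": "downstream",
    "toxicity": 1.0,
    "attributes": {"latency": 1000, "jitter": 100},
}).raise_for_status()

# The unit under test is then pointed at 127.0.0.1:26379 instead of the real service,
# so its timeout/retry behavior can be asserted in an ordinary automated test.
```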

Using Temporal.io to build Long running Workflows

Sachin Sunkle

Background

In a typical business application, there are often requirements for:

  • Batch processing - Long-running tasks like data import/export, end-of-day processing, etc. These tasks are usually scheduled to run at a pre-defined interval or on the occurrence of an event.
  • Asynchronous processing - Tasks, often part of a business process/workflow, that can be performed asynchronously or offloaded.

Such requirements are often fulfilled with custom approaches like batch processing frameworks, ETL tools, queues, or specific database features.
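
As a taste of the alternative this post explores, below is a minimal, illustrative sketch of a workflow and activity using the temporalio Python SDK; the workflow, activity, and parameter names are made up for the example, not taken from the post.

```python
from datetime import timedelta
from temporalio import activity, workflow

# Activity: the unit of work that Temporal retries and tracks (a hypothetical export task).
@activity.defn
async def export_batch(batch_id: str) -> int:
    # ... call the real export logic here; return the number of records exported
    return 0

# Workflow: durable orchestration whose state survives worker/process restarts.
@workflow.defn
class NightlyExportWorkflow:
    @workflow.run
    async def run(self, batch_id: str) -> int:
        return await workflow.execute_activity(
            export_batch,
            batch_id,
            start_to_close_timeout=timedelta(minutes=30),
        )
```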

Getting Started with OpenTelemetry

Sachin Sunkle

Background

How many times have we ended up in a meeting staring at random slowness or similar production issues in a distributed application, only to feel helpless with the limited (or often no) visibility available into the runtime behavior of the application? It often comes down to manually correlating whatever diagnostic data is available from the application, combining it with traces/logs available from the OS, databases, etc., and trying to figure out the “root cause” of the issue.
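
For orientation, below is a minimal sketch of manual tracing with the OpenTelemetry Python SDK; the span names are invented for the illustration, and the console exporter is only for local experimentation, where a real setup would export to a collector/backend instead.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Configure a tracer provider that prints spans to the console (a real setup
# would use an OTLP exporter pointing at a collector instead).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Wrap a unit of work in spans so timing and parent/child relationships are
# captured automatically instead of being reconstructed by hand from logs.
with tracer.start_as_current_span("handle_order"):
    with tracer.start_as_current_span("query_database"):
        pass  # hypothetical database call
```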