ELT approach for Data Pipelines

Introduction

While gathering data for analytics, one often has to source it from multiple systems. Traditionally, the approach has been ETL (Extract-Transform-Load), where,

  • Extract - typically involves retrieving data from the source. This could also be done via streaming.
  • Transform - applies transformations to the extracted data.
  • Load - loads the data into an Operational Data Store (ODS) or data warehouse. Refer here for more details on ETL. ETL has been made easy by tools like Talend, SSIS and so on.
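To make the three steps concrete, here is a minimal Go sketch of an ETL run. Everything in it is hypothetical: the `Record` type, the in-memory "source" slice and the map standing in for an ODS/warehouse table. It only illustrates the ordering of the steps, not how Talend or SSIS work.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical row pulled from a source system.
type Record struct {
	ID       int
	Customer string
	Amount   float64
}

// extract stands in for reading rows from a source (database, API or stream).
func extract(source []Record) []Record {
	out := make([]Record, len(source))
	copy(out, source)
	return out
}

// transform cleans rows before loading: it trims and upper-cases customer
// names and drops rows with a non-positive amount.
func transform(rows []Record) []Record {
	var out []Record
	for _, r := range rows {
		if r.Amount <= 0 {
			continue
		}
		r.Customer = strings.ToUpper(strings.TrimSpace(r.Customer))
		out = append(out, r)
	}
	return out
}

// load writes rows into the target, here a map standing in for a warehouse table.
func load(target map[int]Record, rows []Record) {
	for _, r := range rows {
		target[r.ID] = r
	}
}

func main() {
	source := []Record{
		{1, "  acme ", 120.5},
		{2, "globex", -3},
		{3, "initech", 42},
	}
	warehouse := map[int]Record{}
	load(warehouse, transform(extract(source)))
	fmt.Println(len(warehouse), warehouse[1].Customer) // 2 ACME
}
```

In ELT, the last two steps swap places: raw rows are loaded first and transformed afterwards inside the target store.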

However, there has been a shift away from this approach due to,

Learnings from Jeff Richter's Designing and Versioning HTTP REST APIs Video Course

Background

Recently, I went through an excellent video series on Designing & Versioning HTTP REST APIs presented by Jeffrey Richter. It is available here. In the past, I had read Jeff’s books on the CLR and found his writing very clear and understandable, and my experience with this video series was the same. Below is a summary of my learnings from it. I do not claim that every aspect is covered here, so please do check out the videos.

Resiliency Testing with Toxiproxy

Background

In a typical software development workflow, a developer implements a unit/component, tests it and pushes the changes to a source control repository. The change then goes through continuous integration, automated testing, provisioning and deployment. Given the high availability requirements expected (or should I say assumed) nowadays, testing how a unit/component handles failures, delays etc. in a distributed environment is as important as testing its functional correctness. Often, such behavior is only observed in production itself, unless the project team follows Chaos Engineering practices.

Using Temporal.io to build Long-running Workflows

Background

In a typical business Application, there are often requirements for,

  • Batch processing - Often long-running tasks like data import/export, end-of-day processing etc. These tasks are typically scheduled to run at a pre-defined interval or on the occurrence of an event.
  • Asynchronous processing - Tasks, often part of a business process/workflow, that can be performed asynchronously or offloaded.

Such requirements are often fulfilled with custom approaches like batch processing frameworks, ETL tools, queues or specific database features.

Getting Started with OpenTelemetry

Background

How many times have we ended up in a meeting staring at random slowness or similar production issues in a distributed application, only to feel helpless with limited (often no) visibility into its runtime behavior? It usually ends with manually correlating whatever diagnostic data is available from the application with traces/logs from the O/S, databases etc., trying to figure out the “root cause” of the issue.

Ninja - Using a lightweight build system for Go projects

Background

I primarily work on Windows for development. Whenever it comes to writing code in Go, one invariably comes across Make. A quick look at popular Go projects on GitHub shows Makefiles being used to automate tasks like linting, building, testing and deployment.

Being on Windows, I have been looking for an alternative build tool that is easy to set up (i.e. doesn’t require MinGW and similar environments) and easier to use than Make (which primarily targets Unix and Unix-like operating systems).

Validating URLs from the 'Useful Links' section using bash / command-line tools

Background

I started this blog, https://sachinsu.github.io, a few months back.

In this relatively short period, the blog has accumulated a sizeable number of useful links across various categories, in addition to detailed blog posts like this one.

As an ongoing activity, I think it is necessary to periodically verify the links mentioned on this blog.

So how can it be done? One obvious way is to do it manually, visiting each link and updating or removing those that are no longer available. But there is always a better way of doing things.

Troubleshooting TCP Connection Request Timeouts

Background

I recently had the opportunity to support a team that had been battling intermittent (scary, I know :)) TCP connectivity issues in production.

A simplified deployment architecture is shown below,

High Level Architecture

The technology stack is Microsoft .NET Framework 4.8, using ODP.NET for Oracle connectivity (the Oracle server is an 8-CPU box). Each web server in the cluster hosts multiple applications (application domains) in IIS, serving HTTP(S) traffic. These applications connect to the Oracle database.

Tool to mass DM followers on Twitter in Go

Background

I recently came across a bounty by Balaji Srinivasan for sending a Direct Message to all Twitter followers. I do not currently intend to participate in the bounty; this is merely an exercise.

This is an attempt to write a CLI tool in Go in response to it.

For detailed requirements, refer here.

Approach

In brief,

  • CLI should,

    • accept arguments such as the Twitter API key, auth token and DM message
    • download all followers (with profile details)
    • rank them by criteria (e.g. location)
    • send each follower a DM with the provided message (up to the daily DM limit)
    • be easy to use and maintain
  • Notes,
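The ranking and daily-limit requirements can be sketched in Go without touching the Twitter API. Everything below is hypothetical: the `Follower` type, the flag names, the stand-in follower data and the default limit of 500 (an assumed value; check Twitter's current DM limits). Downloading followers and sending DMs are deliberately left out rather than guessed.

```go
package main

import (
	"flag"
	"fmt"
	"sort"
	"strings"
)

// Follower is a hypothetical, trimmed-down follower profile.
type Follower struct {
	ScreenName string
	Location   string
	Followers  int
}

// rankByLocation returns a copy with followers whose location contains
// preferred (case-insensitive) first, each group ordered by follower count.
func rankByLocation(fs []Follower, preferred string) []Follower {
	ranked := append([]Follower(nil), fs...)
	pref := strings.ToLower(preferred)
	match := func(f Follower) bool {
		return pref != "" && strings.Contains(strings.ToLower(f.Location), pref)
	}
	sort.SliceStable(ranked, func(i, j int) bool {
		mi, mj := match(ranked[i]), match(ranked[j])
		if mi != mj {
			return mi
		}
		return ranked[i].Followers > ranked[j].Followers
	})
	return ranked
}

// selectRecipients caps the ranked list at the daily DM limit.
func selectRecipients(ranked []Follower, dailyLimit int) []Follower {
	if dailyLimit < 0 {
		dailyLimit = 0
	}
	if dailyLimit < len(ranked) {
		return ranked[:dailyLimit]
	}
	return ranked
}

func main() {
	// Hypothetical flags; API key and auth token would be passed the same way.
	message := flag.String("message", "", "DM text to send")
	location := flag.String("location", "", "preferred follower location for ranking")
	limit := flag.Int("limit", 500, "maximum DMs to send per day (assumed default)")
	flag.Parse()

	// Stand-in data; a real tool would download followers via the Twitter API.
	followers := []Follower{
		{"alice", "Pune, India", 120},
		{"bob", "Berlin", 900},
		{"carol", "Mumbai, India", 50},
	}
	for _, f := range selectRecipients(rankByLocation(followers, *location), *limit) {
		fmt.Printf("would DM @%s: %q\n", f.ScreenName, *message)
	}
}
```

Keeping ranking and limiting as pure functions makes them easy to test separately from the API calls, which helps with the "easy to maintain" requirement.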

Web Security Measures in ASP.NET Applications

At my current workplace, all applications are expected to adhere to PCI DSS standards covering data protection, access regulation and so on. A dedicated SOC team, consisting of security analysts who are continuously on the prowl to identify breaches, conducts periodic audits of applications and hardening of servers.

While all our .NET applications adhere to the guidelines below,

We also use tools like Snyk to perform code vulnerability analysis as part of our Jenkins-driven CI/CD pipeline. In spite of the above, we do come across vulnerabilities identified by the SOC team that need to be addressed quickly. The SOC team uses tools such as Burp Suite.