Posts

Presto - A distributed SQL Engine for a variety of data stores

Introduction

In a typical company or enterprise, there are multiple sources of data. This could be the result of M&A activity (where each acquisition brings in a new data store) or of a multi-year process of adopting whichever data stores were in vogue at the time. The result is a combination of various types of relational databases, flat file systems, queues and so on, leading to data silos. This scenario is typically observed in companies running workloads on-prem (i.e. pre-cloud); companies that started on the cloud, or have moved to it, tend to organize their data platform better, possibly because of the ease of migrating data within the cloud, typically centralizing it around cheaper object storage (say, AWS S3).

ELT approach for Data Pipelines

Introduction

While gathering data for analytics, one often has to source data from multiple systems. Traditionally, the approach has been ETL (Extract-Transform-Load), where,

  • Extract - retrieve data from the source; this could also be done via streaming
  • Transform - apply transformations to the extracted data
  • Load - load the data into an Operational Data Store (ODS) or data warehouse

Refer here for more details on ETL. ETL has been made easy by tools like Talend, SSIS and so on. A minimal sketch of the three steps in code follows.
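To make the steps concrete, here is a minimal sketch in Go (the file name and two-column layout are assumptions for illustration, not from the post):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	// Extract: read rows from a (hypothetical) source file.
	f, err := os.Open("customers.csv") // assumed layout: name,email
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	// Transform: normalize the email column.
	for _, row := range rows {
		row[1] = strings.ToLower(strings.TrimSpace(row[1]))
	}

	// Load: a real pipeline would insert into an ODS or warehouse;
	// here we simply emit the transformed rows.
	for _, row := range rows {
		fmt.Println(strings.Join(row, ","))
	}
}
```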

However, there has been a shift away from this approach due to,

Learnings from Jeff Richter's Designing and Versioning HTTP REST APIs Video Course

Background

Recently, I went through an excellent video series on Designing & Versioning HTTP/REST APIs presented by Jeffrey Richter. It is available here. In the past, I had read Jeff's books on the CLR and found his writing very clear and understandable. My experience with this video series was the same. Below is a summary of my learnings from it. I do not claim that every aspect is covered here, so please do check out the videos.

Resiliency Testing with Toxiproxy

Background

In a typical software development workflow, a developer implements a unit/component, tests it and pushes the changes to a source control repository. The change then goes through continuous integration, automated testing, provisioning and deployment. Given the high availability expected (or should I say assumed) nowadays, it is as important to test how a unit/component handles failures, delays etc. in a distributed environment as it is to test its functional correctness. Often, such behavior is observed only in production, unless the project team follows chaos engineering practices.
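Toxiproxy makes such failure injection repeatable: it sits between a component and its dependency and adds failures on demand. Below is a minimal sketch using the Toxiproxy Go client (the Redis upstream, ports and names are assumptions for illustration):

```go
package main

import (
	"log"

	toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)

func main() {
	// Assumes a Toxiproxy server is already running on its default port.
	client := toxiproxy.NewClient("localhost:8474")

	// Route test traffic for localhost:26379 through the proxy to the real
	// dependency (a Redis instance in this hypothetical setup).
	proxy, err := client.CreateProxy("redis", "localhost:26379", "localhost:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer proxy.Delete()

	// Inject 1s of latency on responses to simulate a slow network.
	if _, err := proxy.AddToxic("slow_down", "latency", "downstream", 1.0,
		toxiproxy.Attributes{"latency": 1000}); err != nil {
		log.Fatal(err)
	}

	// ... point the component under test at localhost:26379 and verify its
	// timeout/retry behavior ...
}
```

Removing the toxic (or deleting the proxy) restores normal behavior, so the same test suite can cover both the happy path and the degraded one.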

Using Temporal.io to build Long-running Workflows

Background

In a typical business Application, there are often requirements for,

  • Batch processing - often long-running tasks like data import/export, end-of-day processing etc. These tasks are typically scheduled to execute at a pre-defined interval or on the occurrence of an event.
  • Asynchronous processing - tasks, often part of a business process/workflow, that can be performed asynchronously or offloaded.

Such requirements are often fulfilled with custom approaches like batch processing frameworks, ETL tools, queues or specific database features.
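Temporal offers a different model: workflows written as ordinary code whose state the server persists durably. Here is a minimal sketch using the Temporal Go SDK (the activity, names and timeout are illustrative assumptions; registering these with a worker and starting the workflow via the Temporal client are omitted):

```go
package app

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// ExportData is a stand-in activity; a real one would perform the actual
// import/export work. Temporal records its result and retries it on failure.
func ExportData(ctx context.Context, batchID string) (string, error) {
	return "exported:" + batchID, nil
}

// ExportWorkflow sketches a long-running workflow: its state is persisted by
// the Temporal server, so it survives process restarts between activity calls.
func ExportWorkflow(ctx workflow.Context, batchID string) (string, error) {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	var result string
	err := workflow.ExecuteActivity(ctx, ExportData, batchID).Get(ctx, &result)
	return result, err
}
```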

Getting Started with OpenTelemetry

Background

How many times have we ended up in a meeting, staring at random slowness or similar production issues in a distributed application, only to experience helplessness with the limited (or often no) visibility available into the application's runtime behavior? It often ends with manually correlating whatever diagnostic data the application provides with the traces/logs available from the O/S, databases etc., trying to figure out the "root cause" of the issue.
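Distributed tracing addresses exactly this gap. A minimal sketch using the OpenTelemetry Go API (function and tracer names are illustrative; configuring an exporter and tracer provider at startup is omitted):

```go
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
)

func handleOrder(ctx context.Context, orderID string) {
	// otel.Tracer returns a tracer from the globally registered provider;
	// without one configured at startup it is a harmless no-op.
	tracer := otel.Tracer("example/orders")

	ctx, span := tracer.Start(ctx, "handleOrder")
	defer span.End()

	// Downstream calls made with ctx (HTTP, DB, etc.) can attach child spans,
	// producing one correlated trace across the distributed application.
	processPayment(ctx, orderID)
}

func processPayment(ctx context.Context, orderID string) {
	_ = ctx
	fmt.Println("charging for", orderID)
}

func main() {
	handleOrder(context.Background(), "order-42")
}
```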

Ninja - Using a lightweight build system for Go projects

Background

I primarily work on Windows for development purposes. When it comes to writing code in Golang, one invariably comes across Make. A quick check of popular Go projects on GitHub shows Makefiles being used to automate tasks like linting, building, testing and deployment.

Being on Windows, I have been looking for an alternative build tool that is easy to set up (i.e. doesn't require MinGW and similar environments) and use, compared to Make (which primarily targets Unix and Unix-like operating systems).
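As a small sketch of what this can look like, here is a tiny build.ninja for a hypothetical Go project (rule, target and path names are illustrative, not from the post):

```ninja
# build.ninja - minimal sketch for a Go project
rule gobuild
  command = go build -o $out $in
  description = BUILD $out

rule gotest
  command = go test ./...
  description = TEST

build bin/app.exe: gobuild cmd/app/main.go

# 'test' re-runs each time, since the command never produces a file named 'test'.
build test: gotest
```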

Validating URLs from the 'Useful Links' section using bash / command line tools

Background

I started this blog, https://sachinsu.github.io, a few months back.

In this relatively short period, the blog has accumulated a sizeable number of useful links across various categories, in addition to detailed blog posts like this one.

As an ongoing activity, I think it is necessary to verify the links mentioned on this blog.

So how can it be done? Obviously, one way is to do it manually: visit each link and update/remove those that are no longer available. But there is always a better way of doing things.
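The post itself walks through a bash / command-line approach; purely as an illustration of the same idea in Go, a minimal checker might issue a HEAD request per link (the URL list below is a stand-in; the real list would be extracted from the blog's pages):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical list of links to validate.
	links := []string{
		"https://sachinsu.github.io",
		"https://golang.org",
	}

	client := &http.Client{Timeout: 10 * time.Second}
	for _, u := range links {
		resp, err := client.Head(u)
		if err != nil {
			fmt.Printf("BROKEN %s (%v)\n", u, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%d %s\n", resp.StatusCode, u)
	}
}
```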

Troubleshooting TCP connection request timeouts

Background

I recently had the opportunity to support a team that has been battling intermittent (scary, I know :)) TCP connectivity issues in production.

The simplified deployment architecture is as below:

High Level Architecture

The technology stack is Microsoft .NET Framework 4.8, using ODP.NET for Oracle connectivity (the Oracle server is an 8-CPU box). Each web server in the cluster hosts IIS with multiple applications (application domains) serving HTTP(S) traffic. These applications connect to the Oracle database.

Tool to mass DM followers on Twitter in Go

Background

I recently came across a bounty by Balaji Srinivasan for sending a Direct Message to all of one's Twitter followers. Currently, I do not intend to participate in the bounty; this is a mere exercise.

This is an attempt to write a CLI tool in Golang in response to it.

For detailed requirements, refer here

Approach

In Brief,

  • The CLI should,

    • accept arguments like the Twitter API key, auth token and DM message
    • download all followers (with profile details)
    • rank them by criteria (e.g. location)
    • send each follower a DM with the provided message (up to the daily DM limit)
    • be easy to use and maintain
  • Notes,