Database Reliability Engineering - My Notes

Sachin Sunkle
Introduction: I have been reading the excellent Database Reliability Engineering book, and below are my notes from it. Key Incentive(s) for Automation: Elimination of Toil. Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows. Important System Characteristics: Latency, also known as response time, is a time-based measurement indicating how long it takes to receive a response to a request.
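As a rough illustration of latency as a time-based measurement, the sketch below times a single call with Python's monotonic high-resolution clock. The names `measure_latency` and `fake_request` are hypothetical and do not come from the book or the notes; `fake_request` only stands in for a real request to a service.

```python
import time


def measure_latency(request_fn, *args, **kwargs):
    """Call request_fn and return (response, elapsed_seconds)."""
    start = time.perf_counter()              # monotonic, high-resolution clock
    response = request_fn(*args, **kwargs)   # the "request" being timed
    elapsed = time.perf_counter() - start    # time until the response arrived
    return response, elapsed


def fake_request(payload):
    """Hypothetical stand-in for a real service call."""
    time.sleep(0.01)                         # simulate some work on the server side
    return {"echo": payload}
```

In practice one would collect many such samples and look at percentiles rather than a single reading, since one measurement says little about how a service behaves under load.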

Near real time API Monitoring with Grafana and PostgreSQL

Sachin Sunkle
Introduction: Suppose you have a distributed application running in production that is based on a microservices/Service Oriented Architecture and has an SLA of being “always on” (available 24x7, barring deployments of course!). In such cases, having proper monitoring of application health in place is absolutely essential. What if monitoring is an afterthought (i.e. the application is already in production) and there is little appetite for additional monitoring components (visualization tools, specialized storage for logs/metrics/traces)?

Upgrading API: Learnings

Sachin Sunkle
Introduction: One of the design considerations about APIs stressed by Jeffrey Richter (Read more here) is that “API is expected to be stable over long period of time”. Recently, for a .NET-based project, we decided to upgrade some of the ASMX-based (legacy SOAP-based approach) APIs and were immediately reminded by customer(s) to avoid any kind of impact on existing users. This means that the upgrade must be done keeping in mind,

Presto - A distributed SQL Engine for variety of data stores

Sachin Sunkle
Introduction: In a company/enterprise, there are typically multiple sources of data. This could be the result of M&A (each of which adds a new data store) or of a multi-year process of adopting whichever data stores were in vogue at the time. The result is a combination of various types of relational databases, flat file systems, queues and so on, which leads to data silos. This scenario is typically observed in companies who are running workloads On-prem (i.

ELT approach for Data Pipelines

Sachin Sunkle
Introduction: While gathering data for analytics, one often has to pull data from multiple sources. Traditionally, the approach has been ETL (Extract-Transform-Load), where: Extract typically involves retrieving data from the source (this could also be via streaming); Transform applies transformations to the extracted data; Load loads the data into an Operational Data Store (ODS) or data warehouse. Refer here for more details on ETL. ETL has been made easy by tools like Talend, SSIS and so on.
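The Extract, Transform and Load steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample CSV data, the `orders` table and all function names are hypothetical, and an in-memory SQLite database stands in for the ODS or data warehouse. It is not how Talend or SSIS work internally.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from a file, API or stream.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,7.25,usd
3,,usd
"""


def extract(text):
    """Extract: retrieve raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order: extract, then transform, then load."""
    load(conn, transform(extract(text)))
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` runs the three steps in sequence; note that the transformation happens before anything reaches the target store, which is the defining trait of ETL.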