Near Real-Time API Monitoring with Grafana and PostgreSQL


Introduction

Suppose you have a distributed application running in production, built on a microservices/service-oriented architecture, with an SLA of being “always on” (available 24x7, barring deployments, of course!). In such cases, having proper monitoring of application health in place is absolutely essential.

What if monitoring is an afterthought (i.e. the application is already in production), and there is little appetite for additional components (visualization tools, specialized storage for logs/metrics/traces) dedicated to monitoring?

Is it even possible to have near real-time monitoring of the application’s behaviour using already-in-use technologies (like PostgreSQL)?

Monitoring, or more generically “observability”, has three pillars: logs, metrics and traces. Many existing applications produce some of these (mostly logs or traces) but seldom all three. Hence, it is necessary to use the existing logs/traces as the basis for generating metrics.

There are ongoing developments in this field with standards like OpenTelemetry. Some have even suggested (here & here) that distributed traces will eventually replace logging.

Approach

The high-level architecture is shown below.

High Level Architecture

Considering the as-is state of the application architecture and the constraints mentioned earlier, this post covers an approach based upon:

  • PostgreSQL - Data store for Analytics and Reporting
  • TimescaleDB - Time-series extension for PostgreSQL
  • FluentBit - Processing of Web Server logs & forwarding to database
  • Grafana - Data Visualization and Monitoring platform.

Let’s see how to get this done, step by step.

Data Collection and Storage

TimescaleDB

TimescaleDB is a PostgreSQL extension for time-series data management.

Rationale

  • The reports and dashboards expected for near real-time API monitoring are time-series oriented in nature.
  • TimescaleDB is optimized for such time-oriented reporting and suits this use case well, since it is an extension of PostgreSQL, which is already being used for analytics/reporting.

Installation of the extension is straightforward; the Step by Step tutorial is very helpful.

The next step is to create a database for the data to be used for monitoring. Hypertable(s) in this database will contain the metrics data collected from the application and the web server (IIS).
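
As a minimal sketch (the database name timescalepoc below simply matches the FluentBit output configuration shown later in this post), creating the database and enabling the extension looks like this:

-- Requires the TimescaleDB packages to be installed and the extension added to
-- shared_preload_libraries (see the tutorial referenced above).
CREATE DATABASE timescalepoc;

-- Connect to the new database (e.g. \c timescalepoc in psql), then enable the extension:
CREATE EXTENSION IF NOT EXISTS timescaledb;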

One of the required dashboards/reports was to monitor API requests in terms of success & failure (%) and response times (in buckets like 1-5 secs, 5-10 secs and so on).

For each API request, the application collects specific details and persists them in the database. Currently, the attributes below are stored in the database as part of each log entry:

Attribute   Description
Time        Timestamp of the event
Service     Name of the service
Operation   Operation of the service for which the request was received
Outcome     Outcome of the API invocation, i.e. Success or Failure
Timeout     Timestamp of completion of the API invocation

The DDL command looks like this:


create table apilog
(
  time       timestamptz not null,
  service    text        not null,
  operation  text,
  outcome    text        not null,
  timeout    timestamptz
);

After creating the table, it has to be converted into a hypertable using the command:

SELECT create_hypertable('apilog', 'time');

Note: Timescale transparently manages storage for the hypertable, and PostgreSQL developers can continue to use standard SQL/plpgsql with it.
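
For example, an ordinary SQL aggregation over the hypertable works exactly as it would on a regular table (a quick illustrative query, nothing more):

-- Count requests per service and outcome over the last hour.
select service, outcome, count(*) as requests
from apilog
where time > now() - interval '1 hour'
group by service, outcome;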

For the sake of quick testing, one can add dummy data to this table using the SQL below:

-- Generate ~1,000 dummy rows with a mix of services, operations and outcomes.
insert into apilog
select (current_timestamp - '0 day'::interval),
       (case when x % 2 = 0  then 'finance'   else 'it'        end),
       (case when x % 2 = 0  then 'getPrices' else 'getUptime' end),
       (case when x % 50 = 0 then 'failure'   else 'success'   end),
       (current_timestamp - '0 day'::interval) + trunc(random() * 20) * '1 second'::interval
from generate_series(0, 5000, 5) as t(x);

Currently, the application generates log events in the OLTP database, and data from this database is replicated to the reporting database. Since we have created a new hypertable to host this data, a simple trigger-based approach can be used to populate it from the current table, as sketched below.

In a real scenario, you may want to consider replicating the data directly into the hypertable.
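
A minimal sketch of such a trigger is shown below. The source table name app_log_events and its columns are illustrative placeholders for the existing log table, not the actual schema:

-- Hypothetical source table; replace with the actual log table and column names.
create or replace function copy_to_apilog()
returns trigger as $$
begin
  -- Forward each newly inserted log event into the TimescaleDB hypertable.
  insert into apilog (time, service, operation, outcome, timeout)
  values (new.time, new.service, new.operation, new.outcome, new.timeout);
  return new;
end;
$$ language plpgsql;

create trigger trg_copy_to_apilog
after insert on app_log_events
for each row
execute function copy_to_apilog();

On PostgreSQL versions older than 11, use "execute procedure" instead of "execute function".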

FluentBit

So far, we have collected application logs in the database. There is one more source that is important in the context of monitoring, and that is infrastructure software: the operating system, web servers and so on. They generate a lot of logs and metrics that can be ingested and consumed in conjunction with the application logs to get a better picture. We will look at how web server logs can be brought into the data source.

There are many monitoring tools available (refer to the Useful links below for a comparison) with a focus on IT and network monitoring, and such tools readily cover infrastructure software too. For the purpose of this article, we can use a log collector tool instead. There are many log collector tools available; we will use Fluentbit. At a very high level, it has the concepts of:

  • Input - Log sources
  • Parsers - Plugins to parse & transform the logs
  • Output - Log Destination like Prometheus, Kafka, PostgreSQL and so on.

Some of the advantages of Fluentbit are,

  • High log delivery performance with efficient resource utilization
  • Robust and lightweight approach
  • Log enrichment at the node level itself rather than at the destination
  • Simpler configuration format

Setup FluentBit - Fluentbit provides binaries distributed via package managers for Linux and as installers for Windows.

As of the writing of this post, the pre-built binaries do not include the output plugin for PostgreSQL, so Fluentbit has to be built from source after modifying CMakeLists.txt:

  • Clone the GitHub repository

  • Modify CMakeLists.txt file as below,

    option(FLB_OUT_PGSQL "Enable PostgreSQL output plugin" No)

    to

    option(FLB_OUT_PGSQL "Enable PostgreSQL output plugin" Yes)

  • Refer to Compiling from Source for further details.

Configuration - Once Fluentbit is installed, it needs to be configured to read the web server logs, parse them and push them to PostgreSQL.

Below is a sample configuration to periodically read web server logs (in W3C log format), parse them and push them to PostgreSQL:

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Log_File     d:\monitoring\fluentbit.log
    Parsers_File parsers.conf
    Parsers_File generated/parsers_generated.conf
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

[INPUT]
    Name           tail
    Tag            format.iis
    Parser         dips-w3c
    Path           d:\temp\iis.log
    DB             d:\temp\raw.iis.db


[OUTPUT]
    Name          pgsql
    Match         *
    Host          172.0.0.1
    Port          5432
    User          fluentbit
    Password      fluentbit
    Database      timescalepoc
    Table         iislogs
    Timestamp_Key time

The configuration for the parser is as below:

[PARSER]
    Name           dips-w3c
    Format         regex
    Regex         ^(?<time>\d{4}-\d{2}-\d{2} \d{2}[\:\.]\d{2}[\:\.]\d{2}) (?<serverip>\S+) (?<method>\S+) (?<uristem>\S+) (?<uriquery>\S+) (?<serverport>\S+) (?<username>\S+) (?<clientip>\S+) (?<userAgent>\S+) (?<referrer>\S+) (?<status>\S+) (?<substatus>\S+) (?<win32status>\S+) (?<timetaken>\S+) (?<useragent1>\S+) (?<auth>\S+) (?<contenttype>\S+)
    Time_Key       time
    Time_Format    %F %T
    Time_Keep      True
    Types          serverport:integer status:integer substatus:integer win32status:integer timetaken:integer

This parser essentially uses a regular expression to split each line of the log file into key-value pairs holding the data points of interest.

In terms of output, Fluentbit’s PostgreSQL plugin provisions the table itself, with a structure that stores the entire JSON record in a single field of each row. Either this table can be used as-is, or a “before insert” trigger, as suggested in Fluentbit’s manual, can be used to parse the JSON and populate a separate table.
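
A sketch of the second option is below. It assumes the plugin-created iislogs table exposes each record as a jsonb column named data alongside the timestamp (as described in the Fluentbit documentation); verify the actual column names of the generated table before using it:

-- Structured target table for selected fields from the IIS log records.
create table iislogs_parsed
( time       timestamptz,
  uristem    text,
  status     integer,
  timetaken  integer );

create or replace function parse_iislog()
returns trigger as $$
begin
  -- Extract selected keys from the JSONB payload written by the pgsql output plugin.
  insert into iislogs_parsed (time, uristem, status, timetaken)
  values (new.time,
          new.data ->> 'uristem',
          (new.data ->> 'status')::integer,
          (new.data ->> 'timetaken')::integer);
  return new;  -- keep the raw row as well; return null here to discard it instead
end;
$$ language plpgsql;

create trigger trg_parse_iislog
before insert on iislogs
for each row
execute function parse_iislog();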

Fluentbit can easily be configured to run as a daemon (on Linux) or a service (on Windows).

Visualization

With data getting added to the TimescaleDB hypertable, let’s see how it can be visualized.

Typically, there are two approaches to consider for visualization:

  • Custom-built Web UI - This only makes sense if,

    • There is already a reporting/visualization Web UI in place and adding new dashboards/reports is not much pain.
    • Not much customization and/or slicing-and-dicing is expected.
    • Limited effort is available.

  • Off-the-shelf tools - This approach makes sense if,

    • Monitoring dashboards are expected to be flexible and easy for business or power users to customize.
    • Additional dashboards are expected, or can be provisioned with minimal or no coding.

There are many paid and open source tools available. Notable OSS options are,

  • Grafana - Tailor-made for monitoring and extensive analysis of time-series data.
  • Apache Superset - An open-source application for data exploration and visualization, able to handle data at petabyte scale.

Let’s see how Grafana can be used for visualization. (I may evaluate Superset at some point and update this post.)

Grafana

Grafana has multiple offerings, one of them being an open-source, self-hosted application. It has a Go backend and is very easy to install. For Windows, just follow the steps at Installation.

Once Grafana is set up, one can quickly start it by running grafana-server. By default, it starts a web server on port 3000. With the Grafana web-based GUI up and running, let’s perform the steps below to get a dashboard in place.

  • Connectivity to PostgreSQL - One needs to add a data source in Grafana, which in this case is the PostgreSQL database. It can be added from the sidebar of the Grafana UI by hovering over the “Configuration” option. The screenshot below shows this configuration:

Grafana: Connect to PostgreSQL

  • Add Dashboard - Once the data source is set up, the next step is to add a dashboard. A dashboard is essentially a visualization or a report. It has a query (in this case an SQL query) to fetch the data. The screenshot below shows the configuration of a simple query for a dashboard:

Grafana: Query for Dashboard

Grafana requires certain macros (like $__time(..) and $__timeFilter(..)) to be included in the query so as to facilitate filtering/ordering by the user through the UI, as shown below:

Grafana: View data and apply Filter
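
As a sketch, a panel query against the apilog hypertable could look like the one below. The macro names follow Grafana’s PostgreSQL data source documentation, and the metric/value column convention makes each outcome a separate series:

SELECT
  $__timeGroupAlias("time", '1m'),
  outcome  AS metric,
  count(*) AS value
FROM apilog
WHERE $__timeFilter("time")
GROUP BY 1, 2
ORDER BY 1;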

Grafana provides extensive ways to transform the data fetched by the SQL query. This feature is aimed more at business and power users who may want to perform additional analysis on it. The alternative is to provide the desired SQL directly and get a visualization such as a time series or graph, as shown below:

Grafana: Complex SQL Query with minimal transformation

Grafana: Time Series Visualization
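
For instance, the response-time bucket report mentioned earlier could be driven by a query along these lines (a sketch based on the apilog table defined above; bucket boundaries are illustrative):

SELECT
  $__timeGroupAlias("time", '5m'),
  CASE
    WHEN extract(epoch FROM (timeout - "time")) <= 5  THEN '0-5 secs'
    WHEN extract(epoch FROM (timeout - "time")) <= 10 THEN '5-10 secs'
    ELSE '> 10 secs'
  END       AS metric,
  count(*)  AS value
FROM apilog
WHERE $__timeFilter("time")
GROUP BY 1, 2
ORDER BY 1;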

Note that there are many more features provided by Grafana (in terms of transformations, visualization options, access control to the UI itself, and so on).

The key points of this approach are:

  • Leveraging tools/products currently in use.
  • Greater flexibility in visualization than a custom-built tool with canned reports/graphs.
  • A smaller learning curve than inducting new tools.

This post barely scratches the surface of what each of the tools mentioned has on offer; one would do well to go through their documentation to derive the most value out of them.

Happy Coding !!