# Database

Knowledge base around general database-related topics.

## General Links

- Which Data Architecture to choose
- Prisma’s Data Guide - A growing library of articles focused on making databases more approachable.
- Query optimization guide
- Database performance for Developers
- Heimdall Data - Database scale-out without Application changes
- Database of databases
- Modern SQL in databases
- Eventual consistency by Werner Vogels
- Amazon Aurora ascendant: How we designed a cloud-native relational database - All Things Distributed
- Options for scaling from 1 to 100,000 tenants
- Amazon Aurora: design considerations for high throughput cloud-native relational databases | the morning paper
- NoSQL - Key Points
- Criteria for Choosing Data store
- Building Real Time Analytics APIs at Scale
- Streaming Database Changes with Debezium
- Why you should pick strong consistency, whenever possible
- Change Data Capture, Outbox and Event Sourcing
- Debezium Engine - setup without Apache Kafka
- Debezium without Kafka Connect
- Using StreamSets for CDC From Oracle to Other destinations
- Transactions in Google Spanner
- Things I Wished More Developers Knew About Databases
- Interactive Book about SQL
- SQL Interview Questions
- Hadoop or Laptop
- The lightweight, distributed relational database built on SQLite
- Optimizing SQL Queries, Regardless of Platform
- How to do Data Modelling the right way
- Primer on Database Replication
- Connection pool sizing for databases
- Some SQL tricks from Application DBA
- Best Practices while writing SQL
- Using checksums to verify syncing 100M database records
- How to populate a table with 1 million records using single query
- How databases optimize Sub-queries
- Approaches to database migration
- TigerBeetle - Fast financial accounting database
- Opinionated thoughts on SQL Databases

## Tools

- Collection DBMS Tools
- OctoSQL - Query, Join CSV with PostgreSQL/MySQL from Command line
- TSBS - tool to benchmark bulk load performance and query execution performance.
- Goose - Database schema migrations
- HammerDB - Benchmarking Suite for databases
- Sysbench - Scriptable database and system performance benchmark
- Soda Core - Data schema checks, for Quality, as code
- ReadySet - MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput.

## Data Analytics

- Understanding Avro, Parquet and ORC
- Guidance on Data Visualizations
- Simple data pipeline Powertools
- Cube.dev - Open source Headless BI platform
- Evidence.dev - BI as Code - SQL + Markdown to generate Reports
- Apache Spark defined
- Getting started with Spark in Python
- About Data Mesh Architecture
- Data mesh vs. Data Fabric
- Emerging Architectures for Modern Data Infrastructure
- Data Visualization/Exploration platforms Comparison Matrix
- Supercharging Apache Superset
- Snowplow - Cloud Native Behavioral data engine (e.g. User Analytics)
- Redash - Collaboration, dashboards
- Why data culture matters
- Designing a data transformation that delivers value right from the beginning
- List of Computational Data Analysis Workflow Systems
- Data Visualization framework for Python
- Analytics Academy by Segment
- Analytics Whitepapers by Sisense
- SQL Analytics Training
- A Beginner’s Guide to Data Engineering - 3-part series
- Chart types and their usage
- Rudder - Open source Customer Data Infrastructure
- Catalog of Widgets for Data Visualization
- Open source OLAP Database
- Modern Data stack guide by Castor
- Data Stack of 1mg
- A Unified Data Infrastructure Architecture
- Data and AI Product Landscape
- Transformations for DWH using dbt
- Awesome list of Business Intelligence Tools
- Article Series on Open source Data Analytics Stack (Postgres, Meltano, Airflow, dbt and Superset)
- PostHog - open source product analytics platform
- Typical Analytics Stack
- Flat Data - Scheduled Data Download on GitHub Actions in Repository and visualization
- NocoDB - Turn MySQL/PostgreSQL data into a smart spreadsheet
- Real time data analysis with Apache Pinot and Kafka
- UUIDs are bad for performance
- Noria - Caching and updating Relational query results
- Differential Datalog - Language for incremental computation
- Using NanoIDs (not longer UUIDs) for public APIs

## In-memory Databases

- Dragonfly - Compatible with Redis

## DuckDB

- DuckDB - Embeddable OLAP DBMS
- SQL Workbench - run DuckDB on WASM
- DuckDB - Connect and join on external databases
- Using DuckDB and Postgres together

## ETL, ELT, Database-as-a-queue, Evolutionary Practices

- All about ETL
- Airbyte - Open source ELT
- Database CI/CD practices using Redshift
- Awesome Apache Airflow
- A Python library for building data applications: ETL, ML, Data Pipelines, and more.
- A modern data workflow platform
- Databus - Change Data Capture System from LinkedIn
- Dolt - Git for Data
- GridDB - next generation database for IoT & big data with both NoSQL interface & SQL Interface.
- Compressing data with Parquet
- Lance - alternate columnar, compressed format for ML
- Mara pipelines - Opinionated ETL framework
- Enso - Interactive Data Workflow builder with no coding
- Database for Event Sourcing
- What are Data Contracts
- Centrifuge - Database as a Queue

## Database scaling

- Database Hardware Selection
- Scaling TiDB to 1 million QPS
- Sharding a database
- MySQL Sharding at Quora
- CUID - Collision-resistant ids optimized for horizontal scaling and performance.

## Data Discovery

- OpenMetadata - Data Discovery, Lineage, Data Quality
- Evaluation of Data Discovery Platforms
- Data Discovery at Shopify
- Great Expectations - Data Documentation and Profiling tool

## Database Migration Practices

- Zero downtime database migrations
- Stripe - Database Online migration at scale using dual writes
- How big companies migrate from one database to another without losing data, i.e. database independent?
- Efficiently diff rows across two different databases.

## Metadata Management

- Growing importance of Metadata Management Systems

## SQLite

- Scaling SQLite to 4M QPS on a single server (EC2 vs Bare Metal)
- Query against multiple SQLite databases using ATTACH Command
- Online SQLite Fiddle
- Why you should be using SQLite (2023)
- Performance tuning settings
- PocketBase - SQLite database with Go-based Wrapper to expose API
- Scaling SQLite to 4M QPS on Single Server
- Streaming S3 Replication for SQLite
- lightweight, distributed relational database built on SQLite
- Interesting use cases for SQLite
- Hosting SQLite databases on GitHub Pages
- Joining CSV and JSON data with an in-memory SQLite database
- Baked Data Architecture Pattern - DB side by side with Web App
- Cron based backups for SQLite

## Data Security, GDPR

- Tool for Sensitive Data Detection from Capital One
- Data bunker - Secure storage for personal records built to comply with GDPR

## Search

- Google Code Search using Inverted Index
- Open source Google Code Search tool in Go
- Manticore Search - easy to use open source fast database for search
- ZincSearch - lightweight alternative to Elasticsearch
- Why OpenSearch, fork of Elasticsearch
- Peer to peer web search and Intranet Search Appliance
- Get Started with OpenSearch

## Capacity Planning

- About Oracle Capacity Planning
- Guidelines for SQL Server Capacity Planning

## Database Documentation

- [Schema spy - ER Diagram, Metadata Reports](https://github.com/schemaspy/schemaspy)

## Data Engineering Concepts

- Choosing a Data Catalog
- Awesome Data Catalog
- Create a Serverless Data Lake on AWS and Migrate your On-Prem Data to it
- Data Engineering How-tos - List of Curated Articles/Videos
- Guide to Data lake, Data lake house
- Data Lake - Solution Patterns
- What is delta lake house?
- Poor man’s Data lake with DuckDB
- Data Model for Managing Collaborative Editing of Data
- Data platform playbook
- Dictionary of databases
- Database of Databases