Databases
Database
Knowledge base covering general database-related topics.
General Links
- Prisma's Data Guide - A growing library of articles focused on making databases more approachable.
- Query optimization guide
- Database performance for Developers
- Heimdall Data - Database scale-out without application changes
- Database of databases
- Modern SQL in databases
- Eventual consistency by Werner Vogels
- Amazon Aurora ascendant: How we designed a cloud-native relational database - All Things Distributed
- Options for scaling from 1 to 100,000 tenants
- Amazon Aurora: design considerations for high throughput cloud-native relational databases | the morning paper
- NoSQL - Key Points
- Criteria for Choosing Data store
- Building Real Time Analytics APIs at Scale
- Streaming Database Changes with Debezium
- Why you should pick strong consistency, whenever possible
- Change Data Capture, Outbox and Event Sourcing
- Debezium Engine - setup without Apache Kafka
- Debezium without Kafka Connect
- Using StreamSets for CDC from Oracle to other destinations
- Transactions in Google Spanner
- Things I Wished More Developers Knew About Databases
- Interactive Book about SQL
- SQL Interview Questions
- Hadoop or Laptop
- The lightweight, distributed relational database built on SQLite
- Optimizing SQL Queries, Regardless of Platform
- How to do Data Modelling the right way
- Primer on Database Replication
- Connection pool sizing for databases
- Some SQL tricks from an Application DBA
- Best Practices while writing SQL
- Using checksums to verify syncing 100M database records
- How to populate a table with 1 million records using a single query
- How databases optimize Sub-queries
- Approaches to database migration
- TigerBeetle - Fast financial accounting database
- Opinionated thoughts on SQL Databases
Tools Collection
- DBMS Tools
- OctoSQL - Query and join CSV with PostgreSQL/MySQL from the command line
- TSBS - tool to benchmark bulk load performance and query execution performance.
- Goose - Database schema migrations
- HammerDB - Benchmarking Suite for databases
- Sysbench - Scriptable database and system performance benchmark
- Soda Core - Data schema checks, for quality, as code
- Readyset - MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput.
Data Analytics
- Understanding avro, parquet and ORC
- Guidance on Data Visualizations
- Simple data pipeline Powertools
- Cube.dev - Open source Headless BI platform
- Evidence.dev - BI as Code - SQL + Markdown to generate Reports
- Apache Spark defined
- Getting started with Spark in Python
- About Data Mesh Architecture
- Data mesh vs. Data Fabric
- Emerging Architectures for Modern Data Infrastructure
- Data Visualization/Exploration platforms Comparison Matrix
- Supercharging Apache Superset
- Snowplow - Cloud Native Behavioral data engine (e.g. User Analytics)
- Redash - Collaboration, dashboards
- Why data culture matters
- Designing a data transformation that delivers value right from the beginning
- List of Computational Data Analysis Workflow Systems
- Data Visualization framework for Python
- Analytics Academy by Segment
- Analytics Whitepapers by Sisense
- SQL Analytics Training
- A Beginners Guide to Data Engineering - 3-part series
- Chart types and their usage
- Rudder - Open source Customer Data Infrastructure
- Catalog of Widgets for Data Visualization
- Open source OLAP Database
- Modern Data stack guide by Castor
- Data Stack of 1mg
- A Unified Data Infrastructure Architecture
- Data and AI Product Landscape
- Transformations for DWH using dbt
- Awesome list of Business Intelligence Tools
- Article Series on Open source Data Analytics Stack (Postgres, Meltano, Airflow, dbt and Superset)
- PostHog - open source product analytics platform
- Typical Analytics Stack
- Flat Data - Scheduled Data Download on GitHub Actions in Repository and visualization
- NocoDB - Turn MySQL/PostgreSQL data into a smart spreadsheet
- Real time data analysis with Apache Pinot and Kafka
- UUIDs are bad for performance
- Noria - Caching and updating Relational query results
- Differential Datalog - Language for incremental computation
- Using NanoIDs (instead of longer UUIDs) for public APIs
DuckDB
- DuckDB - Embeddable OLAP DBMS
- SQL Workbench - run DuckDB on WASM
- DuckDB - Connect and join on external databases
ETL, ELT, Database-as-a-queue, Evolutionary Practices
- All about ETL
- Airbyte - Open source ELT
- Database CI/CD practices using Redshift
- Awesome Apache Airflow
- A Python library for building data applications: ETL, ML, Data Pipelines, and more.
- A modern data workflow platform
- Databus - Change Data Capture system from LinkedIn
- Dolt - Git for Data
- GridDB - next generation database for IoT & big data with both NoSQL interface & SQL Interface.
- Compressing data with Parquet
- Lance - alternate columnar, compressed format for ML
- Mara pipelines - Opinionated ETL framework
- Enso - Interactive Data Workflow builder with no coding
- Database for Event Sourcing
- What are Data Contracts
- Centrifuge - Database as a Queue
Database scaling
- Scaling TiDB to 1 million QPS
- Sharding a database
- MySQL Sharding at Quora
- CUID - Collision-resistant IDs optimized for horizontal scaling and performance.
Data Discovery
- OpenMetadata - Data Discovery, Lineage, Data Quality
- Evaluation of Data Discovery Platforms
- Data Discovery at Shopify
- Great Expectations - Data Documentation and Profiling tool
Database Migration Practices
- Zero downtime database migrations
- Stripe - Database Online migration at scale using dual writes
- How big companies migrate from one database to another without losing data, i.e., in a database-independent way
- Efficiently diff rows across two different databases.
Metadata Management
SQLite
- Query against multiple SQLite databases using ATTACH Command
- Online SQLite Fiddle
- Why you should be using SQLite (2023)
- Performance tuning settings
- PocketBase - SQLite database with a Go-based wrapper to expose an API
- Scaling SQLite to 4M QPS on a Single Server
- Streaming S3 Replication for SQLite
- lightweight, distributed relational database built on SQLite
- Interesting use cases for SQLite
- Hosting SQLite databases on Github Pages
- Joining CSV and JSON data with an in-memory SQLite database
- Baked Data Architecture Pattern - DB side by side with the Web App
- Cron-based backups for SQLite
Data Security, GDPR
- Tool for Sensitive Data Detection from Capital One
- Databunker - Secure storage for personal records built to comply with GDPR
Search
- Google Code Search using Inverted Index
- Open source Google Code Search tool in Go
- Manticore Search - easy to use open source fast database for search
- ZincSearch - lightweight alternative to Elasticsearch
- Why OpenSearch, fork of Elasticsearch
- Peer to peer web search and Intranet Search Appliance
- Get Started with OpenSearch
Capacity Planning
Database Documentation
- [SchemaSpy - ER Diagram, Metadata Reports](https://github.com/schemaspy/schemaspy)
Data Engineering
- Concepts
- Choosing a Data Catalog
- Awesome Data Catalog
- Create a Serverless Data Lake on AWS and Migrate your On-Prem Data to it
- Data Engineering How-tos - List of Curated Articles/Videos
- Guide to Data Lake, Data Lakehouse
- Data Lake - Solution Patterns
- What is a Delta Lakehouse?
- Poor man's Data Lake with DuckDB
- Data Model for Managing Collaborative Editing of Data