- •About…
- •About the Book
- •About the Author
- •Acknowledgements
- •About the organisation of the books
- •Structured Query Language
- •A First Use Case
- •Loading the Data Set
- •Application Code and SQL
- •Back to Discovering SQL
- •Computing Weekly Changes
- •Software Architecture
- •Why PostgreSQL?
- •The PostgreSQL Documentation
- •Getting Ready to read this Book
- •Business Logic
- •Every SQL query embeds some business logic
- •Business Logic Applies to Use Cases
- •Correctness
- •Efficiency
- •Stored Procedures — a Data Access API
- •Procedural Code and Stored Procedures
- •Where to Implement Business Logic?
- •A Small Application
- •Readme First Driven Development
- •Chinook Database
- •Top-N Artists by Genre
- •Intro to psql
- •The psqlrc Setup
- •Transactions and psql Behavior
- •Discovering a Schema
- •Interactive Query Editor
- •SQL is Code
- •SQL style guidelines
- •Comments
- •Unit Tests
- •Regression Tests
- •A Closer Look
- •Indexing Strategy
- •Indexing for Queries
- •Choosing Queries to Optimize
- •PostgreSQL Index Access Methods
- •Advanced Indexing
- •Adding Indexes
- •An Interview with Yohann Gabory
- •Get Some Data
- •Structured Query Language
- •Queries, DML, DDL, TCL, DCL
- •Select, From, Where
- •Anatomy of a Select Statement
- •Projection (output): Select
- •Restrictions: Where
- •Order By, Limit, No Offset
- •Ordering with Order By
- •kNN Ordering and GiST indexes
- •Top-N sorts: Limit
- •No Offset, and how to implement pagination
- •Group By, Having, With, Union All
- •Aggregates (aka Map/Reduce): Group By
- •Aggregates Without a Group By
- •Restrict Selected Groups: Having
- •Grouping Sets
- •Common Table Expressions: With
- •Distinct On
- •Result Sets Operations
- •Understanding Nulls
- •Three-Valued Logic
- •Not Null Constraints
- •Outer Joins Introducing Nulls
- •Using Null in Applications
- •Understanding Window Functions
- •Windows and Frames
- •Partitioning into Different Frames
- •Available Window Functions
- •When to Use Window Functions
- •Relations
- •SQL Join Types
- •An Interview with Markus Winand
- •Serialization and Deserialization
- •Some Relational Theory
- •Attribute Values, Data Domains and Data Types
- •Consistency and Data Type Behavior
- •PostgreSQL Data Types
- •Boolean
- •Character and Text
- •Server Encoding and Client Encoding
- •Numbers
- •Floating Point Numbers
- •Sequences and the Serial Pseudo Data Type
- •Universally Unique Identifier: UUID
- •Date/Time and Time Zones
- •Time Intervals
- •Date/Time Processing and Querying
- •Network Address Types
- •Denormalized Data Types
- •Arrays
- •Composite Types
- •Enum
- •PostgreSQL Extensions
- •An interview with Grégoire Hubert
- •Object Relational Mapping
- •Tooling for Database Modeling
- •How to Write a Database Model
- •Generating Random Data
- •Modeling Example
- •Normalization
- •Data Structures and Algorithms
- •Normal Forms
- •Database Anomalies
- •Modeling an Address Field
- •Primary Keys
- •Foreign Keys Constraints
- •Not Null Constraints
- •Check Constraints and Domains
- •Exclusion Constraints
- •Practical Use Case: Geonames
- •Features
- •Countries
- •Modelization Anti-Patterns
- •Entity Attribute Values
- •Multiple Values per Column
- •UUIDs
- •Denormalization
- •Premature Optimization
- •Functional Dependency Trade-Offs
- •Denormalization with PostgreSQL
- •Materialized Views
- •History Tables and Audit Trails
- •Validity Period as a Range
- •Pre-Computed Values
- •Enumerated Types
- •Multiple Values per Attribute
- •The Sparse Matrix Model
- •Denormalize with Care
- •Not Only SQL
- •Schemaless Design in PostgreSQL
- •Durability Trade-Offs
- •Another Small Application
- •Insert, Update, Delete
- •Insert Into
- •Insert Into … Select
- •Update
- •Inserting Some Tweets
- •Delete
- •Tuples and Rows
- •Deleting All the Rows: Truncate
- •Isolation and Locking
- •About SSI
- •Putting Concurrency to the Test
- •Computing and Caching in SQL
- •Views
- •Materialized Views
- •Triggers
- •Transactional Event Driven Processing
- •Trigger and Counters Anti-Pattern
- •Fixing the Behavior
- •Event Triggers
- •Listen and Notify
- •PostgreSQL Notifications
- •Notifications and Cache Maintenance
- •Listen and Notify Support in Drivers
- •Batch Update, MoMA Collection
- •Updating the Data
- •Concurrency Patterns
- •On Conflict Do Nothing
- •An Interview with Kris Jenkins
- •Installing and Using PostgreSQL Extensions
- •Finding PostgreSQL Extensions
- •A Short List of Noteworthy Extensions
- •Auditing Changes with hstore
- •Introduction to hstore
- •Comparing hstores
- •Auditing Changes with a Trigger
- •Testing the Audit Trigger
- •From hstore Back to a Regular Record
- •Last.fm Million Song Dataset
- •Using Trigrams For Typos
- •The pg_trgm PostgreSQL Extension
- •Trigrams, Similarity and Searches
- •Complete and Suggest Song Titles
- •Trigram Indexing
- •Denormalizing Tags with intarray
- •Advanced Tag Indexing
- •User-Defined Tags Made Easy
- •The Most Popular Pub Names
- •A Pub Names Database
- •Normalizing the Data
- •Geolocating the Nearest Pub (k-NN search)
- •How far is the nearest pub?
- •The earthdistance PostgreSQL contrib
- •Pubs and Cities
- •The Most Popular Pub Names by City
- •Geolocation with PostgreSQL
- •Geolocation Data Loading
- •Geolocation Metadata
- •Emergency Pub
- •Counting Distinct Users with HyperLogLog
- •HyperLogLog
- •Installing postgresql-hll
- •Counting Unique Tweet Visitors
- •Lossy Unique Count with HLL
- •Getting the Visits into Unique Counts
- •Scheduling Estimates Computations
- •Combining Unique Visitors
- •An Interview with Craig Kerstiens
Chapter 31 Denormalization
When dealing with a short list of items, the normalized way to do that is to handle the catalog of accepted values in a dedicated table and reference this table everywhere your schema uses that catalog of values.
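When a SQL query uses more than join_collapse_limit or from_collapse_limit relations (both settings default to 8), the PostgreSQL optimizer stops searching for the best join order exhaustively and might be defeated. So in some schemas, using an ENUM data type rather than a reference table can be beneficial: each reference table you avoid is one join fewer in your queries.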
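To make the trade-off concrete, here is a minimal sketch of both models side by side. The table and type names (color, car_ref, color_t, car_enum) are hypothetical and only serve the illustration.

```sql
-- Normalized model: the catalog of accepted values lives in its own table,
-- and every query that needs the label has to join against it.
create table color
 (
   id   serial primary key,
   name text not null unique
 );

create table car_ref
 (
   id       serial primary key,
   color_id integer not null references color(id)
 );

-- Denormalized model: the catalog is an ENUM data type, no join required.
create type color_t as enum ('blue', 'red', 'gray', 'black');

create table car_enum
 (
   id    serial primary key,
   color color_t not null
 );

insert into car_enum(color) values ('red'), ('black');

-- The label is available directly in the row.
select id, color
  from car_enum
 where color = 'red';
```

The ENUM version removes one relation from every query that needs the color label. The price is that the list of accepted values is now part of the schema rather than ordinary data.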
Multiple Values per Attribute
In the CSV anti-pattern database model, we saw all the disadvantages of using multiple values per attribute in general, with a text-based schema and a separator used in the attribute values.
Managing several values per attribute, in the same row, can help reduce how many rows your application must manage. The normalized alternative has a side table for the entries, with a reference to the main table’s primary key.
Given PostgreSQL array support for searching and indexing, it is more efficient at times to manage the list of entries as an array attribute in our main table. This is particularly effective when the application often has to delete entries and all referenced data.
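As a sketch of this approach, assume a hypothetical bookmark table where the list of tags is kept as a text array in the main table. The table, column names and URLs are made up for the example.

```sql
create table bookmark
 (
   id   bigserial primary key,
   url  text not null unique,
   tags text[] not null default '{}'
 );

-- A GIN index supports the array containment operator @>.
create index on bookmark using gin (tags);

insert into bookmark(url, tags)
     values ('https://www.postgresql.org/docs/', '{postgresql,documentation}'),
            ('https://example.com/modeling', '{postgresql,modeling}');

-- Find the bookmarks carrying a given tag.
select url
  from bookmark
 where tags @> array['modeling'];

-- Deleting the main row removes its entries too: there is no side table
-- to clean up.
delete from bookmark
      where url = 'https://example.com/modeling';
```

With the normalized alternative, the same delete would have to cascade to a side table holding one row per tag.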
In some cases, multiple attributes each containing multiple values are needed. PostgreSQL arrays of composite type instances might then be considered. Cases when that model beats the normalized schema are rare, though, and managing this complexity isn’t free.
The Sparse Matrix Model
In cases where your application manages lots of optional attributes per row, most of them never being used, they can be denormalized to a JSONB extra column with those attributes, all managed into a single document.
When restricting this extra jsonb attribute to values never referenced anywhere else in the schema, and when the application only needs this extra data as a whole, then jsonb is a very good trade-off compared to a normalized schema.
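Here is a minimal sketch of that model, assuming a hypothetical product table: the attributes every row uses stay in regular columns, and the rarely used optional ones go into a single jsonb document.

```sql
create table product
 (
   id    bigserial primary key,
   name  text not null,
   price numeric not null,
   -- optional attributes, most of them unused for any given row
   extra jsonb not null default '{}'
 );

insert into product(name, price, extra)
     values ('desk lamp', 29.90, '{"bulb": "E27", "dimmable": true}'),
            ('notebook',   4.50, '{}');

-- The main use case: fetch the extra data as a whole.
select name, extra
  from product
 where name = 'desk lamp';

-- It is still possible to look inside the document when needed.
select name, extra->>'bulb' as bulb
  from product
 where extra ? 'bulb';
```

Nothing else in the schema references the keys inside extra, which is what keeps this model manageable.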
Partitioning
Partitioning refers to splitting a table with too many rows into a set of tables each containing a part of those rows. Several kinds of partitioning are available, such as list or range partitioning. Starting in PostgreSQL 10, table partitioning is supported directly.
While partitioning isn’t denormalization as such, the limits of the PostgreSQL 10 implementation make it valuable to include the technique in this section. Quoting the PostgreSQL 10 documentation:
•There is no facility available to create the matching indexes on all partitions automatically. Indexes must be added to each partition with separate commands. This also means that there is no way to create a primary key, unique constraint, or exclusion constraint spanning all partitions; it is only possible to constrain each leaf partition individually.
•Since primary keys are not supported on partitioned tables, foreign keys referencing partitioned tables are not supported, nor are foreign key references from a partitioned table to some other table.
•Using the ON CONFLICT clause with partitioned tables will cause an error, because unique or exclusion constraints can only be created on individual partitions. There is no support for enforcing uniqueness (or an exclusion constraint) across an entire partitioning hierarchy.
•An UPDATE that causes a row to move from one partition to another fails, because the new value of the row fails to satisfy the implicit partition constraint of the original partition.
•Row triggers, if necessary, must be defined on individual partitions, not the partitioned table.
So when using partitioning in PostgreSQL 10, we lose the ability to reach even the first normal form, for lack of a primary key covering the whole partitioned table. Then we lose the ability to maintain a reference to the partitioned table with a foreign key.
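The following sketch shows what range partitioning looks like with the PostgreSQL 10 syntax, under the limits quoted above. The visit table and its partitions are hypothetical names chosen for the example.

```sql
-- The partitioned table itself: no primary key can be declared here
-- in PostgreSQL 10.
create table visit
 (
   visited_at timestamptz not null,
   visitor    text not null
 )
 partition by range (visited_at);

create table visit_2018_01 partition of visit
   for values from ('2018-01-01') to ('2018-02-01');

create table visit_2018_02 partition of visit
   for values from ('2018-02-01') to ('2018-03-01');

-- Indexes must be added to each partition with separate commands.
create index on visit_2018_01 (visited_at);
create index on visit_2018_02 (visited_at);

-- Rows are routed to the matching partition at insert time.
insert into visit(visited_at, visitor)
     values ('2018-01-15 10:00+00', 'alice'),
            ('2018-02-03 09:30+00', 'bob');

-- See which partition holds which rows.
select tableoid::regclass as partition, count(*)
  from visit
 group by tableoid
 order by tableoid;
```

Any uniqueness guarantee here would have to be declared partition by partition, which is why nothing prevents the same visit from being recorded in two different partitions.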
Before partitioning any table in PostgreSQL, including PostgreSQL 10, as with any other denormalization technique (covered here or not), please do your homework: check that it’s really not possible to sustain the application’s workload with a normalized model.
Other Denormalization Tools
PostgreSQL extensions such as hstore, ltree, intarray or pg_trgm offer another set of interesting trade-offs to implement specific use cases.
For example, ltree can be used to implement nested category catalogs and reference articles precisely in this catalog.
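A minimal sketch of such a catalog follows, assuming the ltree extension is available on your server. The category and article tables and their content are hypothetical.

```sql
create extension if not exists ltree;

-- Each category is identified by its full path in the hierarchy.
create table category
 (
   path ltree primary key,
   name text  not null
 );

create table article
 (
   id       bigserial primary key,
   title    text  not null,
   category ltree not null references category(path)
 );

-- A GiST index supports the ltree ancestor/descendant operators.
create index on article using gist (category);

insert into category(path, name)
     values ('music', 'Music'),
            ('music.rock', 'Rock'),
            ('music.rock.progressive', 'Progressive Rock'),
            ('music.jazz', 'Jazz');

insert into article(title, category)
     values ('Close to the Edge', 'music.rock.progressive'),
            ('Kind of Blue', 'music.jazz');

-- All the articles filed anywhere under music.rock, at any depth.
select title, category
  from article
 where category <@ 'music.rock';
```

The path stored in each article duplicates the hierarchy information, which is the denormalization at play here. In exchange, fetching a whole subtree needs no recursive query.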
Denormalize with Care
It’s been mentioned already, and it is worth saying it again. Only denormalize your application’s schema when you know what you’re doing, and when you’ve double-checked that there’s no other possibility for implementing your application and business cases with the required level of performance.
First, query optimization techniques can go a long way. They mainly consist of rewriting a query until it’s obvious for PostgreSQL how to best execute it. In production, such rewrites commonly bring durations down from minutes to milliseconds, in particular for queries written by ORMs or other naive tooling.
Second, denormalization is an optimization technique meant to leverage trade-offs. Allow me to quote Rob Pike again, as he establishes his first rule of programming in Notes on Programming in C as the following:
Rule 1. You can’t tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you’ve proven that’s where the bottleneck is.
The rule works as well for a database model as it does for a program. Maybe the database model is even trickier, because we usually only measure the time spent running queries, and not the time it takes to:
•Understand the database model
•Understand how to use the database model to solve a new business case
•Write the SQL queries necessary to the application code
•Validate data quality
So again, only put all those nice properties at risk by denormalizing the schema when there’s no other choice.
