- •About…
- •About the Book
- •About the Author
- •Acknowledgements
- •About the organisation of the books
- •Structured Query Language
- •A First Use Case
- •Loading the Data Set
- •Application Code and SQL
- •Back to Discovering SQL
- •Computing Weekly Changes
- •Software Architecture
- •Why PostgreSQL?
- •The PostgreSQL Documentation
- •Getting Ready to read this Book
- •Business Logic
- •Every SQL query embeds some business logic
- •Business Logic Applies to Use Cases
- •Correctness
- •Efficiency
- •Stored Procedures — a Data Access API
- •Procedural Code and Stored Procedures
- •Where to Implement Business Logic?
- •A Small Application
- •Readme First Driven Development
- •Chinook Database
- •Top-N Artists by Genre
- •Intro to psql
- •The psqlrc Setup
- •Transactions and psql Behavior
- •Discovering a Schema
- •Interactive Query Editor
- •SQL is Code
- •SQL style guidelines
- •Comments
- •Unit Tests
- •Regression Tests
- •A Closer Look
- •Indexing Strategy
- •Indexing for Queries
- •Choosing Queries to Optimize
- •PostgreSQL Index Access Methods
- •Advanced Indexing
- •Adding Indexes
- •An Interview with Yohann Gabory
- •Get Some Data
- •Structured Query Language
- •Queries, DML, DDL, TCL, DCL
- •Select, From, Where
- •Anatomy of a Select Statement
- •Projection (output): Select
- •Restrictions: Where
- •Order By, Limit, No Offset
- •Ordering with Order By
- •kNN Ordering and GiST indexes
- •Top-N sorts: Limit
- •No Offset, and how to implement pagination
- •Group By, Having, With, Union All
- •Aggregates (aka Map/Reduce): Group By
- •Aggregates Without a Group By
- •Restrict Selected Groups: Having
- •Grouping Sets
- •Common Table Expressions: With
- •Distinct On
- •Result Sets Operations
- •Understanding Nulls
- •Three-Valued Logic
- •Not Null Constraints
- •Outer Joins Introducing Nulls
- •Using Null in Applications
- •Understanding Window Functions
- •Windows and Frames
- •Partitioning into Different Frames
- •Available Window Functions
- •When to Use Window Functions
- •Relations
- •SQL Join Types
- •An Interview with Markus Winand
- •Serialization and Deserialization
- •Some Relational Theory
- •Attribute Values, Data Domains and Data Types
- •Consistency and Data Type Behavior
- •PostgreSQL Data Types
- •Boolean
- •Character and Text
- •Server Encoding and Client Encoding
- •Numbers
- •Floating Point Numbers
- •Sequences and the Serial Pseudo Data Type
- •Universally Unique Identifier: UUID
- •Date/Time and Time Zones
- •Time Intervals
- •Date/Time Processing and Querying
- •Network Address Types
- •Denormalized Data Types
- •Arrays
- •Composite Types
- •Enum
- •PostgreSQL Extensions
- •An interview with Grégoire Hubert
- •Object Relational Mapping
- •Tooling for Database Modeling
- •How to Write a Database Model
- •Generating Random Data
- •Modeling Example
- •Normalization
- •Data Structures and Algorithms
- •Normal Forms
- •Database Anomalies
- •Modeling an Address Field
- •Primary Keys
- •Foreign Keys Constraints
- •Not Null Constraints
- •Check Constraints and Domains
- •Exclusion Constraints
- •Practical Use Case: Geonames
- •Features
- •Countries
- •Modelization Anti-Patterns
- •Entity Attribute Values
- •Multiple Values per Column
- •UUIDs
- •Denormalization
- •Premature Optimization
- •Functional Dependency Trade-Offs
- •Denormalization with PostgreSQL
- •Materialized Views
- •History Tables and Audit Trails
- •Validity Period as a Range
- •Pre-Computed Values
- •Enumerated Types
- •Multiple Values per Attribute
- •The Spare Matrix Model
- •Denormalize with Care
- •Not Only SQL
- •Schemaless Design in PostgreSQL
- •Durability Trade-Offs
- •Another Small Application
- •Insert, Update, Delete
- •Insert Into
- •Insert Into … Select
- •Update
- •Inserting Some Tweets
- •Delete
- •Tuples and Rows
- •Deleting All the Rows: Truncate
- •Isolation and Locking
- •About SSI
- •Putting Concurrency to the Test
- •Computing and Caching in SQL
- •Views
- •Materialized Views
- •Triggers
- •Transactional Event Driven Processing
- •Trigger and Counters Anti-Pattern
- •Fixing the Behavior
- •Event Triggers
- •Listen and Notify
- •PostgreSQL Notifications
- •Notifications and Cache Maintenance
- •Listen and Notify Support in Drivers
- •Batch Update, MoMA Collection
- •Updating the Data
- •Concurrency Patterns
- •On Conflict Do Nothing
- •An Interview with Kris Jenkins
- •Installing and Using PostgreSQL Extensions
- •Finding PostgreSQL Extensions
- •A Short List of Noteworthy Extensions
- •Auditing Changes with hstore
- •Introduction to hstore
- •Comparing hstores
- •Auditing Changes with a Trigger
- •Testing the Audit Trigger
- •From hstore Back to a Regular Record
- •Last.fm Million Song Dataset
- •Using Trigrams For Typos
- •The pg_trgm PostgreSQL Extension
- •Trigrams, Similarity and Searches
- •Complete and Suggest Song Titles
- •Trigram Indexing
- •Denormalizing Tags with intarray
- •Advanced Tag Indexing
- •User-Defined Tags Made Easy
- •The Most Popular Pub Names
- •A Pub Names Database
- •Normalizing the Data
- •Geolocating the Nearest Pub (k-NN search)
- •How far is the nearest pub?
- •The earthdistance PostgreSQL contrib
- •Pubs and Cities
- •The Most Popular Pub Names by City
- •Geolocation with PostgreSQL
- •Geolocation Data Loading
- •Geolocation Metadata
- •Emergency Pub
- •Counting Distinct Users with HyperLogLog
- •HyperLogLog
- •Installing postgresql-hll
- •Counting Unique Tweet Visitors
- •Lossy Unique Count with HLL
- •Getting the Visits into Unique Counts
- •Scheduling Estimates Computations
- •Combining Unique Visitors
- •An Interview with Craig Kerstiens
Chapter 31 Denormalization j 270
in queries and join operations and then build a materialized view on top of it. This makes it easy to see how much the materialized view has drifted from the authoritative version of the content with a simple except query. It also makes it easy to disable the cache provided by the materialized view in your application: change only the name of the relation and you get the same result set, this time known to be current.
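As a minimal sketch of that drift check, assuming the authoritative relation is a plain view named v.season_points (a hypothetical name here) with the same columns as the cache:

```sql
-- Rows that the authoritative view returns but that are missing from,
-- or different in, the cached copy. An empty result means no drift
-- in this direction.
-- Assumption: v.season_points is the view the cache was built from.
select * from v.season_points
except
select * from cache.season_points;
```

Swapping the two operands lists the stale rows still sitting in the cache.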
This cache now has to be invalidated after every race, and implementing cache invalidation is as easy as running the following refresh materialized view query:

    refresh materialized view cache.season_points;

The cache.season_points relation is locked out from even select activity while its content is being computed again. For very simple materialized view definitions it is possible to refresh concurrently and avoid locking out concurrent readers.
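A concurrent refresh could look like the sketch below. PostgreSQL only accepts the concurrently option when the materialized view has a unique index covering all rows; the choice of key columns is an assumption made for illustration.

```sql
-- Assumption: (season, driver, constructor) uniquely identifies a row
-- of the materialized view.
create unique index on cache.season_points(season, driver, constructor);

-- Readers keep seeing the previous content while the new one is computed.
refresh materialized view concurrently cache.season_points;
```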
Now that we have a cache, the application query to retrieve the same result set as before is the following:
    select driver, constructor, points
      from cache.season_points
     where season = 2017
       and points > 150;
History Tables and Audit Trails
Some business cases require having a full history of changes available for audit trails. What’s usually done is to maintain live data in the main table, modeled with the rules we already saw, and to model a specific history table where previous versions of the rows are maintained, or an archive.
A history table itself isn’t a denormalized version of the main table but rather another version of the model entirely, with a different primary key to begin with.
Which parts might require denormalization for history tables?
•Foreign key references to other tables won’t be possible when the referenced rows change and you want to keep a history that, by definition, doesn’t change.
•The schema of your main table evolves and the history table shouldn’t rewrite the history for rows already written.
The second point depends on your business needs. It might be possible to add new columns to both the main table and its history table when the processing done on the historical records is pretty light, i.e. mainly listing and comparing.
An alternative to classic history tables, when using PostgreSQL, takes advantage of the advanced data type JSONB.
    create schema if not exists archive;

    create type archive.action_t
        as enum('insert', 'update', 'delete');

    create table archive.older_versions
     (
       table_name text,
       date       timestamptz default now(),
       action     archive.action_t,
       data       jsonb
     );
Then it’s possible to fill in the archive.older_versions table with data from another table:
    insert into archive.older_versions(table_name, action, data)
         select 'hashtag', 'delete', row_to_json(hashtag)
           from hashtag
          where id = 720554371822432256
      returning table_name, date, action, jsonb_pretty(data) as data;
This returns:
    ─[ RECORD 1 ]────────────────────────────────────────────────────────────────
    table_name │ hashtag
    date       │ 2017-09-12 23:04:56.100749+02
    action     │ delete
    data       │ {
               │     "id": 720554371822432256,
               │     "date": "2016-04-14T10:08:00+02:00",
               │     "uname": "Brand 1LIVESTEW",
               │     "message": "#FB @ Atlanta, Georgia https://t.co/mUJdxaTbyC",
               │     "hashtags": [
               │         "#FB"
               │     ],
               │     "location": "(-84.3881,33.7489)"
               │ }

    INSERT 0 1
When using the PostgreSQL extension hstore it is also possible to compute the diff between versions thanks to the support for the - operator on this data type.
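The following sketch shows the hstore - operator on two literal values standing in for two versions of the same row; the keys and values are made up for illustration.

```sql
create extension if not exists hstore;

-- "-" removes from the left operand every key/value pair that also
-- appears in the right operand, leaving only the pairs that differ.
select   'uname=>alice, message=>hello, location=>Paris'::hstore
       - 'uname=>alice, message=>hi, location=>Paris'::hstore
       as changed;

-- changed: "message"=>"hello"
```

Since hstore(record) converts a whole row into an hstore value, the same operator can be applied to two versions of a row.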
Recording the data as jsonb or hstore in the history table allows for having a single history table for a whole application. More importantly, it copes with an application life cycle in which the database model evolves, because different versions of the same objects can be kept side by side in the same archive.
As seen in the previous sections though, dealing with jsonb in PostgreSQL is quite powerful, but not as powerful as dealing with the full power of a structured data model with an advanced SQL engine. That said, often enough the application and business needs surrounding the history entries are relaxed compared to live data processing.
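As an illustration of the kind of light processing such an archive typically needs, here is a sketch that lists the archived versions of one row using the jsonb operators. The GIN index is an optional assumption, not something the archive table requires.

```sql
-- Optional: a GIN index supporting the @> containment operator.
create index on archive.older_versions using gin (data jsonb_path_ops);

-- Archived versions of a given hashtag row, newest first.
  select date, action,
         data->>'uname'   as uname,
         data->'hashtags' as hashtags
    from archive.older_versions
   where table_name = 'hashtag'
     and data @> '{"id": 720554371822432256}'
order by date desc;
```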
Validity Period as a Range
As we already covered in the rates example, a variant of the historic table requirement is when your application needs to process the data even after its date of validity. When doing financial analysis or accounting, it is crucial to relate an invoice in a foreign currency to the valid exchange rate at the time of the invoice rather than the most current value of the currency.
    create table rates
     (
       currency text,
       validity daterange,
       rate     numeric,

       exclude using gist (currency with =,
                           validity with &&)
     );
An example of using this model follows:
    select currency, validity, rate
      from rates
     where currency = 'Euro'
       and validity @> date '2017-05-18';
And here’s what the application would receive, a single line of data of course, thanks to the exclude using constraint:
     currency │        validity         │   rate
    ══════════╪═════════════════════════╪══════════
     Euro     │ [2017-05-18,2017-05-19) │ 1.240740
    (1 row)
This query is kept fast thanks to the special GiST indexing, as we can see in the query plan:
    \pset format wrapped
    \pset columns 57

    explain
     select currency, validity, rate
       from rates
      where currency = 'Euro'
        and validity @> date '2017-05-18';
                           QUERY PLAN
    ═════════════════════════════════════════════════════════
     Index Scan using rates_currency_validity_excl on rates …
     …(cost=0.15..8.17 rows=1 width=34)
       Index Cond: ((currency = 'Euro'::text) AND (validity …
     …@> '2017-05-18'::date))
    (2 rows)
So when you need to keep around values that are only valid for a period of time, consider using the PostgreSQL range data types and the exclusion constraint that guarantees no overlapping of values in your data set. This is a powerful technique.
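To make the guarantee concrete, the sketch below first shows the exclusion constraint rejecting an overlapping period, then relates invoices to the rate valid on the invoice date. It assumes an empty rates table as defined above; the invoice table, its columns, the second rate value and the direction of the conversion are all hypothetical.

```sql
-- Equality on a text column inside a GiST index needs the btree_gist
-- extension, which must be installed before the rates table is created.
create extension if not exists btree_gist;

insert into rates(currency, validity, rate)
     values ('Euro', daterange('2017-05-18', '2017-05-19'), 1.240740);

-- This period overlaps the previous Euro entry, so it is rejected:
-- ERROR:  conflicting key value violates exclusion constraint
--         "rates_currency_validity_excl"
insert into rates(currency, validity, rate)
     values ('Euro', daterange('2017-05-18', '2017-05-20'), 1.25);

-- Hypothetical invoice table, for illustration only.
create table invoice
 (
   id           bigint primary key,
   currency     text,
   invoice_date date,
   amount       numeric
 );

-- Each invoice matches at most one rate, the one valid on its date.
select invoice.id,
       invoice.amount * rates.rate as converted_amount
  from invoice
       join rates
         on rates.currency = invoice.currency
        and rates.validity @> invoice.invoice_date;
```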
Pre-Computed Values
In some cases, the application keeps computing the same derived values each time it accesses the data. It’s easy to pre-compute the value with PostgreSQL:
•As a generated column (available since PostgreSQL 12) if the computation rules only include information available in the same tuple, given that a column default expression cannot reference other columns
•With a before trigger that computes the value and stores it into a column right in your table
Triggers are addressed later in this book with an example to solve this use case.
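In the meantime, a before trigger that pre-computes a value could be sketched as follows. The invoice_line table, its columns and the function name are hypothetical and only serve the illustration.

```sql
-- Hypothetical table: total is derived from qty and unit_price.
create table invoice_line
 (
   id         bigserial primary key,
   qty        integer not null,
   unit_price numeric not null,
   total      numeric
 );

create function compute_invoice_line_total()
  returns trigger
  language plpgsql
as $$
begin
  -- Overwrite whatever the application sent with the computed value.
  new.total := new.qty * new.unit_price;
  return new;
end;
$$;

create trigger compute_total
        before insert or update on invoice_line
      for each row
  execute procedure compute_invoice_line_total();
```

With this in place, the application reads total as a regular column and never has to compute it again.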
Enumerated Types
It is possible to use ENUM rather than a reference table.
