Chapter 36 Isolation and Locking | 315
activity records associated with a given messageid:
select   count(*) filter(where action = 'rt')
       - count(*) filter(where action = 'de-rt')
         as rts,
         count(*) filter(where action = 'fav')
       - count(*) filter(where action = 'de-fav')
         as favs
  from tweet.activity
       join tweet.message using(messageid)
 where messageid = :id;
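The counting logic of this query can be tried out without a PostgreSQL server. The following is a minimal sketch that assumes an in-memory SQLite database as a stand-in, a simplified `activity` table holding only `messageid` and `action`, and a hypothetical helper named `counters`. Because older SQLite versions lack the `filter` clause, the sketch uses the equivalent `sum(case ...)` form instead.

```python
import sqlite3


def counters(conn, messageid):
    """Return (rts, favs) for one message, derived from its activity rows."""
    sql = """
        select   sum(case when action = 'rt'     then 1 else 0 end)
               - sum(case when action = 'de-rt'  then 1 else 0 end) as rts,
                 sum(case when action = 'fav'    then 1 else 0 end)
               - sum(case when action = 'de-fav' then 1 else 0 end) as favs
          from activity
         where messageid = ?
    """
    return conn.execute(sql, (messageid,)).fetchone()


# Assumption: a simplified activity table, not the book's full schema.
conn = sqlite3.connect(":memory:")
conn.execute("create table activity (messageid integer, action text)")
conn.executemany(
    "insert into activity(messageid, action) values (?, ?)",
    [(1, "rt"), (1, "rt"), (1, "de-rt"), (1, "fav"), (2, "rt")],
)

print(counters(conn, 1))
print(counters(conn, 2))
```

Each counter is the number of positive actions minus the number of cancelling actions, so a retweet followed by a de-retweet nets out to zero.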
Reading the current counter value has become quite complex when compared to just adding a column to your query output list. On the other hand, when adding an rt or a fav action to a message, we no longer run this SQL statement:
update tweet.message set rts = rts + 1 where messageid = :id;
This is what we use instead:
insert into tweet.activity(messageid, action) values(:id, 'rt');
The reason why replacing an update with an insert is interesting is concurrency behavior and locking. In the first version, retweeting has to wait until all concurrent retweets are done, and the business model wants to sustain as many concurrent activities on the same small set of messages as possible (read about influencer accounts).
The insert has no concurrency contention because it targets a row that doesn’t exist yet. We register each action into its own tuple and never have to wait for a lock held by another retweet, allowing our production setup of PostgreSQL to sustain a much larger load.
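The difference between the two write patterns can be illustrated outside the database. The sketch below is only an analogy in Python threads, not a model of PostgreSQL's lock manager: the function names are hypothetical, a single `threading.Lock` stands in for the lock on the message row, and per-worker lists stand in for newly inserted tuples.

```python
import threading


def run_update_model(workers, retweets):
    """Every worker increments one shared counter guarded by one lock."""
    state = {"rts": 0}
    lock = threading.Lock()  # stands in for the lock on the message row

    def work():
        for _ in range(retweets):
            with lock:  # concurrent retweets wait for each other here
                state["rts"] += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["rts"]


def run_insert_model(workers, retweets, messageid=1):
    """Every worker appends new records to its own list; nothing is shared."""
    logs = [[] for _ in range(workers)]

    def work(log):
        for _ in range(retweets):
            log.append((messageid, "rt"))  # a brand-new record, no waiting

    threads = [threading.Thread(target=work, args=(log,)) for log in logs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The counter is computed at read time, as in the activity-based query.
    return sum(len(log) for log in logs)


print(run_update_model(10, 100))
print(run_insert_model(10, 100))
```

Both models end up with the same count. The difference is that the update model funnels every write through one lock, while the insert model moves the cost to read time.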
Now, computing the counters each time we want to display them is costly. And the counters are displayed on every tweet message. We need a way to cache that information, and we’ll see about that in the Computing and Caching in SQL section.
Putting Concurrency to the Test
When we benchmark the concurrency properties of the two statements above, we quickly realize that the activity table is badly designed. The unique constraint includes a timestamptz field, which in PostgreSQL is only precise down to the microsecond.
This kind of made-up unique constraint means we now have these errors to deal with:
Error: Database error 23505: duplicate key value violates unique constraint "activity_messageid_datetime_action_key"
DETAIL: Key (messageid, datetime, action)=(2, 2017-09-19 18:00:03.831818+02, rt) already exists.
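The collision behind this error can be reproduced in miniature. The following is a minimal sketch, assuming a Python set as a stand-in for the unique index on (messageid, datetime, action). The `UniqueViolation` exception and the `insert_activity` helper are hypothetical names, and Python's `datetime` happens to share the microsecond resolution of timestamptz.

```python
from datetime import datetime, timedelta, timezone


class UniqueViolation(Exception):
    """Hypothetical stand-in for PostgreSQL error 23505."""


def insert_activity(table, messageid, when, action):
    """Insert one activity record, enforcing the unique key."""
    key = (messageid, when, action)  # mirrors unique(messageid, datetime, action)
    if key in table:
        raise UniqueViolation(f"Key {key} already exists.")
    table.add(key)


table = set()
tz = timezone(timedelta(hours=2))
stamp = datetime(2017, 9, 19, 18, 0, 3, 831818, tzinfo=tz)

# Two retweets of message 2 recorded within the same microsecond.
insert_activity(table, 2, stamp, "rt")
try:
    insert_activity(table, 2, stamp, "rt")
    collided = False
except UniqueViolation:
    collided = True

# One microsecond later the key differs, so the insert goes through.
insert_activity(table, 2, stamp + timedelta(microseconds=1), "rt")

print(collided, len(table))
```

Under enough concurrent load, two sessions will eventually record the same action on the same message within one microsecond, which is why the timestamp cannot serve as part of a key here.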
The best course of action here is to do this:
alter table tweet.activity
 drop constraint activity_messageid_datetime_action_key;
Now we can properly compare the concurrency scaling of the insert-based and update-based versions. In case you are curious about it, here’s the testing code that’s been used:
(defpackage #:concurrency
  (:use #:cl #:appdev)
  (:import-from #:lparallel
                #:*kernel*
                #:make-kernel #:make-channel
                #:submit-task #:receive-result
                #:kernel-worker-index)
  (:import-from #:cl-postgres-error
                #:database-error)
  (:export      #:*connspec*
                #:concurrency-test))

(in-package #:concurrency)

(defparameter *connspec* '("appdev" "dim" nil "localhost"))

(defparameter *insert-rt*
  "insert into tweet.activity(messageid, action) values($1, 'rt')")

(defparameter *update-rt*
  "update tweet.message set rts = coalesce(rts, 0) + 1 where messageid = $1")

(defun concurrency-test (workers retweets messageid
                         &optional (connspec *connspec*))
  (format t "Starting benchmark for updates~%")
  (with-timing (rts seconds)
      (run-workers workers retweets messageid *update-rt* connspec)
    (format t "Updating took ~f seconds, did ~d rts~%" seconds rts))

  (format t "~%")

  (format t "Starting benchmark for inserts~%")
  (with-timing (rts seconds)
      (run-workers workers retweets messageid *insert-rt* connspec)
    (format t "Inserting took ~f seconds, did ~d rts~%" seconds rts)))

(defun run-workers (workers retweets messageid sql
                    &optional (connspec *connspec*))
  (let* ((*kernel* (lparallel:make-kernel workers))
         (channel  (lparallel:make-channel)))
    (loop repeat workers
          do (lparallel:submit-task channel #'retweet-many-times
for a free slot.
In another test with even more concurrency pressure at 50 retweets per worker, we can show that the results are repeatable:
CL-USER> (concurrency::concurrency-test 100 50 6)
Starting benchmark for updates
Updating took 5.070135 seconds, did 5000 rts

Starting benchmark for inserts
Inserting took 3.739505 seconds, did 5000 rts
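To put these two timings in perspective, they can be turned into throughput figures. This is a small arithmetic sketch using only the numbers printed above. The `throughput` helper is a hypothetical name, and the total of 5000 retweets comes from 100 workers doing 50 retweets each.

```python
def throughput(retweets, seconds):
    """Retweets processed per second."""
    return retweets / seconds


total_rts = 5000            # 100 workers x 50 retweets each
update_seconds = 5.070135   # from the benchmark output
insert_seconds = 3.739505   # from the benchmark output

update_rate = throughput(total_rts, update_seconds)
insert_rate = throughput(total_rts, insert_seconds)
speedup = update_seconds / insert_seconds

print(f"updates: {update_rate:.0f} rts/s")
print(f"inserts: {insert_rate:.0f} rts/s")
print(f"inserts finish {speedup:.2f}x faster")
```

On this run the update version sustains roughly 986 retweets per second against roughly 1,337 for the insert version, so the inserts complete about 1.36 times faster on the same workload.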
If you know that your application has to scale, think about how to avoid concurrent activity that competes against a single shared resource. Here, this shared resource is the rts field of the tweet.message row that you target, and the concurrency behavior is going to be fine if the retweet activity is well distributed. As soon as many users want to retweet the same message, then the update solution has a non-trivial scalability impact.
Now, we’re going to implement the tweet.activity based model. In this model, the number of retweets needs to be computed each time we display it, and it’s part of the visible data. Also, in the general case, our users cannot know for sure how many retweets have been made, so we can implement a cache with eventual consistency properties.
