- •About…
- •About the Book
- •About the Author
- •Acknowledgements
- •About the organisation of the book
- •Structured Query Language
- •A First Use Case
- •Loading the Data Set
- •Application Code and SQL
- •Back to Discovering SQL
- •Computing Weekly Changes
- •Software Architecture
- •Why PostgreSQL?
- •The PostgreSQL Documentation
- •Getting Ready to read this Book
- •Business Logic
- •Every SQL query embeds some business logic
- •Business Logic Applies to Use Cases
- •Correctness
- •Efficiency
- •Stored Procedures — a Data Access API
- •Procedural Code and Stored Procedures
- •Where to Implement Business Logic?
- •A Small Application
- •Readme First Driven Development
- •Chinook Database
- •Top-N Artists by Genre
- •Intro to psql
- •The psqlrc Setup
- •Transactions and psql Behavior
- •Discovering a Schema
- •Interactive Query Editor
- •SQL is Code
- •SQL style guidelines
- •Comments
- •Unit Tests
- •Regression Tests
- •A Closer Look
- •Indexing Strategy
- •Indexing for Queries
- •Choosing Queries to Optimize
- •PostgreSQL Index Access Methods
- •Advanced Indexing
- •Adding Indexes
- •An Interview with Yohann Gabory
- •Get Some Data
- •Structured Query Language
- •Queries, DML, DDL, TCL, DCL
- •Select, From, Where
- •Anatomy of a Select Statement
- •Projection (output): Select
- •Restrictions: Where
- •Order By, Limit, No Offset
- •Ordering with Order By
- •kNN Ordering and GiST indexes
- •Top-N sorts: Limit
- •No Offset, and how to implement pagination
- •Group By, Having, With, Union All
- •Aggregates (aka Map/Reduce): Group By
- •Aggregates Without a Group By
- •Restrict Selected Groups: Having
- •Grouping Sets
- •Common Table Expressions: With
- •Distinct On
- •Result Sets Operations
- •Understanding Nulls
- •Three-Valued Logic
- •Not Null Constraints
- •Outer Joins Introducing Nulls
- •Using Null in Applications
- •Understanding Window Functions
- •Windows and Frames
- •Partitioning into Different Frames
- •Available Window Functions
- •When to Use Window Functions
- •Relations
- •SQL Join Types
- •An Interview with Markus Winand
- •Serialization and Deserialization
- •Some Relational Theory
- •Attribute Values, Data Domains and Data Types
- •Consistency and Data Type Behavior
- •PostgreSQL Data Types
- •Boolean
- •Character and Text
- •Server Encoding and Client Encoding
- •Numbers
- •Floating Point Numbers
- •Sequences and the Serial Pseudo Data Type
- •Universally Unique Identifier: UUID
- •Date/Time and Time Zones
- •Time Intervals
- •Date/Time Processing and Querying
- •Network Address Types
- •Denormalized Data Types
- •Arrays
- •Composite Types
- •Enum
- •PostgreSQL Extensions
- •An Interview with Grégoire Hubert
- •Object Relational Mapping
- •Tooling for Database Modeling
- •How to Write a Database Model
- •Generating Random Data
- •Modeling Example
- •Normalization
- •Data Structures and Algorithms
- •Normal Forms
- •Database Anomalies
- •Modeling an Address Field
- •Primary Keys
- •Foreign Keys Constraints
- •Not Null Constraints
- •Check Constraints and Domains
- •Exclusion Constraints
- •Practical Use Case: Geonames
- •Features
- •Countries
- •Modelization Anti-Patterns
- •Entity Attribute Values
- •Multiple Values per Column
- •UUIDs
- •Denormalization
- •Premature Optimization
- •Functional Dependency Trade-Offs
- •Denormalization with PostgreSQL
- •Materialized Views
- •History Tables and Audit Trails
- •Validity Period as a Range
- •Pre-Computed Values
- •Enumerated Types
- •Multiple Values per Attribute
- •The Sparse Matrix Model
- •Denormalize with Care
- •Not Only SQL
- •Schemaless Design in PostgreSQL
- •Durability Trade-Offs
- •Another Small Application
- •Insert, Update, Delete
- •Insert Into
- •Insert Into … Select
- •Update
- •Inserting Some Tweets
- •Delete
- •Tuples and Rows
- •Deleting All the Rows: Truncate
- •Isolation and Locking
- •About SSI
- •Putting Concurrency to the Test
- •Computing and Caching in SQL
- •Views
- •Materialized Views
- •Triggers
- •Transactional Event Driven Processing
- •Trigger and Counters Anti-Pattern
- •Fixing the Behavior
- •Event Triggers
- •Listen and Notify
- •PostgreSQL Notifications
- •Notifications and Cache Maintenance
- •Listen and Notify Support in Drivers
- •Batch Update, MoMA Collection
- •Updating the Data
- •Concurrency Patterns
- •On Conflict Do Nothing
- •An Interview with Kris Jenkins
- •Installing and Using PostgreSQL Extensions
- •Finding PostgreSQL Extensions
- •A Short List of Noteworthy Extensions
- •Auditing Changes with hstore
- •Introduction to hstore
- •Comparing hstores
- •Auditing Changes with a Trigger
- •Testing the Audit Trigger
- •From hstore Back to a Regular Record
- •Last.fm Million Song Dataset
- •Using Trigrams For Typos
- •The pg_trgm PostgreSQL Extension
- •Trigrams, Similarity and Searches
- •Complete and Suggest Song Titles
- •Trigram Indexing
- •Denormalizing Tags with intarray
- •Advanced Tag Indexing
- •User-Defined Tags Made Easy
- •The Most Popular Pub Names
- •A Pub Names Database
- •Normalizing the Data
- •Geolocating the Nearest Pub (k-NN search)
- •How far is the nearest pub?
- •The earthdistance PostgreSQL contrib
- •Pubs and Cities
- •The Most Popular Pub Names by City
- •Geolocation with PostgreSQL
- •Geolocation Data Loading
- •Geolocation Metadata
- •Emergency Pub
- •Counting Distinct Users with HyperLogLog
- •HyperLogLog
- •Installing postgresql-hll
- •Counting Unique Tweet Visitors
- •Lossy Unique Count with HLL
- •Getting the Visits into Unique Counts
- •Scheduling Estimates Computations
- •Combining Unique Visitors
- •An Interview with Craig Kerstiens
5 
A Small Application
Let’s write a very basic application in which we compare using classic application code and SQL to solve some common problems. Our goal in this section is to be confronted with managing SQL as part of a code base, and to show when to use classic application code and when to use SQL.
Readme First Driven Development
Before writing any code or tests or anything, I like to write the readme first. That’s this little file explaining to the user why to care about the application, and maybe giving some details about how to use it. Let’s do that now.
The cdstore application is a very simple wrapper on top of the Chinook database. The Chinook data model represents a digital media store, including tables for artists, albums, media tracks, invoices, and customers.
The cdstore application allows listing useful information and reports on top of the database, and also provides a way to generate some activity.
Loading the Dataset
When I first used the Chinook dataset, it didn’t support PostgreSQL, so I used the SQLite data output, which nicely fits into a small enough data file. Nowadays you will find a PostgreSQL backup file that you can use. It’s easier for me to just use pgloader though, so I will just do that.
Another advantage of using pgloader in this book is that we get the following summary output, which lists the tables and how many rows we loaded for each of them. This is our first encounter with the dataset.
Here’s a truncated output from the pgloader run (edited so that it fits in the book page format):
$ createdb chinook
$ pgloader https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite_AutoIncrementPKs.sqlite pgsql:///chinook
...
             table name      errors       rows      bytes      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0                      1.611s
        fetch meta data          0         33                      0.050s
         Create Schemas          0          0                      0.002s
       Create SQL Types          0          0                      0.008s
          Create tables          0         22                      0.092s
         Set Table OIDs          0         11                      0.017s
-----------------------  ---------  ---------  ---------  --------------
                 artist          0        275     6.8 kB           0.026s
                  album          0        347    10.5 kB           0.090s
               employee          0          8     1.4 kB           0.034s
                invoice          0        412    31.0 kB           0.059s
              mediatype          0          5     0.1 kB           0.083s
          playlisttrack          0       8715    57.3 kB           0.179s
               customer          0         59     6.7 kB           0.010s
                  genre          0         25     0.3 kB           0.019s
            invoiceline          0       2240    43.6 kB           0.090s
               playlist          0         18     0.3 kB           0.056s
                  track          0       3503   236.6 kB           0.192s
-----------------------  ---------  ---------  ---------  --------------
COPY Threads Completion          0          4                      0.335s
         Create Indexes          0         22                      0.326s
 Index Build Completion          0         22                      0.088s
        Reset Sequences          0          0                      0.049s
           Primary Keys          1         11                      0.030s
    Create Foreign Keys          0         11                      0.065s
        Create Triggers          0          0                      0.000s
       Install Comments          0          0                      0.000s
-----------------------  ---------  ---------  ---------  --------------
      Total import time                 15607   394.5 kB           0.893s
Now that the dataset is loaded, we have to fix a badly defined primary key from the SQLite side of things:
> \d track
                                    Table "public.track"
    Column    │  Type   │                        Modifiers
══════════════╪═════════╪══════════════════════════════════════════════════════════
 trackid      │ bigint  │ not null default nextval('track_trackid_seq'::regclass)
 name         │ text    │
 albumid      │ bigint  │
 mediatypeid  │ bigint  │
 genreid      │ bigint  │
 composer     │ text    │
 milliseconds │ bigint  │
 bytes        │ bigint  │
 unitprice    │ numeric │
Indexes:
    "idx_51519_ipk_track" UNIQUE, btree (trackid)
    "idx_51519_ifk_trackalbumid" btree (albumid)
    "idx_51519_ifk_trackgenreid" btree (genreid)
    "idx_51519_ifk_trackmediatypeid" btree (mediatypeid)
... foreign keys ...

> alter table track add primary key using index idx_51519_ipk_track;
ALTER TABLE
Note that because PostgreSQL implements group by inference, we need this primary key to exist in order to run some of the following queries: when a query groups by a table’s primary key, PostgreSQL lets the select list reference the table’s other columns without adding them to the group by clause. This means that as soon as you’ve loaded the dataset, please fix the primary key so that we are ready to play with it.
Chinook Database
The Chinook database includes basic music elements such as album, artist, track, genre and mediatype for a music collection. We also find the idea of a playlist, with an association table playlisttrack, because any track can be part of several playlists and a single playlist is obviously made of several tracks.
Then there’s a model for a customer paying for some tracks, with the tables staff, customer, invoice and invoiceline.
pgloader# \dt chinook.
            List of relations
 Schema  │     Name      │ Type  │ Owner
═════════╪═══════════════╪═══════╪═══════
 chinook │ album         │ table │ dim
 chinook │ artist        │ table │ dim
 chinook │ customer      │ table │ dim
 chinook │ genre         │ table │ dim
 chinook │ invoice       │ table │ dim
 chinook │ invoiceline   │ table │ dim
 chinook │ mediatype     │ table │ dim
 chinook │ playlist      │ table │ dim
 chinook │ playlisttrack │ table │ dim
 chinook │ staff         │ table │ dim
 chinook │ track         │ table │ dim
(11 rows)
With that in mind we can begin to explore the dataset with a simple query:
  select genre.name, count(*) as count
    from genre
         left join track using(genreid)
group by genre.name
order by count desc;
Which gives us:
        name         │ count
═════════════════════╪═══════
 Rock                │  1297
 Latin               │   579
 Metal               │   374
 Alternative & Punk  │   332
 Jazz                │   130
 TV Shows            │    93
 Blues               │    81
 Classical           │    74
 Drama               │    64
 R&B/Soul            │    61
 Reggae              │    58
 Pop                 │    48
 Soundtrack          │    43
 Alternative         │    40
 Hip Hop/Rap         │    35
 Electronica/Dance   │    30
 Heavy Metal         │    28
 World               │    28
 Sci Fi & Fantasy    │    26
 Easy Listening      │    24
 Comedy              │    17
 Bossa Nova          │    15
 Science Fiction     │    13
 Rock And Roll       │    12
 Opera               │     1
(25 rows)
Music Catalog
Now, back to our application. We are going to write it in Python, to make it easy to browse the code within the book.
Using the anosql Python library, it is very easy to embed SQL code in Python and keep the SQL clean and tidy in .sql files. We will look at the Python side of things in a moment; a first sketch follows the artist.sql listing below.
The artist.sql file looks like this:
-- name: top-artists-by-album
-- Get the list of the N artists with the most albums
  select artist.name, count(*) as albums
    from artist
         left join album using(artistid)
group by artist.name
order by albums desc
   limit :n;
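To give a first idea of the Python side of things, here is a minimal sketch of how the cdstore code could load this file with anosql and run the named query. The anosql.from_path loader, the psycopg2 driver and the dbname=chinook connection string are assumptions made to keep the sketch self-contained; the actual application code comes later.

import anosql
import psycopg2

# Load the named queries from artist.sql: each "-- name:" entry in the
# file becomes a callable attribute, with dashes turned into underscores.
queries = anosql.from_path("artist.sql", "psycopg2")

# The connection string is an assumption: adapt it to your local setup.
with psycopg2.connect("dbname=chinook") as conn:
    # The :n parameter from the SQL file is passed as a keyword argument.
    for name, albums in queries.top_artists_by_album(conn, n=3):
        print(f"{name}: {albums} albums")

Result rows come back as plain tuples here, which is enough for the simple listings and reports the cdstore application produces.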
Having .sql files in our source tree allows us to version control them with git, write comments when necessary, and also copy and paste the files between the application’s directory and the interactive psql shell.
In the case of our artist.sql file, we see the use of the anosql facility to name variables, and we use limit :n. Here’s how to benefit from that directly in the PostgreSQL shell:
> \set n 1
> \i artist.sql
    name     │ albums
═════════════╪════════
 Iron Maiden │     21
(1 row)

> \set n 3
> \i artist.sql
     name     │ albums
══════════════╪════════
 Iron Maiden  │     21
 Led Zeppelin │     14
 Deep Purple  │     11
(3 rows)
Of course, you can also set the variable’s value from the command line, in case you want to integrate that into bash scripts or other calls:
psql --variable "n=10" -f artist.sql chinook
