7 
SQL is Code
The first step here is realizing that your database engine actually is part of your application logic. Any SQL statement you write, even the simplest possible, does embed some logic: you are projecting a particular set of columns, filtering the result to only a part of the available data set (thanks to the where clause), and you want to receive the result in a known ordering. That already is business logic. Application code is written in SQL.
Earlier, we compared a simple eight-line SQL query and the typical object model code solving the same use case, and analyzed its correctness and efficiency issues. Then in the previous section, we looked at a good way to keep your SQL queries as .sql files in your code base.
Now that SQL is actually code in your application’s source tree, we need to apply the same methodology that you’re used to: set a minimum level of expected quality thanks to common indentation rules, code comments, consistent naming, unit testing, and code revision systems.
SQL style guidelines
Code style is mainly about following the principle of least astonishment. That’s why having a clear internal style guide that every developer follows is important in larger teams. We are going to cover several aspects of SQL code style here, from indentation to alias names.
Chapter 7 SQL is Code j 61
Indenting is a tool aimed at making the code easy to read. Let’s face it: we spend more time reading code than writing it, so we should always optimize for easy-to-read code. SQL is code, so it needs to be properly indented.
Let’s see a few examples of bad and good style so that you can decide about your local guidelines.
    SELECT title, name FROM album LEFT JOIN track USING(albumid) WHERE albumid = 1 ORDER BY 2;
Here we have a run-away query all on the same line, making it more difficult than it should be for a reader to grasp what the query is all about. Also, the query is using the old habit of all-caps SQL keywords. While it’s true that SQL started out a long time ago, we now have color screens and syntax highlighting and we don’t write all-caps code anymore… not even in SQL.
My advice is to right align top-level SQL clauses and have them on new lines:
      select title, name
        from album left join track using(albumid)
       where albumid = 1
    order by 2;
Now it’s quite a bit easier to understand the structure of this query at a glance and to realize that it is indeed a very basic SQL statement. Moreover, it’s easier to spot a problem in the query: order by 2. SQL allows one to use output column numbers as references in some of its clauses, which is very useful at the prompt (because we are all lazy, right?). It makes refactoring harder than it should be though. Suppose we now decide we don’t want to output the album’s title with each track’s row in the result set, as we are actually interested in the track’s name and duration, as found in the milliseconds column:
      select name, milliseconds
        from album left join track using(albumid)
       where albumid = 1
    order by 2;
Now the position of the output columns has changed, so you also need to change the order by clause, obtaining the following diff:
    @@ -1,4 +1,4 @@
    -  select title, name
    +  select name, milliseconds
         from album left join track using(albumid)
        where albumid = 1
    -order by 2;
    +order by 1;
This is a very simple example, but nonetheless we can see that the review process now has to take into account why the order by clause is modified when what you want to achieve is changing the columns returned.
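The refactoring hazard described above can be reproduced in a few lines. The following is a minimal sketch, not the book's own tooling: it uses Python's standard sqlite3 module as a stand-in for PostgreSQL, and the table contents are made-up rows that only mimic the shape of the Chinook album and track tables. The same order by 2 clause ends up sorting on a different column once the select list changes.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for
# illustration); the rows below are invented, not Chinook data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table album(albumid integer primary key, title text);
    create table track(trackid integer primary key, albumid integer,
                       name text, milliseconds integer);
    insert into album values (1, 'Album One');
    insert into track values (1, 1, 'Zulu',  100000),
                             (2, 1, 'Alpha', 300000),
                             (3, 1, 'Mike',  200000);
""")


def run(select_list):
    """Run the query with the given select list, leaving 'order by 2' as is."""
    sql = f"""
          select {select_list}
            from album left join track using(albumid)
           where albumid = 1
        order by 2
    """
    return conn.execute(sql).fetchall()


by_name = run("title, name")              # column 2 is the track name
by_duration = run("name, milliseconds")   # column 2 is now the duration

names_before = [row[1] for row in by_name]
names_after = [row[0] for row in by_duration]

print(names_before)
print(names_after)
```

Only the select list was edited between the two calls, yet the rows come back in a different order, which is exactly what a reviewer would have to notice in the diff.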
Now, the right ordering for this query might actually be to return the tracks in the order they appear on the album, which seems to be handled in the Chinook model by the trackid itself, so it’s better to use that:
      select name, milliseconds
        from album left join track using(albumid)
       where albumid = 1
    order by trackid;
This query is now about ready to be checked into your application’s code base, tested and reviewed. An alternative way of writing it splits the from clause into one source relation per line, so that the join appears more clearly:
      select name, milliseconds
        from           album
             left join track using(albumid)
       where albumid = 1
    order by trackid;
In this style, we indent the join clauses nested in the from clause, because that’s the semantics of an SQL query. Also, we left align the table names that take part in the join. An alternative style consists of also putting the join condition (one of either on or using) on a separate line:
      select name, milliseconds
        from album
             left join track
                    using(albumid)
       where albumid = 1
    order by trackid;
This extended style is useful when using subqueries, so let’s fetch track information from albums we get in a subquery:
      select title, name, milliseconds
        from (
              select albumid, title
                from album
                     join artist using(artistid)
               where artist.name = 'AC/DC'
             )
             as artist_albums
             left join track
                    using(albumid)
    order by trackid;
One of the key things to think about in terms of the style you pick is being consistent. That’s why in the previous example we also split the from clause in the subquery, even though it’s a very simple clause that’s not surprising.
SQL requires using parens for subqueries, and we can put that requirement to good use in the way we indent our queries, as shown above.
Another habit that is worth mentioning here consists of writing the join conditions of inner joins in the where clause:
    SELECT name, title
      FROM artist, album
     WHERE artist.artistid = album.artistid
       AND artist.artistid = 1;
This style reminds us of the 70s and 80s, before the SQL standard specified the join semantics and the join condition. It is extremely confusing to use such a style and doing it is frowned upon. The modern SQL spelling looks like the following:
    select name, title
      from artist
           inner join album using(artistid)
     where artist.artistid = 1;
Here I expanded the inner join to its full notation. The SQL standard introduces noise words in the syntax, and both inner and outer are noise words: a left, right or full join is always an outer join, and a straight join always is an inner join.
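To check that these spellings really are interchangeable, here is a small sketch that runs the where-clause join, the inner join and the plain join against the same data and compares the results. It assumes Python's sqlite3 module as a convenient stand-in for PostgreSQL, and the artist and album rows are invented for the example; an order by title is added only to make the comparison deterministic.

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table artist(artistid integer primary key, name text);
    create table album(albumid integer primary key, title text,
                       artistid integer);
    insert into artist values (1, 'Artist A'), (2, 'Artist B');
    insert into album values (10, 'First', 1),
                             (11, 'Second', 1),
                             (12, 'Other', 2);
""")

# Old habit: join condition buried in the where clause.
old_style = conn.execute("""
      select name, title
        from artist, album
       where artist.artistid = album.artistid
         and artist.artistid = 1
    order by title
""").fetchall()

# Explicit join with the "inner" noise word spelled out.
inner_join = conn.execute("""
      select name, title
        from artist
             inner join album using(artistid)
       where artist.artistid = 1
    order by title
""").fetchall()

# Same join without the noise word.
plain_join = conn.execute("""
      select name, title
        from artist
             join album using(artistid)
       where artist.artistid = 1
    order by title
""").fetchall()

print(old_style == inner_join == plain_join)
```

The three queries return the same rows; the difference is purely in how clearly the join condition is separated from the filtering condition.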
It is also possible to use the natural join here, which will automatically expand a join condition over columns having the same name:
    select name, title
      from artist natural join album
     where artist.artistid = 1;
General wisdom dictates that one should avoid natural joins: you can (and will) change your query semantics by merely adding a column to or removing a column from a table! In the Chinook model, we have five different tables with a name column, none of those being part of the primary key. In most cases, you don’t want to join tables on the name column…
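The following sketch illustrates how fragile a natural join is. It again assumes Python's sqlite3 module in place of PostgreSQL, with invented rows, and the added name column on album is a hypothetical schema change rather than something found in the Chinook model. The query text never changes, but its result does as soon as both tables share a second column name.

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table artist(artistid integer primary key, name text);
    create table album(albumid integer primary key, title text,
                       artistid integer);
    insert into artist values (1, 'Artist A');
    insert into album values (10, 'First', 1), (11, 'Second', 1);
""")

QUERY = """
      select title
        from artist natural join album
       where artist.artistid = 1
    order by title
"""

# Only artistid is shared, so the natural join matches on artistid.
before = conn.execute(QUERY).fetchall()

# Hypothetical schema change: album gains its own name column.
conn.execute("alter table album add column name text")
conn.execute("update album set name = 'Deluxe edition'")

# Now artistid AND name are shared, so the join silently requires
# artist.name = album.name as well.
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```

With an explicit using(artistid) clause, the same schema change would have left the query result untouched.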
Because it’s fun to do so, let’s write a query to find out if the Chinook data set includes cases of a track being named after another artist’s, perhaps reflecting their
