- •About…
- •About the Book
- •About the Author
- •Acknowledgements
- •About the organisation of the books
- •Structured Query Language
- •A First Use Case
- •Loading the Data Set
- •Application Code and SQL
- •Back to Discovering SQL
- •Computing Weekly Changes
- •Software Architecture
- •Why PostgreSQL?
- •The PostgreSQL Documentation
- •Getting Ready to read this Book
- •Business Logic
- •Every SQL query embeds some business logic
- •Business Logic Applies to Use Cases
- •Correctness
- •Efficiency
- •Stored Procedures — a Data Access API
- •Procedural Code and Stored Procedures
- •Where to Implement Business Logic?
- •A Small Application
- •Readme First Driven Development
- •Chinook Database
- •Top-N Artists by Genre
- •Intro to psql
- •The psqlrc Setup
- •Transactions and psql Behavior
- •Discovering a Schema
- •Interactive Query Editor
- •SQL is Code
- •SQL style guidelines
- •Comments
- •Unit Tests
- •Regression Tests
- •A Closer Look
- •Indexing Strategy
- •Indexing for Queries
- •Choosing Queries to Optimize
- •PostgreSQL Index Access Methods
- •Advanced Indexing
- •Adding Indexes
- •An Interview with Yohann Gabory
- •Get Some Data
- •Structured Query Language
- •Queries, DML, DDL, TCL, DCL
- •Select, From, Where
- •Anatomy of a Select Statement
- •Projection (output): Select
- •Restrictions: Where
- •Order By, Limit, No Offset
- •Ordering with Order By
- •kNN Ordering and GiST indexes
- •Top-N sorts: Limit
- •No Offset, and how to implement pagination
- •Group By, Having, With, Union All
- •Aggregates (aka Map/Reduce): Group By
- •Aggregates Without a Group By
- •Restrict Selected Groups: Having
- •Grouping Sets
- •Common Table Expressions: With
- •Distinct On
- •Result Sets Operations
- •Understanding Nulls
- •Three-Valued Logic
- •Not Null Constraints
- •Outer Joins Introducing Nulls
- •Using Null in Applications
- •Understanding Window Functions
- •Windows and Frames
- •Partitioning into Different Frames
- •Available Window Functions
- •When to Use Window Functions
- •Relations
- •SQL Join Types
- •An Interview with Markus Winand
- •Serialization and Deserialization
- •Some Relational Theory
- •Attribute Values, Data Domains and Data Types
- •Consistency and Data Type Behavior
- •PostgreSQL Data Types
- •Boolean
- •Character and Text
- •Server Encoding and Client Encoding
- •Numbers
- •Floating Point Numbers
- •Sequences and the Serial Pseudo Data Type
- •Universally Unique Identifier: UUID
- •Date/Time and Time Zones
- •Time Intervals
- •Date/Time Processing and Querying
- •Network Address Types
- •Denormalized Data Types
- •Arrays
- •Composite Types
- •Enum
- •PostgreSQL Extensions
- •An interview with Grégoire Hubert
- •Object Relational Mapping
- •Tooling for Database Modeling
- •How to Write a Database Model
- •Generating Random Data
- •Modeling Example
- •Normalization
- •Data Structures and Algorithms
- •Normal Forms
- •Database Anomalies
- •Modeling an Address Field
- •Primary Keys
- •Foreign Keys Constraints
- •Not Null Constraints
- •Check Constraints and Domains
- •Exclusion Constraints
- •Practical Use Case: Geonames
- •Features
- •Countries
- •Modelization Anti-Patterns
- •Entity Attribute Values
- •Multiple Values per Column
- •UUIDs
- •Denormalization
- •Premature Optimization
- •Functional Dependency Trade-Offs
- •Denormalization with PostgreSQL
- •Materialized Views
- •History Tables and Audit Trails
- •Validity Period as a Range
- •Pre-Computed Values
- •Enumerated Types
- •Multiple Values per Attribute
- •The Spare Matrix Model
- •Denormalize wih Care
- •Not Only SQL
- •Schemaless Design in PostgreSQL
- •Durability Trade-Offs
- •Another Small Application
- •Insert, Update, Delete
- •Insert Into
- •Insert Into … Select
- •Update
- •Inserting Some Tweets
- •Delete
- •Tuples and Rows
- •Deleting All the Rows: Truncate
- •Isolation and Locking
- •About SSI
- •Putting Concurrency to the Test
- •Computing and Caching in SQL
- •Views
- •Materialized Views
- •Triggers
- •Transactional Event Driven Processing
- •Trigger and Counters Anti-Pattern
- •Fixing the Behavior
- •Event Triggers
- •Listen and Notify
- •PostgreSQL Notifications
- •Notifications and Cache Maintenance
- •Listen and Notify Support in Drivers
- •Batch Update, MoMA Collection
- •Updating the Data
- •Concurrency Patterns
- •On Conflict Do Nothing
- •An Interview with Kris Jenkins
- •Installing and Using PostgreSQL Extensions
- •Finding PostgreSQL Extensions
- •A Short List of Noteworthy Extensions
- •Auditing Changes with hstore
- •Introduction to hstore
- •Comparing hstores
- •Auditing Changes with a Trigger
- •Testing the Audit Trigger
- •From hstore Back to a Regular Record
- •Last.fm Million Song Dataset
- •Using Trigrams For Typos
- •The pg_trgm PostgreSQL Extension
- •Trigrams, Similarity and Searches
- •Complete and Suggest Song Titles
- •Trigram Indexing
- •Denormalizing Tags with intarray
- •Advanced Tag Indexing
- •User-Defined Tags Made Easy
- •The Most Popular Pub Names
- •A Pub Names Database
- •Normalizing the Data
- •Geolocating the Nearest Pub (k-NN search)
- •How far is the nearest pub?
- •The earthdistance PostgreSQL contrib
- •Pubs and Cities
- •The Most Popular Pub Names by City
- •Geolocation with PostgreSQL
- •Geolocation Data Loading
- •Geolocation Metadata
- •Emergency Pub
- •Counting Distinct Users with HyperLogLog
- •HyperLogLog
- •Installing postgresql-hll
- •Counting Unique Tweet Visitors
- •Lossy Unique Count with HLL
- •Getting the Visits into Unique Counts
- •Scheduling Estimates Computations
- •Combining Unique Visitors
- •An Interview with Craig Kerstiens
6 
The SQL REPL — An Interactive
Setup
PostgreSQL ships with an interactive console with the command line tool named psql. It can be used both for scripting and interactive usage and is moreover quite a powerful tool. Interactive features includes autocompletion, readline support (history searches, modern keyboard movements, etc), input and output redirection, formatted output, and more.
New users of PostgreSQL of en want to nd an advanced visual query editing tool and are confused when psql is the answer. Most PostgreSQL advanced users and experts don’t even think about it and use psql. In this chapter, you will learn how to fully appreciate that little command line tool.
Intro to psql
psql implements a REPL: the famous read-eval-print loop. It’s one of the best ways to interact with the computer when you’re just learning and trying things out. In the case of PostgreSQL you might be discovering a schema, a data set, or just working on a query.
We of en see the SQL query when it’s fully formed, and rarely get to see the steps that led us there. It’s the same with code, most of en what you get to see is its nal form, not the intermediary steps where the author tries things and re ne
Chapter 6 The SQL REPL — An Interactive Setup j 53
their understanding of the problem at hand, or the environment in which to solve it.
The process to follow to get to a complete and e cient SQL query is the same as when writing code: iterating from a very simple angle towards a full solution to the problem at hand. Having a REPL environment o fers an easy way to build up on what you just had before.
The psqlrc Setup
Here we begin with a full setup of psql and in the rest of the chapter, we are going to get back to each important point separately. Doing so allows you to have a fully working environment from the get-go and play around in your PostgreSQL console while reading the book.
\set PROMPT1 '%~%x%# ' \x auto
\set ON_ERROR_STOP on
\set ON_ERROR_ROLLBACK interactive
\pset null '¤'
\pset linestyle 'unicode'
\pset unicode_border_linestyle single \pset unicode_column_linestyle single \pset unicode_header_linestyle double set intervalstyle to 'postgres_verbose';
\setenv LESS '-iMFXSx4R'
\setenv EDITOR '/Applications/Emacs.app/Contents/MacOS/bin/emacsclient -nw'
Save that setup in the ~/.psqlrc le, which is read at startup by the psql application. As you’ve already read in the PostgreSQL documentation for psql, we have three di ferent settings to play with here:
• \set [ name [ value [ ... ] ] ]
This sets the psql variable name to value, or if more than one value is given, to the concatenation of all of them. If only one argument is given, the variable is set with an empty value. To unset a variable, use the \unset command.
• \setenv name [ value ]
This sets the environment variable name to value, or if the value is not supplied, unsets the environment variable.
Chapter 6 The SQL REPL — An Interactive Setup j 54
Here we use this facility to setup speci c environment variables we need from within psql, such as the LESS setup. It allows invoking the pager for each result set but having it take the control of the screen only when necessary.
• \pset [ option [ value ] ]
This command sets options a fecting the output of query result tables. option indicates which option is to be set. The semantics of value vary depending on the selected option. For some options, omitting value causes the option to be toggled or unset, as described under the particular option. If no such behavior is mentioned, then omitting value just results in the current setting being displayed.
Transactions and psql Behavior
In our case we set several psql variables that change its behavior:
• \set ON_ERROR_STOP on
The name is quite a good description of the option. It allows psql to know that it is not to continue trying to execute all your commands when a previous one is throwing an error. It’s primarily practical for scripts and can be also set using the command line. As we’ll see later, we can easily invoke scripts interactively within our session with the \i and \ir commands, so the option is still useful to us now.
• \set ON_ERROR_ROLLBACK interactive
This setting changes how psql behaves with respect to transactions. It is a very good interactive setup, and must be avoided in batch scripts.
From the documentation: When set to on, if a statement in a transaction blockgeneratesanerror, theerrorisignoredandthetransactioncontinues. When set to interactive, such errors are only ignored in interactive sessions, and not when reading script les. When unset or set to o f, a statement in a transaction block that generates an error aborts the entire transaction. The error rollback mode works by issuing an implicit SAVEPOINT for you, just before each command that is in a transaction block, and then rolling back to the savepoint if the command fails.
Chapter 6 The SQL REPL — An Interactive Setup j 55
With the \set PROMPT1 '%~%x%# ' that we are using, psql displays a little star in the prompt when there’s a transaction in ight, so you know you need to n- ish the transaction. More importantly, when you want to type in anything that will have a side e fect on your database (modifying the data set or the database schema), then without the star you know you need to rst type in BEGIN.
Let’s see an example output with ON_ERROR_ROLLBACK set to o f. Here’s its default value:
f1db# begin; BEGIN
f1db*# select 1/0; ERROR: division by zero f1db!# select 1+1;
ERROR: current transaction is aborted, commands ignored until end of transaction bloc f1db!# rollback;
ROLLBACK
We have an error in our transaction, and we notice that the star prompt is now a ag. The SQL transaction is marked invalid, and the only thing PostgreSQL will now accept from us is to nish the transaction, with either a commit or a rollback command. Both will result in the same result from the server: ROLLBACK.
Now, let’sdothesameSQLtransactionagain, thistimewith ON_ERROR_ROLLBA being set to interactive. Now, before each command we send to the server,
psql sends a savepoint command, which allows it to then issue a rollback to savepoint command in case of an error. This rollback to savepoint is also sent automatically:
f1db# begin; BEGIN
f1db*# select 1/0; ERROR: division by zero f1db*# select 1+1;
?column?
══════════
2
(1 row)
f1db*# commit; COMMIT
Notice how this time not only do we get to send successful commands af er the error, while still being in a transaction — also we get to be able to COMMIT our work to the server.
Chapter 6 The SQL REPL — An Interactive Setup j 56
A Reporting Tool
Getting familiar with psql is a very good productivity enhancer, so my advice is to spend some quality time with the documentation of the tool and get used to it. In this chapter, we are going to simplify things and help you to get started.
There are mainly two use cases for psql, either as an interactive tool or as a scripting and reporting tool. In the rst case, the idea is that you have plenty of commands to help you get your work done, and you can type in SQL right in your terminal and see the result of the query.
In the scripting and reporting use case, you have advanced formatting commands: it is possible to run a query and fetch its result directly in either asciidoc or HTML for example, given \pset format. Say we have a query that reports the N bests known results for a given driver surname. We can use psql to set dynamic variables, display tuples only and format the result in a convenient HTML output:
1 |
~ psql --tuples-only |
\ |
2 |
--set n=1 |
\ |
3 |
--set name=Alesi |
\ |
4 |
--no-psqlrc |
\ |
5 |
-P format=html |
\ |
6 |
-d f1db |
\ |
7-f report.sql
1 <table border="1">
2<tr valign="top">
3<td align="left">Alesi</td>
4 <td align="left">Canadian Grand Prix</td>
5 <td align="right">1995</td>
6<td align="right">1</td>
7</tr>
8</table>
It is also possible to set the connection parameters as environment variables, or to usethesameconnectionstringsasinyourapplication’scode, soyoucantestthem with copy/paste easily, there’s no need to transform them into the -d dbname -h
hostname -p port -U username syntax:
1~ psql -d postgresql://dim@localhost:5432/f1db
2f1db#
3
4 ~ psql -d "user=dim host=localhost port=5432 dbname=f1db"
5f1db#
