Chapter 7 SQL is Code
Result: PASS
You might also find it easy to integrate SQL testing in your current unit testing solution. On Debian and derivative operating systems, pg_virtualenv is a tool that creates a temporary PostgreSQL installation that exists only while you're running your tests.
If you’re using Python, read the excellent article from Julien Danjou about databases integration testing strategies with Python where you will learn more tricks to integrate your database tests using the standard Python toolset.
Your application relies on SQL. You rely on tests to trust your ability to change and evolve your application. You need your tests to cover the SQL parts of your application!
Regression Tests
Regression testing protects against introducing bugs when refactoring code. We refactor SQL queries too, either because the calling application code has changed and the query must change with it, or because we hit a problem in production and a new, optimized version of the query is checked in to replace the previous, faulty one.
The way regression testing protects you is by registering the expected results from your queries, and then checking actual results against the expected results. Typically you would run the regression tests each time a query is changed.
The RegreSQL tool implements that idea. It finds SQL files in your code repository, allows registering test plans against them, and then compares the results with what's expected.
A typical output from using RegreSQL against our cdstore application looks like the following:
$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'…
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out
In the following example we introduce a bug by changing the test plan without changing the expected result, and here’s how it looks then:
$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'…
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
# Query File: 'src/sql/artist.sql'
# Bindings File: 'regresql/plans/src/sql/artist.yaml'
# Bindings Name: '1'
# Query Parameters: 'map[n:2]'
# Expected Result File: 'regresql/expected/src/sql/artist.1.out'
# Actual Result File: 'regresql/out/src/sql/artist.1.out'
#
# --- regresql/expected/src/sql/artist.1.out
# +++ regresql/out/src/sql/artist.1.out
# @@ -1,4 +1,5 @@
# -    name    | albums
# -------------+-------
# -Iron Maiden |     21
# +    name     | albums
# +-------------+-------
# +Iron Maiden  |     21
# +Led Zeppelin |     14
#
not ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out
The diagnostic output tells you what to do to fix the problem: either change the expected output (with regresql update) or fix the regresql/plans/src/sql/artist.yaml file.
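The principle behind such a diagnostic, comparing a registered expected result with the actual one and reporting the difference, can be sketched in a few lines of Python. This is only an illustration built on the standard difflib module, not how RegreSQL itself is implemented; the function names compare_result and tap_line and the sample result strings are hypothetical.

```python
import difflib
from typing import Tuple


def compare_result(expected: str, actual: str, name: str) -> Tuple[bool, str]:
    """Return (passed, diagnostic) for one registered query result."""
    if expected == actual:
        return True, ""
    diff = difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile=f"expected/{name}",
        tofile=f"out/{name}",
        lineterm="",
    )
    # Prefix each diff line with "# " so it reads as a TAP diagnostic.
    return False, "\n".join(f"# {line}" for line in diff)


def tap_line(number: int, name: str, passed: bool) -> str:
    """Format one TAP result line."""
    status = "ok" if passed else "not ok"
    return f"{status} {number} - {name}"


# Hypothetical sample data: the expected file has one row, the query now returns two.
expected = "Iron Maiden | 21\n"
actual = "Iron Maiden | 21\nLed Zeppelin | 14\n"

passed, diagnostic = compare_result(expected, actual, "artist.1.out")
if diagnostic:
    print(diagnostic)
print(tap_line(3, "artist.1.out", passed))
```

Whatever the tool, the decision is the same: if the new output is the intended one, the expected result gets re-registered; otherwise the query or its test plan has to be fixed.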
A Closer Look
When something goes wrong in production and you want to understand it, one of the important tasks is finding which part of the code is sending a specific query that you can see in the monitoring, in the logs, or in the interactive activity views.
PostgreSQL implements the application_name parameter, which you can set in the connection string and with the SET command within your session. It is then possible to have it reported in the server’s logs, and it’s also part of the system activity view pg_stat_activity.
It is a good idea to be quite granular with this setting, going as low as the module or package level, depending on your programming language of choice. It's one of those settings that the main application should have full control of, so external (and internal) libraries usually do not set it.
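As a minimal sketch of what a module-level setting could look like, the following Python snippet adds application_name to the connection string used earlier in this chapter and prepares the equivalent SET statement. The helper with_application_name, the cdstore.album name and the column list in the activity query are assumptions chosen for illustration; nothing here connects to a database.

```python
from urllib.parse import urlencode

# PostgreSQL keeps fewer than NAMEDATALEN characters (64 in a standard build).
MAX_APPLICATION_NAME_LENGTH = 63


def with_application_name(uri: str, name: str) -> str:
    """Append an application_name parameter to a connection URI."""
    name = name[:MAX_APPLICATION_NAME_LENGTH]
    separator = "&" if "?" in uri else "?"
    return uri + separator + urlencode({"application_name": name})


# Hypothetical naming scheme: <application>.<module>
app_name = "cdstore.album"

# Option 1: set it in the connection string.
uri = with_application_name("postgres:///chinook?sslmode=disable", app_name)
print(uri)

# Option 2: set it from within an open session.
set_statement = f"SET application_name TO '{app_name}'"
print(set_statement)

# The name then shows up next to each query in the activity view.
activity_query = "SELECT pid, application_name, query FROM pg_stat_activity"
print(activity_query)
```

With one name per module, a slow query spotted in pg_stat_activity or in the logs points straight at the part of the code that sent it.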
