15

Group By, Having, With, Union All

Now that we have some of the basics of SQL queries, we can move on to more advanced topics. Up to now, queries would return as many rows as we select thanks to the where filtering. This filter applies against a data set that is produced by the from clause and its joins in between relations.

Joins might produce more rows than you have in your reference data set; in particular, a cross join is a Cartesian product.

In this section, we’ll have a look at aggregates. They work by computing a digest value for several input rows at a time. With aggregates, we can return a summary containing many fewer rows than passed the where filter.

Aggregates (aka Map/Reduce): Group By

The group by clause introduces aggregates in SQL, and allows implementing much the same thing as map/reduce in other systems: map your data into different groups, and in each group reduce the data set to a single value.
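To make the map/reduce analogy concrete, here is a minimal Python sketch of what a group by with count(*) does. The sample rows and the helper names decade_of and races_per_decade are hypothetical and only serve the illustration; this is not how PostgreSQL implements grouping.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows standing in for a races table.
races = [
    {"raceid": 1, "date": date(1950, 5, 13)},
    {"raceid": 2, "date": date(1958, 10, 19)},
    {"raceid": 3, "date": date(1961, 5, 14)},
    {"raceid": 4, "date": date(2017, 3, 26)},
    {"raceid": 5, "date": date(2012, 11, 25)},
]


def decade_of(d):
    # Grouping key: the first year of the decade the date falls in.
    return d.year - d.year % 10


def races_per_decade(rows):
    groups = defaultdict(list)
    for row in rows:
        # Map step: assign each row to its group.
        groups[decade_of(row["date"])].append(row)
    # Reduce step: collapse each group to a single value (a row count),
    # then sort by the grouping key.
    return sorted((decade, len(members)) for decade, members in groups.items())


print(races_per_decade(races))  # [(1950, 2), (1960, 1), (2010, 2)]
```

Five input rows come out as three summary rows, one per group, which is the row reduction that aggregates provide.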

As a first example we can count how many races have been run in each decade:

    select extract('year'
                   from
                   date_trunc('decade', date))
           as decade,
           count(*)
      from races
  group by decade
  order by decade;

PostgreSQL offers a rich set of date and time functions:

 decade │ count
════════╪═══════
   1950 │    84
   1960 │   100
   1970 │   144
   1980 │   156
   1990 │   162
   2000 │   174
   2010 │   156
(7 rows)

The difference between each decade is easy to compute thanks to window functions, seen later in this chapter. Let’s have a preview:

  with races_per_decade
  as (
      select extract('year'
                     from
                     date_trunc('decade', date))
             as decade,
             count(*) as nbraces
        from races
    group by decade
    order by decade
  )
  select decade, nbraces,
         case
           when lag(nbraces, 1)
                over(order by decade) is null
           then ''

           when nbraces - lag(nbraces, 1)
                          over(order by decade)
                < 0
           then format('-%3s',
                       lag(nbraces, 1)
                       over(order by decade)
                       - nbraces)

           else format('+%3s',
                       nbraces
                       - lag(nbraces, 1)
                         over(order by decade))

         end as evolution
    from races_per_decade;

We use a pretty complex case expression to produce the exact output we want from the query. Other than that, it relies on the lag() over(order by decade) expression, which gives access to the previous row and so lets us compute the difference between the current row and the previous one.

Here’s what we get from the previous query:

 decade │ nbraces │ evolution
════════╪═════════╪═══════════
   1950 │      84 │
   1960 │     100 │ + 16
   1970 │     144 │ + 44
   1980 │     156 │ + 12
   1990 │     162 │ +  6
   2000 │     174 │ + 12
   2010 │     156 │ - 18
(7 rows)
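The lag() logic can be sketched in plain Python as a loop that remembers the previous row. This is only an illustration of the semantics, using the per-decade counts returned by the first query; the function name evolution is hypothetical.

```python
# (decade, nbraces) pairs taken from the per-decade counts in the text.
races_per_decade = [
    (1950, 84), (1960, 100), (1970, 144), (1980, 156),
    (1990, 162), (2000, 174), (2010, 156),
]


def evolution(rows):
    out = []
    previous = None  # like lag(): there is no previous row for the first one
    for decade, nbraces in sorted(rows):  # plays the role of order by decade
        if previous is None:
            change = ""
        elif nbraces - previous < 0:
            change = "-%3s" % (previous - nbraces)
        else:
            change = "+%3s" % (nbraces - previous)
        out.append((decade, nbraces, change))
        previous = nbraces
    return out


for row in evolution(races_per_decade):
    print(row)
```

Each output row is built from the current row and the one just before it in decade order, which is what the window function expresses declaratively.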

Now, we can also prepare the data set in a separate query that is run first, called a common table expression and introduced by the with clause. We will expand on that idea in the upcoming pages.

PostgreSQL comes with the usual aggregates you would expect, such as sum, count, and avg, and also with some more interesting ones such as bool_and. As its name suggests, the bool_and aggregate starts true and remains true only if every row it sees evaluates to true.
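As a rough illustration of that behavior, the following Python sketch keeps a per-group flag that starts true and is and-ed with a condition on every row, next to a plain row count. The sample rows and the function name never_finished_drivers are hypothetical, and None stands in for an SQL null position.

```python
from collections import defaultdict

# Hypothetical (driverid, position) result rows; None stands for SQL null.
results = [
    (1, 3), (1, None), (1, 1),
    (2, None), (2, None),
    (3, None),
]


def never_finished_drivers(rows):
    races = defaultdict(int)
    never_finished = defaultdict(lambda: True)  # the flag starts true
    for driverid, position in rows:
        races[driverid] += 1  # plays the role of count(*)
        # The flag stays true only while every row satisfies the condition.
        never_finished[driverid] = never_finished[driverid] and position is None
    found = [(d, races[d]) for d in races if never_finished[d]]
    # Most races first.
    return sorted(found, key=lambda r: r[1], reverse=True)


print(never_finished_drivers(results))  # [(2, 2), (3, 1)]
```

Driver 1 is excluded because at least one of their rows has a position, which flips the flag to false for good.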

With that aggregate, it’s then possible to search for all drivers who failed to finish any single race they participated in over their whole career:

  with counts as
  (
     select driverid, forename, surname,
            count(*) as races,
            bool_and(position is null) as never_finished
       from drivers
            join results using(driverid)
            join races using(raceid)
   group by driverid
  )
    select driverid, forename, surname, races
      from counts
     where never_finished
  order by races desc;

Well, it turns out that we have a great number of cases in which it happens. The previous query gives us 202 drivers who never finished a single race they took part in; that said, 117 of them participated in only a single race.

Not picking on anyone in particular, we can find out if some seasons were less lucky than others on that basis and search for drivers who didn’t finish a single race they participated in, per season:

  with counts as
  (
     select date_trunc('year', date) as year,
            count(*) filter(where position is null) as outs,
            bool_and(position is null) as never_finished
       from drivers
            join results using(driverid)
            join races using(raceid)
   group by date_trunc('year', date), driverid
  )
    select extract(year from year) as season,
           sum(outs) as "#times any driver didn't finish a race"
      from counts
     where never_finished
  group by season
  order by sum(outs) desc
     limit 5;

In this query, you can see the aggregate filter(where …) syntax, which restricts the aggregate’s computation to those rows that pass the filter. Here we choose to count all race results where the position is null, which means the driver didn’t make it to the finish line for some reason…

 season │ #times any driver didn't finish a race
════════╪════════════════════════════════════════
   1989 │                                    139
   1953 │                                     51
   1955 │                                     48
   1990 │                                     48
   1956 │                                     46
(5 rows)

It turns out that overall, 1989 was a pretty bad season.
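The two-level aggregation used for the per-season figures can be sketched in Python as follows. The sample rows and the name outs_per_season are hypothetical, None stands in for an SQL null position, and the sketch only mirrors the logic of the query rather than PostgreSQL's execution.

```python
from collections import defaultdict

# Hypothetical (season, driverid, position) rows; None stands for SQL null.
rows = [
    (1989, 1, None), (1989, 1, None),
    (1989, 2, None), (1989, 2, 4),
    (1990, 1, None),
    (1990, 3, 2),
]


def outs_per_season(rows, limit=5):
    outs = defaultdict(int)
    never_finished = defaultdict(lambda: True)
    for season, driverid, position in rows:
        key = (season, driverid)  # first level: group by season and driver
        if position is None:
            # Filtered count: only rows passing the condition are counted.
            outs[key] += 1
        never_finished[key] = never_finished[key] and position is None

    totals = defaultdict(int)
    for (season, driverid), flag in never_finished.items():
        if flag:  # keep only drivers who finished no race that season
            totals[season] += outs[(season, driverid)]  # second level: sum
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


print(outs_per_season(rows))  # [(1989, 2), (1990, 1)]
```

Driver 2 in 1989 has one unfinished race, yet it is not counted because that driver finished another race the same season; only drivers whose flag stayed true contribute to the season total.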

Aggregates Without a Group By

It is possible to compute aggregates over a data set without using the group by clause in SQL. What it then means is that we are operating over a single group
