
Group By, Having, With, Union All

Now that we have some of the basics of SQL queries, we can move on to more advanced topics. Up to now, queries would return as many rows as we select thanks to the where filtering. This filter applies against a data set that is produced by the from clause and the joins between its relations.

Outer joins might produce more rows than you have in your reference data set, and a cross join produces a full Cartesian product.

In this section, we’ll have a look at aggregates. They work by computing a digest value for several input rows at a time. With aggregates, we can return a summary containing far fewer rows than passed the where filter.

Aggregates (aka Map/Reduce): Group By

The group by clause introduces aggregates in SQL, and allows implementing much the same thing as map/reduce in other systems: map your data into different groups, and in each group reduce the data set to a single value.
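To make the map/reduce analogy concrete, here is a minimal sketch in Python rather than SQL. The function name and the sample dates are hypothetical and only serve the illustration; the map step assigns each row to a decade, and the reduce step collapses each group to a count, much like count(*) with group by decade.

```python
from collections import defaultdict
from datetime import date


def count_by_decade(race_dates):
    """Roughly mimic: select decade, count(*) ... group by decade order by decade."""
    groups = defaultdict(list)
    for d in race_dates:
        decade = d.year // 10 * 10   # map: assign each row to its group
        groups[decade].append(d)
    # reduce: collapse each group to a single value, here a row count
    return sorted((decade, len(rows)) for decade, rows in groups.items())


# Hypothetical sample dates, not taken from the races table.
sample = [
    date(1950, 5, 13),
    date(1958, 1, 19),
    date(1961, 5, 14),
    date(1969, 3, 1),
    date(1969, 10, 19),
]
result = count_by_decade(sample)
```

With this sample, two dates land in the 1950 group and three in the 1960 group, so the result has one row per decade instead of one row per race.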

As a first example we can count how many races have been run in each decade:

    select extract('year'
                   from
                   date_trunc('decade', date))
           as decade,
           count(*)
      from races
    group by decade
    order by decade;

PostgreSQL offers a rich set of date and time functions, such as the date_trunc and extract functions used here. The query returns:

     decade │ count
    ════════╪═══════
       1950 │    84
       1960 │   100
       1970 │   144
       1980 │   156
       1990 │   162
       2000 │   174
       2010 │   156
    (7 rows)

The difference between each decade is easy to compute thanks to window functions, seen later in this chapter. Let’s have a preview:

    with races_per_decade
    as (
         select extract('year'
                        from
                        date_trunc('decade', date))
                as decade,
                count(*) as nbraces
           from races
       group by decade
       order by decade
    )
    select decade, nbraces,
           case
             when lag(nbraces, 1)
                   over(order by decade) is null
             then ''

             when nbraces - lag(nbraces, 1)
                             over(order by decade)
                  < 0
             then format('-%3s',
                    lag(nbraces, 1)
                      over(order by decade)
                    - nbraces)

             else format('+%3s',
                    nbraces
                    - lag(nbraces, 1)
                       over(order by decade))

            end as evolution
      from races_per_decade;

We use a fairly complex CASE expression to produce exactly the output we want from the query. Other than that, the query relies on the lag() over(order by decade) expression, which gives access to the previous row and so allows us to compute the difference between the current row and the previous one.

Here’s what we get from the previous query:

     decade │ nbraces │ evolution
    ════════╪═════════╪═══════════
       1950 │      84 │
       1960 │     100 │ + 16
       1970 │     144 │ + 44
       1980 │     156 │ + 12
       1990 │     162 │ +  6
       2000 │     174 │ + 12
       2010 │     156 │ - 18
    (7 rows)
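To see what lag() over(order by decade) combined with the CASE expression computes, here is a rough Python sketch of the same logic. It is an illustration only, not how PostgreSQL evaluates window functions; the function name is hypothetical, and the input rows are the per-decade counts shown earlier, assumed to be already sorted by decade.

```python
def evolution(rows):
    """rows: list of (decade, nbraces) tuples, already ordered by decade."""
    out = []
    previous = None                  # lag() yields null on the first row
    for decade, nbraces in rows:
        if previous is None:
            label = ""
        elif nbraces - previous < 0:
            label = "-%3s" % (previous - nbraces)
        else:
            label = "+%3s" % (nbraces - previous)
        out.append((decade, nbraces, label))
        previous = nbraces           # becomes the "previous row" of the next one
    return out


# Per-decade counts from the first query's result.
rows = [(1950, 84), (1960, 100), (1970, 144), (1980, 156),
        (1990, 162), (2000, 174), (2010, 156)]
table = evolution(rows)
```

The single variable previous plays the role of lag(nbraces, 1): it holds the value of the preceding row in the chosen ordering and is empty for the first row.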

Now, we can also prepare the data set in a separate query that is run first, called a common table expression and introduced by the with clause. We will expand on that idea in the upcoming pages.

PostgreSQL comes with the usual aggregates you would expect, such as sum, count, and avg, and also with some more interesting ones such as bool_and. As its name suggests, the bool_and aggregate starts true and remains true only if every row it sees evaluates to true.
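The behaviour of bool_and within each group is close to Python's built-in all(). The sketch below is only an approximation under that assumption: the function name and sample rows are hypothetical, and unlike all(), the SQL aggregate ignores null inputs and returns null when it sees no rows at all.

```python
from collections import defaultdict


def never_finished(results):
    """results: iterable of (driverid, position); position None means no finish."""
    flags_by_driver = defaultdict(list)
    for driverid, position in results:
        flags_by_driver[driverid].append(position is None)
    # all() stays True only if every flag in the group is True, like bool_and
    return {driverid: all(flags) for driverid, flags in flags_by_driver.items()}


# Hypothetical sample rows, not taken from the results table.
sample = [(1, None), (1, None), (2, 3), (2, None), (3, None)]
summary = never_finished(sample)
```

Driver 2 has one classified result, so a single false value is enough to make the whole group false.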

With that aggregate, it’s then possible to search for all drivers who failed to finish every single race they participated in over their whole career:

    with counts as
    (
       select driverid, forename, surname,
              count(*) as races,
              bool_and(position is null) as never_finished
         from drivers
              join results using(driverid)
              join races using(raceid)
     group by driverid
    )
      select driverid, forename, surname, races
        from counts
       where never_finished
    order by races desc;

Well, it turns out that we have a great number of cases in which it happens. The previous query gives us 202 drivers who never finished a single race they took part in; that said, 117 of them had only participated in a single race.

Not picking on anyone in particular, we can find out if some seasons were less lucky than others on that basis and search for drivers who didn’t finish a single race they participated in, per season:

    with counts as
    (
       select date_trunc('year', date) as year,
              count(*) filter(where position is null) as outs,
              bool_and(position is null) as never_finished
         from drivers
              join results using(driverid)
              join races using(raceid)
     group by date_trunc('year', date), driverid
    )
      select extract(year from year) as season,
             sum(outs) as "#times any driver didn't finish a race"
        from counts
       where never_finished
    group by season
    order by sum(outs) desc
       limit 5;

In this query, you can see the aggregate filter(where …) syntax that allows us to update our computation only for those rows that pass the filter. Here we choose to count all race results where the position is null, which means the driver didn’t make it to the finish line for some reason…

     season │ #times any driver didn't finish a race
    ════════╪════════════════════════════════════════
       1989 │                                    139
       1953 │                                     51
       1955 │                                     48
       1990 │                                     48
       1956 │                                     46
    (5 rows)

It turns out that overall, 1989 was a pretty bad season.
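The two-step shape of the per-season query, a filtered count per (season, driver) group followed by a sum per season, can be sketched in Python as follows. This is a hedged illustration of the logic only: the function name and sample rows are hypothetical, and the ordering and limit of the SQL query are left out.

```python
from collections import defaultdict


def outs_per_season(results):
    """results: iterable of (year, driverid, position); None means no finish."""
    outs = defaultdict(int)                     # count(*) filter(where position is null)
    never_finished = defaultdict(lambda: True)  # bool_and(position is null)
    for year, driverid, position in results:
        key = (year, driverid)
        dnf = position is None
        # every row belongs to its group, but only rows passing the filter are counted
        outs[key] += 1 if dnf else 0
        never_finished[key] = never_finished[key] and dnf
    per_season = defaultdict(int)
    for (year, driverid), n in outs.items():
        if never_finished[(year, driverid)]:    # where never_finished
            per_season[year] += n               # sum(outs) ... group by season
    return dict(per_season)


# Hypothetical sample rows, not taken from the results table.
sample = [
    (1989, 1, None), (1989, 1, None),   # driver 1 never finished in 1989
    (1989, 2, 4), (1989, 2, None),      # driver 2 finished once, so is excluded
    (1990, 3, None),
]
totals = outs_per_season(sample)
```

Note that the filter only restricts which rows feed the count; it does not remove rows from the group, which is why bool_and still sees every result of the driver for that season.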

Aggregates Without a Group By

It is possible to compute aggregates over a data set without using the group by clause in SQL. What it then means is that we are operating over a single group
