- About…
- About the Book
- About the Author
- Acknowledgements
- About the organisation of the book
- Structured Query Language
- A First Use Case
- Loading the Data Set
- Application Code and SQL
- Back to Discovering SQL
- Computing Weekly Changes
- Software Architecture
- Why PostgreSQL?
- The PostgreSQL Documentation
- Getting Ready to read this Book
- Business Logic
- Every SQL query embeds some business logic
- Business Logic Applies to Use Cases
- Correctness
- Efficiency
- Stored Procedures — a Data Access API
- Procedural Code and Stored Procedures
- Where to Implement Business Logic?
- A Small Application
- Readme First Driven Development
- Chinook Database
- Top-N Artists by Genre
- Intro to psql
- The psqlrc Setup
- Transactions and psql Behavior
- Discovering a Schema
- Interactive Query Editor
- SQL is Code
- SQL style guidelines
- Comments
- Unit Tests
- Regression Tests
- A Closer Look
- Indexing Strategy
- Indexing for Queries
- Choosing Queries to Optimize
- PostgreSQL Index Access Methods
- Advanced Indexing
- Adding Indexes
- An Interview with Yohann Gabory
- Get Some Data
- Structured Query Language
- Queries, DML, DDL, TCL, DCL
- Select, From, Where
- Anatomy of a Select Statement
- Projection (output): Select
- Restrictions: Where
- Order By, Limit, No Offset
- Ordering with Order By
- kNN Ordering and GiST indexes
- Top-N sorts: Limit
- No Offset, and how to implement pagination
- Group By, Having, With, Union All
- Aggregates (aka Map/Reduce): Group By
- Aggregates Without a Group By
- Restrict Selected Groups: Having
- Grouping Sets
- Common Table Expressions: With
- Distinct On
- Result Sets Operations
- Understanding Nulls
- Three-Valued Logic
- Not Null Constraints
- Outer Joins Introducing Nulls
- Using Null in Applications
- Understanding Window Functions
- Windows and Frames
- Partitioning into Different Frames
- Available Window Functions
- When to Use Window Functions
- Relations
- SQL Join Types
- An Interview with Markus Winand
- Serialization and Deserialization
- Some Relational Theory
- Attribute Values, Data Domains and Data Types
- Consistency and Data Type Behavior
- PostgreSQL Data Types
- Boolean
- Character and Text
- Server Encoding and Client Encoding
- Numbers
- Floating Point Numbers
- Sequences and the Serial Pseudo Data Type
- Universally Unique Identifier: UUID
- Date/Time and Time Zones
- Time Intervals
- Date/Time Processing and Querying
- Network Address Types
- Denormalized Data Types
- Arrays
- Composite Types
- Enum
- PostgreSQL Extensions
- An Interview with Grégoire Hubert
- Object Relational Mapping
- Tooling for Database Modeling
- How to Write a Database Model
- Generating Random Data
- Modeling Example
- Normalization
- Data Structures and Algorithms
- Normal Forms
- Database Anomalies
- Modeling an Address Field
- Primary Keys
- Foreign Keys Constraints
- Not Null Constraints
- Check Constraints and Domains
- Exclusion Constraints
- Practical Use Case: Geonames
- Features
- Countries
- Modelization Anti-Patterns
- Entity Attribute Values
- Multiple Values per Column
- UUIDs
- Denormalization
- Premature Optimization
- Functional Dependency Trade-Offs
- Denormalization with PostgreSQL
- Materialized Views
- History Tables and Audit Trails
- Validity Period as a Range
- Pre-Computed Values
- Enumerated Types
- Multiple Values per Attribute
- The Sparse Matrix Model
- Denormalize with Care
- Not Only SQL
- Schemaless Design in PostgreSQL
- Durability Trade-Offs
- Another Small Application
- Insert, Update, Delete
- Insert Into
- Insert Into … Select
- Update
- Inserting Some Tweets
- Delete
- Tuples and Rows
- Deleting All the Rows: Truncate
- Isolation and Locking
- About SSI
- Putting Concurrency to the Test
- Computing and Caching in SQL
- Views
- Materialized Views
- Triggers
- Transactional Event Driven Processing
- Trigger and Counters Anti-Pattern
- Fixing the Behavior
- Event Triggers
- Listen and Notify
- PostgreSQL Notifications
- Notifications and Cache Maintenance
- Listen and Notify Support in Drivers
- Batch Update, MoMA Collection
- Updating the Data
- Concurrency Patterns
- On Conflict Do Nothing
- An Interview with Kris Jenkins
- Installing and Using PostgreSQL Extensions
- Finding PostgreSQL Extensions
- A Short List of Noteworthy Extensions
- Auditing Changes with hstore
- Introduction to hstore
- Comparing hstores
- Auditing Changes with a Trigger
- Testing the Audit Trigger
- From hstore Back to a Regular Record
- Last.fm Million Song Dataset
- Using Trigrams For Typos
- The pg_trgm PostgreSQL Extension
- Trigrams, Similarity and Searches
- Complete and Suggest Song Titles
- Trigram Indexing
- Denormalizing Tags with intarray
- Advanced Tag Indexing
- User-Defined Tags Made Easy
- The Most Popular Pub Names
- A Pub Names Database
- Normalizing the Data
- Geolocating the Nearest Pub (k-NN search)
- How far is the nearest pub?
- The earthdistance PostgreSQL contrib
- Pubs and Cities
- The Most Popular Pub Names by City
- Geolocation with PostgreSQL
- Geolocation Data Loading
- Geolocation Metadata
- Emergency Pub
- Counting Distinct Users with HyperLogLog
- HyperLogLog
- Installing postgresql-hll
- Counting Unique Tweet Visitors
- Lossy Unique Count with HLL
- Getting the Visits into Unique Counts
- Scheduling Estimates Computations
- Combining Unique Visitors
- An Interview with Craig Kerstiens
Chapter 34: Another Small Application
In a previous chapter, when introducing arrays, we used a dataset of 200,000 USA geolocated tweets with a very simple data model. The data model is a direct port of the Excel sheet format, allowing a straightforward loading process: we used the \copy command from psql.
```sql
begin;

create table tweet
(
  id        bigint primary key,
  date      date,
  hour      time,
  uname     text,
  nickname  text,
  bio       text,
  message   text,
  favs      bigint,
  rts       bigint,
  latitude  double precision,
  longitude double precision,
  country   text,
  place     text,
  picture   text,
  followers bigint,
  following bigint,
  listed    bigint,
  lang      text,
  url       text
);

\copy tweet from 'tweets.csv' with csv header delimiter ';'

commit;
```
This database model is all wrong with respect to the normal forms introduced earlier:

- There's neither a unique constraint nor a primary key, so nothing prevents the insertion of duplicate entries, violating 1NF.
- Some non-key attributes are not dependent on the key, because we mix data about the Twitter account posting the message with data about the message itself, violating 2NF. This is the case with all the user's attributes: nickname, bio, picture, followers, following, and listed.
- We have transitive dependencies in the model, which violates 3NF this time:
  - The country and place attributes depend on the location attribute, and as such should live in a separate table, such as the geonames data used in the Denormalized Data Types chapter.
  - The hour attribute depends on the date attribute, as the hour alone can't represent when the tweet was transmitted.
- The longitude and latitude should really be a single location column, given PostgreSQL's ability to deal with geometric data types, here a point.
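As a sketch of that last point, here's one possible migration collapsing the two coordinate columns into a single point column against the tweet table above. Note that PostgreSQL's point() constructor takes x then y, so longitude comes first:

```sql
-- a possible migration sketch, not from the book:
-- collapse (longitude, latitude) into a geometric point column
alter table tweet add column location point;

update tweet
   set location = point(longitude, latitude);

alter table tweet
  drop column longitude,
  drop column latitude;
```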
It is interesting to note that failing to respect the normal forms has a negative impact on the application's performance. Here, each time a user changes their bio, we have to edit the bio in every tweet they ever posted. Or we could decide to give only new tweets the new bio, but then at query time, when showing an old tweet, fetching the user's current bio gets costly.
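To make that cost concrete, here is a sketch of what a single bio change means in each design (the user name 'some_user' is of course hypothetical):

```sql
-- denormalized model: a bio change rewrites every tweet row
-- this user ever posted
update tweet
   set bio = 'New bio text'
 where uname = 'some_user';

-- normalized model: the same change is a single-row update
update tweet.users
   set bio = 'New bio text'
 where uname = 'some_user';
```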
From a concurrency standpoint, a normalized schema also helps to avoid the concurrent update activity on the same rows that is often seen in production.
It’s now time to rewrite our schema, and here’s a first step:
```sql
begin;

create schema if not exists tweet;

create table tweet.users
(
  userid    bigserial primary key,
  uname     text not null,
  nickname  text not null,
  bio       text,
  picture   text,
  followers bigint,
  following bigint,
  listed    bigint,

  unique(uname)
);

create table tweet.message
(
  id       bigint primary key,
  userid   bigint references tweet.users(userid),
  datetime timestamptz not null,
  message  text,
  favs     bigint,
  rts      bigint,
  location point,
  lang     text,
  url      text
);

commit;
```
This model cleanly separates users from their messages, and removes the country and place attributes, which we maintain separately in the geonames schema, as seen earlier.
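As a quick check that the separation works in practice, showing recent tweets along with each author's current profile is now a simple join; a sketch against the schema above:

```sql
-- latest tweets with the author's current name and bio
select users.uname,
       users.bio,
       message.datetime,
       message.message
  from tweet.message
       join tweet.users using (userid)
 order by message.datetime desc
 limit 10;
```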
That said, the followers, following, and listed fields are a summary of other information that we should have but don’t. The fact that the extract we worked with had a simpler, statistics-oriented schema shouldn’t blind us here. There’s a better way to register relationships between users, in terms of who follows whom and who lists whom, as in the following model:
```sql
begin;

create schema if not exists tweet;

create table tweet.users
(
  userid   bigserial primary key,
  uname    text not null,
  nickname text,
  bio      text,
  picture  text,

  unique(uname)
);

create table tweet.follower
(
  follower  bigint not null references tweet.users(userid),
  following bigint not null references tweet.users(userid),

  primary key(follower, following)
);

create table tweet.list
(
  listid bigserial primary key,
  owner  bigint not null references tweet.users(userid),
  name   text not null,

  unique(owner, name)
);

create table tweet.membership
(
  listid   bigint not null references tweet.list(listid),
  member   bigint not null references tweet.users(userid),
  datetime timestamptz not null,

  primary key(listid, member)
);

create table tweet.message
(
  messageid bigserial primary key,
  userid    bigint not null references tweet.users(userid),
  datetime  timestamptz not null default now(),
  message   text not null,
  favs      bigint,
  rts       bigint,
  location  point,
  lang      text,
  url       text
);

commit;
```
Now we can begin to work with this model.
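For instance, counting followers per user now only needs the tweet.follower table; here's a sketch of such a query against the final model:

```sql
-- top users by follower count; the left join keeps users
-- without any followers in the result
select users.uname,
       count(follower.follower) as followers
  from tweet.users
       left join tweet.follower
              on follower.following = users.userid
 group by users.uname
 order by followers desc
 limit 10;
```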
