Chapter 36: Isolation and Locking

In this model, the current counters are computed from the activity records associated with a given messageid:

    select count(*) filter(where action = 'rt')
         - count(*) filter(where action = 'de-rt')
           as rts,
           count(*) filter(where action = 'fav')
         - count(*) filter(where action = 'de-fav')
           as favs
      from tweet.activity
           join tweet.message using(messageid)
     where messageid = :id;
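For reference, here's a plausible minimal definition of the tweet.activity table, consistent with this query and with the unique constraint that shows up later in this section. The column types and the default are assumptions for illustration, not necessarily the book's exact schema:

    -- sketch: types and the default are assumptions; the unique constraint's
    -- columns come from the constraint name quoted later in this section
    create table tweet.activity
    (
      messageid  bigint      not null references tweet.message(messageid),
      datetime   timestamptz not null default now(),
      action     text        not null,  -- 'rt', 'de-rt', 'fav' or 'de-fav'
      unique (messageid, datetime, action)
    );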

Reading the current counter value has become quite complex when compared to just adding a column to your query output list. On the other hand, when adding an rt or a fav action to a message, we transform this SQL:

    update tweet.message set rts = rts + 1 where messageid = :id;

This is what we use instead:

    insert into tweet.activity(messageid, action) values(:id, 'rt');

The reason why replacing an update with an insert is interesting is concurrency behavior and locking. In the first version, retweeting has to wait until all concurrent retweets are done, while the business model wants to sustain as many concurrent activities on the same small set of messages as possible (think of influencer accounts).

The insert suffers no such contention because it targets a row that doesn't exist yet. We register each action into its own tuple and require no locking to do that, allowing our production setup of PostgreSQL to sustain a much larger load.
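You can see the locking difference from psql with two concurrent sessions; the following is a sketch of the experiment, where the session markers are comments and message 3 is an arbitrary pick:

    -- session A: update a row and keep the transaction open,
    -- which keeps the row-level lock held
    begin;
    update tweet.message set rts = rts + 1 where messageid = 3;

    -- session B: the same update now blocks until session A commits
    update tweet.message set rts = rts + 1 where messageid = 3;

    -- session B: this insert returns immediately, as it creates
    -- a brand-new row that nobody else is locking
    insert into tweet.activity(messageid, action) values (3, 'rt');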

Now, computing the counters each time we want to display them is costly. And the counters are displayed on every tweet message. We need a way to cache that information, and we’ll see about that in the Computing and Caching in SQL section.

Putting Concurrency to the Test

When we benchmark the concurrency properties of the two statements above, we quickly realize that the activity table is badly designed. The unique constraint includes a timestamptz field, which in PostgreSQL is only precise down to the microsecond.
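The problem is easy to trigger on purpose: now() is fixed for the duration of a transaction, so two actions registered in the same transaction get the very same timestamp. Assuming the datetime column defaults to now(), a single multi-row insert reproduces the error:

    -- both rows get the same default now() value, colliding on the
    -- (messageid, datetime, action) unique constraint
    insert into tweet.activity(messageid, action)
         values (2, 'rt'), (2, 'rt');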


This kind of made-up unique constraint means we now have these errors to deal with:

    Error: Database error 23505: duplicate key value violates unique
    constraint "activity_messageid_datetime_action_key"
    DETAIL: Key (messageid, datetime, action)
    =(2, 2017-09-19 18:00:03.831818+02, rt) already exists.

The best course of action here is to drop the constraint:

    alter table tweet.activity
     drop constraint activity_messageid_datetime_action_key;
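If you still want every activity row to carry its own identity, a surrogate key is worth considering; here's a sketch, under the assumption that tweet.activity doesn't already have a primary key:

    -- assumption: no existing primary key on tweet.activity
    alter table tweet.activity
      add column activityid bigserial primary key;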

Now we can properly compare the concurrency scaling of the insert-based and the update-based versions. In case you're curious, here's the testing code that's been used:

    (defpackage #:concurrency
      (:use #:cl #:appdev)
      (:import-from #:lparallel
                    #:*kernel*
                    #:make-kernel #:make-channel
                    #:submit-task #:receive-result
                    #:kernel-worker-index)
      (:import-from #:cl-postgres-error
                    #:database-error)
      (:export #:*connspec*
               #:concurrency-test))

    (in-package #:concurrency)

    (defparameter *connspec* '("appdev" "dim" nil "localhost"))

    (defparameter *insert-rt*
      "insert into tweet.activity(messageid, action) values($1, 'rt')")

    (defparameter *update-rt*
      "update tweet.message set rts = coalesce(rts, 0) + 1 where messageid = $1")

    ;; run both benchmarks and report their timings
    (defun concurrency-test (workers retweets messageid
                             &optional (connspec *connspec*))
      (format t "Starting benchmark for updates~%")
      (with-timing (rts seconds)
          (run-workers workers retweets messageid *update-rt* connspec)
        (format t "Updating took ~f seconds, did ~d rts~%" seconds rts))

      (format t "~%")

      (format t "Starting benchmark for inserts~%")
      (with-timing (rts seconds)
          (run-workers workers retweets messageid *insert-rt* connspec)
        (format t "Inserting took ~f seconds, did ~d rts~%" seconds rts)))

    ;; start WORKERS parallel tasks and sum the retweets they managed to do
    (defun run-workers (workers retweets messageid sql
                        &optional (connspec *connspec*))
      (let* ((*kernel* (lparallel:make-kernel workers))
             (channel  (lparallel:make-channel)))
        (loop repeat workers
              do (lparallel:submit-task channel #'retweet-many-times
                                        retweets messageid sql connspec))
        (loop repeat workers sum (lparallel:receive-result channel))))

    ;; each worker opens its own connection and loops over the retweet query
    (defun retweet-many-times (times messageid sql
                               &optional (connspec *connspec*))
      (pomo:with-connection connspec
        (pomo:query
         (format nil "set application_name to 'worker ~a'"
                 (lparallel:kernel-worker-index)))
        (loop repeat times sum (retweet messageid sql))))

    ;; returns 1 on success and 0 on database error, so that callers
    ;; can sum the number of successful retweets
    (defun retweet (messageid sql)
      (handler-case
          (progn
            (pomo:query sql messageid)
            1)
        (database-error (c)
          (format t "Error: ~a~%" c)
          0)))

Here's a typical result with a concurrency of 100 workers, all wanting to do 10 retweets in a loop against the same messageid, here message 3. While it's not representative to have them loop 10 times to retweet the same message, it should help create the concurrency effect we want to produce: several concurrent transactions waiting in turn for a lock on the same row.

The theory says that those concurrent users will have to wait in line, and thus spend time waiting for a lock on the PostgreSQL server. We should see that in the timing reports as a time difference:

    CL-USER> (concurrency::concurrency-test 100 10 3)
    Starting benchmark for updates
    Updating took 3.099873 seconds, did 1000 rts

    Starting benchmark for inserts
    Inserting took 2.132164 seconds, did 1000 rts

The update variant of the test took almost 50% more time to complete than the insert variant, with this level of concurrency. Given that we have really simple SQL statements, we can attribute the timing difference to having had to wait in line: basically, the update version spent almost 1 second out of 3 waiting for a free slot.
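Because each worker sets its application_name, you can watch the line forming from another psql session while the update benchmark runs. Here's a sketch of such an observation query; the wait_event columns assume PostgreSQL 9.6 or later:

    -- during the update run most workers show a Lock wait event,
    -- during the insert run they mostly show as active
    select pid, application_name, wait_event_type, wait_event, state
      from pg_stat_activity
     where application_name like 'worker %';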

In another test with even more concurrency pressure at 50 retweets per worker, we can show that the results are repeatable:

    CL-USER> (concurrency::concurrency-test 100 50 6)
    Starting benchmark for updates
    Updating took 5.070135 seconds, did 5000 rts

    Starting benchmark for inserts
    Inserting took 3.739505 seconds, did 5000 rts

If you know that your application has to scale, think about how to avoid concurrent activity that competes against a single shared resource. Here, this shared resource is the rts field of the tweet.message row that you target, and the concurrency behavior is going to be fine as long as the retweet activity is well distributed. As soon as many users want to retweet the same message, the update solution has a non-trivial scalability impact.

Now, we’re going to implement the tweet.activity based model. In this model, the number of retweets needs to be computed each time we display it, and it’s part of the visible data. Also, in the general case, it’s impossible for our users to know for sure how many retweets have been made so that we can implement a cache with eventual consistency properties.
