Добавил:
ИВТ (советую зайти в "Несортированное")rnПИН МАГА Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Database 2024 / Books / Искусство PostgreSQL.pdf
Скачиваний:
8
Добавлен:
20.11.2024
Размер:
1.62 Mб
Скачать

8

Indexing Strategy

Coming up with an Indexing Strate is an important step in terms of mastering your PostgreSQL database. It means that you are in a position to make an informed choice about which indexes you need, and most importantly, which you don’t need in your application.

A PostgreSQL index allows the system to have new options to nd the data your queries need. In the absence of an index, the only option available to your database is a sequential scan of your tables. The index access methods are meant to be faster than a sequential scan, by fetching the data directly where it is.

Indexing is of en thought of as a data modeling activity. When using PostgreSQL, some indexes are necessary to ensure data consistency (the C in ACID). Constraints such as UNIQUE, PRIMARY KEY or EXCLUDE USING are only possible to implement in PostgreSQL with a backing index. When an index is used as an implementation detail to ensure data consistency, then the indexing strate is indeed a data modeling activity.

In all other cases, the indexing strate is meant to enable methods for faster accessmethodstodata. Thosemethodsareonlygoingtobeexercisedinthecontext of running a SQL query. As writing the SQL queries is the job of a developer, then coming up with the right indexing strate for an application is also the job of the developer.

Chapter 8 Indexing Strategy j 72

Indexing for Constraints

When using PostgreSQL some SQL modeling constraints can only be handled with the help of a backing index. That is the case for the primary key and unique constraints, and also for the exclusion constraints created with the PostgreSQL special syntax EXCLUDE USING.

In those three constraint cases, the reason why PostgreSQL needs an index is because it allows the system to implement visibility tricks with its MVCC implementation. From the PostgreSQL documentation:

PostgreSQL provides a rich set of tools for developers to manage concurrent access to data. Internally, data consistency is maintained by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session. MVCC, by eschewing the locking methodologies of traditional database systems, minimizes lock contention in order to allow for reasonable performance in multiuser environments.

If we think about how to implement the unique constraint, we soon realize that to be correct the implementation must prevent two concurrent statements from inserting duplicates. Let’s see an example with two transactions t1 and t2 happening in parallel:

1 t1> insert into test(id) values(1);

2t2> insert into test(id) values(1);

Before the transactions start the table has no duplicate entry, it is empty. If we consider each transaction, both t1 and t2 are correct and they are not creating duplicate entries with the data currently known by PostgreSQL.

Still, we can’t accept both the transactions — one of them has to be refused — because they are con icting with the one another. PostgreSQL knows how to do that, and the implementation relies on the internal code being able to access the indexes in a non-MVCC compliant way: the internal code of PostgreSQL knows what the inight non-committed transactions are doing.

Chapter 8 Indexing Strategy j 73

The way the internals of PostgreSQL solve this problem is by relying on its index data structure in a non-MVCC compliant way, and this capability is not visible to SQL level users.

So when you declare a unique constraint, a primary key constraint or an exclusion constraint PostgreSQL creates an index for you:

1 > create table test(id integer unique);

2CREATE TABLE

3Time: 68.775 ms

4

5> \d test

6Table "public.test"

7

Column | Type

| Modifiers

8--------+---------+-----------

9id | integer |

10Indexes:

11"test_id_key" UNIQUE CONSTRAINT, btree (id)

And we can see that the index is registered in the system catalogs as being de ned in terms of a constraint.

Indexing for Queries

PostgreSQL automatically creates only those indexes that are needed for the system to behave correctly. Any and all other indexes are to be de ned by the application developers when they need a faster access method to some tuples.

An index cannot alter the result of a query. An index only provides another access method to the data, one that is faster than a sequential scan in most cases. Query semantics and result set don’t depend on indexes.

Implementing a user story (or a business case) with the help of SQL queries is the job of the developer. As the author of the SQL statements, the developer also should be responsible for choosing which indexes are needed to support their queries.

Chapter 8 Indexing Strategy j 74

Cost of Index Maintenance

An index duplicates data in a specialized format made to optimise a certain type of searches. This duplicated data set is still ACID compliant: at COMMIT; time, every change that is made it to the main tables of your schema must have made it to the indexes too.

As a consequence, each index adds write costs to your DML queries: insert, update and delete now have to maintain the indexes too, and in a transactional way.

That’s why we have to de ne a global indexing strate . Unless you have in nite IO bandwidth and storage capacity, it is not feasible to index everything in your database.

Choosing Queries to Optimize

In every application, we have some user side parts that require the lowest latency you can provide, and some reporting queries that can run for a little while longer without users complaining.

So when you want to make a query faster and you see that its explain plan is lacking index support, think about the query in terms of SLA in your application. Does this query need to run as fast as possible, even when it means that you now have to maintain more indexes?

PostgreSQL Index Access Methods

PostgreSQL implements several index Access Methods. An access method is a generic algorithm with a clean API that can be implemented for compatible data types. Each algorithm is well adapted to some use cases, which is why it’s interesting to maintain several access methods.

The PostgreSQL documentation covers index types in the indexes chapter, and tells us that

Chapter 8 Indexing Strategy j 75

PostgreSQL provides several index types: B-tree, Hash, GiST, SPGiST, GIN and BRIN. Each index type uses a di ferent algorithm that is best suited to di ferent types of queries. By default, the CREATE INDEX command creates B-tree indexes, which t the most common situations.

Each index access method has been designed to solve speci c use case:

B-Tree, or balanced tree

Balanced indexes are the most common used, by a long shot, because they are very e cient and provide an algorithm that applies to most cases. PostgreSQL implementation of the B Tree index support is best in class and has been optimized to handle concurrent read and write operations.

You can read more about the PostgreSQL B-tree algorithm and its theoretical background in the source code le:

src/backend/access/nbtree/README.

GiST, or generalized search tree

This access method implements an more general algorithm that again comes from research activities. The GiST Indexing Project from the University of California Berkeley is described in the following terms:

The GiST project studies the engineering and mathematics behind content-based indexing for massive amounts of complex content.

Its implementation in PostgreSQL allows support for 2-dimensional data types such as the geometry point or the rang data types. Those data types don’t support a total order and as a consequence can’t be indexed properly in a B-tree index.

SP-GiST, or spaced partitioned gist

SP-GiST indexes are the only PostgreSQL index access method implementation that support non-balanced disk-based data structures, such as quadtrees, k-d trees, and radix trees (tries). This is useful when you want to index 2-dimensional data with very di ferent densities.

GIN, or generalized inverted index

Chapter 8 Indexing Strategy j 76

GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing speci c words.

GIN indexes are “inverted indexes” which are appropriate for data values that contain multiple component values, such as arrays. An inverted index contains a separate entry for each component value. Such an index can e ciently handle queries that test for the presence of speci c component values.

The GIN access method is the foundation for the PostgreSQL Full Text Search support.

BRIN, or block range indexes

BRIN indexes (a shorthand for block range indexes) store summaries about the values stored in consecutive physical block ranges of a table. Like GiST, SP GiST and GIN, BRIN can support many di ferent indexing strategies, and the particular operators with which a BRIN index can be used vary depending on the indexing strategy. For data types that have a linear sort order, the indexed data corresponds to the minimum and maximum values of the values in the column for each block range.

Hash

Hash indexes can only handle simple equality comparisons. The query planner will consider using a hash index whenever an indexed column is involved in a comparison using the = operator.

Never use a hash index in PostgreSQL before version 10. In PostgreSQL 10 onward, hash index are crash-safe and may be used.

Bloom lters

A Bloom lter is a space-e cient data structure that is used to test whether an element is a member of a set. In the case of an index access method, it allows fast exclusion of non-matching tuples via signatures whose size is determined at index creation.

This type of index is most useful when a table has many attributes and queries test arbitrary combinations of them. A traditional B-tree index is

Соседние файлы в папке Books