This article is not entirely complete yet, but it's still useful:

Reindex Maintenance Operation:

Important Points:

1. All the numbers and intervals in this document are experimental.

2. All the scripts in this document must be changed mode to executable by using the following command:

chmod +x scriptname.sh

3. These scripts are intended to run as root or a user which can switch user to ‘postgres’ user passwordlessly.

4. The scripts must run by /bin/sh like the following:

/bin/sh scriptname.sh

In some of the bash scripts, you may wonder why there is no while loop. The truth is that there are embedded loops and nested loops, implemented via “bash -c”, “xargs”, or external bash script.

• PostgreSQL provides the index methods B-tree, hash, GiST, spgist, brin and GIN. Default is btree. Users can also define their own index methods, but that is fairly complicated. When the WHERE clause is present, a partial index is created. The default method is btree.

• Create index syntax:

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]

( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )

[ INCLUDE ( column_name [, ...] ) ]

[ WITH ( storage_parameter [= value] [, ... ] ) ]

[ TABLESPACE tablespace_name ]

[ WHERE predicate ]

• Alter index syntax:

ALTER INDEX [ IF EXISTS ] name RENAME TO new_name

ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name

ALTER INDEX name ATTACH PARTITION index_name

ALTER INDEX name [ NO ] DEPENDS ON EXTENSION extension_name

ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter [= value] [, ... ] )

ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )

ALTER INDEX [ IF EXISTS ] name ALTER [ COLUMN ] column_number

SET STATISTICS integer

ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]

SET TABLESPACE new_tablespace [ NOWAIT ]

-------------------------------------------------------------------------------------------------------------

• Reindex syntax:

REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name

where option can be one of:

CONCURRENTLY [ boolean ]

TABLESPACE new_tablespace

VERBOSE [ boolean ]

alternatively you can use reindexdb binary file for maintenance purposes:

reindexdb:

reindexdb is a wrapper around the SQL command REINDEX(7). There is no effective difference between reindexing databases via this utility and via

other methods for accessing the server.

File path: PostgreSQL’s executables’ path:

/usr/pgsql-<Major Version Number>/bin/

Command-line arguments:

Example:

To reindex the table t and the index ind in a database named drpezeshki:

reindexdb --table=t --index=ind drpezeshki

• Normally you don't have to worry about index fragmentation at all. However, if there has been a mass delete or update, or the sustained change rate was so high that autovacuum couldn't keep up, you may end up with a badly bloated index.

• Find boated/fragmented indexes:

o Query current extensions with:

select * from pg_extension;

select * from pg_available_extensions;

• Extensions for index management

1) pg_repack_

Reorganize tables in PostgreSQL databases with minimal locks

is a PostgreSQL extension which lets you remove bloat from

tables and indexes, and optionally restore the physical order of clustered

indexes. Unlike CLUSTER_ and `VACUUM FULL`_ it works online, without

holding an exclusive lock on the processed tables during processing.

pg_repack is efficient to boot, with performance comparable to using

CLUSTER directly.

2) pgstattuple:

/usr/pgsql-13/share/extension/pg_stattuple.control

dnf -y install postgresql-contrib

#connect to the desired database and issue the subsequent command:

CREATE EXTENSION pgstattuple;

o issue the following command:

SELECT * FROM pgstatindex('index name');

Sample output:

query to get the leaf fragmentation of all B-tree indexes:

----------------------------------------------------------------------------------------------------------------

SELECT i.indexrelid::regclass,

s.leaf_fragmentation

FROM pg_index AS i

JOIN pg_class AS t ON i.indexrelid = t.oid

JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid

JOIN pg_am ON opc.opcmethod = pg_am.oid

CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s

WHERE t.relkind = 'i'

AND pg_am.amname = 'btree';

-------------------------------------------------------------------------------------------------------------

Using these extensions automation tasks can be planned and programmed to defragment indexes periodically like SQL Server

Related performance parameters:

maintenance_work_mem

max_parallel_maintenance_workers

effective_cache_size

Example:

• We have a table of more than 200,000 rows with the following definition:

Dataset exists on the following link:

https://www.kaggle.com/speedoheck/inpatient-hospital-charges

---------------------------------------------------

create table inpatientcharges

(

drg_definition varchar(200),

provider_id integer,

provider_name varchar(200),

provider_street_address varchar(200),

provider_city varchar(50),

provider_state varchar(10),

provider_zip_code integer,

hospital_referral_region_description varchar(50),

total_discharges integer,

average_covered_charges varchar(20),

average_total_payments varchar(20),

average_medicare_payments varchar(20)

);

alter table inpatientcharges owner to postgres;

---- Load the data into table

---- Adding an identity column

ALTER TABLE inpatientCharges

ADD COLUMN id int -- int or bigint or smallint

GENERATED BY DEFAULT AS IDENTITY;

--------------------------------------------------------------------------------------------------

• create index on the identity column:

-- create unique index concurrently ic on inpatientcharges (id asc nulls last) with (fillfactor=80); -- default is 90

-------------------------------------------------

If we want to turn it to the primary key, and to recreate a primary key constraint, without blocking updates while the index is rebuilt:

CREATE UNIQUE INDEX CONCURRENTLY ic ON inpatientCharges (id asc nulls last) include (drg_definition, provider_id, provider_name,provider_street_address,provider_city, provider_state,provider_zip_code,hospital_referral_region_description,total_discharges,average_covered_charges,average_total_payments,average_medicare_payments) with (fillfactor = 80);

ALTER TABLE inpatientCharges

-- DROP CONSTRAINT ic,

ADD CONSTRAINT ic PRIMARY KEY USING INDEX ic;

----------------------------------------------------------------------------------------------------------

You can issue this command to show indexes that belong to table:

\d table_name

----------------------------------------------------------------------------------------------------------

use this command to import dataset:

COPY inpatientCharges(drg_definition,provider_id,provider_name,provider_street_address,provider_city,provider_state,provider_zip_code,hospital_referral_region_description,total_discharges,average_covered_charges,average_total_payments,average_medicare_payments)

FROM '/var/lib/pgsql/data/inpatientCharges.csv'

DELIMITER ','

CSV HEADER;

-------------------------------------------------------------------------------------------------

SELECT

tablename,

indexname,

indexdef

FROM

pg_indexes

WHERE

schemaname = 'public'

ORDER BY

tablename,

indexname;

-------------------------------------------------------------------------------------------------

create a new non-primary key index:

---------------------------------------------------------------------------------------------------

CREATE INDEX CONCURRENTLY icpid ON inpatientCharges (provider_id asc nulls last) include (id, provider_name,provider_city, provider_state,provider_zip_code,total_discharges,average_covered_charges,average_total_payments,average_medicare_payments) with (fillfactor = 80);

--------------------------------------------------------------------------------------------------

Issue the following command given before to extract fragmentations by name and fragmentation is bigger than 40 for example:

--------------------------------------------------------------------------------------------------

SELECT i.indexrelid::regclass,

s.leaf_fragmentation

FROM pg_index AS i

JOIN pg_class AS t ON i.indexrelid = t.oid

JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid

JOIN pg_am ON opc.opcmethod = pg_am.oid

CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s

WHERE t.relkind = 'i'

AND pg_am.amname = 'btree' and leaf_fragmentation > 40 and leaf_fragmentation <> 'NaN' order by leaf_fragmentation desc;

------- in one line for dynamic query:

-- user data

SELECT i.indexrelid::regclass FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid JOIN pg_am ON opc.opcmethod = pg_am.oid CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i' AND pg_am.amname = 'btree' and leaf_fragmentation > 0 and leaf_fragmentation <> 'NaN' and t.relname not like 'pg_%' order by leaf_fragmentation desc;

-- system catalogs

Fragmentation of indexes in the above example:

SELECT i.indexrelid::regclass, s.leaf_fragmentation FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid JOIN pg_am ON opc.opcmethod = pg_am.oid CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i' AND pg_am.amname = 'btree' and relname = 'ic' or relname='icpid';

select 'path.to.object'::regclass::oid

select 'ic'::regclass::oid

------------------------------------------------------------------------------------------------

Alternative query to fetch fragmented users only indexes’ list:

I do not vow that this query returns all fragmented indexes. It’s slower too! Someone suggested this query to me so I include it here:

SELECT i.indexrelid::regclass,

s.leaf_fragmentation

FROM pg_index AS i

JOIN pg_class AS t ON i.indexrelid = t.oid

JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid

JOIN pg_am ON opc.opcmethod = pg_am.oid

CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s

WHERE t.relkind = 'i'

AND pg_am.amname = 'btree' and leaf_fragmentation > 30 and leaf_fragmentation <> 'NaN' order by leaf_fragmentation desc;

Now you can automate reindexing:

System catalog indexes cannot be rebuilt concurrently, thus we have separated user indexes and system catalogs reindexing

1) Script to reindex system catalogs (Not concurrent, thus midnights and weekends are suggested)

usage:

/bin/sh reindex_systemcatalogs.sh <PG Version> <Fragmentation Threshold> <DBName>

###################################################################

set -x

######## variables: ##############

PGHOME=/usr/pgsql-$1/bin #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

dbname=$3

##################################

export PATH=$PATH:$PGHOME

psqlc="sudo -u postgres psql -d$dbname"

$psqlc -c "SELECT i.indexrelid::regclass FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid JOIN pg_am ON opc.opcmethod = pg_am.oid CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i' AND pg_am.amname = 'btree' and leaf_fragmentation >= $fragmentation_threshold and leaf_fragmentation <> 'NaN' and t.relname like 'pg_%' order by leaf_fragmentation desc;" | tail -n +3 | head -n -2 | xargs -I {} sudo -u postgres psql -d$dbname -c "REINDEX (verbose) INDEX {}"

###################################################################

§ If you want to include index, database, schema, and table’s name too, you should use the following 2 scripts, both in the same directory:

· The main script: The script you call:

usage:

/bin/sh reindex_sysindexes.sh <PG version> <Target Leaf Fragmentation Threshold (0-100)>

Note:

You must execute this script with /bin/sh

###################################################################

set -x

######## variables: ##############

LOG_DIRECTORY=/var/log/pgsql/maintenance/

LOG_FILE_NAME=reindex_job.log

PGHOME=/usr/pgsql-$1/bin #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

##################################

#exec > >(logger -i -p local1.info) 2>&1

LOG_PATH=$LOG_DIRECTORY$LOG_FILE_NAME

alias FULL_DATE='date "+%Y-%m-%d %T.%N %z"'

mkdir -p /var/log/pgsql/maintenance/

touch $LOG_PATH

export PATH=$PATH:$PGHOME

START_TIMESTAMP=$(echo $(date "+%s.%N"))

printf "\------------------------------------------------------------------------------------------------\n$(FULL_DATE):\tStarting reindex maintenance operation \n" >> $LOG_PATH 2>&1

printf "$(FULL_DATE):\tAdding pgstattuple extension to PostgreSQL's databases if not already added: \

(Attention! \"postgresql<pg version>-contrib\" package must be installed in advance\n\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname not in ('postgres','template0','template1');"\

| head -n -1 | xargs -I {} sudo -u postgres psql -d{} postgres -c "create extension if not exists pgstattuple;") >> $LOG_PATH 2>&1

printf "\n$(FULL_DATE):\tReindexing target system catalogs indexes:\n" >> $LOG_PATH 2>&1

printf "The index names will be in format: \"<DBName>.[schema name if not public.]<Table Name>.<Index Name>\"\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname \

not in ('postgres','template0','template1');" | head -n -1 | xargs -I {} bash -c \

"sudo -u postgres psql -t -d{} postgres -c \"SELECT concat('.',indrelid::regclass,'.',relname) \

FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc \

ON i.indclass[0] = opc.oid JOIN pg_am ON opc.opcmethod = pg_am.oid \

CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s \

WHERE t.relkind = 'i' AND pg_am.amname = 'btree' and leaf_fragmentation >= $fragmentation_threshold \

and leaf_fragmentation <> 'NaN' and t.relname like 'pg_%'\" | head -n -1 | tr -d ' ' | sed -e 's/^/{}/' | \

xargs -IA ./reindex.sh {} A $LOG_PATH")

END_TIMESTAMP=$(echo $(date "+%s.%N"))

#EXECUTION_TIME=$(expr $(date "+%s") - $START_TIMESTAMP)

EXECUTION_TIME=$(bc <<< "$END_TIMESTAMP-$START_TIMESTAMP")

printf "\n\n$(FULL_DATE):\t Process finished executing. The entire process took $EXECUTION_TIME seconds to complete.\n\n\n\n" >> $LOG_PATH 2>&1

###################################################################

· The subsidiary script in the same directory as the primary script:

###################################################################

exec >> $3 2>&1

printf "\n"

echo $2

TEMP=$2

TEMP=$(echo "${TEMP##*.}")

sudo -u postgres psql -d$1 -c"REINDEX INDEX $TEMP"

##################################################################

The above script also logs some information about output to the following file:

/var/log/pgsql/maintenance/reindex_job.log

--------------------

2) user data reindexing: (once every midnight)

I. For all databases (more convenient):

usage:

/bin/sh reindex_userindexes.sh <PG version> <Target Leaf Fragmentation Threshold (0-100)>

Note:

You must execute this script with /bin/sh

###################################################################

set -x

######## variables: ##############

LOG_PATH=/var/log/pgsql/maintenance/reindex_job.log

PGHOME=/usr/pgsql-$1/bin #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

##################################

#exec > >(logger -i -p local1.info) 2>&1

alias FULL_DATE='date "+%Y-%m-%d %T.%N %z"'

mkdir -p /var/log/pgsql/maintenance/

touch /var/log/pgsql/maintenance/reindex_job.log

export PATH=$PATH:$PGHOME

START_TIMESTAMP=$(echo $(date "+%s.%N"))

printf "\------------------------------------------------------------------------------------------------\n$(FULL_DATE):\tStarting reindex maintenance operation \n" >> $LOG_PATH 2>&1

printf "$(FULL_DATE):\tAdding pgstattuple extension to PostgreSQL's databases if not already added: \

(Attention! \"postgresql<pg version>-contrib\" package must be installed in advance\n\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname not in ('postgres','template0','template1');"\

| head -n -1 | xargs -I {} sudo -u postgres psql -d{} postgres -c "create extension if not exists pgstattuple;") >> $LOG_PATH 2>&1

printf "\n$(FULL_DATE):\tReindexing target user indexes:\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname \

not in ('postgres','template0','template1');" | head -n -1 | xargs -I {} bash -c "sudo -u postgres psql -t -d{} postgres -c \"SELECT relname \

FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc \

ON i.indclass[0] = opc.oid JOIN pg_am ON opc.opcmethod = pg_am.oid \

CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s \

WHERE t.relkind = 'i' AND pg_am.amname = 'btree' and leaf_fragmentation >= $fragmentation_threshold \

and leaf_fragmentation <> 'NaN' and t.relname not like 'pg_%';\" | head -n -1 | \

xargs -I@ sudo -u postgres psql -d{} -c\"REINDEX INDEX CONCURRENTLY @\"") >> $LOG_PATH 2>&1

END_TIMESTAMP=$(echo $(date "+%s.%N"))

#EXECUTION_TIME=$(expr $(date "+%s") - $START_TIMESTAMP)

EXECUTION_TIME=$(bc <<< "$END_TIMESTAMP-$START_TIMESTAMP")

printf "\n$(FULL_DATE):\t Process finished executing. The entire process took $EXECUTION_TIME seconds to complete.\n\n\n" >> $LOG_PATH 2>&1

###################################################################

The above script also logs some information about output to the following file:

/var/log/pgsql/maintenance/reindex_job.log

Sample log output:

-------------------------------------------------------------------------------------------------------------

§ If you want to include index, database, schema, and table’s name too, you should use the following 2 scripts, both in the same directory:

· The main script: The script you call:

usage:

/bin/sh reindex_userindexes.sh <PG version> <Target Leaf Fragmentation Threshold (0-100)>

Note:

You must execute this script with /bin/sh

###################################################################

set -x

######## variables: ##############

LOG_DIRECTORY=/var/log/pgsql/maintenance/

LOG_FILE_NAME=reindex_job.log

PGHOME=/usr/pgsql-$1/bin #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

###############################