PostgreSQL's Index Maintenance (B-tree reindex and drop unused indexes)

 This article is not entirely complete yet, but it's still useful:


Reindex Maintenance Operation:

 


Important Points:

1.     All the numbers and intervals in this document are experimental.

2.     All the scripts in this document must be changed mode to executable by using the following command:

chmod +x scriptname.sh

3.     These scripts are intended to run as root or a user which can switch user to ‘postgres’ user passwordlessly.

4.     The scripts must run by /bin/sh like the following:

/bin/sh scriptname.sh

      In some of the bash scripts, you may wonder why there is no while loop. The truth is that there are embedded loops and nested loops, implemented via “bash -c”, “xargs”, or external bash script.



          PostgreSQL provides the index methods B-tree, hash, GiST, spgist, brin and GIN. Default is btree. Users can also define their own index methods, but that is fairly complicated. When the WHERE clause is present, a partial index is created. The default method is btree.

 

          Create index syntax:

 

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]

    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )

    [ INCLUDE ( column_name [, ...] ) ]

    [ WITH ( storage_parameter [= value] [, ... ] ) ]

    [ TABLESPACE tablespace_name ]

    [ WHERE predicate ]

 

          Alter index syntax:

ALTER INDEX [ IF EXISTS ] name RENAME TO new_name

ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name

ALTER INDEX name ATTACH PARTITION index_name

ALTER INDEX name [ NO ] DEPENDS ON EXTENSION extension_name

ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter [= value] [, ... ] )

ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )

ALTER INDEX [ IF EXISTS ] name ALTER [ COLUMN ] column_number

    SET STATISTICS integer

ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]

    SET TABLESPACE new_tablespace [ NOWAIT ]

-------------------------------------------------------------------------------------------------------------

 

          Reindex syntax:

 

REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name

 

where option can be one of:

 

    CONCURRENTLY [ boolean ]

    TABLESPACE new_tablespace

    VERBOSE [ boolean ]

 

alternatively you can use reindexdb binary file for maintenance purposes:

reindexdb:

reindexdb is a wrapper around the SQL command REINDEX(7). There is no effective difference between reindexing databases via this utility and via

       other methods for accessing the server.

File path: PostgreSQL’s executables’ path:

/usr/pgsql-<Major Version Number>/bin/

 

Command-line arguments:

 

Example:

To reindex the table t and the index ind in a database named drpezeshki:

reindexdb --table=t --index=ind drpezeshki

          Normally you don't have to worry about index fragmentation at all. However, if there has been a mass delete or update, or the sustained change rate was so high that autovacuum couldn't keep up, you may end up with a badly bloated index.

 

          Find boated/fragmented indexes:

o          Query current extensions with:

select * from pg_extension;

select * from pg_available_extensions;

          Extensions for index management

1)        pg_repack_

 

Reorganize tables in PostgreSQL databases with minimal locks

 is a PostgreSQL extension which lets you remove bloat from

tables and indexes, and optionally restore the physical order of clustered

indexes. Unlike CLUSTER_ and `VACUUM FULL`_ it works online, without

holding an exclusive lock on the processed tables during processing.

pg_repack is efficient to boot, with performance comparable to using

CLUSTER directly.

2)        pgstattuple:

/usr/pgsql-13/share/extension/pg_stattuple.control

dnf -y install postgresql-contrib

#connect to the desired database and issue the subsequent command:

CREATE EXTENSION pgstattuple;

o          issue the following command:

SELECT * FROM pgstatindex('index name');

Sample output:

 

 

 

query to get the leaf fragmentation of all B-tree indexes:

----------------------------------------------------------------------------------------------------------------

SELECT i.indexrelid::regclass,

       s.leaf_fragmentation

FROM pg_index AS i

   JOIN pg_class AS t ON i.indexrelid = t.oid

   JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid

   JOIN pg_am ON opc.opcmethod = pg_am.oid

   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s

WHERE t.relkind = 'i'

  AND pg_am.amname = 'btree';

-------------------------------------------------------------------------------------------------------------

 

Using these extensions automation tasks can be planned and programmed to defragment indexes periodically like SQL Server

 

Related performance parameters:

maintenance_work_mem

max_parallel_maintenance_workers

effective_cache_size

Example:

          We have a table of more than 200,000 rows with the following definition:

Dataset exists on the following link:

https://www.kaggle.com/speedoheck/inpatient-hospital-charges

---------------------------------------------------

create table inpatientcharges

(

            drg_definition varchar(200),

            provider_id integer,

            provider_name varchar(200),

            provider_street_address varchar(200),

            provider_city varchar(50),

            provider_state varchar(10),

            provider_zip_code integer,

            hospital_referral_region_description varchar(50),

            total_discharges integer,

            average_covered_charges varchar(20),

            average_total_payments varchar(20),

            average_medicare_payments varchar(20)

);

 

alter table inpatientcharges owner to postgres;

---- Load the data into table

---- Adding an identity column

ALTER TABLE inpatientCharges

    ADD COLUMN id int                -- int or bigint or smallint

        GENERATED BY DEFAULT AS IDENTITY;

--------------------------------------------------------------------------------------------------

          create index on the identity column:

-- create unique index concurrently ic on inpatientcharges (id asc nulls last) with (fillfactor=80);        -- default is 90

-------------------------------------------------

If we want to turn it to the primary key, and to recreate a primary key constraint, without blocking updates while the index is rebuilt:

CREATE UNIQUE INDEX CONCURRENTLY ic ON inpatientCharges (id asc nulls last) include (drg_definition,      provider_id, provider_name,provider_street_address,provider_city, provider_state,provider_zip_code,hospital_referral_region_description,total_discharges,average_covered_charges,average_total_payments,average_medicare_payments) with (fillfactor = 80);

ALTER TABLE inpatientCharges

-- DROP CONSTRAINT ic,

    ADD CONSTRAINT ic PRIMARY KEY USING INDEX ic;

----------------------------------------------------------------------------------------------------------

You can issue this command to show indexes that belong to table:

\d table_name

----------------------------------------------------------------------------------------------------------

use this command to import dataset:

COPY inpatientCharges(drg_definition,provider_id,provider_name,provider_street_address,provider_city,provider_state,provider_zip_code,hospital_referral_region_description,total_discharges,average_covered_charges,average_total_payments,average_medicare_payments)

FROM '/var/lib/pgsql/data/inpatientCharges.csv'

DELIMITER ','

CSV HEADER;

-------------------------------------------------------------------------------------------------

SELECT

    tablename,

    indexname,

    indexdef

FROM

    pg_indexes

WHERE

    schemaname = 'public'

ORDER BY

    tablename,

    indexname;

------------------------------------------------------------------------------------------------- 

create a new non-primary key index:

---------------------------------------------------------------------------------------------------

CREATE INDEX CONCURRENTLY icpid ON inpatientCharges (provider_id asc nulls last) include (id,      provider_name,provider_city, provider_state,provider_zip_code,total_discharges,average_covered_charges,average_total_payments,average_medicare_payments) with (fillfactor = 80);

--------------------------------------------------------------------------------------------------

Issue the following command given before to extract fragmentations by name and fragmentation is bigger than 40 for example:

--------------------------------------------------------------------------------------------------

SELECT i.indexrelid::regclass,

       s.leaf_fragmentation

FROM pg_index AS i

   JOIN pg_class AS t ON i.indexrelid = t.oid

   JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid

   JOIN pg_am ON opc.opcmethod = pg_am.oid

   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s

WHERE t.relkind = 'i'

  AND pg_am.amname = 'btree' and leaf_fragmentation > 40 and leaf_fragmentation <> 'NaN' order by leaf_fragmentation desc;

 

------- in one line for dynamic query:

-- user data

SELECT i.indexrelid::regclass FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and leaf_fragmentation > 0 and leaf_fragmentation <> 'NaN' and t.relname not like 'pg_%' order by leaf_fragmentation desc;

-- system catalogs

SELECT i.indexrelid::regclass FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and leaf_fragmentation > 0 and leaf_fragmentation <> 'NaN' and t.relname like 'pg_%' order by leaf_fragmentation desc;

Fragmentation of indexes in the above example:

SELECT i.indexrelid::regclass, s.leaf_fragmentation FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and relname = 'ic' or relname='icpid';

select 'path.to.object'::regclass::oid

select 'ic'::regclass::oid

------------------------------------------------------------------------------------------------

Alternative query to fetch fragmented users only indexes’ list:

I do not vow that this query returns all fragmented indexes. It’s slower too! Someone suggested this query to me so I include it here:

SELECT i.indexrelid::regclass,

       s.leaf_fragmentation

FROM pg_index AS i

   JOIN pg_class AS t ON i.indexrelid = t.oid

   JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid

   JOIN pg_am ON opc.opcmethod = pg_am.oid

   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s

WHERE t.relkind = 'i'

  AND pg_am.amname = 'btree' and leaf_fragmentation > 30 and leaf_fragmentation <> 'NaN' order by leaf_fragmentation desc;

 


 

 

Now you can automate reindexing:

System catalog indexes cannot be rebuilt concurrently, thus we have separated user indexes and system catalogs reindexing

1) Script to reindex system catalogs (Not concurrent, thus midnights and weekends are suggested)

usage:

/bin/sh reindex_systemcatalogs.sh <PG Version> <Fragmentation Threshold> <DBName>

 

###################################################################

set -x

 

######## variables: ##############

 

PGHOME=/usr/pgsql-$1/bin      #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

dbname=$3

 

##################################

 

export PATH=$PATH:$PGHOME

psqlc="sudo -u postgres psql -d$dbname"

$psqlc -c "SELECT i.indexrelid::regclass FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and leaf_fragmentation >= $fragmentation_threshold and leaf_fragmentation <> 'NaN' and t.relname like 'pg_%' order by leaf_fragmentation desc;" | tail -n +3 | head -n -2 | xargs -I {} sudo -u postgres psql -d$dbname -c "REINDEX (verbose) INDEX {}"

###################################################################

 

 

 

§  If you want to include index, database, schema, and table’s name too, you should use the following 2 scripts, both in the same directory:

·        The main script: The script you call:

usage:

/bin/sh reindex_sysindexes.sh <PG version> <Target Leaf Fragmentation Threshold (0-100)>

Note:

You must execute this script with /bin/sh

###################################################################

set -x

 

######## variables: ##############

 

LOG_DIRECTORY=/var/log/pgsql/maintenance/

LOG_FILE_NAME=reindex_job.log

PGHOME=/usr/pgsql-$1/bin        #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

 

##################################

 

#exec > >(logger -i -p local1.info) 2>&1

 

LOG_PATH=$LOG_DIRECTORY$LOG_FILE_NAME

alias FULL_DATE='date "+%Y-%m-%d %T.%N %z"'

mkdir -p /var/log/pgsql/maintenance/

touch $LOG_PATH

export PATH=$PATH:$PGHOME

 

 

START_TIMESTAMP=$(echo $(date "+%s.%N"))

 

 

printf "\------------------------------------------------------------------------------------------------\n$(FULL_DATE):\tStarting reindex maintenance operation  \n" >> $LOG_PATH 2>&1

 

 

 

printf "$(FULL_DATE):\tAdding pgstattuple extension to PostgreSQL's databases if not already added: \

(Attention! \"postgresql<pg version>-contrib\" package must be installed in advance\n\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname not in ('postgres','template0','template1');"\

 | head -n -1 | xargs -I {} sudo -u postgres psql -d{} postgres -c "create extension if not exists pgstattuple;") >> $LOG_PATH 2>&1

 

printf "\n$(FULL_DATE):\tReindexing target system catalogs indexes:\n" >> $LOG_PATH 2>&1

printf "The index names will be in format: \"<DBName>.[schema name if not public.]<Table Name>.<Index Name>\"\n" >> $LOG_PATH 2>&1

 

 

 

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname \

 not in ('postgres','template0','template1');" | head -n -1 | xargs -I {}  bash -c \

 "sudo -u postgres psql -t -d{} postgres -c \"SELECT concat('.',indrelid::regclass,'.',relname) \

 FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc \

 ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   \

 CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s \

 WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and leaf_fragmentation >= $fragmentation_threshold \

 and leaf_fragmentation <> 'NaN' and t.relname like 'pg_%'\" | head -n -1 | tr -d ' ' | sed -e 's/^/{}/' | \

 xargs -IA ./reindex.sh {} A $LOG_PATH")

 

 

 

 

END_TIMESTAMP=$(echo $(date "+%s.%N"))

#EXECUTION_TIME=$(expr $(date "+%s") - $START_TIMESTAMP)

EXECUTION_TIME=$(bc <<< "$END_TIMESTAMP-$START_TIMESTAMP")

 

 

printf "\n\n$(FULL_DATE):\t Process finished executing. The entire process took $EXECUTION_TIME seconds to complete.\n\n\n\n" >> $LOG_PATH 2>&1

###################################################################

·        The subsidiary script in the same directory as the primary script:

###################################################################

exec >> $3 2>&1

printf "\n"

echo $2

 

TEMP=$2

TEMP=$(echo "${TEMP##*.}")

sudo -u postgres psql -d$1 -c"REINDEX INDEX $TEMP"

##################################################################

The above script also logs some information about output to the following file:

/var/log/pgsql/maintenance/reindex_job.log


 

 

 

 

--------------------

2) user data reindexing: (once every midnight)

                                I.            For all databases (more convenient):

usage:

/bin/sh reindex_userindexes.sh <PG version> <Target Leaf Fragmentation Threshold (0-100)>

Note:

You must execute this script with /bin/sh

###################################################################

set -x

 

######## variables: ##############

 

LOG_PATH=/var/log/pgsql/maintenance/reindex_job.log

PGHOME=/usr/pgsql-$1/bin        #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

 

##################################

 

#exec > >(logger -i -p local1.info) 2>&1

 

alias FULL_DATE='date "+%Y-%m-%d %T.%N %z"'

mkdir -p /var/log/pgsql/maintenance/

touch /var/log/pgsql/maintenance/reindex_job.log

export PATH=$PATH:$PGHOME

 

 

START_TIMESTAMP=$(echo $(date "+%s.%N"))

 

 

printf "\------------------------------------------------------------------------------------------------\n$(FULL_DATE):\tStarting reindex maintenance operation  \n" >> $LOG_PATH 2>&1

 

 

 

printf "$(FULL_DATE):\tAdding pgstattuple extension to PostgreSQL's databases if not already added: \

(Attention! \"postgresql<pg version>-contrib\" package must be installed in advance\n\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname not in ('postgres','template0','template1');"\

 | head -n -1 | xargs -I {} sudo -u postgres psql -d{} postgres -c "create extension if not exists pgstattuple;") >> $LOG_PATH 2>&1

 

printf "\n$(FULL_DATE):\tReindexing target user indexes:\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname \

 not in ('postgres','template0','template1');" | head -n -1 | xargs -I {}  bash -c "sudo -u postgres psql -t -d{} postgres -c \"SELECT relname \

 FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc \

 ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   \

 CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s \

 WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and leaf_fragmentation >= $fragmentation_threshold \

 and leaf_fragmentation <> 'NaN' and t.relname not like 'pg_%';\" | head -n -1 | \

 xargs -I@ sudo -u postgres psql -d{} -c\"REINDEX INDEX CONCURRENTLY @\"") >> $LOG_PATH 2>&1

 

 

END_TIMESTAMP=$(echo $(date "+%s.%N"))

#EXECUTION_TIME=$(expr $(date "+%s") - $START_TIMESTAMP)

EXECUTION_TIME=$(bc <<< "$END_TIMESTAMP-$START_TIMESTAMP")

 

 

printf "\n$(FULL_DATE):\t Process finished executing. The entire process took $EXECUTION_TIME seconds to complete.\n\n\n" >> $LOG_PATH 2>&1

###################################################################

The above script also logs some information about output to the following file:

/var/log/pgsql/maintenance/reindex_job.log

Sample log output:

-------------------------------------------------------------------------------------------------------------

§  If you want to include index, database, schema, and table’s name too, you should use the following 2 scripts, both in the same directory:

·        The main script: The script you call:

usage:

/bin/sh reindex_userindexes.sh <PG version> <Target Leaf Fragmentation Threshold (0-100)>

Note:

You must execute this script with /bin/sh

###################################################################

set -x

 

######## variables: ##############

 

LOG_DIRECTORY=/var/log/pgsql/maintenance/

LOG_FILE_NAME=reindex_job.log

PGHOME=/usr/pgsql-$1/bin        #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

 

###############################

 

#exec > >(logger -i -p local1.info) 2>&1

 

LOG_PATH=$LOG_DIRECTORY$LOG_FILE_NAME

alias FULL_DATE='date "+%Y-%m-%d %T.%N %z"'

mkdir -p /var/log/pgsql/maintenance/

touch $LOG_PATH

export PATH=$PATH:$PGHOME

 

 

START_TIMESTAMP=$(echo $(date "+%s.%N"))

 

 

printf "\------------------------------------------------------------------------------------------------\n$(FULL_DATE):\tStarting reindex maintenance operation  \n" >> $LOG_PATH 2>&1

 

 

 

printf "$(FULL_DATE):\tAdding pgstattuple extension to PostgreSQL's databases if not already added: \

(Attention! \"postgresql<pg version>-contrib\" package must be installed in advance\n\n" >> $LOG_PATH 2>&1

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname not in ('postgres','template0','template1');"\

 | head -n -1 | xargs -I {} sudo -u postgres psql -d{} postgres -c "create extension if not exists pgstattuple;") >> $LOG_PATH 2>&1

 

printf "\n$(FULL_DATE):\tReindexing target user indexes:\n" >> $LOG_PATH 2>&1

printf "The index names will be in format: \"<DBName>.[schema name if not public.]<Table Name>.<Index Name>\"\n" >> $LOG_PATH 2>&1

 

 

 

echo $(sudo -u postgres psql -t -c "select datname from pg_database where datname \

 not in ('postgres','template0','template1');" | head -n -1 | xargs -I {}  bash -c \

 "sudo -u postgres psql -t -d{} postgres -c \"SELECT concat('.',indrelid::regclass,'.',relname) \

 FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc \

 ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   \

 CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s \

 WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and leaf_fragmentation >= $fragmentation_threshold \

 and leaf_fragmentation <> 'NaN' and t.relname not like 'pg_%'\" | head -n -1 | tr -d ' ' | sed -e 's/^/{}/' | \

 xargs -IA ./reindex.sh {} A $LOG_PATH")

 

 

 

 

END_TIMESTAMP=$(echo $(date "+%s.%N"))

#EXECUTION_TIME=$(expr $(date "+%s") - $START_TIMESTAMP)

EXECUTION_TIME=$(bc <<< "$END_TIMESTAMP-$START_TIMESTAMP")

 

 

printf "\n\n$(FULL_DATE):\t Process finished executing. The entire process took $EXECUTION_TIME seconds to complete.\n\n\n\n" >> $LOG_PATH 2>&1

###################################################################

·        The subsidiary script in the same directory as the primary script, name it reindex.sh and change its mode to executable by issuing ‘chmod +x reindex.sh’:

###################################################################

exec >> $3 2>&1

printf "\n"

echo $2

 

TEMP=$2

TEMP=$(echo "${TEMP##*.}")

sudo -u postgres psql -d$1 -c"REINDEX INDEX CONCURRENTLY $TEMP"

##################################################################

Sample log output:

-------------------------------------------------------------------------------------------------------------

 

 

                             II.            For one specific database:

usage:

/bin/sh reindex_userdata.sh <PG Version> <Fragmentation Threshold> <DBName>

 

###################################################################

set -x

 

######## variables: ##############

 

PGHOME=/usr/pgsql-$1/bin      #Home for PG13: /usr/pgsql-13/bin

fragmentation_threshold=$2

dbname=$3

 

##################################

 

export PATH=$PATH:$PGHOME

psqlc="sudo -u postgres psql -d$dbname"

$psqlc -c "SELECT i.indexrelid::regclass FROM pg_index AS i JOIN pg_class AS t ON i.indexrelid = t.oid JOIN pg_opclass AS opc ON i.indclass[0] = opc.oid   JOIN pg_am ON opc.opcmethod = pg_am.oid   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE t.relkind = 'i'  AND pg_am.amname = 'btree' and leaf_fragmentation > $fragmentation_threshold and leaf_fragmentation <> 'NaN' and t.relname not like 'pg_%' order by leaf_fragmentation desc;" | tail -n +3 | head -n -2 | xargs -I {} sudo -u postgres psql -d$dbname -c "REINDEX (verbose) INDEX CONCURRENTLY {}"

###################################################################

Logging is not provided for the above script.

------------------------------------------------------------------------------------------------

 The alternative query which someone suggested to me has not been turned into a script as it is slower.

 

 

Find unused indexes:

------------------------------------------------------------------------------------------------

SELECT s.schemaname,

       s.relname AS tablename,

       s.indexrelname AS indexname,

       pg_relation_size(s.indexrelid) / 1048576.0 AS index_size

FROM pg_catalog.pg_stat_user_indexes s

   JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid

WHERE s.idx_scan = 0      -- has never been scanned

  AND 0 <>ALL (i.indkey)  -- no index column is an expression

  AND NOT i.indisunique   -- is not a UNIQUE index

  AND NOT EXISTS          -- does not enforce a constraint

         (SELECT 1 FROM pg_catalog.pg_constraint c

          WHERE c.conindid = s.indexrelid)

ORDER BY pg_relation_size(s.indexrelid) DESC;

 

------- in one line for dynamic query:

SELECT concat(s.schemaname,'.',s.indexrelname) FROM pg_catalog.pg_stat_user_indexes s JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid WHERE s.idx_scan = 0 AND 0 <>ALL (i.indkey) AND NOT i.indisunique AND NOT EXISTS (SELECT 1 FROM pg_catalog.pg_constraint c WHERE c.conindid = s.indexrelid) ORDER BY pg_relation_size(s.indexrelid) DESC;

 

-------------------------------------------------------------------------------------------------------------

or:

-------------------------------------------------------------------------------------------------------------

SELECT

    relid::regclass AS table,

    indexrelid::regclass AS index,

    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size,

    idx_tup_read,

    idx_tup_fetch,

    idx_scan

FROM pg_stat_user_indexes

JOIN pg_index USING (indexrelid)

WHERE idx_scan = 0

AND indisunique IS FALSE

order by index_size desc

;

 

 

---------------

3) drop unused indexes (once every 2 weeks):

You have to periodically reset your index usage statistics by issuing ‘pg_stat_reset()’ command. So two scripts come in the way. The script (I) shall execute ‘pg_stat_reset()’ and script (II) would drop the indexes, however, I combined these two operations into one script below. According to the load of your database, you can schedule the execution of the script, say every month:

usage:

/bin/sh drop_unused_idx.sh <DBName>

 

#####Script (I)&(II) combined##########################################

#set -x

 

dbname=$1

 

psqlc="sudo -u postgres psql -d$dbname"

 

$psqlc -c "SELECT concat(s.schemaname,'.',s.indexrelname) FROM pg_catalog.pg_stat_user_indexes s JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid WHERE s.idx_scan = 0 AND 0 <>ALL (i.indkey) AND NOT i.indisunique AND NOT EXISTS (SELECT 1 FROM pg_catalog.pg_constraint c WHERE c.conindid = s.indexrelid) ORDER BY pg_relation_size(s.indexrelid) DESC;" | tail -n +3 | head -n -2 | xargs -I {} $psqlc -c'drop index {}'

$psqlc -c “select pg_stat_reset()”

 

###################################################################

 

 

 Some fragmentation workload:

create temp table tt as select * from inpatientCharges where id % 3 = 0; delete from inpatientCharges where id % 3 = 0; insert into inpatientCharges select * from tt;

 


 

 

 

 

References:

1)        https://www.postgresql.org/docs/14/sql-createindex.html

create index

2)        https://www.postgresql.org/docs/14/sql-reindex.html

rebuild index

3)        https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html

Built-in operator classes

4)        https://www.postgresql.org/docs/14/sql-createopclass.html

create operator class

5)        https://www.postgresql.org/docs/14/btree-implementation.html#BTREE-DEDUPLICATION

b-tree deduplication

6)        https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-GIN-PENDING-LIST-LIMIT

gin_pending_list_limit

7)        https://www.postgresql.org/docs/14/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM

maintenance_work_mem

8)        https://www.postgresql.org/docs/14/runtime-config-resource.html#GUC-MAX-PARALLEL-MAINTENANCE-WORKERS

max_parallel_maintenance_workers

9)        https://www.postgresql.org/docs/14/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE

effective_cache_size

10)      https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-GIN-PENDING-LIST-LIMIT

gin_pending_list_limit

11)      https://www.postgresql.org/docs/current/indexes-opclass.html

indexes-opclass

12)      https://www.postgresql.org/docs/13/pgstattuple.html

pgstattuple

13)      https://reorg.github.io/pg_repack/1.3/

pg_repack fork of pg_reorg_

14)      https://github.com/pgxn/pgxnclient

PGXN Client: A command line tool to interact with the PostgreSQL Extension Network

15)      https://www.postgresql.org/docs/13/ecpg-dynamic.html

PostgreSQL dynamic query

16)      https://www.cybertec-postgresql.com/en/get-rid-of-your-unused-indexes/

 

 


Comments

Popular posts from this blog

Setup barman 2.14 for PostgreSQL

A word about Microsoft's naming about "Replication" and "AlwaysOn"