Introduction to PostgreSQL Arrays

Posted on 2024-01-14, by Racum. Tags: PostgresSQL

Table of Contents

Arrays in PostgreSQL are very useful to represent lists of data without the need of extra “many-to-many” tables, but to take advantage of them, you need to know the basics; follow this article to learn them.

Fields

Arrays can be applied in all field types (including user-defined), but they are more commonly used over numbers (integers and floats) and strings (text, varchar, etc, usually as a “fake-enum”). To define them, just add a [] after its type:

create table countries (
    code text,
    name text,
    languages text[]
);

In this example, the field languages is defined as text[], meaning an “array of texts”. It is a good practice to use array fields in the plural, or other variation that convey the meaning of multiplicity.

postgres=# \d countries
              Table "public.countries"
  Column   |  Type  | Collation | Nullable | Default
-----------+--------+-----------+----------+---------
 code      | text   |           | not null |
 name      | text   |           | not null |
 languages | text[] |           |          |
Indexes:
    "countries_pkey" PRIMARY KEY, btree (code)

Arrays can be multi-dimensional, for example: integer[][] creates a 2-dimensional table of integers, but because this use not very common, I’ll not cover the details on this article.

Inserting

When you just inserting a list of values into an array, you can use the literal format with no need of casting: ’{a, b, c…}’:

insert into countries values
    ('ch', 'Switzerland', '{german, french, italian, romansh}'),
    ('it', 'Italy', '{italian, sardinian, german}'),
    ('es', 'Spain', '{spanish, catalan, galician, basque}'),
    ('py', 'Paraguay', '{spanish, guarani}');

You can also use the array[a, b, c…] constructor:

insert into countries values ('be', 'Belgium', array['dutch', 'french']);

And this is how they appear on psql:

postgres=# select * from countries;
 code |    name     |             languages
------+-------------+-----------------------------------
 ch   | Switzerland | {german,french,italian,romansh}
 it   | Italy       | {italian,sardinian,german}
 es   | Spain       | {spanish,catalan,galician,basque}
 py   | Paraguay    | {spanish,guarani}
 be   | Belgium     | {dutch,french}
(5 rows)

Selecting

Position

Arrays in PostgreSQL are 1-indexed, meaning, they start with [1] instead of [0], unlike most programing languages, for example, to get the ”first language” the syntax is language[1].

To get the ”last element” it is a bit more involved: we first need to get the index of the last value with array_upper(language, 1), and use it to as an index, like this: languages[array_upper(languages, 1)]; this 1 at the end is the dimension we are interested in (remember arrays can be multi-dimensional?).

select
    name,
    languages,
    languages[1] as first_language,
    languages[array_upper(languages, 1)] as last_language
from countries;

    name     |             languages             | first_language | last_language
-------------+-----------------------------------+----------------+---------------
 Switzerland | {german,french,italian,romansh}   | german         | romansh
 Italy       | {italian,sardinian,german}        | italian        | german
 Spain       | {spanish,catalan,galician,basque} | spanish        | basque
 Paraguay    | {spanish,guarani}                 | spanish        | guarani
 Belgium     | {dutch,french}                    | dutch          | french
(5 rows)

Contains

The symbol for the ”contains” operator is @>, and it behaves like the in operator from Python:

select * from countries where languages @> '{spanish}';

 code |   name   |             languages
------+----------+-----------------------------------
 es   | Spain    | {spanish,catalan,galician,basque}
 py   | Paraguay | {spanish,guarani}
(2 rows)

The example above matches a single value (still notated as an array), but the @> operator can work with multiple values, like this:

select * from countries where languages @> '{spanish, galician}';

 code | name  |             languages
------+-------+-----------------------------------
 es   | Spain | {spanish,catalan,galician,basque}
(1 row)

In this case both of the values need to be present.

Intersects

The symbol for ”overlap” (or ”intersects”) is &&, it behaves similar to “contains” but instead or matching both, it matches any:

select * from countries where languages && '{spanish, italian}';

 code |    name     |             languages
------+-------------+-----------------------------------
 ch   | Switzerland | {german,french,italian,romansh}
 it   | Italy       | {italian,sardinian,german}
 es   | Spain       | {spanish,catalan,galician,basque}
 py   | Paraguay    | {spanish,guarani}
(4 rows)

Unnesting

“Unnest” an array mean “split every element in its own row”, this is the basis of many interesting operations:

select code, name, unnest(languages) from countries;

 code |    name     |  unnest
------+-------------+-----------
 ch   | Switzerland | german
 ch   | Switzerland | french
 ch   | Switzerland | italian
 ch   | Switzerland | romansh
 it   | Italy       | italian
 it   | Italy       | sardinian
 it   | Italy       | german
 es   | Spain       | spanish
 es   | Spain       | catalan
 es   | Spain       | galician
 es   | Spain       | basque
 py   | Paraguay    | spanish
 py   | Paraguay    | guarani
 be   | Belgium     | dutch
 be   | Belgium     | french
(15 rows)

Combining (with deduplicaiton and sorting)

The inverse operation of “unnest” is array_agg(). On the example below, we are unnesting in the inner select, using distinct to remove duplicates and order by to sort; after that we “re-nest” (or “aggregate”) the values back in a single array with array_agg() on the outer select:

select array_agg(lang) as all_languages
from (
    select distinct unnest(languages) as lang
    from countries
    order by lang
);

                                      all_languages
-----------------------------------------------------------------------------------------
 {basque,catalan,dutch,french,galician,german,guarani,italian,romansh,sardinian,spanish}
(1 row)

Grouping by nested values

Unnested arrays behave like in-memory tables, you can apply all other valid SQL operations over it, for example, we can group by language to give us an inverted array containing the countries that match them:

select lang, array_agg(name) as spoken_in
from (
    select name, unnest(languages) as lang
    from countries
)
group by lang order by lang;

   lang    |       spoken_in
-----------+-----------------------
 basque    | {Spain}
 catalan   | {Spain}
 dutch     | {Belgium}
 french    | {Switzerland,Belgium}
 galician  | {Spain}
 german    | {Switzerland,Italy}
 guarani   | {Paraguay}
 italian   | {Switzerland,Italy}
 romansh   | {Switzerland}
 sardinian | {Italy}
 spanish   | {Spain,Paraguay}
(11 rows)

Updating

Adding

You can add elements at the beginning of the array with array_prepend() or at the end with array_append():

update countries
set languages = array_append(languages, 'esperanto')
where code = 'be';

Changes "languages" like this: {dutch, french} ➤ {dutch, french, esperanto}.

Replacing

You can replace a single element by index, like languages[3] = ‘german’, or replace it by value with array_replace(field, from, to), as in:

update countries
set languages = array_replace(languages, 'esperanto', 'german')
where code = 'be';

Changes "languages" like this: {dutch, french, esperanto} ➤ {dutch, french, german}.

Removing

Remove elements with array_remove():

update countries
set languages = array_remove(languages, 'german')
where code = 'be';

Changes "languages" like this: {dutch, french, german} ➤ {dutch, french}.

Going deeper

Please refer to the main Array documentation and the Array Functions and Operators, they both provide more comprehensive explanations and examples, including things I didn’t cover here, like multi-dimension arrays, slices, casting, etc.

If you are looking for other compound data-types, PostgreSQL also provides hstore for simple key-value pairs and JSON for a full hierarchical JSON representation; they are both very useful and deserve their own articles!