Introduction to PostgreSQL Arrays
Posted on 2024-01-14, by Racum.
Arrays in PostgreSQL are very useful to represent lists of data without the need of extra “many-to-many” tables, but to take advantage of them, you need to know the basics; follow this article to learn them.
Fields
Arrays can be applied in all field types (including user-defined), but they are more commonly used over numbers (integers and floats) and strings (text, varchar, etc, usually as a “fake-enum”). To define them, just add a []
after its type:
create table countries (
code text,
name text,
languages text[]
);
In this example, the field languages
is defined as text[]
, meaning an “array of texts”. It is a good practice to use array fields in the plural, or other variation that convey the meaning of multiplicity.
postgres=# \d countries
Table "public.countries"
Column | Type | Collation | Nullable | Default
-----------+--------+-----------+----------+---------
code | text | | not null |
name | text | | not null |
languages | text[] | | |
Indexes:
"countries_pkey" PRIMARY KEY, btree (code)
Arrays can be multi-dimensional, for example: integer[][]
creates a 2-dimensional table of integers, but because this use not very common, I’ll not cover the details on this article.
Inserting
When you just inserting a list of values into an array, you can use the literal format with no need of casting: ’{a, b, c…}’
:
insert into countries values
('ch', 'Switzerland', '{german, french, italian, romansh}'),
('it', 'Italy', '{italian, sardinian, german}'),
('es', 'Spain', '{spanish, catalan, galician, basque}'),
('py', 'Paraguay', '{spanish, guarani}');
You can also use the array[a, b, c…]
constructor:
insert into countries values ('be', 'Belgium', array['dutch', 'french']);
And this is how they appear on psql:
postgres=# select * from countries;
code | name | languages
------+-------------+-----------------------------------
ch | Switzerland | {german,french,italian,romansh}
it | Italy | {italian,sardinian,german}
es | Spain | {spanish,catalan,galician,basque}
py | Paraguay | {spanish,guarani}
be | Belgium | {dutch,french}
(5 rows)
Selecting
Position
Arrays in PostgreSQL are 1-indexed, meaning, they start with [1]
instead of [0]
, unlike most programing languages, for example, to get the ”first language” the syntax is language[1]
.
To get the ”last element” it is a bit more involved: we first need to get the index of the last value with array_upper(language, 1)
, and use it to as an index, like this: languages[array_upper(languages, 1)]
; this 1
at the end is the dimension we are interested in (remember arrays can be multi-dimensional?).
select
name,
languages,
languages[1] as first_language,
languages[array_upper(languages, 1)] as last_language
from countries;
name | languages | first_language | last_language
-------------+-----------------------------------+----------------+---------------
Switzerland | {german,french,italian,romansh} | german | romansh
Italy | {italian,sardinian,german} | italian | german
Spain | {spanish,catalan,galician,basque} | spanish | basque
Paraguay | {spanish,guarani} | spanish | guarani
Belgium | {dutch,french} | dutch | french
(5 rows)
Contains
The symbol for the ”contains” operator is @>
, and it behaves like the in
operator from Python:
select * from countries where languages @> '{spanish}';
code | name | languages
------+----------+-----------------------------------
es | Spain | {spanish,catalan,galician,basque}
py | Paraguay | {spanish,guarani}
(2 rows)
The example above matches a single value (still notated as an array), but the @>
operator can work with multiple values, like this:
select * from countries where languages @> '{spanish, galician}';
code | name | languages
------+-------+-----------------------------------
es | Spain | {spanish,catalan,galician,basque}
(1 row)
In this case both of the values need to be present.
Intersects
The symbol for ”overlap” (or ”intersects”) is &&
, it behaves similar to “contains” but instead or matching both, it matches any:
select * from countries where languages && '{spanish, italian}';
code | name | languages
------+-------------+-----------------------------------
ch | Switzerland | {german,french,italian,romansh}
it | Italy | {italian,sardinian,german}
es | Spain | {spanish,catalan,galician,basque}
py | Paraguay | {spanish,guarani}
(4 rows)
Unnesting
“Unnest” an array mean “split every element in its own row”, this is the basis of many interesting operations:
select code, name, unnest(languages) from countries;
code | name | unnest
------+-------------+-----------
ch | Switzerland | german
ch | Switzerland | french
ch | Switzerland | italian
ch | Switzerland | romansh
it | Italy | italian
it | Italy | sardinian
it | Italy | german
es | Spain | spanish
es | Spain | catalan
es | Spain | galician
es | Spain | basque
py | Paraguay | spanish
py | Paraguay | guarani
be | Belgium | dutch
be | Belgium | french
(15 rows)
Combining (with deduplicaiton and sorting)
The inverse operation of “unnest” is array_agg()
. On the example below, we are unnesting in the inner select, using distinct
to remove duplicates and order by
to sort; after that we “re-nest” (or “aggregate”) the values back in a single array with array_agg()
on the outer select:
select array_agg(lang) as all_languages
from (
select distinct unnest(languages) as lang
from countries
order by lang
);
all_languages
-----------------------------------------------------------------------------------------
{basque,catalan,dutch,french,galician,german,guarani,italian,romansh,sardinian,spanish}
(1 row)
Grouping by nested values
Unnested arrays behave like in-memory tables, you can apply all other valid SQL operations over it, for example, we can group by
language to give us an inverted array containing the countries that match them:
select lang, array_agg(name) as spoken_in
from (
select name, unnest(languages) as lang
from countries
)
group by lang order by lang;
lang | spoken_in
-----------+-----------------------
basque | {Spain}
catalan | {Spain}
dutch | {Belgium}
french | {Switzerland,Belgium}
galician | {Spain}
german | {Switzerland,Italy}
guarani | {Paraguay}
italian | {Switzerland,Italy}
romansh | {Switzerland}
sardinian | {Italy}
spanish | {Spain,Paraguay}
(11 rows)
Updating
Adding
You can add elements at the beginning of the array with array_prepend()
or at the end with array_append()
:
update countries
set languages = array_append(languages, 'esperanto')
where code = 'be';
Changes "languages" like this: {dutch, french}
➤ {dutch, french, esperanto}
.
Replacing
You can replace a single element by index, like languages[3] = ‘german’
, or replace it by value with array_replace(field, from, to)
, as in:
update countries
set languages = array_replace(languages, 'esperanto', 'german')
where code = 'be';
Changes "languages" like this: {dutch, french, esperanto}
➤ {dutch, french, german}
.
Removing
Remove elements with array_remove()
:
update countries
set languages = array_remove(languages, 'german')
where code = 'be';
Changes "languages" like this: {dutch, french, german}
➤ {dutch, french}
.
Going deeper
Please refer to the main Array documentation and the Array Functions and Operators, they both provide more comprehensive explanations and examples, including things I didn’t cover here, like multi-dimension arrays, slices, casting, etc.
If you are looking for other compound data-types, PostgreSQL also provides hstore for simple key-value pairs and JSON for a full hierarchical JSON representation; they are both very useful and deserve their own articles!