Introduction to PostgreSQL Arrays
Posted on 2024-01-14, by Racum.
Arrays in PostgreSQL are very useful to represent lists of data without the need of extra âmany-to-manyâ tables, but to take advantage of them, you need to know the basics; follow this article to learn them.
Fields
Arrays can be applied in all field types (including user-defined), but they are more commonly used over numbers (integers and floats) and strings (text, varchar, etc, usually as a âfake-enumâ). To define them, just add a [] after its type:
create table countries (
code text,
name text,
languages text[]
);
In this example, the field languages is defined as text[], meaning an âarray of textsâ. It is a good practice to use array fields in the plural, or other variation that convey the meaning of multiplicity.
postgres=# \d countries
Table "public.countries"
Column | Type | Collation | Nullable | Default
-----------+--------+-----------+----------+---------
code | text | | not null |
name | text | | not null |
languages | text[] | | |
Indexes:
"countries_pkey" PRIMARY KEY, btree (code)
Arrays can be multi-dimensional, for example: integer[][] creates a 2-dimensional table of integers, but because this use not very common, Iâll not cover the details on this article.
Inserting
When you just inserting a list of values into an array, you can use the literal format with no need of casting: â{a, b, câŚ}â:
insert into countries values
('ch', 'Switzerland', '{german, french, italian, romansh}'),
('it', 'Italy', '{italian, sardinian, german}'),
('es', 'Spain', '{spanish, catalan, galician, basque}'),
('py', 'Paraguay', '{spanish, guarani}');
You can also use the array[a, b, câŚ] constructor:
insert into countries values ('be', 'Belgium', array['dutch', 'french']);
And this is how they appear on psql:
postgres=# select * from countries;
code | name | languages
------+-------------+-----------------------------------
ch | Switzerland | {german,french,italian,romansh}
it | Italy | {italian,sardinian,german}
es | Spain | {spanish,catalan,galician,basque}
py | Paraguay | {spanish,guarani}
be | Belgium | {dutch,french}
(5 rows)
Selecting
Position
Arrays in PostgreSQL are 1-indexed, meaning, they start with [1] instead of [0], unlike most programing languages, for example, to get the âfirst languageâ the syntax is language[1].
To get the âlast elementâ it is a bit more involved: we first need to get the index of the last value with array_upper(language, 1), and use it to as an index, like this: languages[array_upper(languages, 1)]; this 1 at the end is the dimension we are interested in (remember arrays can be multi-dimensional?).
select
name,
languages,
languages[1] as first_language,
languages[array_upper(languages, 1)] as last_language
from countries;
name | languages | first_language | last_language
-------------+-----------------------------------+----------------+---------------
Switzerland | {german,french,italian,romansh} | german | romansh
Italy | {italian,sardinian,german} | italian | german
Spain | {spanish,catalan,galician,basque} | spanish | basque
Paraguay | {spanish,guarani} | spanish | guarani
Belgium | {dutch,french} | dutch | french
(5 rows)
Contains
The symbol for the âcontainsâ operator is @>, and it behaves like the in operator from Python:
select * from countries where languages @> '{spanish}';
code | name | languages
------+----------+-----------------------------------
es | Spain | {spanish,catalan,galician,basque}
py | Paraguay | {spanish,guarani}
(2 rows)
The example above matches a single value (still notated as an array), but the @> operator can work with multiple values, like this:
select * from countries where languages @> '{spanish, galician}';
code | name | languages
------+-------+-----------------------------------
es | Spain | {spanish,catalan,galician,basque}
(1 row)
In this case both of the values need to be present.
Intersects
The symbol for âoverlapâ (or âintersectsâ) is &&, it behaves similar to âcontainsâ but instead or matching both, it matches any:
select * from countries where languages && '{spanish, italian}';
code | name | languages
------+-------------+-----------------------------------
ch | Switzerland | {german,french,italian,romansh}
it | Italy | {italian,sardinian,german}
es | Spain | {spanish,catalan,galician,basque}
py | Paraguay | {spanish,guarani}
(4 rows)
Unnesting
âUnnestâ an array mean âsplit every element in its own rowâ, this is the basis of many interesting operations:
select code, name, unnest(languages) from countries;
code | name | unnest
------+-------------+-----------
ch | Switzerland | german
ch | Switzerland | french
ch | Switzerland | italian
ch | Switzerland | romansh
it | Italy | italian
it | Italy | sardinian
it | Italy | german
es | Spain | spanish
es | Spain | catalan
es | Spain | galician
es | Spain | basque
py | Paraguay | spanish
py | Paraguay | guarani
be | Belgium | dutch
be | Belgium | french
(15 rows)
Combining (with deduplication and sorting)
The inverse operation of âunnestâ is array_agg(). On the example below, we are unnesting in the inner select, using distinct to remove duplicates and order by to sort; after that we âre-nestâ (or âaggregateâ) the values back in a single array with array_agg() on the outer select:
select array_agg(lang) as all_languages
from (
select distinct unnest(languages) as lang
from countries
order by lang
);
all_languages
-----------------------------------------------------------------------------------------
{basque,catalan,dutch,french,galician,german,guarani,italian,romansh,sardinian,spanish}
(1 row)
Grouping by nested values
Unnested arrays behave like in-memory tables, you can apply all other valid SQL operations over it, for example, we can group by language to give us an inverted array containing the countries that match them:
select lang, array_agg(name) as spoken_in
from (
select name, unnest(languages) as lang
from countries
)
group by lang order by lang;
lang | spoken_in
-----------+-----------------------
basque | {Spain}
catalan | {Spain}
dutch | {Belgium}
french | {Switzerland,Belgium}
galician | {Spain}
german | {Switzerland,Italy}
guarani | {Paraguay}
italian | {Switzerland,Italy}
romansh | {Switzerland}
sardinian | {Italy}
spanish | {Spain,Paraguay}
(11 rows)
Updating
Adding
You can add elements at the beginning of the array with array_prepend() or at the end with array_append():
update countries
set languages = array_append(languages, 'esperanto')
where code = 'be';
Changes "languages" like this: {dutch, french} ⤠{dutch, french, esperanto}.
Replacing
You can replace a single element by index, like languages[3] = âgermanâ, or replace it by value with array_replace(field, from, to), as in:
update countries
set languages = array_replace(languages, 'esperanto', 'german')
where code = 'be';
Changes "languages" like this: {dutch, french, esperanto} ⤠{dutch, french, german}.
Removing
Remove elements with array_remove():
update countries
set languages = array_remove(languages, 'german')
where code = 'be';
Changes "languages" like this: {dutch, french, german} ⤠{dutch, french}.
Going deeper
Please refer to the main Array documentation and the Array Functions and Operators, they both provide more comprehensive explanations and examples, including things I didnât cover here, like multi-dimension arrays, slices, casting, etc.
If you are looking for other compound data-types, PostgreSQL also provides hstore for simple key-value pairs and JSON for a full hierarchical JSON representation; they are both very useful and deserve their own articles!