PostgreSQL array_agg Function
By gobrain
Jun 12th, 2024
PostgreSQL array_agg is an aggregate function that allows aggregating multiple values into an array. It is used when you want to group values from different rows into a single array.
In this article, we will explain the PostgreSQL array_agg function by providing examples for different use cases. Let's get started.
Syntax
The basic syntax of the postgresql array_agg function is as follows:
array_agg(expression)
Here, the expression parameter represents the column or value that you want to aggregate into an array. It can be any valid expression in PostgreSQL.
A Basic Example
Let’s consider a table called employees
with the following sample data:
id | name | department |
---|---|---|
1 | John | HR |
2 | Jane | HR |
3 | Mike | HR |
4 | Sarah | IT |
5 | Michael | IT |
Suppose that, you want to aggregate the names of employees in the HR department into a single array. You can accomplish this using the arrray_agg
function as shown below:
SELECT array_agg(name) AS hr_employees
FROM employees
WHERE department = 'HR';
The result is as follows:
hr_employees |
---|
{John, Jane} |
As you can see, the array_agg
function has aggregated the names of employees in the HR department into a single array.
PostgreSQL array_agg Function With GROUP BY
If you want to aggregate values of rows based on a column, then the postgresql array_agg
function can also be used with the GROUP BY
clause. Consider the following table called orders with the following sample data
id | customer_id | product_name | quantity |
---|---|---|---|
1 | 1 | Laptop | 2 |
2 | 1 | Smartphone | 3 |
3 | 2 | TV | 1 |
4 | 2 | Laptop | 2 |
5 | 3 | Smartphone | 4 |
If you want to aggregate the names of products that an customer buy into arrays, you can use the array_agg function with the GROUP BY clause. This enable you to group products by customers that buy them.
SELECT customer_id, array_agg(product_name) AS purchased_products
FROM orders
GROUP BY customer_id;
The result of this query will be:
customer_id | purchased_products |
---|---|
1 | {Laptop, Smartphone} |
2 | {TV, Laptop} |
3 | {Smartphone} |
In this example, the function grouped the orders by customer_id and aggregated the product names for each customer into arrays.
Order And Filter Values In Aggregated Array
In PostgreSQL, you can also order and filter values in an aggregated array using the ORDER BY
and WHERE
clauses in combination with the array_agg
function.
Here’s the syntax of how you can order and filter values in ARRAY_AGG
:
SELECT id, ARRAY_AGG(value ORDER BY value) AS sorted_array
FROM your_table
WHERE condition
GROUP BY id;
Now, suppose you have a table called orders
with the following data:
id | customer_id | order_date | total_amount |
---|---|---|---|
1 | 1 | 2023-06-15 | 100.00 |
2 | 1 | 2023-06-14 | 150.00 |
3 | 2 | 2023-06-16 | 200.00 |
4 | 2 | 2023-06-13 | 50.00 |
5 | 3 | 2023-06-16 | 300.00 |
To order the total_amount
values within the aggregated array in descending order and filter for orders where the total_amount
is greater than 100, you can use the following query:
SELECT customer_id, ARRAY_AGG(total_amount ORDER BY total_amount DESC) AS sorted_array
FROM orders
WHERE total_amount > 100
GROUP BY customer_id;
The result will be like:
customer_id | purchased_products |
---|---|
1 | {150, 100} |
2 | {200} |
3 | {300} |
Handle Duplicate Values
Additionally, the array_agg
function can handle duplicate values by using the DISTINCT
keyword. This allows you to eliminate duplicate elements from the resulting array.
Consider the following table called students
:
student_id | subject |
---|---|
1 | Math |
1 | Science |
1 | Math |
2 | History |
2 | Science |
To aggregate the unique subjects of each student into an array, you can use the following query:
SELECT student_id, array_agg(DISTINCT subject) AS subjects
FROM students
GROUP BY student_id;
The result of this query will be:
student_id | subjects |
---|---|
1 | {Math,Science} |
2 | {History,Science} |
PostgreSQL array_agg Function With Many to Many Relation
When dealing with a many-to-many relationship, where one entity is related to multiple entities in another table, you can use array_agg
to aggregate the related values into an array.
Let’s say you have two tables: users
and roles
, with a many-to-many relationship between them through an intermediary table called user_roles
. Each user can have multiple roles, and each role can be assigned to multiple users.
Here’s an example schema:
user_id | name |
---|---|
1 | John |
2 | Sarah |
3 | Michael |
4 | Emily |
5 | David |
Table: roles
role_id | name |
---|---|
1 | Admin |
2 | Manager |
3 | Employee |
4 | Intern |
5 | Analyst |
Table: user_roles
user_id | role_id |
---|---|
1 | 1 |
1 | 3 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 3 |
5 | 5 |
To retrieve all users with their corresponding roles aggregated into an array, you can use the following query:
SELECT u.user_id, u.name, array_agg(r.name) AS roles
FROM users u
JOIN user_roles ur ON u.user_id = ur.user_id
JOIN roles r ON ur.role_id = r.role_id
GROUP BY u.user_id, u.name;
This query joins the three tables (users
, user_roles
, and roles
) using the appropriate foreign key relationships. Then, it uses array_agg(r.name)
to aggregate the role names into an array for each user.
Here is the result:
user_id | name | roles |
---|---|---|
5 | David | {Employee,Analyst} |
4 | Emily | {Intern} |
2 | Sarah | {Manager} |
1 | John | {Admin,Employee} |
3 | Michael | {Employee} |
This allows you to obtain structured data in your applications.
PostgreSQL array_agg
Function At A Glance
1. What is the array_agg
function in PostgreSQL?
The array_agg
function in PostgreSQL is an aggregate function that collects values from a set of input rows into an array. It’s particularly useful when you want to group values and store them in an array for further analysis or manipulation.
2. What’s the syntax of the array_agg
function?
The basic syntax of the array_agg
function is as follows:
array_agg(expression)
Where expression
is the value you want to aggregate into an array.
3. What does the array_agg
function return?
The array_agg
function returns an array containing all the aggregated values from the input rows.
4. How does array_agg
handle NULL values?
By default, array_agg
includes NULL values in the resulting array. If you want to exclude NULLs, you can use the ARRAY_AGG(expression IGNORE NULLS)
syntax.
5. Can I order the elements within the aggregated array?
Yes, you can specify the order using the ORDER BY
clause within the array_agg
function. For instance:
SELECT category_id, array_agg(product_name ORDER BY product_name) AS sorted_products
FROM products
GROUP BY category_id;
6. How can I get a distinct array using array_agg
?
To obtain a distinct array, you can combine array_agg
with the DISTINCT
keyword:
SELECT order_id, array_agg(DISTINCT product_name) AS unique_products
FROM orders
GROUP BY order_id;
7. Is there a way to limit the number of elements in the aggregated array?
Yes, you can use the LIMIT
clause to restrict the number of elements in the aggregated array:
SELECT order_id, array_agg(product_name ORDER BY order_date LIMIT 5) AS recent_products
FROM orders
GROUP BY order_id;
8.Can I nest the array_agg
function within other functions?
Absolutely, you can use the array_agg
function as an input to other functions. This allows for complex aggregation and transformations of data.
Conclusion
PostgreSQL array_agg function is a function for aggregating values of rows into arrays. It allows us to combine multiple values into a single array so that you can work with grouped data in your applications. The values in arrays generated are also can be handled by filtering, ordering and eliminating duplicate values.
Thank you for reading.