Most relational database systems provide the functionality to create a VIEW, which basically acts like a shortcut or macro. Consider the following query:
postgres=# SELECT * FROM roster;
id | name | birthdate
----+------------------+---------------------
1 | John Smith | 1980-05-03 00:00:00
2 | Jane Brown | 1970-04-24 00:00:00
1 | Jimmy Peters | 1990-01-14 00:00:00
2 | Janice Patterson | 1993-08-27 00:00:00
3 | Joe Washington | 1992-11-12 00:00:00
(5 rows)
If we peek under the hood, we discover that “roster” is a VIEW that combines two tables together:
postgres=# \d+ roster
View "public.roster"
Column | Type | Collation | Nullable | Default | Storage | Description
-----------+-----------------------------+-----------+----------+---------+----------+-------------
id | integer | | | | plain |
name | text | | | | extended |
birthdate | timestamp without time zone | | | | plain |
View definition:
SELECT teacher.id,
teacher.name,
teacher.birthdate
FROM teacher
UNION ALL
SELECT student.id,
student.name,
student.birthdate
FROM student;
VIEW v. MATERIALIZED VIEW
For large data sets, sometimes VIEW does not perform well because it runs the underlying query **every** time the VIEW is referenced. Take, for example, a view created on the pgbench dataset (scale 100, after ~150,000 transactions):
postgres=# CREATE OR REPLACE VIEW account_balances AS
SELECT a.* FROM pgbench_branches b
JOIN pgbench_tellers t ON b.bid=t.bid
JOIN pgbench_accounts a ON a.bid=b.bid
WHERE abalance > 100;
CREATE VIEW
postgres=# explain analyze select * from account_balances ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.61..26.04 rows=10 width=97) (actual time=831.184..21182.906 rows=563000 loops=1)
-> Hash Join (cost=4.46..24.32 rows=10 width=101) (actual time=831.103..4844.011 rows=563000 loops=1)
Hash Cond: (t.bid = a.bid)
-> Seq Scan on pgbench_tellers t (cost=0.00..16.00 rows=1000 width=4) (actual time=0.023..7.406 rows=1000 loops=1)
-> Hash (cost=4.45..4.45 rows=1 width=97) (actual time=831.042..831.048 rows=56300 loops=1)
Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory Usage: 7605kB
-> Index Scan using pga_abalance_idx on pgbench_accounts a (cost=0.43..4.45 rows=1 width=97) (actual time=0.034..439.754 rows=56300 loops=1)
Index Cond: (abalance > 100)
-> Index Only Scan using pgbench_branches_pkey on pgbench_branches b (cost=0.14..0.17 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=563000)
Index Cond: (bid = t.bid)
Heap Fetches: 563000
Planning time: 0.526 ms
Execution time: 24911.494 ms
(13 rows)
As you can see, it took over 24 seconds to gather the accounts with balances greater than 100.
PostgreSQL provides the ability to instead create a MATERIALIZED VIEW, so that the results of the underlying query can be stored for later reference:
postgres=# CREATE MATERIALIZED VIEW mv_account_balances AS
SELECT a.* FROM pgbench_branches b
JOIN pgbench_tellers t ON b.bid=t.bid
JOIN pgbench_accounts a ON a.bid=b.bid
WHERE abalance > 100;
SELECT 563000
postgres=# explain analyze select * from mv_account_balances;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Seq Scan on mv_account_balances (cost=0.00..11168.30 rows=193830 width=352) (actual time=0.141..3784.241 rows=563000 loops=1)
Planning time: 0.072 ms
Execution time: 7473.437 ms
(3 rows)
As you can see, a MATERIALIZED VIEW produces the result in just over 7 seconds (as opposed to 24 seconds), because it stores a snapshot of the data for users to work with.
Refreshing a MATERIALIZED VIEW
Should the data set be changed, or should the MATERIALIZED VIEW need a copy of the latest data, the MATERIALIZED VIEW can be refreshed:
postgres=# select count(*) from pgbench_branches b join pgbench_tellers t on b.bid=t.bid join pgbench_accounts a on a.bid=b.bid where abalance > 4500;
count
-------
57610
(1 row)
— Some updates
postgres=# select count(*) from pgbench_branches b join pgbench_tellers t on b.bid=t.bid join pgbench_accounts a on a.bid=b.bid where abalance > 4500;
count
--------
173400
(1 row)
postgres=# select count(*) FROM mv_account_balances WHERE abalance > 4500;
count
-------
57610
(1 row)
postgres=# refresh materialized view mv_account_balances ;
REFRESH MATERIALIZED VIEW
postgres=# select count(*) FROM mv_account_balances WHERE abalance > 4500;
count
--------
173400
(1 row)
MATERIALIZED VIEW and indexes
Indexes can also be created against a MATERIALIZED VIEW to make queries even faster:
postgres=# EXPLAIN ANALYZE SELECT * FROM mv_account_balances WHERE abalance > 4500;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on mv_account_balances (cost=0.00..16729.38 rows=62109 width=97) (actual time=0.383..464.850 rows=57610 loops=1)
Filter: (abalance > 4500)
Rows Removed by Filter: 505390
Planning time: 0.118 ms
Execution time: 845.478 ms
(5 rows)
postgres=# CREATE INDEX mv_balances_idx ON mv_account_balances(abalance);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM mv_account_balances WHERE abalance > 4500;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on mv_account_balances (cost=1096.13..11054.68 rows=58284 width=97) (actual time=4.719..389.851 rows=57610 loops=1)
Recheck Cond: (abalance > 4500)
Heap Blocks: exact=1947
-> Bitmap Index Scan on mv_balances_idx (cost=0.00..1081.56 rows=58284 width=0) (actual time=4.471..4.478 rows=57610 loops=1)
Index Cond: (abalance > 4500)
Planning time: 0.412 ms
Execution time: 766.624 ms
(7 rows)
Conclusion
As we can see, MATERIALIZED VIEW provides some additional features that VIEW lacks, namely in providing a consistent snapshot of data for users to work with and giving users the ability to index the underlying snapshot. However, MATERIALIZED VIEW is not for everyone—some users may wish to have the most up-to-date data with every call. Users should employ each type of VIEW in accordance to their needs.