...
PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or “the best thing since sliced bread”).
...
The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had.
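
Note that the quoted paragraph calls TOAST_TUPLE_TARGET adjustable. A minimal sketch of that knob, assuming PostgreSQL 11+ and using 4080 purely as an illustrative value, not a recommendation from this analysis:

Code Block
-- Illustrative only: raise the per-table target so rows up to ~4 kB
-- stay in the main heap instead of being compressed/moved out-of-line.
ALTER TABLE diku_mod_source_record_storage.marc_records SET (toast_tuple_target = 4080);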

...

This scheme has a number of advantages compared to a more straightforward approach such as allowing row values to span pages. Assuming that queries are usually qualified by comparisons against relatively small key values, most of the work of the executor will be done using the main row entry. The big values of TOASTed attributes will only be pulled out (if selected at all) at the time the result set is sent to the client. Thus, the main table is much smaller and more of its rows fit in the shared buffer cache than would be the case without any out-of-line storage. Sort sets shrink also, and sorts will more often be done entirely in memory. A little test showed that a table containing typical HTML pages and their URLs was stored in about half of the raw data size including the TOAST table, and that the main table contained only about 10% of the entire data (the URLs and some small HTML pages). There was no run time difference compared to an un-TOASTed comparison table, in which all the HTML pages were cut down to 7 kB to fit.
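
To see how much of a table actually lives out-of-line, the sizes of the main relation and its TOAST relation can be compared directly. A small sketch of ours (not part of the quoted documentation), assuming the marc_records table measured below:

Code Block
-- Compare the main-table size with its TOAST-table size; for wide JSONB
-- content most of the bytes typically sit in the TOAST table.
-- Assumes the table has a TOAST table (reltoastrelid <> 0).
SELECT pg_size_pretty(pg_relation_size(c.oid))           AS main_table_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_table_size
FROM pg_class c
WHERE c.oid = 'diku_mod_source_record_storage.marc_records'::regclass;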


So PostgreSQL not only has to read an additional table to retrieve JSONB fields, but also has to perform extra computations to decompress the original data; only after that can it apply a function to the JSONB value or make a comparison.

Thus, a "simpe" table scan for a TOASTed table becomes a quite complex operation because PostrgreSQL has to perform a set of actions:

  • Read the next datablock from the disk for the main table (blocks read during table scans are not pinned to the shared cache for large tables)
  • Perform an index scan for a TOASTed counterpart to find rows that relate to the row in the main table (index blocks can either in the shared cache or not)
  • Retrieve all those rows that contain JSONB value (in most cases db blocks must be read from disk)
  • Decompress the value of needed and apply a function or make a comparison to the value
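
A minimal catalog query for the lookup mentioned above; the query itself is our illustration, assuming the marc_records table from the measurements below:

Code Block
-- Locate the TOAST table behind marc_records and the index on it that
-- the per-row "index scan for a TOASTed counterpart" goes through.
SELECT c.reltoastrelid::regclass AS toast_table,
       i.indexrelid::regclass    AS toast_index
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.oid = 'diku_mod_source_record_storage.marc_records'::regclass;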


Below are detailed “EXPLAIN ANALYZE” outputs for two SQL queries against the same table. The first one uses a search condition on the JSONB field; the second one uses just a plain field in the search condition.

No Format
1.
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT TEXT)
select * From diku_mod_source_record_storage.marc_records where content #>> '{fields, 33, 999, subfields, 0, i}' = '3dba7c3a-2bd1-4b47-b44c-25ff91fa6308';


Code Block
Gather  (cost=1000.00..1315735.03 rows=55711 width=823) (actual time=81722.406..81998.422 rows=1 loops=1)
  Output: id, content, instanceid
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=23246857 read=3371323
  ->  Parallel Seq Scan on diku_mod_source_record_storage.marc_records  (cost=0.00..1309163.93 rows=23213 width=823) (actual time=70081.439..81712.424 rows=0 loops=3)
        Output: id, content, instanceid
        Filter: ((marc_records.content #>> '{fields,33,999,subfields,0,i}'::text[]) = '3dba7c3a-2bd1-4b47-b44c-25ff91fa6308'::text)
        Rows Removed by Filter: 3714076
        Buffers: shared hit=23246857 read=3371323
        Worker 0: actual time=46814.759..81707.713 rows=1 loops=1
          Buffers: shared hit=7626833 read=1101635
        Worker 1: actual time=81707.793..81707.793 rows=0 loops=1
          Buffers: shared hit=7747854 read=1128701
Planning time: 0.163 ms
Execution time: 81998.470 ms


No Format
2.
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT TEXT)
select * From diku_mod_source_record_storage.marc_records where instanceid = 'fff787ec-7cd7-4acd-8709-c1a842786607'::uuid;


Code Block
Gather  (cost=1000.00..1298557.54 rows=1 width=823) (actual time=6948.502..16626.117 rows=1 loops=1)
  Output: id, content, instanceid
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=192 read=1239333
  ->  Parallel Seq Scan on diku_mod_source_record_storage.marc_records  (cost=0.00..1297557.44 rows=1 width=823) (actual time=13389.376..16613.601 rows=0 loops=3)
        Output: id, content, instanceid
        Filter: (marc_records.instanceid = 'fff787ec-7cd7-4acd-8709-c1a842786607'::uuid)
        Rows Removed by Filter: 3714076
        Buffers: shared hit=192 read=1239333
        Worker 0: actual time=16610.118..16610.118 rows=0 loops=1
          Buffers: shared hit=74 read=532743
        Worker 1: actual time=16610.121..16610.121 rows=0 loops=1
          Buffers: shared hit=41 read=350884
Planning time: 0.151 ms
Execution time: 16626.160 ms

As you can see, for the first query the total number of db blocks touched (hit + read) is 23,246,857 + 3,371,323 = 26,618,180.

For the second query it is only 192 + 1,239,333 = 1,239,525.
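
A standard way to avoid detoasting the JSONB value for every row in such searches is an expression index that matches the search expression exactly. This is only our sketch, with a made-up index name, and it helps only queries that use this exact path:

Code Block
-- Hypothetical expression index; with it, query 1 above could use an index
-- scan instead of detoasting content for all ~3.7M rows on every search.
CREATE INDEX marc_records_999i_idx
    ON diku_mod_source_record_storage.marc_records
    ((content #>> '{fields, 33, 999, subfields, 0, i}'));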

For comparison, here is an “EXPLAIN ANALYZE” for a simple relational table with plain fields that has the same number of rows.

No Format
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT TEXT)
select * From diku_mod_source_record_storage.records where instanceid2 = '698dbbe4-858c-4d27-b182-666a59ec6bca'::uuid;


Code Block
Gather  (cost=1000.00..273306.54 rows=1 width=158) (actual time=792.961..838.863 rows=1 loops=1)
  Output: id, snapshotid, matchedprofileid, matchedid, generation, recordtype, instanceid, deleted, orderinfile, suppressdiscovery, createdbyuserid, createddate, updatedbyuserid, updateddate, instanceid2
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=214274
  ->  Parallel Seq Scan on diku_mod_source_record_storage.records  (cost=0.00..272306.44 rows=1 width=158) (actual time=676.055..784.382 rows=0 loops=3)
        Output: id, snapshotid, matchedprofileid, matchedid, generation, recordtype, instanceid, deleted, orderinfile, suppressdiscovery, createdbyuserid, createddate, updatedbyuserid, updateddate, instanceid2
        Filter: (records.instanceid2 = '698dbbe4-858c-4d27-b182-666a59ec6bca'::uuid)
        Rows Removed by Filter: 3714076
        Buffers: shared hit=214274
        Worker 0: actual time=773.911..773.911 rows=0 loops=1
          Buffers: shared hit=70404
        Worker 1: actual time=462.120..787.102 rows=1 loops=1
          Buffers: shared hit=70341
Planning time: 0.313 ms
Execution time: 838.940 ms

Here is a quite interesting post regarding the hidden costs of working with JSONB:

https://heap.io/blog/engineering/when-to-avoid-jsonb-in-a-postgresql-schema

Warning

...

Using the above measurements table again, the initial non-JSONB version of our table takes up 79 mb of disk space, whereas the JSONB variant takes 164 mb — more than twice as much. That is, the majority of our table contents are the strings value_1, value_2, value_3, and scientist_id, repeated over and over again.

...