You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I noticed a performance issue when using maintenance_tasks and multiple columns for a cursor and fetching pages of records. The issue is with how the 'next page' query is written in ActiveRecordCursor, which causes Postgres to do inefficient data fetches and is over 100x slower than it could be.
Background
Our table requests has an index on created_at, uuid like this:
CREATEINDEXidxON requests USING btree (created_at, uuid)
and our maintenance task, when defined like this:
class OurTask < MaintenanceTasks::Task
def cursor_columns
[:created_at, :uuid]
end
def collection
Request.all
end
end
will trigger queries for pages like this:
SELECT *
FROM requests
WHERE ( created_at > '2023-01-01 00:00:00' OR ( created_at = '2023-01-01 00:00:00' AND ( uuid > 'aaaaaaaa-aaaa-aaaa-b46e-38cd3572ec76' ) ) )
ORDER BY created_at, uuid
LIMIT 1000
This query does not perform as expected in Postgres, because of the OR. See related SO question here - the gist of it is that Postgres doesn't use an index condition but rather does a filter, which results in reading records from disk rather than using the index values.
In our table of about 30 million records, fetching a page of 1000 records was taking around 140s.
Given the massive performance difference for Postgres, it makes sense to me to add this as a configuration option. Maybe we can even enable it by default on Postgres? For MySQL I'm seeing slightly better performance with the current queries, so I'm hesitant to change them for everyone.
Hello, thanks for your work on this gem.
Summary
I noticed a performance issue when using maintenance_tasks and multiple columns for a cursor and fetching pages of records. The issue is with how the 'next page' query is written in
ActiveRecordCursor
, which causes Postgres to do inefficient data fetches and is over 100x slower than it could be.Background
Our table
requests
has an index oncreated_at, uuid
like this:and our maintenance task, when defined like this:
will trigger queries for pages like this:
This query does not perform as expected in Postgres, because of the
OR
. See related SO question here - the gist of it is that Postgres doesn't use anindex condition
but rather does afilter
, which results in reading records from disk rather than using the index values.In our table of about 30 million records, fetching a page of 1000 records was taking around
140s
.Solution
My proposed solution is to change how we construct the query here to use what Postgres calls "Composite Type Comparison", which I believe has wide support across DBMS. The query would look like:
This should be functionally equivalent and also causes Postgres to use only the index to determine the specific set of records to fetch from disk.
This query returns the page of 1000 records in under
1s
.Would you accept a PR to change this (whether for Postgres only, or for all DBs)?
The text was updated successfully, but these errors were encountered: