[Core] support limit pushdown with pk table #6914
Purpose
1. Background
Issue: #6847
Limit pushdown for append tables: #6848
This PR adds limit pushdown support for primary-key (PK) tables.
2. Code Logic
We make a best-effort attempt to filter manifest entries before reading any data.
Limit pushdown is enabled only for the DEDUPLICATE and FIRST_ROW merge engines, and only when deletion vectors are disabled, because counting rows accurately from metadata requires that no merge operations are needed and no rows are deleted.
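A minimal Java sketch of the eligibility rule above. `MergeEngine` and `LimitPushdownEligibility` are hypothetical names used only for illustration; they are not Paimon's actual classes, and the real check may read the table options in a different place.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical mirror of the merge engines a PK table can use; not Paimon's own enum.
enum MergeEngine { DEDUPLICATE, PARTIAL_UPDATE, AGGREGATE, FIRST_ROW }

final class LimitPushdownEligibility {
    // The engines this PR description names as supported.
    private static final Set<MergeEngine> SUPPORTED =
            EnumSet.of(MergeEngine.DEDUPLICATE, MergeEngine.FIRST_ROW);

    private LimitPushdownEligibility() {}

    /**
     * Returns true when a LIMIT may be pushed down to file pruning: the merge engine
     * is supported and deletion vectors are disabled.
     */
    static boolean isEligible(MergeEngine engine, boolean deletionVectorsEnabled) {
        return SUPPORTED.contains(engine) && !deletionVectorsEnabled;
    }
}
```

Every other combination falls back to a normal scan with no pruning.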
We group files by (partition, bucket) pair and process the buckets sequentially. For each bucket, the algorithm checks whether pushdown is safe: the files must not overlap (they all belong to the same LSM level, and that level is not level 0) and must contain no delete rows.
1. If safe, it accumulates row counts from file metadata until the limit is reached, then stops processing the remaining buckets.
2. If unsafe (overlapping files or delete rows exist), all files in that bucket are included.
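The grouping, safety check, and bucket walk described above can be sketched in Java as follows. `FileMeta`, `BucketKey`, and `LimitPruner` are hypothetical, simplified stand-ins (not Paimon's `DataFileMeta` or `ManifestEntry`). Two details are assumptions: "no overlapping" is read as "every file in the bucket sits on the same non-zero level", and a safe bucket is cut at file granularity once the running count reaches the limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified file metadata; not Paimon's DataFileMeta or ManifestEntry.
record FileMeta(String name, String partition, int bucket, int level,
                long rowCount, long deleteRowCount) {}

// Hypothetical grouping key for one (partition, bucket) pair.
record BucketKey(String partition, int bucket) {}

final class LimitPruner {
    private LimitPruner() {}

    /** Groups files by (partition, bucket), keeping buckets in first-seen order. */
    static Map<BucketKey, List<FileMeta>> groupByBucket(List<FileMeta> files) {
        Map<BucketKey, List<FileMeta>> groups = new LinkedHashMap<>();
        for (FileMeta f : files) {
            groups.computeIfAbsent(new BucketKey(f.partition(), f.bucket()), k -> new ArrayList<>())
                    .add(f);
        }
        return groups;
    }

    /**
     * Assumed reading of the safety rule: every file is on the same LSM level,
     * that level is not 0 (level-0 files may overlap), and no file has delete rows.
     */
    static boolean isSafe(List<FileMeta> bucketFiles) {
        if (bucketFiles.isEmpty()) {
            return true;
        }
        int level = bucketFiles.get(0).level();
        for (FileMeta f : bucketFiles) {
            if (f.level() == 0 || f.level() != level || f.deleteRowCount() > 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the files to read for the given limit. */
    static List<FileMeta> prune(List<FileMeta> files, long limit) {
        List<FileMeta> selected = new ArrayList<>();
        long counted = 0; // rows guaranteed by the metadata of selected safe files
        for (List<FileMeta> bucketFiles : groupByBucket(files).values()) {
            if (counted >= limit) {
                break; // limit already covered: skip the remaining buckets
            }
            if (!isSafe(bucketFiles)) {
                // Visible rows are unknown until merge, so keep every file and count none.
                selected.addAll(bucketFiles);
                continue;
            }
            for (FileMeta f : bucketFiles) {
                if (counted >= limit) {
                    break; // assumption: cut a safe bucket at file granularity
                }
                selected.add(f);
                counted += f.rowCount();
            }
        }
        return selected;
    }
}
```

For example, with a limit of 5 and a first safe bucket holding two 4-row files, both files are selected (8 rows counted) and every later bucket is skipped. Unsafe buckets never add to the count, which keeps the result conservative: the metadata row count of overlapping or delete-carrying files can exceed the rows that survive the merge.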
Linked issue: close #xxx
Tests
API and Format
Documentation