@wwj6591812 wwj6591812 commented Dec 28, 2025

Purpose

1. Background
Issue: #6847
Append table limit pushdown: #6848
This PR adds support for limit pushdown on primary-key (PK) tables.

2. Code Logic
We try to filter manifest entries as early as possible, before any data is read.
Limit pushdown is only enabled for the DEDUPLICATE and FIRST_ROW merge engines without deletion vectors, because counting rows accurately from file metadata requires that no merge is needed and no rows are deleted.
We group files by (partition, bucket) pair and process the buckets sequentially. For each bucket, the algorithm checks whether pushdown is safe: the files must not overlap (they are all on the same LSM level, excluding level 0) and must contain no delete rows.
1. If safe, it accumulates row counts from file metadata until the limit is reached, then stops processing the remaining buckets.
2. If unsafe (overlapping files or delete rows exist), all files in that bucket are included.
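
The bucket-by-bucket logic described above can be illustrated with a minimal, self-contained Java sketch. It is not the code in this PR: `FileEntry`, `isSafe`, and `applyLimit` are hypothetical names, and the safety check is one reading of the description (every file in the bucket sits on the same non-zero LSM level and has no delete rows).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LimitPushdownSketch {

    /** Hypothetical stand-in for a manifest entry; not Paimon's real class. */
    static final class FileEntry {
        final String partition;
        final int bucket;
        final int level;
        final long rowCount;
        final long deleteRowCount;

        FileEntry(String partition, int bucket, int level, long rowCount, long deleteRowCount) {
            this.partition = partition;
            this.bucket = bucket;
            this.level = level;
            this.rowCount = rowCount;
            this.deleteRowCount = deleteRowCount;
        }
    }

    /** Assumed safety rule: one shared non-zero level and no delete rows. */
    static boolean isSafe(List<FileEntry> bucketFiles) {
        int level = bucketFiles.get(0).level;
        for (FileEntry f : bucketFiles) {
            if (f.level == 0 || f.level != level || f.deleteRowCount > 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the files that still have to be read to satisfy the limit. */
    static List<FileEntry> applyLimit(List<FileEntry> files, long limit) {
        // Group by (partition, bucket), keeping the original encounter order.
        Map<String, List<FileEntry>> buckets = new LinkedHashMap<>();
        for (FileEntry f : files) {
            buckets.computeIfAbsent(f.partition + "/" + f.bucket, k -> new ArrayList<>()).add(f);
        }

        List<FileEntry> result = new ArrayList<>();
        long accumulated = 0;
        for (List<FileEntry> bucketFiles : buckets.values()) {
            if (!isSafe(bucketFiles)) {
                // Unsafe bucket: metadata row counts are unreliable, keep every file.
                result.addAll(bucketFiles);
                continue;
            }
            for (FileEntry f : bucketFiles) {
                result.add(f);
                accumulated += f.rowCount;
                if (accumulated >= limit) {
                    // Enough rows are guaranteed; skip all remaining buckets.
                    return result;
                }
            }
        }
        return result;
    }
}
```

In this sketch, rows from unsafe buckets never count toward the limit, so the early exit only happens once safe buckets alone guarantee enough rows.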

Linked issue: close #xxx

Tests

API and Format

Documentation

@wwj6591812 wwj6591812 force-pushed the support_limit_pushdown_with_pk_table_1228 branch from 594b49b to f9bb040 Compare December 29, 2025 06:48
files = postFilterManifestEntries(files);
}

if (supportsLimitPushManifestEntries()) {

You need to run the performance test again.
