-
Notifications
You must be signed in to change notification settings - Fork 19
98 Fix bug where column prune won't work when compression is enabled #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…st client not honoring compression configuration
|
@luoyuxia @zhaohaidao Would appreciate a review here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug where column pruning fails after the Rust client writes to a log table. The root cause was that the Rust client wasn't honoring compression configuration when writing Arrow IPC data, causing parsing errors on read. The fix ensures that the Arrow writer respects the compression type specified in table properties.
- Added compression configuration support by parsing table properties for compression type and level
- Updated
MemoryLogRecordsArrowBuilderto accept and useArrowCompressionInfowhen creating the Arrow IPCStreamWriter - Integrated compression configuration into the write path from table config through accumulator to the Arrow writer
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| crates/fluss/src/compression/mod.rs | New module declaration for compression support |
| crates/fluss/src/compression/arrow_compression.rs | Implements ArrowCompressionType enum and ArrowCompressionInfo struct with parsing from table properties and validation |
| crates/fluss/src/lib.rs | Adds compression module to library exports |
| crates/fluss/src/metadata/table.rs | Adds get_arrow_compression_info() method to TableConfig for retrieving compression settings |
| crates/fluss/src/record/arrow.rs | Updates MemoryLogRecordsArrowBuilder to accept compression info and configures StreamWriter with appropriate IpcWriteOptions |
| crates/fluss/src/client/write/batch.rs | Updates ArrowLogWriteBatch::new() to accept and pass compression info to the arrow builder |
| crates/fluss/src/client/write/accumulator.rs | Retrieves compression info from table config and passes it when creating new write batches |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leekeiabstraction Thanks for the pr. Currently, all test is set table.log.arrow.compression.type to none, could you please also remove this to verify your fix works.
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leekeiabstraction Thanks. LGTM
|
Raised #109 to track ZSTD compression level support |
Purpose
Linked issue: close #98
Fix bug where column prune is broken after rust client writes to log table due to parsing error caused by rust client not honoring compression configuration.
ZSTD Compression level config and other compression related concerns has not been hooked up, I suggest that we raise a separate issue/PR to follow that up.
Brief change log
Ensure that record batch writing honors compression type by instantiating ArrowWriter with compression type specified in table properties,
Tests
Added UTs for ArrowCompressionInfo/Type.
Manual testings verifying pruned scan on (zstd, lz4_frame, none compressed) records work.