Skip to content

Conversation

@leekeiabstraction
Copy link
Contributor

@leekeiabstraction leekeiabstraction commented Dec 21, 2025

Purpose

Linked issue: close #98

Fix bug where column prune is broken after rust client writes to log table due to parsing error caused by rust client not honoring compression configuration.

ZSTD Compression level config and other compression related concerns has not been hooked up, I suggest that we raise a separate issue/PR to follow that up.

Brief change log

Ensure that record batch writing honors compression type by instantiating ArrowWriter with compression type specified in table properties,

  • Added ArrowCompressionInfo / Type
  • Passes CompressionType to Arrow ipc StreamWriter

Tests

Added UTs for ArrowCompressionInfo/Type.

Manual testings verifying pruned scan on (zstd, lz4_frame, none compressed) records work.

@leekeiabstraction
Copy link
Contributor Author

@luoyuxia @zhaohaidao Would appreciate a review here.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug where column pruning fails after the Rust client writes to a log table. The root cause was that the Rust client wasn't honoring compression configuration when writing Arrow IPC data, causing parsing errors on read. The fix ensures that the Arrow writer respects the compression type specified in table properties.

  • Added compression configuration support by parsing table properties for compression type and level
  • Updated MemoryLogRecordsArrowBuilder to accept and use ArrowCompressionInfo when creating the Arrow IPC StreamWriter
  • Integrated compression configuration into the write path from table config through accumulator to the Arrow writer

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
crates/fluss/src/compression/mod.rs New module declaration for compression support
crates/fluss/src/compression/arrow_compression.rs Implements ArrowCompressionType enum and ArrowCompressionInfo struct with parsing from table properties and validation
crates/fluss/src/lib.rs Adds compression module to library exports
crates/fluss/src/metadata/table.rs Adds get_arrow_compression_info() method to TableConfig for retrieving compression settings
crates/fluss/src/record/arrow.rs Updates MemoryLogRecordsArrowBuilder to accept compression info and configures StreamWriter with appropriate IpcWriteOptions
crates/fluss/src/client/write/batch.rs Updates ArrowLogWriteBatch::new() to accept and pass compression info to the arrow builder
crates/fluss/src/client/write/accumulator.rs Retrieves compression info from table config and passes it when creating new write batches

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leekeiabstraction Thanks for the pr. Currently, all test is set table.log.arrow.compression.type to none, could you please also remove this to verify your fix works.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leekeiabstraction Thanks. LGTM

@leekeiabstraction
Copy link
Contributor Author

Raised #109 to track ZSTD compression level support

@luoyuxia luoyuxia merged commit 4ce374a into apache:main Dec 22, 2025
19 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix column prune won't work when compression is enabel

2 participants