Implement basic Aggregate Merge Engine #2255
base: main
907559e to e7d0af7
e7d0af7 to d34b27a
Pull request overview
This PR implements the Aggregation Merge Engine for Fluss, which allows field-level aggregation of rows with the same primary key. The implementation includes 12 aggregate functions (sum, product, max, min, last_value, last_value_ignore_nulls, first_value, first_value_ignore_nulls, listagg, string_agg, bool_and, bool_or) with comprehensive schema evolution support.
Key changes:
- Core aggregation engine with field-level aggregators and caching
- Schema API enhancements to support aggregation function configuration
- Comprehensive documentation with examples for all supported functions
- Utility classes for string concatenation and value comparison
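To make "field-level aggregation of rows with the same primary key" concrete, here is a minimal sketch. It is not the PR's `AggregateRowMerger`: the class `FieldAggSketch`, its fields, and the `merge` method are hypothetical names, and the null handling shown (nulls are skipped by `SUM` and `MAX`) is an assumption made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical illustration of per-column merging; not the PR's implementation. */
final class FieldAggSketch {
    // Keeps the stored value; used here for the primary key column.
    static final BinaryOperator<Object> KEEP = (acc, in) -> acc;

    // Assumption: a null on either side is treated as "no contribution".
    static final BinaryOperator<Object> SUM = (acc, in) -> {
        if (acc == null) return in;
        if (in == null) return acc;
        return ((Long) acc) + ((Long) in);
    };

    static final BinaryOperator<Object> MAX = (acc, in) -> {
        if (acc == null) return in;
        if (in == null) return acc;
        return Math.max(((Long) acc).longValue(), ((Long) in).longValue());
    };

    // Takes the incoming value even when it is null.
    static final BinaryOperator<Object> LAST_VALUE = (acc, in) -> in;

    // Takes the incoming value only when it is non-null.
    static final BinaryOperator<Object> LAST_VALUE_IGNORE_NULLS =
            (acc, in) -> in != null ? in : acc;

    /** Merges two rows that share a primary key, applying one aggregator per column. */
    static Object[] merge(Object[] oldRow, Object[] newRow, List<BinaryOperator<Object>> aggs) {
        Object[] out = new Object[oldRow.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = aggs.get(i).apply(oldRow[i], newRow[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<BinaryOperator<Object>> aggs =
                Arrays.asList(KEEP, SUM, MAX, LAST_VALUE_IGNORE_NULLS);
        Object[] stored = {1L, 10L, 5L, "a"};
        Object[] incoming = {1L, 7L, 9L, null};
        // Column 0 is the key, column 1 is summed, column 2 keeps the max,
        // column 3 keeps the last non-null value.
        System.out.println(Arrays.toString(merge(stored, incoming, aggs)));
    }
}
```

Swapping `LAST_VALUE_IGNORE_NULLS` for `LAST_VALUE` in the list would make the fourth column null after the merge, which is the distinction the two function names suggest.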
Reviewed changes
Copilot reviewed 56 out of 56 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| website/docs/table-design/table-types/pk-table/merge-engines/aggregation.md | Comprehensive documentation for the Aggregation Merge Engine with examples |
| fluss-server/src/main/java/org/apache/fluss/server/kv/rowmerger/AggregateRowMerger.java | Main row merger implementation with partial update support |
| fluss-server/src/main/java/org/apache/fluss/server/kv/rowmerger/aggregate/*.java | Aggregation context, caching, and field processing logic |
| fluss-server/src/main/java/org/apache/fluss/server/kv/rowmerger/aggregate/functions/*.java | Field aggregator implementations for all functions |
| fluss-server/src/main/java/org/apache/fluss/server/kv/rowmerger/aggregate/factory/*.java | Factory classes for creating aggregators via SPI |
| fluss-common/src/main/java/org/apache/fluss/metadata/AggFunction.java | Enum defining all supported aggregation functions |
| fluss-common/src/main/java/org/apache/fluss/metadata/Schema.java | Schema enhancements to support aggregation functions |
| fluss-common/src/main/java/org/apache/fluss/utils/*.java | Utility classes for string operations and comparisons |
| fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java | New configuration options for aggregation |
| Test files | Comprehensive test coverage for all components |
     * <p>Efficiently encodes schema ID and target column indices for cache lookup. Uses array
     * content-based equality and hashCode for correct cache behavior.
     */
    private static class CacheKey {
Copilot AI, Dec 26, 2025
Class CacheKey overrides hashCode but not equals.
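One way to address this, sketched under assumptions: the diff does not show the fields of `CacheKey`, so the names `schemaId` and `targetColumns` below are hypothetical, chosen to match the Javadoc's mention of a schema ID and target column indices. The point is only that `equals` and `hashCode` should both compare the array by content, otherwise two keys with identical contents hash alike but never match in a `HashMap`.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the PR's CacheKey; field names are assumptions. */
final class CacheKeySketch {
    private final int schemaId;
    private final int[] targetColumns; // may be null when no columns are targeted (assumption)

    CacheKeySketch(int schemaId, int[] targetColumns) {
        this.schemaId = schemaId;
        // Defensive copy so later mutation of the caller's array cannot change the key.
        this.targetColumns = targetColumns == null ? null : targetColumns.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CacheKeySketch)) {
            return false;
        }
        CacheKeySketch that = (CacheKeySketch) o;
        // Arrays.equals compares element by element and treats two nulls as equal.
        return schemaId == that.schemaId && Arrays.equals(targetColumns, that.targetColumns);
    }

    @Override
    public int hashCode() {
        // Arrays.hashCode is content-based and returns 0 for a null array,
        // so it stays consistent with equals above.
        return 31 * schemaId + Arrays.hashCode(targetColumns);
    }
}
```

With both methods defined, `new CacheKeySketch(1, new int[] {0, 2})` used as a map key can be found again by a separately constructed key with the same schema ID and indices.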
Purpose
Linked issue: close #2254
Brief change log
Tests
API and Format
Documentation