Support `APPROX_COUNT_DISTINCT` with HLL sketches for cross-engine approximate count distinct #1608

Introduce `APPROX_COUNT_DISTINCT(...)` as a first-class aggregation type backed by HyperLogLog (HLL) sketches. HLL sketches are composable: sketches built over separate partitions or grains can be merged to estimate the distinct count of their union, typically within ~1-2% relative error.
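To illustrate why composability matters, here is a minimal, self-contained HyperLogLog sketch in Python. It only illustrates the general technique: the `HyperLogLog` class, the BLAKE2b hash, and the bias constants are choices made for this example, and it is not compatible with the DataSketches HLL format that Spark and Druid use. The key property is in `merge`: taking the register-wise maximum of two sketches yields exactly the sketch that would have been built over the union of their inputs.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for illustration (not DataSketches-compatible)."""

    def __init__(self, lg_k: int = 12) -> None:
        self.lg_k = lg_k
        self.m = 1 << lg_k  # number of registers (buckets)
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value: object) -> int:
        digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value: object) -> None:
        h = self._hash64(value)
        width = 64 - self.lg_k
        idx = h >> width                       # top lg_k bits choose a register
        rest = h & ((1 << width) - 1)          # remaining bits feed the rank
        rank = width - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: register-wise max. Inputs are left unchanged."""
        if self.lg_k != other.lg_k:
            raise ValueError("sketches must share lg_k to be merged")
        out = HyperLogLog(self.lg_k)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small cardinalities
        return raw


# Two "partitions" of customer IDs that overlap on 20,000 customers.
jan, feb = HyperLogLog(), HyperLogLog()
for i in range(60_000):
    jan.add(f"cust-{i}")
for i in range(40_000, 100_000):
    feb.add(f"cust-{i}")

combined = jan.merge(feb)
print(round(combined.estimate()))  # true distinct count of the union is 100,000
```

Adding the two per-partition estimates would count the 20,000 overlapping customers twice (roughly 120,000); merging the sketches first avoids that, which is what lets per-grain sketches be rolled up.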

```yaml
name: unique_customers
type: metric
expression: APPROX_COUNT_DISTINCT(customer_id)
```

**Materialization (Build Time)**

DJ generates SQL that accumulates one sketch per grain (here, per `date` and `region`):
```sql
-- Spark
SELECT date, region, hll_sketch_agg(customer_id, 12) as cust_sketch
FROM orders GROUP BY date, region
```
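The second argument to `hll_sketch_agg` is Spark's `lgConfigK`, the base-2 log of the number of HLL buckets. As a rough check on the ~1-2% figure, the classic HyperLogLog standard-error approximation gives the following for `lgConfigK = 12`; the DataSketches implementation behind Spark has a slightly different exact error, so treat this as an estimate:

```math
m = 2^{12} = 4096, \qquad
\mathrm{RSE} \approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{64} \approx 0.016 \approx 1.6\%
```

Each increment of `lgConfigK` doubles the number of buckets and shrinks the expected error by a factor of about $\sqrt{2}$.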

**Query Time (Rollup)**

When a query can be answered from materialized sketches at a coarser grain, DJ generates SQL that merges the stored sketches and then estimates the distinct count:
```sql
-- Druid
SELECT
  HLL_SKETCH_ESTIMATE(DS_HLL(cust_sketch))
FROM materialized_cube
```

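**Fallback (No Materialization)**

When no materialization is available, DJ builds the sketch directly from the source table and estimates it in the same query:
```sql
-- Spark
SELECT hll_sketch_estimate(hll_sketch_agg(customer_id, 12)) FROM orders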
```
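The choice between the rollup query and the fallback can be sketched as a small SQL generator. Everything here is hypothetical: the `MaterializedCube` type, the `approx_distinct_sql` function, and the coverage rule (a cube can answer a query when the requested dimensions are a subset of its grain) are assumptions for illustration, not DJ's actual API. It emits Spark SQL only, using Spark's `hll_union_agg` and `hll_sketch_estimate` for the rollup path.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MaterializedCube:
    """Hypothetical description of a table holding one HLL sketch per grain."""
    table: str
    dimensions: frozenset
    sketch_column: str


def approx_distinct_sql(
    column: str,
    source_table: str,
    group_by: Sequence[str],
    cube: Optional[MaterializedCube] = None,
    lg_k: int = 12,
) -> str:
    """Return Spark SQL for APPROX_COUNT_DISTINCT(column) grouped by `group_by`."""
    dims = ", ".join(group_by)
    select_dims = f"{dims}, " if group_by else ""
    group_clause = f" GROUP BY {dims}" if group_by else ""

    if cube is not None and set(group_by) <= cube.dimensions:
        # Rollup: merge the stored sketches, then estimate once at the end.
        agg = f"hll_sketch_estimate(hll_union_agg({cube.sketch_column}))"
        table = cube.table
    else:
        # Fallback: sketch the raw rows and estimate in the same query.
        agg = f"hll_sketch_estimate(hll_sketch_agg({column}, {lg_k}))"
        table = source_table
    return f"SELECT {select_dims}{agg} FROM {table}{group_clause}"


cube = MaterializedCube(
    table="materialized_cube",
    dimensions=frozenset({"date", "region"}),
    sketch_column="cust_sketch",
)
print(approx_distinct_sql("customer_id", "orders", ["region"], cube))
# SELECT region, hll_sketch_estimate(hll_union_agg(cust_sketch)) FROM materialized_cube GROUP BY region
print(approx_distinct_sql("customer_id", "orders", ["product"], cube))
# SELECT product, hll_sketch_estimate(hll_sketch_agg(customer_id, 12)) FROM orders GROUP BY product
```

The subset check is where mergeability pays off: a cube built at (`date`, `region`) can answer any coarser grouping, whereas a cube storing exact per-grain distinct counts could not be rolled up without double counting.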
