<feature>[root]: test gpu #3153

zstack-robot-1 · 2025-12-29T09:06:58Z

DBImpact

Resolves: ZSTAC-1111

Change-Id: I61797062696e7379696b6e6e70796b6367706a6f

sync from gitlab !8967


coderabbitai · 2025-12-29T09:07:20Z

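Walkthrough

This change introduces a GPU PCI address mapping system consisting of a new Spring service, a database schema, JPA entities, and configuration documents. The system manages the mapping between VM and host PCI addresses, with an in-memory cache, TTL expiry, and batched database queries.

Changes

Cohort / File(s)	Summary
Database schema `conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql`	Creates the new VmGpuPciMappingVO table with a primary key, a unique key constraint, a foreign key relationship, and an ON DELETE CASCADE rule. Columns: uuid, vmInstanceUuid, vmPciAddress, hostPciAddress, gpuSerial, plus timestamps.
JPA entity `header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java`, `header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO_.java`	Adds the VmGpuPciMappingVO entity class and its Criteria metamodel class for type-safe database queries and mapping management. Includes the @Entity, @Table, and @ForeignKey annotations and a @PreUpdate lifecycle hook.
Core service `compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java`	Adds a Spring service that implements a concurrent mapping cache with a 5-minute TTL, batch retrieval, single-record create/delete, preloading, expiry cleanup, and cache statistics. Integrates database operations and lifecycle management.
Documentation and guides `.github/copilot-instructions.md`, `gpu-pci-address-implementation-guide.md`, `prometheus-integration-flow.md`, `zstack-gpu-pci-monitoring-data-flow.md`	Adds four documents: Copilot AI assistant instructions, a GPU PCI address mapping design guide, a Prometheus integration flow description, and a detailed description of the ZStack GPU PCI monitoring data flow. They cover the architecture overview, integration points, data flow diagrams, and performance optimization suggestions.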

Sequence Diagram(s)

sequenceDiagram
    participant Agent as VM Agent
    participant API as API Layer
    participant Cache as VmGpuPciMappingService
    participant DB as Database
    participant PG as Prometheus/<br/>PushGateway
    
    Agent->>API: POST /zwatch/metrics (GPU metrics)
    activate API
    API->>API: Handle APIPutMetricDataMsg
    API->>Cache: getHostPciAddressesBatch(cacheKeys)
    activate Cache
    alt Cache hit
        Cache-->>API: Return cached PCI addresses
    else Cache expired/miss
        Cache->>DB: batchQueryFromDatabase()
        DB-->>Cache: Return vmPciAddress→hostPciAddress mappings
        Cache->>Cache: Update cache and timestamps
        Cache-->>API: Return mapping results
    end
    deactivate Cache
    
    API->>API: Add PciDeviceAddressOnHost label<br/>(dynamic label enrichment)
    API->>DB: DatabaseDriver.write()
    activate DB
    DB-->>PG: Push to PushGateway
    deactivate DB
    API-->>Agent: Success response
    deactivate API
    
    PG->>PG: Store in TSDB<br/>with enriched labels

sequenceDiagram
    participant Client as Client
    participant ZQL as ZQL Parser
    participant VmNS as VmNamespace
    participant MappingSvc as VmGpuPciMappingService
    participant TSDB as Prometheus TSDB
    
    Client->>ZQL: Query by hostPciAddress
    activate ZQL
    ZQL->>VmNS: Execute query
    activate VmNS
    VmNS->>VmNS: Identify GPU metrics
    VmNS->>MappingSvc: getHostPciAddressesBatch()
    activate MappingSvc
    alt Cache valid (<1ms)
        MappingSvc-->>VmNS: Return cached PCI mappings
    else Database query needed (~5-10ms)
        MappingSvc->>TSDB: Batch query mapping records
        TSDB-->>MappingSvc: Return results and update cache
        MappingSvc-->>VmNS: Return latest mappings
    end
    deactivate MappingSvc
    VmNS->>VmNS: Filter by PciDeviceAddressOnHost
    VmNS-->>ZQL: Return enriched data
    deactivate VmNS
    ZQL-->>Client: Query results
    deactivate ZQL
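
The cache-hit and cache-miss branches in the two diagrams above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the class name `BatchMappingCache`, the injected `batchLoader` function (standing in for the database query), and the explicit `nowMs` parameter are all assumptions made so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the batch lookup path of VmGpuPciMappingService.
class BatchMappingCache {
    // 5-minute TTL, as stated in the change summary.
    private static final long TTL_MS = 5 * 60 * 1000L;

    // Cached value plus the time it was loaded.
    private static final class Entry {
        final String hostPciAddress;
        final long loadedAtMs;

        Entry(String hostPciAddress, long loadedAtMs) {
            this.hostPciAddress = hostPciAddress;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    // Assumed loader: takes the missed cache keys, returns key -> hostPciAddress.
    private final Function<List<String>, Map<String, String>> batchLoader;

    BatchMappingCache(Function<List<String>, Map<String, String>> batchLoader) {
        this.batchLoader = batchLoader;
    }

    // Returns mappings for the requested keys; keys with no mapping are omitted.
    Map<String, String> getBatch(List<String> cacheKeys, long nowMs) {
        Map<String, String> result = new HashMap<>();
        List<String> misses = new ArrayList<>();
        for (String key : cacheKeys) {
            Entry entry = cache.get(key);
            if (entry != null && nowMs - entry.loadedAtMs <= TTL_MS) {
                result.put(key, entry.hostPciAddress); // fresh hit
            } else {
                misses.add(key); // absent or expired
            }
        }
        if (misses.isEmpty()) {
            return result; // no loader call needed
        }
        // One loader call covers every miss; unknown keys are not cached.
        for (Map.Entry<String, String> loaded : batchLoader.apply(misses).entrySet()) {
            cache.put(loaded.getKey(), new Entry(loaded.getValue(), nowMs));
            result.put(loaded.getKey(), loaded.getValue());
        }
        return result;
    }
}
```

Passing the clock value in as `nowMs` is only a convenience for exercising expiry; a real service would more likely read `System.currentTimeMillis()` itself.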

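Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Where does the GPU's address hide? The mapping table knows the secret!
The cache sparkles for five minutes, batch queries strike like lightning,
Prometheus's eyes light up, rich labels make the data sweet,
PCI mapped there and back, ZStack monitoring is truly strong! 🚀✨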

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name	Status	Explanation	Resolution
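Title check	❌ Error	The title '[root]: test gpu' does not match the changeset. The changes deliver a complete GPU PCI address mapping implementation (data model, service, database table, documentation), but the title is too broad and inaccurate.	Change the title to reflect the main change more specifically, e.g. '[vm]: Add GPU PCI address mapping service and schema', so that it clearly conveys what changed.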
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.91% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
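Description check	✅ Passed	The PR description includes the JIRA ticket number (ZSTAC-1111) and a note on the sync source. It is relevant to the changeset and, although brief, sufficient to identify the origin of the change and the linked ticket.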


Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 6

📜 Review details

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca54e2b and e51869f.

📒 Files selected for processing (8)

.github/copilot-instructions.md
compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java
conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql
gpu-pci-address-implementation-guide.md
header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java
header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO_.java
prometheus-integration-flow.md
zstack-gpu-pci-monitoring-data-flow.md

🧰 Additional context used

📓 Path-based instructions (3)

**/*.*

⚙️ CodeRabbit configuration file

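**/*.*: - Code must not contain Chinese. Everything, including error messages and comments, must be written in correct English with no spelling mistakes.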

Files:

prometheus-integration-flow.md
header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO_.java
conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql
header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java
compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java
zstack-gpu-pci-monitoring-data-flow.md
gpu-pci-address-implementation-guide.md

**/*.java

⚙️ CodeRabbit configuration file

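**/*.java: ## 1. API design requirements

API naming:

API names must be unique and must not be duplicated.

An API message class must extend APIMessage; its return class must extend APIReply or APIEvent and be annotated with @RestResponse.

An API message must carry the @RestRequest annotation and follow these rules:

path:

Use the plural form for resources.

When the path references a message class variable, use the {variableName} format.

HTTP method mapping:

Query → HttpMethod.GET

Update → HttpMethod.PUT

Create → HttpMethod.POST

Delete → HttpMethod.DELETE

An API class must implement the __example__ method so that API documentation can be generated, and the corresponding Groovy API Template and API Markdown files must be generated.

2. Naming and formatting conventions

Class names:

Use UpperCamelCase.

Special cases:

VO/AO/EO type classes are exempt.

Abstract classes use Abstract or Base as a prefix/suffix.

Exception classes should end with Exception.

Test classes must end with Test or Case.

Method names, parameter names, member variables, and local variables:

Use lowerCamelCase.

Constant naming:

All uppercase, with words separated by underscores.

Names must be clear; avoid vague or inaccurate names.

Package names:

Use lowercase throughout, separated by dots; each part should be an English word with natural meaning (see the structure of the Spring framework).

Naming details:

Avoid members or local variables with the same name in parent/child classes or within the same code block, to prevent confusion.

Abbreviations:

Unnecessary abbreviations are not allowed, e.g. AbsSchedulerJob, condi, Fu. Use full words to improve readability.

3. Writing self-explanatory code

Expressing intent:

Avoid boolean parameters that make the meaning unclear. For example:

For stopAgent(boolean ignoreError), split it into separate functions (such as stopAgentIgnoreError()), or use an enum to express the operation type.

Names should express intent with full word combinations wherever possible and reflect the data type or purpose (for example, in constant and variable names, put the type word at the end).

Avoid magic values:

Undefined literal numbers or strings used directly (such as if (status == 5)) should be replaced with enums or constants.

Example:

// Bad: magic value

if (user.getStatus() == 5) { ... }

// Good: constant or enum

public static final int STATUS_ACTIVE = 5;

if (user.getStatus() == STATUS_ACTIVE) { ... }

// Or use an enum

enum UserStatus { ACTIVE, INACTIVE }

Comments:

Code should be as self-explanatory as possible; explanations shorter than two lines can be written directly in the code.

Longer comments must be proofread carefully and kept up to date with the code so that they stay correct.

Interface methods should not carry redundant modifiers (such as public) and must have valid Javadoc comments.

4. Flow control and structure

Use of if...else:

Minimize the use of if...else structures. Recommendations:

Limit nesting to at most two levels, and do not use an else branch in the inner level.

Return early: end the handling logic inside a condition as soon as possible, or extract it into a separate method.

Use Java Streams or lambda expressions in place of lengthy loops and conditionals.

Conditions:

An if condition should not be too long or too complex; where necessary, extract the condition into a descriptively named boolean variable.

Code block length: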
...

Files:

header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO_.java
header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java
compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java

**/*.sql

⚙️ CodeRabbit configuration file

**/*.sql: - Review the SQL code, make sure has no errors and confirm that:

Upgrading scene has been carefully handled

Do not use DEFAULT 0000-00-00 00:00:00 , use DEFAULT CURRENT_TIMESTAMP instead

When NOT NULL exists, must use stored procedure or other functions to process historical data, this is very very important

In database records, if the string length cannot be bounded, use the text type rather than varchar

Files:

conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql

🧠 Learnings (16)

📚 Learning: 2025-08-04T04:48:19.103Z

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2360
File: network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java:449-490
Timestamp: 2025-08-04T04:48:19.103Z
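Learning: When cherry-picking, the ZStack project strictly follows a policy of making no additional changes, even when a performance optimization opportunity is found (such as a memory optimization for batch-saving IP addresses), giving priority to the completeness and consistency of the cherry-pick.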

Applied to files:

prometheus-integration-flow.md

📚 Learning: 2025-08-24T06:33:10.771Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2496
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java:1218-1224
Timestamp: 2025-08-24T06:33:10.771Z
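Learning: The ZStack team responds quickly to capacity management issues. When a mismatch between allocated and released capacity was found in the pull-snapshot flow, developers fixed it promptly to keep primary storage capacity accounting accurate.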

Applied to files:

prometheus-integration-flow.md

📚 Learning: 2025-08-04T04:48:53.756Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2360
File: header/src/main/java/org/zstack/header/network/l3/APIDeleteReservedIpRangeMsg.java:59-63
Timestamp: 2025-08-04T04:48:53.756Z
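Learning: In the ZStack project, API Markdown documentation files are not maintained in the main code repository; they are managed in a separate documentation repository or location. Although the coding guidelines mention generating the corresponding Groovy API Template and API Markdown files, the Markdown documents are not in the current code repository.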

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-08-04T03:19:07.454Z

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2360
File: compute/src/main/java/org/zstack/compute/vm/StaticIpOperator.java:364-501
Timestamp: 2025-08-04T03:19:07.454Z
Learning: In ZStack project, developers may prefer to keep related validation logic together in a single method (like validateStaticIpTagsInApiMessage with 130+ lines) when the architecture is clear and there's no code reuse, rather than splitting into smaller methods based purely on single responsibility principle.

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-08-22T05:36:40.467Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2489
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeTree.java:471-545
Timestamp: 2025-08-22T05:36:40.467Z
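Learning: In the ZStack project, the team tends to hold off on refactoring code related to core logic, even when its complexity is high, giving priority to the stability of core functionality.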

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-08-06T05:53:38.558Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2380
File: storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotManagerImpl.java:1232-1232
Timestamp: 2025-08-06T05:53:38.558Z
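Learning: In the ZStack project, code comments should accurately reflect the actual implementation. If the code queries only a specific type of archived data (such as ArchiveVmNicBundle), the comment should not say "all device address archives"; it should state which type of archive is queried.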

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-06-19T09:08:57.046Z

Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2217
File: header/src/main/java/org/zstack/header/network/l2/L2NetworkCandidateFilterExtensionPoint.java:7-10
Timestamp: 2025-06-19T09:08:57.046Z
Learning: In the zstack project, for simple and self-explanatory interfaces like L2NetworkCandidateFilterExtensionPoint, the team prefers to rely on clear method names and parameters rather than adding detailed Javadoc comments, even when coding guidelines suggest otherwise.

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-10-20T11:50:42.825Z

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2763
File: sdk/src/main/java/org/zstack/sdk/BatchCreateHostKernelInterfaceAction.java:52-54
Timestamp: 2025-10-20T11:50:42.825Z
Learning: In the MatheMatrix/zstack repository, files under sdk/src/main/java/org/zstack/sdk/ are auto-generated and should not be modified by hand. Problems in these files should be fixed at the code generator level.

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-07-10T08:46:14.028Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2288
File: header/src/main/java/org/zstack/header/network/l2/L2NetworkOwnedL3ExtensionPoint.java:1-4
Timestamp: 2025-07-10T08:46:14.028Z
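Learning: In the ZStack project, Java files do not use license header comments; files begin directly with the package declaration. This is the project's standard practice.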

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-08-13T02:31:40.023Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2419
File: plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java:986-1067
Timestamp: 2025-08-13T02:31:40.023Z
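Learning: In the ZStack project, the team tends to keep a unified business flow in a single method rather than splitting it, even when the method is long, to preserve the integrity of the business logic.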

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-07-24T05:53:10.246Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2325
File: compute/src/main/java/org/zstack/compute/vm/VmMigrateCallExtensionFlow.java:29-29
Timestamp: 2025-07-24T05:53:10.246Z
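Learning: During refactoring in the ZStack project, the team applies a pragmatic prioritization: code refactoring that affects runtime behavior is completed first, while non-functional issues such as stale references in comments can be deferred because they do not affect how the system actually runs.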

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-10-29T10:12:05.812Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2829
File: sdk/src/main/java/org/zstack/sdk/CandidateDecisionEntry.java:7-29
Timestamp: 2025-10-29T10:12:05.812Z
Learning: Files in the `sdk/src/main/java/org/zstack/sdk/` directory are auto-generated and should not be manually modified. Code style and encapsulation issues in these files should be ignored during reviews.

Applied to files:

.github/copilot-instructions.md

📚 Learning: 2025-07-22T02:30:46.123Z

Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2311
File: plugin/sdnController/src/main/java/org/zstack/sdnController/hardwareVxlan/KVMRealizeHardwareVxlanNetworkBackend.java:52-53
Timestamp: 2025-07-22T02:30:46.123Z
Learning: In ZStack, DatabaseFacade.findByUuid() calls for VO objects in contexts like KVMRealizeHardwareVxlanNetworkBackend do not require null checks because the L2NetworkInventory passed to these methods is guaranteed to have a corresponding VO record in the database by the system's design and calling patterns.

Applied to files:

header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO_.java

📚 Learning: 2025-08-03T04:10:21.683Z

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
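Learning: The UPGRADE_VM_METADATA_TABLES_DIRECT stored procedure in the ZStack database upgrade script V4.10.16__schema.sql uses direct RENAME TABLE operations and is not idempotent. The existence of tables and constraints must be checked through information_schema so that the script can be safely re-executed.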

Applied to files:

conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql

📚 Learning: 2025-08-03T04:10:21.683Z

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2025-08-03T04:10:21.683Z
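Learning: In ZStack database upgrade scripts, using RENAME TABLE directly does not guarantee idempotency. Check table existence through information_schema.tables and rename only when the source table exists and the target table does not, so that the upgrade script can be safely re-executed.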

Applied to files:

conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql

📚 Learning: 2025-08-03T03:42:34.349Z

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2293
File: conf/db/upgrade/V4.10.16__schema.sql:69-82
Timestamp: 2025-08-03T03:42:34.349Z
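Learning: When performing database DDL upgrades in the ZStack project, DDL statements such as RENAME TABLE and ALTER TABLE implicitly COMMIT in MySQL/MariaDB, so upgrades should use a temporary table plus data copy to remain reversible, rather than relying on transaction ROLLBACK.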

Applied to files:

conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql

🧬 Code graph analysis (2)

header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java (2)

header/src/main/java/org/zstack/header/storage/primary/PrimaryStorageHistoricalUsageBaseVO.java (1)

org (11-45)

core/src/main/java/org/zstack/core/upgrade/AgentVersionVO.java (1)

Table (10-87)

compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java (2)

core/src/main/java/org/zstack/core/Platform.java (1)

Platform (70-1208)

core/src/main/java/org/zstack/core/db/Q.java (1)

Q (16-193)

🪛 LanguageTool

prometheus-integration-flow.md

[uncategorized] ~217-~217: A verb modifier usually takes the form 'adjective (adverb) + 地 + verb'. Did you mean: 灵活"地"推
Context: ...icTemplate和MetricDataHttpReceiver`配置灵活的推送策略 --- *本文档基于ZStack代码库分析生成，版本: 5.4.0...

(wb4)

zstack-gpu-pci-monitoring-data-flow.md

[grammar] ~295-~295: Remove word
Context: ...- 内存缓存：~0.1ms响应 vs ~5-10ms DB查询 - TTL过期：5分钟自动清理，防止内存泄漏 ### 扩展性设计 - 插件化架构：DatabaseDriver接口支持不同存储...

(QB_NEW_NL_OTHER_ERROR_IDS_UNNECESSARY_NOUN)

🪛 markdownlint-cli2 (0.18.1)

gpu-pci-address-implementation-guide.md

175-175: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

182-182: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

493-493: Multiple headings with the same content

(MD024, no-duplicate-heading)

🔇 Additional comments (8)

.github/copilot-instructions.md (1)

1-42: LGTM!

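This file gives AI coding assistants a clear overview of the ZStack architecture and development guidelines. It is well structured and written in English, as the coding guidelines require.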

prometheus-integration-flow.md (1)

1-221: LGTM!

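This document describes the integration between ZStack and Prometheus in detail, covering the two main flows: data collection (pull mode) and push (push mode). It is clearly structured and includes architecture diagrams, sequence diagrams, and code examples that help explain the monitoring system's design.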

conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql (1)

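1-48: The SQL upgrade script is well designed

The script uses a stored procedure and idempotency checks, in line with ZStack project best practices. It correctly uses DEFAULT CURRENT_TIMESTAMP rather than DEFAULT 0000-00-00 00:00:00, and implements cascading deletes through a foreign key constraint.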

header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO_.java (1)

1-15: LGTM!

The JPA metamodel class is defined correctly; all fields match the VmGpuPciMappingVO entity class and conform to the JPA specification.

zstack-gpu-pci-monitoring-data-flow.md (1)

1-306: LGTM!

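This document describes the monitoring data flow for ZStack GPU PCI address mapping in detail, including complete architecture diagrams, sequence diagrams, the data transformation flow, and the performance optimization approach. It is clearly structured, and the mermaid diagrams make the design easy to follow.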

compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java (1)

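17-32: The cache is well designed and improves performance

The service uses a TTL-based caching strategy combined with @PostConstruct preloading, which effectively reduces database query load. The batch query interface also follows performance best practices.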

Also applies to: 212-228

gpu-pci-address-implementation-guide.md (1)

1-839: LGTM!

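This implementation guide is very thorough and covers every aspect of the GPU PCI address mapping solution:

Data model design and VO class definitions

Service-layer interfaces and cache optimization

Dynamic label enrichment in the query layer

Prometheus integration design

Performance optimizations (batch queries, in-memory cache, warm-up)

ZQL query support and extensibility design

The document is clearly structured and includes complete code examples and best-practice recommendations, which is a great help for development and maintenance.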
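
header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java (1)

32-33: Missing initialization of the createDate field

The createDate field carries a @Column annotation but has no default value and is not initialized in a @PrePersist method. By contrast, lastOpDate has update logic in @PreUpdate. Consider adding a @PrePersist method to initialize createDate.

🔎 Suggested fix
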
+    @PrePersist
+    private void prePersist() {
+        createDate = new Timestamp(System.currentTimeMillis());
+        lastOpDate = new Timestamp(System.currentTimeMillis());
+    }
+
     @PreUpdate
     private void preUpdate() {
         lastOpDate = new Timestamp(System.currentTimeMillis());
     }
⛔ Skipped due to learnings
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2419
File: conf/db/upgrade/V5.3.46__schema.sql:105-116
Timestamp: 2025-08-12T03:27:18.648Z
Learning: In ZStack project, the timestamp pattern `createDate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' combined with `lastOpDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP is the standard database schema usage and should not be flagged as an issue. This is an established project convention confirmed by project maintainers.
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2250
File: header/src/main/java/org/zstack/header/network/l3/ReservedIpRangeVO.java:50-53
Timestamp: 2025-07-07T02:24:14.223Z
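Learning: In the ZStack codebase, setting lastOpDate to null in the PreUpdate callback is an established design pattern and does not need to be changed to the current timestamp. The pattern is also used in other VO classes, and the existing design should be followed.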
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2419
File: conf/db/upgrade/V5.3.46__schema.sql:90-104
Timestamp: 2025-08-12T03:27:09.968Z
Learning: In ZStack database schema files, the pattern `createDate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' is the established project standard and should not be changed to DEFAULT CURRENT_TIMESTAMP, despite general SQL guidelines suggesting otherwise. This is confirmed project convention.
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2428
File: conf/db/upgrade/V5.3.40__schema.sql:195-204
Timestamp: 2025-09-02T05:45:41.896Z
Learning: User MatheMatrix (ye.zou) confirmed that the createDate timestamp pattern with DEFAULT '0000-00-00 00:00:00' should not be changed in ZStack database schema files, reinforcing that this is the established project convention that takes precedence over general SQL guidelines.
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2307
File: conf/db/upgrade/V5.4.0__schema.sql:25-25
Timestamp: 2025-07-21T01:53:16.584Z
Learning: In the ZStack project, all `createDate` timestamp columns in SQL schema files consistently use `DEFAULT '0000-00-00 00:00:00'` as the default value, not `DEFAULT CURRENT_TIMESTAMP`. This is a project-wide convention and should be followed for consistency.

coderabbitai · 2025-12-29T09:10:56Z

compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java

@@ -0,0 +1,267 @@
+package org.zstack.vm;


⚠️ Potential issue | 🔴 Critical

Incorrect package name

Based on the file path compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java, the package should be org.zstack.compute.vm, not org.zstack.vm.

🔎 Suggested fix

-package org.zstack.vm;
+package org.zstack.compute.vm;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

package org.zstack.vm;

package org.zstack.compute.vm;

🤖 Prompt for AI Agents

In compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java at line 1 the package declaration is incorrect (package org.zstack.vm); change it to package org.zstack.compute.vm to match the file path and Java package conventions, update any relative imports or references in the file if needed, and rebuild to ensure no package-related compilation errors.

coderabbitai · 2025-12-29T09:10:56Z

compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java

+    // 缓存配置
+    private static final long CACHE_EXPIRE_MS = 5 * 60 * 1000; // 5分钟缓存过期时间
+
+    // 缓存结构：key = vmUuid + ":" + vmPciAddress, value = hostPciAddress
+    private final Map<String, String> mappingCache = new ConcurrentHashMap<>();
+
+    // 缓存时间戳：key = cacheKey, value = timestamp
+    private final Map<String, Long> cacheTimestamp = new ConcurrentHashMap<>();


⚠️ Potential issue | 🔴 Critical

Code contains Chinese comments, violating the coding guidelines

Per the coding guidelines: "Code must not contain Chinese. Everything, including error messages and comments, must be written in correct English with no spelling mistakes." All comments should be in English.

🔎 Suggested fix example

-    // 缓存配置
-    private static final long CACHE_EXPIRE_MS = 5 * 60 * 1000; // 5分钟缓存过期时间
+    // Cache configuration
+    private static final long CACHE_EXPIRE_MS = 5 * 60 * 1000; // Cache expires after 5 minutes

-    // 缓存结构：key = vmUuid + ":" + vmPciAddress, value = hostPciAddress
+    // Cache structure: key = vmUuid + ":" + vmPciAddress, value = hostPciAddress
     private final Map<String, String> mappingCache = new ConcurrentHashMap<>();

-    // 缓存时间戳：key = cacheKey, value = timestamp
+    // Cache timestamp: key = cacheKey, value = timestamp
     private final Map<String, Long> cacheTimestamp = new ConcurrentHashMap<>();

Please replace all Chinese comments in the file (lines 24, 25, 27, 30, 34, 39, 43, 49, 60, 66, 72, 85, 89, 104, 131, 140, 145, 156, 160, 166, 178, 182, 188, 199, 203, 209, 230, 248, 263, etc.) with English.

🤖 Prompt for AI Agents

In compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java around lines 24-31 (and other listed lines: 34,39,43,49,60,66,72,85,89,104,131,140,145,156,160,166,178,182,188,199,203,209,230,248,263), replace all Chinese comments with clear, correctly spelled English comments that preserve the original meaning (e.g., "缓存配置" -> "Cache configuration", "缓存结构：key = vmUuid + \":\" + vmPciAddress, value = hostPciAddress" -> "Cache structure: key = vmUuid + \":\" + vmPciAddress, value = hostPciAddress", etc.); ensure no Chinese characters remain in comments or error messages, keep comment formatting and locations unchanged, and run a quick grep/IDE search to verify all occurrences are converted before committing.
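
The "quick grep/IDE search" step above could also be done programmatically. The following is a minimal sketch, assuming a hypothetical helper class `HanScanner` that is not part of the PR; it flags lines containing Han-script characters using the standard `Character.UnicodeScript` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the PR) for spotting leftover Chinese text.
class HanScanner {
    // True if the text contains at least one Han-script code point.
    // Full-width punctuation is classified as COMMON and is not flagged on its own.
    static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    // Returns the 1-based numbers of the lines that contain Han characters.
    static List<Integer> linesWithHan(List<String> lines) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (containsHan(lines.get(i))) {
                flagged.add(i + 1);
            }
        }
        return flagged;
    }
}
```

Feeding it the output of `java.nio.file.Files.readAllLines(path)` for each changed file would give the line numbers still to be converted.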

coderabbitai · 2025-12-29T09:10:56Z

compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java

+    public String getHostPciAddress(String vmUuid, String vmPciAddress) {
+        String cacheKey = vmUuid + ":" + vmPciAddress;
+
+        // 检查缓存是否过期
+        Long timestamp = cacheTimestamp.get(cacheKey);
+        if (timestamp != null &&
+            System.currentTimeMillis() - timestamp > CACHE_EXPIRE_MS) {
+            // 缓存过期，清理
+            mappingCache.remove(cacheKey);
+            cacheTimestamp.remove(cacheKey);
+            timestamp = null;
+        }
+
+        // 从缓存获取或查询数据库
+        return mappingCache.computeIfAbsent(cacheKey, key -> {
+            String hostAddress = queryFromDatabase(vmUuid, vmPciAddress);
+            if (hostAddress != null) {
+                cacheTimestamp.put(cacheKey, System.currentTimeMillis());
+            }
+            return hostAddress;
+        });
+    }


⚠️ Potential issue | 🔴 Critical

Missing definition of the queryFromDatabase method

Line 51 calls queryFromDatabase(vmUuid, vmPciAddress), but the method is not defined in the class. This will cause a compilation error.

🔎 Suggested method to add

/**
 * Query mapping from database
 */
private String queryFromDatabase(String vmUuid, String vmPciAddress) {
    VmGpuPciMappingVO mapping = Q.New(VmGpuPciMappingVO.class)
        .eq(VmGpuPciMappingVO_.vmInstanceUuid, vmUuid)
        .eq(VmGpuPciMappingVO_.vmPciAddress, vmPciAddress)
        .find();
    return mapping != null ? mapping.getHostPciAddress() : null;
}
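
As a side note on the getHostPciAddress code quoted above: it keeps the value and its timestamp in two separate maps, so the expiry check and the reload are separate steps. One possible alternative, shown here only as a sketch under assumptions, stores both in a single entry and performs the check and reload inside one `ConcurrentHashMap.compute` call. The class name `SingleKeyTtlCache`, the injected `loader` (standing in for queryFromDatabase), and the explicit `nowMs` parameter are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical single-map variant of the per-key TTL cache.
class SingleKeyTtlCache {
    private static final class Entry {
        final String hostPciAddress;
        final long loadedAtMs;

        Entry(String hostPciAddress, long loadedAtMs) {
            this.hostPciAddress = hostPciAddress;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMs;
    // Assumed loader: (vmUuid, vmPciAddress) -> hostPciAddress, or null if unknown.
    private final BiFunction<String, String, String> loader;

    SingleKeyTtlCache(long ttlMs, BiFunction<String, String, String> loader) {
        this.ttlMs = ttlMs;
        this.loader = loader;
    }

    String get(String vmUuid, String vmPciAddress, long nowMs) {
        String cacheKey = vmUuid + ":" + vmPciAddress;
        // compute() runs atomically per key: expiry check and reload happen together.
        // The loader runs while the key is locked, as it does with computeIfAbsent.
        Entry entry = cache.compute(cacheKey, (key, old) -> {
            if (old != null && nowMs - old.loadedAtMs <= ttlMs) {
                return old; // still fresh
            }
            String host = loader.apply(vmUuid, vmPciAddress);
            // Returning null removes the key, so misses are not cached.
            return host == null ? null : new Entry(host, nowMs);
        });
        return entry == null ? null : entry.hostPciAddress;
    }
}
```

Whether holding the per-key lock during a database call is acceptable depends on the expected contention; this sketch does not settle that question.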

coderabbitai · 2025-12-29T09:10:56Z

compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java

+    private Map<String, String> batchQueryFromDatabase(List<String> cacheKeys) {
+        // 解析cacheKeys为vmUuid和pciAddress
+        Set<String> vmUuids = cacheKeys.stream()
+            .map(key -> key.split(":")[0])
+            .collect(Collectors.toSet());
+
+        Set<String> pciAddresses = cacheKeys.stream()
+            .map(key -> key.split(":")[1])
+            .collect(Collectors.toSet());
+
+        // 批量查询数据库
+        List<VmGpuPciMappingVO> mappings = Q.New(VmGpuPciMappingVO.class)
+            .in(VmGpuPciMappingVO_.vmInstanceUuid, vmUuids)
+            .in(VmGpuPciMappingVO_.vmPciAddress, pciAddresses)
+            .list();
+
+        // 转换为Map
+        return mappings.stream()
+            .collect(Collectors.toMap(
+                vo -> vo.getVmInstanceUuid() + ":" + vo.getVmPciAddress(),
+                VmGpuPciMappingVO::getHostPciAddress
+            ));
+    }


⚠️ Potential issue | 🔴 Critical

Batch query can return mismatched data

The batch query on lines 118-121 uses two separate IN clauses for vmUuids and pciAddresses. This returns every record whose vmUuid is in one set and whose pciAddress is in the other, but it does not guarantee that vmUuid and pciAddress match as pairs.

For example, if cacheKeys contains ["vm1:pci1", "vm2:pci2"], the query also returns the vm1+pci2 and vm2+pci1 combinations (if they exist), which is incorrect.

🔎 Suggested fixes

Option 1: issue a separate query per key (suitable for small batches)

private Map<String, String> batchQueryFromDatabase(List<String> cacheKeys) {
    Map<String, String> result = new HashMap<>();
    for (String cacheKey : cacheKeys) {
        String[] parts = cacheKey.split(":");
        String vmUuid = parts[0];
        String vmPciAddress = parts[1];
        VmGpuPciMappingVO mapping = Q.New(VmGpuPciMappingVO.class)
            .eq(VmGpuPciMappingVO_.vmInstanceUuid, vmUuid)
            .eq(VmGpuPciMappingVO_.vmPciAddress, vmPciAddress)
            .find();
        if (mapping != null) {
            result.put(cacheKey, mapping.getHostPciAddress());
        }
    }
    return result;
}

Option 2: query all related records, then filter in memory (suitable for large batches)

private Map<String, String> batchQueryFromDatabase(List<String> cacheKeys) {
    Set<String> vmUuids = cacheKeys.stream()
        .map(key -> key.split(":")[0])
        .collect(Collectors.toSet());
    List<VmGpuPciMappingVO> mappings = Q.New(VmGpuPciMappingVO.class)
        .in(VmGpuPciMappingVO_.vmInstanceUuid, vmUuids)
        .list();
    // Build a set of requested cache keys for filtering
    Set<String> requestedKeys = new HashSet<>(cacheKeys);
    return mappings.stream()
        .map(vo -> {
            String key = vo.getVmInstanceUuid() + ":" + vo.getVmPciAddress();
            return new AbstractMap.SimpleEntry<>(key, vo.getHostPciAddress());
        })
        .filter(entry -> requestedKeys.contains(entry.getKey()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
}

🤖 Prompt for AI Agents

In compute/src/main/java/org/zstack/compute/vm/VmGpuPciMappingService.java around lines 107-129, the current batch query uses two IN filters which can return mismatched vmUuid+pciAddress combinations; change to either (A) for small batches loop per cacheKey and query by both vmInstanceUuid and vmPciAddress to get exact matches, or (B, preferred for larger batches) query once by the set of vmUuids only, build a Set of requested "vmUuid:vmPciAddress" cache keys, then filter the returned mappings in-memory to keep only entries whose vmUuid:vmPciAddress pair exists in the requested set and collect those into the result map.
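
The mismatch described above can be reproduced without a database. The sketch below is illustrative only: `PairMatchDemo` and its `Row` type are hypothetical stand-ins for the service and for VmGpuPciMappingVO, and an in-memory list plays the role of the table. It splits keys with a limit of 2 on the assumption that the PCI address part may itself contain colons.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory demo of the two-IN-clause mismatch and the pair filter.
class PairMatchDemo {
    // Stand-in for a VmGpuPciMappingVO row.
    static final class Row {
        final String vmUuid;
        final String vmPci;

        Row(String vmUuid, String vmPci) {
            this.vmUuid = vmUuid;
            this.vmPci = vmPci;
        }
    }

    static String keyOf(Row row) {
        return row.vmUuid + ":" + row.vmPci;
    }

    // Part before (index 0) or after (index 1) the first colon of a cache key.
    static Set<String> partsOf(List<String> cacheKeys, int index) {
        return cacheKeys.stream()
                .map(key -> key.split(":", 2)[index])
                .collect(Collectors.toSet());
    }

    // Mimics "vmInstanceUuid IN (...) AND vmPciAddress IN (...)": a cross product.
    static List<String> twoInClauses(List<Row> table, List<String> cacheKeys) {
        Set<String> vmUuids = partsOf(cacheKeys, 0);
        Set<String> pciAddresses = partsOf(cacheKeys, 1);
        return table.stream()
                .filter(row -> vmUuids.contains(row.vmUuid) && pciAddresses.contains(row.vmPci))
                .map(PairMatchDemo::keyOf)
                .sorted()
                .collect(Collectors.toList());
    }

    // Option B: select by vmUuid only, then keep exactly the requested pairs.
    static List<String> filterByPair(List<Row> table, List<String> cacheKeys) {
        Set<String> vmUuids = partsOf(cacheKeys, 0);
        Set<String> requested = new HashSet<>(cacheKeys);
        return table.stream()
                .filter(row -> vmUuids.contains(row.vmUuid))
                .map(PairMatchDemo::keyOf)
                .filter(requested::contains)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

With a table holding all four combinations of vm1/vm2 and pci1/pci2 and the keys "vm1:pci1" and "vm2:pci2", `twoInClauses` yields all four rows while `filterByPair` yields only the two requested pairs.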

coderabbitai · 2025-12-29T09:10:56Z

conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql

+            `vmInstanceUuid` varchar(32) NOT NULL COMMENT 'VM实例UUID',
+            `vmPciAddress` varchar(32) NOT NULL COMMENT 'VM内部看到的PCI地址',
+            `hostPciAddress` varchar(32) NOT NULL COMMENT 'Host上真实的PCI地址',
+            `gpuSerial` varchar(128) DEFAULT NULL COMMENT 'GPU序列号',


⚠️ Potential issue | 🔴 Critical

SQL script contains Chinese comments, violating the coding guidelines

Per the coding guidelines: "Code must not contain Chinese. Everything, including error messages and comments, must be written in correct English with no spelling mistakes." Table and column COMMENTs are part of the code and should also be in English.

🔎 Suggested fix

 CREATE TABLE `zstack`.`VmGpuPciMappingVO` (
     `uuid` varchar(32) NOT NULL,
-    `vmInstanceUuid` varchar(32) NOT NULL COMMENT 'VM实例UUID',
-    `vmPciAddress` varchar(32) NOT NULL COMMENT 'VM内部看到的PCI地址',
-    `hostPciAddress` varchar(32) NOT NULL COMMENT 'Host上真实的PCI地址',
-    `gpuSerial` varchar(128) DEFAULT NULL COMMENT 'GPU序列号',
+    `vmInstanceUuid` varchar(32) NOT NULL COMMENT 'VM instance UUID',
+    `vmPciAddress` varchar(32) NOT NULL COMMENT 'PCI address seen inside the VM',
+    `hostPciAddress` varchar(32) NOT NULL COMMENT 'Real PCI address on the host',
+    `gpuSerial` varchar(128) DEFAULT NULL COMMENT 'GPU serial number',
     `createDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
     `lastOpDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
     PRIMARY KEY (`uuid`),
     UNIQUE KEY `ukVmGpuPciMappingVO` (`vmInstanceUuid`, `vmPciAddress`),
     KEY `fkVmGpuPciMappingVOVmInstanceVO` (`vmInstanceUuid`),
     CONSTRAINT `fkVmGpuPciMappingVOVmInstanceVO` FOREIGN KEY (`vmInstanceUuid`) REFERENCES `zstack`.`VmInstanceVO` (`uuid`) ON DELETE CASCADE
-) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='VM GPU PCI地址映射表';
+) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='VM GPU PCI address mapping table';

Also applies to: 33-33

🤖 Prompt for AI Agents

In conf/db/upgrade/V5.5.0__addVmGpuPciMappingTable.sql around lines 23-26 (and also line 33), the column COMMENTs are written in Chinese which violates the codebase rule to use English; update each column COMMENT to clear, concise English equivalents (e.g., "VM instance UUID", "PCI address seen inside VM", "Host PCI address", "GPU serial number") and ensure any other Chinese comments in the file are replaced with proper English text.

coderabbitai · 2025-12-29T09:10:56Z

header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java

+@EntityGraph(
+    friends = {
+        @EntityGraph.Neighbour(type = VmInstanceVO.class, myField = "vmInstanceUuid", targetField = "uuid")
+    }
+)


⚠️ Potential issue | 🟠 Major

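EntityGraph relationship definition may be incorrect

Based on related code examples (such as PrimaryStorageHistoricalUsageBaseVO.java), a foreign key relationship should be declared with parents rather than friends. friends is normally used for peer relationships, whereas here VmGpuPciMappingVO depends on VmInstanceVO, which should be a parent-child relationship.

🔎 Suggested fix

 @EntityGraph(
-    friends = {
+    parents = {
         @EntityGraph.Neighbour(type = VmInstanceVO.class, myField = "vmInstanceUuid", targetField = "uuid")
     }
 )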

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@EntityGraph(

friends = {

@EntityGraph.Neighbour(type = VmInstanceVO.class, myField = "vmInstanceUuid", targetField = "uuid")

}

)

@EntityGraph(

parents = {

@EntityGraph.Neighbour(type = VmInstanceVO.class, myField = "vmInstanceUuid", targetField = "uuid")

}

)

🤖 Prompt for AI Agents

In header/src/main/java/org/zstack/header/vm/VmGpuPciMappingVO.java around lines 13 to 17, the EntityGraph relationship is declared with "friends" but this mapping should be a parent relationship to VmInstanceVO; replace the friends attribute with parents and declare the @EntityGraph.Neighbour entry under parents (pointing myField="vmInstanceUuid" to targetField="uuid") so the VO correctly models VmInstanceVO as its parent.

