
Conversation

@369pro 369pro commented Dec 25, 2025

Description

Summary: (summary)

Fix: #(issue)

Docs Issue/PR: (docs-issue-or-pr-link)

Reviewer: @(reviewer)

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created a related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

Copilot AI review requested due to automatic review settings December 25, 2025 07:07

Copilot AI left a comment


Pull request overview

This PR adds Time to First Token (TTFT) and token speed calculation capabilities to the LLM performance tracking infrastructure. The implementation aims to provide better observability into LLM response times and throughput.

  • Enhanced the timed_with_status decorator to calculate and log content-based performance metrics (speed in chars/s, content length)
  • Added TTFT tracking in OpenAI streaming responses to measure latency from request to first token
  • Introduced model name tracking to correlate performance metrics with specific models

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
src/memos/utils.py Modified the timed_with_status decorator to calculate performance metrics (total time in seconds, content length, and speed in chars/s) for successful function calls that return truthy results
src/memos/llms/openai.py Added TTFT (Time To First Token) tracking to the generate_stream method, with timing logic and model-name logging on the first token received
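For orientation, a minimal sketch of what a decorator like the one described in the file table might look like (the names and log format here are assumptions based on the table above, not the actual MemOS implementation):

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def timed_with_status(log_prefix=None):
    """Log elapsed time and, for string results, chars/s throughput."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if isinstance(result, str) and result:
                # Content-based metrics only make sense for string results
                seconds = elapsed_ms / 1000.0
                speed = len(result) / seconds if seconds > 0 else 0.0
                logger.info(
                    "[TIMER_WITH_STATUS] %s took %.0f ms, length=%d, speed=%.1f chars/s",
                    log_prefix or fn.__name__, elapsed_ms, len(result), speed,
                )
            else:
                logger.info(
                    "[TIMER_WITH_STATUS] %s took %.0f ms",
                    log_prefix or fn.__name__, elapsed_ms,
                )
            return result
        return wrapper
    return decorator


@timed_with_status(log_prefix="demo")
def generate():
    return "hello world"
```

The decorator only measures wall-clock time around the call; the review comments below discuss the edge cases (non-string results, falsy results, model-name lookup) that this simplified sketch glosses over.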


- log_extra_args:
- can be a dict: fixed contextual fields that are always attached to logs;
- or a callable: like `fn(*args, **kwargs) -> dict`, used to dynamically generate contextual fields at runtime.
- target_models: list of target models to filter performance tracking

Copilot AI Dec 25, 2025

The documentation mentions a new parameter "target_models" but this parameter is not actually implemented in the function signature or used in the function body. Either remove this documentation line or implement the parameter.

Suggested change
- target_models: list of target models to filter performance tracking

Comment on lines +86 to +88
if success_flag and result:
    # Calculate content metrics
    content_length = len(result) if isinstance(result, str) else 0

Copilot AI Dec 25, 2025

The logic assumes that when success_flag is True and result is truthy, the result will be a string for length calculation. However, this function is a generic decorator that could be applied to any function. If the result is not a string (e.g., a list, dict, or custom object), this will cause a bug. Consider adding a type check before attempting to measure content_length, or document that this decorator should only be used with string-returning functions.
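One hedged way to address this would be a small type-safe helper (a sketch of the reviewer's suggestion, not code from the PR; the function name is hypothetical):

```python
def content_length_of(result):
    """Best-effort content length for performance metrics.

    Only str/bytes are measured; any other type returns 0 so a
    generic decorator never raises on non-string results.
    """
    if isinstance(result, (str, bytes)):
        return len(result)
    return 0
```

This keeps the decorator safe to apply to functions returning lists, dicts, or custom objects, at the cost of reporting a length of 0 (and hence a speed of 0 chars/s) for them.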

Comment on lines +91 to +97
# Get model name from self.config
self = args[0] if args else None
model_name = (
    getattr(self, "config", {}).get("model_name_or_path", "unknown")
    if self
    else "unknown"
)

Copilot AI Dec 25, 2025

Accessing self.config using getattr on a potentially None object is fragile. If args[0] is not an object with a config attribute (or if args is empty), this will fail silently and return "unknown". This assumes all decorated functions are methods with a specific structure. Consider adding more robust type checking or documenting this requirement clearly.

Suggested change
- # Get model name from self.config
- self = args[0] if args else None
- model_name = (
-     getattr(self, "config", {}).get("model_name_or_path", "unknown")
-     if self
-     else "unknown"
- )
+ # Get model name from self.config, if available and well-formed
+ self_obj = args[0] if args else None
+ model_name = "unknown"
+ if self_obj is not None:
+     config = getattr(self_obj, "config", None)
+     if isinstance(config, dict):
+         model_name = config.get("model_name_or_path", "unknown")
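Note that elsewhere in this PR, openai.py reads the model name via attribute access (`self.config.model_name_or_path`), so `config` may be an attribute-style object rather than a dict, and a dict-only check would still return "unknown" for it. A sketch that handles both shapes (the helper name is hypothetical, not from the PR):

```python
def resolve_model_name(obj, default="unknown"):
    """Extract model_name_or_path from obj.config, whether config is
    a dict or an attribute-style object; fall back to default."""
    config = getattr(obj, "config", None)
    if config is None:
        return default
    if isinstance(config, dict):
        return config.get("model_name_or_path", default)
    # Attribute-style config objects (e.g. dataclass/pydantic-like)
    return getattr(config, "model_name_or_path", default)
```

Either way, the assumption that the first positional argument is `self` with a `config` attribute should be documented on the decorator.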

Comment on lines +103 to +110
elif (
    success_flag
):  # Standard logging for non-target models or when not tracking metrics
    msg = (
        f"[TIMER_WITH_STATUS] {log_prefix or fn.__name__} "
        f"took {elapsed_ms:.0f} ms{status_info}, args: {ctx_str}"
    )
    logger.info(msg)

Copilot AI Dec 25, 2025

The logging condition is confusing. The branch at line 103-105 logs when success_flag is True but result is falsy (None, empty string, 0, False, etc.). This means successful executions with falsy results use the standard logging format, while successful executions with truthy results use the performance metrics format. This logic seems inconsistent - consider what should happen when a function successfully returns an empty string or None.

Comment on lines +104 to +108
# 尝试从响应中获取实际模型信息
actual_model = getattr(chunk, "model", None) or self.config.model_name_or_path
requested_model = self.config.model_name_or_path

# Print TTFT info - 显示请求模型和实际模型(如果不一致)

Copilot AI Dec 25, 2025

The comment is in Chinese. For consistency with the rest of the codebase which uses English comments, this should be translated to English.

Suggested change
- # 尝试从响应中获取实际模型信息
+ # Try to get the actual model information from the response
  actual_model = getattr(chunk, "model", None) or self.config.model_name_or_path
  requested_model = self.config.model_name_or_path
- # Print TTFT info - 显示请求模型和实际模型(如果不一致)
+ # Print TTFT info - show requested model and actual model (if they differ)

Comment on lines +93 to +114
first_token_time = None
start_time = time.perf_counter()

for chunk in response:
    delta = chunk.choices[0].delta

    # Calculate TTFT on first token
    if first_token_time is None:
        first_token_time = time.perf_counter()
        ttft_ms = (first_token_time - start_time) * 1000.0

        # Try to get the actual model information from the response
        actual_model = getattr(chunk, "model", None) or self.config.model_name_or_path
        requested_model = self.config.model_name_or_path

        # Print TTFT info - show requested model and actual model (if they differ)
        if actual_model != requested_model:
            logger.info(
                f"TTFT: {ttft_ms:.2f}ms | Requested: {requested_model} | Actual: {actual_model}"
            )
        else:
            logger.info(f"TTFT: {ttft_ms:.2f}ms | {requested_model}")

Copilot AI Dec 25, 2025

The TTFT (Time To First Token) calculation and logging happens inside the generator loop but only once when first_token_time is None. However, this timing logic doesn't account for cases where the stream might be empty or where no chunks are yielded. Additionally, since this code is inside generate_stream which is decorated with @timed_with_status, there's now redundant timing logic - the decorator already tracks total time. Consider whether TTFT should be tracked separately from the decorator's timing or if this overlaps with the decorator's functionality.
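One way to separate TTFT measurement from both the business logic and the decorator's total-time tracking, while also covering the empty-stream case, would be a small generator wrapper (a sketch of the reviewer's suggestion with assumed names, not the PR's code):

```python
import logging
import time

logger = logging.getLogger(__name__)


def with_ttft(stream, model_name="unknown"):
    """Wrap a chunk iterator: log time-to-first-token once, and warn
    if the stream finishes without yielding any chunks."""
    start = time.perf_counter()
    got_first = False
    for chunk in stream:
        if not got_first:
            got_first = True
            ttft_ms = (time.perf_counter() - start) * 1000.0
            logger.info(f"TTFT: {ttft_ms:.2f}ms | {model_name}")
        yield chunk
    if not got_first:
        # Empty stream: no first token ever arrived
        logger.warning(f"Empty stream, no TTFT recorded | {model_name}")
```

With this shape, `generate_stream` stays a plain generator, the decorator keeps owning total elapsed time, and TTFT is logged (or a warning emitted) regardless of how many chunks the provider returns.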

Comment on lines +93 to +114

Copilot AI Dec 25, 2025

The new TTFT timing logic and model name retrieval in generate_stream lacks test coverage. The existing test at line 46-103 in test_openai.py doesn't verify the TTFT calculation or the logging of model information. Consider adding tests to verify that TTFT is calculated correctly and that model name logging works as expected.
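A hedged sketch of what such tests could look like, using a fake stream and a standalone TTFT helper (the helper and test names are assumptions; the real tests would exercise OpenAILLM.generate_stream with a mocked client instead):

```python
import time


def measure_ttft(stream):
    """Return (ttft_seconds or None, list of chunks) for an iterator."""
    start = time.perf_counter()
    ttft = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            ttft = time.perf_counter() - start
        chunks.append(chunk)
    return ttft, chunks


def fake_stream():
    time.sleep(0.01)  # simulate network latency before the first token
    yield "Hello"
    yield " world"


def test_ttft_is_measured():
    ttft, chunks = measure_ttft(fake_stream())
    assert ttft is not None and ttft > 0
    assert chunks == ["Hello", " world"]


def test_empty_stream_has_no_ttft():
    ttft, chunks = measure_ttft(iter([]))
    assert ttft is None and chunks == []
```

In the real test suite, the model-name logging could additionally be verified with pytest's `caplog` fixture.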
