Human-in-the-Loop Implementation #2021
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
        previous_response_id: str | None,
        conversation_id: str | None,
        session: Session | None,
        run_state: RunState[TContext] | None = None,
    ):
        if streamed_result.trace:
            streamed_result.trace.start(mark_as_current=True)
Prime server tracker when resuming streaming runs
When Runner.run_streamed resumes from a RunState with conversation_id or previous_response_id, _start_streaming constructs a _ServerConversationTracker but never seeds it with the prior model_responses that were already sent. Unlike the synchronous path, no call to track_server_items is made, so prepare_input treats every previously generated item as unsent and resubmits them to the server. This duplicates earlier messages and breaks server-side conversation threading when a run is resumed.
Fixed in a56ce0a. Added server conversation tracker priming at lines 1076-1079 to match the non-streaming implementation and prevent message duplication when resuming from RunState.
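To illustrate why priming matters, here is a minimal sketch and not the SDK's actual code. `ModelResponse`, `ServerConversationTracker`, and `resume_input` are hypothetical stand-ins, and items are plain strings for simplicity; only the names `track_server_items` and `prepare_input` come from the review comment above.

```python
from dataclasses import dataclass, field


@dataclass
class ModelResponse:
    """Hypothetical stand-in for a model response holding output items."""

    output: list[str]


@dataclass
class ServerConversationTracker:
    """Hypothetical tracker that remembers which items the server has seen."""

    sent_items: set[str] = field(default_factory=set)

    def track_server_items(self, response: ModelResponse) -> None:
        # Mark every item of a prior response as already delivered.
        self.sent_items.update(response.output)

    def prepare_input(self, generated_items: list[str]) -> list[str]:
        # Only items the server has not seen yet are submitted.
        return [item for item in generated_items if item not in self.sent_items]


def resume_input(
    prior_responses: list[ModelResponse],
    generated_items: list[str],
    prime: bool,
) -> list[str]:
    """Build the next server input, optionally priming the tracker first."""
    tracker = ServerConversationTracker()
    if prime:
        for response in prior_responses:
            tracker.track_server_items(response)
    return tracker.prepare_input(generated_items)
```

Without priming, every previously generated item is returned for resubmission; with priming, only the new item remains.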
        previous_response_id: str | None,
        conversation_id: str | None,
        session: Session | None,
        run_state: RunState[TContext] | None = None,
    ):
        if streamed_result.trace:
            streamed_result.trace.start(mark_as_current=True)
Streaming resume ignores existing turn count
The streaming execution path always initializes current_turn = 0 when _start_streaming is called, even if a RunState with an existing _current_turn is supplied. The loop then increments from zero, so any turns completed before the interruption are ignored and the max_turns guard is reset. After each interruption, a resumed streaming run can exceed the user’s turn limit and misreport the current turn number.
This was already fixed in 74c50fd at line 914: current_turn=run_state._current_turn if run_state else 0. The turn counter is properly restored from the RunState.
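A rough sketch of the turn-count behavior under discussion, assuming a simplified loop. `RunState` here is a hypothetical stand-in that carries only `_current_turn`, and `MaxTurnsExceeded` and `run_turns` are illustrative names, not the runner's real implementation.

```python
from dataclasses import dataclass
from typing import Optional


class MaxTurnsExceeded(Exception):
    """Hypothetical error raised when the turn limit is passed."""


@dataclass
class RunState:
    """Hypothetical stand-in carrying only the saved turn counter."""

    _current_turn: int = 0


def run_turns(
    turns_needed: int,
    max_turns: int,
    run_state: Optional[RunState] = None,
) -> int:
    # Restore the counter from saved state instead of always starting at zero.
    current_turn = run_state._current_turn if run_state else 0
    for _ in range(turns_needed):
        current_turn += 1
        if current_turn > max_turns:
            raise MaxTurnsExceeded(f"Max turns ({max_turns}) exceeded")
    return current_turn
```

With the counter restored, a run interrupted after two turns and resumed for two more trips a `max_turns=3` guard, whereas a counter reset to zero would let it through.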
…essage duplication

When resuming a streaming run from RunState, the server conversation tracker was not being primed with previously sent model responses. This caused `prepare_input` to treat all previously generated items as unsent and resubmit them to the server, breaking conversation threading.

**Issue**: Missing `track_server_items` call in streaming resumption path

**Fix**: Added server conversation tracker priming logic in `_start_streaming` method (lines 1076-1079) to match the non-streaming path implementation (lines 553-556). The fix iterates through `run_state._model_responses` and calls `track_server_items(response)` to mark them as already sent to the server.

**Impact**: Resolves message duplication when resuming interrupted streaming runs, ensuring proper conversation threading with server-side sessions.

Fixes code review feedback from PR openai#2021

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Thanks for sending this patch! I currently don't have the bandwidth to check this in depth, but one thing I wanted to mention is that, while implementing the sessions feature in the openai-agents-js project, I found that the runner internals need to take various HITL patterns into consideration. It might not be necessary to make those changes in this Python SDK, but sufficient testing for the sessions scenarios is worth doing.
Happy to contribute! I added a couple of examples using SQLiteSession and OpenAIConversationsSession and made sure they work:
OPENAI_API_KEY="your_api_key_here" uv run python examples/memory/memory_session_hitl_example.py
=== Memory Session + HITL Example ===
Session id: :memory:
Enter a message to chat with the agent. Submit an empty line to exit.
The agent will ask for approval before using tools.
You: What cities does the Bay Bridge connect?
Assistant: The Bay Bridge connects San Francisco and Oakland in California.
You: What's the weather in those cities?
Agent HITL Assistant wants to call 'get_weather' with {"location":"San Francisco, CA"}. Approve? (y/n): y
Approved tool call.
Agent HITL Assistant wants to call 'get_weather' with {"location":"Oakland, CA"}. Approve? (y/n): y
Approved tool call.
Assistant: San Francisco is currently foggy with a temperature of 58°F. Oakland is sunny with a temperature of 72°F.
You:
OPENAI_API_KEY="your_api_key_here" uv run python examples/memory/openai_session_hitl_example.py
=== OpenAI Session + HITL Example ===
Enter a message to chat with the agent. Submit an empty line to exit.
The agent will ask for approval before using tools.
You: What cities does the Bay Bridge connect?
Assistant: The Bay Bridge, officially known as the San Francisco–Oakland Bay Bridge, connects the cities of **San Francisco** and **Oakland** in California.
You: What's the weather in those cities?
Agent HITL Assistant wants to call 'get_weather' with {"location":"San Francisco, CA"}. Approve? (y/n): y
Approved tool call.
Agent HITL Assistant wants to call 'get_weather' with {"location":"Oakland, CA"}. Approve? (y/n): y
Approved tool call.
Assistant: San Francisco is currently foggy and 58°F, while Oakland is sunny and 72°F.
You:
I'm hoping that just about covers everything, but lemme know if there are other areas I should make sure to address.
Thanks @seratch for adding it to the 0.6.x milestone! I'll make sure to address any feedback.
Just a heads up: I noticed there are some discrepancies with the serialized RunState when interrupting in Python and resuming in TypeScript and vice versa. I'm addressing those this weekend and will push the update.
@codex Can you review the code changes in this PR with fresh eyes? Please focus on potential issues for existing apps and edge-case scenarios rather than general feedback.
💡 Codex Review
src/agents/run.py (outdated)
    await AgentRunner._save_result_to_session(session, starting_input, [])
Avoid duplicating session history when resuming streaming runs
When Runner.run_streamed is resumed from a RunState, _start_streaming still calls _prepare_input_with_session and _save_result_to_session unconditionally (streamed_result.input is recomputed and the original prompt is saved again). In the non‑streaming path these calls are skipped when a RunState is supplied so that the session history isn’t duplicated. Re‑preparing and saving here means a resumed streaming run writes the same user message back into the session and the next turn will see duplicate user prompts, breaking conversations for apps that rely on Session or session_input_callback. Consider gating these calls with run_state is None as the synchronous path does.
👍 Fixed in 466c306
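The gating suggested in the review can be sketched roughly as follows. This is an illustration under assumptions, not the actual change in 466c306: `InMemorySession`, `start_run`, and `demo` are hypothetical names, and the real runner's session-preparation calls are reduced to a single `add_items` call.

```python
import asyncio
from typing import Optional


class InMemorySession:
    """Hypothetical minimal session that stores items in a list."""

    def __init__(self) -> None:
        self.items: list[str] = []

    async def add_items(self, items: list[str]) -> None:
        self.items.extend(items)


async def start_run(
    session: InMemorySession,
    user_input: str,
    run_state: Optional[object] = None,
) -> None:
    # Persist the original prompt only for a fresh run; a resumed run
    # already wrote it to the session before the interruption.
    if run_state is None:
        await session.add_items([user_input])


async def demo() -> list[str]:
    session = InMemorySession()
    await start_run(session, "What's the weather?")  # fresh run
    await start_run(session, "What's the weather?", run_state=object())  # resume
    return session.items
```

Running `asyncio.run(demo())` leaves a single copy of the prompt in the session, because the resumed call skips the save.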
… to_input_item() for RunItem instances
…_input after resume
…or improved filtering of rehydrated items
…servation during resumption
…ll items already persisted
…tes during resume scenarios
…oval are executed before shell tools
…r function and shell tools
…oceeding with new items
…napproved tools once
…processed tools by filtering based on call_ids
…l deserialization
… duplicates during resume scenarios" This reverts commit 230225c398ec33b8b07cb6ad9fd8286f8288be01.
…g with all items already persisted" This reverts commit 5373de8e63197951c07f2af9a79a0382d9899eba.
… all items already persisted
I'm happy to contribute and I really appreciate you sending the edge cases my way. I've created tests that capture each failing edge case and the corresponding fixes. I'll spend some time this week to see if I can reduce the number of changes while maintaining the test cases. Less is more. =)
@codex please take a fresh look at the full set of changes and do a thorough review
💡 Codex Review
src/agents/run_state.py (outdated)
    # Add additional fields based on item type
    if hasattr(item, "output"):
        result["output"] = str(item.output)
Preserve tool outputs when serializing run state
RunState serialization coerces every item’s output to a string (result["output"] = str(item.output)), so any structured tool result (e.g., a dict/list from a function tool or shell output) is irreversibly stringified when saving to JSON. _deserialize_items later feeds this string back into ToolCallOutputItem, meaning resumed runs or resumed session persistence no longer see the original structured output. HITL flows that persist state to disk will thus lose typed tool outputs after resumption. Avoid stringifying the output so it round-trips with its original type.
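A small sketch of the round-trip problem, assuming the tool output is JSON-compatible (dicts, lists, strings, numbers). The three helper functions are hypothetical and only mimic the shape of the reviewed line; outputs that are not JSON-compatible would need an explicit encoder, which is not shown here.

```python
import json
from typing import Any


def serialize_output_lossy(output: Any) -> str:
    # Mirrors the reviewed behavior: str() flattens structured values.
    return json.dumps({"output": str(output)})


def serialize_output(output: Any) -> str:
    # Keeps JSON-compatible values as they are so they round-trip with their type.
    return json.dumps({"output": output})


def deserialize_output(payload: str) -> Any:
    # Read the stored output back from the JSON payload.
    return json.loads(payload)["output"]
```

For a dict such as `{"temp_f": 58, "conditions": ["foggy"]}`, the lossy path comes back as a string, while the second path returns an equal dict.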
… run state serialization
…tructured types and ensure compatibility
@codex review
Codex Review: Didn't find any major issues. Keep it up!
@mjschock Thank you so much for taking the time to work on this. Since yesterday, I've been working on the code based on your branch here. There are still some things to polish, and there are additional patterns that need to be supported (e.g., local shell tool migration and perhaps some others). This enhancement is a fairly large change, and as a maintainer, I need to ensure the final quality. If you're okay with it, would it be all right if I take over your changes in a new pull request and continue collaborating with you on the remaining work? Of course, I'll credit your contribution using Co-authored-by, so your work will be fully preserved in the new PR. I'd really appreciate your understanding.
@seratch yep, np |
Resolves #636.
See #636 (comment).