Skip to content

Conversation

@ThomasK33
Copy link
Member

Unifies bash execution and background-process management under the existing task_* tool family.

Key changes:

  • Extend task to accept kind: "bash" and route execution through existing bash tooling.
  • Teach task_await, task_list, and task_terminate to handle bash:<processId> task IDs with proper scope checks.
  • Filter the model-facing toolset to the canonical allowlist so legacy bash_* tools are no longer exposed.
  • Update agent presets/depth messaging to prevent sub-agent recursion while still allowing bash via task(kind="bash").
  • Update task tool UI rendering for bash variants + streaming output.

Validation:

  • make static-check

📋 Implementation Plan

Consolidate bash_* into the task_* abstraction (task = agent | bash)

Goal

Reduce tool surface overlap by making task_* the single abstraction for both:

  • Agent tasks (existing subagent workspaces)
  • Bash tasks (existing background processes + foreground bash execution)

Success criteria

  • Models no longer see/need bash, bash_output, bash_background_list, bash_background_terminate in the default toolset.
  • task, task_await, task_list, task_terminate cover the full bash workflow:
    • spawn (fg/bg)
    • incremental output
    • list
    • terminate
  • Existing behaviors preserved:
    • bash output limits + truncation handling
    • background process semantics (timeouts, filters, queued-message interruption)
    • task scoping rules (only manage descendant work)

Recommended approach (net product LoC ≈ +450)
Extend the task_* tools to support a new bash task kind, backed by BackgroundProcessManager, and stop exposing bash_* to models (align getToolsForModel() with the getAvailableTools() allowlist; keep legacy wrappers temporarily).


1) API shape (tool schemas)

1.1 task args: union of agent + bash

Update TaskToolArgsSchema in src/common/utils/tools/toolDefinitions.ts from a single strict object to a strict union:

  • Agent (unchanged):
    • { subagent_type, prompt, title, run_in_background? }
  • Bash (new):
    • { kind: "bash", script, timeout_secs, run_in_background?, display_name }

Notes:

  • Keep agent shape unchanged to avoid breaking existing tool calls.
  • Require kind: "bash" so parsing is unambiguous.

1.2 task results: keep current discriminant, add optional bash fields

To avoid changing the status-based discriminated unions:

  • Keep status: "queued"|"running"|"completed" exactly as-is.
  • Extend completed results with optional bash-oriented fields (safe additive change):
    • exitCode?: number
    • note?: string
    • truncated?: { reason: string; totalLines: number }

For bash tasks:

  • Foreground completion returns status:"completed" with reportMarkdown containing a formatted bash summary + output.
  • Background spawn returns status:"running" and a taskId.

1.3 task_await becomes the unified “await output/report”

Update TaskAwaitToolArgsSchema to add bash-output knobs (all optional):

  • filter?: string
  • filter_exclude?: boolean
  • timeout_secs?: number (allow 0 for non-blocking, aligning with bash_output)

Update TaskAwaitTool*ResultSchema to include optional streaming fields on active/completed results:

  • output?: string (incremental output when kind=bash)
  • elapsed_ms?: number
  • exitCode?: number
  • note?: string

Mapping:

  • Agent tasks: identical behavior (waits for agent_report).
  • Bash tasks: delegates to BackgroundProcessManager.getOutput(); may return status:"running" with output populated.

Timeout semantics for bash tasks (replacing bash_output):

  • task_await.timeout_secs is forwarded to BackgroundProcessManager.getOutput(...) and means “wait up to N seconds for new output (or process exit)”.
  • If the timeout elapses and the process is still running, return status:"running" and whatever incremental output was read since the previous call (output may be empty).
  • Output is the existing unified stream (stdout+stderr combined from output.log); incomplete final line fragments remain buffered until newline/exit.

1.4 task_list returns both kinds

Extend TaskListToolTaskSchema additively:

  • kind?: "agent"|"bash" (optional; omit for agent tasks to minimize churn)
  • display_name?: string (bash)
  • script?: string (bash)
  • uptime_ms?: number (bash)
  • exitCode?: number (bash)
  • Option A (minimal): map bash terminal statuses to status:"reported" and include exitCode.
  • Option B (more expressive): extend TaskListStatusSchema with exited|killed|failed and plumb through.

1.5 task_terminate supports both

Keep task_terminate input shape, but broaden semantics:

  • task_ids may include agent or bash tasks.

For results:

  • Keep existing terminatedTaskIds array.
  • For bash terminations, return terminatedTaskIds: [taskId].

2) Backend implementation (node/services/tools)

2.1 Add a tiny task-id router (defensive)

Create a helper (new file) e.g. src/node/services/tools/taskId.ts:

  • isBashTaskId(taskId: string): boolean
  • toBashTaskId(processId: string): string (e.g. bash:${processId})
  • fromBashTaskId(taskId: string): string | null

Defensive checks:

  • assert(taskId.trim().length > 0)
  • reject malformed bash: ids early with explicit errors.

2.2 Implement task(kind=bash) in src/node/services/tools/task.ts

Dispatch based on args shape:

  • Agent path: unchanged.
  • Bash path:
    • Validate required deps: config.runtime, config.backgroundProcessManager, config.workspaceId.
    • If run_in_background=true:
      • Call backgroundProcessManager.spawn(runtime, workspaceId, script, { cwd, env, displayName, timeoutSecs }).
      • Return status:"running", taskId: toBashTaskId(processId).
    • If run_in_background=false:
      • Reuse existing bash execution logic (see 2.3) to preserve output limits/truncation.
      • Convert BashToolResultTaskToolCompletedResult.

2.3 De-duplicate bash execution logic

Refactor src/node/services/tools/bash.ts so that foreground execution is callable from both tools:

  • Extract a pure helper like executeBashForeground(config, args, ctx): Promise<BashToolResult>.
  • createBashTool() remains as thin wrapper.
  • createTaskTool() uses the helper for bash foreground tasks.

This avoids re-implementing:

  • line/byte limits
  • tmpfile overflow policy
  • streaming-to-UI behavior

2.4 Extend task_await to support bash tasks

In src/node/services/tools/task_await.ts:

  • Determine candidate IDs:
    • If args.task_ids provided: use them.
    • Else: union of
      • taskService.listActiveDescendantAgentTaskIds(workspaceId)
      • backgroundProcessManager.list() filtered to descendant workspaces + status === "running"
  • For each id:
    • If agent id: existing waitForAgentReport() behavior.
    • If bash id:
      • scope check: process.workspaceId must be workspaceId or a descendant agent workspace
      • call backgroundProcessManager.getOutput(processId, filter, filter_exclude, timeout_secs, abortSignal, workspaceId)
      • return:
        • status:"running" with output when running
        • status:"completed" with reportMarkdown and exitCode when exited/killed/failed
        • map interruptedstatus:"running" with note:"interrupted" (or add a new status if you choose to extend the schema)

2.5 Extend task_list to merge bash processes

In src/node/services/tools/task_list.ts:

  • Get descendant agent tasks as today.
  • Compute the set of descendant workspace IDs (include current workspace).
  • backgroundProcessManager.list() and include any process whose workspaceId is in that set.
  • Convert each process into a TaskListToolTaskSchema entry.
    • Depth: workspaceDepth(process.workspaceId) + 1 (or 1 if owned by current workspace).

2.6 Extend task_terminate to terminate bash tasks

In src/node/services/tools/task_terminate.ts:

  • For each id:
    • agent id → existing terminateDescendantAgentTask()
    • bash id → scope check + backgroundProcessManager.terminate(processId)

Optional improvement:

  • If terminating an agent task, also terminate any bash processes owned by that terminated subtree (defensive cleanup). This should be safe since BackgroundProcessManager.cleanup(workspaceId) already exists, but doing it explicitly makes the tool semantics deterministic.

2.7 Tool wiring: ensure task_* are init-wait wrapped

Because task_* will now execute bash (runtime-dependent), update src/common/utils/tools/tools.ts#getToolsForModel:

  • Wrap task, task_await, task_list, task_terminate with wrapWithInitWait(...) (same as bash today).
    • This prevents calling into runtime.exec() / BackgroundProcessManager before workspace init.
  • Keep ask_user_question, propose_plan, todo_*, status_set non-runtime.

3) Tool exposure + deprecation

3.1 Remove bash_* from the actual model toolset (not just prompt)

Today, getAvailableTools() is used for system prompt + tool-instruction extraction, but the model still receives whatever getToolsForModel() returns (unless a toolPolicy denies it).

To truly eliminate overlap:

  1. Filter the tools passed to the model to the allowlist from getAvailableTools(modelString, mode, options) (plus MCP tools). Best place: src/common/utils/tools/tools.ts#getToolsForModel just before returning.
  2. Once (1) is in place, remove these from getAvailableTools() (src/common/utils/tools/toolDefinitions.ts):
    • bash
    • bash_output
    • bash_background_list
    • bash_background_terminate
  3. Optional cleanup: stop constructing legacy bash_* tools at all (or gate behind a temporary env/feature flag).

3.2 Keep legacy bash_* as wrappers (short-lived)

To reduce risk, keep the tools around during rollout:

  • Update their descriptions to say “Deprecated: use task with kind:"bash".”
  • Optionally re-implement bash_output/list/terminate by delegating to the new task_* internals.
Why keep wrappers temporarily?
  • Some internal code paths / older sessions may still refer to bash_* tool names.
  • PTC sandbox currently lists bash_* as bridgeable tools.
  • Keeping wrappers allows phased migration without a flag day.

4) Frontend/UI updates (tool message rendering)

Because the model will now call task_* for bash operations, update task-tool renderers to handle the new union types:

  • src/browser/components/tools/TaskToolCall.tsx

    • TaskToolCall: render bash variant (script, display_name, background vs foreground)
    • TaskAwaitResult: if output present, render it (even when status !== completed)
    • TaskListItem: display display_name/script when agentType/title absent
    • TaskStatusBadge: optionally style bash terminal states if you extend the enum
  • src/browser/stories/App.task.stories.tsx + src/browser/stories/mockFactory.ts

    • Add a scenario showing a bash task spawned in background, then task_await output streaming.

5) Tests + validation

5.1 Backend tool tests

Add/extend tests under src/node/services/tools/:

  • task.bash.test.ts (new):
    • spawn bash task in background
    • task_await returns incremental output + running
    • task_terminate kills it
  • task_list includes bash processes for:
    • current workspace
    • descendant workspace (spawned by a child agent workspace)
  • scope validation:
    • parent can manage descendant’s bash tasks
    • unrelated workspace cannot

5.2 UI rendering tests (lightweight)

If there are existing component tests, add minimal assertions for bash-variant rendering. Otherwise rely on storybook + typecheck.

5.3 Manual validation checklist

  • make typecheck
  • Run targeted unit tests:
    • bun test src/node/services/tools/task_await.test.ts
    • bun test src/node/services/tools/task_list.test.ts (or add if missing)
    • bun test src/node/services/tools/bash_output.test.ts (ensure wrappers still work if kept)

6) Rollout plan

  1. Land schema + backend changes (task supports bash, task_await/list/terminate support bash IDs).
  2. Update UI tool renderers to avoid regressions.
  3. Remove bash_* from getAvailableTools() so models stop seeing them.
  4. After a release or two, delete legacy wrappers and remove bash_* from PTC bridgeables if desired.

Generated with mux • Model: openai:gpt-5.2 • Thinking: xhigh

Change-Id: Ic4a03eb4953916443fcc074412498e0853c0ff17
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: If25e133840a3d7cd29c32a0970cabd432a0e2b2d
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33 ThomasK33 force-pushed the task-bash-abstraction branch from 7860d42 to 986955f Compare December 23, 2025 19:33
Change-Id: I93d9236d1160456b3f4f3c05815b70f84e812012
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Ifa45aefee1298bd27137a4d1ffc6bf95517689b8
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I002ab1584a1c9e73ebdd06a126d5bf01acdd9756
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I1a869f9aea822126fdf7098161d5f5198fe12ee2
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Copy link
Member Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if (typeof timeoutSecs !== "number" || !Number.isFinite(timeoutSecs)) return undefined;
const timeoutMs = Math.floor(timeoutSecs * 1000);
if (timeoutMs <= 0) return undefined;
return timeoutMs;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor timeout_secs=0 for agent task_await

The updated schema allows timeout_secs to be 0 (non‑blocking) for task_await, but coerceTimeoutMs still treats <= 0 as “unset,” which falls back to the default 10‑minute wait in waitForAgentReport. If a caller uses timeout_secs: 0 to avoid blocking (especially when mixing bash + agent task IDs), agent tasks will still block for up to 10 minutes instead of returning immediately. Consider treating 0 as a real timeout (immediate return) or short‑circuiting agent tasks when timeout_secs === 0 to match the newly‑allowed semantics.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant