@MichaelMarav
Collaborator

Per-contact-frame SAC training framework.
Key features:

  • The initialization point is chosen at random on each reset.
  • Training does not stop when the contact frame (cf) loses contact with the ground. The step returns 0 reward and the last observation until the foot is in contact again.
  • Training stops and reset() is called when the filter has diverged (error greater than a threshold).
  • Incorporated prestepdqn into the time variable that I pass from step to reset and back.
  • Added serow convergence in reset, so the agent begins tuning only after the EKF has converged (reset calls .filter until it converges).
  • Eval function that stops training once the reward reaches a hyperparameter threshold (not sure if this is a good idea).
  • The reward is computed from either the Mahalanobis distance on SE(3) between the estimated and ground-truth (GT) pose, or a normalized SE(3) geodesic metric.
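
For reviewers, here is a rough sketch of the two reward variants mentioned in the last bullet. It is not the code in this PR: the function names, the decoupled position/rotation error (instead of a full SE(3) log map), the scaling constants, and the choice of negative distance as the reward are all assumptions made for illustration.

```python
import numpy as np


def so3_log(R):
    """Rotation vector (axis * angle) of a 3x3 rotation matrix.

    Not robust for angles close to pi; good enough for small pose errors.
    """
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * w  # first-order approximation near the identity
    return theta / (2.0 * np.sin(theta)) * w


def pose_error(p_est, R_est, p_gt, R_gt):
    """6-vector error: position difference, then rotation vector of R_gt^T R_est."""
    return np.concatenate([p_est - p_gt, so3_log(R_gt.T @ R_est)])


def mahalanobis_reward(err, cov):
    """Negative Mahalanobis distance of the error under a 6x6 covariance."""
    return -np.sqrt(err @ np.linalg.solve(cov, err))


def geodesic_reward(err, pos_scale=1.0, rot_scale=np.pi):
    """Negative normalized distance; the scales are hypothetical normalizers."""
    d_pos = np.linalg.norm(err[:3]) / pos_scale
    d_rot = np.linalg.norm(err[3:]) / rot_scale
    return -np.hypot(d_pos, d_rot)
```

Both functions return 0 for a perfect estimate and grow more negative as the estimated pose drifts from ground truth.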

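The step-time behaviour described in the list above (keep going through contact loss, terminate on filter divergence) can be sketched roughly as follows. The class and field names are hypothetical, and two details are assumptions rather than something stated in this PR: the divergence check takes priority over the contact check, and a diverged step returns a reward of 0.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: list
    reward: float
    done: bool


class ContactFrameStepLogic:
    """Hypothetical helper that isolates the branching done inside step()."""

    def __init__(self, divergence_threshold):
        self.divergence_threshold = divergence_threshold
        self.last_observation = None

    def step(self, in_contact, observation, error, reward):
        # Filter divergence ends the episode; the caller then calls reset().
        if error > self.divergence_threshold:
            return StepResult(observation, 0.0, True)
        # No contact: keep the episode alive, return zero reward and repeat
        # the last observation recorded while the foot was on the ground.
        if not in_contact:
            held = self.last_observation
            if held is None:
                held = observation  # no contact seen yet in this episode
            return StepResult(held, 0.0, False)
        # Normal case: remember the observation and pass the reward through.
        self.last_observation = observation
        return StepResult(observation, reward, False)
```

For example, a swing phase between two stance phases produces a run of zero-reward steps that all carry the observation from the last stance step.
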
GL

@MichaelMarav MichaelMarav requested a review from mrsp November 25, 2025 14:40
@MichaelMarav MichaelMarav self-assigned this Nov 25, 2025