Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
Please add GPU monitoring metrics for ECS Managed Instances to help track GPU utilization, memory usage, and other GPU-related performance indicators.
Which service(s) is this request for?
Amazon ECS (Elastic Container Service) - specifically for ECS Managed Instances
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
When running GPU-enabled workloads on ECS Managed Instances, there is currently no native way to monitor GPU metrics such as GPU utilization, GPU memory usage, temperature, and other performance indicators. This makes it difficult to:
- Optimize GPU resource allocation and utilization
- Troubleshoot performance issues with GPU workloads
- Make informed scaling decisions based on actual GPU usage
- Monitor costs effectively for GPU-enabled instances
Are you currently working around this issue?
We are currently exploring alternative monitoring approaches, but would prefer native support through the existing ECS metrics framework described in the additional metrics documentation.
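For illustration, one possible stopgap (not an official ECS feature) is to poll `nvidia-smi` and publish the readings as custom CloudWatch metrics. The sketch below assumes a process that can see the GPUs and run `nvidia-smi` (for example, a sidecar container). The metric names, the `ClusterName`/`GpuIndex` dimensions, and the namespace used below are hypothetical choices. The `--query-gpu` fields and `--format=csv,noheader,nounits` are standard `nvidia-smi` options.

```python
import csv
import io
import shutil
import subprocess

# Standard nvidia-smi --query-gpu field names, in the column order we parse below.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]


def read_nvidia_smi() -> str:
    """Run nvidia-smi and return raw CSV output (no header row, no unit suffixes)."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    return subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout


def _num(text: str):
    """Parse a numeric field; nvidia-smi prints values like "[N/A]" when unsupported."""
    try:
        return float(text)
    except ValueError:
        return None


def to_metric_data(raw: str, cluster: str) -> list:
    """Convert nvidia-smi CSV rows into CloudWatch MetricData entries, one set per GPU."""
    metrics = []
    for row in csv.reader(io.StringIO(raw)):
        if len(row) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        index, util, mem_used, mem_total, temp = (v.strip() for v in row)
        util, mem_used, mem_total, temp = map(_num, (util, mem_used, mem_total, temp))
        dims = [{"Name": "ClusterName", "Value": cluster}, {"Name": "GpuIndex", "Value": index}]
        mem_pct = 100.0 * mem_used / mem_total if mem_used is not None and mem_total else None
        candidates = [
            ("GPUUtilization", util, "Percent"),
            # nvidia-smi reports MiB; CloudWatch has no MiB unit, so "Megabytes" is approximate.
            ("GPUMemoryUsed", mem_used, "Megabytes"),
            ("GPUMemoryUtilization", mem_pct, "Percent"),
            ("GPUTemperature", temp, "None"),  # degrees Celsius
        ]
        for name, value, unit in candidates:
            if value is not None:  # omit metrics the GPU does not report
                metrics.append({"MetricName": name, "Dimensions": dims, "Value": value, "Unit": unit})
    return metrics
```

The resulting list could then be passed to boto3, e.g. `boto3.client("cloudwatch").put_metric_data(Namespace="Custom/ECS/GPU", MetricData=to_metric_data(read_nvidia_smi(), "my-cluster"))`, on a timer. Hosts with many GPUs may need to batch requests to stay within CloudWatch request limits. This is the kind of custom plumbing that native ECS GPU metrics would make unnecessary.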
Additional context
N/A
Attachments
N/A