[ECS] [request]: Add GPU metrics support for ECS Managed Instances #2734

@michal-kosinski

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Please add GPU monitoring metrics for ECS Managed Instances to help track GPU utilization, memory usage, and other GPU-related performance indicators.

Which service(s) is this request for?
Amazon ECS (Elastic Container Service) - specifically for ECS Managed Instances

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
When running GPU-enabled workloads on ECS Managed Instances, there is currently no native way to monitor GPU metrics such as GPU utilization, GPU memory usage, temperature, and other performance indicators. This makes it difficult to:

  • Optimize GPU resource allocation and utilization
  • Troubleshoot performance issues with GPU workloads
  • Make informed scaling decisions based on actual GPU usage
  • Monitor costs effectively for GPU-enabled instances

Are you currently working around this issue?
We are currently exploring alternative monitoring approaches, but would prefer native support through the existing ECS metrics framework, as described in the additional metrics documentation.
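
For readers looking for an interim approach, one possible sketch (not something the issue author describes) is to poll `nvidia-smi` from a process that can see the GPU and parse its CSV output into per-GPU metrics. It assumes the NVIDIA driver and `nvidia-smi` are reachable from wherever the code runs, which may not hold on ECS Managed Instances. The function names `read_nvidia_smi` and `parse_gpu_metrics` and the output key names are hypothetical, chosen only for illustration.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; the order must match the unpacking below.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]


def read_nvidia_smi() -> str:
    """Run nvidia-smi and return its CSV output (assumes an NVIDIA driver is present)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_gpu_metrics(csv_text: str) -> list[dict]:
    """Turn nvidia-smi CSV rows into one dict of numeric metrics per GPU."""
    metrics = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        index, util, mem_used, mem_total, temp = (cell.strip() for cell in row)
        metrics.append({
            "gpu_index": int(index),
            "utilization_percent": float(util),
            "memory_used_mib": float(mem_used),
            "memory_total_mib": float(mem_total),
            "memory_used_percent": round(100.0 * float(mem_used) / float(mem_total), 1),
            "temperature_c": float(temp),
        })
    return metrics
```

Calling `parse_gpu_metrics(read_nvidia_smi())` on a GPU host would yield one dictionary per GPU, which could then be forwarded to whatever metrics backend is in use. The sketch does not handle fields that `nvidia-smi` reports as unsupported on a given GPU, and it is a workaround rather than a substitute for the native support requested here.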

Additional context
N/A

Attachments
N/A

Labels: Proposed (Community submitted issue)