Feature request: detect when spot instances are pre-empted and re-submit #100

@aksarkar

We typically run Batch workflows on AWS spot instances to take advantage of cost savings when possible.

However, when a redun task is interrupted because its host is terminated, the scheduler halts, which can mean a lot of lost work.

It would be helpful for redun to detect this case and re-submit the task without halting, up to the configured maximum number of re-submits.
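
To make the request concrete, here is a minimal sketch of the behavior, not redun's actual implementation. It assumes that AWS Batch reports a reclaimed host with a job status reason beginning with `Host EC2` (the pattern used in AWS Batch retry-strategy examples). The names `is_spot_interruption`, `run_with_resubmits`, `submit_and_wait`, and `max_resubmits` are hypothetical and exist only for illustration.

```python
from fnmatch import fnmatchcase
from typing import Callable, Optional

# Assumption: a job whose host was reclaimed fails with a status reason
# such as "Host EC2 (instance i-...) terminated."
SPOT_INTERRUPTION_PATTERN = "Host EC2*"


def is_spot_interruption(status_reason: Optional[str]) -> bool:
    """Return True if the failure reason looks like a host termination."""
    return status_reason is not None and fnmatchcase(
        status_reason, SPOT_INTERRUPTION_PATTERN
    )


def run_with_resubmits(
    submit_and_wait: Callable[[], Optional[str]], max_resubmits: int
) -> int:
    """Run a job, re-submitting it only when its host was terminated.

    `submit_and_wait` is a hypothetical callable that submits the job,
    blocks until it finishes, and returns None on success or the failure
    status reason otherwise. Returns the number of re-submits used.
    """
    resubmits = 0
    while True:
        reason = submit_and_wait()
        if reason is None:
            return resubmits
        # Any other failure, or an exhausted budget, still halts as before.
        if not is_spot_interruption(reason) or resubmits >= max_resubmits:
            raise RuntimeError(f"job failed: {reason}")
        resubmits += 1
```

With this shape, a task that is pre-empted once and then succeeds would complete after one re-submit, while ordinary task failures would still surface immediately.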
