Feature 1540 throttles for ci
Created by: TerrenceMcGuinness-NOAA
Description
As a systems administrator managing the CI Bash scripts on an HPC resource, I need a way to throttle the number of PRs and cases running concurrently so that we do not flood the system.
This includes the ability to kill any jobs that are concurrently running in the event that any case for the PR fails.
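One way the kill-on-failure behavior could look is sketched below. It is an illustration only, not the code in this PR: the function names `jobs_for_pr` and `cancel_pr_jobs` are hypothetical, and the sketch assumes each Slurm job name carries a `PR_<number>_` tag, which may not match the actual naming scheme. The `squeue` and `scancel` calls use standard Slurm options.

```shell
#!/usr/bin/env bash
# Sketch only: assumes job names contain the tag "PR_<number>_" (hypothetical scheme).

# jobs_for_pr PR_NUMBER
# Reads "JOBID JOBNAME" lines on stdin and prints the IDs of jobs tagged for the PR.
jobs_for_pr() {
  local pr="$1" id name
  while read -r id name; do
    # The trailing underscore keeps PR 154 from matching PR 1540.
    [[ "${name}" == *"PR_${pr}_"* ]] && echo "${id}"
  done
  return 0
}

# cancel_pr_jobs PR_NUMBER
# Cancels every queued or running job of the current user that is tagged for the PR.
cancel_pr_jobs() {
  local pr="$1"
  # %i = job ID, %j = job name; -h suppresses the header line.
  squeue -h -u "${USER}" -o "%i %j" | jobs_for_pr "${pr}" | xargs -r scancel
}
```

Keeping the filter (`jobs_for_pr`) separate from the scheduler calls lets the matching logic be checked on a login node without submitting or cancelling anything.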
Requirements
The number of PRs and cases that can run at any one time must be configurable.
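A minimal sketch of how such configurable limits might be applied follows. The variable names `MAX_CONCURRENT_PRS` and `MAX_CONCURRENT_CASES`, their defaults, and the helper functions are assumptions made for illustration; they are not taken from the CI scripts themselves.

```shell
#!/usr/bin/env bash
# Hypothetical limits; override them in the environment or a config file.
MAX_CONCURRENT_PRS="${MAX_CONCURRENT_PRS:-2}"
MAX_CONCURRENT_CASES="${MAX_CONCURRENT_CASES:-4}"

# can_launch RUNNING LIMIT
# Succeeds (exit status 0) only if one more item may start under the limit.
can_launch() {
  local running="$1" limit="$2"
  (( running < limit ))
}

# select_to_launch RUNNING LIMIT ITEM...
# Prints, one per line, the queued items that fit under the limit, in order.
select_to_launch() {
  local running="$1" limit="$2"
  shift 2
  local item
  for item in "$@"; do
    can_launch "${running}" "${limit}" || break
    echo "${item}"
    running=$(( running + 1 ))
  done
}
```

For example, with one PR already running and a limit of two, `select_to_launch 1 2 101 102 103` prints only `101`; the remaining PRs wait for a later pass. The same helper can be reused for cases by passing `MAX_CONCURRENT_CASES` as the limit.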
Acceptance Criteria (Definition of Done)
The CI framework must demonstrate in real time that, when multiple PRs and cases are specified, only the allowed number of each is running at any given time, and that forcing a case to fail terminates all currently running cases for that PR from the scheduler.
Type of change
Please delete options that are not relevant.
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update
How Has This Been Tested?
- Clone and build tests on Hera and/or Orion
- Cycled test on Hera and/or Orion with multiple PRs and cases running at once
- Slurm jobs get killed on failure and only an upper bound of PRs continue to run
Checklist
- My code follows the style guidelines of this project
- I have performed a self-review of my own code
- [] I have commented my code, particularly in hard-to-understand areas
- My changes need updates to the documentation. I have made corresponding changes to the documentation
- My changes generate no new warnings
- New and existing tests pass with my changes
- Any dependent changes have been merged and published
This closes Issue #1540