lsst-ctrl-bps-htcondor 29.0.1__tar.gz → 29.1.0rc1__tar.gz
- {lsst_ctrl_bps_htcondor-29.0.1/python/lsst_ctrl_bps_htcondor.egg-info → lsst_ctrl_bps_htcondor-29.1.0rc1}/PKG-INFO +1 -1
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/doc/lsst.ctrl.bps.htcondor/userguide.rst +56 -2
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/htcondor_service.py +438 -209
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/lssthtc.py +864 -261
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/version.py +1 -1
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1/python/lsst_ctrl_bps_htcondor.egg-info}/PKG-INFO +1 -1
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/tests/test_htcondor_service.py +401 -133
- lsst_ctrl_bps_htcondor-29.1.0rc1/tests/test_lssthtc.py +1231 -0
- lsst_ctrl_bps_htcondor-29.0.1/tests/test_lssthtc.py +0 -320
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/COPYRIGHT +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/LICENSE +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/MANIFEST.in +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/README.rst +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/bsd_license.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/doc/lsst.ctrl.bps.htcondor/CHANGES.rst +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/doc/lsst.ctrl.bps.htcondor/index.rst +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/gpl-v3.0.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/pyproject.toml +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/__init__.py +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/etc/__init__.py +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/etc/htcondor_defaults.yaml +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/final_post.sh +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/handlers.py +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/htcondor_config.py +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/provisioner.py +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/SOURCES.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/dependency_links.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/requires.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/top_level.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/zip-safe +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/setup.cfg +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/tests/test_handlers.py +0 -0
- {lsst_ctrl_bps_htcondor-29.0.1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/tests/test_provisioner.py +0 -0
--- lsst_ctrl_bps_htcondor-29.0.1/doc/lsst.ctrl.bps.htcondor/userguide.rst
+++ lsst_ctrl_bps_htcondor-29.1.0rc1/doc/lsst.ctrl.bps.htcondor/userguide.rst
@@ -64,6 +64,27 @@ The plugin supports all settings described in `ctrl_bps documentation`__
 
 .. Describe any plugin specific aspects of defining a submission below if any.
 
+Job Ordering
+^^^^^^^^^^^^
+
+This plugin supports both ordering types of ``group`` and ``noop``.
+Job outputs are still underneath the ``jobs`` subdirectory.
+
+If one is looking at HTCondor information directly:
+
+* ``group`` ordering is implemented as subdags so you will see more dagman
+  jobs in the queue as well as a new ``subdags`` subdirectory for the
+  internal files for running a group. To enable running other subdags after
+  a failure but pruning downstream jobs, another job, name starting with
+  ``wms_check_status``, runs after the subdag to check for a failure and trigger
+  the pruning.
+
+* ``noop`` ordering is directly implemented as DAGMan NOOP jobs. These jobs
+  do not actually do anything, but provide a mechanism for telling HTCondor
+  about more job dependencies without using a large number (all-to-all) of
+  dependencies.
+
+
 Job Environment
 ^^^^^^^^^^^^^^^
 
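Note on the ``noop`` ordering added in the hunk above: the saving it describes is plain edge arithmetic. The sketch below is not the plugin's code; the function names, the job names, and the ``noop_barrier`` label are all hypothetical. It only compares how many dependencies are needed when every downstream job waits on every upstream job directly versus through a single do-nothing node.

```python
from itertools import product


def direct_edges(parents, children):
    """Every child depends directly on every parent (all-to-all)."""
    return list(product(parents, children))


def noop_edges(parents, children, noop="noop_barrier"):
    """Express the same ordering through one do-nothing node."""
    return [(p, noop) for p in parents] + [(noop, c) for c in children]


# Hypothetical job names, used only to count dependencies.
parents = [f"visit_{i}" for i in range(100)]
children = [f"coadd_{i}" for i in range(100)]

print(len(direct_edges(parents, children)))  # 10000
print(len(noop_edges(parents, children)))    # 200
```

With N upstream and M downstream jobs the direct form needs N*M dependencies while the intermediate node needs N+M, which is presumably why the guide calls out the all-to-all case.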
@@ -455,12 +476,42 @@ for a specific reason, here, the memory usage exceeded memory limits
 .. note::
 
    By default, BPS will automatically retry jobs that failed due to the out of
-   memory error (see `Automatic memory scaling`
+   memory error (see `Automatic memory scaling`_ section in **ctrl_bps**
    documentation for more information regarding this topic) and the issues
    illustrated by the above examples should only occur if automatic memory
    scalling was explicitly disabled in the submit YAML file.
 
-
+
+Automatic Releasing of Held Jobs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Many times releasing the jobs to just try again is successful because the
+system issues are transient.
+
+``releaseExpr`` can be set in the submit yaml to add automatic release
+conditions. Like other BPS config values, this can be set globally or
+set for a specific cluster or pipetask. The number of retries is still
+limited by the ``numberOfRetries``. All held jobs count towards this
+limit no matter what the reason. The plugin prohibits the automatic
+release of jobs held by user.
+
+Example expressions:
+
+* ``releaseExpr: "True"`` - will always release held job unless held by user.
+* ``releaseExpr: "HoldReasonCode =?= 7"`` - release jobs where the standard
+  output file for the job could not be opened.
+
+For more information about expressions, see HTCondor documentation:
+
+* HTCondor `ClassAd expressions`_
+* list of `HoldReasonCodes`_
+
+.. warning::
+
+   System problems should still be tracked and reported. All of the
+   hold reasons for a single completed run can be found via ``grep -A
+   2 held <submit dir>/*.nodes.log``.
+
 
 .. _htc-plugin-troubleshooting:
 
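Note on the ``grep -A 2 held <submit dir>/*.nodes.log`` command quoted in the hunk above: where a Python equivalent is more convenient, the same scan can be sketched as follows. This is only an illustration, not part of the package. The function name ``held_context`` is hypothetical, and nothing is assumed about the log format beyond what the grep command itself relies on (a line containing ``held`` followed by two lines of context).

```python
from pathlib import Path


def held_context(submit_dir, after=2):
    """Collect each line containing 'held' plus the `after` lines following it.

    Returns a list of (log file name, lines) blocks. Overlapping contexts are
    merged into one block, as grep -A does.
    """
    blocks = []
    for log in sorted(Path(submit_dir).glob("*.nodes.log")):
        remaining = 0  # context lines still owed to the current block
        for line in log.read_text(errors="replace").splitlines():
            if "held" in line:
                if remaining == 0:
                    blocks.append((log.name, []))  # start a new block
                blocks[-1][1].append(line)
                remaining = after
            elif remaining > 0:
                blocks[-1][1].append(line)
                remaining -= 1
    return blocks


if __name__ == "__main__":
    import sys

    # Usage: python held_context.py <submit dir>
    for name, lines in held_context(sys.argv[1]):
        print(f"{name}:")
        print("\n".join(lines))
        print("--")
```

Returning the blocks rather than printing them directly makes it straightforward to tally hold reasons across a run before reporting them.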
@@ -514,3 +565,6 @@ complete your run.
 .. _condor_release: https://htcondor.readthedocs.io/en/latest/man-pages/condor_release.html
 .. _condor_rm: https://htcondor.readthedocs.io/en/latest/man-pages/condor_rm.html
 .. _lsst_distrib: https://github.com/lsst/lsst_distrib.git
+.. _Automatic memory scaling: https://pipelines.lsst.io/v/weekly/modules/lsst.ctrl.bps/quickstart.html#automatic-memory-scaling
+.. _HoldReasonCodes: https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html#HoldReasonCode
+.. _ClassAd expressions: https://htcondor.readthedocs.io/en/latest/classads/classad-mechanism.html#classad-evaluation-semantics