lsst-ctrl-bps-htcondor 29.0.1rc1__tar.gz → 29.1.0rc1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. {lsst_ctrl_bps_htcondor-29.0.1rc1/python/lsst_ctrl_bps_htcondor.egg-info → lsst_ctrl_bps_htcondor-29.1.0rc1}/PKG-INFO +1 -1
  2. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/doc/lsst.ctrl.bps.htcondor/userguide.rst +56 -2
  3. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/htcondor_service.py +438 -209
  4. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/lssthtc.py +864 -261
  5. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/version.py +1 -1
  6. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1/python/lsst_ctrl_bps_htcondor.egg-info}/PKG-INFO +1 -1
  7. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/tests/test_htcondor_service.py +401 -133
  8. lsst_ctrl_bps_htcondor-29.1.0rc1/tests/test_lssthtc.py +1231 -0
  9. lsst_ctrl_bps_htcondor-29.0.1rc1/tests/test_lssthtc.py +0 -320
  10. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/COPYRIGHT +0 -0
  11. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/LICENSE +0 -0
  12. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/MANIFEST.in +0 -0
  13. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/README.rst +0 -0
  14. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/bsd_license.txt +0 -0
  15. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/doc/lsst.ctrl.bps.htcondor/CHANGES.rst +0 -0
  16. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/doc/lsst.ctrl.bps.htcondor/index.rst +0 -0
  17. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/gpl-v3.0.txt +0 -0
  18. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/pyproject.toml +0 -0
  19. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/__init__.py +0 -0
  20. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/etc/__init__.py +0 -0
  21. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/etc/htcondor_defaults.yaml +0 -0
  22. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/final_post.sh +0 -0
  23. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/handlers.py +0 -0
  24. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/htcondor_config.py +0 -0
  25. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst/ctrl/bps/htcondor/provisioner.py +0 -0
  26. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/SOURCES.txt +0 -0
  27. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/dependency_links.txt +0 -0
  28. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/requires.txt +0 -0
  29. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/top_level.txt +0 -0
  30. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/python/lsst_ctrl_bps_htcondor.egg-info/zip-safe +0 -0
  31. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/setup.cfg +0 -0
  32. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/tests/test_handlers.py +0 -0
  33. {lsst_ctrl_bps_htcondor-29.0.1rc1 → lsst_ctrl_bps_htcondor-29.1.0rc1}/tests/test_provisioner.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: lsst-ctrl-bps-htcondor
- Version: 29.0.1rc1
+ Version: 29.1.0rc1
  Summary: HTCondor plugin for lsst-ctrl-bps.
  Author-email: Rubin Observatory Data Management <dm-admin@lists.lsst.org>
  License: BSD 3-Clause License
@@ -64,6 +64,27 @@ The plugin supports all settings described in `ctrl_bps documentation`__
 
  .. Describe any plugin specific aspects of defining a submission below if any.
 
+ Job Ordering
+ ^^^^^^^^^^^^
+
+ This plugin supports both ordering types of ``group`` and ``noop``.
+ Job outputs are still underneath the ``jobs`` subdirectory.
+
+ If one is looking at HTCondor information directly:
+
+ * ``group`` ordering is implemented as subdags so you will see more dagman
+ jobs in the queue as well as a new ``subdags`` subdirectory for the
+ internal files for running a group. To enable running other subdags after
+ a failure but pruning downstream jobs, another job, name starting with
+ ``wms_check_status``, runs after the subdag to check for a failure and trigger
+ the pruning.
+
+ * ``noop`` ordering is directly implemented as DAGMan NOOP jobs. These jobs
+ do not actually do anything, but provide a mechanism for telling HTCondor
+ about more job dependencies without using a large number (all-to-all) of
+ dependencies.
+
+
  Job Environment
  ^^^^^^^^^^^^^^^
 
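The saving that the new Job Ordering text attributes to ``noop`` ordering can be illustrated with a small count. This is only a sketch: the function names are hypothetical and not part of the plugin, and it assumes the simplest case where every job in one group must finish before any job in the next group starts.

```python
def all_to_all_edges(n_parents: int, n_children: int) -> int:
    """Dependencies needed when every parent is linked directly to every child."""
    return n_parents * n_children


def noop_edges(n_parents: int, n_children: int) -> int:
    """Dependencies needed when a single NOOP node sits between the two groups.

    Each parent points at the NOOP node and the NOOP node points at each child.
    """
    return n_parents + n_children


if __name__ == "__main__":
    # Compare both layouts for a few group sizes.
    for size in (10, 100, 1000):
        print(size, all_to_all_edges(size, size), noop_edges(size, size))
```

Under that assumption the direct layout grows with the product of the group sizes, while the NOOP layout grows with their sum.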
@@ -455,12 +476,42 @@ for a specific reason, here, the memory usage exceeded memory limits
  .. note::
 
  By default, BPS will automatically retry jobs that failed due to the out of
- memory error (see `Automatic memory scaling`__ section in **ctrl_bps**
+ memory error (see `Automatic memory scaling`_ section in **ctrl_bps**
  documentation for more information regarding this topic) and the issues
  illustrated by the above examples should only occur if automatic memory
  scalling was explicitly disabled in the submit YAML file.
 
- .. __: https://pipelines.lsst.io/v/weekly/modules/lsst.ctrl.bps/quickstart.html#automatic-memory-scaling
+
+ Automatic Releasing of Held Jobs
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Many times releasing the jobs to just try again is successful because the
+ system issues are transient.
+
+ ``releaseExpr`` can be set in the submit yaml to add automatic release
+ conditions. Like other BPS config values, this can be set globally or
+ set for a specific cluster or pipetask. The number of retries is still
+ limited by the ``numberOfRetries``. All held jobs count towards this
+ limit no matter what the reason. The plugin prohibits the automatic
+ release of jobs held by user.
+
+ Example expressions:
+
+ * ``releaseExpr: "True"`` - will always release held job unless held by user.
+ * ``releaseExpr: "HoldReasonCode =?= 7"`` - release jobs where the standard
+ output file for the job could not be opened.
+
+ For more information about expressions, see HTCondor documentation:
+
+ * HTCondor `ClassAd expressions`_
+ * list of `HoldReasonCodes`_
+
+ .. warning::
+
+ System problems should still be tracked and reported. All of the
+ hold reasons for a single completed run can be found via ``grep -A
+ 2 held <submit dir>/*.nodes.log``.
+
 
  .. _htc-plugin-troubleshooting:
 
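The release rules described in the added "Automatic Releasing of Held Jobs" text can be modeled roughly as follows. This is a plain-Python sketch, not the plugin's implementation: ``HeldJob``, ``should_release`` and the two predicates are hypothetical names, the ``releaseExpr`` ClassAd expression is stood in for by a Python callable, and the exact way ``numberOfRetries`` is compared against the hold count is an assumption. The value 1 for a user hold follows the HTCondor list of hold reason codes.

```python
from dataclasses import dataclass
from typing import Callable

USER_HOLD_CODE = 1  # HTCondor HoldReasonCode for a job put on hold by the user


@dataclass
class HeldJob:
    """Hypothetical stand-in for the relevant parts of a held job's ClassAd."""

    hold_reason_code: int
    retries_used: int  # assumption: every earlier hold consumed one retry


def should_release(
    job: HeldJob,
    release_expr: Callable[[HeldJob], bool],
    number_of_retries: int,
) -> bool:
    """Decide whether a held job would be released automatically."""
    if job.hold_reason_code == USER_HOLD_CODE:
        return False  # jobs held by the user are never auto-released
    if job.retries_used >= number_of_retries:
        return False  # retry budget exhausted, whatever the hold reason
    return release_expr(job)


def always(job: HeldJob) -> bool:
    """Python stand-in for releaseExpr: "True"."""
    return True


def stdout_not_opened(job: HeldJob) -> bool:
    """Python stand-in for releaseExpr: "HoldReasonCode =?= 7"."""
    return job.hold_reason_code == 7
```

For example, ``should_release(HeldJob(7, 0), stdout_not_opened, 3)`` is true in this model, while the same job with ``hold_reason_code=1`` is kept on hold regardless of the expression.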
@@ -514,3 +565,6 @@ complete your run.
  .. _condor_release: https://htcondor.readthedocs.io/en/latest/man-pages/condor_release.html
  .. _condor_rm: https://htcondor.readthedocs.io/en/latest/man-pages/condor_rm.html
  .. _lsst_distrib: https://github.com/lsst/lsst_distrib.git
+ .. _Automatic memory scaling: https://pipelines.lsst.io/v/weekly/modules/lsst.ctrl.bps/quickstart.html#automatic-memory-scaling
+ .. _HoldReasonCodes: https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html#HoldReasonCode
+ .. _ClassAd expressions: https://htcondor.readthedocs.io/en/latest/classads/classad-mechanism.html#classad-evaluation-semantics
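The ``grep -A 2 held <submit dir>/*.nodes.log`` command from the added warning can be approximated in Python when the hold reasons need further processing. This is a minimal sketch: ``held_with_context`` and ``held_in_submit_dir`` are hypothetical helpers, not part of the plugin, and unlike ``grep`` they do not merge overlapping context windows.

```python
from pathlib import Path


def held_with_context(lines: list[str], context: int = 2) -> list[list[str]]:
    """Return each line containing 'held' plus the next `context` lines."""
    return [
        lines[i : i + context + 1]
        for i, line in enumerate(lines)
        if "held" in line
    ]


def held_in_submit_dir(submit_dir: str, context: int = 2) -> dict[str, list[list[str]]]:
    """Apply held_with_context to every *.nodes.log file in a submit directory."""
    results = {}
    for log in sorted(Path(submit_dir).glob("*.nodes.log")):
        lines = log.read_text(errors="replace").splitlines()
        matches = held_with_context(lines, context)
        if matches:
            results[log.name] = matches
    return results
```

Calling ``held_in_submit_dir`` with the path of a completed run's submit directory would return the matching log excerpts keyed by log file name.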