lsst-ctrl-bps-htcondor 29.2025.2200__tar.gz → 29.2025.2400__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {lsst_ctrl_bps_htcondor-29.2025.2200/python/lsst_ctrl_bps_htcondor.egg-info → lsst_ctrl_bps_htcondor-29.2025.2400}/PKG-INFO +1 -1
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/doc/lsst.ctrl.bps.htcondor/userguide.rst +52 -10
- lsst_ctrl_bps_htcondor-29.2025.2400/python/lsst/ctrl/bps/htcondor/version.py +2 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400/python/lsst_ctrl_bps_htcondor.egg-info}/PKG-INFO +1 -1
- lsst_ctrl_bps_htcondor-29.2025.2200/python/lsst/ctrl/bps/htcondor/version.py +0 -2
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/COPYRIGHT +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/LICENSE +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/MANIFEST.in +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/README.rst +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/bsd_license.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/doc/lsst.ctrl.bps.htcondor/CHANGES.rst +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/doc/lsst.ctrl.bps.htcondor/index.rst +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/gpl-v3.0.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/pyproject.toml +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/__init__.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/etc/__init__.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/etc/htcondor_defaults.yaml +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/final_post.sh +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/handlers.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/htcondor_config.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/htcondor_service.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/lssthtc.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst/ctrl/bps/htcondor/provisioner.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst_ctrl_bps_htcondor.egg-info/SOURCES.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst_ctrl_bps_htcondor.egg-info/dependency_links.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst_ctrl_bps_htcondor.egg-info/requires.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst_ctrl_bps_htcondor.egg-info/top_level.txt +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/python/lsst_ctrl_bps_htcondor.egg-info/zip-safe +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/setup.cfg +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/tests/test_handlers.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/tests/test_htcondor_service.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/tests/test_lssthtc.py +0 -0
- {lsst_ctrl_bps_htcondor-29.2025.2200 → lsst_ctrl_bps_htcondor-29.2025.2400}/tests/test_provisioner.py +0 -0
@@ -181,7 +181,7 @@ DAG, this status can lag behind by a few minutes. Also, DAGMan tracks
 deletion of individual jobs as failures (no separate counts for
 deleted jobs). So the summary report flag column will show ``F`` when
 there are either failed or deleted jobs. If getting a detailed report
-(``bps report --id <
+(``bps report --id <ID>``), the plugin reads detailed job information
 from files. So, the detailed report can distinguish between failed and
 deleted jobs, and thus will show ``D`` in the flag column for a running
 workflow if there is a deleted job.

@@ -202,7 +202,7 @@ jobs are being held, use

 .. code-block:: bash

-   condor_q -hold <
+   condor_q -hold <ID>     # to see a specific job being held
    condor-q -hold <user>   # to see all held jobs owned by the user

 .. _htc-plugin-cancel:

@@ -231,18 +231,18 @@ See `bps restart`_.

 .. Describe any plugin specific aspects of restarting failed jobs below
    if any.

-A valid run
+A valid run ID is one of the following:

-* job
+* job ID, e.g., ``1234.0`` (using just the cluster ID, ``1234``, will also
   work),
-* global job
+* global job ID (e.g.,
   ``sdfrome002.sdf.slac.stanford.edu#165725.0#1699393748``),
 * run's submit directory (e.g.,
   ``/sdf/home/m/mxk/lsst/bps/submit/u/mxk/pipelines_check/20230713T135346Z``).

 .. note::

-   If you don't remember any of the run's
+   If you don't remember any of the run's ID you may try running

    .. code::

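The three accepted run ID forms added in the hunk above can be told apart mechanically. A minimal illustrative sketch follows; the ``classify_run_id`` helper and its pattern heuristics are hypothetical and not part of the plugin:

```python
import re

def classify_run_id(run_id: str) -> str:
    """Classify a BPS run ID string into one of the three accepted forms."""
    # Global job ID: schedd hostname + '#' + cluster.proc + '#' + timestamp.
    if re.fullmatch(r"[^#\s]+#\d+\.\d+#\d+", run_id):
        return "global job ID"
    # Plain job ID: cluster (e.g. 1234) or cluster.proc (e.g. 1234.0).
    if re.fullmatch(r"\d+(\.\d+)?", run_id):
        return "job ID"
    # Otherwise assume it is the path to the run's submit directory.
    return "submit directory"

print(classify_run_id("1234.0"))
print(classify_run_id("sdfrome002.sdf.slac.stanford.edu#165725.0#1699393748"))
print(classify_run_id("/sdf/home/m/mxk/lsst/bps/submit/u/mxk/pipelines_check/20230713T135346Z"))
```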
@@ -299,7 +299,7 @@ alongside the other payload jobs in the workflow that should automatically
 create and maintain glideins required for the payload jobs to run.

 If you enable automatic provisioning of resources, you will see the status of
-the provisioning job in the output of the ``bps report --id <
+the provisioning job in the output of the ``bps report --id <ID>`` command.
 Look for the line starting with "Provisioning job status". For example

 .. code-block:: bash

@@ -446,7 +446,7 @@ If any of your jobs are being held, it will display something similar to::

 The job that is in the hold state can be released from it with
 `condor_release`_ providing the issue that made HTCondor put it in this state
-has been resolved. For example, if your job with
+has been resolved. For example, if your job with ID 1234.0 was placed in the
 hold state because during the execution it exceeded 2048 MiB you requested for
 it during the submission, you can double the amount of memory it should request with

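The "double the memory request" recovery described in the hunk above amounts to a simple bump-with-ceiling policy. A tiny sketch of that arithmetic; the ``bumped_request_memory`` helper and the 16384 MiB cap are hypothetical, not anything the plugin defines:

```python
def bumped_request_memory(requested_mib: int, cap_mib: int = 16384) -> int:
    """Double a held job's memory request, capped at a site limit (hypothetical policy)."""
    return min(requested_mib * 2, cap_mib)

# Job held after exceeding its 2048 MiB request -> resubmit asking for 4096 MiB.
print(bumped_request_memory(2048))
```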
@@ -538,7 +538,49 @@ Troubleshooting
 Where is stdout/stderr from pipeline tasks?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-For now, stdout/stderr can be found in files in the run submit directory
+For now, stdout/stderr can be found in files in the run submit directory
+after the job is done. Python logging goes to stderr so the majority
+of the pipetask output will be in the \*.err file. One exception is
+``finalJob`` which does print some information to stdout (\*.out file)
+
+While the job is running, the owner of the job can use ``condor_tail``
+command to peek at the stdout/stderr of a job. ``bps`` uses the ID for
+the entire workflow. But for the HTCondor command ``condor_tail``
+you will need the ID for the individual job. Run the following command
+and look for the ID for the job (undefined's are normal and normally
+correspond to the DAGMan jobs).
+
+.. code-block::
+
+   condor_q -run -nobatch -af:hj bps_job_name bps_run
+
+Once you have the HTCondor ID for the particular job you want to peek
+at the output, run this command:
+
+.. code-block::
+
+   condor_tail -stderr -f <ID>
+
+If you want to instead see the stdout, leave off the ``-stderr``.
+If you need to see more of the contents specify ``-maxbytes <numbytes>``
+(defaults to 1024 bytes).
+
+I need to look around on the compute node where my job is running.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If using glideins, you might be able to just ``ssh`` to the compute
+node from the submit node. First, need to find out on which node the
+job is running.
+
+.. code-block::
+
+   condor_q -run -nobatch -af:hj RemoteHost bps_job_name bps_run
+
+Alternatively, HTCondor has the command ``condor_ssh_to_job`` where you
+just need the job ID. This is not the workflow ID (the ID that ``bps``
+commands use), but an individual job ID. The command above also prints
+the job IDs.
+

 Why did my submission fail?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

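The new troubleshooting text above has the user scan ``condor_q -run -nobatch -af:hj ...`` output by eye for per-job IDs. Picking the IDs out can also be scripted; the sketch below parses a made-up sample of such tabular output (the sample rows, column values, and the ``job_ids_by_name`` helper are all hypothetical illustrations, not real ``condor_q`` output):

```python
# Hypothetical sample in the shape produced by ``condor_q -af:h`` style
# autoformat output: a header row, then one row per job.  Rows whose
# bps_job_name is "undefined" correspond to DAGMan jobs.
sample_output = """\
ID       bps_job_name      bps_run
1234.0   undefined         undefined
1235.0   calibrate_903342  u_mxk_pipelines_check_20230713T135346Z
1236.0   characterizeImage u_mxk_pipelines_check_20230713T135346Z
"""

def job_ids_by_name(text: str) -> dict[str, str]:
    """Map BPS job names to individual HTCondor job IDs, skipping DAGMan rows."""
    jobs = {}
    for line in text.splitlines()[1:]:  # skip the header row
        job_id, name, _run = line.split()
        if name != "undefined":         # undefined rows are DAGMan jobs
            jobs[name] = job_id
    return jobs

print(job_ids_by_name(sample_output))
```

A job ID picked out this way is what ``condor_tail`` and ``condor_ssh_to_job`` expect.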
@@ -556,7 +598,7 @@ will continue normally until the existing gliedins expire. As a result,
 payload jobs may get stuck in the job queue if the glideins were not created
 or expired before the execution of the workflow could be completed.

-Firstly, use ``bps report --id <run
+Firstly, use ``bps report --id <run ID>`` to display the run report and look
 for the line

 .. code-block::
