109 commits
30f54ca
initial version of dynamic file list classes
astro-friedel May 13, 2024
69d8f02
integrated dynamic file into output file handling
astro-friedel May 21, 2024
882e3ba
data flow kernel changes to accommodate dynamic file lists
astro-friedel Jun 7, 2024
ce369aa
Merge remote-tracking branch 'upstream/master' into fixing_dynamic_fi…
astro-friedel Jun 7, 2024
7138adc
Auto stash before checking out "HEAD"
astro-friedel Jun 7, 2024
5bff70f
creation of file tale in the monitoring
astro-friedel Jun 7, 2024
6025691
added initial file provenance data in database
astro-friedel Jun 14, 2024
efc3b14
fixed error where uuid's were not strings
astro-friedel Jun 17, 2024
222166a
fixed typos in names
astro-friedel Jun 17, 2024
92597f6
initial working version
astro-friedel Jun 18, 2024
8b922d9
Merge branch 'fixing_dynamic_file_inputs_and_outputs' into trackingFi…
astro-friedel Jun 27, 2024
632890b
added flask-wtf to monitoring requirements for form processing
astro-friedel Jun 27, 2024
17e5c43
added file size and md5sum tracking for files
astro-friedel Jun 27, 2024
d8df5fe
fixed issue with clean_copy in dynamic files
astro-friedel Jun 27, 2024
b16cad6
added initial provenance interface to flask pages
astro-friedel Jun 27, 2024
0275b28
indentation fix
astro-friedel Jul 1, 2024
3a1238b
fixed database code for provenance tracking
astro-friedel Jul 1, 2024
bb013fe
added environment tracking to monitoring
astro-friedel Jul 9, 2024
bc8247a
Merge remote-tracking branch 'upstream/master' into trackingFileProve…
astro-friedel Jul 31, 2024
45af5f9
added file provenance tracking as an option to monitoring framework
astro-friedel Jul 31, 2024
cd99828
better reporting on environment
astro-friedel Jul 31, 2024
558d170
ensure that files are tagged with the task id that generated them, no…
astro-friedel Jul 31, 2024
05caec8
get the task reporting the environment correctly
astro-friedel Jul 31, 2024
8f212ba
only provide file link if files were actually used in the workflow
astro-friedel Jul 31, 2024
3ade95a
only provide file link if there were files
astro-friedel Jul 31, 2024
7501cc3
properly report environment with file details
astro-friedel Jul 31, 2024
66238e5
properly format and report files
astro-friedel Jul 31, 2024
00ffa6f
make header responsive to url
astro-friedel Jul 31, 2024
da73f91
fix bug in file size reporting
astro-friedel Jul 31, 2024
76b8008
documentation on file provenance
astro-friedel Jul 31, 2024
93b17b0
fix bug in format
astro-friedel Jul 31, 2024
1e004a6
get the correct timestamp for the file
astro-friedel Sep 17, 2024
8dde82c
remove unneeded prints
astro-friedel Sep 17, 2024
cb550ee
auto determine file size, md5sum, timestamp if possible
astro-friedel Sep 17, 2024
5ebd009
refactor variable
astro-friedel Sep 17, 2024
baf2332
make sure dfk is propagated from dynamic file list to children
astro-friedel Sep 17, 2024
79211bc
documentation and annotation cleanup
astro-friedel Sep 17, 2024
825842f
cleanup
astro-friedel Sep 17, 2024
117e66d
Merge remote-tracking branch 'upstream/master' into trackingFileProve…
astro-friedel Sep 17, 2024
50d989d
removed list inheritance
astro-friedel Oct 31, 2024
14c6bc6
make sure the DynamicFileList is preserved
astro-friedel Oct 31, 2024
e116fe1
fixed dynamic file list indexing
astro-friedel Nov 12, 2024
9a90bff
Merge branch 'fixingDynamicFilesBug' into addingDynamicFiles
astro-friedel Nov 12, 2024
8c9a2a0
backed out DynamicFile stuff so that this branch is pure file tracking
astro-friedel Nov 12, 2024
eff8ab6
Merge branch 'master' into trackingFileProvenance
astro-friedel Nov 12, 2024
8a24a36
Merge branch 'master' into addingDynamicFiles
astro-friedel Nov 12, 2024
5ca48cf
Merge branch 'master' into trackingFileProvenance
astro-friedel Nov 27, 2024
9a05b2c
reorganized to group similar codes together
astro-friedel Nov 27, 2024
14aac2b
fixed message format
astro-friedel Nov 27, 2024
585fd03
fixed some typos
astro-friedel Nov 27, 2024
19f7747
updates to include misc info table
astro-friedel Nov 27, 2024
27f6391
updated docs
astro-friedel Nov 27, 2024
97ade30
fixed bug for remote files
astro-friedel Nov 27, 2024
33be080
test for provenance framework
astro-friedel Nov 27, 2024
07c2e45
flake8 fixes
astro-friedel Nov 27, 2024
97108e1
fixed missing line in docs
astro-friedel Nov 27, 2024
901e95f
Merge branch 'trackingFileProvenance' into addingDynamicFiles
astro-friedel Dec 2, 2024
d4ab526
Merge branch 'Parsl:master' into addingDynamicFiles
astro-friedel Dec 2, 2024
2554d45
more updates from file provenance
astro-friedel Dec 2, 2024
33a7336
Merge remote-tracking branch 'origin/addingDynamicFiles' into addingD…
astro-friedel Dec 2, 2024
9e482fe
fuxes to make sure outputs get properly propagated to apps
astro-friedel Dec 2, 2024
a837f08
removed extraneous ignores
astro-friedel Dec 3, 2024
6bef04f
reverted removal of trailing white spaces
astro-friedel Dec 3, 2024
5057d19
fixes per review comments
astro-friedel Dec 3, 2024
89d5e0a
ensure that md5sum is only calculated when file provenance tracking i…
astro-friedel Dec 3, 2024
c653cbc
fixes based on review comments
astro-friedel Dec 3, 2024
7efebad
added dfk as a required parameter to DataFuture
astro-friedel Dec 3, 2024
d6e7e5b
make sure file md5sum is only calculated
astro-friedel Dec 3, 2024
1fcdbc6
added full path and parsing for path for file database entries
astro-friedel Dec 3, 2024
b443cbb
fixed typos and tests
astro-friedel Dec 3, 2024
69cfc7b
put back required SECRET_KEY so that the file search form works
astro-friedel Dec 3, 2024
0316cf9
isort fixes
astro-friedel Dec 3, 2024
af51f0e
Merge branch 'Parsl:master' into trackingFileProvenance
astro-friedel Dec 3, 2024
9ed699d
removed unneeded import
astro-friedel Dec 3, 2024
ce609cc
mypy fixes
astro-friedel Dec 3, 2024
d646aaa
Merge remote-tracking branch 'upstream/master'
astro-friedel Dec 10, 2024
53f323d
fixed incorrect variable name
astro-friedel Dec 10, 2024
9444f42
Merge branch 'master' into trackingFileProvenance
astro-friedel Dec 10, 2024
ca70951
Merge branch 'trackingFileProvenance' into addingDynamicFiles
astro-friedel Dec 10, 2024
ee3f937
added bash file watcher
astro-friedel Jan 21, 2025
d7e06dc
Merge branch 'master' into addingDynamicFiles
astro-friedel Jan 21, 2025
ed23e59
fixes from isort
astro-friedel Jan 22, 2025
a327dc2
removed unnecessary code
astro-friedel Jan 28, 2025
22780b9
fixed regression of parameter typechecking
astro-friedel Jan 28, 2025
37042d7
removed redundant file info capture
astro-friedel Jan 28, 2025
bdb61a5
added notations to bash_watch
astro-friedel Jan 28, 2025
d50f1b3
fixed typos
astro-friedel Jan 28, 2025
8c34d43
re-ordered code so that file provenance data is captured before marki…
astro-friedel Jan 28, 2025
a45e23d
revert change for typecasting
astro-friedel Feb 18, 2025
4f8e209
Merge branch 'master' into trackingFileProvenance
astro-friedel Mar 14, 2025
4774d46
Merge branch 'master' into addingDynamicFiles
astro-friedel Mar 14, 2025
15e16fd
added file provenance docs, which got lost at some point
astro-friedel Mar 14, 2025
fa85433
added setters
astro-friedel Mar 27, 2025
96a1073
Merge branch 'master' into fixing
astro-friedel Mar 31, 2025
b1dd848
Merge branch 'fixing' into addingDynamicFiles
astro-friedel Mar 31, 2025
dde1c38
revert ordering of set_result to avoid deadlock
astro-friedel Mar 31, 2025
373649d
fixed bugs related to running on HPC machines, also updated docs
astro-friedel May 23, 2025
438d519
Merge branch 'master' into trackingFileProvenance
astro-friedel May 23, 2025
657df79
Merge branch 'master' into addingDynamicFiles
astro-friedel May 23, 2025
a76ff89
fixes for failing chekcs
astro-friedel May 29, 2025
a278a08
fix for failing checks
astro-friedel May 29, 2025
6089ec4
backed out bash_watch, so it can be a separate PR
astro-friedel Jul 8, 2025
5282a72
Merge branch 'master' into addingDynamicFiles
astro-friedel Jul 17, 2025
4fd8ab3
Merge branch 'master' into trackingFileProvenance
astro-friedel Jul 17, 2025
7ab4dfa
fixed broken test
astro-friedel Jul 17, 2025
24f03ca
fix for bug when environment is not recorded
astro-friedel Jul 21, 2025
8a77bcc
fix for broken test due to changes in argparse
astro-friedel Jul 21, 2025
3b88eb1
fix for warning in test
astro-friedel Jul 21, 2025
864eb3d
Merge branch 'trackingFileProvenance' into addingDynamicFiles
astro-friedel Jul 21, 2025
14 changes: 14 additions & 0 deletions .gitignore
@@ -121,3 +121,17 @@ ENV/

# emacs buffers
\#*

runinfo*
parsl/tests/.pytest*

# documentation generation
docs/stubs/*
docs/1-parsl-introduction.ipynb

/tmp
parsl/data_provider/dyn.new.py

examples/

.jupyter/
Binary file added docs/images/mon_env_detail.png
Binary file added docs/images/mon_file_detail.png
Binary file added docs/images/mon_file_provenance.png
Binary file added docs/images/mon_task_detail.png
Binary file added docs/images/mon_workflow_files.png
Binary file modified docs/images/mon_workflows_page.png
1 change: 1 addition & 0 deletions docs/reference.rst
@@ -54,6 +54,7 @@ Data management
parsl.data_provider.data_manager.DataManager
parsl.data_provider.staging.Staging
parsl.data_provider.files.File
parsl.data_provider.dynamic_files.DynamicFileList
parsl.data_provider.ftp.FTPSeparateTaskStaging
parsl.data_provider.ftp.FTPInTaskStaging
parsl.data_provider.file_noop.NoOpFileStaging
150 changes: 147 additions & 3 deletions docs/userguide/advanced/monitoring.rst
@@ -15,6 +15,8 @@ SQLite tools.
Monitoring configuration
------------------------

Parsl monitoring is only supported with the `parsl.executors.HighThroughputExecutor`.

The following example shows how to enable monitoring in the Parsl
configuration. Here the `parsl.monitoring.MonitoringHub` is specified to use port
55055 to receive monitoring messages from workers every 10 seconds.
@@ -47,6 +49,116 @@ configuration. Here the `parsl.monitoring.MonitoringHub` is specified to use por
)


File Provenance
---------------

The monitoring system can also be used to track file provenance. File provenance is defined as the
history of a file including:

* When the file was created
* File size in bytes
* File md5sum
* What task created the file
* What task(s) used the file
* What inputs were given to the task that created the file
* What environment was used (e.g. the ``worker_init`` entry from a :py:class:`~parsl.providers.ExecutionProvider`);
  this is not available with every provider

The purpose of the file provenance tracking is to provide a mechanism where the user can see exactly
how a file was created and used in a workflow. This can be useful for debugging, understanding the
workflow, for ensuring that the workflow is reproducible, and reviewing past work. The file
provenance information is stored in the monitoring database and can be accessed using the
``parsl-visualize`` tool. To enable file provenance tracking, set the ``file_provenance`` flag to
``True`` in the `parsl.monitoring.MonitoringHub` configuration.
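
As an illustration of the per-file metadata listed above (size in bytes, md5sum, and timestamp), the following is a minimal sketch of how such values could be computed for a local file using only the Python standard library. The helper name ``describe_file`` and the returned dictionary keys are hypothetical and are not part of the Parsl API; the monitoring framework's own implementation may differ.

```python
import hashlib
import os
from datetime import datetime, timezone


def describe_file(path, chunk_size=1 << 20):
    """Return size, md5sum and modification time for a local file.

    Hypothetical helper for illustration only; not part of Parsl.
    """
    md5 = hashlib.md5()
    # Read the file in chunks so that large files are not loaded into memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    stat = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": stat.st_size,
        "md5sum": md5.hexdigest(),
        # The modification time is used here as a stand-in for "when the file was created".
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
    }
```

Calling ``describe_file("results.csv")`` on an existing file returns a dictionary with these four entries. Because computing a checksum requires reading the whole file, it adds I/O cost for large outputs.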

This functionality also enables you to log informational messages from your scripts, to capture
anything not automatically gathered. The main change needed in your code is to assign the return
value of ``parsl.load`` to a variable, then use its ``log_info`` function to log messages in the
database. Note that this feature is only available in the main script, not inside Apps. Passing
this variable (``my_cfg`` in the example below) to an App has undefined behavior. The following
example shows how to use this feature.

.. code-block:: python

   import parsl
   from parsl.monitoring.monitoring import MonitoringHub
   from parsl.config import Config
   from parsl.executors import HighThroughputExecutor
   from parsl.addresses import address_by_hostname

   import logging

   config = Config(
       executors=[
           HighThroughputExecutor(
               label="local_htex",
               cores_per_worker=1,
               max_workers_per_node=4,
               address=address_by_hostname(),
           )
       ],
       monitoring=MonitoringHub(
           hub_address=address_by_hostname(),
           hub_port=55055,
           monitoring_debug=False,
           resource_monitoring_interval=10,
           file_provenance=True,
       ),
       strategy='none'
   )

   my_cfg = parsl.load(config)

   my_cfg.log_info("This is an informational message")

The file provenance framework also works with the :ref:`label-dynamic-file-list` feature. When a
:py:class:`parsl.data_provider.dynamic_files.DynamicFileList` is used the framework will wait until the app completes
and any files contained in the :py:class:`parsl.data_provider.dynamic_files.DynamicFileList` are marked as done before
completing its processing.

.. note::
   Known limitations: The file provenance feature will capture the creation of files and the use of files in an app,
   but does not capture the modification of files it already knows about.

This functionality also enables you to log informational messages from your scripts, to capture anything not
automatically gathered. The main change needed in your code is to assign the return value of ``parsl.load`` to a
variable, then use its ``log_info`` function to log messages in the database. Note that this feature is only
available in the main script, not inside apps, unless you pass the variable (``my_cfg`` in the example below) as an
argument to the app. The following example shows how to use this feature.

.. code-block:: python

   import parsl
   from parsl.monitoring.monitoring import MonitoringHub
   from parsl.config import Config
   from parsl.executors import HighThroughputExecutor
   from parsl.addresses import address_by_hostname

   import logging

   config = Config(
       executors=[
           HighThroughputExecutor(
               label="local_htex",
               cores_per_worker=1,
               max_workers_per_node=4,
               address=address_by_hostname(),
           )
       ],
       monitoring=MonitoringHub(
           hub_address=address_by_hostname(),
           hub_port=55055,
           monitoring_debug=False,
           resource_monitoring_interval=10,
           capture_file_provenance=True,
       ),
       strategy='none'
   )

   my_cfg = parsl.load(config)

   my_cfg.log_info("This is an informational message")

Visualization
-------------

@@ -72,7 +184,7 @@ By default, the visualization web server listens on ``127.0.0.1:8080``. If the w
$ ssh -L 50000:127.0.0.1:8080 username@cluster_address

This command will bind your local machine's port 50000 to the remote cluster's port 8080.
The dashboard can then be accessed via the local machine's browser at ``127.0.0.1:50000``.
The dashboard can then be accessed via the local machine's browser at ``127.0.0.1:50000``.

.. warning:: Alternatively you can deploy the visualization server on a public interface. However, first check that this is allowed by the cluster's security policy. The following example shows how to deploy the web server on a public port (i.e., open to Internet via ``public_IP:55555``)::

@@ -96,12 +208,12 @@ Workflow Summary

The workflow summary page captures the run level details of a workflow, including start and end times
as well as task summary statistics. The workflow summary section is followed by the *App Summary* that lists
the various apps and invocation count for each.
the various apps and invocation count for each.

.. image:: ../../images/mon_workflow_summary.png


The workflow summary also presents three different views of the workflow:
The workflow summary also presents several different views of the workflow:

* Workflow DAG - with apps differentiated by colors: This visualization is useful to visually inspect the dependency
structure of the workflow. Hovering over the nodes in the DAG shows a tooltip for the app represented by the node and its task ID.
@@ -117,3 +229,35 @@ The workflow summary also presents three different views of the workflow:

.. image:: ../../images/mon_resource_summary.png

* Workflow file provenance (only if enabled and files were used in the workflow): This visualization gives a tabular listing of each task that created (output) or used (input) a file. Each listed file has a link to a page detailing the file's information.

.. image:: ../../images/mon_workflow_files.png

File Provenance
^^^^^^^^^^^^^^^

The file provenance page provides an interface for searching for files and viewing their provenance. The % wildcard can be used in the search bar to match any number of characters. Any results are listed in a table below the search bar. Clicking on a file in the table will take you to the file's detail page.

.. image:: ../../images/mon_file_provenance.png
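
The ``%`` wildcard described above behaves like the wildcard of the SQL ``LIKE`` operator. The sketch below shows those matching semantics against an in-memory SQLite database. It assumes, without confirmation from the source, that the search form uses ``LIKE``-style matching. The table ``demo_files``, its ``file_name`` column, and the helper ``search_files`` are hypothetical and do not reflect the actual monitoring database schema.

```python
import sqlite3

# Hypothetical in-memory table standing in for a list of tracked file names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_files (file_name TEXT)")
conn.executemany(
    "INSERT INTO demo_files VALUES (?)",
    [("data/run1/out.csv",), ("data/run2/out.csv",), ("logs/run1.log",)],
)


def search_files(connection, pattern):
    """Return file names matching a LIKE pattern, where % matches any run of characters."""
    rows = connection.execute(
        "SELECT file_name FROM demo_files WHERE file_name LIKE ? ORDER BY file_name",
        (pattern,),
    )
    return [row[0] for row in rows]


# A leading % matches any prefix, a trailing % matches any suffix.
csv_outputs = search_files(conn, "%out.csv")
run1_files = search_files(conn, "%run1%")
```

Here ``csv_outputs`` contains the two ``out.csv`` paths, and ``run1_files`` contains ``data/run1/out.csv`` and ``logs/run1.log``.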

File Details
^^^^^^^^^^^^

The file details page provides information about a specific file, including the file's name, size, md5sum, and the tasks that created and used the file. Clicking on any of the tasks will take you to their respective details page. If the file was created by a task there will be an entry for the Environment used by that task. Clicking that link will take you to the Environment Details page.

.. image:: ../../images/mon_file_detail.png


Task Details
^^^^^^^^^^^^

The task details page provides information about a specific instantiation of a task. This information includes task dependencies, executor (environment), input and output files, and task arguments.

.. image:: ../../images/mon_task_detail.png

Environment Details
^^^^^^^^^^^^^^^^^^^

The environment details page provides information on the compute environment in which a task was run, including the provider and launcher used and the ``worker_init`` that was used.

.. image:: ../../images/mon_env_detail.png