- 
                Notifications
    You must be signed in to change notification settings 
- Fork 10
[WIP] Add Developer Documentation Content #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a lot of this confusing, mostly due to a lack of detail and inconsistent organization.
Regarding detail, these are developer docs, so more details are almost always going to be better than fewer details. The people reading these docs want to know the technical details of how Thicket works, what it does, and what constraints they have to comply with.
Regarding organization, it's better to essentially hold the reader's hand through the pages than not. The more you can do to make the organization easy to understand and almost formulaic, the better. For example, in the "Concatenating Thickets" page, you could end the introduction by saying that concatenating Thickets consists of 3 steps:
- Unifying graphs
- Updating Node objects based on changes to the graphs
- Merging DataFrames by row or column
The text for these bullets would then literally become the names of the subsections.
|  | ||
| 1. :code:`utils.validate_dataframe._check_duplicate_inner_idx` - There are no duplicate indices. | ||
| 2. :code:`utils.validate_dataframe._check_missing_hnid` - *Node* objects, identified by their :code:`_hatchet_nid` are in ascending order without gaps. | ||
| 3. :code:`utils.validate_dataframe._validate_name_column` - The values in the "name" column match the :code:`Node.frame["name"]` attribute for that row or are :code:`None`. No newline at end of file | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general format of this page is really confusing. I had to reread several times before I understood what was being conveyed.
|  | ||
| ################## | ||
| Unifying Thickets | ||
| ################## | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sections don't make a ton of sense right now because they don't clearly connect back to anything in the intro text above or the figure. I'd add a sentence or two to do this connecting. Maybe something that says "step X in the figure consists of the following steps."
| Unifying Thickets | ||
| ################## | ||
|  | ||
| This section mainly refers to the :code:`Thicket.Ensemble._unify()` function. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what "Unifying Thickets" means here. What is it? How does it differ from "Concatenating Thickets?"
        
          
                docs/concatenating_thickets.rst
              
                Outdated
          
        
      | Unifying Calltrees | ||
| =================== | ||
|  | ||
| *Unifying Calltrees* is the process of performing a **set operation** (e.g. :code:`Hatchet.graph.union()`) on multiple :code:`Thicket.graph`'s. Comparing two graphs involves comparing :code:`Hatchet.Node` objects between the graphs. The :code:`Hatchet.graph.union()` function computes the union graph between two Hatchet graphs. Nodes are compared by: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no such thing as a universal "set operation." Is it creating a set? Is it performing a union on an existing set? Intersection? Some other set operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to "graph operation", referring to operations that can be performed on a graph. This could just say "... performing a union or intersection on multiple ..."
        
          
                docs/concatenating_thickets.rst
              
                Outdated
          
        
      |  | ||
| Nodes that match in #1 and #2 are merged in the resulting union graph as a new :code:`Hatchet.Node` object (`deep copy of the first node <https://github.com/LLNL/hatchet/blob/6a6d7027056df96bd1c919ab34a9acce81f3b9a1/hatchet/graph.py#L227>`_). Deep copies of nodes that do **not** match are inserted into the union graph at the appropriate depth. | ||
|  | ||
| *Note:* The :code:`Thicket.intersection()` function first applies the :code:`Hatchet.graph.union()` before computing the intersection of the graphs, since their does not exist a :code:`Hatchet.graph.intersection` function. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did intersection() come from? I have absolutely no idea how this plugs into Thicket based off of the rest of the page.
| Updating Node Objects | ||
| ====================== | ||
|  | ||
| Because Node objects must be identical between Thicket components (see :ref:`/thicket_properties.rst#nodes`), The resulting new nodes in the union graph must replace the old node objects in components like the :code:`Thicket.dataframe.index` (see `code <https://github.com/LLNL/thicket/blob/develop/thicket/ensemble.py#L68-L83>`_). The :code:`Hatchet.graph.union()` function provides a dictionary mapping old nodes to new nodes, however to avoid applying these updates after every union between two graphs, we `update a dictionary of all the node mappings <https://github.com/LLNL/thicket/blob/develop/thicket/ensemble.py#L53-L67>`_ and apply the updates after all of the unions have been computed. This is **only** necessary when concatenating more than **two** Thickets, as only one union will be performed when concatenating two Thickets. We `apply this idea when reading files <https://github.com/LLNL/thicket/blob/develop/thicket/thicket.py#L393-L413>`_ to avoid this cost. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My other comments about connecting back to the earlier parts of the page still apply here, but otherwise, this section looks good.
| Index Concatenation | ||
| #################### | ||
|  | ||
| *Index Concatenation* refers to the process that happens for the performance and metadata tables. We concatenate the tables, which is essentially "stacking the rows on top of each other". Because we check that the performance profiles we concatenate are unique (:ref:`/thicket_properties.rst#profiles`), we do not need to worry about duplicate indices in either table. We sort the index of both tables, which interleaves the profiles in the MultiIndex of the performance table to visually group all of the profiles in the table for each node. An example of this operation can be seen in the :ref:`/thicket_tutorial.ipynb`, when :code:`axis="columns"`. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index Concatenation refers to the process that happens for the performance and metadata tables.
I don't understand this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, most of the wording in this section is hard to understand. Honestly, if I didn't have pre-existing knowledge of how Thicket works, I wouldn't understand this at all.
| Column Concatenation | ||
| ##################### | ||
|  | ||
| *Column Concatenation* refers to the process that happens in the performance, metadata, and statistics tables. We create a MultiIndex out of the columns, such that for each metric, there is a higher level index label. An example of this operation can be seen in the :ref:`/thicket_tutorial.ipynb`, when :code:`axis="columns"`. No newline at end of file | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Column Concatenation refers to the process that happens in the performance, metadata, and statistics tables.
Again, I don't really understand the first sentence of this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We create a MultiIndex out of the columns, such that for each metric, there is a higher level index label.
Although correct, this seems kind of "hand wavey." If these are meant to be developer docs, don't be afraid of going into detail with this stuff. The people reading this want to know all the technical stuff about how Thicket works.
This PR adds some developer documentation pages:
thicket_properties- descriptions about the assumptions we make in Thicket and how they are enforced in the code.concatenating_thickets- explanation of the process of concatenating Thickets for developers, focusing on the topics that are hard to understand from the code and we get the most questions on.