AlexsLemonade
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
Lines changed: 14 additions & 0 deletions
diff --git a/docs/download_files.md b/docs/download_files.md
Lines changed: 12 additions & 11 deletions
diff --git a/docs/getting_started.md b/docs/getting_started.md
Lines changed: 4 additions & 2 deletions
diff --git a/docs/images/anndata-project-download-folder.png b/docs/images/anndata-project-download-folder.png
30.4 KB
diff --git a/docs/images/anndata-sample-citeseq-download-folder.png b/docs/images/anndata-sample-citeseq-download-folder.png
65.6 KB
diff --git a/docs/images/anndata-sample-download-folder.png b/docs/images/anndata-sample-download-folder.png
45.4 KB
diff --git a/docs/images/merged-anndata-project-citeseq-download-folder.png b/docs/images/merged-anndata-project-citeseq-download-folder.png
19.6 KB
diff --git a/docs/images/merged-anndata-project-download-folder.png b/docs/images/merged-anndata-project-download-folder.png
18.4 KB
diff --git a/docs/images/merged-project-download-folder.png b/docs/images/merged-project-download-folder.png
-7.52 KB
diff --git a/docs/images/multiplexed-download-folder.png b/docs/images/multiplexed-download-folder.png
12.3 KB
diff --git a/docs/images/project-download-folder.png b/docs/images/project-download-folder.png
-277 Bytes
diff --git a/docs/images/sample-download-folder.png b/docs/images/sample-download-folder.png
5.74 KB
diff --git a/docs/images/spatial-download-folder.png b/docs/images/spatial-download-folder.png
43.5 KB
diff --git a/docs/merged_objects.md b/docs/merged_objects.md
Lines changed: 5 additions & 4 deletions
diff --git a/docs/sce_file_contents.md b/docs/sce_file_contents.md
Lines changed: 9 additions & 5 deletions
@@ -12,6 +12,20 @@ For more information about `AlexsLemonade/scpca-nf` versions, please see [the re
 <!-- PUT THE NEW CHANGELOG ENTRY RIGHT BELOW THIS -->
 <!-------------------------------------------------->
 
+## 2024.08.13
+
+* A new column, `age_timing`, is now present in the sample metadata tables included with each download.
+  * This column indicates if the age specified in the `age` column is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`.
+  * This will also be present in the metadata of the `SingleCellExperiment` and `AnnData` objects.
+* AnnData objects have been updated to improve compatibility with [`Scanpy`](https://scanpy.readthedocs.io/en/stable/).
+  * PCA and UMAP embeddings are now stored as `X_pca` and `X_umap` (previously `X_PCA` and `X_UMAP`).
+  * A new column has been added to the `.var` slot, `highly_variable`, indicating if the given gene can be found in the list of highly variable genes.
+  * Parameters and variance weights associated with the PCA results are now available in `.uns["pca"]`.
+  * See {ref}`Components of an AnnData object<sce_file_contents:Components of an anndata object>` for more information.
+* Downloads now follow a new naming convention: `{identifier}_{modality}_{file format}_{date}.zip`
+  * For example, a sample (`SCPCS999990`) downloaded on 2024-08-13 in AnnData format will be named: `SCPCS999990_SINGLE-CELL_ANN-DATA_2024-08-13.zip`
+  * See the {ref}`Downloadable files page <download_files:downloadable files>` for more information.
+
 ## 2024.08.01
 
 * A table containing sample metadata (e.g., age, sex, diagnosis) is now available in both the QC report (`qc.html`) and the supplemental cell type report (`celltype-report.html`) included in all downloads.
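
The `{identifier}_{modality}_{file format}_{date}.zip` naming convention introduced in the 2024.08.13 entry could be parsed along these lines. This is a minimal sketch, not part of the portal or `scpca-nf`: the `DownloadName` type and `parse_download_name` function are hypothetical names, and the pattern assumes that no field contains an underscore and that the date is always in ISO `YYYY-MM-DD` form.

```python
import re
from datetime import date
from typing import NamedTuple


class DownloadName(NamedTuple):
    """Fields of a download file name (hypothetical helper type)."""

    identifier: str
    modality: str
    file_format: str
    download_date: date


# Assumed pattern: four underscore-separated fields followed by ".zip".
# Hyphens are allowed inside a field (e.g. SINGLE-CELL, ANN-DATA).
_DOWNLOAD_NAME = re.compile(
    r"^(?P<identifier>[^_]+)_(?P<modality>[^_]+)_(?P<file_format>[^_]+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})\.zip$"
)


def parse_download_name(filename: str) -> DownloadName:
    """Split a download file name into its four documented fields."""
    match = _DOWNLOAD_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected download file name: {filename!r}")
    return DownloadName(
        identifier=match["identifier"],
        modality=match["modality"],
        file_format=match["file_format"],
        download_date=date.fromisoformat(match["date"]),
    )


parsed = parse_download_name("SCPCS999990_SINGLE-CELL_ANN-DATA_2024-08-13.zip")
print(parsed.identifier, parsed.modality, parsed.file_format, parsed.download_date)
```

If identifiers can contain underscores in practice, the `identifier` group would need to be loosened accordingly.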
 
@@ -39,21 +39,21 @@ See the [description of the Spatial transcriptomics output section below](#spati
 ## `SingleCellExperiment` downloads
 
 ### Download folder structure for project downloads:
-![project download folder](images/project-download-folder.png){width="400"}
+![project download folder](images/project-download-folder.png){width="600"}
 
 ### Download folder structure for individual sample downloads:
-![sample download folder](images/sample-download-folder.png){width="400"}
+![sample download folder](images/sample-download-folder.png){width="600"}
 
 ## `AnnData` downloads
 
 ### Download folder structure for project downloads:
-![project download folder](images/anndata-project-download-folder.png){width="400"}
+![project download folder](images/anndata-project-download-folder.png){width="600"}
 
 ### Download folder structure for individual sample downloads:
-![sample download folder](images/anndata-sample-download-folder.png){width="400"}
+![sample download folder](images/anndata-sample-download-folder.png){width="600"}
 
 ### Download folder structure for individual sample downloads with CITE-seq (ADT) data:
-![sample download folder](images/anndata-sample-citeseq-download-folder.png){width="400"}
+![sample download folder](images/anndata-sample-citeseq-download-folder.png){width="600"}
 
 If downloading a sample that contains a CITE-seq library as an `AnnData` object (`.h5ad` file), the quantified CITE-seq expression data is included as a separate file with the suffix `_adt.h5ad`.
 
@@ -103,7 +103,8 @@ Each row corresponds to a unique sample/library combination and contains the fol
 | `diagnosis`       | Tumor type                                                     |
 | `subdiagnosis`    | Subcategory of diagnosis or mutation status (if applicable)    |
 | `disease_timing`  | At what stage of disease the sample was obtained, either diagnosis or recurrence |
-| `age_at_diagnosis` | Age at time sample was obtained                               |
+| `age`             | Age provided by submitter                                |
+| `age_timing`      | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
 | `sex`             | Sex of patient that the sample was obtained from               |
 | `tissue_location` | Where in the body the tumor sample was located                 |
 | `participant_id`  | Unique id corresponding to the donor from which the sample was obtained |
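
Because `age` can now mean either age at diagnosis or age at collection, it may be useful to group ages by `age_timing` before summarizing them. The sketch below uses only the standard library and an in-memory table with made-up rows. The `ages_by_timing` helper is hypothetical, the literal `disease_timing` values are placeholders, and the code assumes every `age` value is numeric.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real metadata tables contain many
# more columns, and the disease_timing strings here are placeholders.
example_tsv = (
    "participant_id\tdisease_timing\tage\tage_timing\n"
    "P01\tdiagnosis\t4\tdiagnosis\n"
    "P02\trecurrence\t9\tcollection\n"
    "P03\trecurrence\t12\tunknown\n"
)


def ages_by_timing(handle):
    """Group the `age` column by the value of `age_timing`."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumes numeric ages; missing values would need extra handling.
        grouped[row["age_timing"]].append(float(row["age"]))
    return dict(grouped)


ages = ages_by_timing(io.StringIO(example_tsv))
print(ages)
```

With a real download, the same function could be given an open file handle for the metadata TSV instead of the `io.StringIO` stand-in.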
@@ -175,7 +176,7 @@ For project downloads, the counts and QC files will be organized by the _set_ of
 These sample set folders are named with an underscore-separated list of the sample ids for the libraries within, _e.g._, `SCPCS999990_SCPCS999991_SCPCS999992`.
 Bulk RNA-seq data, if present, will follow the [same format as bulk RNA-seq for single-sample libraries](#download-folder-structure-for-project-downloads).
 
-![multiplexed project download folder](images/multiplexed-download-folder.png){width="400"}
+![multiplexed project download folder](images/multiplexed-download-folder.png){width="600"}
 
 Because we do not perform demultiplexing to separate cells from multiplexed libraries into sample-specific count matrices, sample downloads from a project with multiplexed data will include all libraries that contain the sample of interest, but these libraries _will still contain cells from other samples_.
 
@@ -212,13 +213,13 @@ This includes a summary of the types of libraries (e.g., single-cell, single-nuc
 Every download also includes the individual [QC report](#qc-report) and, if applicable, [cell type annotation reports](#cell-type-report) for each library included in the merged object.
 
 ### Download folder structure for `SingleCellExperiment` merged downloads:
-![project download folder](images/merged-project-download-folder.png){width="400"}
+![project download folder](images/merged-project-download-folder.png){width="600"}
 
 ### Download folder structure for `AnnData` merged downloads:
-![project download folder](images/merged-anndata-project-download-folder.png){width="400"}
+![project download folder](images/merged-anndata-project-download-folder.png){width="600"}
 
 ### Download folder structure for `AnnData` merged downloads with CITE-seq (ADT) data:
-![project download folder](images/merged-anndata-project-citeseq-download-folder.png){width="400"}
+![project download folder](images/merged-anndata-project-citeseq-download-folder.png){width="600"}
 
 
 ## Spatial transcriptomics libraries
@@ -238,4 +239,4 @@ A full description of all files included in the download for spatial transcripto
 
 Every download also includes a single `spatial_metadata.tsv` file containing metadata for all libraries included in the download.
 
-![sample download with spatial](images/spatial-download-folder.png){width="400"}
+![sample download with spatial](images/spatial-download-folder.png){width="600"}
@@ -146,10 +146,10 @@ Dimensionality reduction results can be accessed in the `AnnData` objects using
 
 ```python
 # principal component analysis results
-processed_adata.obsm["X_PCA"]
+processed_adata.obsm["X_pca"]
 
 # UMAP results
-processed_adata.obsm["X_UMAP"]
+processed_adata.obsm["X_umap"]
 ```
 
 See below for more resources on dimensionality reduction:
@@ -179,6 +179,8 @@ This list can be accessed using the following command in the `AnnData` objects:
 processed_adata.uns["highly_variable_genes"]
 ```
 
+Additionally, the `AnnData` objects contain a column in the `.var` slot, `"highly_variable"`, indicating whether or not a gene is found in the list of highly variable genes.
+
 ### Clustering
 
 Cluster assignments obtained from [graph-based clustering](http://bioconductor.org/books/3.16/OSCA.basic/clustering.html#clustering-graph) are also available in the processed objects.
 
@@ -197,7 +197,8 @@ metadata(merged_sce)$sample_metadata # sample metadata only for projects with mu
 | `participant_id`                           | Unique ID corresponding to the donor from which the sample was obtained                                                                                                                                                                                                                                                                                                                                                                                |
 | `submitter_id`                             | Original sample identifier from submitter                                                                                                                                                                                                                                                                                                                                                                                                              |
 | `submitter`                                | Submitter name/ID                                                                                                                                                                                                                                                                                                                                                                                                                                      |
-| `age`                                      | Age at time sample was obtained                                                                                                                                                                                                                                                                                                                                                                                                                        |
+| `age`             | Age provided by submitter                                |
+| `age_timing`      | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
 | `sex`                                      | Sex of patient that the sample was obtained from                                                                                                                                                                                                                                                                                                                                                                                                       |
 | `diagnosis`                                | Tumor type                                                                                                                                                                                                                                                                                                                                                                                                                                             |
 | `subdiagnosis`                             | Subcategory of diagnosis or mutation status (if applicable)                                                                                                                                                                                                                                                                                                                                                                                            |
@@ -393,15 +394,15 @@ Additional experiment metadata is available in the {ref}`metadata TSV file inclu
 
 ### AnnData dimensionality reduction results
 
-The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
+The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results.
 
 For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.
 
 The following command can be used to access the PCA and UMAP results:
 
 ```python
-merged_adata_object.obsm["X_PCA"] # pca results
-merged_adata_object.obsm["X_UMAP"] # umap results
+merged_adata_object.obsm["X_pca"] # pca results
+merged_adata_object.obsm["X_umap"] # umap results
 ```
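
Code that must work with objects downloaded both before and after the rename from `X_PCA`/`X_UMAP` to `X_pca`/`X_umap` could look the key up defensively. The sketch below is one possible approach, not part of the ScPCA tooling: `get_embedding` is a hypothetical helper, and plain dictionaries stand in for the `.obsm` slot so that the example runs without any third-party packages.

```python
from collections.abc import Mapping
from typing import Any


def get_embedding(obsm: Mapping[str, Any], name: str) -> Any:
    """Return an embedding stored under the current or the legacy key."""
    current_key = f"X_{name.lower()}"  # e.g. "X_pca"
    legacy_key = f"X_{name.upper()}"   # e.g. "X_PCA"
    for key in (current_key, legacy_key):
        if key in obsm:
            return obsm[key]
    raise KeyError(f"Neither {current_key!r} nor {legacy_key!r} found")


# Plain dicts stand in for `.obsm` here. With a real object the call would
# be, for example: get_embedding(merged_adata_object.obsm, "pca")
new_style = {"X_pca": [[0.1, 0.2]], "X_umap": [[1.0, 2.0]]}
old_style = {"X_PCA": [[0.1, 0.2]], "X_UMAP": [[1.0, 2.0]]}

pca_new = get_embedding(new_style, "pca")
umap_old = get_embedding(old_style, "umap")
print(pca_new, umap_old)
```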
 
 
 
@@ -175,7 +175,8 @@ The following columns are included in the sample metadata data frame for all lib
 | `participant_id` | Unique ID corresponding to the donor from which the sample was obtained |
 | `submitter_id`    | Original sample identifier from submitter                      |
 | `submitter`       | Submitter name/ID                                              |
-| `age`             | Age at time sample was obtained                                |
+| `age`             | Age provided by submitter                                |
+| `age_timing`      | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
 | `sex`             | Sex of patient that the sample was obtained from               |
 | `diagnosis`       | Tumor type                                                     |
 | `subdiagnosis`    | Subcategory of diagnosis or mutation status (if applicable)    |
@@ -389,7 +390,8 @@ The `AnnData` object also includes the following additional cell-level metadata
 | `participant_id`                           | Unique ID corresponding to the donor from which the sample was obtained                                                                                                                                                                   |
 | `submitter_id`                             | Original sample identifier from submitter                                                                                                                                                                                                 |
 | `submitter`                                | Submitter name/ID                                                                                                                                                                                                                         |
-| `age`                                      | Age at time sample was obtained                                                                                                                                                                                                           |
+| `age`             | Age provided by submitter                                |
+| `age_timing`      | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
 | `sex`                                      | Sex of patient that the sample was obtained from                                                                                                                                                                                          |
 | `diagnosis`                                | Tumor type                                                                                                                                                                                                                                |
 | `subdiagnosis`                             | Subcategory of diagnosis or mutation status (if applicable)                                                                                                                                                                               |
@@ -425,6 +427,7 @@ The `AnnData` object also includes the following additional gene-level metadata
 | Column name   | Contents                                                         |
 | ------------- | ---------------------------------------------------------------- |
 | `is_feature_filtered` | Boolean indicating if the gene or feature is filtered out in the normalized matrix but is present in the raw matrix     |
+| `highly_variable` | Boolean indicating if the gene or feature is found in the highly variable gene list determined using `scran::modelGeneVar` and `scran::getTopHVGs`. Only present for `processed` objects   |
 
 
 ### AnnData experiment metadata
@@ -445,20 +448,21 @@ The `AnnData` object also includes the following additional items in the `.uns`
 | Item name   | Contents                                                         |
 | ------------- | ---------------------------------------------------------------- |
 | `schema_version` | CZI schema version used for `AnnData` formatting |
+| `pca` | A dictionary object containing the parameters and variance weights associated with the PCA matrix found in `.obsm["X_pca"]`. Only available for processed objects |
 
 
 ### AnnData dimensionality reduction results
 
-The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
+The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results stored as a `numpy.ndarray`.
 For all other H5AD files, the `.obsm` slot will be empty as no dimensionality reduction was performed.
 
 For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.
 
 The following command can be used to access the PCA and UMAP results:
 
 ```python
-adata_object.obsm["X_PCA"] # pca results
-adata_object.obsm["X_UMAP"] # umap results
+adata_object.obsm["X_pca"] # pca results
+adata_object.obsm["X_umap"] # umap results
 ```
 
 ### Additional AnnData components for CITE-seq libraries (with ADT tags)