Merge pull request #338 from AlexsLemonade/allyhawkins/scanpy-compatibility-updates

allyhawkins · web-flow · commit 13645eaa39e8 · 2024-07-31T08:55:44.000-05:00
Updates to AnnData contents based on new scanpy compatibility
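
Because this change renames the `.obsm` keys from `X_PCA`/`X_UMAP` to `X_pca`/`X_umap`, code written against older objects may raise a `KeyError`. The following is a minimal sketch, not part of this PR, of one way downstream code might tolerate both spellings. The helper `get_embedding` and the `RENAMED_OBSM_KEYS` table are hypothetical names introduced here for illustration, and the sketch assumes `.obsm` behaves like a standard Python mapping.

```python
from typing import Any, Mapping

# Hypothetical lookup table: old upper-case key -> new scanpy-style key,
# mirroring the renames shown in this diff.
RENAMED_OBSM_KEYS = {"X_PCA": "X_pca", "X_UMAP": "X_umap"}


def get_embedding(obsm: Mapping[str, Any], new_key: str) -> Any:
    """Return obsm[new_key], falling back to the pre-rename key if needed.

    `obsm` is assumed to be mapping-like (for example, `adata.obsm`).
    """
    if new_key in obsm:
        return obsm[new_key]
    # Fall back to the old spelling for objects created before the rename.
    for old_key, renamed in RENAMED_OBSM_KEYS.items():
        if renamed == new_key and old_key in obsm:
            return obsm[old_key]
    raise KeyError(new_key)
```

With a processed object loaded as `processed_adata`, this could be called as `get_embedding(processed_adata.obsm, "X_pca")`.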
diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -146,10 +146,10 @@ Dimensionality reduction results can be accessed in the `AnnData` objects using
 
 ```python
 # principal component analysis results
-processed_adata.obsm["X_PCA"]
+processed_adata.obsm["X_pca"]
 
 # UMAP results
-processed_adata.obsm["X_UMAP"]
+processed_adata.obsm["X_umap"]
 ```
 
 See below for more resources on dimensionality reduction:
@@ -179,6 +179,8 @@ This list can be accessed using the following command in the `AnnData` objects:
 processed_adata.uns["highly_variable_genes"]
 ```
 
+Additionally, the `AnnData` objects contain a column in the `.var` slot, `"highly_variable"`, indicating whether or not a gene is found in the list of highly variable genes.
+
 ### Clustering
 
 Cluster assignments obtained from [Graph-based clustering](http://bioconductor.org/books/3.16/OSCA.basic/clustering.html#clustering-graph) are also available in the processed objects.
diff --git a/docs/merged_objects.md b/docs/merged_objects.md
@@ -394,15 +394,15 @@ Additional experiment metadata is available in the {ref}`metadata TSV file inclu
 
 ### AnnData dimensionality reduction results
 
-The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
+The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results.
 
 For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.
 
 The following command can be used to access the PCA and UMAP results:
 
 ```python
-merged_adata_object.obsm["X_PCA"] # pca results
-merged_adata_object.obsm["X_UMAP"] # umap results
+merged_adata_object.obsm["X_pca"] # pca results
+merged_adata_object.obsm["X_umap"] # umap results
 ```
 
 
diff --git a/docs/sce_file_contents.md b/docs/sce_file_contents.md
@@ -427,6 +427,7 @@ The `AnnData` object also includes the following additional gene-level metadata
 | Column name   | Contents                                                         |
 | ------------- | ---------------------------------------------------------------- |
 | `is_feature_filtered` | Boolean indicating if the gene or feature is filtered out in the normalized matrix but is present in the raw matrix     |
+| `highly_variable` | Boolean indicating if the gene or feature is found in the highly variable gene list determined using `scran::modelGeneVar` and `scran::getTopHVGs`. Only present for `processed` objects   |
 
 
 ### AnnData experiment metadata
@@ -447,20 +448,21 @@ The `AnnData` object also includes the following additional items in the `.uns`
 | Item name   | Contents                                                         |
 | ------------- | ---------------------------------------------------------------- |
 | `schema_version` | CZI schema version used for `AnnData` formatting |
+| `pca` | A dictionary object containing the parameters and variance weights associated with the PCA matrix found in `.obsm["X_pca"]`. Only available for processed objects |
 
 
 ### AnnData dimensionality reduction results
 
-The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
+The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results stored as a `numpy.ndarray`.
 For all other H5AD files, the `.obsm` slot will be empty as no dimensionality reduction was performed.
 
 For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.
 
 The following command can be used to access the PCA and UMAP results:
 
 ```python
-adata_object.obsm["X_PCA"] # pca results
-adata_object.obsm["X_UMAP"] # umap results
+adata_object.obsm["X_pca"] # pca results
+adata_object.obsm["X_umap"] # umap results
 ```
 
 ### Additional AnnData components for CITE-seq libraries (with ADT tags)