+* A new column, `age_timing`, is now present in the sample metadata tables included with each download.
+* This column indicates whether the age specified in the `age` column is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`.
+* This will also be present in the metadata of the `SingleCellExperiment` and `AnnData` objects.
+* AnnData objects have been updated to improve compatibility with [`Scanpy`](https://scanpy.readthedocs.io/en/stable/).
+* PCA and UMAP embeddings are now stored as `X_pca` and `X_umap` (previously `X_PCA` and `X_UMAP`).
+* A new column has been added to the `.var` slot, `highly_variable`, indicating whether the given gene is in the list of highly variable genes.
+* Parameters and variance weights associated with the PCA results are now available in `.uns["pca"]`.
+* See {ref}`Components of an AnnData object<sce_file_contents:Components of an anndata object>` for more information.
+* Downloads now follow a new naming convention: `{identifier}_{modality}_{file format}_{date}.zip`
+* For example, a sample (`SCPCS999990`) downloaded on 2024-08-13 in AnnData format will be named: `SCPCS999990_SINGLE-CELL_ANN-DATA_2024-08-13.zip`
+* See the {ref}`Downloadable files page <download_files:downloadable files>` for more information.
+
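The naming convention added in this changelog entry can be sketched in Python as follows. This is a minimal illustration rather than the portal's implementation: the helper names `download_filename` and `parse_download_filename` are hypothetical, and the parser assumes that no component of the name contains an underscore.

```python
from datetime import date


def download_filename(identifier: str, modality: str, file_format: str, download_date: date) -> str:
    """Build a name of the form {identifier}_{modality}_{file format}_{date}.zip."""
    return f"{identifier}_{modality}_{file_format}_{download_date.isoformat()}.zip"


def parse_download_filename(filename: str) -> dict:
    """Split a download name back into its four components.

    Assumption (not stated in the docs): no component contains an underscore.
    """
    if not filename.endswith(".zip"):
        raise ValueError(f"expected a .zip file name, got {filename!r}")
    parts = filename[: -len(".zip")].split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated parts, got {len(parts)}")
    identifier, modality, file_format, date_string = parts
    return {
        "identifier": identifier,
        "modality": modality,
        "file_format": file_format,
        "date": date.fromisoformat(date_string),
    }


# Values taken from the changelog example: an AnnData download on 2024-08-13.
name = download_filename("SCPCS999990", "SINGLE-CELL", "ANN-DATA", date(2024, 8, 13))
print(name)
print(parse_download_filename(name))
```

Parsing the date with `date.fromisoformat` rejects names whose last component is not a valid `YYYY-MM-DD` date, which is a cheap sanity check when sorting many downloads.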
## 2024.08.01

* A table containing sample metadata (e.g., age, sex, diagnosis) is now available in both the QC report (`qc.html`) and the supplemental cell type report (`celltype-report.html`) included in all downloads.
If downloading a sample that contains a CITE-seq library as an `AnnData` object (`.h5ad` file), the quantified CITE-seq expression data is included as a separate file with the suffix `_adt.h5ad`.
@@ -103,7 +103,8 @@ Each row corresponds to a unique sample/library combination and contains the fol
|`diagnosis`| Tumor type |
|`subdiagnosis`| Subcategory of diagnosis or mutation status (if applicable) |
|`disease_timing`| At what stage of disease the sample was obtained, either diagnosis or recurrence |
-|`age_at_diagnosis`| Age at time sample was obtained |
+|`age`| Age provided by submitter |
+|`age_timing`| Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
|`sex`| Sex of patient that the sample was obtained from |
|`tissue_location`| Where in the body the tumor sample was located |
|`participant_id`| Unique id corresponding to the donor from which the sample was obtained |
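The relationship between `age_timing` and `disease_timing` described in this table can be checked with a short script. The sketch below is illustrative only: the `scpca_sample_id` column name, the exact `disease_timing` strings, and the toy rows are assumptions, not values taken from a real download.

```python
import csv
import io

VALID_AGE_TIMING = {"diagnosis", "collection", "unknown"}


def check_age_timing(rows):
    """Return (sample id, problem) pairs for rows that break the documented rules."""
    problems = []
    for row in rows:
        sample_id = row.get("scpca_sample_id", "<no sample id>")
        age_timing = row["age_timing"]
        if age_timing not in VALID_AGE_TIMING:
            problems.append((sample_id, f"unexpected age_timing {age_timing!r}"))
        # Assumption: disease_timing contains the word "diagnosis"
        # for samples that were collected at diagnosis.
        elif "diagnosis" in row["disease_timing"].lower() and age_timing != "diagnosis":
            problems.append((sample_id, f"collected at diagnosis but age_timing is {age_timing!r}"))
    return problems


# Toy metadata for illustration only; these rows are made up.
toy_tsv = "\n".join([
    "scpca_sample_id\tdisease_timing\tage\tage_timing",
    "SCPCS999990\tInitial diagnosis\t4\tdiagnosis",
    "SCPCS999991\tRecurrence\t7\tcollection",
    "SCPCS999992\tInitial diagnosis\t5\tcollection",
])
rows = list(csv.DictReader(io.StringIO(toy_tsv), delimiter="\t"))
problems = check_age_timing(rows)
print(problems)
```

Only the third toy row is flagged, because it is marked as collected at diagnosis while its `age_timing` is `collection`.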
@@ -175,7 +176,7 @@ For project downloads, the counts and QC files will be organized by the _set_ of
These sample set folders are named with an underscore-separated list of the sample ids for the libraries within, _e.g._, `SCPCS999990_SCPCS999991_SCPCS999992`.
Bulk RNA-seq data, if present, will follow the [same format as bulk RNA-seq for single-sample libraries](#download-folder-structure-for-project-downloads).
Because we do not perform demultiplexing to separate cells from multiplexed libraries into sample-specific count matrices, sample downloads from a project with multiplexed data will include all libraries that contain the sample of interest, but these libraries _will still contain cells from other samples_.
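To make the multiplexed case concrete, here is a small sketch of how a sample set folder name relates to the libraries in a download. The library ids, the mapping, and both helper names are hypothetical, and sorting the sample ids is an assumption made only to get a stable order.

```python
def sample_set_folder(sample_ids):
    """Join sample ids with underscores; sorting is an assumption for a stable order."""
    return "_".join(sorted(sample_ids))


def libraries_with_sample(library_samples, sample_id):
    """List the libraries that contain the sample of interest."""
    return sorted(lib for lib, samples in library_samples.items() if sample_id in samples)


# Hypothetical multiplexed project: each library pools the same three samples.
library_samples = {
    "SCPCL999990": {"SCPCS999990", "SCPCS999991", "SCPCS999992"},
    "SCPCL999991": {"SCPCS999990", "SCPCS999991", "SCPCS999992"},
}

folder = sample_set_folder(library_samples["SCPCL999990"])
libraries = libraries_with_sample(library_samples, "SCPCS999990")

# Other samples whose cells remain in those libraries, since no demultiplexing is performed.
other_samples = sorted(
    set().union(*(library_samples[lib] for lib in libraries)) - {"SCPCS999990"}
)
print(folder, libraries, other_samples)
```

Requesting one sample here still yields both libraries, and each of them also contains cells from the two other samples.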
@@ -212,13 +213,13 @@ This includes a summary of the types of libraries (e.g., single-cell, single-nuc
Every download also includes the individual [QC report](#qc-report) and, if applicable, [cell type annotation reports](#cell-type-report) for each library included in the merged object.

### Download folder structure for `SingleCellExperiment` merged downloads:
docs/getting_started.md (+4 -2)
@@ -146,10 +146,10 @@ Dimensionality reduction results can be accessed in the `AnnData` objects using
```python
# principal component analysis results
-processed_adata.obsm["X_PCA"]
+processed_adata.obsm["X_pca"]

# UMAP results
-processed_adata.obsm["X_UMAP"]
+processed_adata.obsm["X_umap"]
```

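Scripts that need to read files downloaded both before and after this rename could look up either key. The helper below is a hedged sketch: `get_embedding` is a hypothetical name, and plain dictionaries stand in for the `.obsm` slot, which is assumed here to behave like a mapping.

```python
def get_embedding(obsm, name):
    """Return an embedding stored under the new lowercase key, or the older uppercase key."""
    new_key = f"X_{name.lower()}"
    old_key = f"X_{name.upper()}"
    for key in (new_key, old_key):
        if key in obsm:
            return obsm[key]
    raise KeyError(f"no {name} embedding found; looked for {new_key} and {old_key}")


# Plain dictionaries standing in for `.obsm`; the coordinates are made up.
new_style = {"X_pca": [[0.1, 0.2]], "X_umap": [[1.0, 2.0]]}
old_style = {"X_PCA": [[0.1, 0.2]], "X_UMAP": [[1.0, 2.0]]}

print(get_embedding(new_style, "pca"))
print(get_embedding(old_style, "umap"))
```

Checking the new key first means the fallback is only used for objects written with the previous naming.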
See below for more resources on dimensionality reduction:
@@ -179,6 +179,8 @@ This list can be accessed using the following command in the `AnnData` objects:
```python
processed_adata.uns["highly_variable_genes"]
```

+Additionally, the `AnnData` objects contain a column in the `.var` slot, `"highly_variable"`, indicating whether or not a gene is found in the list of highly variable genes.
+
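The list in `.uns["highly_variable_genes"]` and the boolean `highly_variable` column carry the same membership information. The sketch below shows that relationship with plain Python lists; the gene ids and both function names are hypothetical placeholders, not the pipeline's code.

```python
def highly_variable_column(gene_ids, highly_variable_genes):
    """One boolean per gene: True when the gene appears in the highly variable list."""
    hvg = set(highly_variable_genes)
    return [gene_id in hvg for gene_id in gene_ids]


def genes_from_column(gene_ids, highly_variable):
    """Recover the flagged genes, in the order in which the genes are stored."""
    return [gene_id for gene_id, flag in zip(gene_ids, highly_variable) if flag]


# Made-up ids standing in for the gene index and the highly variable gene list.
gene_ids = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
highly_variable_genes = ["GENE_C", "GENE_A"]

column = highly_variable_column(gene_ids, highly_variable_genes)
print(column)
print(genes_from_column(gene_ids, column))
```

The boolean column keeps membership but not the order of the original list. With a loaded object, the flagged genes could presumably be selected with `processed_adata.var[processed_adata.var["highly_variable"]].index`.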
### Clustering

Cluster assignments obtained from [graph-based clustering](http://bioconductor.org/books/3.16/OSCA.basic/clustering.html#clustering-graph) are also available in the processed objects.
docs/merged_objects.md (+5 -4)
@@ -197,7 +197,8 @@ metadata(merged_sce)$sample_metadata # sample metadata only for projects with mu
|`participant_id`| Unique ID corresponding to the donor from which the sample was obtained |
|`submitter_id`| Original sample identifier from submitter |
|`submitter`| Submitter name/ID |
-|`age`| Age at time sample was obtained |
+|`age`| Age provided by submitter |
+|`age_timing`| Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
|`sex`| Sex of patient that the sample was obtained from |
|`diagnosis`| Tumor type |
|`subdiagnosis`| Subcategory of diagnosis or mutation status (if applicable) |
@@ -393,15 +394,15 @@ Additional experiment metadata is available in the {ref}`metadata TSV file inclu

### AnnData dimensionality reduction results

-The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
+The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results.

For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.

The following command can be used to access the PCA and UMAP results:
docs/sce_file_contents.md (+9 -5)
@@ -175,7 +175,8 @@ The following columns are included in the sample metadata data frame for all lib
|`participant_id`| Unique ID corresponding to the donor from which the sample was obtained |
|`submitter_id`| Original sample identifier from submitter |
|`submitter`| Submitter name/ID |
-|`age`| Age at time sample was obtained |
+|`age`| Age provided by submitter |
+|`age_timing`| Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
|`sex`| Sex of patient that the sample was obtained from |
|`diagnosis`| Tumor type |
|`subdiagnosis`| Subcategory of diagnosis or mutation status (if applicable) |
@@ -389,7 +390,8 @@ The `AnnData` object also includes the following additional cell-level metadata
|`participant_id`| Unique ID corresponding to the donor from which the sample was obtained |
|`submitter_id`| Original sample identifier from submitter |
|`submitter`| Submitter name/ID |
-|`age`| Age at time sample was obtained |
+|`age`| Age provided by submitter |
+|`age_timing`| Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
|`sex`| Sex of patient that the sample was obtained from |
|`diagnosis`| Tumor type |
|`subdiagnosis`| Subcategory of diagnosis or mutation status (if applicable) |
@@ -425,6 +427,7 @@ The `AnnData` object also includes the following additional gene-level metadata
|`is_feature_filtered`| Boolean indicating if the gene or feature is filtered out in the normalized matrix but is present in the raw matrix |
+|`highly_variable`| Boolean indicating if the gene or feature is found in the highly variable gene list determined using `scran::modelGeneVar` and `scran::getTopHVGs`. Only present for `processed` objects |

### AnnData experiment metadata
@@ -445,20 +448,21 @@ The `AnnData` object also includes the following additional items in the `.uns`
|`schema_version`| CZI schema version used for `AnnData` formatting |
+|`pca`| A dictionary object containing the parameters and variance weights associated with the PCA matrix found in `.obsm["X_pca"]`. Only available for processed objects |

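As an illustration of how the variance weights in the `pca` item might be used, the sketch below computes the cumulative proportion of variance explained. The key names `params` and `variance_ratio` are assumptions based on the layout Scanpy uses for `.uns["pca"]`; inspect the keys of the actual object before relying on them. The numbers are made up.

```python
from itertools import accumulate

# Hypothetical stand-in for `adata_object.uns["pca"]`; key names are assumed, values are made up.
pca_info = {
    "params": {},
    "variance_ratio": [0.5, 0.25, 0.125],
}


def components_for_variance(variance_ratio, threshold):
    """Smallest number of leading components whose summed ratio reaches the threshold."""
    for n_components, total in enumerate(accumulate(variance_ratio), start=1):
        if total >= threshold:
            return n_components
    return None  # the stored components never reach the threshold


cumulative = list(accumulate(pca_info["variance_ratio"]))
print(cumulative)
print(components_for_variance(pca_info["variance_ratio"], 0.7))
```

Returning `None` when the threshold is not reached avoids implying that the stored components explain more variance than they do.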
### AnnData dimensionality reduction results

-The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
+The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results stored as a `numpy.ndarray`.
For all other H5AD files, the `.obsm` slot will be empty as no dimensionality reduction was performed.

For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.

The following command can be used to access the PCA and UMAP results:

```python
-adata_object.obsm["X_PCA"] # pca results
-adata_object.obsm["X_UMAP"] # umap results
+adata_object.obsm["X_pca"] # pca results
+adata_object.obsm["X_umap"] # umap results
```

### Additional AnnData components for CITE-seq libraries (with ADT tags)