Commit bfbbfb8

Merge pull request #340 from AlexsLemonade/allyhawkins/age-timing-release
Add CHANGELOG entry for `age_timing`, scanpy compatibility, and new download convention + new download images
2 parents 1c4ae3b + a97b61c commit bfbbfb8

15 files changed: +44 -22 lines changed

docs/CHANGELOG.md

Lines changed: 14 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -12,6 +12,20 @@ For more information about `AlexsLemonade/scpca-nf` versions, please see [the re
1212
<!-- PUT THE NEW CHANGELOG ENTRY RIGHT BELOW THIS -->
1313
<!-------------------------------------------------->
1414

15+
## 2024.08.13
16+
17+
* A new column, `age_timing`, is now present in the sample metadata tables included with each download.
18+
* This column indicates if the age specified in the `age` column is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`.
19+
* This will also be present in the metadata of the `SingleCellExperiment` and `AnnData` objects.
20+
* AnnData objects have been updated to improve compatibility with [`Scanpy`](https://scanpy.readthedocs.io/en/stable/).
21+
* PCA and UMAP embeddings are now stored as `X_pca` and `X_umap` (previously `X_PCA` and `X_UMAP`).
22+
* A new column has been added to the `.var` slot, `highly_variable`, indicating if the given gene can be found in the list of highly variable genes.
23+
* Parameters and variance weights associated with the PCA results are now available in `.uns["pca"]`.
24+
* See {ref}`Components of an AnnData object<sce_file_contents:Components of an anndata object>` for more information.
25+
* Downloads now follow a new naming convention: `{identifier}_{modality}_{file format}_{date}.zip`
26+
* For example, a sample (`SCPCS999990`) downloaded on 2024-08-13 in AnnData format will be named: `SCPCS999990_SINGLE-CELL_ANN-DATA_2024-08-13.zip`
27+
* See the {ref}`Downloadable files page <download_files:downloadable files>` for more information.
28+
1529
## 2024.08.01
1630

1731
* A table containing sample metadata (e.g., age, sex, diagnosis) is now available in both the QC report (`qc.html`) and the supplemental cell type report (`celltype-report.html`) included in all downloads.
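
Reviewer note: the download naming convention described in the 2024.08.13 entry can be sketched in Python as below. This is only an illustration. `build_download_name` and `parse_download_name` are hypothetical helpers, not part of any ScPCA tooling, and the modality and format tokens (`SINGLE-CELL`, `ANN-DATA`) come from the single example in the entry; other values are not confirmed here. The parser additionally assumes that none of the four fields contains an underscore.

```python
from datetime import date


def build_download_name(identifier: str, modality: str, file_format: str, day: date) -> str:
    """Assemble `{identifier}_{modality}_{file format}_{date}.zip`."""
    return f"{identifier}_{modality}_{file_format}_{day.isoformat()}.zip"


def parse_download_name(filename: str) -> dict:
    """Split a download name back into its four fields.

    Assumes no field contains an underscore; this is an assumption made for
    the sketch, not a documented guarantee.
    """
    if not filename.endswith(".zip"):
        raise ValueError(f"expected a .zip file name, got {filename!r}")
    parts = filename[: -len(".zip")].split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {len(parts)}")
    identifier, modality, file_format, day = parts
    return {
        "identifier": identifier,
        "modality": modality,
        "file_format": file_format,
        "date": date.fromisoformat(day),
    }


# Round trip using the values from the changelog example.
name = build_download_name("SCPCS999990", "SINGLE-CELL", "ANN-DATA", date(2024, 8, 13))
fields = parse_download_name(name)
print(name)
print(fields["date"])
```

Parsing the date back into a `date` object makes it straightforward to sort or compare downloads by when they were generated.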

docs/download_files.md

Lines changed: 12 additions & 11 deletions
@@ -39,21 +39,21 @@ See the [description of the Spatial transcriptomics output section below](#spati
3939
## `SingleCellExperiment` downloads
4040

4141
### Download folder structure for project downloads:
42-
![project download folder](images/project-download-folder.png){width="400"}
42+
![project download folder](images/project-download-folder.png){width="600"}
4343

4444
### Download folder structure for individual sample downloads:
45-
![sample download folder](images/sample-download-folder.png){width="400"}
45+
![sample download folder](images/sample-download-folder.png){width="600"}
4646

4747
## `AnnData` downloads
4848

4949
### Download folder structure for project downloads:
50-
![project download folder](images/anndata-project-download-folder.png){width="400"}
50+
![project download folder](images/anndata-project-download-folder.png){width="600"}
5151

5252
### Download folder structure for individual sample downloads:
53-
![sample download folder](images/anndata-sample-download-folder.png){width="400"}
53+
![sample download folder](images/anndata-sample-download-folder.png){width="600"}
5454

5555
### Download folder structure for individual sample downloads with CITE-seq (ADT) data:
56-
![sample download folder](images/anndata-sample-citeseq-download-folder.png){width="400"}
56+
![sample download folder](images/anndata-sample-citeseq-download-folder.png){width="600"}
5757

5858
If downloading a sample that contains a CITE-seq library as an `AnnData` object (`.h5ad` file), the quantified CITE-seq expression data is included as a separate file with the suffix `_adt.h5ad`.
5959

@@ -103,7 +103,8 @@ Each row corresponds to a unique sample/library combination and contains the fol
103103
| `diagnosis` | Tumor type |
104104
| `subdiagnosis` | Subcategory of diagnosis or mutation status (if applicable) |
105105
| `disease_timing` | At what stage of disease the sample was obtained, either diagnosis or recurrence |
106-
| `age_at_diagnosis` | Age at time sample was obtained |
106+
| `age` | Age provided by submitter |
107+
| `age_timing` | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
107108
| `sex` | Sex of patient that the sample was obtained from |
108109
| `tissue_location` | Where in the body the tumor sample was located |
109110
| `participant_id` | Unique id corresponding to the donor from which the sample was obtained |
@@ -175,7 +176,7 @@ For project downloads, the counts and QC files will be organized by the _set_ of
175176
These sample set folders are named with an underscore-separated list of the sample ids for the libraries within, _e.g._, `SCPCS999990_SCPCS999991_SCPCS999992`.
176177
Bulk RNA-seq data, if present, will follow the [same format as bulk RNA-seq for single-sample libraries](#download-folder-structure-for-project-downloads).
177178

178-
![multiplexed project download folder](images/multiplexed-download-folder.png){width="400"}
179+
![multiplexed project download folder](images/multiplexed-download-folder.png){width="600"}
179180

180181
Because we do not perform demultiplexing to separate cells from multiplexed libraries into sample-specific count matrices, sample downloads from a project with multiplexed data will include all libraries that contain the sample of interest, but these libraries _will still contain cells from other samples_.
181182

@@ -212,13 +213,13 @@ This includes a summary of the types of libraries (e.g., single-cell, single-nuc
212213
Every download also includes the individual [QC report](#qc-report) and, if applicable, [cell type annotation reports](#cell-type-report) for each library included in the merged object.
213214

214215
### Download folder structure for `SingleCellExperiment` merged downloads:
215-
![project download folder](images/merged-project-download-folder.png){width="400"}
216+
![project download folder](images/merged-project-download-folder.png){width="600"}
216217

217218
### Download folder structure for `AnnData` merged downloads:
218-
![project download folder](images/merged-anndata-project-download-folder.png){width="400"}
219+
![project download folder](images/merged-anndata-project-download-folder.png){width="600"}
219220

220221
### Download folder structure for `AnnData` merged downloads with CITE-seq (ADT) data:
221-
![project download folder](images/merged-anndata-project-citeseq-download-folder.png){width="400"}
222+
![project download folder](images/merged-anndata-project-citeseq-download-folder.png){width="600"}
222223

223224

224225
## Spatial transcriptomics libraries
@@ -238,4 +239,4 @@ A full description of all files included in the download for spatial transcripto
238239

239240
Every download also includes a single `spatial_metadata.tsv` file containing metadata for all libraries included in the download.
240241

241-
![sample download with spatial](images/spatial-download-folder.png){width="400"}
242+
![sample download with spatial](images/spatial-download-folder.png){width="600"}

docs/getting_started.md

Lines changed: 4 additions & 2 deletions
@@ -146,10 +146,10 @@ Dimensionality reduction results can be accessed in the `AnnData` objects using
146146

147147
```python
148148
# principal component analysis results
149-
processed_adata.obsm["X_PCA"]
149+
processed_adata.obsm["X_pca"]
150150

151151
# UMAP results
152-
processed_adata.obsm["X_UMAP"]
152+
processed_adata.obsm["X_umap"]
153153
```
154154

155155
See below for more resources on dimensionality reduction:
@@ -179,6 +179,8 @@ This list can be accessed using the following command in the `AnnData` objects:
179179
processed_adata.uns["highly_variable_genes"]
180180
```
181181

182+
Additionally, the `AnnData` objects contain a column in the `.var` slot, `"highly_variable"`, indicating whether or not a gene is found in the list of highly variable genes.
183+
182184
### Clustering
183185

184186
Cluster assignments obtained from [Graph-based clustering](http://bioconductor.org/books/3.16/OSCA.basic/clustering.html#clustering-graph) are also available in the processed objects.
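
Reviewer note: because this change renames the embedding keys from `X_PCA`/`X_UMAP` to `X_pca`/`X_umap`, code that has to read both older and newer downloads may want a small fallback lookup. The sketch below is one possible approach, not something the documentation prescribes. `get_embedding` is a hypothetical helper, and plain dictionaries stand in for `processed_adata.obsm`; since `.obsm` is dictionary-like, the same lookup should apply to a real `AnnData` object.

```python
from collections.abc import Mapping


def get_embedding(obsm: Mapping, name: str):
    """Return the embedding stored under `X_<name>`.

    Tries the current lowercase key first (e.g. `X_pca`), then falls back to
    the older uppercase key (e.g. `X_PCA`).
    """
    for key in (f"X_{name.lower()}", f"X_{name.upper()}"):
        if key in obsm:
            return obsm[key]
    raise KeyError(f"no embedding found for {name!r}")


# Plain dictionaries stand in for `processed_adata.obsm`; values are made up.
new_style = {"X_pca": [[0.1, 0.2]], "X_umap": [[1.0, 2.0]]}
old_style = {"X_PCA": [[0.1, 0.2]], "X_UMAP": [[1.0, 2.0]]}

print(get_embedding(new_style, "pca"))
print(get_embedding(old_style, "umap"))
```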

docs/merged_objects.md

Lines changed: 5 additions & 4 deletions
@@ -197,7 +197,8 @@ metadata(merged_sce)$sample_metadata # sample metadata only for projects with mu
197197
| `participant_id` | Unique ID corresponding to the donor from which the sample was obtained |
198198
| `submitter_id` | Original sample identifier from submitter |
199199
| `submitter` | Submitter name/ID |
200-
| `age` | Age at time sample was obtained |
200+
| `age` | Age provided by submitter |
201+
| `age_timing` | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
201202
| `sex` | Sex of patient that the sample was obtained from |
202203
| `diagnosis` | Tumor type |
203204
| `subdiagnosis` | Subcategory of diagnosis or mutation status (if applicable) |
@@ -393,15 +394,15 @@ Additional experiment metadata is available in the {ref}`metadata TSV file inclu
393394

394395
### AnnData dimensionality reduction results
395396

396-
The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
397+
The merged `AnnData` object contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results.
397398

398399
For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.
399400

400401
The following command can be used to access the PCA and UMAP results:
401402

402403
```python
403-
merged_adata_object.obsm["X_PCA"] # pca results
404-
merged_adata_object.obsm["X_UMAP"] # umap results
404+
merged_adata_object.obsm["X_pca"] # pca results
405+
merged_adata_object.obsm["X_umap"] # umap results
405406
```
406407

407408

docs/sce_file_contents.md

Lines changed: 9 additions & 5 deletions
@@ -175,7 +175,8 @@ The following columns are included in the sample metadata data frame for all lib
175175
| `participant_id` | Unique ID corresponding to the donor from which the sample was obtained |
176176
| `submitter_id` | Original sample identifier from submitter |
177177
| `submitter` | Submitter name/ID |
178-
| `age` | Age at time sample was obtained |
178+
| `age` | Age provided by submitter |
179+
| `age_timing` | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
179180
| `sex` | Sex of patient that the sample was obtained from |
180181
| `diagnosis` | Tumor type |
181182
| `subdiagnosis` | Subcategory of diagnosis or mutation status (if applicable) |
@@ -389,7 +390,8 @@ The `AnnData` object also includes the following additional cell-level metadata
389390
| `participant_id` | Unique ID corresponding to the donor from which the sample was obtained |
390391
| `submitter_id` | Original sample identifier from submitter |
391392
| `submitter` | Submitter name/ID |
392-
| `age` | Age at time sample was obtained |
393+
| `age` | Age provided by submitter |
394+
| `age_timing` | Whether age is the age at diagnosis (`diagnosis`), age at collection (`collection`), or `unknown`. This will be `diagnosis` for all samples collected at diagnosis, indicated by the `disease_timing` column |
393395
| `sex` | Sex of patient that the sample was obtained from |
394396
| `diagnosis` | Tumor type |
395397
| `subdiagnosis` | Subcategory of diagnosis or mutation status (if applicable) |
@@ -425,6 +427,7 @@ The `AnnData` object also includes the following additional gene-level metadata
425427
| Column name | Contents |
426428
| ------------- | ---------------------------------------------------------------- |
427429
| `is_feature_filtered` | Boolean indicating if the gene or feature is filtered out in the normalized matrix but is present in the raw matrix |
430+
| `highly_variable` | Boolean indicating if the gene or feature is found in the highly variable gene list determined using `scran::modelGeneVar` and `scran::getTopHVGs`. Only present for `processed` objects |
428431

429432

430433
### AnnData experiment metadata
@@ -445,20 +448,21 @@ The `AnnData` object also includes the following additional items in the `.uns`
445448
| Item name | Contents |
446449
| ------------- | ---------------------------------------------------------------- |
447450
| `schema_version` | CZI schema version used for `AnnData` formatting |
451+
| `pca` | A dictionary object containing the parameters and variance weights associated with the PCA matrix found in `.obsm["X_pca"]`. Only available for processed objects |
448452

449453

450454
### AnnData dimensionality reduction results
451455

452-
The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_PCA`) and UMAP (`X_UMAP`) results.
456+
The H5AD file containing the processed `AnnData` object (`_processed_rna.h5ad`) contains a slot `.obsm` with both principal component analysis (`X_pca`) and UMAP (`X_umap`) results stored as a `numpy.ndarray`.
453457
For all other H5AD files, the `.obsm` slot will be empty as no dimensionality reduction was performed.
454458

455459
For information on how PCA and UMAP results were calculated see the {ref}`section on processed gene expression data <processing_information:Processed gene expression data>`.
456460

457461
The following command can be used to access the PCA and UMAP results:
458462

459463
```python
460-
adata_object.obsm["X_PCA"] # pca results
461-
adata_object.obsm["X_UMAP"] # umap results
464+
adata_object.obsm["X_pca"] # pca results
465+
adata_object.obsm["X_umap"] # umap results
462466
```
463467

464468
### Additional AnnData components for CITE-seq libraries (with ADT tags)
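
Reviewer note: the `age_timing` column added throughout this commit means the `age` column can hold either age at diagnosis or age at collection, so analyses that compare ages may want to separate samples by `age_timing` first. The sketch below shows one way to do that with the standard library. The rows are made up for illustration, only the column names `participant_id`, `age`, and `age_timing` come from the tables in this diff, and treating `age` as numeric is an assumption.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real metadata tables have more columns.
example_tsv = (
    "participant_id\tage\tage_timing\n"
    "P1\t4\tdiagnosis\n"
    "P2\t7\tcollection\n"
    "P3\t12\tunknown\n"
    "P4\t9\tdiagnosis\n"
)

# Group ages by what the age refers to, so the two kinds are never mixed.
ages_by_timing = defaultdict(list)
for row in csv.DictReader(io.StringIO(example_tsv), delimiter="\t"):
    # Assumes `age` parses as a number; adjust if the real column does not.
    ages_by_timing[row["age_timing"]].append(float(row["age"]))

for timing, ages in sorted(ages_by_timing.items()):
    print(timing, ages)
```

To use this with a real download, the `io.StringIO` object would be replaced by an open handle to the metadata TSV file.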
