-
Notifications
You must be signed in to change notification settings - Fork 0
Return pca results in AnnData format #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the PCA plotting data preparation by replacing prepare_pca_data_to_plot with extract_pca_anndata, which now returns an AnnData object instead of a DataFrame. The file plot_data_handling.py has also been moved from the pl (plotting) module to the tl (tools) module.
Key changes:
- New
extract_pca_anndatafunction returns AnnData format with PCA coordinates as.X, original obs/var as.obs, and PC metadata as.var - Updated
plot_pcato work with the new AnnData-based data structure - Removed validation of PC dimensions from
_validate_pca_plot_inputand moved it toplot_pca
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/alphatools/tl/plot_data_handling.py | Refactored prepare_pca_data_to_plot to extract_pca_anndata that returns AnnData instead of DataFrame; updated validation function signatures; removed import of data_column_to_array |
| src/alphatools/pl/plots.py | Updated plot_pca to use new extract_pca_anndata function and work with AnnData structure; added import of data_column_to_array; renamed parameters from x_column/y_column to pc_x/pc_y |
| tests/tl/test_data_handling.py | Updated import path from alphatools.pl to alphatools.tl; removed tests for old prepare_pca_data_to_plot function; added new tests for extract_pca_anndata |
| docs/api.md | Added tl.extract_pca_anndata to API documentation |
Comments suppressed due to low confidence (9)
src/alphatools/tl/plot_data_handling.py:76
- Missing period and space in error message. Should be:
f"PCA embeddings layer '{pca_embeddings_layer_name}' not found in data.{pca_coors_attr}. Found layers: {available_layers}"
src/alphatools/tl/plot_data_handling.py:238 - Debug print statement should be removed before merging to production code.
src/alphatools/tl/plot_data_handling.py:231 - Multiple issues with the var dim_space branch:
- Undefined variable:
expr_colsshould beexpr_rows(defined on line 230) - Incorrect subsetting:
data[:, expr_cols]subsets columns (var_names), but fordim_space="var", we need to subset rows (obs_names). This should bedata[expr_rows, :] - The transposition and data extraction logic needs review for the var space case
src/alphatools/tl/plot_data_handling.py:180
- Typo in docstring: "Fetched" should be "Fetch" (imperative form for docstring).
src/alphatools/tl/plot_data_handling.py:196 - Documentation does not match return value: The docstring says the function returns "AnnData object containing PCA coordinates, color mapping, and labels" but the function doesn't add labels to the returned AnnData. Labels are handled separately in the calling code (plot_pca). The description should be updated to accurately reflect what the function returns.
tests/tl/test_data_handling.py:372 - Test coverage is less comprehensive than the previous
prepare_pca_data_to_plottests. Consider adding tests for:
- Different PC combinations (the old parametrized test checked pc_x/pc_y variations)
- Validation that the returned AnnData has the correct shape and structure for different input sizes
- Edge cases with single PC
- Expression columns with
dim_space="var"(currently missing and where a bug exists)
tests/tl/test_data_handling.py:317
- Typo in comment: "costom" should be "custom".
src/alphatools/tl/plot_data_handling.py:190 - Incorrect type annotation in docstring: The parameter is documented as
str | Nonebut should belist[str] | Noneto match the actual parameter signature.
src/alphatools/tl/plot_data_handling.py:231 - Missing test coverage for
extract_pca_anndatawithexpression_columnsparameter whendim_space="var". The existing testtest_extract_pca_anndata_with_expression_columns_obsonly tests the obs case. Adding a test for the var case would have caught the bug in line 231.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
|
|
||
| pca_coor_df = prepare_pca_data_to_plot( | ||
| data, x_column, y_column, dim_space, embeddings_name, color_map_column, label_column, label=label | ||
| pca_anndata = extract_pca_anndata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adata_pca? for consistency..
| color_map_column: str | None = None, | ||
| color_column: str | None = None, | ||
| dim_space: str = "obs", | ||
| embeddings_name: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the TODO is no longer valid?
| if not (1 <= pc_x <= n_pcs) or not (1 <= pc_y <= n_pcs): | ||
| raise ValueError(f"pc_x and pc_y must be between 1 and {n_pcs} (inclusive). Got {pc_x=}, {pc_y=}") | ||
| # Check if the variance layer exists in uns | ||
| if pca_var_key not in data.uns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to adata?
| raise ValueError(f"pc_x and pc_y must be between 1 and {n_pcs} (inclusive). Got {pc_x=}, {pc_y=}") | ||
| # Check if the variance layer exists in uns | ||
| if pca_var_key not in data.uns: | ||
| raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was there a reason to move the if not (1 <= pc_x <= n_pcs) check?
| var_df = pd.DataFrame(data.uns[pca_var_key]) | ||
|
|
||
| pca_coor_df = pd.DataFrame(pca_coordinates[:, [dim1_z, dim2_z]], columns=["dim1", "dim2"]) | ||
| pca_anndata = ad.AnnData(X=pca_coordinates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adata_pca?
| expr_data.index = data.var_names | ||
| pca_anndata.obs = pca_anndata.obs.join(expr_data) | ||
| else: | ||
| print("here") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accidental commit?
| f"No matching expression columns found in the data.X for the specified PCA dim_space '{dim_space}' and {expression_columns}." | ||
| ) | ||
|
|
||
| pca_anndata.var_names = [str(i + 1) for i in range(pca_anndata.X.shape[1])] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just "1"? not "pc_1"?
Following @vbrennsteiner 's suggestion, the function that preprocesses the pca results before plotting them in
plot_pca, is now returning ananndataformat in whichvar= PCs,obs= the obs/vars used for PCA (depending ondim_space).