Skip to content

Conversation

@shanibmo
Copy link
Collaborator

Following @vbrennsteiner 's suggestion, the function that preprocesses the pca results before plotting them in plot_pca, is now returning an anndata format in which var = PCs, obs = the obs/vars used for PCA (depending on dim_space).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the PCA plotting data preparation by replacing prepare_pca_data_to_plot with extract_pca_anndata, which now returns an AnnData object instead of a DataFrame. The file plot_data_handling.py has also been moved from the pl (plotting) module to the tl (tools) module.

Key changes:

  • New extract_pca_anndata function returns AnnData format with PCA coordinates as .X, original obs/var as .obs, and PC metadata as .var
  • Updated plot_pca to work with the new AnnData-based data structure
  • Removed validation of PC dimensions from _validate_pca_plot_input and moved it to plot_pca

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.

File Description
src/alphatools/tl/plot_data_handling.py Refactored prepare_pca_data_to_plot to extract_pca_anndata that returns AnnData instead of DataFrame; updated validation function signatures; removed import of data_column_to_array
src/alphatools/pl/plots.py Updated plot_pca to use new extract_pca_anndata function and work with AnnData structure; added import of data_column_to_array; renamed parameters from x_column/y_column to pc_x/pc_y
tests/tl/test_data_handling.py Updated import path from alphatools.pl to alphatools.tl; removed tests for old prepare_pca_data_to_plot function; added new tests for extract_pca_anndata
docs/api.md Added tl.extract_pca_anndata to API documentation
Comments suppressed due to low confidence (9)

src/alphatools/tl/plot_data_handling.py:76

  • Missing period and space in error message. Should be: f"PCA embeddings layer '{pca_embeddings_layer_name}' not found in data.{pca_coors_attr}. Found layers: {available_layers}"
    src/alphatools/tl/plot_data_handling.py:238
  • Debug print statement should be removed before merging to production code.
    src/alphatools/tl/plot_data_handling.py:231
  • Multiple issues with the var dim_space branch:
  1. Undefined variable: expr_cols should be expr_rows (defined on line 230)
  2. Incorrect subsetting: data[:, expr_cols] subsets columns (var_names), but for dim_space="var", we need to subset rows (obs_names). This should be data[expr_rows, :]
  3. The transposition and data extraction logic needs review for the var space case
    src/alphatools/tl/plot_data_handling.py:180
  • Typo in docstring: "Fetched" should be "Fetch" (imperative form for docstring).
    src/alphatools/tl/plot_data_handling.py:196
  • Documentation does not match return value: The docstring says the function returns "AnnData object containing PCA coordinates, color mapping, and labels" but the function doesn't add labels to the returned AnnData. Labels are handled separately in the calling code (plot_pca). The description should be updated to accurately reflect what the function returns.
    tests/tl/test_data_handling.py:372
  • Test coverage is less comprehensive than the previous prepare_pca_data_to_plot tests. Consider adding tests for:
  1. Different PC combinations (the old parametrized test checked pc_x/pc_y variations)
  2. Validation that the returned AnnData has the correct shape and structure for different input sizes
  3. Edge cases with single PC
  4. Expression columns with dim_space="var" (currently missing and where a bug exists)
    tests/tl/test_data_handling.py:317
  • Typo in comment: "costom" should be "custom".
    src/alphatools/tl/plot_data_handling.py:190
  • Incorrect type annotation in docstring: The parameter is documented as str | None but should be list[str] | None to match the actual parameter signature.
    src/alphatools/tl/plot_data_handling.py:231
  • Missing test coverage for extract_pca_anndata with expression_columns parameter when dim_space="var". The existing test test_extract_pca_anndata_with_expression_columns_obs only tests the obs case. Adding a test for the var case would have caught the bug in line 231.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.


pca_coor_df = prepare_pca_data_to_plot(
data, x_column, y_column, dim_space, embeddings_name, color_map_column, label_column, label=label
pca_anndata = extract_pca_anndata(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adata_pca? for consistency..

color_map_column: str | None = None,
color_column: str | None = None,
dim_space: str = "obs",
embeddings_name: str | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the TODO is no longer valid?

if not (1 <= pc_x <= n_pcs) or not (1 <= pc_y <= n_pcs):
raise ValueError(f"pc_x and pc_y must be between 1 and {n_pcs} (inclusive). Got {pc_x=}, {pc_y=}")
# Check if the variance layer exists in uns
if pca_var_key not in data.uns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to adata?

raise ValueError(f"pc_x and pc_y must be between 1 and {n_pcs} (inclusive). Got {pc_x=}, {pc_y=}")
# Check if the variance layer exists in uns
if pca_var_key not in data.uns:
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there a reason to move the if not (1 <= pc_x <= n_pcs) check?

var_df = pd.DataFrame(data.uns[pca_var_key])

pca_coor_df = pd.DataFrame(pca_coordinates[:, [dim1_z, dim2_z]], columns=["dim1", "dim2"])
pca_anndata = ad.AnnData(X=pca_coordinates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adata_pca?

expr_data.index = data.var_names
pca_anndata.obs = pca_anndata.obs.join(expr_data)
else:
print("here")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidental commit?

f"No matching expression columns found in the data.X for the specified PCA dim_space '{dim_space}' and {expression_columns}."
)

pca_anndata.var_names = [str(i + 1) for i in range(pca_anndata.X.shape[1])]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just "1"? not "pc_1"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants