README

Acute Myeloid Leukemia Heatmap Analysis

This repository contains an R notebook for analyzing acute myeloid leukemia (AML) RNA-sequencing data and generating annotated heatmaps for gene expression clustering analysis.

Overview

This analysis uses RNA-sequencing data from 19 samples of AML mouse models to create clustered heatmaps that visualize gene expression patterns. The data come from Shih et al., 2017 and were pre-processed by refine.bio.

The analysis focuses on:

- Gene expression clustering
- Sample clustering
- Treatment and mutation annotation
- High-variance gene selection

Dataset Information

Source: refine.bio experiment SRP070849
Samples: 19 samples from AML mouse models
Data type: RNA-sequencing (quantile normalized)
Mutations studied: IDH2, TET2, and wild-type (WT)
Treatments:
- IDH2 mutant AML: Vehicle or AG-221
- TET2 mutant AML: Vehicle or 5-Azacytidine (Decitabine)

Prerequisites

R Version

R >= 3.6.0 (recommended)

Required R Packages

# Core packages (auto-installed by script)
pheatmap
magrittr
readr
dplyr
tibble

# Optional for session info
sessioninfo

Installation

Clone this repository:

git clone <repository-url>
cd aml-heatmap-analysis

Install required R packages (if not already installed):

# Run in R console
if (!("pheatmap" %in% rownames(installed.packages()))) {
  install.packages("pheatmap", update = FALSE)
}
install.packages(c("magrittr", "readr", "dplyr", "tibble", "sessioninfo"))
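To avoid reinstalling packages that are already present, the check above can be generalized to the whole package list. This is a minimal sketch using base R only; the helper name `missing_packages` is hypothetical and not part of the notebook, and the install call is left commented out so that nothing is downloaded unintentionally.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

required <- c("pheatmap", "magrittr", "readr", "dplyr", "tibble", "sessioninfo")
to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install only what is missing:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```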

Usage

Data Structure

Ensure your data files are organized as follows:

project/
├── data/
│   └── SRP070849/
│       ├── SRP070849.tsv          # Gene expression matrix
│       └── metadata_SRP070849.tsv  # Sample metadata
├── plots/                          # Generated plots (auto-created)
├── results/                        # Analysis results (auto-created)
└── analysis.Rmd                   # Main analysis notebook
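The `plots/` and `results/` directories are described as auto-created by the notebook. When running sections by hand, something like the following sketch creates them if needed. The helper name `ensure_dir` is hypothetical, and the example writes under a temporary directory rather than the real project root.

```r
# Hypothetical helper: create a directory (and parents) if it does not exist.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Demonstrated under a temporary directory; in the project itself,
# "plots" and "results" relative to the project root would be used instead.
project_root <- file.path(tempdir(), "project")
plots_dir <- ensure_dir(file.path(project_root, "plots"))
results_dir <- ensure_dir(file.path(project_root, "results"))
```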

Running the Analysis

Option 1: Run the entire notebook

# In RStudio
# Open the .Rmd file and click "Run All"
# Or knit to HTML: Ctrl+Shift+K (Windows) or Cmd+Shift+K (Mac)

Option 2: Run individual sections

# Load libraries
library(pheatmap)
library(magrittr)
set.seed(12345)

# Read data
metadata <- readr::read_tsv("data/SRP070849/metadata_SRP070849.tsv")
expression_df <- readr::read_tsv("data/SRP070849/SRP070849.tsv") %>%
  tibble::column_to_rownames("Gene")

# Keep high-variance genes (variance above the 75th percentile)
variances <- apply(expression_df, 1, var)
upper_var <- quantile(variances, 0.75)
df_by_var <- data.frame(expression_df) %>%
  dplyr::filter(variances > upper_var)

# Create annotation
annotation_df <- metadata %>%
  dplyr::mutate(
    mutation = dplyr::case_when(
      startsWith(refinebio_title, "TET2") ~ "TET2",
      startsWith(refinebio_title, "IDH2") ~ "IDH2",
      startsWith(refinebio_title, "WT") ~ "WT",
      TRUE ~ "unknown"
    )
  ) %>%
  dplyr::select(refinebio_accession_code, mutation, refinebio_treatment) %>%
  tibble::column_to_rownames("refinebio_accession_code")

# Generate heatmap
heatmap_annotated <- pheatmap(
  df_by_var,
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  show_rownames = FALSE,
  annotation_col = annotation_df,
  main = "Annotated Heatmap",
  color = colorRampPalette(c("deepskyblue", "black", "yellow"))(25),
  scale = "row"
)
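The annotation only lines up with the heatmap if the row names of `annotation_df` are the same sample identifiers used as column names of the expression data. The sketch below shows one way to check this before plotting. It uses small made-up data frames in place of the real objects, and the explicit reordering is a readability choice rather than something the notebook is known to do.

```r
# Toy stand-ins for the real objects (values are made up for illustration).
expression_df <- data.frame(
  S1 = c(1.0, 2.0), S2 = c(3.0, 4.0), S3 = c(5.0, 6.0),
  row.names = c("GeneA", "GeneB")
)
annotation_df <- data.frame(
  mutation = c("WT", "IDH2", "TET2"),
  row.names = c("S3", "S1", "S2")
)

# Every sample column needs a matching annotation row.
stopifnot(all(colnames(expression_df) %in% rownames(annotation_df)))

# Reorder annotation rows to follow the expression columns, then verify.
annotation_df <- annotation_df[colnames(expression_df), , drop = FALSE]
stopifnot(identical(rownames(annotation_df), colnames(expression_df)))
```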

Command Examples

Basic heatmap generation:

# Create simple heatmap without annotation
basic_heatmap <- pheatmap(df_by_var, scale = "row")

Customized heatmap:

# Heatmap with custom colors and clustering
custom_heatmap <- pheatmap(
  df_by_var,
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  clustering_distance_rows = "euclidean",
  clustering_method = "complete",
  color = colorRampPalette(c("blue", "white", "red"))(50),
  scale = "row"
)
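The `scale = "row"` argument used in these calls standardizes each gene across samples, so colors reflect relative rather than absolute expression. A rough base R equivalent, shown on a made-up matrix, is sketched below. It illustrates the idea and is not taken from the notebook.

```r
# Toy matrix: 3 genes (rows) x 4 samples (columns), made-up values.
mat <- matrix(
  c(1, 2, 3, 4,
    10, 20, 30, 40,
    5, 5, 6, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("S", 1:4))
)

# scale() works on columns, so transpose, scale, and transpose back.
row_z <- t(scale(t(mat)))

# After scaling, every gene has mean 0 and standard deviation 1.
print(round(rowMeans(row_z), 10))
print(round(apply(row_z, 1, sd), 10))
```

GeneA and GeneB differ tenfold in magnitude but follow the same pattern across samples, so they end up with identical scaled values. This is why row scaling is useful when clustering genes by expression pattern.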

Save heatmap to different formats:

# Save as PNG
png("plots/my_heatmap.png", width = 800, height = 600)
print(heatmap_annotated)
dev.off()

# Save as PDF
pdf("plots/my_heatmap.pdf", width = 10, height = 8)
print(heatmap_annotated)
dev.off()

Output Files

The analysis generates the following files:

Results Directory (`results/`)

top_90_var_genes.tsv: High-variance genes used for clustering

Plots Directory (`plots/`)

aml_heatmap.png: Annotated heatmap visualization

Key Features

- Gene Filtering: Keeps genes whose variance exceeds the 75th percentile (the upper quartile)
- Sample Annotation: Annotates each sample by mutation type and treatment using the refine.bio metadata
- Clustering: Performs hierarchical clustering on both genes and samples
- Visualization: Creates publication-ready heatmaps with color-coded annotations
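For readers who want to see what hierarchical clustering of samples involves, here is a minimal base R sketch on a made-up matrix. Euclidean distance and complete linkage are used because they are pheatmap's documented defaults; the cut into two groups is an arbitrary choice for illustration.

```r
# Toy matrix: 4 genes x 4 samples; S1/S2 and S3/S4 form two obvious groups.
mat <- matrix(
  c(1, 1.2, 9, 9.1,
    2, 2.1, 8, 8.3,
    1, 0.9, 7, 7.2,
    3, 3.2, 9, 8.8),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("S", 1:4))
)

# Cluster samples: dist() compares rows, so transpose first.
sample_tree <- hclust(dist(t(mat), method = "euclidean"), method = "complete")

# Cut the dendrogram into two groups and inspect the membership.
sample_groups <- cutree(sample_tree, k = 2)
print(sample_groups)
```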

Customization Options

Gene Selection Criteria

# Select top 100 most variable genes
top_genes <- head(order(variances, decreasing = TRUE), 100)
df_top_genes <- expression_df[top_genes, ]

# Select genes with specific fold change
# (requires additional differential expression analysis)
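The two variance-based strategies (a quantile threshold and a fixed top-N count) can be compared on a small example. The sketch below uses random toy data and base R indexing, which keeps the gene names as row names; the sizes are arbitrary.

```r
set.seed(12345)

# Toy expression data: 20 genes x 6 samples of random values.
expression_df <- as.data.frame(matrix(rnorm(120), nrow = 20))
rownames(expression_df) <- paste0("Gene", 1:20)

variances <- apply(expression_df, 1, var)

# Quantile threshold: keep genes above the 75th percentile of variance.
keep_quantile <- variances > quantile(variances, 0.75)
df_by_quantile <- expression_df[keep_quantile, , drop = FALSE]

# Fixed count: keep the 5 most variable genes.
top_idx <- head(order(variances, decreasing = TRUE), 5)
df_top_n <- expression_df[top_idx, , drop = FALSE]

print(nrow(df_by_quantile))
print(nrow(df_top_n))
```

With 20 genes, the upper quartile and the top 5 select the same genes. On real data the quantile approach scales with the number of genes, while top-N keeps the heatmap size fixed.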

Color Schemes

# Alternative color palettes
# (viridis and RColorBrewer are separate packages and must be installed first)
colors_viridis <- viridis::viridis(25)
colors_rcolorbrewer <- RColorBrewer::brewer.pal(11, "RdYlBu")
colors_custom <- c("navy", "white", "firebrick")
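A short vector of anchor colors usually needs to be interpolated into a gradient before it is passed to `pheatmap()`. The sketch below does this with `colorRampPalette()` and also builds symmetric breaks so that the middle of the palette falls near a row z-score of 0. The variable names and the cap of 3 are assumptions for illustration. pheatmap's documentation states that `breaks` must be one element longer than `color`.

```r
# Expand three anchor colors into a smooth 50-step palette.
anchor_colors <- c("navy", "white", "firebrick")
palette_50 <- colorRampPalette(anchor_colors)(50)

# Symmetric breaks centered on 0; one more break than there are colors.
limit <- 3  # assumed cap on the scaled range; adjust to the data
breaks_51 <- seq(-limit, limit, length.out = length(palette_50) + 1)

print(head(palette_50, 3))
print(range(breaks_51))

# These could then be used as:
# pheatmap(df_by_var, scale = "row", color = palette_50, breaks = breaks_51)
```

Values beyond the chosen limit fall outside the breaks, so the limit should cover the range of the scaled data.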

Clustering Methods

# Different clustering options
pheatmap(df_by_var,
  clustering_distance_rows = "correlation",  # or "euclidean", "maximum", etc.
  clustering_method = "ward.D2"             # or "complete", "average", etc.
)
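To make the "correlation" option more concrete: as far as the pheatmap source indicates, it is computed as 1 minus the Pearson correlation between rows. The base R sketch below reproduces that idea on a made-up matrix. Average linkage is used here purely as an example.

```r
# Toy matrix: GeneA and GeneB rise together, GeneC falls (made-up values).
mat <- matrix(
  c(1, 2, 3, 4,
    10, 20, 30, 40,
    4, 3, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("S", 1:4))
)

# Correlation distance between rows: 1 - Pearson correlation.
cor_dist <- as.dist(1 - cor(t(mat)))
print(round(as.matrix(cor_dist), 3))

# Cluster the rows on that distance.
gene_tree <- hclust(cor_dist, method = "average")
```

Unlike Euclidean distance, this treats GeneA and GeneB as identical (distance 0) despite their different magnitudes, and places GeneC at the maximum distance of 2.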

Troubleshooting

Common Issues

File not found errors

# Check if files exist
file.exists("data/SRP070849/SRP070849.tsv")
file.exists("data/SRP070849/metadata_SRP070849.tsv")
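If either check returns `FALSE`, the working directory is a common culprit. A small sketch that reports every missing file together with the current working directory is shown below; the helper name `check_inputs` is hypothetical.

```r
# Hypothetical helper: stop with a clear message listing every missing file.
check_inputs <- function(paths) {
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop(
      "Missing input file(s): ", paste(absent, collapse = ", "),
      "\nWorking directory: ", getwd()
    )
  }
  invisible(TRUE)
}

input_files <- c(
  "data/SRP070849/SRP070849.tsv",
  "data/SRP070849/metadata_SRP070849.tsv"
)

# tryCatch keeps this example from aborting when the files are absent.
result <- tryCatch(
  check_inputs(input_files),
  error = function(e) conditionMessage(e)
)
print(result)
```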

Memory issues with large datasets

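# memory.limit() was Windows-only and is no longer supported as of R 4.2.0,
# so it only applies to older R versions on Windows.
# memory.limit(size = 8000)  # 8 GB

# Use data.table (a separate package) to read large files more efficiently.
# data.table = FALSE returns a plain data frame so row names can be set.
expression_df <- data.table::fread("data/SRP070849/SRP070849.tsv", data.table = FALSE)
expression_df <- tibble::column_to_rownames(expression_df, "Gene")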

Package installation issues

# Install from Bioconductor if needed
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("package_name")

Citation

If you use this analysis in your research, please cite:

Original paper: Shih et al., 2017. PMID: 28193779
refine.bio: https://www.refine.bio/
pheatmap package: Kolde R (2019). pheatmap: Pretty Heatmaps. R package version 1.0.12.

License

This analysis is adapted from the refine.bio-examples repository by CCDL for ALSF and modified by Candace Savonen.

Support

For questions about the analysis or issues with the code, please:

- Check the troubleshooting section above
- Review the original refine.bio examples
- Open an issue in this repository

