Plasticity of cancer and normal alveolar cells in lung adenocarcinoma

Repository containing the code for the analyses shown in the manuscript.

The clinical and molecular data for the various datasets was retreived from the sources cited in the manuscript and preprocessed with standard pipelines as outlined in the Methods section.

All the steps are performed following lung_plasticity_pipeline.R, which calls functions from lung_plasticity_functions.R and lung_plasticity_utils.R

The functions to run Non-negative Matrix Factorization with patient cross-validation (cvNMF) to extract patient-independent transcriptional programs from malignant cells are stored in cvNMF.R . The input is a standard count matrix (genes x cells) and an annotation file with cell IDs as row names and a 'Sample' column with patient names. The code is in R (v4) and requires the installation of R packages Seurat (v4), reshape2, RcppML, Matrix, corrplot, igraph, clustree, tidygraph, msigdbr and fgsea.

The demo below runs cvNMF on a subset of the dataset compendium used in the paper, which is available under demo/data/.

library(Seurat)
library(reshape2)
library(RcppML)
library(Matrix)
library(corrplot)
library(igraph)
library(clustree)
library(tidygraph)
library(msigdbr)
library(fgsea)

source("cvNMF.R")
DataDir = "demo/"

dataset = "qkwlxb"
load( file = paste0(DataDir,dataset,"_annot_subset.RData") ) # a random subset of 20 patients drawn from the harmonized and preprocessed metadata of qian, kim, wu, laughney, xing, bischoff (aka 'qkwlxb') datasets
load( file = paste0(DataDir,dataset,"_counts_subset.RData") ) # the corresponding counts matrix

# Setting parameters for demo
OutDir = paste0("demo/cvNMF/test_01/")
dir.create(OutDir,recursive=T,showWarnings = F)
kvec = c(5:30) # in the paper: c(5:100)
ntrials = 10 # same as in the paper
unbalanced_size_margin = 0.3 # in the paper: 0.1
cor_thresh = 0.75 # same as in the paper
recurrence_ratio = 0.2 # in the paper: 0.5

# Preprocessing
cvNMF_preprocessing( annot, counts, OutDir )

# cvNMF runs
dcat( "Running cvNMF" )
load(paste0(OutDir,"annot.RData"))
load(file=paste0(OutDir,"preprocessed_expression_mat.RData"))
for (chosen_rank in kvec)
{
	dcat( paste0("Current rank = ",chosen_rank),1 )
	this_OutDir = paste0(OutDir,"cvNMF_runs/","rank",chosen_rank,"/")
	dir.create(this_OutDir,showWarnings = F, recursive = T)

	# Computing 'ntrials' NMF models of ~evenly-partitioned (in terms of cell numbers) samples, and comparing latent factors
	cvNMF_partitioned_runs( annot, snp0, chosen_rank, ntrials, unbalanced_size_margin, cor_thresh, this_OutDir )

	# Computing 'ntrials' NMF models with all patients, comparing with results of partitioned runs, and extracting robust latent factors
	cvNMF_allpatients_run(snp0, chosen_rank, ntrials, cor_thresh, recurrence_ratio, this_OutDir)
}

# Aggregating results into co-occurrence matrix
ambient_genes = read.table(paste0(DataDir,"ambient_genes.txt"),header=T,stringsAsFactors=F,sep='\t')
exclude_genes = ambient_genes$ambient_genes
cvNMF_adjmat_construction(OutDir, kvec, exclude_genes)

# Clustering the co-occurrence matrix
res_range = seq(0.1,2,0.1)
cvNMF_adjmat_clustering(OutDir, res_range, minimum_n_lfs = 5)
clustering_solution = get_most_stable_clustering_solution(OutDir)

# Loading the list of transcriptional programs
load(file = paste0(OutDir,"louvain_clustering/",clustering_solution,"/tps.RData"))

# Gene ontology analysis
gsea_enrichment(OutDir, tps, exclude_genes)

The demo should yield 7 transcriptional programs with functional enrichments similar to a subset of those reported in the manuscript.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/daniele/miniconda3/envs/r4/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fgsea_1.21.2       msigdbr_7.5.1      tidygraph_1.2.2    clustree_0.5.0    
 [5] ggraph_2.1.0       ggplot2_3.5.2      igraph_2.0.2       corrplot_0.92     
 [9] Matrix_1.5-3       RcppML_0.5.4       reshape2_1.4.4     SeuratObject_4.1.3
[13] Seurat_4.3.0      

loaded via a namespace (and not attached):
  [1] Rtsne_0.16             colorspace_2.0-3       RcppEigen_0.3.4.0.0   
  [4] deldir_1.0-6           ellipsis_0.3.2         ggridges_0.5.4        
  [7] spatstat.data_3.0-0    leiden_0.4.3           listenv_0.8.0         
 [10] farver_2.1.1           graphlayouts_0.8.4     ggrepel_0.9.2         
 [13] fansi_1.0.3            codetools_0.2-18       splines_4.1.1         
 [16] knitr_1.41             polyclip_1.10-4        jsonlite_1.8.4        
 [19] ica_1.0-3              cluster_2.1.4          png_0.1-8             
 [22] uwot_0.1.14            ggforce_0.4.1          shiny_1.7.3           
 [25] sctransform_0.3.5      spatstat.sparse_3.0-0  compiler_4.1.1        
 [28] httr_1.4.4             fastmap_1.1.0          lazyeval_0.2.2        
 [31] cli_3.6.4              later_1.3.0            tweenr_2.0.2          
 [34] htmltools_0.5.4        tools_4.1.1            gtable_0.3.1          
 [37] glue_1.6.2             RANN_2.6.1             dplyr_1.1.4           
 [40] fastmatch_1.1-3        Rcpp_1.0.8.3           scattermore_0.8       
 [43] vctrs_0.6.5            babelgene_22.9         spatstat.explore_3.0-5
 [46] nlme_3.1-160           progressr_0.12.0       lmtest_0.9-40         
 [49] spatstat.random_3.0-1  xfun_0.35              stringr_1.5.0         
 [52] globals_0.16.2         mime_0.12              miniUI_0.1.1.1        
 [55] lifecycle_1.0.4        irlba_2.3.5.1          goftest_1.2-3         
 [58] future_1.29.0          MASS_7.3-58.3          zoo_1.8-11            
 [61] scales_1.3.0           promises_1.2.0.1       spatstat.utils_3.1-3  
 [64] parallel_4.1.1         RColorBrewer_1.1-3     reticulate_1.26       
 [67] pbapply_1.6-0          gridExtra_2.3          stringi_1.7.8         
 [70] BiocParallel_1.28.3    rlang_1.1.6            pkgconfig_2.0.3       
 [73] matrixStats_0.63.0     lattice_0.20-45        ROCR_1.0-11           
 [76] purrr_1.0.2            tensor_1.5             patchwork_1.1.2       
 [79] htmlwidgets_1.6.2      cowplot_1.1.1          tidyselect_1.2.0      
 [82] parallelly_1.32.1      RcppAnnoy_0.0.20       plyr_1.8.8            
 [85] magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
 [88] DBI_1.1.3              pillar_1.9.0           withr_2.5.0           
 [91] fitdistrplus_1.1-8     survival_3.4-0         abind_1.4-5           
 [94] sp_1.5-1               tibble_3.2.1           future.apply_1.10.0   
 [97] KernSmooth_2.23-20     utf8_1.2.2             spatstat.geom_3.0-3   
[100] plotly_4.10.1          viridis_0.6.2          grid_4.1.1            
[103] data.table_1.14.6      digest_0.6.31          xtable_1.8-4          
[106] tidyr_1.2.1            httpuv_1.6.6           munsell_0.5.0         
[109] viridisLite_0.4.2     
>

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
demo		demo
LICENSE		LICENSE
README.md		README.md
cvNMF.R		cvNMF.R
luad_plasticity_functions.R		luad_plasticity_functions.R
luad_plasticity_pipeline.R		luad_plasticity_pipeline.R
luad_plasticity_utils.R		luad_plasticity_utils.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Plasticity of cancer and normal alveolar cells in lung adenocarcinoma

About

Uh oh!

Releases

Packages

Languages

License

CSOgroup/lung_plasticity

Folders and files

Latest commit

History

Repository files navigation

Plasticity of cancer and normal alveolar cells in lung adenocarcinoma

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages