scmcd24/DataScience_Reference

DataScience_Reference

A collection of tools, visualizations, snippets, and thoughts for implementing and explaining data science concepts.

Sections:

  • Introductory Statistics

    • Quantiles / Percent Rank calculation
    • Measures of Central Tendency
      • Mean (arithmetic, geometric, harmonic)
      • Median
      • Mode
    • Measures of Dispersion:
      • Range
      • IQR
      • Variance
      • Standard Deviation
    • Skew
    • Kurtosis
    • Z scores & Confidence Intervals
    • T tests
    • Effect sizes
      • Cohen's d
      • Cramer's V
    • Tests of Normality
    • Non-Parametric Tests
      • Chi-Square Test
    • Heteroskedasticity
  • Probability

    • Basics
    • Unions, intersections, conditional probability
    • PDF, CDF
    • Marginal / joint / conditional distributions
  • Linear Algebra & Calculus

    • Matrix 101: determinant, rank, etc.
    • Eigenvectors & Eigenvalues (Eigendecomp)
    • Hessian, Jacobian, Laplacian matrix
    • Properties of matrices (rank, determinant, etc)
  • Programming

    • Python
    • R
    • Java
    • Linux / Bash
    • Slurm
  • Software & Tools

    • Conda / Anaconda
    • Virtual Environments
    • Docker / Singularity
    • Git / GitHub
  • EDA, Data Cleaning, & Feature Engineering

    • Scaling: log, sqrt, minmax
    • One-hot encoding
  • Classic Statistical Tools (maybe combine with Intro)

    • Correlation: Pearson, Spearman, Kendall's tau, distance
    • Confusion matrix
    • Power, Type 1 error, type 2 error, alpha
    • Metrics:
      • MSE, MAE, R2, Adjusted R2, Pseudo R2, AUC, BIC
    • Distributions & Their attributes:
      • Gamma
      • Cauchy
      • Normal, Standard Normal
      • Uniform
      • Binomial
      • Poisson
      • Bernoulli
      • Kernel density estimation
    • QQ plot
  • Advanced Statistics (??)

    • Linear Mixed Modeling
    • Mediation vs Moderation; Interactions, Effects
    • Tukey's Hinges, Ladder of Power
    • Family-Wise Error & Bonferroni correction
  • Data Mining

    • Clustering Methods
      • Spectral
      • DBSCAN
      • Hierarchical Clustering
      • K-means / K-medoids
      • Elbow curve vs Silhouette score
      • Affinity propagation
    • Regression
      • Linear
      • Logistic
      • Gamma
      • Poisson
      • Quantile
    • Decision Trees / Ensemble Methods
      • Random Forest
      • AdaBoost / XGBoost
      • Bagging / Boosting / Bootstrapping
      • Grid search vs random search
      • Entropy, Gini index
      • CART
    • Support Vector Machine
    • k Nearest Neighbors
  • Machine Learning

    • Maximum Likelihood Estimation
    • Taylor Series / Newton-Raphson
    • Expectation Maximization (EM)
      • Find maximum likelihood solutions for models with latent variables
    • Log-Likelihood
    • Survival Analysis
    • ICA / PCA / t-SNE / UMAP
    • Non-negative matrix factorization
  • Time Series

    • White noise / random walk, etc.
    • Stationarity
    • ACF vs PACF
    • ARIMA Suite
    • Test of Causality (Granger, etc.)
  • MLOps

  • Bayesian Statistics

  • Deep Learning

    • Activation functions
    • Loss functions
    • Glorot Uniform Initializer
    • Mixture of Experts
    • Attention
    • Architectures (CNN, RNN, UNet, Transformer, diffusion, autoencoder, adversarial)
    • Feed forward
    • Backpropagation
    • Optimizers (Adam, etc.)
  • NLP

    • Similarity Metrics: Jaccard, cosine, Manhattan, Euclidean, Hamming, Minkowski, etc.
    • Kullback-Leibler divergence
    • Latent Dirichlet allocation
    • Sentiment analysis
    • Named Entity Recognition (NER)
  • Computer Vision

    • Image preprocessing
    • Image registration
    • Segmentation
  • Practice Sets

  • Data Visualization

    • Voronoi diagrams
    • QQ Plots
  • Other (dump):

    • SHAP values
    • Monte Carlo / Markov chain
    • Monty Hall problem
    • Kolmogorov-Smirnov test
    • Wilcoxon
    • Cronbach's alpha
    • ROC curves
    • Precision / recall curves
    • OpenCV
    • Internal covariate shift
    • Causality analysis
  • Mixture Models:
