Statistics
Online Courses
-
Introduction to Probability and Statistics, MIT OpenCourseWare
This course provides an elementary introduction to probability and statistics with applications. Topics include: basic combinatorics, random variables, probability distributions, Bayesian inference, hypothesis testing, confidence intervals, and linear regression.
Books covering general topics
-
Statistical Inference, Casella and Berger
A good introduction to graduate-level statistics.
-
Elements of Statistical Learning, Hastie, Tibshirani, and Friedman
The textbook showing the rigorous math behind many ML techniques. Available as a free PDF.
-
Probabilistic Programming and Bayesian Methods for Hackers, Cam Davidson-Pilon
A great introduction to probabilistic programming and Bayesian methods (e.g. Monte Carlo and MCMC), with lots of examples using PyMC3. Free on GitHub.
Videos
-
A popular YouTube channel that covers lots of statistical concepts.
Odds & Ends
-
Common Probability Distributions: The Data Scientist’s Crib Sheet
Data scientists have hundreds of probability distributions from which to choose. Where to start? Includes descriptions of the following distributions: Bernoulli and Uniform, Binomial and Hypergeometric, Poisson, Geometric and Negative Binomial, Exponential and Weibull, Normal, Log-Normal, Student’s t, and Chi-squared, Gamma and Beta.
-
A convenient way to check whether samples come from a given probability distribution. It can compare one group of samples to a known distribution, or two sets of samples to each other. Upside: it is non-parametric, so two samples can be compared without knowing their underlying distributions. Downsides: it only works on 1D distributions, and multi-dimensional variants are hard to implement.
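The entry above does not name the test. The description (non-parametric, one-sample or two-sample, 1D only) matches a Kolmogorov-Smirnov-style test, so the sketch below assumes that and uses SciPy's `kstest` and `ks_2samp`. The sample sizes, seed, and variable names are illustrative choices, not part of the original entry.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Three illustrative 1D samples: two from the same distribution, one shifted.
normal_a = rng.normal(loc=0.0, scale=1.0, size=500)
normal_b = rng.normal(loc=0.0, scale=1.0, size=500)
shifted = rng.normal(loc=1.0, scale=1.0, size=500)

# One-sample case: compare a group of samples to a known distribution
# (here the standard normal, referenced by its scipy.stats name).
one_sample = stats.kstest(normal_a, "norm")

# Two-sample case: compare two sets of samples to each other without
# specifying what distribution either one came from.
same = stats.ks_2samp(normal_a, normal_b)
different = stats.ks_2samp(normal_a, shifted)

print(f"one-sample vs N(0, 1):  statistic={one_sample.statistic:.3f}, p={one_sample.pvalue:.3f}")
print(f"two-sample, same dist:  statistic={same.statistic:.3f}, p={same.pvalue:.3f}")
print(f"two-sample, shifted:    statistic={different.statistic:.3f}, p={different.pvalue:.3g}")
```

A small p-value suggests the samples are unlikely to share a distribution; a large one only means the test found no evidence of a difference.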
-
Heteroscedasticity vs Homoscedasticity
If the underlying distribution is heteroscedastic (its variance changes across observations), it can undermine methods that assume constant variance and uncorrelated errors, such as goodness-of-fit measures in regression problems. For more info, check out this episode of the Data Skeptic Podcast.
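To make the distinction concrete, here is a minimal sketch on simulated data. It fits a straight line and compares the residual spread in the upper half of `x` to the lower half. The helper `residual_spread_ratio` and the noise models are hypothetical choices made for illustration; this is a rough visual-style check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = np.linspace(1.0, 10.0, 2000)

# Homoscedastic noise: constant standard deviation everywhere.
y_homo = 2.0 * x + rng.normal(scale=1.0, size=x.size)
# Heteroscedastic noise: standard deviation grows with x (assumed form).
y_hetero = 2.0 * x + rng.normal(scale=0.5 * x, size=x.size)


def residual_spread_ratio(x, y):
    """Fit y = slope * x + intercept by least squares, then return
    std(residuals for large x) / std(residuals for small x)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    midpoint = np.median(x)
    lower = residuals[x <= midpoint]
    upper = residuals[x > midpoint]
    return upper.std() / lower.std()


ratio_homo = residual_spread_ratio(x, y_homo)
ratio_hetero = residual_spread_ratio(x, y_hetero)

print(f"homoscedastic spread ratio:   {ratio_homo:.2f}")   # expected to be close to 1
print(f"heteroscedastic spread ratio: {ratio_hetero:.2f}")  # expected to be well above 1
```

A ratio far from 1 hints that the constant-variance assumption is violated, in which case standard errors and goodness-of-fit summaries from ordinary least squares should be read with caution.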