01. Texts & resources
Several introductory data analysis texts are especially useful for this course.
(0) A very basic introduction to R and RStudio.
A very basic introduction to Mathematica
Software application notes (JASP, Mathematica, PSPP, Python, R & RStudio; version 9/14/2025)
(1) Exploratory Data Analysis with R by Roger Peng.
This book is available as a PDF through leanpub.com where you have several options for the material, and several pricing options (you choose the price within the given range):
- The Book (PDF): $0 – $30. If you choose $0, you can also get a PDF of the book here.
- The Book (PDF) + Datasets + R Code Files: $15 – $50
- The Book (PDF) + Lecture Videos (HD) + Datasets + R Code: $30 – $70
You can also obtain a paperback version from Lulu.com for $20 + shipping + tax
(2) R for Data Science by Garrett Grolemund & Hadley Wickham
The book is available as:
- An HTML version (free)
- A PDF produced from the HTML version (free)
- Paperback via Amazon.com (price variable) or through dealoz.com (range of prices and suppliers)
(3) Leek, J. (2015). The Elements of Data Analytic Style. J. Leek; Amazon Digital Services, Inc.
This book is freely available as a PDF.
Other resources
- Exploratory Data Analysis. Engineering Statistics Handbook
- Exploratory data analysis. Wikipedia
Articles to orient you to Exploratory Data Analysis
- Behrens, J. T., & Yu, C. H. (2003). Exploratory data analysis. Handbook of Psychology.
- Grolemund, G., & Wickham, H. (2014). A cognitive interpretation of data analysis. International Statistical Review, 82(2), 184-204.
- Hoaglin, D. C. (2003). John W. Tukey and data analysis. Statistical Science, 311-318.
- Morgenthaler, S. (2009). Exploratory data analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 1(1), 33-44.
- Sarmento, R. P., & Costa, V. (2019). An overview of statistical data analysis. arXiv:1908.07390.
- Tukey, J. W. (1980). We need both exploratory and confirmatory. The American Statistician, 34(1), 23-25.
R programming
- Peng, R. D. (2015). R programming for data science. Lulu.com.
This book is available as a PDF through leanpub.com where you have several options for the material, and several pricing options (you choose the price within the given range):
- The Book (PDF): $0 – $40
- The Book (PDF) + Datasets + R Code Files: $25 – $50
- The Book (PDF) + Lecture Videos (HD) + Datasets + R Code: $30 – $70
You can also obtain a paperback version from Lulu.com for $20 + shipping + tax
R Graphics
- Winston Chang has many examples of R graphics on his web page that come from his book R Graphics Cookbook.
- Hadley Wickham, the author of ggplot2, has a book devoted to ggplot2.
Cluster analysis
- Cluster Analysis in R
- Cluster Analysis in R – Examples and Case Studies
- Types of Clustering Methods – Overview and Quick Start R Code
- K-means Cluster Analysis – R Programming Guide
- Practical Guide to Cluster Analysis in R
Odds ratios
- Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry, 19(3), 227.
- Welch, H. G. (2000). Primer on probability and odds and interpreting their ratios. Effective Clinical Practice, 3, 145-156.
- Odds ratio (2018) Wikipedia.
- Bland, J. M., & Altman, D. G. (2000). The odds ratio. BMJ, 320(7247), 1468.
Logistic & Probit Regression
Logistic
- Logistic Regression (2018) Wikipedia
- Peng, C. Y. J., Lee, K. L., & Ingersoll, G. M. (2002). An introduction to logistic regression analysis and reporting. The Journal of Educational Research, 96(1), 3-14.
- Rodríguez, G. (2007). Lecture Notes on Generalized Linear Models. Chapter 3: Logit Models for Binary Data.
- Bursac, Z., Gauss, C. H., Williams, D. K., & Hosmer, D. W. (2008). Purposeful selection of variables in logistic regression. Source Code for Biology and Medicine, 3(1), 17.
- Josh Starmer
- Video: Logistic Regression in R, Clearly Explained!!!! This video describes how to do Logistic Regression in R, step-by-step. We start by importing a dataset and cleaning it up, then we perform logistic regression on a very simple model, followed by a fancy model. Lastly we draw a graph of the predicted probabilities that came from the Logistic Regression.
- Video: Logistic Regression Details Pt1: Coefficients When you do logistic regression you have to make sense of the coefficients. These are based on the log(odds) and log(odds ratio), but, to be honest, the easiest way to make sense of these is through examples. In this StatQuest, I walk you through two Logistic Regression Examples, step-by-step, and show you exactly how the coefficients are derived and how to interpret them.
- Video: Logistic Regression Details Pt 2: Maximum Likelihood This video follows from where we left off in Part 1 in this series on the details of Logistic Regression. This time we’re going to talk about how the squiggly line is optimized to best fit the data.
- Video: Logistic Regression Details Pt 3: R-squared and p-value This video follows from where we left off in Part 2 in this series on the details of Logistic Regression. Last time we saw how to fit a squiggly line to the data. This time we’ll learn how to evaluate if that squiggly line is worth anything. In short, we’ll calculate the R-squared value and its associated p-value.
 
- Brandon Foltz (2015)
- Video: Logistic Regression, An Introduction
- Video: Logistic Regression Probability, Odds, and Odds Ratio
- Video: Logistic Regression, Logit and Regression Equation
- Video: Logistic Regression, Estimating the Probability
- Video: Logistic Regression, Odds Ratio for Any Interval
- Video: Logistic Regression in Excel / Google Sheets, PC / Mac
 
- Smith, T. J., & McKenna, C. M. (2013). A comparison of logistic regression pseudo R2 indices. Multiple Linear Regression Viewpoints, 39(2), 17-26.
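Several of the resources above describe logistic regression coefficients in terms of log(odds) and log(odds ratio). As a rough orientation before watching or reading them, here is a minimal Python sketch of that relationship for the simplest case, a single binary predictor. The counts are made up for illustration and do not come from any of the cited sources, and the names `counts`, `odds`, and `predicted_probability` are hypothetical helpers rather than part of any library. The sketch relies on the fact that, with one binary predictor, the maximum likelihood estimates can be written directly from the 2x2 table, so no fitting routine is needed.

```python
import math

# Hypothetical 2x2 counts (invented for illustration only).
# Keys 0 and 1 are the values of a binary predictor x;
# "event" / "no_event" count the outcomes y = 1 / y = 0.
counts = {
    0: {"event": 10, "no_event": 40},
    1: {"event": 30, "no_event": 20},
}


def odds(group):
    """Odds of the event within one group: events / non-events."""
    return group["event"] / group["no_event"]


odds_x0 = odds(counts[0])        # odds of the event when x = 0
odds_x1 = odds(counts[1])        # odds of the event when x = 1
odds_ratio = odds_x1 / odds_x0   # how the odds change from x = 0 to x = 1

# With a single binary predictor, the logistic regression estimates
# have a closed form on the log(odds) scale:
intercept = math.log(odds_x0)    # log(odds) in the reference group (x = 0)
slope = math.log(odds_ratio)     # log(odds ratio) for x = 1 versus x = 0


def predicted_probability(x):
    """Convert the linear predictor (a log-odds value) back to a probability."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))


print("odds ratio:", odds_ratio)
print("intercept (log odds, x = 0):", intercept)
print("slope (log odds ratio):", slope)
print("P(event | x = 0):", predicted_probability(0))
print("P(event | x = 1):", predicted_probability(1))
```

In this toy table the predicted probabilities match the observed proportions in each group (10 of 50 and 30 of 50), which is what one would expect from a model with one binary predictor. With continuous or multiple predictors there is no such shortcut, and the estimates come from an iterative maximum likelihood fit as covered in the resources above.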
Probit
- Probit Model (2018) Wikipedia
Communication
- Gray, J., Chambers, L., & Bounegru, L. (2012). The data journalism handbook: How journalists can use data to improve the news. O’Reilly Media, Inc.
- Peng, R.D. (2015) Report Writing for Data Science in R (freely downloadable from LeanPub)
- Nolan, D., & Speed, T. P. (2001). Stat labs: mathematical statistics through applications. Appendix A: Writing Lab Reports. Springer Science & Business Media.
