This repository contains a collection of introductory slides and bioinformatics tutorials.

Statistics in R

Disclaimer: I apologize if most of the examples that I pick for the course come from microbial ecology and sequence data analysis. Many of the concepts presented are not subject-specific, but the workshop will have a strong focus on molecular data. The methods presented are a small subset of what is available for data analysis. The workshop will not cover spatial analysis, time series analysis, or additive models, to name just a few more advanced methods that I know exist but am not yet sufficiently familiar with to explain to others. The workshop will focus on the concepts (not the math) behind the different statistical approaches and on their implementation in R. We will not cover the basics of using R, so you should already know how to load data into R and how to set R data and object types correctly.


The workshop will consist of 3 sessions. Additionally, there is the option to work on your own data in break-out rooms. If you don't have your own data set, feel free to bring data from previous studies similar to what you will be working with. If you were not able to attend one of the sessions, the break-out rooms can also be used for a short recap. If there are no objections, the workshop sessions will be recorded.

Morning (9-12) Afternoon (13-16)
Mon, 25.04.2022 session 1 break-out
Tue, 26.04.2022 session 2 break-out
Wed, 27.04.2022 session 3
Thu, 28.04.2022 break-out

Sessions (preliminary)

Session 1: Univariate statistics

  • Data exploration
  • Finding the most suitable statistical approach
  • Assumptions of statistical tests
  • Correlation
  • Parametric tests: t-test, ANOVA, linear regression
  • Non-parametric tests: Wilcoxon, Kruskal-Wallis
  • Multiple testing and post-hoc tests
  • General mixed models and repeated measurements
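
As a rough illustration of how several of the Session 1 topics fit together, here is a minimal sketch in base R. It is not taken from the course script: it uses the built-in PlantGrowth data set as a stand-in for real data, and the choice of tests and of the Benjamini-Hochberg correction is only an assumption made for the example.

```r
# Sketch only: PlantGrowth is a built-in R data set, not course data.
data(PlantGrowth)

# Data exploration: group sizes and per-group summary of the response
table(PlantGrowth$group)
aggregate(weight ~ group, data = PlantGrowth, FUN = summary)

# Parametric test: one-way ANOVA
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Assumptions: normality of residuals, homogeneity of variances
shapiro.test(residuals(fit))
bartlett.test(weight ~ group, data = PlantGrowth)

# Non-parametric counterpart: Kruskal-Wallis rank sum test
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw

# Post-hoc tests with correction for multiple testing
TukeyHSD(fit)
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "BH",
                           exact = FALSE)  # normal approximation, avoids tie warnings
pw
```

Which branch is appropriate (parametric or non-parametric) depends on whether the assumption checks look acceptable for the data at hand.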

Session 2: Multivariate statistics

  • Dissimilarity metrics: Euclidean, Bray-Curtis, Jaccard
  • Hierarchical clustering
  • Ordination: PCA, NMDS, PCoA
  • ANOSIM, SIMPER, PERMANOVA, RDA (incl. mixed model approaches implemented via restricted permutation tests)
  • Mantel and Procrustes tests
  • Specific to sequencing data: compositionality
  • Differential abundance analysis (ALDEx2)
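
The first few Session 2 topics can be sketched in base R as follows. This is only an illustration under stated assumptions: the count table is made up, `bray_curtis` is a hypothetical helper written for this example, and in practice a package function such as `vegdist()` from vegan is commonly used instead of computing the dissimilarity by hand.

```r
# Toy count table (made-up numbers): 4 samples (rows) x 3 taxa (columns)
counts <- matrix(c(10,  0, 5,
                    8,  2, 6,
                    0, 12, 1,
                    1, 10, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("S", 1:4), paste0("taxon", 1:3)))

# Hypothetical helper: Bray-Curtis dissimilarity, sum|x - y| / sum(x + y)
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise Bray-Curtis dissimilarities between all samples
n <- nrow(counts)
bc <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- bray_curtis(counts[i, ], counts[j, ])
  }
}
bc <- as.dist(bc)

# Euclidean distances for comparison
euc <- dist(counts, method = "euclidean")

# Hierarchical clustering (average linkage) on Bray-Curtis dissimilarities
hc <- hclust(bc, method = "average")
cl <- cutree(hc, k = 2)

# Ordination: PCoA on Bray-Curtis, PCA on the raw table
pcoa <- cmdscale(bc, k = 2, eig = TRUE)
pca <- prcomp(counts, center = TRUE, scale. = FALSE)
```

PCA works on the data table directly and implies Euclidean distances, whereas PCoA accepts any dissimilarity matrix, which is why the two ordinations take different inputs here.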

Session 3: Miscellaneous

  • Q&A for sessions 1 and 2
  • Co-occurrence network analysis
  • Machine learning (by Theodor Sperlea)

Course material: