TRGN 527: Applied Data Science and Bioinformatics

Bioinformatics skills have become an integral component of life-science research, and yet the majority of life-science researchers lack basic skills in data analysis and interpretation, and especially in data management, even though such skills are essential to many research projects today. This course will provide students from non-quantitative backgrounds with the skills needed to apply data science and bioinformatics tools to the study of human health and disease using R and Bioconductor. The course is intended for students who are not experts in either data science or bioinformatics. Teaching will alternate between lectures and in-class analysis workshops that focus on the selection and statistical analysis of large, publicly available data sets. Topics will include basic statistics, hypothesis testing, both parametric and non-parametric analyses (e.g., hierarchical clustering and principal component analysis), linear regression analysis, data normalization, reproducibility/sensitivity analysis, multiple test correction, and power assessment. Finally, the course will provide introductory exposure to command-line, Unix-based large-scale data processing, complementing the use of R and Bioconductor as tools for conducting and reproducing the analyses that scientific journals frequently require.