Watch this space for upcoming dates for our “R BootCamp” (approximately 3 days) to be offered in Atlanta, GA in Summer 2016. Do not hesitate to contact me if you would like to be notified once the class is announced. We can also work with you to develop individual training requests or to present on-site training sessions for your group or company.
Why R?
Our primary training offerings relate to the R programming language, which is widely considered the lingua franca of Data Science, Bioinformatics, Statistics, and Quantitative Research in general. The R language is useful to a large variety of professionals, including scientists of all domains, business and financial analysts, sociologists, marketing professionals, and anyone with a need to wrangle, visualize, and analyze data. One does not need in-depth statistical skills to benefit from R. In fact, many professionals use it just for its graphics capabilities and visualization packages, which allow for interactive and dynamic dashboards. Of course, for those with an analysis and statistical background, R offers a comprehensive set of tools to accomplish “one-and-done” analyses or complex studies. Since R is a full programming language, users can create reusable objects and packages for easy redistribution to a larger community. To understand why R is such a powerful framework, let’s first consider the steps involved in any type of analysis project.
Obtaining and Cleaning Data – Data comes in many forms, and increasingly researchers and analysts pull information directly from the Internet, from Relational Database Management Systems such as MySQL or Oracle, or from NoSQL data sources. R has a number of ways to interact directly with such resources, which makes importing data straightforward. Once you import the data, you can use native data structures such as the intuitive data frame to filter out records or impute missing values. For many projects, data cleaning can represent a majority of the effort! Packages such as dplyr and data.table greatly simplify the manipulation of large files and connection to relational data.
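To make the filtering and imputation step concrete, here is a minimal sketch that uses only base R. The `survey` data frame, its column names, and its values are invented for illustration, and mean imputation is just one simple choice among many.

```r
# Hypothetical survey data; the columns and values are made up for this sketch.
survey <- data.frame(
  id     = 1:6,
  age    = c(34, NA, 29, 51, NA, 42),
  region = c("East", "West", "East", "South", "West", "East"),
  stringsAsFactors = FALSE
)

# Filter: keep only the rows recorded in the East region.
east <- survey[survey$region == "East", ]

# Impute: replace missing ages with the mean of the observed ages.
mean_age <- mean(survey$age, na.rm = TRUE)
imputed  <- survey
imputed$age[is.na(imputed$age)] <- mean_age
```

The same steps could be written with dplyr or data.table; base R is used here only so that the sketch runs without installing anything.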
Data Visualization – A major step in any analysis involves graphical representation of your data, which in turn can require any number of different graph types to discern patterns and relationships. R provides two major end-user graphics packages (Base and lattice) to help you understand your data. In addition, add-on packages such as ggplot2, leaflet, rCharts, and Shiny facilitate the creation of publication-quality and interactive visualizations involving many different types of data including GIS, time series, financial, genomic, clinical, and many others. Few other packages or languages provide as much support for graphics and visualization, especially considering that all of these packages are open source.
Statistical Analysis and Machine Learning – R’s raison d’être is to provide a comprehensive suite of statistical analysis tools. In addition, the Comprehensive R Archive Network (CRAN) offers over 6,500 add-on packages with easy-to-use analysis and graphics tools that make sophisticated analysis much easier. In fact, some domains such as Epidemiology have specialized packages containing tools specific to that domain so that users can rapidly execute common forms of analysis. Additionally, R supports Machine Learning alongside visualization, so the capability to create models for prediction and optimization is part of the framework.
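As a small illustration of model building and prediction, the sketch below fits a simple linear regression with base R's `lm()` on the `mtcars` data set that ships with R. The choice of variables and the `new_cars` values are assumptions made for the example, not a recommended analysis.

```r
# mtcars ships with R: mpg is miles per gallon, wt is weight in 1000 lbs.
fit <- lm(mpg ~ wt, data = mtcars)

# Inspect the estimated intercept and slope.
coefficients <- coef(fit)

# Predict fuel economy for two hypothetical cars of different weights.
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(fit, newdata = new_cars)
```

Calling `summary(fit)` prints standard errors, p-values, and the R-squared for the fitted model.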
Reproducible Research – According to the National Institutes of Health:
Two of the cornerstones of science advancement are rigor in designing and performing scientific research and the ability to reproduce biomedical research findings.
This means that researchers using NIH funds are required to provide research findings in a way that allows others to reproduce the result. R provides the ability to author documents that embed code, data, and text in a way that lets another person “execute” the document to reproduce the published result. This also provides the ability to inspect any and all intermediate steps and, if desired, change the flow to experiment or test another analysis path. Even if your project does not require reproducibility as part of a deliverable, using such documents makes collaboration easier and also simplifies revisions of older projects, since everything required to regenerate the result is in one place.
In summary, R provides all the tools and the framework required to accomplish cutting-edge research, and it is equally appropriate for creating high-impact visualizations and dashboards, all for free. Of course labor is still required, but pricey software licenses are not, and R is easy to install on your laptop. If you are a “cloud person” and maintain an Amazon Web Services account, you can use existing Amazon Machine Images to boot up a working instance of RStudio, which is the de facto IDE for R. Please feel free to contact me if you have questions about R training.