Software & Databases

General Resources:

DAGitty

DAGitty is a browser-based environment for creating, editing, and analyzing causal models (also known as directed acyclic graphs or causal Bayesian networks). The focus is on the use of causal diagrams for minimizing bias in empirical studies in epidemiology and other disciplines.

Princeton University Multilevel Models:

This is the home page of a course developed in 2017 by German Rodriguez, a population statistician at Princeton University. The resource covers multilevel statistical methods commonly used by statisticians. Linear models, logit models, Bayesian models, and survival models are illustrated using the statistical software packages STATA and R.

UCLA Institute for Digital Research and Education:

This resource is ideal for researchers looking for statistical guidance in SAS, STATA, SPSS, or R. Common statistical tests are demonstrated in all four software packages, and code is listed for each test.

University of Virginia training:

This website offers workshops intended to train statisticians in data analysis, computation, and software. Participants must sign up for a workshop in order to attend. Workshops range from introductory classes on R and RStudio to Research Metrics for Academics.

 

REDCap:

SDBC REDCap Collection and Analysis Resource

Our Data Collection and Analysis section covers some points about REDCap that will be helpful in your study.

REDCap

Platform used for the creation of databases and surveys.

 

    This is our library for R Studio packages and program training

R:

R Studio:

This website offers a free download of RStudio, the main integrated development environment (IDE) used with R.

R for data science:

This is an online textbook covering many features of R, including how to import data and how to work with factors, vectors, and more.

A Handbook of Statistical Analyses Using R:

A handbook providing R code for many statistical analyses, in particular survival analysis. Each code example is accompanied by the corresponding graphical and statistical output.

Advanced R:

Advanced R is aimed at more experienced R users. It was created to better equip R users with the knowledge they need to improve their programming skills. Each tab covers a topic and contains a quiz, an outline of the concepts found on the page, and the statistical output and graphics that accompany the code provided throughout the page.

Building Packages:

This website contains six video tutorials explaining how to build R packages.

Cookbook for R (for investigation):

Cookbook for R provides solutions to common tasks such as generating random numbers, creating strings from variables, manipulating data frames, creating scatter plots and many more functions. 

CRAN R Project:

R is available as a free download via CRAN (the Comprehensive R Archive Network). CRAN is a network of web servers that store up-to-date versions of R and its contributed packages, which support statistical tests, time series analyses, and further statistical modeling.

Flutterbys R Studio:

This website contains workshops and tutorials that teach R users R basics, R graphics, linear models, nonlinear models, and multivariate analyses.

A ggplot2 cheatsheet:

This website is a cheat sheet for R users interested in creating graphics. It provides code for creating titles, changing background colors, setting colors and themes, and making other adjustments.

Graphics with R:

This is chapter 3, “Graphics with R,” of an online statistical manual. The chapter is split into two main parts: low-level graphics and high-level graphics. Examples of low-level graphics include code to add points or arrows to a plot. Examples of high-level graphics include how to construct a scatter plot or a high-density needle plot.

HSAUR2: A Handbook of Statistical Analyses Using R (2nd Edition):   

This is a handbook of statistical analyses using R.  

The Pirates Guide to R:

The Pirates Guide to R, created by Nathanial Phillips in 2018, opens with chapters explaining why the book was written and who the author is. The remainder of the online book covers installing R, descriptive statistics, hypothesis testing, and regression analysis, as well as intermediate topics such as creating matrices and data frames in RStudio.

Princeton University:

This website, created by German Rodriguez, introduces R and covers typing in the R Console, calculating with the R Console, formatting variables, plotting data, multiple regression, and other statistical concepts. Code is provided so that readers can reproduce the examples in their own RStudio session.

R Graph Gallery:

Reproducible code is provided on this website to create graphics for distribution, correlation, rankings, part of a whole, evolution, maps, and flow (including chord and Sankey diagrams).

UCLA R Studio:

UCLA’s statistical website containing information on upcoming R Webinars and Seminars, common code fragments from R, data analysis information, reference books for R, and frequently asked questions for R users.

  

   This is our library for SAS packages and program training

SAS:

SAS

SAS official home page.

SAS Documentation Home

SAS Documentation Home Page

SAS Blog

SAS Blog containing common SAS commands such as simulating multivariate outliers, or how to detect character variables.

    SAS Blog-Graphically Speaking

    SAS Blog containing common SAS commands for SAS Graphics.

    SAS Dummy

    General SAS blog for questions or demonstrations that do not fall within the prior two.

 

SAS Mac

Website created by Michael Friendly at York University containing many macros for SAS Software, SAS graphics, categorical data, and much more.

Lex Jansen

Contains the papers from SAS conference proceedings dating back to 1976.

Longitudinal Data and SAS: A Programmer's Guide:

This is a chapter from the book Longitudinal Data and SAS: A Programmer’s Guide. It provides example datasets with accompanying SAS code that illustrates how to work with them.

PROC POWER:

This is Chapter 89, “The POWER Procedure,” of a SAS User Guide. It covers computing power for statistical tests such as one-sample and two-sample t tests. SAS code for equivalence testing and for confidence intervals for the mean is also provided.

SAS Graphic Programs:

Website created by Michael Friendly at York University. Page provides SAS statements to create many graphics used for statistical description such as boxplots and dot plots for univariate data and two-way tables for categorical data.

UCLA SAS Page

UCLA’s statistical website containing information on upcoming SAS webinars and seminars, common SAS code fragments, data analysis information, reference books for SAS, and frequently asked questions for SAS users.

 

   This is our library for STATA packages and program training

STATA:

STATA Home

Official STATA Website containing tabs for STATA products, purchase information, learning information, support, and company information.

Princeton University STATA:

Page created by German Rodriguez titled “Stata Tutorial”. It introduces STATA and covers using STATA as a calculator with a simple command, getting help, descriptive statistics, graphics, and more.

Teaching with STATA

This website includes teaching resources published by the company for STATA users. Resources include a YouTube channel, NetCourses, cheat sheets, blogs, and downloadable datasets for practicing with STATA. The page also lists the licenses available to students: perpetual, 6-month, 1-week, and campus licenses.

Sub Resources Within:

NetCourse
This source provides many NetCourses that run for 7 weeks, beginning with an introduction to STATA and finishing with survival analysis. Prices range from $95 to $295.

Classroom and Web

Classroom and web training are open for enrollment, with prices ranging from $950 to $1295. Sessions are offered online and in Austin, Texas.

On-site Training

On-site training is available for STATA users at all levels, from beginner to very advanced. Training can take place at any facility, such as a university or company, and provides one-on-one instruction as long as each participant has access to a laptop. Each course includes course materials, a STATA expert, and a course license. Courses are limited to 24 participants and last at most two days. Preset courses include:

  1. Advanced STATA programming
  2. Data Visualization in STATA
  3. Handling Missing Data using Multiple Imputation
  4. Multilevel/Mixed Models Using STATA
  5. Panel-Data Analysis
  6. Programming Estimation Commands in STATA
  7. Structural Equation Modeling Using STATA
  8. Survey Data Analysis Using STATA
  9. Time Series Analysis Using STATA
  10. Using STATA Effectively: Data Management, Analysis, and Graphic Fundamentals

Individual Topics Include:

Basics

  • Keeping organized
  • Using Dialog boxes efficiently

Fundamentals

  • Generating new variables
  • Making graphics

Advanced

  • Writing commands in ado language
  • Changing the format of graphs via writing schemes

Specialized Analyses

  • Maximum Likelihood Estimations
  • Sample Size Calculations

    Webinars

Webinars include:

Ready. Set. Go. Stata.

  • How to clean data
  • Creating graphs
  • Manipulating data

Tips and Tricks:

  • Manipulating STATA to produce correct results

Introduction to Bayesian Analysis using STATA:

  • Bayesian regression models
  • Interval Hypothesis Testing

    Video Tutorials

This website contains video tutorials for STATA users at every level, ranging from beginner to intermediate to advanced.

 Video tutorials include some of the following:

  1. Stata Basics
  2. Data Management
  3. Graphics
  4. Bayesian Statistics
  5. Case-Control Studies
  6. Descriptive Statistics
  7. Effect Sizes
  8. Factor Variables
  9. Power and Sample Size
  10. Survival Analysis

Third-party Courses

Third-party courses are offered through this website. Locations vary across the world and include the United States, Germany, Africa, Sweden, Italy, and a few other remote locations. Registration dates are posted with each course title and location so that individuals have enough time to plan their schedules. The instructor name is also provided. Courses generally last 2-3 days. Online courses are also offered.

Links

This webpage includes web resources separated into STATA resources and statistical resources. STATA resources include URLs for blogs, social media, online discussion, STATA examples and datasets, and online training. Statistical resources include book publishers, statistical journals, and statistical organizations.

 

Advanced Methods:

Clinical Trial Simulations

The link below describes Novel Methods for 21st Century Clinical trials

DAGitty

DAGitty is a browser-based environment for creating, editing, and analyzing causal models (also known as directed acyclic graphs or causal Bayesian networks). The focus is on the use of causal diagrams for minimizing bias in empirical studies in epidemiology and other disciplines.

Deep Learning and big data platform with R

A workshop offering an introduction to deep learning, hands-on examples, and an overview of the Spark big data platform accessed from R through the sparklyr package.

Joint Modeling

A joint modeling framework takes into account the dependency and association between outcome variables of different data types. Joint modeling has been shown to correct biases in estimation and can improve efficiency by borrowing statistical strength across multiple outcomes.

Joint modeling: survival observations and longitudinal observations

Joint modeling: survival observations and count data

Mixed Effect Model

Also known as a multilevel, hierarchical, or random coefficient model. Mixed effect models are used to analyze outcome data that are correlated.

Multiple Imputation for missing data

A method that handles missing data in multivariate analysis by imputing the missing values sequentially. Multiple imputation helps ensure that statistical inferences are approximately unbiased, so long as the mechanism of missingness follows a missing at random (MAR) structure.

Regularization Methods (ex. LASSO, Ridge, Elastic Net)

Regularization methods, particularly the least absolute shrinkage and selection operator (LASSO) and elastic net, are commonly used when the number of predictors exceeds the number of observations. LASSO uses an L1-norm penalty that shrinks coefficients toward zero and sets some of them exactly to zero, thereby performing model selection; ridge regression uses an L2-norm penalty, which shrinks coefficients but retains all of them in the model; and elastic net applies a mixture of the two penalties. LASSO enables model selection in high-dimensional data sets, selecting a minimal set of uncorrelated predictors. Ridge regression may be useful when there is collinearity among the predictors and it is desirable to retain all predictors in the model. Elastic net is useful when one wishes to strike a balance between LASSO and ridge, performing some model selection while also allowing correlation among predictors. The primary R package is glmnet; the glmnet vignette is available at https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html.

Here is a link to another tutorial on regularization methods in R: https://www.datacamp.com/community/tutorials/tutorial-ridge-lasso-elastic-net
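To make the difference between the three penalties concrete, here is a minimal Python sketch for the simplest possible case: standardized, mutually uncorrelated predictors, where each penalized coefficient has a closed form. It is an illustration only and does not use glmnet; the starting coefficients and the penalty strength are hypothetical values, and the function names are our own. The mixing parameter alpha follows the glmnet convention (1 for LASSO, 0 for ridge).

```python
def soft_threshold(z, gamma):
    """Move z toward zero by gamma; return exactly 0.0 if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def penalized_coefficient(ols_coef, lam, alpha):
    """Penalized estimate for one standardized, uncorrelated predictor.

    alpha = 1 is LASSO, alpha = 0 is ridge, anything in between is elastic net.
    """
    return soft_threshold(ols_coef, lam * alpha) / (1.0 + lam * (1.0 - alpha))


ols_coefs = [2.0, 0.5, -0.1]  # hypothetical unpenalized estimates
lam = 0.3                     # hypothetical penalty strength

lasso = [penalized_coefficient(b, lam, alpha=1.0) for b in ols_coefs]
ridge = [penalized_coefficient(b, lam, alpha=0.0) for b in ols_coefs]
enet = [penalized_coefficient(b, lam, alpha=0.5) for b in ols_coefs]

print("lasso:      ", lasso)
print("ridge:      ", ridge)
print("elastic net:", enet)
```

In this setup LASSO removes the smallest coefficient entirely, ridge shrinks all three but keeps every one nonzero, and elastic net does some of both.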

One downside of glmnet’s implementation is that the levels of a categorical variable, for example the variable “colors” (with levels red, orange, yellow, green, and blue), are coded as separate dummy variables, and the selection process may keep some of the levels and exclude others. In general, we would like the selection process to keep or exclude all levels of the colors variable together. An alternative approach that enables grouping of the levels of categorical variables is group lasso, which has been implemented in several R packages: gglasso, grplasso, and grpreg.
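The contrast between per-level and grouped selection can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (an orthonormal design, so each update has a closed form) and is not the algorithm used by gglasso, grplasso, or grpreg. The dummy-variable coefficients are hypothetical, with red taken as the reference level, and the group penalty is scaled by the square root of the group size, which is a common convention rather than something stated in the text above.

```python
import math


def soft_threshold(z, gamma):
    """Lasso update for a single coefficient."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def group_soft_threshold(coefs, gamma):
    """Group lasso update: scale the whole group, or zero all of it at once."""
    norm = math.sqrt(sum(b * b for b in coefs))
    if norm <= gamma:
        return [0.0] * len(coefs)
    scale = 1.0 - gamma / norm
    return [scale * b for b in coefs]


# Hypothetical unpenalized coefficients for the dummy variables of "colors".
colors = {"orange": 0.9, "yellow": 0.1, "green": -0.6, "blue": 0.2}
levels = list(colors)
values = list(colors.values())
group_weight = math.sqrt(len(values))

lam = 0.3
lasso_fit = {lev: soft_threshold(b, lam) for lev, b in colors.items()}
group_fit = dict(zip(levels, group_soft_threshold(values, lam * group_weight)))

# A stronger penalty drops the whole variable rather than individual levels.
group_fit_strong = dict(zip(levels, group_soft_threshold(values, 0.6 * group_weight)))

print("lasso:              ", lasso_fit)
print("group lasso:        ", group_fit)
print("group lasso, strong:", group_fit_strong)
```

With these numbers the plain lasso update zeroes yellow and blue while keeping orange and green, whereas the group update either keeps all four levels or removes all four.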

Lasso, elastic net, and group lasso are typically used for prediction modeling, not for explanatory modeling. R packages implementing regularization typically do not provide the inference that investigators look for in explanatory modeling (95% confidence intervals and p-values). The reason given by the authors of the penalized package is that the coefficients are strongly biased: the penalization method reduces the variance of the coefficients by introducing bias. While bootstrapping could provide estimates of the variance of the coefficients, it does not provide a reliable estimate of the bias. That said, the authors of the glmnet package have created a package called selectiveInference that provides confidence intervals and p-values for both lasso and stepwise regression methods, where the confidence intervals “properly account for the inherent selection carried out by the procedure”.
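The bias and variance trade-off described above can be seen in a small simulation. The Python sketch below is only an illustration under assumed settings: a single predictor with no intercept, a ridge penalty, and hypothetical values for the true coefficient, sample size, and penalty strength. It is not taken from the penalized or glmnet packages.

```python
import random
import statistics

random.seed(1)

TRUE_BETA = 1.0  # hypothetical true coefficient
N_OBS = 30       # observations per simulated data set
LAM = 0.5        # hypothetical ridge penalty strength
N_SIM = 2000     # number of simulated data sets

ols_estimates = []
ridge_estimates = []
for _ in range(N_SIM):
    x = [random.gauss(0, 1) for _ in range(N_OBS)]
    y = [TRUE_BETA * xi + random.gauss(0, 1) for xi in x]
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Least squares slope, and the ridge slope with the penalty scaled by N_OBS.
    ols_estimates.append(sxy / sxx)
    ridge_estimates.append(sxy / (sxx + N_OBS * LAM))

ols_mean = statistics.mean(ols_estimates)
ridge_mean = statistics.mean(ridge_estimates)
ols_var = statistics.variance(ols_estimates)
ridge_var = statistics.variance(ridge_estimates)

print(f"OLS:   mean {ols_mean:.3f}, variance {ols_var:.4f}")
print(f"ridge: mean {ridge_mean:.3f}, variance {ridge_var:.4f}")
```

Across the simulated data sets the unpenalized estimates center on the true value, while the ridge estimates vary less but center well below it. A bootstrap applied to a single data set would capture that spread, but it has no access to the true value against which the bias is measured.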

Our Office

Williams Building
University of Utah Research Park
Williams Building, 1st floor
295 South Chipeta Way
Salt Lake City, Utah

Parking: During construction, you may park on the bottom floor of the south parking structure.

Contact

Camie Derricott
Phone: 801-587-5212
Fax: 801-581-3623

Acknowledging the SDBC

Please use the following text to acknowledge the CTSI Study Design and Biostatistics Center:

"This investigation was supported by TRIAD, with funding in part from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002538. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health."