Data Collection & Analysis

Whether you are planning to collect your data via a survey, from patient or medical records, or by other means, SDBC collaborators can review your survey or other data collection tool before data collection begins. This review will help ensure that your data collection is comprehensive and includes the appropriate variables to meet your study objectives.

Appropriate collection and formatting of your data is crucial for facilitating statistical analysis.

While there are a number of database management systems to choose from, the use of spreadsheets such as Excel for data entry and storage is usually not a good idea. Below are suggestions to help you choose the right database management system:

  • Keep your original primary data in a secured database, which can be easily exported to a spreadsheet or statistical package for analysis.
  • We highly recommend collecting and storing your data in Research Electronic Data Capture (REDCap); see the “Data Collection Details” section below.
  • For further guidance, see Data Basics (Tanya Hoskin, Mayo Clinic) and the "Data Collection Details" section below.

For those conducting surveys or focus groups, additional data collection issues arise, including:

  • Survey mode
  • Participant recruitment strategies to maximize response
  • Working with special populations
  • Transcription, audio and video needs, and other resources necessary when conducting focus groups

For studies involving the Utah Population Database (UPDB), an initial meeting should be scheduled with both the SDBC and UPDB staff to determine the most appropriate database to meet the investigator’s needs.

For studies involving the Enterprise Data Warehouse (EDW), an initial meeting should be scheduled with both the SDBC and EDW staff to determine the most appropriate database to meet the investigator’s needs.

Data Collection Details

Research Electronic Data Capture (REDCap)

  • We recommend collecting and storing data using Research Electronic Data Capture (REDCap), a secure, web-based application for building and managing online surveys and databases. It is free to University of Utah researchers.
    • You can learn more about REDCap by watching a brief summary video (4 min).
    • For other quick video tutorials of REDCap in action and an overview of its features, please see the Training Resources page.
    • Please visit REDCap support for information on signing up for training, or visit the CCTS REDCap help center.
  • If you do not use REDCap, please contact BMIC for a free consultation for assistance with data management plans and storage options.

Data Dictionary

A data dictionary provides a list of the variable names in your database, a brief summary of what each variable is, as well as the range of possible values that each variable may take. This helps the statistician read through your database easily and also allows them to perform data integrity checks. A data dictionary is also an excellent way to keep track of your variables, especially if abbreviations are used in your database. If you have any questions about the data dictionary, feel free to ask a statistician. See below for an example:

Variable Name | Description | Valid Values
Gender | Patient's gender | M, F or 1=M, 2=F
Survmon | Patient's survival in months | 0.001 - 160
Surg | Patient's surgery status | 1=Yes, 2=No or Yes, No
Race | Patient's race | AA=African American, H=Hispanic...
Age | Patient's age | 11 - 35
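
To illustrate the kind of data integrity check a data dictionary makes possible, here is a minimal Python sketch. It assumes the dictionary is stored as a plain Python mapping (the name DATA_DICTIONARY and the function check_record are hypothetical, not part of REDCap or any SDBC tool), and it picks one of the two codings shown for Gender and Surg. Race is left out because its full list of codes is not given in the example.

```python
# Hypothetical data dictionary based on the example table.
# Each rule is either a set of allowed codes or a (min, max) numeric range.
DATA_DICTIONARY = {
    "Gender": {"M", "F"},
    "Survmon": (0.001, 160),
    "Surg": {"Yes", "No"},
    "Age": (11, 35),
}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, rule in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if isinstance(rule, set):
            # Categorical variable: the value must be one of the listed codes.
            if value not in rule:
                problems.append(f"{name}: {value!r} is not one of {sorted(rule)}")
        else:
            # Numeric variable: the value must fall inside the stated range.
            low, high = rule
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"{name}: {value!r} is not numeric")
                continue
            if not low <= number <= high:
                problems.append(f"{name}: {number} is outside {low} - {high}")
    return problems


good = {"Gender": "F", "Survmon": 12.5, "Surg": "Yes", "Age": 20}
bad = {"Gender": "X", "Survmon": 200, "Surg": "Yes"}
print(check_record(good))
print(check_record(bad))
```

A record that passes every rule yields an empty list. Your statistician may use different tools for these checks, so treat this only as an illustration of why listing valid values is useful.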

Variable Name Requirements

All variable names should have the following properties:

  • No more than 20 characters in length
  • No special characters (*, &, ^, _, etc.)
  • No spaces/blanks
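
If you would like to check a list of variable names before building your database, the rules above can be sketched in a few lines of Python. This sketch assumes the length rule is a maximum of 20 characters and treats anything other than letters and digits as a special character; the function name is_valid_variable_name is hypothetical.

```python
import re

MAX_LENGTH = 20  # assumption: the "20 characters" rule is an upper bound

# Letters and digits only: no spaces and no special characters,
# including the underscore, which the list names explicitly.
_VALID_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_variable_name(name):
    """Return True if the name satisfies the three listed properties."""
    return len(name) <= MAX_LENGTH and _VALID_NAME.fullmatch(name) is not None


for candidate in ["Survmon", "surg date", "blood_pressure", "Age"]:
    print(candidate, is_valid_variable_name(candidate))
```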

Data Formatting

Formatting your data will expedite your statistical analysis. While REDCap implements many critical aspects of data formatting, it may be helpful to review the following before you begin.

  • Tips on Data Preparation (Tanya Hoskin, Mayo Clinic)
  • Always Look at the Data (Tanya Hoskin, Mayo Clinic)

Specific formats for longitudinal or repeated measures studies:

  • If the same variables are collected multiple times for the same subject, the "long form" (one row per subject per time point) is the most efficient way to enter your data
  • Please meet with your SDBC collaborator to review proper data entry techniques
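
As an illustration of the "long form", the following Python sketch converts data entered with one row per subject (often called the wide form) into one row per subject per visit. The column names (id, sbp1, sbp2, sbp3) and the function wide_to_long are hypothetical, and the sketch assumes each repeated variable is named with the visit number as a suffix.

```python
def wide_to_long(rows, id_field, measure, visits):
    """Turn one-row-per-subject data into one-row-per-subject-per-visit."""
    long_rows = []
    for row in rows:
        for visit in visits:
            long_rows.append({
                id_field: row[id_field],
                "visit": visit,
                # Wide columns are assumed to be named <measure><visit>.
                measure: row[f"{measure}{visit}"],
            })
    return long_rows


# Hypothetical wide data: systolic blood pressure at visits 1 to 3.
wide = [
    {"id": 101, "sbp1": 120, "sbp2": 118, "sbp3": 115},
    {"id": 102, "sbp1": 135, "sbp2": 130, "sbp3": 128},
]

for row in wide_to_long(wide, "id", "sbp", [1, 2, 3]):
    print(row)
```

Two subjects with three visits each become six rows, each carrying the subject ID, the visit number, and the measurement. Entering data this way from the start avoids the conversion entirely.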

Protected Health Information (PHI)

All data should be securely stored, and access should be restricted to those individuals who are entering data. Properly dispose of paper and electronic files, keep paper copies in a locked cabinet or drawer, and store electronic files on a secure-access central server.

Also keep in mind the Health Insurance Portability and Accountability Act’s (HIPAA) Minimum Necessary Principle when deciding which variables to include in a database: use or disclose only the information necessary to the task. Excluding unnecessary items that make information identifiable is an important step to ensure privacy, security, and patient confidentiality. Identifiable information includes the items listed below.

If identifiable information is necessary for research (e.g., birth date, visit date, physical address), take the necessary precautions to protect the database: strong passwords, anti-virus software, data backup, possibly encryption, and great caution with email. Please refer to COMIRB and HIPAA for additional stipulations.

List of Identifiable Information

  1. Name
  2. Fax number
  3. Phone number
  4. E-mail address
  5. Account numbers
  6. Social Security number
  7. Medical Record number
  8. Health Plan number
  9. Certificate/license numbers
  10. URL
  11. IP address
  12. Vehicle identifiers
  13. Device ID
  14. Biometric ID
  15. Full face/identifying photo
  16. Other unique identifying number, characteristic, or code
  17. Postal address (geographic subdivisions smaller than state)
  18. Date precision beyond year
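
To show what excluding unnecessary identifiers can look like in practice, here is a minimal Python sketch that drops identifier fields from a record and reduces dates to the year. The field names and the function reduce_identifiers are hypothetical, and the sketch covers only a few of the items listed above. It is an illustration of the Minimum Necessary Principle, not a complete de-identification procedure.

```python
from datetime import date

# Hypothetical field names standing in for some of the listed identifiers.
IDENTIFIER_FIELDS = {"name", "phone", "email", "ssn", "mrn", "address"}

# Hypothetical date fields, mapped to the year-only field that replaces them.
DATE_FIELDS = {"birthdate": "birthyear", "visitdate": "visityear"}


def reduce_identifiers(record):
    """Return a copy without identifier fields and with dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            cleaned[DATE_FIELDS[field]] = value.year  # keep the year only
        else:
            cleaned[field] = value
    return cleaned


record = {
    "name": "Example Patient",
    "mrn": "000000",
    "birthdate": date(1990, 5, 17),
    "Surg": "Yes",
}
print(reduce_identifiers(record))
```

Keep the original primary data in your secured database and apply a step like this only to the copy you export for analysis or sharing.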


After the data is collected, we will implement the analyses dictated in the statistical analysis plan (SAP) that we developed collaboratively with you prior to the data collection stage. There may also be additional follow-up analyses that are requested after viewing results from the SAP. These additional analyses will be delineated from pre-specified analyses in the SAP in an addendum section. We are also available to help with the presentation of methods and results in your manuscript – please see the next section "Publish" for details!

Conducting Your Own Statistical Analysis

If you wish to conduct your own statistical analysis, please indicate this when you contact the SDBC (via our online Request for Collaboration form). We can connect you with statistician(s) who are experienced in teaching statistics and statistical programming.

If you plan to conduct your own statistical analysis, the following resources may be helpful:

Our Office

University of Utah Research Park
Williams Building, 1st floor
295 South Chipeta Way
Salt Lake City, Utah

Parking: During construction, you may park on the bottom floor of the south parking structure.


Camie Derricott
Phone: 801-587-5212
Fax: 801-581-3623

Acknowledging the SDBC

Please use the following text to acknowledge the CTSI Study Design and Biostatistics Center:

"This investigation was supported by TRIAD, with funding in part from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002538. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health."