
ENVS 4943 Environmental Studies Research Seminar: Working With Your Own Data

Guide to environmental resources in the Boston College Libraries and beyond

Best Practices When Creating and Using Your Own Data

In addition to using data provided by library and web resources, you will likely be collecting your own data as you conduct your BC-specific studies.  Here are some good practices to follow as you do this.

File Naming Best Practices

• Make names consistent, descriptive, and UNIQUE

• Avoid spaces and special characters

• Use brief names

• Names can contain:

  • Project acronyms
  • Researchers’ initials
  • File type information
  • Version number
  • Date
  • File status (Final, for example)
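
As an illustration of the naming guidelines above, here is a minimal Python sketch that assembles a file name from the suggested parts. The function name `build_file_name`, the underscore separator, the two-digit version format, and the sample values are all assumptions made for this example, not a required convention.

```python
import re
from datetime import date


def build_file_name(project, initials, description, when, version, status="", ext="csv"):
    """Join the name parts with underscores, dropping spaces and special characters."""

    def clean(part):
        # Keep only letters, digits, and hyphens.
        return re.sub(r"[^A-Za-z0-9-]", "", part)

    parts = [
        clean(project),
        clean(initials),
        clean(description),
        when.strftime("%Y%m%d"),  # YYYYMMDD dates sort chronologically as text
        f"v{version:02d}",        # zero-padded so v02 sorts before v10
    ]
    if status:
        parts.append(clean(status))
    return "_".join(parts) + "." + ext


# Hypothetical example values.
name = build_file_name("ENVS", "JD", "water use", date(2024, 3, 5), 2, status="Final")
print(name)  # ENVS_JD_wateruse_20240305_v02_Final.csv
```

Whatever pattern your group picks, agreeing on it once and generating names the same way each time helps keep them consistent and unique.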

 

Organize by Folders

Work as a group to choose a folder structure that makes sense to everyone. Here are some possible ways to organize:

By type of data ("Interview Data" or "GIS Data")

By data source (BC Facilities, BC Libraries, etc.)

By name of collector (group member)

 

Entering Your Own Data

Handling of null values

  • Null values can cause problems when moving data between software platforms
  • Consider using blanks, which R, Python, SQL and Excel treat as null values
  • Don’t use text (as in, “no data”) in a data column formatted for numbers
  • Whatever you use, be consistent
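
To make the point about blanks concrete, here is a small Python sketch that reads a CSV in which a missing reading was left blank. It uses only the standard library; note that Python's built-in `csv` module returns blank cells as empty strings, so the sketch converts them to `None` explicitly. The sample data, the column name `reading_c`, and the helper `parse_reading` are hypothetical.

```python
import csv
import io

# Hypothetical sample data: the reading for site B is missing (left blank).
raw = "site,reading_c\nA,12.5\nB,\nC,9.0\n"


def parse_reading(text):
    """Return the cell as a float, or None when the cell is blank."""
    text = text.strip()
    return float(text) if text else None


rows = list(csv.DictReader(io.StringIO(raw)))
readings = [parse_reading(row["reading_c"]) for row in rows]

# Average only the values that are actually present.
present = [r for r in readings if r is not None]
mean = sum(present) / len(present)

print(readings)  # [12.5, None, 9.0]
print(mean)      # 10.75
```

Had the blank cell contained text such as “no data”, the `float()` conversion would raise an error, which is one reason to keep text out of numeric columns.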

Define abbreviations in a readme.txt file or in a “codebook”
Record dates as YYYYMMDD so that they sort correctly
Periodically check data integrity (look for corruption) using a checksum function
Flag problematic data
Consider making your raw data files “read only”
Finally, avoid manual data entry whenever possible
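
The checksum and “read only” suggestions above could be sketched in Python roughly as follows. This is one possible approach, not the only one: SHA-256 is assumed as the checksum, and the file name and contents are hypothetical, created in a temporary folder just for the demonstration.

```python
import hashlib
import os
import stat
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical raw data file, written to a temporary folder for this demo.
folder = tempfile.mkdtemp()
raw_path = os.path.join(folder, "ENVS_JD_wateruse_20240305_v01.csv")
with open(raw_path, "w", newline="") as handle:
    handle.write("site,reading_c\nA,12.5\n")

recorded = sha256_of(raw_path)    # store this value alongside the data
os.chmod(raw_path, stat.S_IREAD)  # mark the raw file as read-only

# Later: recompute the checksum and compare it to the recorded value.
unchanged = sha256_of(raw_path) == recorded
print(unchanged)  # True
```

If the recomputed checksum ever differs from the recorded one, the file has changed since the checksum was taken and should be flagged or restored from a backup.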