Data Management for the Sciences

A guide to best practices for management of research data, including links to data services from the University of California.

File Naming Conventions

When organizing files, standardize file and directory names so that they are consistent and descriptive.

DataONE shares an excellent best practice and example:

Best Practice

File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type.

When choosing a file name, check for any database management limitations on file name length and use of special characters. In general, lower-case names are less dependent on software and platform. Avoid using spaces and special characters in file names, directory paths, and field names, because automated processing, URLs, and other systems often use spaces and special characters to parse text strings. Instead, consider using underscores ( _ ) or dashes ( - ) to separate meaningful parts of file names. Avoid $ % ^ & # | : and similar.

If versioning is desired, a date string within the file name is recommended to indicate the version.

Avoid using file names such as mydata.dat or 1998.dat.

Description Rationale

Clear, descriptive, and unique file names may be important when your data file is combined in a directory or FTP site with your other data files or with the data files of other investigators. File names that reflect the contents of the file and uniquely identify the data file enable precise search and discovery of particular files.

Examples

An example of a good data file name:

Sevilleta_LTER_NM_2001_NPP.csv

  • Sevilleta_LTER is the project name
  • NM is the state abbreviation
  • 2001 is the calendar year
  • NPP represents Net Primary Productivity data
  • csv is the file type: ASCII comma-separated values

Source: DataONE
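The naming advice above can be sketched in a few lines of Python. This is only an illustration, not part of the DataONE guidance: the function names clean_part and build_file_name are hypothetical, the YYYYMMDD date format is one assumed way to write the recommended date string, and the version date used below is made up.

```python
import re
from datetime import date

# Anything other than letters, digits, underscore, or dash is treated as "special".
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def clean_part(text):
    """Replace each run of spaces or special characters with one underscore."""
    return _UNSAFE.sub("_", text.strip()).strip("_")


def build_file_name(parts, extension, version_date=None):
    """Join descriptive parts with underscores and optionally append a date version."""
    cleaned = [clean_part(p) for p in parts]
    cleaned = [p for p in cleaned if p]
    if not cleaned:
        raise ValueError("at least one descriptive part is required")
    if version_date is not None:
        # Assumed date format: YYYYMMDD, which sorts chronologically as text.
        cleaned.append(version_date.strftime("%Y%m%d"))
    return "_".join(cleaned) + "." + extension.lower().lstrip(".")


# Rebuild the example name from its parts (project, state, year, data type).
plain = build_file_name(["Sevilleta LTER", "NM", "2001", "NPP"], "csv")
# Same file with a hypothetical version date appended.
versioned = build_file_name(["Sevilleta LTER", "NM", "2001", "NPP"], "csv", date(2002, 3, 15))
print(plain)
print(versioned)
```

The sketch keeps the case of each part as given so that it reproduces the example name. Applying .lower() to the result would follow the lower-case recommendation instead.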

File Formats

The file formats in which you record, store, and transmit your data are a primary factor in anyone's ability to use the data in the future.

Since technology continually changes, researchers should plan for both hardware and software obsolescence. How will your data be read if the software used to produce it becomes unavailable?

Formats more likely to be accessible in the future are:

  • Non-proprietary
  • Open, documented standard
  • In common use by the research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.

Examples of preferred format choices:

  • PDF/A, not Word
  • ASCII, not Excel
  • MPEG-4, not Quicktime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS

For examples of how data archives treat different file formats, see the UK Data Archive page on data formats and software. Note that not all repositories are able to migrate data files to newer file formats for preservation.

Source: from the University of New Hampshire
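As one illustration of migrating data out of a database into a plain-text format, the sketch below exports a SQLite table to CSV using only the Python standard library. It is an assumption-laden example rather than a recommended workflow: the table name npp, its columns, and its rows are invented, and CSV is used here simply because it is an open, unencrypted text format. The list above suggests XML or RDF for database content, which would need a different writer.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to the text stream `out` as CSV with a header row."""
    # Quote the identifier so unusual table names cannot break the SQL statement.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical in-memory database standing in for the original software format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE npp (site TEXT, year INTEGER, npp_g_m2 REAL)")
conn.executemany(
    "INSERT INTO npp VALUES (?, ?, ?)",
    [("A", 2001, 101.5), ("B", 2001, 88.0)],
)

buffer = io.StringIO()
export_table_to_csv(conn, "npp", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of a string buffer, pass a file opened with open(path, "w", newline="", encoding="utf-8"), and keep the original database file alongside the exported copy.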

More Naming Conventions

Directory Structure Naming Conventions

When organizing files, name the top-level folder of the directory so that it includes the project title, a unique identifier, and the date (year).

The substructure should follow a clear, documented naming convention; for example, one subfolder for each run of an experiment, each version of a dataset, and/or each person in the group.
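A minimal sketch of such a layout, assuming one subfolder per experiment run, might look like the following. The function name create_project_tree, the project title soilflux, the identifier P0042, the year, and the run_NN pattern are all hypothetical choices made for illustration. The demo writes into a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path


def create_project_tree(root, title, identifier, year, runs):
    """Create <title>_<identifier>_<year>/ under root with one subfolder per run."""
    top = Path(root) / f"{title}_{identifier}_{year}"
    top.mkdir(parents=True, exist_ok=True)
    for run in range(1, runs + 1):
        # Zero-padded run numbers keep the folders in order when sorted by name.
        (top / f"run_{run:02d}").mkdir(exist_ok=True)
    # Store the convention next to the data so that it stays documented.
    (top / "README.txt").write_text(
        "Top level: <title>_<identifier>_<year>\n"
        "Subfolders: run_NN, one per experiment run\n",
        encoding="utf-8",
    )
    return top


with tempfile.TemporaryDirectory() as tmp:
    top = create_project_tree(tmp, "soilflux", "P0042", 2024, runs=3)
    top_name = top.name
    children = sorted(p.name for p in top.iterdir())

print(top_name)
print(children)
```

Writing the convention into a README file inside the top-level folder is one simple way to keep the naming scheme documented alongside the data it describes.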

Renaming Tools

Use free tools to help you:

Naming Conventions for Specific Disciplines

Many disciplines have recommendations, for example:

If your discipline has not published specific recommendations on file naming and organization, refer to the general principles laid out in the other two boxes on this page.

Source: from MIT Libraries