Naming conventions

1. Introduction

The purpose of this document is to establish best practices for Forest development. This document is a work in progress, and changes, edits, and updates are welcome.

General guidelines for Python naming conventions are found in PEP 8. For some additional perspectives, see:

For legibility & convenience, we’d like to establish a common naming framework for use in Forest modules. Our goals are to:

Enhance human-readability of Forest modules,
Simplify the process of writing & reviewing code,
Create the basis for a user-friendly API,
Clarify the mechanics of the Beiwe platform with accurate documentation,
Respect conventions established by the developers of Beiwe’s apps and backend, whenever possible.

Because we expect Forest to be used by a diverse range of end-users, we should also:

Avoid overly prescriptive names,
Avoid names that may conflict with typical use-cases,
Respect the conventions of diverse research cultures.

2. Usage Guidelines

Naming Collections

We’ll often use Python data structures to collect multiple objects of the same type. The naming convention for a specific type of object can be extended to a name for a collection. There are two reasonable strategies:

Pluralize: For example, refer to a list of timestamp objects as `timestamps`.
Descriptive suffix: For example, refer to a list or array of timestamp objects as `timestamp_list` or `timestamp_array`, respectively.
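A minimal sketch of both strategies, using hypothetical millisecond timestamps. The standard-library `array` module stands in for an array type here only so the example has no dependencies; in practice a NumPy array would often fill that role.

```python
from array import array

# Pluralize: a list of hypothetical millisecond timestamps.
timestamps = [1577836800000, 1577836860000, 1577836920000]

# Descriptive suffix: the container type is part of the name.
timestamp_list = list(timestamps)
timestamp_array = array("q", timestamps)  # "q" = signed 64-bit integers
```

The suffix form earns its keep when a list and an array of the same objects exist side by side, as here; otherwise the plural is usually enough.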

Long & Short Variable Names

In some cases it’s useful to have both a long and short variable name for a particular type of object.

The long name is generally preferable for function arguments. For example:

def some_function(question_id, **kwargs):
    # The long name tells callers exactly what to pass in.
    ...

For control variables, it’s sometimes easier to work with a short variable name. For example:

for qid in question_ids:
    # Within a short loop body, the short name keeps lines compact.
    ...
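The two conventions can coexist in one function. In this sketch, `count_answers` and the shape of its answer records (dicts with a `question_id` key) are hypothetical, chosen only to show a long name in the signature and a short name in the loop.

```python
from typing import Dict, List


def count_answers(question_ids: List[str], answers: List[Dict[str, str]]) -> Dict[str, int]:
    """Count answers per question (hypothetical helper)."""
    answer_counts = {}
    # Long names in the signature; the short control variable `qid` inside the loop.
    for qid in question_ids:
        answer_counts[qid] = sum(1 for answer in answers if answer["question_id"] == qid)
    return answer_counts
```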

Recycling Names

We suggest recycling variable names whenever doing so does not create confusion. When possible, use consistent naming conventions for variables, keywords, headers, file names, etc. For example, `question_id` may be used as a variable name, a key in a `dict` or `OrderedDict`, a column label in a `pandas.DataFrame`, and so on.
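A small sketch of one name serving three roles: a variable, a dict key, and a column label. The question ID value is hypothetical. With pandas, the same string would be passed as a `DataFrame` column label; the standard-library `csv` module is used here only to keep the example dependency-free.

```python
import csv
import io
from collections import OrderedDict

question_id = "6695d6c4-916b-4225-8688-89b6089a24d1"  # hypothetical value

# The same name is used as a dict key ...
record = OrderedDict([("question_id", question_id), ("answer", "3")])

# ... and as a column label in a CSV header.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
```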

3. Organizing Raw Data

Overview

The Beiwe platform enables collection of raw smartphone data from multiple sources and multiple individuals. Several identifiers and keywords are used to organize and attribute these data.

Variable Names

Long Name	Short Name	Data Type	Details
`backend_id`	`bid`	string	A short alphanumeric string (~8 characters). Referred to as a `patient_id` on the Beiwe backend. The basic identifier for organizing raw Beiwe data. Note that a single individual may contribute raw data under multiple `backend_ids`. Also, note that a single `backend_id` may correspond to data from multiple devices.
`person_id`	`pid`	string	An identifier from a non-Beiwe framework, such as the end-user’s identification system. A single `person_id` corresponds to exactly one individual. Raw data for a `person_id` may be bundled under one or more `backend_ids`.
`data_stream`	`stream`	string	A name for a Beiwe data stream, e.g. `gyro` or `survey_answers`. Note that some data streams are available only on iPhones (e.g. `magnetometer`) or Android phones (e.g. `texts`). See `data_streams.json` for details.
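Because one `person_id` may own several `backend_ids`, a common task is building the reverse lookup. This is a sketch under two assumptions not stated above: the mapping arrives as a dict from `person_id` to a list of `backend_ids`, and each `backend_id` belongs to exactly one person. The function name `invert_person_map` is hypothetical.

```python
from typing import Dict, List


def invert_person_map(person_backend_ids: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each backend_id (bid) to its person_id (pid). Hypothetical helper."""
    backend_person_ids = {}
    for pid, bids in person_backend_ids.items():
        for bid in bids:
            # Assumption: a backend_id should never be shared by two people.
            if bid in backend_person_ids and backend_person_ids[bid] != pid:
                raise ValueError(f"backend_id {bid!r} is assigned to more than one person_id")
            backend_person_ids[bid] = pid
    return backend_person_ids
```

Note that the collection names follow the conventions in Section 2: `bids` is pluralized, and each dict is named after its values.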

4. Directories and Files

Overview

Raw Beiwe data are organized into a specific directory structure. The output of Forest modules may also be delivered to directories with a common structure. In general, the terminology for file names and directory names is not standardized across operating systems. Therefore, it’s useful for Forest developers to agree on a common framework for referring to these locations.

Conventions for path syntax also differ across operating systems. Python handles some of these differences natively, but the best practice is to use either the `os.path` module or the `pathlib` module for all path manipulation tasks. For example:

	Python Code
Not safe:	`filepath = dirpath + '/' + filename`
Safe:	`filepath = os.path.join(dirpath, filename)`
Safe:	`filepath = pathlib.Path(dirpath) / filename`
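A runnable version of the two safe patterns, with a hypothetical relative `dirpath`. One detail worth noting: the `pathlib` version produces a `pathlib.Path` object rather than a string, so wrap it in `str()` wherever a string `filepath` is expected.

```python
import os
import pathlib

dirpath = os.path.join("study", "raw")  # hypothetical directory
filename = "data.csv"

# os.path.join inserts the separator for the current OS and returns a str.
filepath = os.path.join(dirpath, filename)

# pathlib's "/" operator returns a Path object, not a str.
path_obj = pathlib.Path(dirpath) / filename
```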

Variable Names

Long Name	Short Name	Data Type	Details
`filename`	`f`, `fn`	string	The base name of a file path, including the file extension, e.g. `data.csv`.
`filepath`	`f`, `fp`	string	The full path to a file, e.g. `/absolute/path/to/data.csv`.
`dirname`	`d`, `dn`	string	The name of a directory or folder, e.g. `data`.
`dirpath`	`d`, `dp`	string	The full path to a directory, e.g. `/absolute/path/to/data`.
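The four names can be derived from one another with `os.path`. The file path below is hypothetical. Be aware that Python's own `os.path.dirname` returns what this document calls a `dirpath`, not a `dirname`, so the function name and the variable name do not line up.

```python
import os

# Hypothetical absolute file path, built portably.
filepath = os.path.join(os.sep, "absolute", "path", "to", "data", "data.csv")

filename = os.path.basename(filepath)  # base name with extension
dirpath = os.path.dirname(filepath)    # full path to the containing directory
dirname = os.path.basename(dirpath)    # name of the containing directory
```

A trailing separator changes the result (`os.path.basename` of a path ending in a separator is an empty string), which is one more reason to build paths with `os.path` or `pathlib` rather than by concatenation.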

Special Paths

For absolute paths to special directories and files, use the suffixes `_dir` and `_file`, respectively. For example:

Name	Description
`raw_dir`	Full path to a directory containing raw data, which are organized into folders named after backend identifiers.
`log_dir`	Full path to an output directory where log records should be written.
`config_file`	Full path to a study configuration file in JSON format.
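A sketch of helpers that take these special paths as arguments. The helper names and the log file name `forest.log` are hypothetical, and `list_backend_ids` assumes that every subdirectory of `raw_dir` is a `backend_id` folder, as described in the table.

```python
import json
import os


def list_backend_ids(raw_dir):
    """Return the backend_ids in raw_dir, assuming one folder per backend_id."""
    return sorted(
        dn for dn in os.listdir(raw_dir)
        if os.path.isdir(os.path.join(raw_dir, dn))
    )


def load_config(config_file):
    """Read a study configuration file in JSON format."""
    with open(config_file) as infile:
        return json.load(infile)


def write_log(log_dir, message):
    """Append a message to a hypothetical log file inside log_dir."""
    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(log_dir, "forest.log")  # hypothetical file name
    with open(log_file, "a") as outfile:
        outfile.write(message + "\n")
    return log_file
```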

5. Time

Overview

Several different time formats appear in raw Beiwe data and configuration files. Additional time formats may be used in output from Forest modules.

Variable Names

For time formats found on the Beiwe platform:

Long Name	Short Name	Data Type	Details
`filename_datetime`	`fdt`	string	A UTC datetime string formatted as `%Y-%m-%d %H_%M_%S`, as found in the names of files containing raw Beiwe data.
`data_datetime`	`ddt`	string	A UTC datetime string formatted as `%Y-%m-%dT%H:%M:%S.%f`, as found in the `UTC time` column of raw Beiwe files.
`timestamp`	`t`	integer	The number of milliseconds elapsed since the Unix epoch (1970-01-01 00:00:00 UTC), as found in the `timestamp` column of raw Beiwe files.
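A sketch of conversions between the three formats, using only the format strings given in the table. The function names are hypothetical. Subtracting the epoch and floor-dividing by one millisecond keeps the arithmetic in integers, avoiding the rounding issues of `datetime.timestamp()`, which returns a float.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FILENAME_DATETIME_FORMAT = "%Y-%m-%d %H_%M_%S"
DATA_DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def to_timestamp(dt: datetime) -> int:
    """Convert an aware UTC datetime to integer milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def filename_datetime_to_timestamp(fdt: str) -> int:
    # Parse the file-name format and mark the result as UTC.
    dt = datetime.strptime(fdt, FILENAME_DATETIME_FORMAT).replace(tzinfo=timezone.utc)
    return to_timestamp(dt)


def data_datetime_to_timestamp(ddt: str) -> int:
    # Parse the `UTC time` column format and mark the result as UTC.
    dt = datetime.strptime(ddt, DATA_DATETIME_FORMAT).replace(tzinfo=timezone.utc)
    return to_timestamp(dt)


def timestamp_to_filename_datetime(t: int) -> str:
    return (EPOCH + timedelta(milliseconds=t)).strftime(FILENAME_DATETIME_FORMAT)
```

For example, `filename_datetime_to_timestamp("2020-01-01 00_00_00")` returns `1577836800000`.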

In some cases, it may be necessary to work with isolated dates or times that carry no timezone or UTC offset. In such cases, use the following prefixes to make the reference frame explicit:

Prefix	Description
`utc_`	To identify a date or time in UTC.
`local_`	To identify a date or time that has been localized to the user’s timezone.
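The prefixes matter most near midnight, where the UTC date and the local date differ. This sketch assumes a hypothetical participant timezone with a fixed UTC-5 offset; a real study would more likely look up a named zone (for example with `zoneinfo.ZoneInfo` on Python 3.9+).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical participant timezone: a fixed UTC-5 offset.
local_tz = timezone(timedelta(hours=-5))

t = 1577836800000  # 2020-01-01 00:00:00 UTC, in milliseconds
utc_datetime = datetime.fromtimestamp(t / 1000, tz=timezone.utc)
local_datetime = utc_datetime.astimezone(local_tz)

# Isolated dates and times carry no offset, so the prefix records which frame they came from.
utc_date = utc_datetime.date()
local_date = local_datetime.date()
utc_time = utc_datetime.time()
local_time = local_datetime.time()
```

Here `utc_date` is 2020-01-01 while `local_date` is 2019-12-31; without the prefixes, the two would be easy to confuse.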

Special Times

When processing raw Beiwe data, it’s often necessary to refer to specific times. Use the following prefixes:

Prefix	Description
`start_`, `end_`	To identify the boundaries of a specific time period, e.g. a particular follow-up duration or a particular window of observations.
`first_`, `last_`	To identify the first and last observations in a data set.
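A sketch of the distinction between the two prefix pairs, using hypothetical hourly timestamps. `start_` and `end_` are boundaries chosen by the analyst, while `first_` and `last_` are read from the observations themselves. The half-open window `[start, end)` is an assumption made for this example.

```python
# Hypothetical millisecond timestamps from one data stream, one hour apart.
timestamps = [1577836800000, 1577840400000, 1577844000000, 1577847600000]

# start_/end_: boundaries of a chosen two-hour window.
start_timestamp = 1577836800000
end_timestamp = start_timestamp + 2 * 60 * 60 * 1000

# Keep observations in the half-open window [start_timestamp, end_timestamp).
window_timestamps = [t for t in timestamps if start_timestamp <= t < end_timestamp]

# first_/last_: the earliest and latest observations actually present.
first_timestamp = min(window_timestamps)
last_timestamp = max(window_timestamps)
```

Here `last_timestamp` falls an hour before `end_timestamp`, which is exactly why the two pairs of prefixes should not be used interchangeably.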

6. Configuration Settings

Overview

Beiwe configuration parameters determine app behavior, including collection of raw sensor data and delivery of surveys. These parameters are organized using keywords and identifiers.

Variable Names

Long Name	Short Name	Data Type	Details
`config_name`		string	The Beiwe backend’s human-readable name for the configuration. Note that this name does not appear in a configuration file, but may appear in the name of a configuration file.
`config_id`		string or integer	The hex string that uniquely identifies the configuration. Note that some configuration files may disagree with the backend’s identifier.
`survey_id`	`sid`	string	A hex string that uniquely identifies each Beiwe survey.
`question_id`	`qid`	string	A string that uniquely identifies each tracking survey question. The format of this identifier is five hex strings separated by dashes.
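A sketch of a format check that follows the description above: five groups of hex digits separated by dashes. The group lengths are deliberately not fixed, since the table does not state them; the canonical UUID layout (8-4-4-4-12) is one instance that passes. The helper name is hypothetical.

```python
import re

# Five dash-separated groups of hexadecimal digits; group lengths are not enforced.
QUESTION_ID_PATTERN = re.compile(r"[0-9a-fA-F]+(?:-[0-9a-fA-F]+){4}")


def is_question_id(qid: str) -> bool:
    """Return True if qid has the question_id shape described above."""
    return QUESTION_ID_PATTERN.fullmatch(qid) is not None
```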

7. Device Parameters

Overview

Information about a participant’s smartphone is collected whenever the Beiwe app is installed. This information appears in the `identifiers` data stream directory for each participant.

Variable Names

Variable names for device parameters are taken from the header of `identifiers` CSV files, for example:

Long Name	Short Name	Data Type	Details
`device_os`		string	The operating system of a smartphone. Either `iOS` or `Android`.
`os_version`		string	The version number of the operating system.
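Since these names come straight from the CSV header, they can be recycled as dict keys when reading the file. This sketch accepts any iterable of CSV lines (including an open file); the helper name and the extra column in the usage example are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_device_parameters(csv_lines: Iterable[str]) -> List[Dict[str, str]]:
    """Extract device_os and os_version from the rows of an identifiers CSV file."""
    device_parameters = []
    for row in csv.DictReader(csv_lines):
        # The CSV header names double as variable names and dict keys.
        device_parameters.append({
            "device_os": row["device_os"],
            "os_version": row["os_version"],
        })
    return device_parameters
```

With a real file, a call might look like `read_device_parameters(open(filepath, newline=""))`, ideally inside a `with` block.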

8. Packages & Modules

9. Functions & Classes

10. Summary Statistics

Use snake case.
Use descriptive names.
Check whether a name already exists before introducing a new one.
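The first and third notes can be checked mechanically. This is a sketch only: the helper, the snake-case pattern (lowercase letters and digits, starting with a letter, joined by single underscores), and the example statistic names are all hypothetical.

```python
import re
from typing import AbstractSet

# Lowercase words of letters/digits, starting with a letter, joined by single underscores.
SNAKE_CASE_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_new_name(name: str, existing_names: AbstractSet[str]) -> bool:
    """Return True if name is snake case and not already in use."""
    return SNAKE_CASE_PATTERN.fullmatch(name) is not None and name not in existing_names
```

Whether a name is descriptive still needs a human reviewer.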