Naming conventions

1. Introduction

The purpose of this document is to establish best practices for Forest development. This document is a work-in-progress. Changes, edits, and updates are welcome.

General guidelines for Python naming conventions are found in PEP 8. For some additional perspectives, see:

For legibility & convenience, we’d like to establish a common naming framework for use in Forest modules. Our goals are to:

  • Enhance human-readability of Forest modules,

  • Simplify the process of writing & reviewing code,

  • Create the basis for a user-friendly API,

  • Clarify the mechanics of the Beiwe platform with accurate documentation,

  • Respect conventions established by the developers of Beiwe’s apps and backend, whenever possible.

Because we expect Forest to be implemented by diverse end-users, we should also:

  • Avoid overly-prescriptive names,

  • Avoid names that may conflict with typical use-cases,

  • Respect the conventions of diverse research cultures.

2. Usage Guidelines

Naming Collections

We’ll often use Python data structures to collect multiple objects of the same type. The naming convention for a specific type of object can be extended to a name for a collection. There are two reasonable strategies:

  • Pluralize: For example, refer to a list of timestamp objects as timestamps.

  • Descriptive suffix: For example, refer to a list or array of timestamp objects as timestamp_list or timestamp_array, respectively.


Long & Short Variable Names

In some cases it’s useful to have both a long and short variable name for a particular type of object.

The long name is generally preferable for function arguments. For example:

def some_function(question_id, **kwargs):
	.
	.
	.

For control variables, it’s sometimes easier to work with a short variable name. For example:

for qid in question_ids:
	.
	.
	.

Recycling Names

We suggest recycling variable names whenever it does not create confusion. When possible, use consistent naming conventions for variables, keywords, headers, file names, etc. For example, question_id may be used as a variable name, a key in a dict or OrderedDict, the column label in a pandas.DataFrame, and so on.

3. Organizing Raw Data

Overview

The Beiwe platform enables collection of raw smartphone data from multiple sources and multiple individuals. Several identifiers and keywords are used to organize and attribute these data.

Variable Names

Long Name

Short Name

Data Type

Details

backend_id

bid

string

A short alphanumeric string (~8 characters). Referred to as a patient_id on the Beiwe backend. The basic identifier for organizing raw Beiwe data. Note that a single individual may contribute raw data under multiple backend_ids. Also, note that a single backend_id may correspond to data from multiple devices.

person_id

pid

string

An identifier from a non-Beiwe framework, such as the end-user’s identification system. A single person_id corresponds to exactly one individual. Raw data for a person_id may be bundled under one or more backend_ids.

data_stream

stream

string

A name for a Beiwe data stream, e.g. gyro or survey_answers. Note that some data streams are available only on iPhones (e.g. magnetometer) or Android phones (e.g. texts). See data_streams.json for details.

4. Directories and Files

Overview

Raw Beiwe data are organized into a specific directory structure. The output of Forest modules may also be delivered to directories with a common structure. In general, the terminology for file names and directory names is not standardized across operating systems. Therefore, it’s useful for Forest developers to agree on a common framework for referring to these locations.

Also, it’s important to note that conventions for path syntax differ across operating systems. Some of these discrepancies are handled natively by Python. However, the best practice is to use either the os package or the pathlib package for all path manipulation tasks. For example:

Python Code

Not safe:

filepath = dirpath + '/' + filename

Safe:

filepath = os.path.join(dirpath, filename)

Safe:

filepath = pathlib.Path(dirpath)/filename

Variable Names

Long Name

Short Name

Data Type

Details

filename

f, fn

string

The base name of a file path, including the file extension, e.g. data.csv.

filepath

f, fp

string

The full path to a file, e.g. absolute/path/to/data.csv.

dirname

d, dn

string

The name of a directory or folder, e.g. data.

dirpath

d, dp

string

The full path to a directory, e.g. absolute/path/to/data.

Special Paths

For absolute paths to special directories and files, use the suffixes _dir and _file, respectively. For example:

Name

Description

raw_dir

Full path to a directory containing raw data, which are organized into folders named after backend identifiers.

log_dir

Full path to an output directory where log records should be written.

config_file

Full path to a study configuration file in JSON format.

5. Time

Overview

Several different time formats appear in raw Beiwe data and configuration files. Additional time formats may be used in output from Forest modules.

Variable Names

For time formats found on the Beiwe platform:

Long Name

Short Name

Data Type

Details

filename_datetime

fdt

string

A UTC datetime string formatted as %Y-%m-%d %H_%M_%S, as found in the names of files containing raw Beiwe data.

data_datetime

ddt

string

A UTC datetime string formatted as %Y-%m-%dT%H:%M:%S.%f, as found in the UTC time column of raw Beiwe files.

timestamp

t

integer

A millisecond timestamp, as found in the timestamp column of raw Beiwe files. The number of elapsed milliseconds since the Unix epoch.

In some cases, it may be necessary to work with isolated dates or times without specifying a timezone or UTC offset. If necessary, use the following prefixes for clarification:

Prefix

Description

utc_

To identify a date or time in UTC.

local_

To identify a date or time that has been localized to the user’s timezone.

Special Times

When processing raw Beiwe data, it’s often necessary to refer to specific times. Use the following prefixes:

Prefix

Description

start_, end_

To identify a specific time period, e.g. a particular followup duration or a particular window of observations.

first_, last_

To identify the first and last observations in a data set.

6. Configuration Settings

Overview

Beiwe configuration parameters determine app behavior, including collection of raw sensor data and delivery of surveys. These parameters are organized using keywords and identifiers.

Variable Names

Long Name

Short Name

Data Type

Details

config_name

string

The Beiwe backend’s human-readable name for the configuration. Note that this name does not appear in a configuration file, but may appear in the name of a configuration file.

config_id

string or integer

The hex string that uniquely identifies the configuration. Note that some configuration files may disagree with the backend’s identifier.

survey_id

sid

string

A hex string that uniquely identifies each Beiwe survey.

question_id

qid

string

A string that uniquely identifies each tracking survey question. The format of this identifier is five hex strings separated by dashes.

7. Device Parameters

Overview

Information about a participant’s smartphone is collected whenever the Beiwe app is installed. This information appears in each participant’s identifiers data stream directory.

Variable Names

Variable names for device parameters are taken from the header of identifiers CSV files, for example:

Long Name

Short Name

Data Type

Details

device_os

string

The operating system of a smartphone. Either iOS or Android.

os_version

string

The version number of the operating system.

8. Packages & Modules

9. Functions & Classes

10. Summary statistics

  • snake case

  • descriptive

  • check if it doesn’t already exist