Managing Data Storage

Table of contents

  1. Introduction
  2. Understanding Data Storage Concepts
  3. Creating a Data Storage Location
  4. Assigning Data Storage to Sessions
  5. Dynamic Data Linking
  6. Best Practices
  7. Troubleshooting Common Issues
  8. Integration with Analysis Workflows
  9. Next Steps

Introduction

Data storage in BrainSTEM provides a flexible way to link your metadata to actual data files stored on various platforms. This tutorial covers how to set up data storage locations, associate them with sessions, and configure dynamic data linking for seamless access to your raw data.

Understanding Data Storage Concepts

BrainSTEM’s data storage system consists of three key components:

  1. Data Storage Locations: Define where your data is physically stored (servers, cloud, local drives)
  2. Data Organization: How your files are structured within the storage location
  3. Data Protocols: How to access the data (paths, URLs, access methods)
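These three components can be pictured together as a single configuration record. The sketch below is illustrative only; the field names are simplified stand-ins for the BrainSTEM configuration, not its exact schema:

```python
# Illustrative sketch of a data storage record. Field names are
# simplified stand-ins, not the exact BrainSTEM schema.
storage = {
    "name": "Lab Server Main",                                   # data storage location
    "data_organization": ["Projects", "Subjects", "Sessions"],   # folder hierarchy
    "data_protocols": [                                          # how to access the data
        {"protocol": "Local Storage",
         "path": "/Volumes/StorageName/Data/",
         "is_public": False},
    ],
}

print(storage["name"])
```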

Creating a Data Storage Location

Step 1: Navigate to Data Storage

  1. From the dashboard, go to Personal Attributes → Data storage in the left navigation menu
  2. Click the Add data storage button in the top right corner

Step 2: Configure Basic Information

Fill in the basic details:

  • Name: Descriptive name for your data storage (e.g., “Lab Server 01”, “Cloud Storage Main”)
  • Authenticated groups: Groups that can access this data storage (required)
  • Description: Details about stored data types and access requirements
  • Public access: Whether this storage should be publicly accessible

Permissions are set directly on each data storage. There are four permission levels: membership (read access), contributors, managers, and owners.

Choose names that clearly identify the storage location and its purpose. Other lab members will see these names when creating sessions.

Step 3: Define Data Organization

Configure how your data is organized within the storage location. This defines the hierarchy of folders/directories:

Available organization elements:

  • Projects
  • Collections
  • Cohorts
  • Subjects
  • Sessions
  • Years

Example organization structures:

Subjects → Sessions
Projects → Subjects → Sessions

Add organization elements that match your lab’s file structure conventions using the available element types.
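As a sketch of how an organization structure maps onto a folder hierarchy, the helper below joins a base path with one folder name per organization element. The element names and values here are examples, not pulled from a live BrainSTEM record:

```python
def organization_to_path(base_path, organization, values):
    """Join a base path with one folder per organization element.

    `organization` is an ordered list of element types (e.g.
    ["Projects", "Subjects", "Sessions"]); `values` maps each element
    type to the folder name used for this particular session.
    """
    components = [base_path.rstrip("/")]
    components += [values[element] for element in organization]
    return "/".join(components) + "/"

path = organization_to_path(
    "/data/",
    ["Projects", "Subjects", "Sessions"],
    {"Projects": "Memory_Study", "Subjects": "Mouse_001", "Sessions": "Baseline"},
)
print(path)  # /data/Memory_Study/Mouse_001/Baseline/
```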

Step 4: Configure Data Storage Protocols

Set up how to access your data storage. You can configure multiple protocols for the same storage location, e.g.:

  • Local Drive: Local Storage protocol, e.g. /Users/researcher/data/ (private)
  • Network/Server: SMB/CIFS, e.g. smb://uni.domain/group/data/ (private)
  • Cloud Storage: Dropbox, e.g. data/myproject (private)
  • Public Repository: HTTPS/Web, e.g. https://dandiarchive.org/dandiset/123456/ (public)

Configuring multiple protocols gives flexibility in how different users or systems access the same data.
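One way to take advantage of multiple protocols is to pick whichever one works in the current environment. The sketch below prefers a locally mounted path and falls back to a public protocol; the `protocol`, `path`, and `is_public` fields mirror the `data_protocols` entries used later in this tutorial, but the selection logic itself is illustrative:

```python
import os

def pick_protocol(data_protocols):
    """Return the first usable protocol entry.

    Prefers a 'Local Storage' protocol whose path exists on this
    machine, then falls back to any public protocol. Returns None if
    neither applies.
    """
    for p in data_protocols:
        if p["protocol"] == "Local Storage" and os.path.isdir(p["path"]):
            return p
    for p in data_protocols:
        if p["is_public"]:
            return p
    return None

protocols = [
    {"protocol": "Local Storage", "path": "/Volumes/StorageName/Data/", "is_public": False},
    {"protocol": "HTTPS/Web", "path": "https://dandiarchive.org/dandiset/123456/", "is_public": True},
]
chosen = pick_protocol(protocols)
```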

Assigning Data Storage to Sessions

During Session Creation

  1. When creating a new session, locate the Data storage field
  2. Select from your configured data storage locations in the dropdown
  3. Optionally, specify a Name used in storage - this is the folder/file name used in your actual storage system

For Existing Sessions

  1. Navigate to the session you want to update
  2. Click the Edit button
  3. Select or change the Data storage field
  4. Update the Name used in storage if needed
  5. Save your changes

The “Name used in storage” field helps maintain consistent naming between BrainSTEM and your actual file system, making it easier to locate files programmatically. This is a session-level field, not a data storage field.

Dynamic Data Linking

BrainSTEM’s data storage system enables dynamic construction of file paths based on your metadata and organization structure.

How Dynamic Linking Works

When you associate a data storage with a session, BrainSTEM can automatically construct file paths based on:

  1. Data storage base path: The root location defined in your data storage protocols
  2. Organization structure: How you’ve defined data should be organized
  3. Session metadata: Project names, subject names, session names, dates

Example Dynamic Path Construction

Data Storage Configuration:

  • Name: Lab Server Main
  • Base path: /Volumes/StorageName/Data/
  • Organization: Projects → Subjects → Sessions

Dynamic Path:

/Volumes/StorageName/Data/{projects}/{subjects}/{sessions}/

Data files should be stored in the session folder based on this organization structure.

Session Information:

  • Project: Memory_Study_2024
  • Subject: Mouse_001
  • Session: Baseline_Recording
  • Name used in storage: ses01_baseline

Resulting Path:

/Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/
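The substitution above can be reproduced with a plain format string. The template and values are the ones from the tables above; `str.format` is just one way to perform the substitution:

```python
template = "/Volumes/StorageName/Data/{projects}/{subjects}/{sessions}/"

path = template.format(
    projects="Memory_Study_2024",
    subjects="Mouse_001",
    # "Name used in storage" takes the place of the session name
    sessions="ses01_baseline",
)
print(path)
# /Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/
```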

API Access to Dynamic Paths

Use the Python or MATLAB API to programmatically access your data paths:

from brainstem_api_tools import BrainstemClient

client = BrainstemClient()

# Load session with expanded data storage records
# Use filters={'id': ...} to fetch a single session by UUID
session_response = client.load('session',
                               filters={'id': 'your-session-id'},
                               include=['datastorage'])

session_data = session_response.json()['sessions'][0]

# datastorage is a list of expanded objects when include=['datastorage'] is used
for storage in session_data['datastorage']:
    for protocol in storage['data_protocols']:
        print(f"Protocol: {protocol['protocol']}")
        print(f"Base path: {protocol['path']}")
        print(f"Public access: {protocol['is_public']}")

Constructing Full Paths in Your Analysis Code

def construct_data_path(session_data, subject_name=None):
    """
    Construct full path to session data based on BrainSTEM metadata.

    Requires the session to have been loaded with:
      include=['datastorage', 'projects']
    so that datastorage and projects are expanded to full objects.

    Sessions have no direct subject field — subjects are linked via behavior
    records. Pass subject_name explicitly when your organization includes
    Subjects (e.g. retrieved via client.load('behavior', ...)).
    """
    if not session_data.get('datastorage'):
        return None

    # datastorage is a list; use the first linked storage
    storage = session_data['datastorage'][0]

    # Use the first configured protocol by default.
    # Update this selection if your storage relies on a specific protocol.
    base_path = storage['data_protocols'][0]['path']

    # Extract organization elements
    organization = storage['data_organization']

    # Build path based on organization structure
    path_components = [base_path]

    for element in organization:
        if element['elements'] == 'Projects':
            # projects is expanded to full objects when include=['projects'] is used
            path_components.append(session_data['projects'][0]['name'])
        elif element['elements'] == 'Subjects':
            # Session has no direct subject field; provide subject_name separately.
            if subject_name:
                path_components.append(subject_name)
        elif element['elements'] == 'Sessions':
            storage_name = session_data.get('name_used_in_storage') or session_data['name']
            path_components.append(storage_name)

    return '/'.join(path_components)


# Load session with datastorage and projects expanded
session_response = client.load('session',
                               filters={'id': 'your-session-id'},
                               include=['datastorage', 'projects'])
session_data = session_response.json()['sessions'][0]

# If your organization includes Subjects, retrieve subject name via behaviors
behaviors = client.load('behavior', filters={'session': session_data['id']}).json()
subject_name = None
if behaviors.get('behaviors'):
    subject_id = behaviors['behaviors'][0]['subjects'][0]
    subject_resp = client.load('subject', filters={'id': subject_id}).json()
    subject_name = subject_resp['subjects'][0]['name']

full_path = construct_data_path(session_data, subject_name=subject_name)
print(f"Data location: {full_path}")

Best Practices

Naming Conventions

  • Data Storage Names: Use descriptive names that indicate location and purpose
  • Storage Names: Keep file/folder names consistent with your lab’s conventions
  • Avoid Special Characters: Use underscores or hyphens instead of spaces
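A small helper along these lines can enforce the convention before names reach your file system. The replacement rules below are an example; adapt them to your lab’s conventions:

```python
import re

def sanitize_name(name):
    """Replace spaces with underscores and drop characters other than
    letters, digits, underscores, and hyphens."""
    name = name.strip().replace(" ", "_")
    return re.sub(r"[^A-Za-z0-9_-]", "", name)

print(sanitize_name("Baseline Recording #1"))  # Baseline_Recording_1
```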

Organization Strategies

  • Standardize Early: Establish organization patterns before accumulating lots of data
  • Document Conventions: Create lab protocols for file naming and organization
  • Plan for Growth: Design structures that scale with increasing data volumes

Security and Access

  • Authenticated Groups: Required - only users in these groups can access the data storage
  • Public/Private Settings: Configure based on data sensitivity needs
  • Group Matching: Data storage can only be added to sessions in projects that share at least one authenticated group

Troubleshooting Common Issues

Cannot Access Data Storage in Sessions

Problem: Data storage doesn’t appear in the session dropdown

Solutions:

  • Verify you’re a member of at least one authenticated group for the data storage

Path Construction Errors

Problem: Generated paths don’t match actual file locations

Solutions:

  • Review your data organization structure configuration
  • Verify the base paths in your protocols are correct
  • Check that “Name used in storage” matches your actual file/folder names
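When a generated path does not exist, it helps to know which component breaks first. This sketch walks the path from the root down and reports the first missing directory; it uses only the standard library and nothing BrainSTEM-specific:

```python
import os

def first_missing_component(path):
    """Return the first sub-path along `path` that does not exist,
    or None if the whole path exists."""
    parts = [p for p in path.split(os.sep) if p]
    current = os.sep if path.startswith(os.sep) else ""
    for part in parts:
        current = os.path.join(current, part)
        if not os.path.exists(current):
            return current
    return None
```

Running this on a constructed session path pinpoints whether the base path, a project or subject folder, or the “Name used in storage” component is the part that does not match your file system.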

API Access Issues

Problem: Cannot access data storage information via API

Solutions:

  • Ensure you’re including ‘datastorage’ in your API include parameters
  • Verify your API authentication is working
  • Check that you have proper permissions to access the data storage
  • Confirm you’re using the correct session ID

Integration with Analysis Workflows

Loading Data in Python

import os
from brainstem_api_tools import BrainstemClient

def load_session_data(session_id):
    """
    Load session metadata and construct data paths.
    Uses include=['datastorage', 'projects'] so path construction works.
    """
    client = BrainstemClient()

    # Use filters={'id': ...} to fetch a single session by UUID
    response = client.load('session',
                           filters={'id': session_id},
                           include=['datastorage', 'projects', 'dataacquisition'])

    session = response.json()['sessions'][0]

    # Retrieve subject name via behavior records if organization includes Subjects
    behaviors = client.load('behavior', filters={'session': session_id}).json()
    subject_name = None
    if behaviors.get('behaviors'):
        subject_id = behaviors['behaviors'][0]['subjects'][0]
        subject_resp = client.load('subject', filters={'id': subject_id}).json()
        subject_name = subject_resp['subjects'][0]['name']

    # Construct data path
    data_path = construct_data_path(session, subject_name=subject_name)

    # Load actual data files
    data_files = []
    if data_path and os.path.exists(data_path):
        data_files = [f for f in os.listdir(data_path)
                      if f.endswith(('.dat', '.bin', '.h5'))]

    return {
        'session_metadata': session,
        'data_path': data_path,
        'data_files': data_files
    }

MATLAB Integration

function data_info = load_session_data(session_id)
    % Load session metadata and construct data paths.
    % Using include={'datastorage','projects'} expands those fields to full objects.
    client = BrainstemClient();

    % Fetch the session by UUID and expand datastorage and projects
    session_response = client.load('session', ...
                                   'id', session_id, ...
                                   'include', {'datastorage', 'projects', 'dataacquisition'});
    session_data = session_response.sessions{1};

    % Extract data storage information (first linked storage)
    storage = session_data.datastorage{1};

    % Use the first configured protocol; adjust if a specific protocol is required.
    base_path = storage.data_protocols{1}.path;

    % Project name is available when include={'projects'} is used
    project_name = session_data.projects{1}.name;

    % Session has no direct subject field; subjects are linked via behavior records.
    % Retrieve subject separately if your organization structure includes Subjects.
    behaviors = client.load('behavior', 'filter', {'session', session_id});
    if ~isempty(behaviors.behaviors)
        subject_id = behaviors.behaviors{1}.subjects{1};
        subject_resp = client.load('subject', 'id', subject_id);
        subject_name = subject_resp.subjects{1}.name;
    else
        subject_name = '';
    end

    storage_name = session_data.name_used_in_storage;
    if isempty(storage_name)
        storage_name = session_data.name;  % fall back to the session name
    end

    % Build path — omit subject component if subject_name is empty
    if ~isempty(subject_name)
        full_path = fullfile(base_path, project_name, subject_name, storage_name);
    else
        full_path = fullfile(base_path, project_name, storage_name);
    end

    data_info.session_metadata = session_data;
    data_info.data_path = full_path;
    data_info.exists = exist(full_path, 'dir') == 7;
end

Because a data storage can expose multiple protocols, users can access the same data via a local mount, cloud backup, or public repository, depending on their needs.

Custom Organization Patterns

Configure your organization structure to match your lab’s file hierarchy:

{
  "data_organization": [
    "Projects",
    "Subjects",
    "Sessions"
  ]
}

Example resulting path:

/Volumes/StorageName/Data/Memory_Study/Mouse_001/Baseline_Recording/

By following this tutorial, you can effectively manage data storage in BrainSTEM, creating seamless links between your metadata and actual data files for efficient analysis workflows.

Next Steps

With data storage configured, you can now effectively manage and analyze your research data:

  • Start experimental workflows: Follow the Electrophysiology Workflow tutorial for complete experimental documentation that demonstrates data storage integration
  • Use API tools: Access your data programmatically with the Python API tool or MATLAB API tool for seamless integration with analysis workflows
  • Organize experiments: Create structured Behavioral Assays that integrate with your data storage system for organized behavioral data
  • Enable collaboration: Follow Managing Projects to share your organized data storage with lab members and collaborators
  • Enable open data: Make your data publicly accessible through Sharing Project Publicly to promote open science and collaboration