Managing Data Storage

Table of contents

  1. Introduction
  2. Understanding Data Storage Concepts
  3. Creating a Data Storage Location
  4. Assigning Data Storage to Sessions
  5. Dynamic Data Linking
  6. Best Practices
  7. Troubleshooting Common Issues
  8. Integration with Analysis Workflows
  9. Next Steps

Introduction

Data storage in BrainSTEM provides a flexible way to link your metadata to actual data files stored on various platforms. This tutorial covers how to set up data storage locations, associate them with sessions, and configure dynamic data linking for seamless access to your raw data.

Understanding Data Storage Concepts

BrainSTEM’s data storage system consists of three key components:

  1. Data Storage Locations: Define where your data is physically stored (servers, cloud, local drives)
  2. Data Organization: How your files are structured within the storage location
  3. Data Protocols: How to access the data (paths, URLs, access methods)
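These three components can be pictured together as a single configuration record. The sketch below is illustrative only; the field names are simplified stand-ins for the BrainSTEM configuration, not its exact schema:

```python
# Illustrative sketch of a data storage record. Field names are
# simplified stand-ins, not the exact BrainSTEM schema.
storage = {
    "name": "Lab Server Main",                                   # data storage location
    "data_organization": ["Projects", "Subjects", "Sessions"],   # folder hierarchy
    "data_protocols": [                                          # how to access the data
        {"protocol": "Local Storage",
         "path": "/Volumes/StorageName/Data/",
         "is_public": False},
    ],
}

print(storage["name"])
```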

Creating a Data Storage Location

Step 1: Navigate to Data Storage

  1. From the dashboard, go to Personal Attributes → Data storage in the left navigation menu
  2. Click the Add data storage button in the top right corner

Step 2: Configure Basic Information

Fill in the basic details:

  • Name: Descriptive name for your data storage (e.g., “Lab Server 01”, “Cloud Storage Main”)
  • Authenticated groups: Groups that can access this data storage (required)
  • Description: Details about stored data types and access requirements
  • Public access: Whether this storage should be publicly accessible

Permissions are set directly on each data storage. There are four permission levels: membership (read access), contributors, managers, and owners.

Choose names that clearly identify the storage location and its purpose. Other lab members will see these names when creating sessions.

Step 3: Define Data Organization

Configure how your data is organized within the storage location. This defines the hierarchy of folders/directories:

Available organization elements:

  • Projects
  • Collections
  • Cohorts
  • Subjects
  • Sessions
  • Years

Example organization structures:

Subjects → Sessions
Projects → Subjects → Sessions

Add organization elements that match your lab’s file structure conventions using the available element types.
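As a sketch of how an organization structure maps onto a folder hierarchy, the helper below joins a base path with one folder name per organization element. The element names and values here are examples, not pulled from a live BrainSTEM record:

```python
def organization_to_path(base_path, organization, values):
    """Join a base path with one folder per organization element.

    `organization` is an ordered list of element types (e.g.
    ["Projects", "Subjects", "Sessions"]); `values` maps each element
    type to the folder name used for this particular session.
    """
    components = [base_path.rstrip("/")]
    components += [values[element] for element in organization]
    return "/".join(components) + "/"

path = organization_to_path(
    "/data/",
    ["Projects", "Subjects", "Sessions"],
    {"Projects": "Memory_Study", "Subjects": "Mouse_001", "Sessions": "Baseline"},
)
print(path)  # /data/Memory_Study/Mouse_001/Baseline/
```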

Step 4: Configure Data Storage Protocols

Set up how to access your data storage. You can configure multiple protocols for the same storage location, e.g.:

  • Local Drive: Local Storage protocol, e.g. /Users/researcher/data/ (private)
  • Network/Server: SMB/CIFS, e.g. smb://uni.domain/group/data/ (private)
  • Cloud Storage: Dropbox, e.g. data/myproject (private)
  • Public Repository: HTTPS/Web, e.g. https://dandiarchive.org/dandiset/123456/ (public)

Configuring multiple protocols gives flexibility in how different users or systems access the same data.
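One way to take advantage of multiple protocols is to pick whichever one works in the current environment. The sketch below prefers a locally mounted path and falls back to a public protocol; the `protocol`, `path`, and `is_public` fields mirror the `data_protocols` entries used later in this tutorial, but the selection logic itself is illustrative:

```python
import os

def pick_protocol(data_protocols):
    """Return the first usable protocol entry.

    Prefers a 'Local Storage' protocol whose path exists on this
    machine, then falls back to any public protocol. Returns None if
    neither applies.
    """
    for p in data_protocols:
        if p["protocol"] == "Local Storage" and os.path.isdir(p["path"]):
            return p
    for p in data_protocols:
        if p["is_public"]:
            return p
    return None

protocols = [
    {"protocol": "Local Storage", "path": "/Volumes/StorageName/Data/", "is_public": False},
    {"protocol": "HTTPS/Web", "path": "https://dandiarchive.org/dandiset/123456/", "is_public": True},
]
chosen = pick_protocol(protocols)
```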

Assigning Data Storage to Sessions

During Session Creation

  1. When creating a new session, locate the Data storage field
  2. Select from your configured data storage locations in the dropdown
  3. Optionally, specify a Name used in storage - this is the folder/file name used in your actual storage system

For Existing Sessions

  1. Navigate to the session you want to update
  2. Click the Edit button
  3. Select or change the Data storage field
  4. Update the Name used in storage if needed
  5. Save your changes

The “Name used in storage” field helps maintain consistent naming between BrainSTEM and your actual file system, making it easier to locate files programmatically. This is a session-level field, not a data storage field.

Dynamic Data Linking

BrainSTEM’s data storage system enables dynamic construction of file paths based on your metadata and organization structure.

How Dynamic Linking Works

When you associate a data storage with a session, BrainSTEM can automatically construct file paths based on:

  1. Data storage base path: The root location defined in your data storage protocols
  2. Organization structure: How you’ve defined data should be organized
  3. Session metadata: Project names, subject names, session names, dates

Example Dynamic Path Construction

Data Storage Configuration:

  • Name: Lab Server Main
  • Base path: /Volumes/StorageName/Data/
  • Organization: Projects → Subjects → Sessions

Dynamic Path:

/Volumes/StorageName/Data/{projects}/{subjects}/{sessions}/

Data files should be stored in the session folder based on this organization structure.

Session Information:

  • Project: Memory_Study_2024
  • Subject: Mouse_001
  • Session: Baseline_Recording
  • Name used in storage: ses01_baseline

Resulting Path:

/Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/
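The substitution above can be reproduced with a plain format string. The template and values are the ones from the tables above; `str.format` is just one way to perform the substitution:

```python
template = "/Volumes/StorageName/Data/{projects}/{subjects}/{sessions}/"

path = template.format(
    projects="Memory_Study_2024",
    subjects="Mouse_001",
    # "Name used in storage" takes the place of the session name
    sessions="ses01_baseline",
)
print(path)
# /Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/
```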

API Access to Dynamic Paths

Use the Python or MATLAB API to programmatically access your data paths:

from brainstem_api_tools import BrainstemClient

client = BrainstemClient()

# Load session with expanded data storage records
# Use filters={'id': ...} to fetch a single session by UUID
session_response = client.load('session',
                               filters={'id': 'your-session-id'},
                               include=['datastorage'])

session_data = session_response.json()['sessions'][0]

# datastorage is a list of expanded objects when include=['datastorage'] is used
for storage in session_data['datastorage']:
    for protocol in storage['data_protocols']:
        print(f"Protocol: {protocol['protocol']}")
        print(f"Base path: {protocol['path']}")
        print(f"Public access: {protocol['is_public']}")

Constructing Full Paths in Your Analysis Code

def construct_data_path(session_data, subject_name=None):
    """
    Construct full path to session data based on BrainSTEM metadata.

    Requires the session to have been loaded with:
      include=['datastorage', 'projects']
    so that datastorage and projects are expanded to full objects.

    Sessions have no direct subject field — subjects are linked via behavior
    records. Pass subject_name explicitly when your organization includes
    Subjects (e.g. retrieved via client.load('behavior', ...)).
    """
    if not session_data.get('datastorage'):
        return None

    # datastorage is a list; use the first linked storage
    storage = session_data['datastorage'][0]

    # Use the first configured protocol by default.
    # Update this selection if your storage relies on a specific protocol.
    base_path = storage['data_protocols'][0]['path']

    # Extract organization elements
    organization = storage['data_organization']

    # Build path based on organization structure
    path_components = [base_path]

    for element in organization:
        if element['elements'] == 'Projects':
            # projects is expanded to full objects when include=['projects'] is used
            path_components.append(session_data['projects'][0]['name'])
        elif element['elements'] == 'Subjects':
            # Session has no direct subject field; provide subject_name separately.
            if subject_name:
                path_components.append(subject_name)
        elif element['elements'] == 'Sessions':
            storage_name = session_data.get('name_used_in_storage') or session_data['name']
            path_components.append(storage_name)

    return '/'.join(path_components)


# Load session with datastorage and projects expanded
session_response = client.load('session',
                               filters={'id': 'your-session-id'},
                               include=['datastorage', 'projects'])
session_data = session_response.json()['sessions'][0]

# If your organization includes Subjects, retrieve subject name via behaviors
behaviors = client.load('behavior', filters={'session': session_data['id']}).json()
subject_name = None
if behaviors.get('behaviors'):
    subject_id = behaviors['behaviors'][0]['subjects'][0]
    subject_resp = client.load('subject', filters={'id': subject_id}).json()
    subject_name = subject_resp['subjects'][0]['name']

full_path = construct_data_path(session_data, subject_name=subject_name)
print(f"Data location: {full_path}")

Best Practices

Naming Conventions

  • Data Storage Names: Use descriptive names that indicate location and purpose
  • Storage Names: Keep file/folder names consistent with your lab’s conventions
  • Avoid Special Characters: Use underscores or hyphens instead of spaces
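A small helper along these lines can enforce the convention before names reach your file system. The replacement rules below are an example; adapt them to your lab’s conventions:

```python
import re

def sanitize_name(name):
    """Replace spaces with underscores and drop characters other than
    letters, digits, underscores, and hyphens."""
    name = name.strip().replace(" ", "_")
    return re.sub(r"[^A-Za-z0-9_-]", "", name)

print(sanitize_name("Baseline Recording #1"))  # Baseline_Recording_1
```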

Organization Strategies

  • Standardize Early: Establish organization patterns before accumulating lots of data
  • Document Conventions: Create lab protocols for file naming and organization
  • Plan for Growth: Design structures that scale with increasing data volumes

Security and Access

  • Authenticated Groups: Required - only users in these groups can access the data storage
  • Public/Private Settings: Configure based on data sensitivity needs
  • Group Matching: Data storage can only be added to sessions in projects that share at least one authenticated group

Troubleshooting Common Issues

Cannot Access Data Storage in Sessions

Problem: Data storage doesn’t appear in the session dropdown

Solutions:

  • Verify you’re a member of at least one authenticated group for the data storage

Path Construction Errors

Problem: Generated paths don’t match actual file locations

Solutions:

  • Review your data organization structure configuration
  • Verify the base paths in your protocols are correct
  • Check that “Name used in storage” matches your actual file/folder names
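When a generated path does not exist, it helps to know which component breaks first. This sketch walks the path from the root down and reports the first missing directory; it uses only the standard library and nothing BrainSTEM-specific:

```python
import os

def first_missing_component(path):
    """Return the first sub-path along `path` that does not exist,
    or None if the whole path exists."""
    parts = [p for p in path.split(os.sep) if p]
    current = os.sep if path.startswith(os.sep) else ""
    for part in parts:
        current = os.path.join(current, part)
        if not os.path.exists(current):
            return current
    return None
```

Running this on a constructed session path pinpoints whether the base path, a project or subject folder, or the “Name used in storage” component is the part that does not match your file system.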

API Access Issues

Problem: Cannot access data storage information via API

Solutions:

  • Ensure you’re including ‘datastorage’ in your API include parameters
  • Verify your API authentication is working
  • Check that you have proper permissions to access the data storage
  • Confirm you’re using the correct session ID

Integration with Analysis Workflows

Loading Data in Python

import os
from brainstem_api_tools import BrainstemClient

def load_session_data(session_id):
    """
    Load session metadata and construct data paths.
    Uses include=['datastorage', 'projects'] so path construction works.
    """
    client = BrainstemClient()

    # Use filters={'id': ...} to fetch a single session by UUID
    response = client.load('session',
                           filters={'id': session_id},
                           include=['datastorage', 'projects', 'dataacquisition'])

    session = response.json()['sessions'][0]

    # Retrieve subject name via behavior records if organization includes Subjects
    behaviors = client.load('behavior', filters={'session': session_id}).json()
    subject_name = None
    if behaviors.get('behaviors'):
        subject_id = behaviors['behaviors'][0]['subjects'][0]
        subject_resp = client.load('subject', filters={'id': subject_id}).json()
        subject_name = subject_resp['subjects'][0]['name']

    # Construct data path
    data_path = construct_data_path(session, subject_name=subject_name)

    # Load actual data files
    data_files = []
    if data_path and os.path.exists(data_path):
        data_files = [f for f in os.listdir(data_path)
                      if f.endswith(('.dat', '.bin', '.h5'))]

    return {
        'session_metadata': session,
        'data_path': data_path,
        'data_files': data_files
    }

MATLAB Integration

function data_info = load_session_data(session_id)
    % Load session metadata and construct data paths.
    % Using include={'datastorage','projects'} expands those fields to full objects.
    client = BrainstemClient();

    % Fetch the session by UUID and expand datastorage and projects
    session_response = client.load('session', ...
                                   'id', session_id, ...
                                   'include', {'datastorage', 'projects', 'dataacquisition'});
    session_data = session_response.sessions{1};

    % Extract data storage information (first linked storage)
    storage = session_data.datastorage{1};

    % Use the first configured protocol; adjust if a specific protocol is required.
    base_path = storage.data_protocols{1}.path;

    % Project name is available when include={'projects'} is used
    project_name = session_data.projects{1}.name;

    % Session has no direct subject field; subjects are linked via behavior records.
    % Retrieve subject separately if your organization structure includes Subjects.
    behaviors = client.load('behavior', 'filter', {'session', session_id});
    if ~isempty(behaviors.behaviors)
        subject_id = behaviors.behaviors{1}.subjects{1};
        subject_resp = client.load('subject', 'id', subject_id);
        subject_name = subject_resp.subjects{1}.name;
    else
        subject_name = '';
    end

    storage_name = session_data.name_used_in_storage;
    if isempty(storage_name)
        storage_name = session_data.name;  % fall back to the session name
    end

    % Build path — omit subject component if subject_name is empty
    if ~isempty(subject_name)
        full_path = fullfile(base_path, project_name, subject_name, storage_name);
    else
        full_path = fullfile(base_path, project_name, storage_name);
    end

    data_info.session_metadata = session_data;
    data_info.data_path = full_path;
    data_info.exists = exist(full_path, 'dir') == 7;
end

Because a data storage can expose multiple protocols, users can access the same data via a local mount, cloud backup, or public repository, depending on their needs.

Custom Organization Patterns

Configure your organization structure to match your lab’s file hierarchy:

{
  "data_organization": [
    "Projects",
    "Subjects",
    "Sessions"
  ]
}

Example resulting path:

/Volumes/StorageName/Data/Memory_Study/Mouse_001/Baseline_Recording/

By following this tutorial, you can effectively manage data storage in BrainSTEM, creating seamless links between your metadata and actual data files for efficient analysis workflows.

Next Steps

With data storage configured, you can now effectively manage and analyze your research data:

  • Start experimental workflows: Follow the Electrophysiology Workflow tutorial for complete experimental documentation that demonstrates data storage integration
  • Use API tools: Access your data programmatically with the Python API tool or MATLAB API tool for seamless integration with analysis workflows
  • Organize experiments: Create structured Behavioral Assays that integrate with your data storage system for organized behavioral data
  • Enable collaboration: Follow Managing Projects to share your organized data storage with lab members and collaborators
  • Enable open data: Make your data publicly accessible through Sharing Project Publicly to promote open science and collaboration