Managing Data Storage

Table of contents

  1. Introduction
  2. Understanding Data Storage Concepts
  3. Creating a Data Storage Location
  4. Assigning Data Storage to Sessions
  5. Dynamic Data Linking
  6. Best Practices
  7. Troubleshooting Common Issues
  8. Integration with Analysis Workflows
  9. Next Steps

Introduction

Data storage in BrainSTEM provides a flexible way to link your metadata to actual data files stored on various platforms. This tutorial covers how to set up data storage locations, associate them with sessions, and configure dynamic data linking for seamless access to your raw data.

Understanding Data Storage Concepts

BrainSTEM’s data storage system consists of three key components:

  1. Data Storage Locations: Define where your data is physically stored (servers, cloud, local drives)
  2. Data Organization: How your files are structured within the storage location
  3. Data Protocols: How to access the data (paths, URLs, access methods)

Creating a Data Storage Location

Step 1: Navigate to Data Storage

  1. From the dashboard, go to Personal Attributes → Data storage in the left navigation menu
  2. Click the Add data storage button in the top right corner

Step 2: Configure Basic Information

Fill in the basic details:

Field | Description
Name | Descriptive name for your data storage (e.g., “Lab Server 01”, “Cloud Storage Main”)
Authenticated groups | Groups that can access this data storage (required)
Description | Details about stored data types and access requirements
Public access | Whether this storage should be publicly accessible

Permissions are set directly on each data storage, with four levels: membership (read access), contributors, managers, and owners.

Choose names that clearly identify the storage location and its purpose. Other lab members will see these names when creating sessions.

Step 3: Define Data Organization

Configure how your data is organized within the storage location. This defines the hierarchy of folders/directories:

Available organization elements:

  • Projects
  • Collections
  • Cohorts
  • Subjects
  • Sessions
  • Years

Example organization structures:

Subjects → Sessions
Projects → Subjects → Sessions

Add organization elements that match your lab’s file structure conventions using the available element types.

Step 4: Configure Data Storage Protocols

Set up how to access your data storage. You can configure multiple protocols for the same storage location, e.g.:

Storage Type | Protocol | Path Example | Public
Local Drive | Local Storage | /Users/researcher/data/ | No
Network/Server | SMB/CIFS | smb://uni.domain/group/data/ | No
Cloud Storage | Dropbox | data/myproject | No
Public Repository | HTTPS/Web | https://dandiarchive.org/dandiset/123456/ | Yes

This flexibility lets different users and systems access the same data in whichever way suits them best.
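As a sketch, the protocols from the table above might be represented like this when returned by the API (field names are taken from the API examples later in this tutorial; the exact schema may differ):

```python
# Hypothetical representation of the configured protocols; field names
# follow the API examples in this tutorial, not a guaranteed schema.
data_protocols = [
    {"protocol": "Local Storage", "path": "/Users/researcher/data/", "is_public": False},
    {"protocol": "SMB/CIFS", "path": "smb://uni.domain/group/data/", "is_public": False},
    {"protocol": "Dropbox", "path": "data/myproject", "is_public": False},
    {"protocol": "HTTPS/Web", "path": "https://dandiarchive.org/dandiset/123456/", "is_public": True},
]

# Example selection rule: prefer a public protocol, fall back to the first one
chosen = next((p for p in data_protocols if p["is_public"]), data_protocols[0])
```

The selection rule is one possible policy; pick whichever protocol matches how your analysis machine mounts the storage.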

Assigning Data Storage to Sessions

During Session Creation

  1. When creating a new session, locate the Data storage field
  2. Select from your configured data storage locations in the dropdown
  3. Optionally, specify a Name used in storage - this is the folder/file name used in your actual storage system

For Existing Sessions

  1. Navigate to the session you want to update
  2. Click the Edit button
  3. Select or change the Data storage field
  4. Update the Name used in storage if needed
  5. Save your changes

The “Name used in storage” field helps maintain consistent naming between BrainSTEM and your actual file system, making it easier to locate files programmatically. This is a session-level field, not a data storage field.
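The fallback behavior can be sketched in a couple of lines (dictionary keys follow the API examples later in this tutorial):

```python
# Prefer "Name used in storage" when set; otherwise fall back to the
# session name. Keys assumed from the API examples in this tutorial.
def storage_folder_name(session):
    return session.get('name_used_in_storage') or session['name']
```

For example, `storage_folder_name({'name': 'Baseline_Recording'})` falls back to `'Baseline_Recording'`.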

Dynamic Data Linking

BrainSTEM’s data storage system enables dynamic construction of file paths based on your metadata and organization structure.

How Dynamic Linking Works

When you associate a data storage with a session, BrainSTEM can automatically construct file paths based on:

  1. Data storage base path: The root location defined in your data storage protocols
  2. Organization structure: How you’ve defined data should be organized
  3. Session metadata: Project names, subject names, session names, dates

Example Dynamic Path Construction

Data Storage Configuration:

Field | Value
Name | Lab Server Main
Base path | /Volumes/StorageName/Data/
Organization | Projects → Subjects → Sessions

Dynamic Path:

/Volumes/StorageName/Data/{projects}/{subjects}/{sessions}/

Data files should be stored in the session folder based on this organization structure.

Session Information:

Field | Value
Project | Memory_Study_2024
Subject | Mouse_001
Session | Baseline_Recording
Name used in storage | ses01_baseline

Resulting Path:

/Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/
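The substitution above can be sketched in Python; this illustrates the idea, not BrainSTEM's internal implementation:

```python
# Placeholders from the organization structure are filled with session
# metadata; "Name used in storage" replaces the session name.
template = "/Volumes/StorageName/Data/{projects}/{subjects}/{sessions}/"
path = template.format(
    projects="Memory_Study_2024",
    subjects="Mouse_001",
    sessions="ses01_baseline",
)
# path == "/Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/"
```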

API Access to Dynamic Paths

Use the Python or MATLAB API to programmatically access your data paths:

from brainstem_api_client import BrainstemClient

client = BrainstemClient()

# Load session with data storage information
session_response = client.load_model('session', 
                                   id='your-session-id', 
                                   include=['datastorage'])

session_data = session_response.json()['sessions'][0]
storage_info = session_data['datastorage']

# Access configured protocols and paths
for protocol in storage_info['data_protocols']:
    print(f"Protocol: {protocol['protocol']}")
    print(f"Base path: {protocol['path']}")
    print(f"Public access: {protocol['is_public']}")

Constructing Full Paths in Your Analysis Code

def construct_data_path(session_data):
    """
    Construct full path to session data based on BrainSTEM metadata
    """
    storage = session_data['datastorage']

    # Use the first configured protocol by default; update this selection
    # if your storage relies on a specific protocol. Strip the trailing
    # slash so the join below does not produce a double slash.
    base_path = storage['data_protocols'][0]['path'].rstrip('/')
    
    # Extract organization elements
    organization = storage['data_organization']
    
    # Build path based on organization structure
    path_components = [base_path]
    
    for element in organization:
        if element['elements'] == 'Projects':
            path_components.append(session_data['projects'][0]['name'])
        elif element['elements'] == 'Subjects':
            path_components.append(session_data['subject']['name'])
        elif element['elements'] == 'Sessions':
            storage_name = session_data.get('name_used_in_storage', 
                                          session_data['name'])
            path_components.append(storage_name)
    
    return '/'.join(path_components)

# Usage
full_path = construct_data_path(session_data)
print(f"Data location: {full_path}")

Best Practices

Naming Conventions

  • Data Storage Names: Use descriptive names that indicate location and purpose
  • Storage Names: Keep file/folder names consistent with your lab’s conventions
  • Avoid Special Characters: Use underscores or hyphens instead of spaces
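A small hypothetical helper (not part of BrainSTEM) illustrating the last point, replacing spaces and special characters with underscores:

```python
import re

def sanitize_name(name):
    # Collapse any run of characters outside [A-Za-z0-9._-] into one underscore.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip())

# sanitize_name("Baseline Recording #1") -> "Baseline_Recording_1"
```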

Organization Strategies

  • Standardize Early: Establish organization patterns before accumulating lots of data
  • Document Conventions: Create lab protocols for file naming and organization
  • Plan for Growth: Design structures that scale with increasing data volumes

Security and Access

  • Authenticated Groups: Required - only users in these groups can access the data storage
  • Public/Private Settings: Configure based on data sensitivity needs
  • Group Matching: Data storage can only be added to sessions in projects that share at least one authenticated group
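The group-matching rule can be sketched as a simple set intersection (group names below are illustrative):

```python
def can_attach_storage(storage_groups, project_groups):
    # A data storage can be attached to a project's sessions only when the
    # two share at least one authenticated group.
    return bool(set(storage_groups) & set(project_groups))

# can_attach_storage({"LabA"}, {"LabA", "Collaborators"}) -> True
```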

Troubleshooting Common Issues

Cannot Access Data Storage in Sessions

Problem: Data storage doesn’t appear in the session dropdown

Solutions:

  • Verify you’re a member of at least one authenticated group for the data storage

Path Construction Errors

Problem: Generated paths don’t match actual file locations

Solutions:

  • Review your data organization structure configuration
  • Verify the base paths in your protocols are correct
  • Check that “Name used in storage” matches your actual file/folder names

API Access Issues

Problem: Cannot access data storage information via API

Solutions:

  • Ensure you’re including ‘datastorage’ in your API include parameters
  • Verify your API authentication is working
  • Check that you have proper permissions to access the data storage
  • Confirm you’re using the correct session ID

Integration with Analysis Workflows

Loading Data in Python

import os
from brainstem_api_client import BrainstemClient

def load_session_data(session_id):
    """
    Load session metadata and construct data paths
    """
    client = BrainstemClient()
    
    # Get session with data storage info
    response = client.load_model('session', 
                               id=session_id, 
                               include=['datastorage', 'dataacquisition'])
    
    session = response.json()['sessions'][0]
    
    # Construct data path
    data_path = construct_data_path(session)
    
    # Load actual data files
    data_files = []
    if os.path.exists(data_path):
        data_files = [f for f in os.listdir(data_path) 
                     if f.endswith(('.dat', '.bin', '.h5'))]
    
    return {
        'session_metadata': session,
        'data_path': data_path,
        'data_files': data_files
    }

MATLAB Integration

function data_info = load_session_data(session_id)
    % Load session metadata and construct data paths
    
    % Get session with data storage info
    session_data = load_model('model', 'session', ...
                             'id', session_id, ...
                             'include', {'datastorage', 'dataacquisition'});
    
    % Extract data storage information
    storage = session_data.datastorage;

    % Use the first configured protocol; adjust if a specific protocol is required.
    base_path = storage.data_protocols{1}.path;
    
    % Construct full path (simplified example)
    project_name = session_data.projects{1}.name;
    subject_name = session_data.subject.name;
    storage_name = session_data.name_used_in_storage;
    
    full_path = fullfile(base_path, project_name, subject_name, storage_name);
    
    data_info.session_metadata = session_data;
    data_info.data_path = full_path;
    data_info.exists = exist(full_path, 'dir') == 7;
end

Because a data storage can expose multiple protocols, users can reach the same data via a local mount, a cloud backup, or a public repository, depending on their needs.

Custom Organization Patterns

Configure your organization structure to match your lab’s file hierarchy:

{
  "data_organization": [
    "Projects",
    "Subjects",
    "Sessions"
  ]
}

Example resulting path:

/Volumes/StorageName/Data/Memory_Study/Mouse_001/Baseline_Recording/

By following this tutorial, you can effectively manage data storage in BrainSTEM, creating seamless links between your metadata and actual data files for efficient analysis workflows.

Next Steps

With data storage configured, you can now effectively manage and analyze your research data:

  • Start experimental workflows: Follow the Electrophysiology Workflow tutorial for complete experimental documentation that demonstrates data storage integration
  • Use API tools: Access your data programmatically with the Python API tool or MATLAB API tool for seamless integration with analysis workflows
  • Organize experiments: Create structured Behavioral Paradigms that integrate with your data storage system for organized behavioral data
  • Enable collaboration: Follow the Managing Projects tutorial to share your organized data storage with lab members and collaborators
  • Enable open data: Make your data publicly accessible by following Sharing Project Publicly to promote open science and collaboration