Managing Data Storage
Table of contents
- Introduction
- Understanding Data Storage Concepts
- Creating a Data Storage Location
- Assigning Data Storage to Sessions
- Dynamic Data Linking
- Best Practices
- Troubleshooting Common Issues
- Integration with Analysis Workflows
- Next Steps
Introduction
Data storage in BrainSTEM provides a flexible way to link your metadata to actual data files stored on various platforms. This tutorial covers how to set up data storage locations, associate them with sessions, and configure dynamic data linking for seamless access to your raw data.
Understanding Data Storage Concepts
BrainSTEM’s data storage system consists of three key components:
- Data Storage Locations: Define where your data is physically stored (servers, cloud, local drives)
- Data Organization: How your files are structured within the storage location
- Data Protocols: How to access the data (paths, URLs, access methods)
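These three components can be pictured as a single record. The sketch below is illustrative: the `data_organization` and `data_protocols` field names follow the API examples later in this tutorial, but the values and exact schema here are assumptions.

```python
# Illustrative sketch of one data storage record; field names mirror the
# API examples later in this tutorial, values are hypothetical.
lab_storage = {
    "name": "Lab Server 01",                                    # the storage location
    "data_organization": ["Projects", "Subjects", "Sessions"],  # folder hierarchy
    "data_protocols": [                                         # how to access the data
        {"protocol": "Local Storage", "path": "/Users/researcher/data/", "is_public": False},
        {"protocol": "SMB/CIFS", "path": "smb://uni.domain/group/data/", "is_public": False},
    ],
}
```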
Creating a Data Storage Location
Step 1: Navigate to Data Storage
- From the dashboard, go to Personal Attributes → Data storage in the left navigation menu
- Click the Add data storage button in the top right corner
Step 2: Configure Basic Information
Fill in the basic details:
Field | Description |
---|---|
Name | Descriptive name for your data storage (e.g., “Lab Server 01”, “Cloud Storage Main”) |
Authenticated groups | Groups that can access this data storage (required) |
Description | Details about stored data types and access requirements |
Public access | Whether this storage should be publicly accessible |
Permissions are set directly on each data storage. Four permission levels are available: membership (read access), contributors, managers, and owners.

Choose names that clearly identify the storage location and its purpose. Other lab members will see these names when creating sessions.
Step 3: Define Data Organization
Configure how your data is organized within the storage location. This defines the hierarchy of folders/directories:
Available organization elements:
- Projects
- Collections
- Cohorts
- Subjects
- Sessions
- Years
Example organization structures:
Subjects → Sessions
Projects → Subjects → Sessions
Add organization elements that match your lab’s file structure conventions using the available element types.
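The organization structure maps directly onto a folder-path template. A minimal sketch of that mapping, using lower-case placeholder names that mirror the dynamic-path example later in this tutorial:

```python
# Turn an organization structure into a folder-path template
organization = ["Projects", "Subjects", "Sessions"]
template = "/".join("{" + element.lower() + "}" for element in organization) + "/"
print(template)  # {projects}/{subjects}/{sessions}/
```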
Step 4: Configure Data Storage Protocols
Set up how to access your data storage. You can configure multiple protocols for the same storage location, e.g.:
Storage Type | Protocol | Path Example | Public |
---|---|---|---|
Local Drive | Local Storage | /Users/researcher/data/ | No |
Network/Server | SMB/CIFS | smb://uni.domain/group/data/ | No |
Cloud Storage | Dropbox | data/myproject | No |
Public Repository | HTTPS/Web | https://dandiarchive.org/dandiset/123456/ | Yes |
Configuring multiple protocols gives different users and systems the flexibility to access the same data in the way that suits them best.
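When several protocols are configured, analysis code must pick one. A small sketch of such a selection helper; the helper name and preference logic are assumptions, not part of BrainSTEM, while the `protocol` and `path` keys match the API examples later in this tutorial.

```python
def pick_protocol(protocols, preferred="Local Storage"):
    """Return the protocol entry matching `preferred`, falling back to
    the first configured protocol when no match is found."""
    for entry in protocols:
        if entry["protocol"] == preferred:
            return entry
    return protocols[0]

protocols = [
    {"protocol": "SMB/CIFS", "path": "smb://uni.domain/group/data/"},
    {"protocol": "Local Storage", "path": "/Users/researcher/data/"},
]
print(pick_protocol(protocols)["path"])  # /Users/researcher/data/
```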
Assigning Data Storage to Sessions
During Session Creation
- When creating a new session, locate the Data storage field
- Select from your configured data storage locations in the dropdown
- Optionally, specify a Name used in storage - this is the folder/file name used in your actual storage system
For Existing Sessions
- Navigate to the session you want to update
- Click the Edit button
- Select or change the Data storage field
- Update the Name used in storage if needed
- Save your changes
The “Name used in storage” field helps maintain consistent naming between BrainSTEM and your actual file system, making it easier to locate files programmatically. This is a session-level field, not a data storage field.
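In code, that naming convention reduces to a one-line fallback. A sketch (the function name is hypothetical; the `name_used_in_storage` fallback mirrors the path-construction example later in this tutorial):

```python
def storage_folder_name(session):
    """Folder name on disk: 'Name used in storage' when set,
    otherwise the session's own name."""
    return session.get("name_used_in_storage") or session["name"]

print(storage_folder_name({"name": "Baseline_Recording",
                           "name_used_in_storage": "ses01_baseline"}))  # ses01_baseline
print(storage_folder_name({"name": "Baseline_Recording"}))  # Baseline_Recording
```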
Dynamic Data Linking
BrainSTEM’s data storage system enables dynamic construction of file paths based on your metadata and organization structure.
How Dynamic Linking Works
When you associate a data storage with a session, BrainSTEM can automatically construct file paths based on:
- Data storage base path: The root location defined in your data storage protocols
- Organization structure: How you’ve defined data should be organized
- Session metadata: Project names, subject names, session names, dates
Example Dynamic Path Construction
Data Storage Configuration:
Field | Value |
---|---|
Name | Lab Server Main |
Base path | /Volumes/StorageName/Data/ |
Organization | Projects → Subjects → Sessions |
Dynamic Path:
/Volumes/StorageName/Data/{projects}/{subjects}/{sessions}/
Data files should be stored in the session folder based on this organization structure.
Session Information:
Field | Value |
---|---|
Project | Memory_Study_2024 |
Subject | Mouse_001 |
Session | Baseline_Recording |
Name used in storage | ses01_baseline |
Resulting Path:
/Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/
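Put together, the rule amounts to joining the base path with one metadata value per organization level. A minimal sketch using the example values from the tables above:

```python
base_path = "/Volumes/StorageName/Data/"
# One value per organization level: Projects -> Subjects -> Sessions,
# with "Name used in storage" standing in for the session name
components = ["Memory_Study_2024", "Mouse_001", "ses01_baseline"]
resulting_path = base_path + "/".join(components) + "/"
print(resulting_path)
# /Volumes/StorageName/Data/Memory_Study_2024/Mouse_001/ses01_baseline/
```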
API Access to Dynamic Paths
Use the Python or MATLAB API to programmatically access your data paths:
from brainstem_api_client import BrainstemClient

client = BrainstemClient()

# Load session with data storage information
session_response = client.load_model('session',
                                     id='your-session-id',
                                     include=['datastorage'])
session_data = session_response.json()['sessions'][0]
storage_info = session_data['datastorage']

# Access configured protocols and paths
for protocol in storage_info['data_protocols']:
    print(f"Protocol: {protocol['protocol']}")
    print(f"Base path: {protocol['path']}")
    print(f"Public access: {protocol['is_public']}")
Constructing Full Paths in Your Analysis Code
def construct_data_path(session_data):
    """
    Construct the full path to session data based on BrainSTEM metadata
    """
    storage = session_data['datastorage']
    # Use the first configured protocol by default.
    # Update this selection if your storage relies on a specific protocol.
    base_path = storage['data_protocols'][0]['path']
    # Extract organization elements
    organization = storage['data_organization']
    # Build the path following the organization structure; strip the
    # trailing slash so joining does not produce a double slash
    path_components = [base_path.rstrip('/')]
    for element in organization:
        if element['elements'] == 'Projects':
            path_components.append(session_data['projects'][0]['name'])
        elif element['elements'] == 'Subjects':
            path_components.append(session_data['subject']['name'])
        elif element['elements'] == 'Sessions':
            storage_name = session_data.get('name_used_in_storage',
                                            session_data['name'])
            path_components.append(storage_name)
    return '/'.join(path_components)

# Usage
full_path = construct_data_path(session_data)
print(f"Data location: {full_path}")
Best Practices
Naming Conventions
- Data Storage Names: Use descriptive names that indicate location and purpose
- Storage Names: Keep file/folder names consistent with your lab’s conventions
- Avoid Special Characters: Use underscores or hyphens instead of spaces
Organization Strategies
- Standardize Early: Establish organization patterns before accumulating lots of data
- Document Conventions: Create lab protocols for file naming and organization
- Plan for Growth: Design structures that scale with increasing data volumes
Security and Access
- Authenticated Groups: Required - only users in these groups can access the data storage
- Public/Private Settings: Configure based on data sensitivity needs
- Group Matching: Data storage can only be added to sessions in projects that share at least one authenticated group
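The group-matching rule above is a simple set intersection. A sketch of the check (function and group names are illustrative):

```python
def shares_authenticated_group(storage_groups, project_groups):
    """A data storage can be added to sessions of a project only when the
    two share at least one authenticated group."""
    return bool(set(storage_groups) & set(project_groups))

print(shares_authenticated_group({"ExampleLab"}, {"ExampleLab", "Collab2024"}))  # True
print(shares_authenticated_group({"ExampleLab"}, {"OtherLab"}))  # False
```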
Troubleshooting Common Issues
Cannot Access Data Storage in Sessions
Problem: Data storage doesn’t appear in the session dropdown
Solutions:
- Verify you’re a member of at least one authenticated group for the data storage
Path Construction Errors
Problem: Generated paths don’t match actual file locations
Solutions:
- Review your data organization structure configuration
- Verify the base paths in your protocols are correct
- Check that “Name used in storage” matches your actual file/folder names
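When a generated path does not match the file system, it helps to find the deepest directory that actually exists. A small diagnostic sketch; the helper is an assumption, not part of the BrainSTEM API:

```python
import os

def diagnose_path(expected_path):
    """Walk up from the expected path to the deepest existing directory,
    showing where the generated path diverges from reality."""
    probe = expected_path.rstrip("/")
    while probe and not os.path.isdir(probe):
        probe = os.path.dirname(probe)
    return {"exists": os.path.isdir(expected_path),
            "deepest_existing": probe}
```

Comparing `deepest_existing` against the generated path quickly shows which organization level (project, subject, or session folder) is misnamed.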
API Access Issues
Problem: Cannot access data storage information via API
Solutions:
- Ensure you’re including ‘datastorage’ in your API include parameters
- Verify your API authentication is working
- Check that you have proper permissions to access the data storage
- Confirm you’re using the correct session ID
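When debugging API access, returning a clear answer beats a raw KeyError. A defensive sketch around the response shape used earlier in this tutorial (the helper name is an assumption):

```python
def extract_storage(session_json):
    """Return the session's data storage record, or None when the response
    contains no sessions or 'datastorage' was not included in the request."""
    sessions = session_json.get("sessions", [])
    if not sessions:
        return None
    return sessions[0].get("datastorage")

print(extract_storage({"sessions": []}))  # None
print(extract_storage({"sessions": [{"datastorage": {"name": "Lab Server Main"}}]}))
# {'name': 'Lab Server Main'}
```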
Integration with Analysis Workflows
Loading Data in Python
import os
from brainstem_api_client import BrainstemClient

def load_session_data(session_id):
    """
    Load session metadata and construct data paths
    """
    client = BrainstemClient()
    # Get session with data storage info
    response = client.load_model('session',
                                 id=session_id,
                                 include=['datastorage', 'dataacquisition'])
    session = response.json()['sessions'][0]
    # Construct data path
    data_path = construct_data_path(session)
    # Collect the actual data files
    data_files = []
    if os.path.exists(data_path):
        data_files = [f for f in os.listdir(data_path)
                      if f.endswith(('.dat', '.bin', '.h5'))]
    return {
        'session_metadata': session,
        'data_path': data_path,
        'data_files': data_files
    }
MATLAB Integration
function data_info = load_session_data(session_id)
    % Load session metadata and construct data paths

    % Get session with data storage info
    session_data = load_model('model', 'session', ...
        'id', session_id, ...
        'include', {'datastorage', 'dataacquisition'});

    % Extract data storage information
    storage = session_data.datastorage;

    % Use the first configured protocol; adjust if a specific protocol is required.
    base_path = storage.data_protocols{1}.path;

    % Construct full path (simplified example)
    project_name = session_data.projects{1}.name;
    subject_name = session_data.subject.name;
    storage_name = session_data.name_used_in_storage;
    full_path = fullfile(base_path, project_name, subject_name, storage_name);

    data_info.session_metadata = session_data;
    data_info.data_path = full_path;
    data_info.exists = exist(full_path, 'dir') == 7;
end
Because a data storage can expose multiple protocols, users can access the same data via a local mount, a cloud backup, or a public repository, depending on their needs.
Custom Organization Patterns
Configure your organization structure to match your lab’s file hierarchy:
{
  "data_organization": [
    "Projects",
    "Subjects",
    "Sessions"
  ]
}
Example resulting path:
/Volumes/StorageName/Data/Memory_Study/Mouse_001/Baseline_Recording/
By following this tutorial, you can effectively manage data storage in BrainSTEM, creating seamless links between your metadata and actual data files for efficient analysis workflows.
Next Steps
With data storage configured, you can now effectively manage and analyze your research data:
- Start experimental workflows: Follow the Electrophysiology Workflow tutorial, which demonstrates complete experimental documentation with data storage integration
- Use API tools: Access your data programmatically with the Python API tool or MATLAB API tool for seamless integration with analysis workflows
- Organize experiments: Create structured Behavioral Paradigms that integrate with your data storage system for organized behavioral data
- Enable collaboration: Set up proper project management (see Managing Projects) to share your organized data storage with lab members and collaborators
- Enable open data: Make your data publicly accessible through Sharing Project Publicly to promote open science and collaboration