Storage Providers

Storage Providers form the data persistence layer of the SCIM Server architecture, providing a clean abstraction between SCIM protocol logic and data storage implementation. They handle pure data operations on JSON resources while remaining completely agnostic to SCIM semantics.

See the StorageProvider API documentation for complete details.

Value Proposition

Storage Providers deliver focused data persistence capabilities:

  • Clean Separation: Pure data operations isolated from SCIM business logic
  • Storage Agnostic: Unified interface works with any storage backend
  • Simple Operations: Focused on PUT/GET/DELETE with minimal complexity
  • Tenant Isolation: Built-in support for multi-tenant data organization
  • Performance Optimized: Direct storage operations without protocol overhead
  • Pluggable Backends: Easy to swap storage implementations

Architecture Overview

Storage Providers operate at the lowest level of the SCIM Server stack:

Resource Provider (Business Logic)
    ↓
Storage Provider (Data Persistence)
├── PUT/GET/DELETE Operations
├── JSON Document Storage
├── Tenant Key Organization
├── Basic Querying & Filtering
└── Backend Implementation
    ↓ (examples)
├── InMemoryStorage
├── SqliteStorage
└── CustomStorage

Design Philosophy

The storage layer follows one fundamental principle: at the storage level, CREATE and UPDATE are the same operation. You are simply putting data at a location. The distinction between "create" and "update" is business logic that belongs in the Resource Provider layer.

This design provides several benefits:

  • Simplicity: Fewer operations to implement and understand
  • Consistency: Same operation semantics regardless of whether data exists
  • Performance: No need to check existence before operations
  • Flexibility: Storage backends can optimize PUT operations as needed
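
As a rough illustration of this split, the sketch below uses a hypothetical, synchronous `MapStore` (a stand-in built on `HashMap`, not the crate's `StorageProvider`). Its `put` overwrites unconditionally, while the free functions `create` and `update` (also hypothetical) carry the existence rules that would live in the Resource Provider layer.

```rust
use std::collections::HashMap;

/// Hypothetical synchronous stand-in for a storage backend (not the crate's type).
#[derive(Default)]
pub struct MapStore {
    data: HashMap<String, String>,
}

impl MapStore {
    /// PUT: store `value` at `key` whether or not something is already there,
    /// and return the value that was stored.
    pub fn put(&mut self, key: &str, value: &str) -> String {
        self.data.insert(key.to_string(), value.to_string());
        value.to_string()
    }

    pub fn get(&self, key: &str) -> Option<&String> {
        self.data.get(key)
    }

    pub fn exists(&self, key: &str) -> bool {
        self.data.contains_key(key)
    }
}

/// Business-logic layer: "create" refuses to overwrite an existing resource.
pub fn create(store: &mut MapStore, key: &str, value: &str) -> Result<String, String> {
    if store.exists(key) {
        return Err(format!("resource already exists: {key}"));
    }
    Ok(store.put(key, value))
}

/// Business-logic layer: "update" requires the resource to exist already.
pub fn update(store: &mut MapStore, key: &str, value: &str) -> Result<String, String> {
    if !store.exists(key) {
        return Err(format!("resource not found: {key}"));
    }
    Ok(store.put(key, value))
}
```

Both `create` and `update` end in the same `put` call; only the precondition differs, which is why the storage layer itself needs just one write operation.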

Core Interface

The StorageProvider trait defines the contract for data persistence:

use serde_json::Value;

pub trait StorageProvider: Send + Sync {
    type Error: std::error::Error + Send + Sync + 'static;

    // Core data operations
    async fn put(&self, key: StorageKey, data: Value) -> Result<Value, Self::Error>;
    async fn get(&self, key: StorageKey) -> Result<Option<Value>, Self::Error>;
    async fn delete(&self, key: StorageKey) -> Result<bool, Self::Error>;

    // Query operations
    async fn list(&self, prefix: StoragePrefix, start_index: usize, count: usize)
        -> Result<Vec<(StorageKey, Value)>, Self::Error>;
    async fn find_by_attribute(&self, prefix: StoragePrefix,
        attribute_name: &str, attribute_value: &str)
        -> Result<Vec<(StorageKey, Value)>, Self::Error>;

    // Utility operations
    async fn exists(&self, key: StorageKey) -> Result<bool, Self::Error>;
    async fn count(&self, prefix: StoragePrefix) -> Result<usize, Self::Error>;
    async fn clear(&self) -> Result<(), Self::Error>;

    // Discovery operations
    async fn list_tenants(&self) -> Result<Vec<String>, Self::Error>;
    async fn list_resource_types(&self, tenant_id: &str) -> Result<Vec<String>, Self::Error>;
    async fn list_all_resource_types(&self) -> Result<Vec<String>, Self::Error>;

    // Statistics
    async fn stats(&self) -> Result<StorageStats, Self::Error>;
}
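
A custom backend has to supply an error type that satisfies the `std::error::Error + Send + Sync + 'static` bound on `type Error`. The following is a minimal sketch of one way to do that with the standard library only; `MyStorageError`, its variants, and `assert_error_bound` are hypothetical names chosen for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type for a custom storage backend.
#[derive(Debug)]
pub enum MyStorageError {
    /// The backend could not be reached.
    Connection(String),
    /// Stored bytes could not be decoded as JSON.
    Serialization(String),
}

impl fmt::Display for MyStorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyStorageError::Connection(msg) => write!(f, "storage connection failed: {msg}"),
            MyStorageError::Serialization(msg) => write!(f, "invalid stored JSON: {msg}"),
        }
    }
}

// `Debug` + `Display` are the only requirements for implementing `Error`.
impl Error for MyStorageError {}

/// Compiles only for types that meet the same bound as `StorageProvider::Error`.
pub fn assert_error_bound<E: Error + Send + Sync + 'static>() {}
```

Because the enum holds only owned `String` data, it is automatically `Send`, `Sync`, and `'static`; an error type that wraps a non-thread-safe handle would fail the `assert_error_bound` check at compile time.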

Key Organization

Storage uses hierarchical keys for tenant and resource type isolation:

pub struct StorageKey {
    tenant_id: String,      // "tenant-123"
    resource_type: String,  // "User" or "Group"
    resource_id: String,    // "user-456"
}

// Examples:
// tenant-123/User/user-456
// tenant-123/Group/group-789
// default/User/admin-user
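
To make the hierarchy concrete, here is a simplified stand-in that mirrors the constructor and accessors used on this page (`new`, `prefix`, `tenant_id()`, `resource_type()`, `resource_id()`). The names `SketchKey`, `SketchPrefix`, and `matches` are hypothetical, and the crate's real types may be implemented differently.

```rust
use std::fmt;

/// Hypothetical stand-in for a hierarchical storage key.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SketchKey {
    tenant_id: String,
    resource_type: String,
    resource_id: String,
}

/// Hypothetical stand-in for a tenant/resource-type prefix.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchPrefix {
    tenant_id: String,
    resource_type: String,
}

impl SketchKey {
    pub fn new(tenant_id: &str, resource_type: &str, resource_id: &str) -> Self {
        Self {
            tenant_id: tenant_id.to_string(),
            resource_type: resource_type.to_string(),
            resource_id: resource_id.to_string(),
        }
    }

    pub fn prefix(tenant_id: &str, resource_type: &str) -> SketchPrefix {
        SketchPrefix {
            tenant_id: tenant_id.to_string(),
            resource_type: resource_type.to_string(),
        }
    }

    pub fn tenant_id(&self) -> &str { &self.tenant_id }
    pub fn resource_type(&self) -> &str { &self.resource_type }
    pub fn resource_id(&self) -> &str { &self.resource_id }

    /// Compare whole components, so "tenant-12" never matches "tenant-123".
    pub fn matches(&self, prefix: &SketchPrefix) -> bool {
        self.tenant_id == prefix.tenant_id && self.resource_type == prefix.resource_type
    }
}

impl fmt::Display for SketchKey {
    /// Renders the key as `tenant/type/id`, as in the examples above.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/{}/{}", self.tenant_id, self.resource_type, self.resource_id)
    }
}
```

Matching on components rather than on the rendered string avoids a subtle isolation bug: a plain string prefix test would treat `tenant-12/User` as a prefix of `tenant-123/User/...` unless the separator were handled carefully.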

Available Implementations

1. InMemoryStorage

Perfect for development, testing, and proof-of-concepts

use scim_server::storage::InMemoryStorage;

let storage = InMemoryStorage::new();

// Benefits:
// - Instant startup
// - No external dependencies
// - Perfect for unit tests
// - High performance

// Limitations:
// - Data lost on restart
// - Memory usage grows with data
// - Single-process only

Use Cases:

  • Unit and integration testing
  • Development environments
  • Demos and prototypes
  • Temporary data scenarios

2. SqliteStorage

Production-ready persistence with zero configuration

use scim_server::storage::SqliteStorage;

let storage = SqliteStorage::new("scim_data.db").await?;

// Benefits:
// - File-based persistence
// - No server setup required
// - ACID transactions
// - Excellent performance
// - Cross-platform

// Limitations:
// - Single-writer concurrency
// - File-system dependent
// - Size limitations for very large datasets

Use Cases:

  • Single-server deployments
  • Small to medium scale applications
  • Desktop applications
  • Edge computing scenarios

3. Custom Storage Implementations

Extend to any backend you need

use redis::AsyncCommands;
use scim_server::storage::{StorageKey, StorageProvider};
use serde_json::Value;

pub struct RedisStorage {
    client: redis::Client,
}

impl StorageProvider for RedisStorage {
    type Error = redis::RedisError;

    async fn put(&self, key: StorageKey, data: Value) -> Result<Value, Self::Error> {
        let redis_key = format!("{}/{}/{}", key.tenant_id(), key.resource_type(), key.resource_id());
        let json_string = data.to_string();

        // Redis commands run on a connection, not on the client itself
        let mut conn = self.client.get_multiplexed_async_connection().await?;
        conn.set::<_, _, ()>(&redis_key, &json_string).await?;
        Ok(data) // Return what was stored
    }

    // ... implement other methods
}

Use Cases

1. Development and Testing

Rapid iteration with in-memory storage

use scim_server::storage::InMemoryStorage;
use scim_server::providers::StandardResourceProvider;
use serde_json::json;

#[tokio::test]
async fn test_user_operations() -> Result<(), Box<dyn std::error::Error>> {
    // Setup - instant, no external dependencies
    let storage = InMemoryStorage::new();
    let provider = StandardResourceProvider::new(storage);

    // Test operations
    let context = RequestContext::with_generated_id();
    let user_data = json!({
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": "alice"
    });
    let _user = provider.create_resource("User", user_data, &context).await?;

    // Clean slate for each test
    provider.clear().await;

    assert_eq!(provider.get_stats().await.total_resources, 0);
    Ok(()) // Returning Result lets the test body use `?`
}

Benefits: Fast test execution, isolated test environments, no cleanup required.

2. Single-Server Production

Persistent storage without infrastructure complexity

use scim_server::providers::StandardResourceProvider;
use scim_server::storage::SqliteStorage;
use scim_server::ScimServer;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Production-ready storage with single file
    let storage = SqliteStorage::new("/var/lib/scim/users.db").await?;
    let provider = StandardResourceProvider::new(storage);
    
    // Data persists across application restarts
    let server = ScimServer::new(provider);
    server.run("0.0.0.0:8080").await?;
    
    Ok(())
}

Benefits: Data persistence, ACID guarantees, simple deployment.

3. Distributed Systems

Custom storage for scalability

use std::sync::Arc;

pub struct CassandraStorage {
    session: Arc<Session>,
    keyspace: String,
}

impl StorageProvider for CassandraStorage {
    // `type Error` and the remaining trait methods are omitted for brevity

    async fn put(&self, key: StorageKey, data: Value) -> Result<Value, Self::Error> {
        let cql = "INSERT INTO resources (tenant_id, resource_type, resource_id, data) VALUES (?, ?, ?, ?)";

        self.session.query(cql, (
            key.tenant_id(),
            key.resource_type(),
            key.resource_id(),
            data.to_string()
        )).await?;

        Ok(data)
    }

    async fn get(&self, key: StorageKey) -> Result<Option<Value>, Self::Error> {
        let cql = "SELECT data FROM resources WHERE tenant_id = ? AND resource_type = ? AND resource_id = ?";

        match self.session.query(cql, (key.tenant_id(), key.resource_type(), key.resource_id())).await? {
            Some(row) => {
                let json_str: String = row.get("data")?;
                Ok(Some(serde_json::from_str(&json_str)?))
            }
            None => Ok(None)
        }
    }
}

Benefits: Horizontal scalability, high availability, geographic distribution.

4. Hybrid Storage Strategies

Different backends for different use cases

pub struct HybridStorage {
    hot_storage: InMemoryStorage,    // Recently accessed data
    cold_storage: SqliteStorage,     // Persistent storage
}

impl StorageProvider for HybridStorage {
    // `type Error` and the remaining trait methods are omitted for brevity;
    // the error type must be convertible from both backends' errors for `?` to work

    async fn get(&self, key: StorageKey) -> Result<Option<Value>, Self::Error> {
        // Try hot storage first
        if let Some(data) = self.hot_storage.get(key.clone()).await? {
            return Ok(Some(data));
        }

        // Fall back to cold storage and warm cache
        if let Some(data) = self.cold_storage.get(key.clone()).await? {
            self.hot_storage.put(key, data.clone()).await?;
            return Ok(Some(data));
        }

        Ok(None)
    }
}

Benefits: Performance optimization, cost efficiency, flexible data lifecycle.
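
The hybrid example above covers reads only. As a sketch of how the write and delete paths might keep the two tiers consistent, the following synchronous model uses two `HashMap`s in place of the real backends. `HybridSketch` and all of its methods are hypothetical, and the write-through policy shown is one option among several, not something this page prescribes.

```rust
use std::collections::HashMap;

/// Hypothetical two-tier store: `hot` is a cache, `cold` is the source of truth.
#[derive(Default)]
pub struct HybridSketch {
    hot: HashMap<String, String>,
    cold: HashMap<String, String>,
}

impl HybridSketch {
    /// Write-through: persist to cold storage first, then refresh the cache,
    /// so the cache never holds data that the durable tier lacks.
    pub fn put(&mut self, key: &str, value: &str) {
        self.cold.insert(key.to_string(), value.to_string());
        self.hot.insert(key.to_string(), value.to_string());
    }

    /// Read-through: serve from the cache, falling back to cold storage
    /// and warming the cache on a miss.
    pub fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.hot.get(key) {
            return Some(v.clone());
        }
        let v = self.cold.get(key)?.clone();
        self.hot.insert(key.to_string(), v.clone());
        Some(v)
    }

    /// Delete from both tiers; reports whether the durable copy existed.
    pub fn delete(&mut self, key: &str) -> bool {
        let existed = self.cold.remove(key).is_some();
        self.hot.remove(key);
        existed
    }

    /// Drop a cache entry without touching cold storage.
    pub fn evict(&mut self, key: &str) {
        self.hot.remove(key);
    }

    pub fn is_hot(&self, key: &str) -> bool {
        self.hot.contains_key(key)
    }
}
```

Forgetting the `hot.remove` in `delete` is the classic failure mode here: a later `get` would keep returning a resource that no longer exists in persistent storage.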

Data Organization Patterns

Tenant Isolation

Storage automatically isolates data by tenant:

Storage Layout:
├── tenant-a/
│   ├── User/
│   │   ├── user-1 → {"id": "user-1", "userName": "alice", ...}
│   │   └── user-2 → {"id": "user-2", "userName": "bob", ...}
│   └── Group/
│       └── group-1 → {"id": "group-1", "displayName": "Admins", ...}
├── tenant-b/
│   └── User/
│       └── user-3 → {"id": "user-3", "userName": "charlie", ...}
└── default/
    └── User/
        └── admin → {"id": "admin", "userName": "admin", ...}
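
One way to picture how a backend could realize this layout is an ordered map keyed by `(tenant_id, resource_type, resource_id)`, where listing a tenant's resources becomes a contiguous range scan. The sketch below is a simplified synchronous model; `TenantStore` and its methods are hypothetical and say nothing about how the bundled backends are actually implemented.

```rust
use std::collections::BTreeMap;

/// (tenant_id, resource_type, resource_id)
type Key = (String, String, String);

/// Hypothetical ordered store; values are raw JSON strings.
#[derive(Default)]
pub struct TenantStore {
    data: BTreeMap<Key, String>,
}

impl TenantStore {
    pub fn put(&mut self, tenant: &str, rtype: &str, id: &str, json: &str) {
        self.data
            .insert((tenant.into(), rtype.into(), id.into()), json.into());
    }

    /// Resource ids under one tenant and resource type, in key order.
    /// The scan starts at the smallest possible key for that prefix and
    /// stops as soon as the tenant or type changes.
    pub fn list_ids(&self, tenant: &str, rtype: &str) -> Vec<String> {
        let start: Key = (tenant.to_string(), rtype.to_string(), String::new());
        self.data
            .range(start..)
            .take_while(|((t, r, _), _)| t.as_str() == tenant && r.as_str() == rtype)
            .map(|((_, _, id), _)| id.clone())
            .collect()
    }

    /// Distinct tenant ids; keys are sorted, so duplicates are adjacent.
    pub fn list_tenants(&self) -> Vec<String> {
        let mut tenants: Vec<String> = self.data.keys().map(|(t, _, _)| t.clone()).collect();
        tenants.dedup();
        tenants
    }
}
```

Since every lookup carries the tenant as the leading key component, a query for `tenant-b` cannot return rows belonging to `tenant-a` in this model.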

Efficient Querying

Storage providers optimize common query patterns:

// List all users in a tenant
let prefix = StorageKey::prefix("tenant-123", "User");
let users = storage.list(prefix.clone(), 0, 100).await?;

// Find user by username
let matches = storage.find_by_attribute(prefix.clone(), "userName", "alice").await?;

// Count resources for capacity planning (prefix is passed by value, hence the clones above)
let user_count = storage.count(prefix).await?;
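
The semantics of `list` and `find_by_attribute` can be modeled with plain iterators. The sketch below assumes a zero-based `start_index` (as in the `list(prefix, 0, 100)` call on this page) and flat string attributes; `Resource`, `list_page`, and `user` are hypothetical helpers, and a real backend working on JSON documents may treat nested attributes differently.

```rust
use std::collections::HashMap;

/// Hypothetical flat resource: attribute name -> string value.
pub type Resource = HashMap<String, String>;

/// Return at most `count` resources, skipping the first `start_index`.
pub fn list_page(items: &[Resource], start_index: usize, count: usize) -> Vec<Resource> {
    items.iter().skip(start_index).take(count).cloned().collect()
}

/// Return every resource whose attribute `name` equals `value` exactly.
pub fn find_by_attribute(items: &[Resource], name: &str, value: &str) -> Vec<Resource> {
    items
        .iter()
        .filter(|r| r.get(name).map(String::as_str) == Some(value))
        .cloned()
        .collect()
}

/// Convenience constructor for the examples.
pub fn user(id: &str, user_name: &str) -> Resource {
    let mut r = Resource::new();
    r.insert("id".to_string(), id.to_string());
    r.insert("userName".to_string(), user_name.to_string());
    r
}
```

A `start_index` past the end of the data simply yields an empty page, which gives callers a natural stopping condition when paging through a large collection.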

Performance Considerations

1. Storage Selection by Scale

Scale                 Recommended Storage           Reasoning
< 1K resources        InMemoryStorage               Maximum performance, simple setup
1K - 100K resources   SqliteStorage                 Balanced performance, persistence
100K+ resources       Custom (Postgres/Cassandra)   Scalability, advanced features
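
The table can be encoded as a small decision helper. This is only a sketch: `RecommendedStorage` and `recommend_storage` are hypothetical names, the thresholds are taken directly from the table, and the choice to treat exactly 100K resources as "Custom" is an assumption, since the table's ranges overlap at that boundary.

```rust
/// Hypothetical mirror of the "Recommended Storage" column.
#[derive(Debug, PartialEq, Eq)]
pub enum RecommendedStorage {
    InMemory,
    Sqlite,
    Custom,
}

/// Map an expected resource count to the table's recommendation.
pub fn recommend_storage(expected_resources: u64) -> RecommendedStorage {
    match expected_resources {
        0..=999 => RecommendedStorage::InMemory,
        1_000..=99_999 => RecommendedStorage::Sqlite,
        _ => RecommendedStorage::Custom,
    }
}
```

Resource count is only one input. A small deployment that must survive restarts would still need persistent storage, regardless of what the count alone suggests.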

2. Key Design Impact

Efficient key structure enables fast operations:

// Good: Hierarchical keys enable prefix operations
let key = StorageKey::new("tenant-123", "User", "user-456");

// Good: Batch operations on prefixes
let prefix = StorageKey::prefix("tenant-123", "User");
let all_users = storage.list(prefix, 0, usize::MAX).await?;

3. JSON Storage Optimization

Storage providers work with JSON documents:

// Storage receives fully-formed JSON
let user_json = json!({
    "id": "user-123",
    "userName": "alice",
    "meta": {
        "resourceType": "User",
        "created": "2023-01-01T00:00:00Z",
        "version": "v1"
    }
});

// Storage doesn't parse or validate - just stores
storage.put(key, user_json).await?;

Best Practices

1. Choose Storage Based on Requirements

// Development: Fast iteration, no persistence needed
let storage = InMemoryStorage::new();

// Production: Small scale, simple deployment
let storage = SqliteStorage::new("app.db").await?;

// Enterprise: High scale, distributed
let storage = CustomDistributedStorage::new().await?;

2. Handle Errors Appropriately

// Good: Specific error handling
match storage.get(key).await {
    Ok(Some(data)) => process_data(data),
    Ok(None) => handle_not_found(),
    Err(e) => log_storage_error(e),
}

// Avoid: Ignoring storage errors
let data = storage.get(key).await.unwrap(); // Can panic!

3. Use Efficient Queries

// Good: Use prefix queries for lists
let prefix = StorageKey::prefix(tenant_id, resource_type);
let resources = storage.list(prefix, start, count).await?;

// Avoid: Individual gets for list operations
for id in resource_ids {
    let key = StorageKey::new(tenant_id, resource_type, id);
    let resource = storage.get(key).await?; // N+1 queries!
}

4. Monitor Storage Performance

// Get insights into storage usage
let stats = storage.stats().await?;
println!("Tenants: {}, Resources: {}",
         stats.tenant_count, stats.total_resources);

// Use for capacity planning and optimization
if stats.total_resources > 10000 {
    consider_scaling_storage();
}
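
For a custom backend, the two fields used above (`tenant_count` and `total_resources`) can be derived from the stored keys. The following is a rough, synchronous sketch; `StatsSketch`, `compute_stats`, and `needs_scaling` are hypothetical and the crate's `StorageStats` may carry more fields than shown here.

```rust
use std::collections::BTreeSet;

/// Hypothetical subset of storage statistics, limited to the fields used on this page.
#[derive(Debug, PartialEq, Eq)]
pub struct StatsSketch {
    pub tenant_count: usize,
    pub total_resources: usize,
}

/// Derive statistics from `(tenant_id, resource_type, resource_id)` keys.
pub fn compute_stats(keys: &[(&str, &str, &str)]) -> StatsSketch {
    // A set removes duplicate tenant ids before counting.
    let tenants: BTreeSet<&str> = keys.iter().map(|(t, _, _)| *t).collect();
    StatsSketch {
        tenant_count: tenants.len(),
        total_resources: keys.len(),
    }
}

/// True when the resource count exceeds a caller-chosen threshold.
pub fn needs_scaling(stats: &StatsSketch, threshold: usize) -> bool {
    stats.total_resources > threshold
}
```

Scanning every key on each call is acceptable for a sketch, but a backend holding many resources would more likely maintain these counters incrementally or delegate them to the database.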

Integration Patterns

Factory Pattern

Create storage based on configuration:

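// A trait with async methods and an associated Error type cannot be used as
// `dyn StorageProvider`, so this version returns an enum over the known backends.
pub enum AnyStorage {
    Memory(InMemoryStorage),
    Sqlite(SqliteStorage),
    Custom(CustomStorage),
}

// `async` because opening the SQLite backend awaits; errors propagate instead of panicking.
pub async fn create_storage(config: &StorageConfig) -> Result<AnyStorage, Box<dyn std::error::Error>> {
    Ok(match &config.storage_type {
        StorageType::Memory => AnyStorage::Memory(InMemoryStorage::new()),
        StorageType::Sqlite { path } => AnyStorage::Sqlite(SqliteStorage::new(path).await?),
        StorageType::Custom { .. } => AnyStorage::Custom(CustomStorage::new(config)),
    })
}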

Migration Support

Move between storage backends:

pub async fn migrate_storage<F, T>(from: F, to: T) -> Result<(), Box<dyn std::error::Error>>
where
    F: StorageProvider,
    T: StorageProvider,
{
    let tenants = from.list_tenants().await?;

    for tenant_id in tenants {
        let resource_types = from.list_resource_types(&tenant_id).await?;

        for resource_type in resource_types {
            let prefix = StorageKey::prefix(&tenant_id, &resource_type);
            let resources = from.list(prefix, 0, usize::MAX).await?;

            for (key, data) in resources {
                to.put(key, data).await?;
            }
        }
    }

    Ok(())
}
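
Listing with `usize::MAX` loads every resource of a type into memory at once. For larger datasets, a paged copy that reuses the `start_index` and `count` parameters of `list` may be preferable. The sketch below models this with a hypothetical synchronous `SyncStore` trait and `MemStore` type (neither is part of the crate), and it assumes that the source returns resources in a stable order and is not modified during the migration.

```rust
use std::collections::BTreeMap;

/// Hypothetical synchronous subset of a storage interface.
pub trait SyncStore {
    fn put(&mut self, key: String, data: String);
    fn list(&self, start_index: usize, count: usize) -> Vec<(String, String)>;
}

/// Hypothetical ordered in-memory store, so `list` pages are stable.
#[derive(Default)]
pub struct MemStore {
    data: BTreeMap<String, String>,
}

impl SyncStore for MemStore {
    fn put(&mut self, key: String, data: String) {
        self.data.insert(key, data);
    }

    fn list(&self, start_index: usize, count: usize) -> Vec<(String, String)> {
        self.data
            .iter()
            .skip(start_index)
            .take(count)
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Copy everything from `from` to `to`, one page at a time.
/// Returns the number of resources copied. A `page_size` of 0 copies nothing.
pub fn migrate_paged<F: SyncStore, T: SyncStore>(from: &F, to: &mut T, page_size: usize) -> usize {
    let mut start = 0;
    let mut copied = 0;
    loop {
        let page = from.list(start, page_size);
        if page.is_empty() {
            break; // past the last page
        }
        let n = page.len();
        for (key, data) in page {
            to.put(key, data); // PUT is an upsert, so reruns are harmless
        }
        copied += n;
        start += n;
    }
    copied
}
```

Because `put` overwrites rather than fails on existing keys, an interrupted migration can be restarted from the beginning without special handling.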

The Storage Provider layer enables SCIM Server to work with any data backend while maintaining clean separation between storage concerns and SCIM protocol logic. Whether you need the simplicity of in-memory storage for testing or the scalability of distributed databases for production, the unified interface makes it seamless to switch between implementations.