Operations Guide#

System administration and maintenance guide for VAULT03.

System Overview#

flowchart TB
    subgraph Internet
        Users[Users]
    end

    subgraph Server["VAULT03 Server"]
        NGINX[nginx<br/>reverse proxy]
        API[vault03 API<br/>Go binary]
        DB[(SQLite<br/>Database)]
        FILES[(File<br/>Storage)]
    end

    subgraph Workers["Background Workers"]
        EMAIL[Email Worker]
        SMS[SMS Worker]
        POSTAL[Postal Worker]
    end

    subgraph External["External Services"]
        BREVO[Brevo API]
        TELEGRAM[Telegram]
    end

    Users -->|HTTPS:443| NGINX
    NGINX -->|proxy :8041| API
    API --> DB
    API --> FILES

    API -->|queue| EMAIL
    API -->|queue| SMS
    API -->|queue| POSTAL

    EMAIL --> BREVO
    SMS --> BREVO
    API --> TELEGRAM

    style NGINX fill:#16a34a,color:#fff
    style API fill:#3b82f6,color:#fff
    style DB fill:#1e3a5f,color:#fff

Daily Operations#

Health Monitoring#

Check system health:

# Service status
sudo systemctl status vault03
sudo systemctl status nginx

# Health endpoint
curl https://vault03.yourdomain.com/api/health

# Check disk usage
df -h /opt/vault03/data

# Memory usage
free -h

# Process info
ps aux | grep vault03

Log Review#

Application logs:

# Real-time logs
sudo journalctl -u vault03 -f

# Last 100 lines
sudo journalctl -u vault03 -n 100

# Today's errors
sudo journalctl -u vault03 --since today --priority=err

# Application log file
tail -f /opt/vault03/logs/vault03.log

nginx logs:

# Access log
tail -f /var/log/nginx/vault03-access.log

# Error log
tail -f /var/log/nginx/vault03-error.log

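# Unauthorized (401) responses, e.g. failed login attempts
# (matches the status field of the default "combined" log format)
grep '" 401 ' /var/log/nginx/vault03-access.log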
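When many 401 responses show up, grouping them by client address helps tell a mistyped password from a brute-force attempt. The following is a minimal sketch, assuming the access log uses nginx's default combined format (client address in field 1, status code in field 9). The function name `count_401_by_ip` is hypothetical and not part of VAULT03.

```shell
# Hypothetical helper: reads access-log lines on stdin and prints
# "<count> <client-ip>" for 401 responses, busiest clients first.
# Assumes the default "combined" log format (field 1 = IP, field 9 = status).
count_401_by_ip() {
    awk '$9 == 401 { print $1 }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

On the server it could be used as `count_401_by_ip < /var/log/nginx/vault03-access.log | head`.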

Database Maintenance#

Check database size:

ls -lh /opt/vault03/data/data.db
sqlite3 /opt/vault03/data/data.db "SELECT page_count * page_size / 1024 / 1024.0 as size_mb FROM pragma_page_count(), pragma_page_size();"

Vacuum database (reclaim space):

# Stop service first
sudo systemctl stop vault03

# Vacuum
sqlite3 /opt/vault03/data/data.db "VACUUM;"

# Restart service
sudo systemctl start vault03
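Because VACUUM requires stopping the service, it may be worth checking first how much space it would actually reclaim. SQLite reports unused pages via `PRAGMA freelist_count` and the total via `PRAGMA page_count`. The sketch below compares the two; the function name and the 20 percent default threshold are assumptions for illustration.

```shell
# Hypothetical helper: succeed (exit 0) when at least THRESHOLD percent of
# the database pages are free, i.e. VACUUM would reclaim a meaningful amount.
# Arguments: free pages, total pages, threshold percent (default 20, an
# arbitrary assumption).
vacuum_worthwhile() {
    [ "$2" -gt 0 ] || return 1
    [ $(( $1 * 100 / $2 )) -ge "${3:-20}" ]
}

# Usage on the server (reads the two counters with sqlite3):
#   db=/opt/vault03/data/data.db
#   if vacuum_worthwhile "$(sqlite3 "$db" 'PRAGMA freelist_count;')" \
#                        "$(sqlite3 "$db" 'PRAGMA page_count;')"; then
#       echo "VACUUM recommended"
#   fi
```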

Database integrity check:

sqlite3 /opt/vault03/data/data.db "PRAGMA integrity_check;"

Backup and Restore#

VAULT03 implements a two-tier backup strategy:

  1. Local backups - Fast rollback during deployments (tar archives)
  2. Off-site backups - Disaster recovery using restic (encrypted, deduplicated)

flowchart LR
    subgraph Production["Production Data"]
        DB[(data.db)]
        FILES[/files/]
        CONFIG[config.yaml]
        KEYS[Encryption Keys]
    end

    subgraph LocalBackup["Local Backup (Deployment)"]
        TAR[Tar Archives<br/>/apps/_backups/]
    end

    subgraph OffsiteBackup["Off-site Backup (Restic)"]
        RESTIC[Encrypted Snapshots<br/>/backups/restic/]
    end

    DB --> TAR
    FILES --> TAR

    DB --> RESTIC
    FILES --> RESTIC
    CONFIG --> RESTIC

    TAR -->|Deployment rollback| Production
    RESTIC -->|Disaster recovery| Production

    KEYS -.->|Manual<br/>secure storage| OFFLINE[(Offline<br/>Storage)]

    style KEYS fill:#dc2626,color:#fff
    style OFFLINE fill:#dc2626,color:#fff
    style RESTIC fill:#16a34a,color:#fff

Backup Strategy#

What is backed up:

| Data | Local Backup | Off-site Backup | Frequency |
|---|---|---|---|
| Database | Yes (sqlite3 .backup) | Yes (snapshot) | Every deployment / Daily 3 AM |
| Vault files | Yes (tarball) | Yes (deduplicated) | Every deployment / Daily 3 AM |
| Configuration | Preserved (not overwritten) | Yes | Daily 3 AM |
| Encryption keys | No (manual offline storage) | No | Manual |

Retention policy (off-site):

  • 7 daily snapshots
  • 4 weekly snapshots
  • 6 monthly snapshots
  • 2 yearly snapshots
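In restic, a retention policy like this is usually expressed through the `--keep-*` options of `restic forget`. The sketch below only builds the argument string. Whether `/usr/local/bin/vault03-backup` uses exactly these options is an assumption, and the function name is hypothetical.

```shell
# Prints the restic arguments that correspond to the retention policy above
# (7 daily, 4 weekly, 6 monthly, 2 yearly). The counts can be overridden
# positionally. That the installed backup script does this is an assumption.
retention_args() {
    printf 'forget --keep-daily %d --keep-weekly %d --keep-monthly %d --keep-yearly %d --prune\n' \
        "${1:-7}" "${2:-4}" "${3:-6}" "${4:-2}"
}

# Usage on the server (word splitting of the output is intentional;
# --dry-run only reports what would be removed):
#   sudo RESTIC_REPOSITORY=/backups/restic \
#        RESTIC_PASSWORD_FILE=/etc/vault03/restic-password \
#        restic $(retention_args) --dry-run
```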

Off-site Backups with Restic#

VAULT03 uses restic for encrypted, deduplicated off-site backups. Restic is configured automatically during installation.

Key features:

  • Encrypted at rest - All backup data is encrypted with your repository password
  • Deduplicated - Only changed data is stored, saving space
  • Verified - Weekly integrity checks ensure backup validity
  • Automated - Daily backups run via cron at 3:00 AM

Configuration Files#

| File | Purpose |
|---|---|
| /etc/vault03/restic.env | Repository location |
| /etc/vault03/restic-password | Repository password (root-only, mode 600) |
| /usr/local/bin/vault03-backup | Backup script |
| /var/log/vault03-backup.log | Backup logs |
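Since the repository password file is expected to have mode 600, a quick check can catch permissions that drifted after a manual edit. This is a minimal sketch with a hypothetical function name. It checks the permission bits only, not ownership.

```shell
# Hypothetical helper: succeed only if FILE is a regular file with
# permissions rw------- (mode 600). Uses the first ten characters of
# `ls -ld` so it does not depend on GNU-only stat options.
is_mode_600() {
    [ -f "$1" ] && [ "$(ls -ld "$1" | cut -c1-10)" = "-rw-------" ]
}
```

From a root shell: `is_mode_600 /etc/vault03/restic-password || echo "restic-password is not mode 600"`.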

Manual Backup Commands#

# Run backup manually
sudo /usr/local/bin/vault03-backup

# List snapshots
sudo RESTIC_REPOSITORY=/backups/restic \
     RESTIC_PASSWORD_FILE=/etc/vault03/restic-password \
     restic snapshots

# Check backup integrity
sudo RESTIC_REPOSITORY=/backups/restic \
     RESTIC_PASSWORD_FILE=/etc/vault03/restic-password \
     restic check

# View backup statistics
sudo RESTIC_REPOSITORY=/backups/restic \
     RESTIC_PASSWORD_FILE=/etc/vault03/restic-password \
     restic stats
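The commands above repeat the same two environment variables each time. A small wrapper can reduce typing errors. The sketch below is hypothetical and not shipped with VAULT03; the `SUDO` and `RESTIC_BIN` overrides exist only so the function can be tried without root or restic installed.

```shell
# Hypothetical wrapper around restic using the repository settings from this
# guide. Both settings can be overridden through the environment.
vault03_restic() {
    ${SUDO-sudo} env \
        RESTIC_REPOSITORY="${RESTIC_REPOSITORY:-/backups/restic}" \
        RESTIC_PASSWORD_FILE="${RESTIC_PASSWORD_FILE:-/etc/vault03/restic-password}" \
        "${RESTIC_BIN:-restic}" "$@"
}
```

With this in place, `vault03_restic snapshots`, `vault03_restic check` and `vault03_restic stats` correspond to the three commands above.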

Restore from Restic#

# Stop service
sudo systemctl stop vault03

# List available snapshots
sudo RESTIC_REPOSITORY=/backups/restic \
     RESTIC_PASSWORD_FILE=/etc/vault03/restic-password \
     restic snapshots

# Restore specific snapshot (replace SNAPSHOT_ID)
sudo RESTIC_REPOSITORY=/backups/restic \
     RESTIC_PASSWORD_FILE=/etc/vault03/restic-password \
     restic restore SNAPSHOT_ID --target /tmp/restore

# Copy restored files to application directory
sudo cp /tmp/restore/tmp/vault03-db-snapshot.db /apps/vault03/data/data.db
sudo cp -r /tmp/restore/apps/vault03/data/files/* /apps/vault03/data/files/
sudo cp /tmp/restore/apps/vault03/config.yaml /apps/vault03/config.yaml

# Fix permissions
sudo chown -R vault03:vault03 /apps/vault03

# Start service
sudo systemctl start vault03

For true disaster recovery, configure restic to use a remote repository:

# Edit /etc/vault03/restic.env
# Examples:
# SFTP: RESTIC_REPOSITORY=sftp:backup-user@backup-server:/vault03-backups
# S3:   RESTIC_REPOSITORY=s3:s3.amazonaws.com/bucket-name/vault03
# REST: RESTIC_REPOSITORY=rest:https://backup-server:8000/vault03

After changing the repository, initialise the new location:

sudo RESTIC_REPOSITORY=<new-repo> \
     RESTIC_PASSWORD_FILE=/etc/vault03/restic-password \
     restic init

Local Deployment Backups#

Local backups are created automatically during each deployment:

Database backup (/apps/_backups/db/):

  • Created using sqlite3 .backup for consistency
  • Includes WAL and SHM journal files
  • Keeps last 30 backups

Full backup (/apps/_backups/):

  • Tarball of entire /apps/vault03 directory
  • Keeps last 10 backups
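The "keeps last N backups" rule can be illustrated with a short pruning function. How the deployment script actually implements retention is not shown in this guide, so the following is only a sketch with a hypothetical name. It relies on the `YYYY-MM-DD__HH-MM-SS` filename prefix sorting chronologically.

```shell
# Hypothetical helper: keep the newest KEEP files in DIR whose names match
# PATTERN and delete the rest. Because names start with a sortable timestamp,
# a reverse sort puts the newest backup first. Subdirectories are not touched.
prune_old_backups() {
    dir=$1 keep=$2 pattern=${3:-*}
    find "$dir" -maxdepth 1 -type f -name "$pattern" | sort -r |
        tail -n +"$(( keep + 1 ))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

Under these assumptions, `prune_old_backups /apps/_backups/db 30 '*_data.db'` and `prune_old_backups /apps/_backups 10 '*_vault03.tar.gz'` would match the two limits listed above.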

Restore from Local Backup#

# Stop service
sudo systemctl stop vault03

# Restore database
sudo cp /apps/_backups/db/YYYY-MM-DD__HH-MM-SS_data.db /apps/vault03/data/data.db

# Or restore full backup
sudo tar -xzf /apps/_backups/YYYY-MM-DD__HH-MM-SS_vault03.tar.gz -C /apps

# Fix permissions
sudo chown -R vault03:vault03 /apps/vault03

# Start service
sudo systemctl start vault03

Restore Procedure#

Restore database:

# Stop service
sudo systemctl stop vault03

# Restore database backup
sudo cp /opt/vault03/backup/data-YYYYMMDD-HHMMSS.db /opt/vault03/data/data.db

# Set permissions
sudo chown vault03:vault03 /opt/vault03/data/data.db
sudo chmod 640 /opt/vault03/data/data.db

# Start service
sudo systemctl start vault03

Restore files:

# Stop service
sudo systemctl stop vault03

# Extract backup
sudo tar -xzf /opt/vault03/backup/files-YYYYMMDD-HHMMSS.tar.gz -C /opt/vault03/data/

# Set permissions
sudo chown -R vault03:vault03 /opt/vault03/data/files
sudo chmod -R 750 /opt/vault03/data/files

# Start service
sudo systemctl start vault03

Log Rotation#

Configure logrotate for VAULT03:

Create /etc/logrotate.d/vault03:

/opt/vault03/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0640 vault03 vault03
    sharedscripts
    postrotate
        systemctl reload vault03 > /dev/null 2>&1 || true
    endscript
}

/var/log/nginx/vault03-*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        systemctl reload nginx > /dev/null 2>&1 || true
    endscript
}

Test logrotate:

sudo logrotate -d /etc/logrotate.d/vault03

Updates and Upgrades#

Application Updates#

  1. Back up everything (see backup procedure above)

  2. Build new version:

# On development machine
cd api
GOOS=linux GOARCH=amd64 go build -o vault03 .

cd ../ui
pnpm build

  3. Stop service:

sudo systemctl stop vault03

  4. Deploy new version:

# Copy new binary
scp api/vault03 vault03@server:/opt/vault03/api/vault03.new
ssh vault03@server "mv /opt/vault03/api/vault03.new /opt/vault03/api/vault03"

# Copy new UI
scp -r ui/dist/* vault03@server:/opt/vault03/ui/

  5. Run database migrations:

# Migrations run automatically on startup
# Check logs to verify

  6. Start service:

sudo systemctl start vault03

  7. Verify:

# Check logs
sudo journalctl -u vault03 -n 50

# Test health endpoint
curl https://vault03.yourdomain.com/api/health

# Test login
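Right after a restart the API may need a few seconds before the health endpoint responds, so a single curl can fail spuriously. A small retry loop is one way to handle this. The helper below is a hypothetical sketch, and the attempt count and delay in the usage line are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between tries. Returns 1 if every try fails.
retry() {
    attempts=$1 delay=$2
    shift 2
    n=1
    while ! "$@"; do
        [ "$n" -lt "$attempts" ] || return 1
        n=$(( n + 1 ))
        sleep "$delay"
    done
}
```

For example: `retry 10 3 curl -fsS -o /dev/null https://vault03.yourdomain.com/api/health && echo "healthy"`.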

System Updates#

# Update packages
sudo apt-get update
sudo apt-get upgrade -y

# Reboot if kernel updated
sudo reboot

SSL Certificate Renewal#

Let’s Encrypt certificates auto-renew via certbot. Verify:

# Test renewal
sudo certbot renew --dry-run

# Check expiration
sudo certbot certificates

Performance Tuning#

Database Optimization#

Enable WAL mode (better concurrency):

sqlite3 /opt/vault03/data/data.db "PRAGMA journal_mode=WAL;"

Optimize query planner:

sqlite3 /opt/vault03/data/data.db "ANALYZE;"

nginx Tuning#

Edit /etc/nginx/nginx.conf:

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 2048;
    use epoll;
    multi_accept on;
}

http {
    # Client keep-alive connections
    keepalive_timeout 65;
    keepalive_requests 100;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss;

    # File descriptor cache
    open_file_cache max=10000 inactive=60s;
    open_file_cache_valid 120s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
}

System Tuning#

Edit /etc/sysctl.conf:

# Network tuning
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.ip_local_port_range = 1024 65535

# File descriptors
fs.file-max = 2097152

Apply the changes:

sudo sysctl -p

Monitoring and Alerting#

flowchart TB
    subgraph Checks["Health Checks"]
        DISK[Disk Space<br/>check-disk.sh]
        SVC[Service Status<br/>check-service.sh]
        HEALTH[Health Endpoint<br/>/api/health]
        LOGS[Log Monitoring<br/>journalctl]
    end

    subgraph Schedule["Scheduled via Cron"]
        CRON[crontab]
    end

    subgraph Alerts["Alert Conditions"]
        A1[Disk > 80%]
        A2[Service Down]
        A3[Health != 200]
        A4[Errors > 10/hr]
    end

    subgraph Actions["Alert Actions"]
        EMAIL[Send Email]
        LOG[System Log]
        RESTART[Auto-restart]
    end

    CRON -->|every 15 min| DISK & SVC & HEALTH & LOGS

    DISK --> A1
    SVC --> A2
    HEALTH --> A3
    LOGS --> A4

    A1 --> EMAIL & LOG
    A2 --> RESTART & EMAIL & LOG
    A3 --> EMAIL & LOG
    A4 --> EMAIL & LOG

    style A1 fill:#f59e0b,color:#000
    style A2 fill:#dc2626,color:#fff
    style A3 fill:#dc2626,color:#fff
    style A4 fill:#f59e0b,color:#000

Disk Space Monitoring#

Create /opt/vault03/scripts/check-disk.sh:

#!/bin/bash
# Warn when the data volume exceeds THRESHOLD percent usage.
THRESHOLD=80
# -P forces single-line POSIX output; column 5 is the use percentage
USAGE=$(df -P /opt/vault03/data | awk 'NR==2 {gsub("%", "", $5); print $5}')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "WARNING: Disk usage is ${USAGE}% (threshold: ${THRESHOLD}%)"
    logger -t vault03-monitor "Disk space critical: ${USAGE}%"
    # Send alert (email, SMS, etc.)
fi

Schedule with cron:

*/15 * * * * /opt/vault03/scripts/check-disk.sh

Service Monitoring#

Create /opt/vault03/scripts/check-service.sh:

#!/bin/bash

# Check if service is running
if ! systemctl is-active --quiet vault03; then
    echo "CRITICAL: vault03 service is not running"
    logger -t vault03-monitor "Service down, attempting restart"
    sudo systemctl start vault03
    # Send alert
fi

# Check health endpoint (time out after 10 seconds so the cron job cannot hang)
HTTP_CODE=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" https://vault03.yourdomain.com/api/health)
if [ "$HTTP_CODE" != "200" ]; then
    echo "CRITICAL: Health check failed (HTTP $HTTP_CODE)"
    logger -t vault03-monitor "Health check failed: $HTTP_CODE"
    # Send alert
fi

Log Monitoring#

Monitor for errors:

# Real-time error monitoring
sudo journalctl -u vault03 -f | grep -i "error\|panic\|fatal"

Create alert for repeated errors:

# Count errors in last hour
ERROR_COUNT=$(sudo journalctl -u vault03 --since "1 hour ago" | grep -c "ERROR")
if [ "$ERROR_COUNT" -gt 10 ]; then
    echo "WARNING: $ERROR_COUNT errors in the last hour"
    # Send alert
fi

Troubleshooting#

High Memory Usage#

# Check process memory
ps aux --sort=-%mem | grep vault03

# Restart service to clear memory
sudo systemctl restart vault03

Database Locked Errors#

# Check which processes have the database file open
fuser /opt/vault03/data/data.db

# Enable WAL mode for better concurrency
sqlite3 /opt/vault03/data/data.db "PRAGMA journal_mode=WAL;"

File Upload Failures#

# Check disk space
df -h /opt/vault03/data

# Check permissions
ls -la /opt/vault03/data/files

# Check nginx upload size limit
grep client_max_body_size /etc/nginx/sites-available/vault03

SSL Certificate Issues#

# Check certificate
sudo certbot certificates

# Renew manually (forces renewal even if not yet due; counts against Let's Encrypt rate limits)
sudo certbot renew --force-renewal

# Reload nginx
sudo systemctl reload nginx

Security Operations#

Audit Log Review#

# Access database
sqlite3 /opt/vault03/data/data.db

-- Recent audit entries
SELECT * FROM audit_logs ORDER BY changed_at DESC LIMIT 50;

-- Updates to user records (action 'u'), e.g. when investigating failed login attempts
SELECT * FROM audit_logs
WHERE entity_type = 'users'
AND action = 'u'
ORDER BY changed_at DESC;

-- Admin actions
SELECT * FROM audit_logs
WHERE changed_by IS NOT NULL
AND entity_type IN ('customers', 'users')
ORDER BY changed_at DESC;

User Management#

List all users:

sqlite3 /opt/vault03/data/data.db "SELECT id, name, email, role, two_factor_enabled FROM users;"

Disable user account:

sqlite3 /opt/vault03/data/data.db "UPDATE users SET deleted_at = datetime('now') WHERE email = 'user@example.com';"

Reset user 2FA:

sqlite3 /opt/vault03/data/data.db "UPDATE users SET two_factor_enabled = 0, totp_secret = NULL WHERE email = 'user@example.com';"
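The commands above embed the email address directly in the SQL string. When the address comes from a variable or a support ticket, a stray single quote could break or alter the statement. A minimal sketch of a quoting helper (hypothetical name, standard SQL escaping by doubling single quotes):

```shell
# Hypothetical helper: wrap a value in single quotes for use as an SQL string
# literal, doubling any embedded single quotes (standard SQL escaping).
sql_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}
```

For example: `sqlite3 /opt/vault03/data/data.db "UPDATE users SET deleted_at = datetime('now') WHERE email = $(sql_quote "$EMAIL");"`, where `EMAIL` is a shell variable holding the address.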

Incident Response#

flowchart TD
    A[Incident Detected] --> B[Assess Severity]

    B -->|Critical| C[Isolate System]
    B -->|High| D[Investigate]
    B -->|Low| E[Monitor & Log]

    C --> F[Block Traffic<br/>ufw default deny incoming]
    F --> G[Capture Evidence]
    G --> H[Review Audit Logs]

    D --> H
    H --> I{Breach Confirmed?}

    I -->|Yes| J[Rotate Keys]
    I -->|No| K[Document & Close]

    J --> L[Force Password Reset]
    L --> M[Patch Vulnerability]
    M --> N[Restore Service]
    N --> O[Post-mortem]

    E --> K

    style C fill:#dc2626,color:#fff
    style J fill:#dc2626,color:#fff
    style L fill:#f59e0b,color:#000

If compromise suspected:

  1. Isolate the system:

# Block incoming traffic by default (existing allow rules still apply; remove them as needed)
sudo ufw default deny incoming

  2. Capture evidence:

# Dump current connections
sudo netstat -antp > /tmp/connections.txt

# Copy logs
mkdir -p /tmp/incident-logs
sudo cp /opt/vault03/logs/*.log /tmp/incident-logs/
sudo cp /var/log/nginx/*.log /tmp/incident-logs/

  3. Review audit logs for unauthorized access

  4. Change all encryption keys (requires re-sharing all vaults)

  5. Force password reset for all users

  6. Review and patch security vulnerabilities
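For the evidence-capture step, recording checksums alongside the copied logs makes it possible to show later that the copies were not altered. The sketch below is one way to do that. The function name and directory layout are hypothetical, and it assumes GNU coreutils (`sha256sum`) is available, as on typical Linux servers.

```shell
# Hypothetical helper: copy the given files into a new timestamped directory
# under DEST and record SHA-256 checksums so later tampering is detectable.
# Prints the directory it created.
capture_evidence() {
    dest=$1
    shift
    out="$dest/incident-$(date -u +%Y%m%dT%H%M%SZ)"
    mkdir -p "$out/files" || return 1
    cp -p -- "$@" "$out/files/" || return 1
    ( cd "$out/files" && sha256sum -- * ) > "$out/SHA256SUMS" || return 1
    echo "$out"
}
```

From a root shell this could be run as `capture_evidence /tmp /opt/vault03/logs/*.log /var/log/nginx/*.log`, and the copies verified later with `cd <dir>/files && sha256sum -c ../SHA256SUMS`.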

Maintenance Windows#

Planned Downtime#

  1. Notify users at least 24 hours in advance
  2. Backup everything
  3. Perform maintenance
  4. Test thoroughly
  5. Restore service
  6. Verify functionality
  7. Notify users service restored

Emergency Maintenance#

  1. Assess impact and urgency
  2. Quick backup (if safe to do so)
  3. Fix issue
  4. Test
  5. Restore service
  6. Document incident for post-mortem

Contact and Support#