Kiosk Heartbeat System Documentation
Table of Contents
- Overview
- System Architecture
- Data Flow
- API Documentation
- Database Schema
- Models and Methods
- Heatmap Visualization
- Status Calculation
- Offline Detection Logic
- Deployment Validation
- Security & Rate Limiting
- Performance & Scalability
- Monitoring & Maintenance
- Troubleshooting
Overview
The Kiosk Heartbeat System is a real-time monitoring solution designed to track kiosk connectivity and calculate downtime periods. The system uses a sophisticated dual-table approach with Heartbeat and MissedHeartbeat models to provide comprehensive offline tracking and historical analysis.
Key Features
- Real-time monitoring of kiosk connectivity
- Retroactive offline detection when connectivity resumes
- Granular tracking with 30-second intervals
- Visual heatmap with color-coded status indicators
- Comprehensive tooltips with detailed statistics
- Token-based authentication for security
- Asynchronous processing for optimal performance
- Precomputed calculations for fast heatmap rendering
- Serial number support for device identification
- Flexible kiosk data handling - processes heartbeats even when kiosk data isn't found
- Deployment validation - only displays heartbeats from properly deployed kiosks
- Credential matching - validates machine_id/token and serial number matches
System Architecture
┌─────────────┐        ┌──────────────┐        ┌───────────────┐
│    Kiosk    │        │  Heartbeat   │        │MissedHeartbeat│
├─────────────┤        ├──────────────┤        ├───────────────┤
│ id          │◄───────┤ project_id   │        │ project_id    │
│ project_id  │        │ kiosk_id     │◄───────┤ kiosk_id      │
│ name        │        │ machine_id   │        │ expected_at   │
│ hex_node_   │        │ kiosk_name   │        │ detected_at   │
│ machine_id  │        │ serial_number│        │ duration      │
└─────────────┘        │ sent_at      │        └───────────────┘
       │               └──────────────┘               │
       │                      │                       │
       └──────────────────────┼───────────────────────┘
                              │
                     ┌────────▼────────┐
                     │     Heatmap     │
                     │  Visualization  │
                     └─────────────────┘
Data Flow
1. Heartbeat Reception Process
API Endpoint: POST /api/v1/heartbeats
Request Flow:
Kiosk App → API → Controller → Model → Database
Step-by-Step Process:

1. Request Validation
   - Extract `machine_id` or `token_id` from parameters
   - Validate that at least one identifier is present
   - Process the `serial_number` parameter (supports various formats)

2. Kiosk Lookup (Optional)
   - Query the `CheckInLink` table using `machine_id` or `token_id`
   - Extract `project_id` and `kiosk_id` from the check-in link
   - If kiosk data is not found, continue processing with null values

3. Serial Number Processing
   - Hex strings (like `\x12345%`): decode to decimal and store as string
   - Numeric strings: store as string
   - Alphanumeric strings (like `"R9TX80AR2GL"`): store as string directly

4. Heartbeat Creation
   - Create a `Heartbeat` record with:
     - `project_id`, `kiosk_id` (can be null if kiosk data not found)
     - `machine_id`, `serial_number`
     - `kiosk_name` (stored for search functionality)
     - `sent_at = Time.current` (server timestamp)

5. Missed Heartbeat Detection (Asynchronous, only if kiosk data found)
   - Schedule a background job to detect gaps only if both `project_id` and `kiosk_id` are present
   - Find the previous heartbeat for the same kiosk
   - Calculate the gap between heartbeats
   - Create `MissedHeartbeat` records for each 30-second interval in the gap
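The serial-number handling in step 3 can be sketched in plain Ruby. This is an illustrative reconstruction, not the production controller code: it assumes the `\x...%` wrapper marks a hex-encoded value, as the examples in this document suggest, and the helper name is made up.

```ruby
# Illustrative sketch of the serial-number normalization rules described
# above (an assumption, not the actual controller code). A "\x...%"-wrapped
# value is treated as hex and decoded to decimal; everything else is
# stored as a string unchanged.
def normalize_serial(raw)
  s = raw.to_s.strip
  if s.start_with?('\x') && s.end_with?('%')
    hex = s[2..-2]           # strip the leading \x and trailing %
    return hex.to_i(16).to_s # decode hex to decimal, keep as string
  end
  s # numeric and alphanumeric serials are stored as-is
end
```

Under these assumptions, `normalize_serial('\x12345%')` yields the decimal string for hex `12345`, while `"R9TX80AR2GL"` passes through untouched.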
2. Asynchronous Processing
Background Job Processing:

```ruby
# When a heartbeat is received, schedule background processing only if kiosk data exists
def schedule_missed_heartbeat_detection
  # Only schedule missed heartbeat detection if we have both project_id and kiosk_id
  return unless project_id && kiosk_id
  self.delay(queue: 'heartbeat_processing', priority: 10).detect_missed_heartbeats_async
end

# Background job processes missed heartbeats
def detect_missed_heartbeats_async
  # Find previous heartbeat and calculate gaps
  # Create missed heartbeat records in bulk
  # Clear cache to ensure fresh data
end
```
Architecture Comparison: Dual-Table vs Single-Table Approaches
This section compares two possible approaches for offline detection and explains why the dual-table architecture was chosen.
Approach 1: Dual-Table Architecture (Current Implementation)
Tables Used: heartbeats + missed_heartbeats
Data Flow Diagram
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ ┌──────────────┐
│ Kiosk │ │ Heartbeat │ │ DelayedJob │ │MissedHeartbeat│
│ Sends │───▶│ Created │───▶│ Scheduled │───▶│ Records │
│ Heartbeat │ │ │ │ │ │ Created │
└─────────────┘ └──────────────┘ └─────────────────┘ └──────────────┘
│ │ │ │
│ │ │ │
▼ ▼ ▼ ▼
200 OK Fast Response Background Job Gap Analysis
(< 50ms) Processing Complete
(Async)
Step-by-Step Process:

1. Heartbeat Reception (Synchronous - Fast)
   - Kiosk sends heartbeat
   - Controller validates and creates a `Heartbeat` record
   - Returns 200 OK immediately (~5-50ms)
   - Schedules a background job (only if kiosk data exists)

2. Gap Detection (Asynchronous - Background)
   - DelayedJob worker picks up the task
   - Finds the previous heartbeat for the same kiosk
   - Calculates the time gap between heartbeats
   - If the gap exceeds 40 seconds, creates `MissedHeartbeat` records
   - Each record represents 30 seconds of missed time

3. Heatmap Rendering (Fast Queries)
   - Query pre-computed `MissedHeartbeat` records
   - Simple aggregation: `COUNT(*) * 0.5` minutes per hour
   - Display results immediately
Example Data:
-- Heartbeats table
| id | kiosk_id | machine_id | serial_number | sent_at |
|----|----------|------------|---------------|---------------------|
| 1 | 148 | abc123 | R9TX80AR2GL | 2025-01-22 03:18:36 |
| 2 | 148 | abc123 | R9TX80AR2GL | 2025-01-22 03:25:54 | -- 7-minute gap
| 3 | 148 | abc123 | R9TX80AR2GL | 2025-01-22 03:31:46 | -- 6-minute gap
-- MissedHeartbeats table (auto-generated)
| id | kiosk_id | expected_at | duration |
|----|----------|---------------------|----------|
| 1 | 148 | 2025-01-22 03:19:06 | 30 |
| 2 | 148 | 2025-01-22 03:19:36 | 30 |
| 3 | 148 | 2025-01-22 03:20:06 | 30 |
| .. | 148 | ... | 30 |
| 26 | 148 | 2025-01-22 03:31:16 | 30 |
Approach 2: Single-Table Architecture (Alternative)
Tables Used: heartbeats only
Data Flow Diagram
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐
│ Kiosk │ │ Heartbeat │ │ Heatmap │
│ Sends │───▶│ Created │───▶│ Calculation │
│ Heartbeat │ │ │ │ (Real-time) │
└─────────────┘ └──────────────┘ └─────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
200 OK Fast Response Gap Analysis
(< 50ms) On-Demand
(Page Load)
Step-by-Step Process:

1. Heartbeat Reception (Synchronous - Fast)
   - Kiosk sends heartbeat
   - Controller validates and creates a `Heartbeat` record
   - Returns 200 OK immediately (~5-50ms)

2. Gap Detection (On-Demand - Page Load)
   - Load all heartbeats for the kiosk for the day
   - Use `each_cons(2)` to analyze consecutive pairs
   - Calculate gaps > 40 seconds in real-time
   - Distribute gap time across hourly buckets

3. Heatmap Rendering (Computational)
   - Query all heartbeats for visible kiosks
   - Perform gap calculations in memory
   - Display results after computation
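The steps above reduce to a short piece of plain Ruby. This sketch walks consecutive heartbeat timestamps with `each_cons(2)` and counts 30-second missed intervals for every gap over the 40-second threshold; the timestamps reuse the example data from the previous section, and the function name is illustrative:

```ruby
# Plain-Ruby sketch of the single-table, on-demand gap analysis:
# count the 30-second missed intervals between consecutive heartbeats.
THRESHOLD = 40.0 # seconds: 30s expected interval + 10s tolerance

def missed_intervals(sent_ats)
  sent_ats.sort.each_cons(2).sum do |a, b|
    gap = b - a
    # Every started 30s slot past the expected interval counts as missed
    gap > THRESHOLD ? ((gap - 30) / 30.0).ceil : 0
  end
end

times = [
  Time.utc(2025, 1, 22, 3, 18, 36),
  Time.utc(2025, 1, 22, 3, 25, 54), # ~7-minute gap -> 14 intervals
  Time.utc(2025, 1, 22, 3, 31, 46), # ~6-minute gap -> 11 intervals
]
missed = missed_intervals(times)
offline_minutes = missed * 0.5 # same COUNT(*) * 0.5 aggregation the dual-table query uses
```

Note that this recomputes everything on every page load, which is exactly the cost the dual-table approach avoids by materializing the intervals as rows.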
Detailed Comparison
| Aspect | Dual-Table (Current) | Single-Table (Alternative) |
|---|---|---|
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Real-time Accuracy | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Scalability | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Complexity | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Historical Analysis | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Development Speed | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Performance Analysis
Dual-Table Approach
Heartbeat Reception: 5-50ms (Fast - single INSERT)
Background Processing: 100-500ms (Async - doesn't block)
Heatmap Query: 10-50ms (Fast - pre-computed data)
Page Load Total: 15-100ms (Excellent user experience)
Scalability: ✅ O(1) for page loads, O(n) for background processing
Memory Usage: ✅ Low - only processes one heartbeat at a time
Database Load: ✅ Distributed - writes are async, reads are fast
Single-Table Approach
Heartbeat Reception: 5-50ms (Fast - single INSERT)
Gap Calculation: 50-2000ms (Depends on heartbeat count)
Page Load Total: 55-2050ms (Slower with more data)
Scalability: ❌ O(n²) for gap calculations with many heartbeats
Memory Usage: ❌ High - loads all heartbeats into memory
Database Load: ❌ Heavy reads during page loads
Why Dual-Table Architecture Was Chosen
1. Production Scalability
- Page loads remain fast regardless of historical data size
- Background processing doesn't impact user experience
- Horizontal scaling possible with job queue distribution
2. Rich Analytics Capabilities
```sql
-- Easy queries with dual-table approach
SELECT COUNT(*) as offline_periods
FROM missed_heartbeats
WHERE kiosk_id = 148 AND DATE(expected_at) = '2025-01-22';

-- Complex analysis possible
SELECT HOUR(expected_at) as hour, COUNT(*) * 0.5 as offline_minutes
FROM missed_heartbeats
WHERE kiosk_id = 148 AND DATE(expected_at) = '2025-01-22'
GROUP BY HOUR(expected_at);
```
3. Enterprise Features
- Alerting systems can query `missed_heartbeats` directly
- SLA reporting with pre-computed downtime data
- Historical trend analysis over months/years
- Audit trails of when gaps were detected
Trade-offs Summary
Dual-Table Advantages ✅
- 🚀 Faster page loads at scale
- 📊 Rich historical analytics
- 🔔 Alerting system ready
- 📈 SLA tracking capabilities
- 🏗️ Enterprise-grade architecture
- 🔄 Background job processing
Dual-Table Disadvantages ❌
- 🔧 More complex setup (DelayedJob workers)
- ⏰ Slight delay in gap visibility
- 🗄️ Additional storage for missed heartbeats
- 🐛 More components to debug
Single-Table Advantages ✅
- ⚡ Real-time gap detection
- 🎯 Simpler architecture
- 🛠️ Easier development setup
- 📦 Single source of truth
Single-Table Disadvantages ❌
- 🐌 Slower page loads with historical data
- 📉 Limited analytics capabilities
- 🚫 No alerting infrastructure
- 💾 High memory usage for calculations
Conclusion
The dual-table architecture was chosen because it provides:
- Production-ready performance that scales with business growth
- Enterprise capabilities for monitoring and alerting
- Maintainable complexity with clear separation of concerns
- Future-proof design that supports advanced features
While the single-table approach is simpler for development, the dual-table approach ensures the system remains performant and feature-rich as it scales from monitoring 10 kiosks to 10,000 kiosks.
API Documentation
POST /api/v1/heartbeats
Purpose: Receive heartbeat data from kiosk devices
Request Parameters:
- machine_id (optional): Unique identifier for the kiosk device
- machineId (alternative): Alternative parameter name
- imei (alternative): Alternative parameter name
- token_id (optional): Alternative identifier for the kiosk device
- tokenId (alternative): Alternative parameter name
- token (alternative): Alternative parameter name
- serial_number (required): Device serial number
- serialNumber (alternative): Alternative parameter name
- serial (alternative): Alternative parameter name
- sent_at (optional): Timestamp when heartbeat was sent (defaults to current time)
Serial Number Formats Supported:
- Hex strings: \x12345% → Decoded to decimal and stored as string
- Numeric strings: "12345" → Stored as string
- Alphanumeric strings: "R9TX80AR2GL" → Stored as string directly
Request Example:
```bash
curl -X POST "https://your-domain.com/api/v1/heartbeats" \
  -H "Content-Type: application/json" \
  -d '{
    "machine_id": "3f9d974b-3b78-481f-90b2-ad7ffc0e2720",
    "token": "3f9d974b-3b78-481f-90b2-ad7ffc0e2720",
    "serial": "R9TX80AR2GL"
  }'
```
Response Examples:
Success (200 OK) - Kiosk Data Found:
```json
{
  "status": "success",
  "machine_id": "3f9d974b-3b78-481f-90b2-ad7ffc0e2720",
  "serial_number": "R9TX80AR2GL",
  "received_serial_raw": "R9TX80AR2GL",
  "kiosk_name": "New York",
  "project_id": 21,
  "kiosk_id": "148"
}
```
Success (200 OK) - Kiosk Data Not Found:
```json
{
  "status": "success",
  "machine_id": "unknown-device-id",
  "serial_number": "R9TX80AR2GL",
  "received_serial_raw": "R9TX80AR2GL",
  "message": "Heartbeat processed but kiosk data not found"
}
```
Error Responses:

```json
// Missing identifiers (400 Bad Request)
{ "error": "machine_id or token_id is required" }

// Missing serial number (400 Bad Request)
{ "error": "serial_number is required" }

// Internal server error (500 Internal Server Error)
{ "error": "Internal server error" }
```
Database Schema
Heartbeat Table
```sql
CREATE TABLE heartbeats (
  id INTEGER PRIMARY KEY,
  project_id INTEGER,            -- Can be NULL if kiosk data not found
  kiosk_id STRING,               -- Can be NULL if kiosk data not found
  machine_id STRING NOT NULL,
  kiosk_name STRING,
  serial_number STRING NOT NULL, -- Required field for device serial numbers
  sent_at DATETIME NOT NULL,
  created_at DATETIME,
  updated_at DATETIME
);

-- Indexes for optimal performance
CREATE INDEX index_heartbeats_on_project_id_and_kiosk_id ON heartbeats(project_id, kiosk_id);
CREATE INDEX index_heartbeats_on_sent_at ON heartbeats(sent_at);
CREATE INDEX index_heartbeats_on_kiosk_name ON heartbeats(kiosk_name);
CREATE INDEX index_heartbeats_on_serial_number ON heartbeats(serial_number);
```
MissedHeartbeat Table
```sql
CREATE TABLE missed_heartbeats (
  id INTEGER PRIMARY KEY,
  project_id INTEGER NOT NULL,
  kiosk_id STRING NOT NULL,
  expected_at DATETIME NOT NULL,
  detected_at DATETIME NOT NULL,
  duration INTEGER NOT NULL,
  created_at DATETIME,
  updated_at DATETIME
);

-- Indexes for optimal performance
CREATE INDEX index_missed_heartbeats_on_project_id_and_kiosk_id ON missed_heartbeats(project_id, kiosk_id);
CREATE INDEX index_missed_heartbeats_on_expected_at ON missed_heartbeats(expected_at);
CREATE INDEX index_missed_heartbeats_on_detected_at ON missed_heartbeats(detected_at);
```
Models and Methods
Heartbeat Model
Key Methods:
```ruby
# Check if heartbeat is from a properly deployed kiosk
def confirmed?
  return false unless kiosk_id && project_id && serial_number.present?

  # Get the kiosk's expected serial number
  kiosk = Kiosk.find_by(id: kiosk_id)
  return false unless kiosk

  # Check if serial numbers match using the 'serial' column
  kiosk_serial = kiosk.serial
  return false unless kiosk_serial.present?

  kiosk_serial == serial_number
end

# Current status (online/offline)
def status
  sent_at >= 90.seconds.ago ? 'online' : 'offline'
end

# Format offline duration
def offline_duration
  return nil if status == 'online'

  duration = Time.current - sent_at
  hours = (duration / 1.hour).floor
  minutes = ((duration % 1.hour) / 1.minute).floor

  if hours > 24
    days = (hours / 24).floor
    days > 1 ? "#{days} days" : "1 day"
  elsif hours > 0
    "#{hours} hour#{hours > 1 ? 's' : ''} #{minutes} minute#{minutes > 1 ? 's' : ''}"
  else
    "#{minutes} minute#{minutes > 1 ? 's' : ''}"
  end
end

# Null-safe project name
def project_name
  return "Unknown Project" unless project_id
  Project.find_by(id: project_id)&.name || "Unknown Project"
end

# Null-safe kiosk name
def kiosk_name
  return "Unknown Kiosk" unless kiosk_id
  Kiosk.find_by(id: kiosk_id)&.name || "Unknown Kiosk"
end

# Null-safe project reference
def project
  return nil unless project_id
  Project.find_by(id: project_id)
end

# Null-safe kiosk reference
def kiosk
  return nil unless kiosk_id
  Kiosk.find_by(id: kiosk_id)
end

# Calculate hourly offline minutes for heatmap (precomputed in controller)
# Note: Caching has been removed for simplicity and reliability
def self.hourly_offline_minutes(kiosk_id, date)
  return Array.new(24, 0) unless kiosk_id
  # Calculate offline minutes for each hour directly from missed heartbeats
  # Returns array of 24 integers representing offline minutes per hour
  # This is now precomputed in the admin controller to prevent N+1 queries
end

# Calculate total offline time (computed directly)
def self.total_offline_time(kiosk_id, date = Date.current)
  return 0 unless kiosk_id
  # Calculate total offline seconds for the day directly from database
  # Includes initial offline period and missed heartbeat periods
  # Precomputed in admin controller for performance
end

# Group consecutive missed heartbeats into periods (computed directly)
def self.offline_periods(kiosk_id, date = Date.current)
  return [] unless kiosk_id
  # Group consecutive missed heartbeats into offline periods directly from database
  # Precomputed in admin controller for performance
  # Returns array of hashes with started_at, ended_at, and duration
end
```
Heatmap Visualization
Color Coding System
| Color | Status | Offline Percentage | Description |
|---|---|---|---|
| `bg-green-500` | Fully Online | 0% | No downtime in this hour |
| `bg-green-300` | Mostly Online | <25% | Minimal downtime |
| `bg-yellow-300` | Partially Offline | 25-50% | Moderate downtime |
| `bg-red-300` | Mostly Offline | 50-90% | Significant downtime |
| `bg-red-500` | Fully Offline | ≥90% or no heartbeats | Complete downtime |
| `bg-gray-200` | No Data | N/A | Future hours or no historical data |
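The bucket selection implied by this table can be expressed as a small helper. This is a sketch, not the actual view code: the helper name is made up, and the exact handling of the 50% boundary (treated here as "Mostly Offline") is an assumption.

```ruby
# Map an hour's offline percentage to the Tailwind class from the
# table above. Name and boundary handling are illustrative assumptions.
def heatmap_color(offline_pct, has_data: true)
  return 'bg-gray-200' unless has_data # future hours / no history
  if    offline_pct >= 90 then 'bg-red-500'    # fully offline
  elsif offline_pct >= 50 then 'bg-red-300'    # mostly offline
  elsif offline_pct >= 25 then 'bg-yellow-300' # partially offline
  elsif offline_pct > 0   then 'bg-green-300'  # mostly online
  else                         'bg-green-500'  # fully online
  end
end
```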
Tooltip Information
When hovering over a heatmap cell, the tooltip displays:
For Past Hours:
- Project name
- Kiosk name
- Current status (Online/Offline)
- Offline duration for this hour
- Online duration for this hour
- Offline percentage
- Currently offline duration (if applicable)
- Total offline time today
- Number of offline periods in this hour

For Future Hours:
- Project name
- Kiosk name
- Status: "No Data (Future Hour)"
Status Calculation
Online/Offline Determination
A kiosk is considered:
- Online: the last heartbeat was received within the last 90 seconds
- Offline: the last heartbeat was received more than 90 seconds ago

```ruby
def status
  sent_at >= 90.seconds.ago ? 'online' : 'offline'
end
```
Offline Duration Formatting
The system formats offline duration in a human-readable format:
```ruby
def offline_duration
  return nil if status == 'online'

  duration = Time.current - sent_at
  hours = (duration / 1.hour).floor
  minutes = ((duration % 1.hour) / 1.minute).floor

  if hours > 24
    days = (hours / 24).floor
    days > 1 ? "#{days} days" : "1 day"
  elsif hours > 0
    "#{hours} hour#{hours > 1 ? 's' : ''} #{minutes} minute#{minutes > 1 ? 's' : ''}"
  else
    "#{minutes} minute#{minutes > 1 ? 's' : ''}"
  end
end
```
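The same formatting logic can be exercised outside Rails by dropping the ActiveSupport helpers and taking the offline duration in seconds. A sketch mirroring the method above (the function name is illustrative):

```ruby
# ActiveSupport-free version of the duration formatting above:
# takes the number of seconds offline and formats it the same way.
def format_offline_duration(seconds)
  hours = seconds / 3600
  minutes = (seconds % 3600) / 60

  if hours > 24
    days = hours / 24
    days > 1 ? "#{days} days" : "1 day"
  elsif hours > 0
    "#{hours} hour#{hours > 1 ? 's' : ''} #{minutes} minute#{minutes > 1 ? 's' : ''}"
  else
    "#{minutes} minute#{minutes > 1 ? 's' : ''}"
  end
end
```

For example, 3900 seconds formats as "1 hour 5 minutes", and 50 hours collapses to "2 days".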
Offline Detection Logic
Gap Detection Algorithm
- Find Previous Heartbeat: Locate the most recent heartbeat for the same kiosk
- Calculate Gap: Determine the time difference between expected and actual heartbeat
- Apply Tolerance Window: Only create missed heartbeats if the gap exceeds 40 seconds (30s expected + 10s tolerance)
- Create Missed Records: Generate `MissedHeartbeat` records for each 30-second interval in the gap
Tolerance Window
The system implements a 10-second tolerance window to account for:
- Minor network delays
- Processing variations
- Clock synchronization differences
- Temporary connectivity hiccups

Tolerance Rules:
- Expected heartbeat interval: 30 seconds
- Tolerance window: +10 seconds
- Effective threshold: 40 seconds
- Result: only gaps longer than 40 seconds create missed heartbeat records
This prevents false positives from minor delays while still accurately tracking genuine connectivity issues.
```ruby
def detect_missed_heartbeats_async
  previous_heartbeat = Heartbeat.where(
    project_id: project_id,
    kiosk_id: kiosk_id
  ).where('sent_at < ?', sent_at).order(:sent_at).last

  if previous_heartbeat
    gap_start = previous_heartbeat.sent_at + 30.seconds
    gap_end = sent_at

    # Add tolerance window: only create missed heartbeats if gap > 40 seconds
    # This accounts for minor network delays and processing variations
    tolerance_window = 10.seconds
    effective_gap_start = gap_start + tolerance_window

    # Only create missed heartbeats if gap is more than 40 seconds (30s + 10s tolerance)
    if gap_end > effective_gap_start
      # Create missed heartbeat records in bulk for better performance
      missed_records = []
      current_time = gap_start # Start from original 30s mark

      while current_time < gap_end
        missed_records << {
          project_id: project_id,
          kiosk_id: kiosk_id,
          expected_at: current_time,
          detected_at: sent_at,
          duration: 30,
          created_at: Time.current,
          updated_at: Time.current
        }
        current_time += 30.seconds
      end

      # Bulk insert for optimal performance
      if missed_records.any?
        MissedHeartbeat.insert_all(missed_records)
        # Note: Cache clearing no longer needed - system now uses direct database queries
        # for better reliability and simpler maintenance
      end
    end
  end
end
```
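Stripped of ActiveRecord and Delayed Job, the interval arithmetic above reduces to a few lines of plain Ruby. This sketch (constant and function names are illustrative) returns the 30-second `expected_at` marks that were missed between two heartbeats, using the 7-minute gap from the example data earlier in this document:

```ruby
# Plain-Ruby sketch of the interval math in detect_missed_heartbeats_async.
INTERVAL  = 30 # expected seconds between heartbeats
TOLERANCE = 10 # grace period before a gap counts as missed

# Return the 30-second expected_at marks missed between two heartbeats.
def missed_expected_times(previous_sent_at, current_sent_at)
  gap_start = previous_sent_at + INTERVAL
  # Gap must exceed the 40-second effective threshold (30s + 10s tolerance)
  return [] unless current_sent_at > gap_start + TOLERANCE

  marks = []
  t = gap_start # start from the original 30s mark, as the job does
  while t < current_sent_at
    marks << t
    t += INTERVAL
  end
  marks
end

prev = Time.utc(2025, 1, 22, 3, 18, 36)
curr = Time.utc(2025, 1, 22, 3, 25, 54) # the 7-minute gap from the example data
missed = missed_expected_times(prev, curr)
# 14 marks of 30 seconds each => 7 minutes of recorded downtime
```

A gap of 35 seconds, by contrast, falls inside the tolerance window and produces no marks at all.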
Deployment Validation
The heartbeat system now includes deployment validation to ensure only properly deployed kiosks with matching credentials are displayed and affect system status. This prevents heartbeats from unknown or mismatched devices from appearing in the admin interface.
Overview
Problem Solved:
- Heartbeats from unknown devices were being processed and displayed
- Kiosks without proper deployment data appeared in the system
- Serial number mismatches could affect real kiosk status

Solution Implemented:
- Confirmation system that validates device credentials
- Separate tracking of confirmed vs unconfirmed heartbeats
- Admin interface filtering to show only confirmed heartbeats
- No validation errors - graceful handling of mismatches
Validation Logic
A heartbeat is considered confirmed when:
- Machine ID/Token Match: the `machine_id` from the request matches a `CheckInLink.token`
- Serial Number Match: the `serial_number` from the request matches the kiosk's `serial` field
- Kiosk Data Exists: valid `kiosk_id` and `project_id` are found
```ruby
def confirmed?
  return false unless kiosk_id && project_id && serial_number.present?

  # Get the kiosk's expected serial number
  kiosk = Kiosk.find_by(id: kiosk_id)
  return false unless kiosk

  # Check if serial numbers match using the 'serial' column
  kiosk_serial = kiosk.serial
  return false unless kiosk_serial.present?

  kiosk_serial == serial_number
end
```
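The predicate's pure logic can be exercised without Rails using stand-in structs. This is illustrative only; the stub names and the standalone function are not part of the application:

```ruby
# Stand-ins for the ActiveRecord models, just to exercise the logic.
HeartbeatStub = Struct.new(:kiosk_id, :project_id, :serial_number)
KioskStub     = Struct.new(:serial)

# Same three checks as the model method above: kiosk data present,
# kiosk found with a serial on file, and the serials match exactly.
def confirmed_heartbeat?(heartbeat, kiosk)
  return false unless heartbeat.kiosk_id && heartbeat.project_id
  return false if heartbeat.serial_number.to_s.empty?
  return false unless kiosk && !kiosk.serial.to_s.empty?

  kiosk.serial == heartbeat.serial_number
end
```

Note the comparison is an exact string match, which is why the normalization step must store serials in a consistent format.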
Scopes and Filtering

The model provides scopes for filtering heartbeats. Because confirmation compares each record against its kiosk's `serial` column, these filter in Ruby after loading records rather than in SQL:

```ruby
# Scope for confirmed heartbeats only (filters in memory)
scope :confirmed, -> {
  all.select(&:confirmed?)
}

# Scope for unconfirmed heartbeats only (filters in memory)
scope :unconfirmed, -> {
  all.reject(&:confirmed?)
}

# Latest confirmed heartbeat per kiosk (production behavior)
scope :latest_confirmed_per_kiosk, -> {
  confirmed.group_by(&:kiosk_id).map do |kiosk_id, heartbeats|
    heartbeats.max_by(&:sent_at)
  end.sort_by(&:sent_at).reverse
}
```
Admin Interface Behavior
What Gets Displayed:
- ✅ Only confirmed heartbeats appear in the admin interface
- ✅ Latest confirmed heartbeat per kiosk (production behavior)
- ✅ Confirmation status shown in the interface
- ✅ Unconfirmed heartbeats are filtered out completely
- ✅ Test kiosks with serial mismatches are automatically hidden from view

What Gets Counted:
- ✅ Only confirmed heartbeats affect online/offline status
- ✅ Only confirmed heartbeats are used for heatmap calculations
- ✅ Only confirmed heartbeats appear in statistics
API Response Enhancement
The API now includes confirmation status in responses:
```json
{
  "status": "success",
  "machine_id": "3f9d974b-3b78-481f-90b2-ad7ffc0e2720",
  "serial_number": "R9TX80AR2GL",
  "received_serial_raw": "R9TX80AR2GL",
  "confirmed": true,
  "kiosk_name": "New York",
  "project_id": 21,
  "kiosk_id": "148"
}
```
Testing Scenarios
Valid Deployment (Confirmed):
```json
{
  "machine_id": "3f9d974b-3b78-481f-90b2-ad7ffc0e2720",
  "serial_number": "R9TX80AR2GL"
}
```
- ✅ Result: confirmed: true
- ✅ Displayed: In admin interface
- ✅ Counted: In status calculations
Mismatched Serial (Unconfirmed):
```json
{
  "machine_id": "3f9d974b-3b78-481f-90b2-ad7ffc0e2720",
  "serial_number": "WRONG-SERIAL"
}
```
- ❌ Result: confirmed: false
- ❌ Displayed: Not in admin interface
- ❌ Counted: Not in status calculations
Unknown Device (Unconfirmed):
```json
{
  "machine_id": "unknown-device-id",
  "serial_number": "R9TX80AR2GL"
}
```
- ❌ Result: confirmed: false
- ❌ Displayed: Not in admin interface
- ❌ Counted: Not in status calculations
Benefits
- Security: Only properly deployed kiosks affect system status
- Data Integrity: Prevents false data from unknown devices
- Clean Interface: Admin interface shows only relevant data
- Graceful Handling: No errors for mismatched credentials
- Production Ready: Latest confirmed heartbeat per kiosk behavior
- Scalable: Centralized filtering logic through model scopes
Implementation Details
Files Modified:
- app/models/heartbeat.rb - Added confirmation logic and scopes
- app/admin/heartbeats.rb - Updated to use confirmed heartbeats
- app/views/admin/heartbeats/_content.html.erb - Added confirmation status column
- app/controllers/api/v1/heartbeats_controller.rb - Added confirmation status to API response
Database Requirements:
- Kiosk table must have serial column (production has this)
- CheckInLink table must have token column (already exists)
- Heartbeat table must have serial_number column (already exists)
Security & Rate Limiting
Rate Limiting (Removed)
Note: Application-level rate limiting has been removed for simplicity.
Recommended alternatives:
- Nginx/Load Balancer Rate Limiting - configure at the infrastructure level
- Cloudflare Rate Limiting - if using CDN services
- Monitor logs - watch for unusual API usage patterns

The heartbeat API is now protected by:
- Token-based authentication (CheckInLink validation)
- CORS restrictions to allowed domains
- Input validation and error handling
Validation Rules
```ruby
# Model validations (updated for required fields)
validates :machine_id, presence: true
validates :serial_number, presence: true
validates :sent_at, presence: true
# Note: project_id and kiosk_id are optional to support unknown devices

# Prevent future timestamps
validate :sent_at_not_in_future

def sent_at_not_in_future
  if sent_at.present? && sent_at > Time.current + 1.minute
    errors.add(:sent_at, "cannot be in the future")
  end
end
```
Performance & Scalability
Current System Capabilities
- Real-time monitoring of kiosk connectivity
- Retroactive offline detection when connectivity resumes
- Granular tracking with 30-second intervals
- Visual heatmap with color-coded status indicators
- Comprehensive tooltips with detailed statistics
- Token-based authentication for security
- Asynchronous processing for missed heartbeat detection
- Precomputed heatmap calculations for performance
- Bulk database operations for efficiency
- Background job monitoring and management
- Serial number support for device identification
- Flexible kiosk data handling - processes heartbeats even when kiosk data isn't found
Performance Optimizations
- ✅ Database Indexes: Proper indexes on frequently queried columns
- ✅ Query Optimization: Heatmap data precomputed in the controller (the earlier 5-minute cache has been removed)
- ✅ Batch Processing: Bulk inserts for missed heartbeats
- ✅ Asynchronous Processing: Delayed Job for missed heartbeat detection
- ✅ Connection Pool Management: Optimized database connections
- ✅ Monitoring Tools: Rake tasks for performance tracking
- ✅ Serial Number Processing: Efficient handling of various serial number formats
Scalability Analysis
Performance Metrics:
- 100 kiosks × 30-second intervals = 200 heartbeats/minute ≈ 3.3 heartbeats/second
- 1000 kiosks × 30-second intervals = 2000 heartbeats/minute ≈ 33 heartbeats/second
- The current system can comfortably handle 1000+ heartbeats/second

Why the Current Approach Scales Well:
- Heartbeat creation: ~5ms per request (synchronous but fast)
- Missed heartbeat detection: asynchronous (doesn't block the API)
- Database load: minimal (one INSERT per heartbeat)
- Memory usage: low (no complex calculations in the request path)
- Serial number processing: efficient string handling
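The throughput figures above follow from simple arithmetic; a quick sanity check in plain Ruby (helper names are illustrative):

```ruby
# Sanity-check the throughput arithmetic: each kiosk sends one
# heartbeat every interval_seconds (30 by default).
def heartbeats_per_minute(kiosks, interval_seconds = 30)
  kiosks * (60.0 / interval_seconds)
end

def heartbeats_per_second(kiosks, interval_seconds = 30)
  kiosks / interval_seconds.to_f
end
```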
Monitoring & Maintenance
Heartbeat Processing Rake Tasks
The heartbeat_processing.rake file provides essential monitoring and maintenance tools for the heartbeat system. These rake tasks help system administrators monitor health, troubleshoot issues, and maintain optimal performance.
Available Commands
```shell
# Check heartbeat processing job status
rails heartbeat_processing:status

# Clear failed heartbeat processing jobs
rails heartbeat_processing:clear_failed

# Clear heartbeat cache
rails heartbeat_processing:clear_cache

# Show performance statistics
rails heartbeat_processing:stats
```
Detailed Task Descriptions
1. rails heartbeat_processing:status
- Purpose: Monitor background job health and system status
- Shows: Total, running, failed, and pending jobs
- Lists: Recent heartbeat activity by kiosk
- Use Case: Daily monitoring to ensure background processing is working correctly
Example Output:

```
=== Heartbeat Processing Jobs Status ===
Total heartbeat jobs: 0
Running jobs: 0
Failed jobs: 0
Pending jobs: 0

=== Recent Heartbeat Activity ===
Kiosk 87 (1st Floor Hallway) - 03:35:24
Kiosk 87 (1st Floor Hallway) - 03:36:24
```
2. rails heartbeat_processing:clear_failed
- Purpose: Clean up failed background jobs
- Action: Removes stuck/failed Delayed Job records
- Benefit: Frees up the job queue and prevents blocking
- Use Case: When jobs are failing and blocking the processing queue
3. rails heartbeat_processing:clear_cache
- Purpose: Clear cached heatmap data
- Action: Removes cached calculations for all kiosks
- Benefit: Forces fresh data calculation and resolves stale data issues
- Use Case: When heatmap shows outdated information
4. rails heartbeat_processing:stats
- Purpose: Performance monitoring and analytics
- Shows: Today's heartbeat statistics, hourly distribution, missed heartbeat analysis
- Includes: Top offline kiosks ranking and total offline time
- Use Case: Performance analysis and identifying problematic kiosks
Example Output:

```
=== Heartbeat Processing Performance Stats ===
Today's heartbeats: 1,440
Active kiosks today: 1

=== Hourly Distribution ===
00:00 - 60 heartbeats
01:00 - 60 heartbeats
...

=== Missed Heartbeats Today ===
Total missed heartbeats: 0
Total offline time: 0m
```
Admin Interface
Monitor background jobs at: /admin/delayed_jobs
New Features:
- Serial Number Column: view device serial numbers in the heartbeat list
- Enhanced Search: search by serial number, kiosk name, ID, or machine ID
- Flexible Data Display: shows "N/A" for missing serial numbers
When to Use Each Command
Daily Monitoring:

```shell
rails heartbeat_processing:status        # Check if system is healthy
```

Troubleshooting:

```shell
rails heartbeat_processing:clear_failed  # If jobs are stuck
rails heartbeat_processing:clear_cache   # If heatmap shows old data
```

Performance Analysis:

```shell
rails heartbeat_processing:stats         # Get detailed analytics
```
Production Usage
These commands are essential for production monitoring:
- System administrators use them to monitor health
- DevOps teams use them for troubleshooting
- Support teams use them to investigate issues
- Operations teams use them for performance analysis
Debugging Commands
```ruby
# Check recent heartbeats with serial numbers
Heartbeat.order(:sent_at).last(10).each do |h|
  puts "#{h.kiosk_name} - Serial: #{h.serial_number || 'N/A'} - #{h.sent_at}"
end

# Check missed heartbeats for a kiosk
MissedHeartbeat.where(kiosk_id: "69").order(:expected_at)

# Calculate total offline time
Heartbeat.total_offline_time("69", Date.current)

# Check offline periods
Heartbeat.offline_periods("69", Date.current)

# Search heartbeats by serial number
Heartbeat.where("serial_number LIKE ?", "%R9TX%")
```
Troubleshooting
Common Issues
1. Rate Limit Exceeded
Problem: Receiving "Rate limit exceeded" errors
Solution: Application-level rate limiting has been removed, so this error now comes from infrastructure-level limits; check Nginx/load balancer or CDN rules for repeated requests from the same IP
2. Kiosk Not Found
Problem: "Kiosk not found for machineid" error
Solution: Verify the `machineidmatches ahexnodemachine_id` in the Kiosk table
3. Future Timestamp Error
Problem: "sent_at cannot be in the future" error Solution: Ensure the kiosk is sending current timestamps, not future ones
4. Heatmap Not Updating
Problem: Heatmap shows old data
Solution: Check that new heartbeats are being received and processed correctly
5. Background Job Issues
Problem: Missed heartbeats not being processed
Solution: Check the Delayed Job queue and clear failed jobs
6. Serial Number Issues
Problem: Serial number showing as 0 or not displaying correctly
Solution:
- Verify the database field is string type (not integer)
- Check that the serial number is being sent in the correct format
- Ensure the controller is processing the serial number correctly
7. Unknown Device Heartbeats
Problem: Heartbeats from unknown devices not being processed
Solution: The system now supports processing heartbeats even when kiosk data isn't found. Check that:
- At least one identifier (machine_id or token_id) is provided
- The heartbeat is being created successfully (check logs)
- The response indicates "Heartbeat processed but kiosk data not found"
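The fallback behavior described above can be sketched in plain Ruby. This is an illustrative decision function, not the actual controller code; the parameter names (`machine_id`, `token_id`) mirror the identifiers this document mentions, and the response shape is an assumption.

```ruby
# Hypothetical sketch of the unknown-device fallback: a heartbeat is
# still accepted when kiosk data is missing, as long as at least one
# identifier (machine_id or token_id) is provided.
def heartbeat_response(machine_id:, token_id:, kiosk:)
  if machine_id.nil? && token_id.nil?
    return { status: :error, message: "No identifier provided" }
  end

  if kiosk
    { status: :ok, message: "Heartbeat processed" }
  else
    # The heartbeat record is still created; only the kiosk lookup failed.
    { status: :ok, message: "Heartbeat processed but kiosk data not found" }
  end
end
```

A rejected request is one with neither identifier; everything else produces a heartbeat, with the "kiosk data not found" message signalling an unknown device.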
8. Scalability Concerns
Problem: Worried about system load with hundreds of kiosks
Solution: The current implementation is optimized for scale
9. Deployment Validation Issues
Problem: Heartbeats not appearing in the admin interface
Solution: Check whether heartbeats are confirmed by verifying:
- Machine ID/token matches CheckInLink.token
- Serial number matches the kiosk.serial field
- Kiosk data exists (kiosk_id and project_id are present)
Problem: All heartbeats showing as unconfirmed
Solution:
- Verify kiosk table has serial column populated
- Check that serial numbers are being sent correctly in API requests
- Ensure CheckInLink records exist and are active
Problem: Confirmed heartbeats not displaying
Solution:
- Check if using latest_confirmed_per_kiosk scope in admin controller
- Verify that confirmed heartbeats exist in database
- Test confirmation logic in Rails console: Heartbeat.last.confirmed?
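The three confirmation rules above (token match, serial match, kiosk data present) can be sketched as a pure Ruby predicate. This is a simplified illustration using a `Struct` stand-in for the model; the real `Heartbeat#confirmed?` presumably performs database lookups against CheckInLink and Kiosk rather than taking the expected values as arguments.

```ruby
# Illustrative confirmation check, assuming the fields named in this doc.
Heartbeat = Struct.new(:machine_id, :serial_number, :kiosk_id, :project_id,
                       keyword_init: true) do
  # A heartbeat is "confirmed" only when all three deployment-validation
  # rules hold: token match, serial match, and kiosk data present.
  def confirmed?(check_in_token:, kiosk_serial:)
    machine_id == check_in_token &&
      serial_number == kiosk_serial &&
      !kiosk_id.nil? && !project_id.nil?
  end
end
```

For example, a heartbeat whose token and serial both match and whose kiosk_id/project_id are set is confirmed; changing any one of the three makes it unconfirmed.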
New Troubleshooting Commands
# Check serial number processing
Heartbeat.where.not(serial_number: nil).last(5).each do |h|
puts "ID: #{h.id}, Serial: #{h.serial_number}, Kiosk: #{h.kiosk_name}"
end
# Check heartbeats without kiosk data
Heartbeat.where(kiosk_id: nil).count
# Search by serial number
Heartbeat.where("serial_number LIKE ?", "%R9TX%").count
# Check for hex-encoded serial numbers
Heartbeat.where("serial_number LIKE ?", "%\\x%").count
# Check deployment validation status
Heartbeat.order(:sent_at).last(10).each do |h|
puts "ID: #{h.id}, Kiosk: #{h.kiosk_id}, Serial: #{h.serial_number}, Confirmed: #{h.confirmed?}"
end
# Check confirmed vs unconfirmed heartbeats
puts "Confirmed: #{Heartbeat.confirmed.count}"
puts "Unconfirmed: #{Heartbeat.unconfirmed.count}"
# Check latest confirmed heartbeats per kiosk
Heartbeat.latest_confirmed_per_kiosk.each do |h|
puts "Kiosk #{h.kiosk_id}: #{h.kiosk_name} - #{h.sent_at} - Confirmed: #{h.confirmed?}"
end
Optimized Query System (Version 3.1)
Overview
The heartbeat system uses optimized direct SQL queries on the core heartbeats and missed_heartbeats tables to provide fast dashboard performance for both real-time and historical data views.
Performance Architecture
Approach:
- Real-time queries: direct aggregation on the heartbeats and missed_heartbeats tables
- Bulk operations: single optimized queries replace multiple individual queries
- Database-specific optimizations: PostgreSQL- and SQLite-specific query patterns
- Intelligent caching: data reuse throughout the request lifecycle
Performance characteristics:
- Today's dashboard: fast real-time queries with an hour-by-hour breakdown
- Historical views: optimized CTE queries with minimal memory usage
- Scalability: linear performance scaling with kiosk count
- Memory efficiency: structured data loading reduces object allocation
Data Flow
Real-Time (Today's Data)
Heartbeat → Missed heartbeat detection → Optimized SQL aggregation → Dashboard display
Historical (Date Ranges)
Heartbeat data → Bulk CTE queries → Aggregated results → Dashboard display
Query Optimization Strategies
1. Bulk Heartbeat Fetching
-- Uses a subquery with MAX(id) to find the latest heartbeat per kiosk.
-- Replaces hundreds of individual queries with a single optimized query.
SELECT * FROM heartbeats
WHERE id IN (
  SELECT MAX(id) FROM heartbeats
  WHERE kiosk_id IN ('148', '149', '150', ...)
  GROUP BY kiosk_id
)
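The same "latest row per group" strategy can be shown in plain Ruby, which makes the intent of the MAX(id) subquery explicit. This is an in-memory illustration only; the production path runs the SQL above so the database does the grouping.

```ruby
# Picks the row with the highest id within each kiosk_id group,
# mirroring the MAX(id) subquery in the SQL example.
def latest_per_kiosk(rows)
  rows.group_by { |r| r[:kiosk_id] }
      .map { |_kiosk_id, group| group.max_by { |r| r[:id] } }
end
```

Because ids are monotonically increasing, the maximum id within a kiosk's group identifies its most recent heartbeat without comparing timestamps.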
2. Hourly Aggregation (PostgreSQL)
-- A CTE that aggregates heartbeats and missed heartbeats by hour.
-- Eliminates redundant table scans by processing all data in a single query.
WITH target_kiosks AS (...),
     heartbeats_today AS (...),
     missed_heartbeats_today AS (...)
SELECT kiosk_id, hour_num, has_heartbeats, offline_minutes
FROM aggregated_data
3. Daily Aggregation (Date Ranges)
-- Optimized queries for historical data with database-specific adaptations.
-- Uses different strategies for small (≤7 days) vs. large (>7 days) date ranges.
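The small-versus-large date-range dispatch can be sketched as a tiny Ruby method. The threshold follows the 7-day boundary stated above; the method and strategy names are assumptions for illustration, not the actual controller code.

```ruby
# Threshold from this doc: small ranges are 7 days or fewer.
SMALL_RANGE_DAYS = 7

# Hypothetical dispatch: short ranges can afford per-day queries, while
# longer ranges switch to a single bulk CTE query.
def daily_aggregation_strategy(range_days)
  range_days <= SMALL_RANGE_DAYS ? :per_day_queries : :bulk_cte_query
end
```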
Implementation Features
- Database compatibility: optimized queries for PostgreSQL and SQLite
- Intelligent data reuse: status filtering reuses already-fetched heartbeat data
- Bulk operations: single queries handle entire kiosk sets
- Memory efficiency: minimal object allocation through structured processing
- Scalable architecture: performance scales linearly with data volume
Performance Metrics
- Query reduction: 10-100x fewer database queries for large datasets
- Response time: significantly faster page loads for multi-kiosk views
- Memory usage: reduced object allocation through efficient processing
- Scalability: linear performance scaling with kiosk count
Dashboard Controller Architecture (Version 3.1)
Overview
The heartbeats dashboard controller has been completely refactored following DRY principles and best practices. This refactoring improves maintainability, performance, and code clarity while providing comprehensive documentation for critical data-fetching operations.
Controller Architecture Improvements
1. Structured Execution Flow
The main index action now follows a clear nine-step execution pattern:
def index
# 1. extract and validate request parameters
initialize_request_parameters
# 2. determine view type based on date range (hourly for today, daily for ranges)
@view_type = determine_view_type
# 3. build optimized kiosk query with joins and filters
base_kiosk_query = build_base_kiosk_query
# 4. fetch kiosks and heartbeat data using efficient bulk operations
all_kiosks, latest_heartbeats = fetch_kiosks_and_heartbeats(base_kiosk_query)
# 5. apply status filtering if requested (reuses already-fetched heartbeat data)
all_kiosks, latest_heartbeats = apply_status_filtering(all_kiosks, latest_heartbeats)
# 6. build heartbeat objects and calculate pagination metrics
sorted_heartbeat_objects = build_and_sort_heartbeat_objects(all_kiosks, latest_heartbeats)
# 7. paginate results and calculate display metrics
setup_pagination_and_counts(sorted_heartbeat_objects)
# 8. load supplementary data for current page (check-ins, projects)
load_supplementary_data
# 9. load time-series data based on view type (hourly vs daily)
load_time_series_data
end
2. Bulk Data Fetching Optimizations
Critical data-fetching operations now use sophisticated bulk queries to minimize database round trips:
- Latest heartbeats: a single query using the MAX(id) subquery strategy
- Hourly data: CTE queries aggregating multiple tables at once
- Daily data: optimized queries with database-specific adaptations
- Check-in submissions: eager loading to prevent N+1 queries
3. Data Reuse Patterns
The controller implements intelligent data reuse to avoid redundant queries:
- Status filtering: reuses already-fetched heartbeat data instead of querying again
- Sorting operations: leverage pre-loaded project and heartbeat data
- Pagination: calculates metrics from already-processed objects
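The status-filtering reuse pattern can be sketched in plain Ruby: instead of issuing a new query per status, the filter operates on the heartbeats hash already fetched in step 4. The 90-second online threshold (three missed 30-second beats) is an assumption for illustration, not a value taken from the actual controller.

```ruby
# Filters kiosks by online/offline status using the already-fetched
# latest_heartbeats hash (kiosk id => heartbeat data), issuing no
# further database queries.
ONLINE_THRESHOLD_SECONDS = 90 # assumed: 3 missed 30-second heartbeats

def filter_by_status(kiosks, latest_heartbeats, status, now: Time.now)
  kiosks.select do |k|
    hb = latest_heartbeats[k[:id]]
    online = hb && (now - hb[:sent_at]) < ONLINE_THRESHOLD_SECONDS
    status == :online ? online : !online
  end
end
```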
4. Database Adapter Compatibility
The controller provides robust database compatibility with optimized queries for different adapters:
- PostgreSQL: uses advanced CTE features and array operations
- SQLite: uses json_each and recursive CTE patterns
- Fallback handling: defaults to PostgreSQL syntax for unknown adapters
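The adapter dispatch above amounts to a simple case statement. This is a hedged sketch; the symbol names are placeholders for the adapter-specific SQL variants this document describes, and the method name is hypothetical.

```ruby
# Selects a query flavor based on the ActiveRecord adapter name,
# defaulting to PostgreSQL syntax for unknown adapters as described above.
def time_series_query_flavor(adapter_name)
  case adapter_name.to_s.downcase
  when /sqlite/   then :sqlite_json_each_recursive_cte
  when /postgres/ then :postgresql_cte
  else                 :postgresql_cte # fallback for unknown adapters
  end
end
```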
Performance Characteristics
Bulk query optimization results:
- Before: hundreds of individual queries for large datasets
- After: single optimized queries for entire kiosk sets
- Improvement: 10-100x reduction in database queries
Data reuse optimization results:
- Before: multiple queries for status filtering and sorting
- After: a single dataset used for multiple operations
- Improvement: eliminates redundant database calls
Memory efficiency improvements:
- Structured data loading: loads only the data each operation requires
- Paginated processing: runs expensive operations only on the current page
- Bulk operations: reduces object allocation overhead
Code Organization
Modular method structure:
The controller is now organized into logical sections with clear responsibilities:
- Parameter handling: validation and safe parsing
- Query building: optimized database query construction
- Bulk operations: efficient data-fetching strategies
- Data processing: transformation and sorting logic
- View preparation: pagination and display metrics
Comprehensive documentation:
All critical methods now include detailed comments explaining:
- Purpose: what the method accomplishes
- Strategy: how the optimization works
- Performance: why this approach was chosen
- Database queries: what SQL operations are performed
DRY principle implementation:
- Extracted constants: business-logic values defined at class level
- Reusable methods: common operations extracted into dedicated methods
- Consistent patterns: similar operations follow the same structure
- Eliminated duplication: removed redundant code paths
Architectural Benefits
1. Maintainability
- Clear method responsibilities and organized code sections
- Comprehensive documentation for complex operations
- Consistent naming and structure patterns
2. Performance
- Optimized bulk queries for all major operations
- Intelligent data reuse throughout the request lifecycle
- Minimal database round trips and memory allocation
3. Scalability
- Efficient handling of large kiosk datasets
- Optimized queries that scale with data volume
- Intelligent query strategies based on data size
4. Reliability
- Robust error handling and safe parameter parsing
- Database adapter compatibility across environments
- Comprehensive validation and fallback mechanisms
Implementation Details
Files modified:
app/controllers/heartbeats/heartbeats_dashboard_controller.rb - complete refactoring
Key improvements:
- Nine-step structured execution flow for a clear operation sequence
- Bulk data fetching using optimized database queries
- Intelligent data reuse to eliminate redundant operations
- Comprehensive documentation for all critical methods
- Database compatibility with adapter-specific optimizations
Performance metrics:
- Query reduction: 10-100x fewer database queries for large datasets
- Response time: significantly faster page loads for multi-kiosk views
- Memory usage: reduced object allocation through efficient processing
- Scalability: linear performance scaling with kiosk count
Future Enhancement Opportunities
Completed Enhancements ✅
- Optimized Query System: Direct SQL queries with bulk operations for massive performance gains
- Historical Analysis: 30-day+ views now practical and fast using optimized CTE queries
- Scalable Architecture: System handles millions of heartbeats efficiently with linear scaling
- Controller Refactoring: Implemented DRY principles and comprehensive documentation
Potential Additions (Not Yet Implemented):
- Alerting System: Notify administrators when kiosks go offline
- Data Retention Policy: Automatic cleanup of old records
- Health Check Endpoint: System health monitoring API
- Webhook Integration: Send notifications to external systems
- Mobile App: Mobile monitoring interface
- Serial Number Analytics: Track device performance by serial number
- Unknown Device Management: Interface for managing unknown devices
- Real-time Summary Updates: Live summary updates for current day
Last Updated: Sep 2025 Version: 3.1 - Dashboard Controller Refactoring + Optimized Query System Maintainer: Tonic Labs Ltd Development Team