
Heartbeats Dashboard - Complete Guide

Overview

The heartbeats dashboard monitors kiosk connectivity with 400+ kiosks pinging every 5 seconds (~6.9M heartbeats/day). This document covers query analysis, optimizations, sample data generation, and implementation details.


Problem Statement

Original Issue: Dashboard taking 30+ seconds to load due to:

  1. Loading ALL kiosks upfront (not paginated)
  2. Loading ALL check-in submissions (298ms bottleneck)
  3. Multiple scans of large heartbeat tables
  4. Inefficient latest heartbeat queries


Optimizations Implemented

1. ✅ Optimized Latest Heartbeat Query

File: app/services/heartbeats/kiosk_data_fetcher.rb

Before: MAX(id) per kiosk_id, then WHERE IN (slow on millions of rows)

After:

  • PostgreSQL: DISTINCT ON (kiosk_id) - fastest approach
  • SQLite: Window function ROW_NUMBER() OVER (PARTITION BY kiosk_id ORDER BY id DESC)

Performance: 10-50x faster (seconds → <100ms)
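The semantics of the optimized query can be sketched in plain Ruby (the real code does this in SQL via DISTINCT ON / ROW_NUMBER; the row shapes below are illustrative, not the actual schema):

```ruby
# Illustrative rows: within each kiosk_id, the heartbeat with the
# highest id is the "latest" one the dashboard needs.
heartbeats = [
  { kiosk_id: 1, id: 10, sent_at: "2024-01-01 10:00:00" },
  { kiosk_id: 1, id: 42, sent_at: "2024-01-01 10:05:00" },
  { kiosk_id: 2, id: 17, sent_at: "2024-01-01 09:59:00" },
]

# Latest-per-kiosk: partition by kiosk_id, keep the max-id row.
latest = heartbeats
  .group_by { |h| h[:kiosk_id] }
  .transform_values { |rows| rows.max_by { |r| r[:id] } }

latest[1][:id] # => 42
```

DISTINCT ON (kiosk_id) with ORDER BY kiosk_id, id DESC returns exactly this set in one index-backed pass, which is why it beats the MAX(id)-then-WHERE IN approach on millions of rows.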

2. ✅ Fixed Check-In Submissions Query

File: app/controllers/heartbeats/heartbeats_dashboard_controller.rb

Before: Load ALL submissions → group in Ruby → take last (298ms)

After: Get MAX(created_at) per link → fetch only those records

Performance: ~30x faster (298ms → <10ms)
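The two-step fix can be sketched in memory (row shapes and the helper name are illustrative; in SQL this is a GROUP BY / MAX(created_at) subquery joined back so only the winning rows are fetched):

```ruby
# Step 1: find MAX(created_at) per check-in link.
# Step 2: keep only the submissions matching those timestamps.
def latest_submissions(subs)
  latest_at = subs
    .group_by { |s| s[:check_in_link_id] }
    .transform_values { |rows| rows.map { |r| r[:created_at] }.max }
  subs.select { |s| s[:created_at] == latest_at[s[:check_in_link_id]] }
end
```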

3. ✅ Last-seen store (kiosk_last_heartbeats)

Files: db/migrate/*_create_kiosk_last_heartbeats.rb, app/models/kiosk_last_heartbeat.rb, app/models/heartbeat.rb, app/services/heartbeats/kiosk_data_fetcher.rb

Problem: On a large heartbeats table (e.g. 20M+ rows), the query “latest heartbeat per kiosk for all kiosks” can take 60–80+ seconds on cold load, even with indexes.

Solution: A small last-seen store table kiosk_last_heartbeats: one row per kiosk (kiosk_id, last_heartbeat_id, sent_at, updated_at). It is updated on every new heartbeat (Heartbeat after_create callback). The dashboard reads from this table for “latest per kiosk” instead of scanning heartbeats.

  • On heartbeat create: Upsert into kiosk_last_heartbeats for that kiosk_id (one extra write per heartbeat).
  • On dashboard load: KioskDataFetcher reads from kiosk_last_heartbeats for counts and for the current page’s latest heartbeats. If the table is missing or empty, it falls back to querying the heartbeats table (slower).
  • One-time backfill: After adding the feature or after bulk-importing heartbeats, run rake heartbeat_processing:backfill_kiosk_last_heartbeats to populate the store from existing heartbeats (can be slow on very large tables).

Performance: Dashboard load no longer depends on scanning the big heartbeats table for “latest per kiosk”; it reads a small table (~300 rows) so cold and warm loads stay fast.
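The write path above can be sketched with a plain Hash standing in for the table (the real callback is assumed to upsert a kiosk_last_heartbeats row keyed by kiosk_id; the method name here is illustrative):

```ruby
# One entry per kiosk_id, overwritten on every heartbeat — so the
# store stays at ~one row per kiosk no matter how many heartbeats
# arrive.
def record_heartbeat!(store, kiosk_id:, heartbeat_id:, sent_at:)
  store[kiosk_id] = { last_heartbeat_id: heartbeat_id, sent_at: sent_at }
end

store = {}
record_heartbeat!(store, kiosk_id: 7, heartbeat_id: 100, sent_at: "10:00:00")
record_heartbeat!(store, kiosk_id: 7, heartbeat_id: 101, sent_at: "10:00:05")
```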

4. Expected Results

  • Before (original): 30+ seconds page load; with large tables, 60–80+ seconds on cold load.
  • After: Fast first load and refresh when the last-seen store is populated; accurate online/offline/never-connected counts for all kiosks.

Original vs Current (Difference from commit 7d84ae3)

Commit 7d84ae3 ("Added Optimization on heartbeats feature") is the before state. Here’s what changed.

1. KioskDataFetcher

  • What it loads
    Original (7d84ae3): Always loads all kiosks — total_count = base_query.count, all_kiosks = base_query.to_a, then latest heartbeats for all kiosk IDs.
    Current: Fast path (no status filter, sort by project_name / kiosk_id / kiosk_name / serial_number) loads only the current page of kiosks (e.g. 10) with LIMIT/OFFSET and the latest heartbeats for those 10 only, plus a separate lightweight query that gets the latest sent_at for all kiosks to compute accurate counts. Full path (status filter or sort by status / last_seen / offline_duration): same as original (load all).
  • Return value
    Original: [all_kiosks, latest_heartbeats, total_count, total_pages, kiosks_data] (5 values).
    Current: Fast path returns [page_kiosks, latest_heartbeats, total_count, total_pages, kiosks_data, status_counts] (6 values); full path returns the same 6 with status_counts = nil.
  • Counts
    Original: No status_counts; controller and PaginationHandler derive counts from the full in-memory list.
    Current: Fast path computes status_counts from a dedicated query (all kiosks, latest sent_at only); full path still derives them from the full list.
  • Online/offline rule
    Original: Not in the fetcher; applied later on heartbeat objects.
    Current: In compute_accurate_status_counts — online only if sent_at >= 90.seconds.ago && sent_at <= Time.current (no future-dated), plus timezone normalization for sent_at.
  • Extra logic
    Original: None.
    Current: use_page_only_fetch?, fetch_page_only_fast, compute_accurate_status_counts, fetch_latest_heartbeat_times. When the last-seen store (kiosk_last_heartbeats) exists and has data, latest heartbeats and counts are read from it; otherwise it falls back to querying heartbeats (lightweight: IDs + sent_at only, or full rows for the page). Timezone normalization for sent_at.

2. HeartbeatsDashboardController

  • Fetch result
    Original (7d84ae3): Unpacks 5 values; no status_counts.
    Current: Unpacks 6 values; may receive status_counts.
  • Status filter
    Original: Always runs apply_status_filtering; then may set @total_count / @total_pages from the filtered result.
    Current: Runs only when status_counts is nil (full path). When status_counts is present (fast path), the status filter step is skipped for the fetch (filtering is not requested on the fast path).
  • Pagination
    Original: handle_pagination(sorted_heartbeat_objects) — 1 arg.
    Current: handle_pagination(sorted_heartbeat_objects, status_counts) — passes precomputed counts when available.
  • Heatmap
    Original: Always load_time_series_data(latest_heartbeats).
    Current: Same — the heatmap (hourly or 30d) is always loaded to monitor all kiosks.
  • Cache
    Original: No cache headers.
    Current: before_action :disable_browser_caching (no-store, no-cache).
  • Params

3. PaginationHandler

  • Constructor
    Original (7d84ae3): (sorted_objects, current_page, per_page, total_count) — 4 args.
    Current: Adds an optional 5th: status_counts = nil.
  • Paginated list
    Original: Always @sorted_objects[offset, @per_page].
    Current: If status_counts is present, sorted_objects is already the current page and is used as-is; otherwise slice as before.
  • Counts
    Original: Always derived from the full @sorted_objects list.
    Current: From the precomputed status_counts when present (fast path); otherwise from the full sorted list.
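The slicing decision can be sketched as follows (the function name is illustrative; when precomputed status_counts are present, the fast path already passed in exactly one page, so nothing is sliced):

```ruby
# Fast path: sorted_objects is already the current page — use as-is.
# Full path: slice the full in-memory list with offset/limit math.
def page_for(sorted_objects, page, per_page, status_counts)
  return sorted_objects if status_counts
  offset = (page - 1) * per_page
  sorted_objects[offset, per_page] || []
end
```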

4. Heartbeat model / StatusFilter / TimeSeriesDataLoader (online rule)

  • Online
    Original (7d84ae3): sent_at >= 90.seconds.ago (or sent_at > 90.seconds.ago in StatusFilter); future-dated sent_at could count as online.
    Current: Online only if sent_at >= threshold && sent_at <= Time.current (same in Heartbeat#status, StatusFilter, TimeSeriesDataLoader); future-dated = offline.
  • Timezone
    Original: No explicit normalization.
    Current: sent_at is normalized to Time.zone wherever we compare (e.g. in the fetcher's count query and fetch_latest_heartbeat_times).

Summary

  • Original: One path only — load all kiosks and all their latest heartbeats, filter/sort in memory, paginate for display. Counts were correct because they were taken from the full sorted_objects list. Cost: slow when there are many kiosks (e.g. 300+), especially on first load in an iframe.
  • Current: Two paths. Fast path: load only the current page of kiosks and their heartbeats, plus a cheap “latest sent_at per kiosk” query to get accurate online/offline/never/total for all kiosks — so faster load and still accurate counts. Full path: same as original when a status filter or non-DB sort is used. Additional fixes: future-dated heartbeats no longer count as online, timezone handling for counts, and no-cache headers for the dashboard.

What We Fixed (Summary)

  • Load speed: Fast path loads only the current page of kiosks (e.g. 10) instead of all 300+, using DB-level LIMIT/OFFSET on kiosks.
  • Accurate filter counts: Counts (online / offline / never connected / total) cover all kiosks matching the filters, not just the current page. A separate lightweight query fetches the latest sent_at per kiosk and counts in Ruby.
  • Future-dated heartbeats: A sent_at in the future (wrong clock / bad data) is no longer counted as "online". Rule: online only if sent_at is within the last 90 seconds and ≤ now.
  • Timezone: Latest sent_at values are normalized to Time.zone before comparing to 90.seconds.ago, so dev/prod timezone differences don’t skew counts.
  • Browser cache: The dashboard sends Cache-Control: no-store, no-cache so the page always loads fresh.
  • Heatmap: The heatmap is always loaded to monitor all kiosks.

Result: Faster first load (especially in iframe) and accurate readings for online, offline, total, and never connected.


Page Load Process (Step-by-Step)

When you open the heartbeats dashboard (e.g. in an iframe):

  1. Params & view type
    Parse page, per_page, sort_by, sort_direction, project_id, search, status, start_date, end_date. Decide view type: hourly (today) vs daily.

  2. Fetch kiosks and heartbeats (KioskDataFetcher)

    • Last-seen store: When the kiosk_last_heartbeats table exists and has rows, the fetcher reads from it for “latest heartbeat per kiosk” (both for the current page and for counts). No scan of the large heartbeats table. If the store is missing or empty, it falls back to querying heartbeats (slower on cold load).
    • Fast path (no status filter, sort by project_name / kiosk_id / kiosk_name / serial_number):
      • Load only current page of kiosks (e.g. 10) with LIMIT/OFFSET from PepCore DB.
      • Fetch latest heartbeats for those 10 from last-seen store (or fallback: main DB heartbeats).
      • Run accurate count query: all kiosk IDs (same filters) → latest sent_at per kiosk from last-seen store (or fallback) → count online / offline / never_connected (90s rule + no future).
      • Total kiosk count for pagination (count query or from last page).
    • Full path (status filter or sort by status / last_seen / offline_duration):
      • Load all kiosks (filtered), latest heartbeats for all (from store or fallback), then filter/sort in memory; counts from that full set.
  3. Status filter (full path only)
    If user chose "Online" / "Offline" / "Never connected", filter the kiosks and heartbeats in memory and update total count/pages.

  4. Build and sort
    Build heartbeat display objects (with status, last seen, etc.), sort by chosen column. Fast path: list is already one page and DB-ordered.

  5. Pagination
    Slice to current page (full path) or use existing page (fast path). Set @online_count, @offline_count, @never_connected_count from the precomputed counts (fast path) or from the full sorted set (full path).

  6. Supplementary data
    Load projects dropdown, check-in links and latest check-in submissions for the current page kiosks only.

  7. Time-series / heatmap
    Load hourly or daily heatmap and missed-heartbeat data for the current page kiosks.

  8. Render
    Layout uses bare layout and allows iframe; response sends no-cache headers so the next open is fresh.


Query Flow (Legacy / Full Path)

  1. Load ALL kiosks with joins (Query 1)
  2. Find latest heartbeat IDs for all kiosks (Query 2) - OPTIMIZED
  3. Load projects (Query 3)
  4. Filter by status (in-memory)
  5. Build & sort heartbeat objects (in-memory)
  6. Paginate (in-memory, slice array)
  7. Load check-in links for current page (Query 5)
  8. Load latest check-in submissions (Query 6) - OPTIMIZED
  9. Load time-series data via CTE (Query 8-10)

Database Indexes Required

-- Heartbeats (CRITICAL)
CREATE INDEX idx_heartbeats_kiosk_id_id ON heartbeats(kiosk_id, id DESC);
CREATE INDEX idx_heartbeats_kiosk_sent_at ON heartbeats(kiosk_id, sent_at);

-- Check-In Submissions (CRITICAL)
CREATE INDEX idx_check_in_submissions_link_created ON check_in_submissions(check_in_link_id, created_at DESC);

-- Missed Heartbeats
CREATE INDEX idx_missed_heartbeats_kiosk_expected ON missed_heartbeats(kiosk_id, expected_at);

Sample Data Generation

Usage

# Standard: 30 days, 30-second intervals
rails heartbeat_processing:generate_sample_data[30,30]

# Fast (raw SQL): 30 days, 30-second intervals
rails heartbeat_processing:generate_sample_data_fast[30,30]

# Production-like: 30 days, 5-second intervals (WARNING: ~34M records!)
rails heartbeat_processing:generate_sample_data[30,5]

What Gets Created

  • Heartbeat records in heartbeats table
  • One heartbeat per kiosk per interval (e.g., every 30 seconds)
  • Each record: kiosk_id, project_id, sent_at, kiosk_name, machine_id, serial_number
  • Random offline periods: 5% chance per day, 1-4 hours duration
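The offline-period rule above (5% chance per day, 1-4 hours) can be sketched with a seeded generator; the real rake task's internals are assumed, and only the stated distribution is mirrored here:

```ruby
# Returns a range of offline hours for a day, or nil if the kiosk
# stays online that day (95% of the time).
def offline_window(rng)
  return nil unless rng.rand < 0.05   # 5% of days go offline
  start_hour = rng.rand(0..19)        # leave room for up to 4 hours
  duration   = rng.rand(1..4)         # 1-4 hours, per the rule above
  (start_hour...(start_hour + duration))
end

rng = Random.new(42) # seeded so runs are reproducible
windows = Array.new(365) { offline_window(rng) }.compact
```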

Performance

  • Standard method: ~10,000-50,000 records/second
  • Fast method: ~50,000-100,000 records/second
  • Skips callbacks: insert_all automatically skips the after_create callback

Will It Show on Dashboard?

Yes! The dashboard shows:

  1. Latest heartbeat per kiosk — determines online/offline status (from the last-seen store when populated, else from heartbeats).
  2. Heatmap — hourly/daily data from all heartbeats.
  3. Summary counts — online/offline/never_connected based on latest heartbeats.

Note: Sample data generated with insert_all (e.g. generate_sample_data_fast) does not run Heartbeat callbacks, so kiosk_last_heartbeats is not updated. After generating sample data, run rake heartbeat_processing:backfill_kiosk_last_heartbeats once so the dashboard uses the last-seen store.

Clear Data

rails heartbeat_processing:clear_sample_data

Summary Counts (Current Behavior)

Accurate counts for ALL kiosks

Summary counts (Online / Offline / Never Connected) are calculated from all kiosks that match the current filters (project, search), not just the 10 on the current page.

Fast path (no status filter, sort by project_name / kiosk_id / kiosk_name / serial_number):

  1. Load only the current page of kiosks (e.g. 10) from the DB.
  2. Fetch latest heartbeats for those 10 kiosks from the last-seen store kiosk_last_heartbeats (or fall back to heartbeats if the store is empty).
  3. Counts: read the latest sent_at per kiosk for all kiosks (matching filters) from the last-seen store (or fallback), so Online / Offline / Never connected reflect the full set (e.g. out of 300 kiosks), not just the current page.
  4. Pagination uses the total kiosk count.

Full path (status filter or sort by status / last_seen / offline_duration): load all kiosks, fetch latest heartbeats for all, filter/sort in memory, then paginate. Counts come from the full filtered set.

Online/offline rule (consistent everywhere):

  • Online: latest sent_at is within the last 90 seconds and not in the future (future-dated sent_at = bad data / wrong kiosk clock → counted as offline).
  • Offline: latest sent_at is older than 90 seconds, or in the future.
  • Never connected: no heartbeat for that kiosk.
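The rule can be written as a small self-contained classifier (the real code compares in Time.zone; plain Time objects and the method name stand in here):

```ruby
ONLINE_WINDOW = 90 # seconds, per the rule above

def kiosk_status(last_sent_at, now)
  return :never_connected if last_sent_at.nil?
  return :offline if last_sent_at > now   # future-dated = bad data
  last_sent_at >= now - ONLINE_WINDOW ? :online : :offline
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
kiosk_status(now - 30,  now)  # => :online
kiosk_status(now - 300, now)  # => :offline
kiosk_status(now + 60,  now)  # => :offline  (clock skew)
kiosk_status(nil,       now)  # => :never_connected
```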


Last-seen store (kiosk_last_heartbeats)

What it is

  • Table: kiosk_last_heartbeats — one row per kiosk: kiosk_id (PK), last_heartbeat_id, sent_at, updated_at.
  • Purpose: Answer “what is the latest heartbeat per kiosk?” without scanning the large heartbeats table. Same pattern used by many large apps (e.g. Redis or a small “last seen” table updated on every event).

How it’s updated

  • On every Heartbeat create: Heartbeat has after_create :update_kiosk_last_heartbeat_store, which upserts into kiosk_last_heartbeats for that kiosk_id (one extra write per heartbeat). If the table doesn’t exist or the upsert fails, the callback rescues and logs; the heartbeat is still saved.

How the dashboard uses it

  • KioskDataFetcher checks KioskLastHeartbeat.table_exists?. If the table exists:
    • Counts: fetch_latest_heartbeat_times(kiosk_ids) reads kiosk_id, sent_at from kiosk_last_heartbeats and classifies online/offline/never in Ruby.
    • Current page heartbeats: fetch_latest_heartbeats_bulk(kiosk_ids) reads kiosk_id, last_heartbeat_id from the store, then loads full Heartbeat rows by those IDs.
  • If the table is missing or has no rows for the requested kiosks, the fetcher falls back to querying the heartbeats table (same logic as before; can be slow on cold load with a large table).

Backfill (one-time)

After adding the feature or after bulk-importing heartbeats (e.g. sample data that skipped callbacks), populate the store once:

rake heartbeat_processing:backfill_kiosk_last_heartbeats

This runs a single query that inserts/updates kiosk_last_heartbeats from “latest heartbeat per kiosk” in heartbeats. On very large heartbeats tables it may take a minute or more.
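The backfill semantics can be sketched in memory (the real task is assumed to do this in one SQL statement; the method name and row shapes are illustrative). For each kiosk, the heartbeat with the highest id wins, and an existing store entry is only replaced by a newer one:

```ruby
# Merge "latest heartbeat per kiosk" from a batch of heartbeats into
# the store, never overwriting a newer entry with an older one.
def backfill!(store, heartbeats)
  heartbeats.each do |h|
    current = store[h[:kiosk_id]]
    next if current && current[:last_heartbeat_id] >= h[:id]
    store[h[:kiosk_id]] = { last_heartbeat_id: h[:id], sent_at: h[:sent_at] }
  end
  store
end
```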


Files Modified

  1. db/migrate/*_create_kiosk_last_heartbeats.rb — Creates last-seen store table.
  2. app/models/kiosk_last_heartbeat.rb — Model for kiosk_last_heartbeats (PK = kiosk_id).
  3. app/models/heartbeat.rb — after_create :update_kiosk_last_heartbeat_store; online rule (90s + not future).
  4. app/services/heartbeats/kiosk_data_fetcher.rb — Reads from last-seen store when available; falls back to heartbeats; fast path + accurate counts.
  5. app/controllers/heartbeats/heartbeats_dashboard_controller.rb — No-cache headers; 6-value fetch; heatmap always loaded.
  6. app/services/heartbeats/pagination_handler.rb — Optional status_counts; uses precomputed counts when present.
  7. app/services/heartbeats/status_filter.rb — Online rule (90s + not future).
  8. app/services/heartbeats/time_series_data_loader.rb — Same online rule for heatmap.
  9. lib/tasks/heartbeat_processing.rake — Sample data, purge, backfill_kiosk_last_heartbeats, etc.

Testing

  1. Generate sample data: rails heartbeat_processing:generate_sample_data[7,60,50]
  2. Test dashboard performance
  3. Verify indexes exist
  4. Monitor query times in logs

Why 300 Kiosks Every 5 Seconds Can Cause 5+ Minute Loads

Ingestion load:

  • 300 kiosks × 1 heartbeat every 5s ≈ 60 heartbeat INSERTs per second
  • Each insert triggers after_create :schedule_missed_heartbeat_detection → 60 delayed jobs enqueued per second
  • Each job runs detect_missed_heartbeats_async: queries for the "previous heartbeat" and may bulk-insert into missed_heartbeats
  • So: 60 writes/sec to heartbeats + 60 job executions/sec (reads + possible writes to missed_heartbeats)

Table growth:

  • ~60 rows/sec × 86,400 sec/day ≈ 5.2M heartbeats per day with no purging
  • After weeks/months the heartbeats table has tens of millions of rows → "latest per kiosk" and "previous heartbeat" queries slow down
  • Dashboard and jobs both read this table → everything gets slower

Optimizations implemented:

  1. DB-level pagination (app/services/heartbeats/kiosk_data_fetcher.rb): When there is no status filter and the sort is DB-based (project_name, kiosk_id, etc.), load ONLY the current page (10 kiosks) from the PepCore DB, then fetch their heartbeats. Avoids loading all 300 kiosks on every iframe visit.
  2. Debounce missed-heartbeat jobs (app/models/heartbeat.rb): At most one job per kiosk per 90 seconds (Rails.cache). Cuts the job rate from ~60/sec to ~3/sec.
  3. Purge old data (lib/tasks/heartbeat_processing.rake): Run rails heartbeat_processing:purge_old[30] daily (e.g. cron 0 2 * * *) to delete heartbeats and missed_heartbeats older than 30 days. Keeps table size bounded so queries stay fast.
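The debounce rule can be sketched with a plain Hash standing in for the Rails.cache entry the real callback is assumed to use (class and method names are illustrative):

```ruby
# At most one detection job per kiosk per 90-second window.
class MissedHeartbeatDebouncer
  def initialize(window_seconds = 90)
    @window = window_seconds
    @last_enqueued_at = {} # kiosk_id => time of last enqueue
  end

  # True when a job should be enqueued for this heartbeat.
  def allow?(kiosk_id, now)
    last = @last_enqueued_at[kiosk_id]
    return false if last && (now - last) < @window
    @last_enqueued_at[kiosk_id] = now
    true
  end
end
```

With 300 kiosks and a 90-second window, at most 300 jobs per 90s ≈ 3.3 jobs/sec, matching the ~3/sec figure above.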

Why not a single SQL query?

  • The app uses TWO databases: the PepCore DB (kiosk, project) and the main DB (heartbeats).
  • Cross-database JOINs don't work in PostgreSQL/SQLite without special setup.
  • The current approach queries each DB separately and joins in Ruby (fast with the page-only optimization).
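The in-Ruby join can be sketched as a merge by kiosk_id (kiosks from the PepCore DB, latest heartbeats from the main DB; row shapes and the method name are illustrative):

```ruby
# Attach each kiosk's latest heartbeat (or nil) from a lookup hash
# built from the other database's result set.
def attach_latest_heartbeats(kiosks, latest_by_kiosk_id)
  kiosks.map do |kiosk|
    kiosk.merge(latest_heartbeat: latest_by_kiosk_id[kiosk[:id]])
  end
end
```

A nil latest_heartbeat after the merge is exactly the "never connected" case in the counts.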


Future Optimizations

  1. Materialized view for latest heartbeats (refresh every 30s)
  2. Heartbeat summary table for time-series data
  3. Database-level pagination (requires SQL-based status filtering)
  4. Caching for project/check-in link data