
Analytics DB — Duplication Incident Report

Date: April 15, 2026

Status: Root cause confirmed; safeguards implemented in the codebase; production investigation and duplicate-day audit documented below. No analytics DB reset or data repair has been run in production yet — that remains after merge/deploy and operational sign-off.


1. Summary

32 of the 122 directory projects (26.9% of the 119 audited credentials) have duplicated checkin_submissions records in their analytics databases; 87 are clean, and 3 have no analytics DB. The worst-affected project, Project 58, had 634,210 rows in analytics against only 16,596 in source: 617,619 duplicate rows, a 38x inflation.

The duplication was caused by repeated analytics sync operations (admin batch actions and scheduled background jobs) writing to analytics tables that were missing a unique constraint on the id column. Without this constraint, the intended upsert logic silently degraded into repeated inserts.

The duplication traces back to a code refactor on April 15, 2025 (commit cc4c6bf) that changed the populate logic from a safe check-then-insert pattern to Sequel's batch import with on_duplicate_key_update: { conflict_target: [:id] } — without adding the unique constraint this approach requires. Projects that had active scheduled jobs during or shortly after this transition accumulated duplicates before the last_synced_at delta mechanism stabilized. See Section 2.5 for full timeline.


2. Root Cause Analysis

2.1 How the analytics pipeline works

Pepcore Source DB                    Analytics DB (per project)
┌──────────────────────┐             ┌──────────────────────────┐
│ check_in_submission  │  ──sync──▶  │ checkin_submissions      │
│ (PK on id, unique)   │             │ (no PK, no unique index) │
└──────────────────────┘             └──────────────────────────┘
  • Each project has an analytics database (e.g. analytics_44) with a checkin_submissions table.
  • Data is copied from the Pepcore source (check_in_submission table) into analytics via a sync/populate process.
  • The populate code uses Sequel's import with on_duplicate_key_update: { conflict_target: [:id] } — an upsert that requires a unique constraint on id to detect conflicts.
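The failure mode can be illustrated without a database. In the pure-Ruby sketch below, a Hash keyed by id stands in for a table with a unique index (an upsert replaces in place), while an Array stands in for the analytics table as it was actually created (nothing to conflict against, so every sync appends):

```ruby
# Illustrative only: a Hash keyed by id models a table whose id column has a
# unique index (upsert replaces in place); an Array models the analytics table
# as actually created (no constraint), where every "upsert" is just an append.

source_rows = [{ id: 1, score: 10 }, { id: 2, score: 20 }]

with_unique_index  = {}  # id => row
without_constraint = []  # plain heap of rows

3.times do
  source_rows.each do |row|
    with_unique_index[row[:id]] = row  # conflict on id -> update in place
    without_constraint << row          # nothing to conflict against -> insert
  end
end

puts with_unique_index.size   # 2 rows, as intended
puts without_constraint.size  # 6 rows: every sync re-inserted everything
```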

2.2 The defect

When analytics tables were created via init_analytics_db, the id column was added as a plain nullable integer — no primary key, no unique index, no constraints at all.

File: app/models/db_credential.rb (method create_checkin_submissions_attributes)

connection.alter_table :checkin_submissions do
  add_column :id, :integer unless attributes.include?(:id)
  # ... other columns ...
end
# ← No unique index or PK was created here

2.3 The trigger

Without a unique constraint, conflict_target: [:id] has nothing to conflict against. Every populate run inserted all fetched rows as new records instead of updating existing ones.

Four code paths trigger populate:

Trigger                                          File                           Method
Admin batch action: "Populate Analytics Data"    app/admin/projects.rb          populate_analytics_data
Admin batch action: "Kickoff Data Fetching"      app/admin/projects.rb          kickoff_data_fetching
Recurring Delayed Job (hourly)                   app/models/db_credential.rb    fetch_data_and_schedule_next_fetch
ProjectGroup Delayed Job (hourly)                app/models/project_group.rb    fetch_data_and_schedule_next_fetch

Each execution added duplicate copies of the same source records. Over days/weeks, this compounded into massive inflation.
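A back-of-envelope model of the compounding (an illustration, not a reconstruction of the actual job history): if each run re-inserts the full source set, totals grow linearly with the number of runs, which is consistent with the double-digit inflation ratios observed in Section 3:

```ruby
# Hypothetical: if every populate run re-inserts all source rows, the
# analytics row count is simply source_rows * runs.
source_rows = 3_154  # Project 44 source count (Section 3.1)
runs        = 17     # hypothetical number of full re-inserts

total = source_rows * runs
ratio = total.to_f / source_rows

puts total  # 53618
puts ratio  # 17.0 -- in the same ballpark as Project 44's observed 16.8x
```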

2.4 Why not all projects were affected equally

Production audit confirmed that both affected and clean projects have last_synced_at set to the audit date (April 16, 2026) and active scheduled jobs. The duplication is historical, not ongoing: the hourly jobs now fetch only delta records (usually empty), so no new duplicates are being created, but old duplicates remain.

Latest production audit (Section 3.3): 119 DbCredential rows checked — 32 duplicated, 87 clean, 0 errors. (122 directory projects minus 3 without a credential / analytics link in that run.)

2.5 Root cause timeline (from git history)

  • Mar 14, 2025: Analytics DB feature created (65e4512). Populate used a safe check-then-insert: if existing_record.present? → update, else → insert. No unique constraint needed; no duplicates possible.
  • Mar 17, 2025: Most DB credentials created and jobs kicked off for early projects. Safe pattern still in use.
  • Apr 15, 2025: Refactor commit cc4c6bf switched populate to batch import with on_duplicate_key_update: { conflict_target: [:id] } and added the last_synced_at column (defaulting to nil for all existing credentials). Breaking change: the batch upsert requires a unique constraint on id, which was never added.
  • Apr 15, 2025 onward: The first job run after deploy saw last_synced_at = nil → fetched ALL source records → ran import against a table with no unique constraint → inserted duplicates. The 32 projects with active jobs accumulated duplicates during this window.
  • Eventually: last_synced_at stabilized for most projects, so subsequent runs fetched an empty delta and created no new duplicates. Duplication stopped growing, but existing duplicates remained.

Why only 32 of 82 projects with jobs were affected:

The original check-then-insert code (if existing_record.present?) was safe without a unique constraint. The April 15 refactor removed this safety. Projects that had active hourly jobs during the transition window re-inserted all records before last_synced_at could stabilize. Projects whose first post-refactor populate completed cleanly (setting last_synced_at) only ever fetched empty deltas afterward, avoiding duplication.
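The stabilization mechanism described above can be sketched in a few lines (a toy model of the assumed last_synced_at semantics: nil means fetch everything, otherwise fetch only rows newer than the mark):

```ruby
require "time"

# Toy model (assumed semantics) of the last_synced_at delta mechanism:
# nil -> fetch everything; otherwise fetch only rows created since the mark.
def fetch_delta(source_rows, last_synced_at)
  return source_rows if last_synced_at.nil?
  source_rows.select { |r| r[:created_at] > last_synced_at }
end

source = [
  { id: 1, created_at: Time.utc(2025, 4, 10) },
  { id: 2, created_at: Time.utc(2025, 4, 12) },
]

first_run  = fetch_delta(source, nil)  # full fetch (the dangerous one)
mark       = source.map { |r| r[:created_at] }.max
second_run = fetch_delta(source, mark) # empty delta once the mark is set

puts first_run.size   # 2
puts second_run.size  # 0
```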


3. Evidence (sampled April 15, 2026)

3.1 Project 44 (originally reported)

Source DB (check_in_submission) rows:           3,154 (all unique)
Analytics DB (checkin_submissions) total rows:  55,513
Analytics DB distinct IDs:                      3,308
Duplicate rows:                                 52,205
Inflation ratio:                                16.8x
Duplicate window:                               March 24 – April 2, 2026
Data after April 2:                             Clean (dup factor 1.0)
Last data timestamp:                            April 10 (no data since April 11)
Top duplicated ID (447565):                     1,189 copies

Note: The duplicate-heavy calendar days for Project 44 are 2026-03-24 through 2026-04-02 (8 days). That is not the same as the April 15, 2025 engineering date when the unsafe batch upsert shipped — see Section 3.4 for all projects’ duplicate-day ranges.

3.2 Random 10-project sample

Project ID  Project Name               Source Count  Analytics Total  Distinct IDs  Duplicate Rows  Ratio
86          Team ML08x                 2,388         171,357          2,388         168,969         71.8x
79          Team FRA 134/135           11,435        442,778          11,430        431,348         38.7x
72          Team Newport               9,055         259,518          9,050         250,468         28.7x
51          Mace Pure Dub01            3,702         52,435           3,700         48,735          14.2x
113         Yondr Ops DC FRA1          704           656              656           0               0.93x
111         Equinix/DPR DC17 Virginia  1,055         971              971           0               0.92x
100         Project Advance            1,386         1,274            1,274         0               0.92x
112         Yondr Ops DC NOV1          943           803              803           0               0.85x
110         STO QTS DFW2 DC4           74            53               53            0               0.72x
122         Yondr Collen Building C    0             N/A

4 out of 9 projects with analytics DBs had severe duplication (initial sample).

3.3 Full production audit (counts)

Total projects (directory):                   122
DbCredential records checked (prod console):  119
Clean (total_rows == distinct_ids):           87
Duplicated:                                   32 (26.9% of checked credentials)
No checkin_submissions table / other:         0 (in this run)
Errors:                                       0

Three projects have no DbCredential in this run (122 − 119 = 3), presumably because they are not yet linked to an analytics DB.

Top 10 most affected projects:

Project ID  Project Name      Source Count  Analytics Total  Duplicate Rows  Ratio
58          Team GBL6         16,596        634,210          617,619         38.2x
25          SigmaRoc          20,601        483,924          460,713         23.5x
79          Team FRA 134/135  11,435        442,778          431,348         38.7x
74          Team Madrid       14,930        436,722          421,799         29.3x
52          SAM3A Team        8,495         396,996          388,502         46.7x
73          Team Ersbo        23,743        406,843          383,110         17.1x
50          Team Powergate    44,615        417,783          373,172         9.4x
43          Team PA14         3,891         329,637          325,748         84.7x
71          IPS Rahway MACS   8,890         296,270          287,382         33.3x
72          Team Newport      9,057         259,521          250,468         28.7x

All 32 duplicated projects (sorted by duplicate rows): see Section 3.4 for per-project duplicate-day windows.

3.4 Per-project duplicate-day windows (production)

Method: For each project’s analytics DB, days where COUNT(*) > COUNT(DISTINCT id) on checkin_submissions grouped by created_at::date (duplicate-heavy calendar days in analytics).
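The same day-level check can be mirrored in pure Ruby (the SQL form is roughly GROUP BY created_at::date HAVING COUNT(*) > COUNT(DISTINCT id)):

```ruby
require "date"

# Pure-Ruby mirror of the audit query: a calendar day is "duplicate-heavy"
# when its total row count exceeds its distinct-id count.
rows = [
  { id: 1, created_at: Date.new(2026, 3, 20) },
  { id: 1, created_at: Date.new(2026, 3, 20) },  # duplicate of id 1
  { id: 2, created_at: Date.new(2026, 3, 21) },
]

dup_days = rows.group_by { |r| r[:created_at] }.select do |_day, day_rows|
  day_rows.size > day_rows.map { |r| r[:id] }.uniq.size
end.keys

puts dup_days.map(&:iso8601).inspect  # ["2026-03-20"]
```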

Executive summary

  1. March 20, 2026 onward (majority of the volume): In the production audit, 20 of 32 duplicated projects show their first duplicate-heavy day on exactly 2026-03-20; several others show their first such day a few days later in the same window (2026-03-23 to 2026-03-30, e.g. Projects 44, 41, 53). Almost all heavy duplication calendar days (where total_rows > distinct_ids per day) fall between 2026-03-20 and 2026-04-06. That pattern is consistent with a bulk operational action around that time, most likely the Admin → Projects batch action “Populate Analytics Data” (or equivalent) run across many projects: each run re-fetched a large slice of rows and inserted it again because id had no unique constraint, so the same source IDs appear many times on the same created_at dates.
  2. Earlier traces do not contradict that story: 7 of 32 projects have at least one duplicate-heavy day before 2026-03-20 (e.g. 12, 16, 18, 21, 71, 78, 1 — dates in 2024–2025 or scattered). Those are expected from the same structural bug since the April 2025 batch-upsert change: any earlier hourly job or populate that re-fetched overlapping rows could create duplicate-heavy days on whatever calendar dates those check-ins were stamped. So: the big March–April 2026 spike aligns with a deliberate bulk populate; smaller or older spikes show the defect was exploitable before that action as well.
  3. Project 44 (narrower window): Duplicate-heavy days 2026-03-24 through 2026-04-02 only (8 days) — first dup day after 2026-03-20, so not every project participated on the exact same first day.
  4. Minimal / single-day (2026): Project 38 (2026-03-24 only), Project 2 (2026-04-02 only), Project 53 (2026-03-23–24).

Evidence vs. proof: The calendar alignment across dozens of projects is strong evidence for a shared trigger (bulk populate). Server logs / admin audit trails would be the definitive proof of who ran which batch action when; the database alone cannot name the user.

Per-project summary (first duplicate-heavy day → last; count of such days)

Project ID  Name                   Dup days  First dup day  Last dup day
58          Team GBL6              14        2026-03-20     2026-04-04
25          SigmaRoc               16        2026-03-20     2026-04-04
79          Team FRA 134/135       14        2026-03-20     2026-04-05
74          Team Madrid            11        2026-03-20     2026-04-01
52          SAM3A Team             14        2026-03-20     2026-04-04
73          Team Ersbo             18        2026-03-20     2026-04-06
50          Team Powergate         17        2026-03-20     2026-04-06
43          Team PA14              14        2026-03-20     2026-04-04
71          IPS Rahway MACS        17        2025-05-05     2026-04-04
72          Team Newport           14        2026-03-20     2026-04-04
32          Team GRQ               18        2026-03-20     2026-04-06
86          Team ML08x             13        2026-03-20     2026-04-03
42          Team LD14              15        2026-03-20     2026-04-04
81          Project NVA05D         11        2026-03-20     2026-04-02
75          Team Frankfurt         14        2026-03-20     2026-04-05
21          Peps Peeps             16        2025-01-23     2026-04-02
48          StructureTone NTT TX4  13        2026-03-20     2026-04-04
44          Team FR8.2             8         2026-03-24     2026-04-02
51          Mace Pure Dub01        12        2026-03-20     2026-04-03
80          Holder SVY01C-F        8         2026-03-20     2026-04-04
76          Team Harlow            9         2026-03-20     2026-04-02
77          DPR Expansion Team     9         2026-03-20     2026-04-03
53          Lebanon Peptide        2         2026-03-23     2026-03-24
14          Lebanon Gen Site       5         2026-03-20     2026-04-03
41          AIB                    4         2026-03-30     2026-04-02
16          Yondr Redkite          100       2025-02-13     2025-05-31
38          StructureTone Hawk     1         2026-03-24     2026-03-24
2           CS Demos               1         2026-04-02     2026-04-02
78          PepTalk Live           31        2025-03-20     2026-01-22
12          Team PepTalk           46        2023-11-10     2025-04-01
18          John Paul Drogheda     8         2025-06-04     2025-06-12
1           Legacy                 1         2024-03-06     2024-03-06

Full day lists for each project are in the production console output archive (same order as above).

3.5 Source DB integrity confirmed

  • check_in_submission in Pepcore Base has a primary key (check_in_submission_pkey) and a unique index (check_in_submission_couchdb_id_unique).
  • For Project 44: COUNT(*) == COUNT(DISTINCT id) — source IDs are fully unique.
  • The CheckInSubmission model is read-only (def readonly?; true; end) and no write paths exist in this codebase.
  • Conclusion: source data is clean. Duplication is entirely analytics-side.

4. Safeguards Implemented

4.1 Shared concern: AnalyticsUpsertSafety

File: app/models/concerns/analytics_upsert_safety.rb

A reusable module included by both DbCredential and ProjectGroup that:

  1. Checks if a unique index or primary key already exists on the table's id column.
  2. If missing and no duplicates exist: creates the unique index automatically.
  3. If missing and duplicates exist: logs a warning and returns false so the caller skips populate gracefully (no crash, no further duplication).
# Returns true  → safe to upsert
# Returns false → duplicates found, skip populate, log warning
def ensure_unique_id!(connection, table)
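A sketch of the logic described above (the real implementation lives in app/models/concerns/analytics_upsert_safety.rb; the helper methods on connection used here, unique_index_on_id?, duplicate_ids?, and create_unique_index!, are hypothetical stand-ins for this illustration):

```ruby
# Hedged sketch of the safeguard, with hypothetical helper names:
# true  -> safe to upsert (index existed or was just created)
# false -> duplicates found, caller skips populate and logs a warning
def ensure_unique_id!(connection, table)
  return true if connection.unique_index_on_id?(table)

  if connection.duplicate_ids?(table)
    warn "#{table}: duplicates present, skipping populate"
    return false
  end

  connection.create_unique_index!(table)
  true
end

# Stub connection for demonstration: index missing, duplicates present.
stub = Object.new
def stub.unique_index_on_id?(_table); false; end
def stub.duplicate_ids?(_table); true; end

puts ensure_unique_id!(stub, :checkin_submissions)  # false
```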

4.2 Guard applied to all upsert paths

Both DbCredential and ProjectGroup now call ensure_unique_id! before every upsert on:

  • checkin_submissions
  • morale_data

If the guard returns false, the populate step is skipped for that table — the process continues with remaining tables and other projects.

4.3 Error isolation in admin batch actions

File: app/admin/projects.rb

All four batch actions (init_analytics_db, populate_analytics_data, reset_analytics_data, kickoff_data_fetching) now wrap each project in rescue => e:

  • Failed projects are skipped instead of crashing the batch.
  • The admin sees which projects were skipped and why.
  • Remaining projects proceed normally.
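The isolation pattern looks roughly like this (a sketch with simulated projects; the actual batch actions are in app/admin/projects.rb):

```ruby
# Sketch of per-project error isolation: each project runs inside its own
# begin/rescue, so one failure is recorded and skipped rather than aborting
# the whole batch action.
skipped = {}

projects = [
  { id: 1, name: "OK project" },
  { id: 2, name: "Broken project" },
]

projects.each do |project|
  raise "connection refused" if project[:name] == "Broken project"  # simulated failure
  # ... populate analytics for this project ...
rescue => e
  skipped[project[:id]] = e.message
end

puts skipped.inspect  # {2=>"connection refused"}
```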


5. Remediation Steps

Step 1: Deploy the safeguard code

Push the changes to production. Files changed:

  • app/models/concerns/analytics_upsert_safety.rb (new)
  • app/models/db_credential.rb (updated)
  • app/models/project_group.rb (updated)
  • app/admin/projects.rb (updated)

Step 2: Reset all analytics databases

SSH into production, open rails console, and run:

errors = []

DbCredential.includes(:project).find_each do |cred|
  project = cred.project
  next unless project

  puts "Resetting project #{project.id} (#{project.name})..."
  cred.reset_analytics_data
  puts "  Done."
rescue => e
  errors << { project_id: project&.id, error: e.message }
  puts "  SKIPPED: #{e.message}"
end

puts "\n=== COMPLETE ==="
puts "Errors: #{errors.size}"
errors.each { |e| puts "  Project #{e[:project_id]}: #{e[:error]}" }

What reset_analytics_data does per project:

  1. Deletes all rows from all 6 analytics tables.
  2. Resets last_synced_at to nil (so the next populate fetches all source records).
  3. Calls populate_database_with_morale_and_checkin_submissions, which:
     • Runs ensure_unique_id! → creates the unique index on the now-empty table.
     • Upserts all source records (now properly deduplicated by the index).

Step 3: Verify all projects are clean

clean = 0; dirty = 0

DbCredential.includes(:project).find_each do |cred|
  next unless cred.project
  cred.with_analytics_connection do |conn|
    next unless conn.table_exists?(:checkin_submissions)
    row = conn.fetch("SELECT COUNT(*) AS total, COUNT(DISTINCT id) AS distinct_ids FROM checkin_submissions").first
    dupes = row[:total].to_i - row[:distinct_ids].to_i
    if dupes > 0
      dirty += 1
      puts "DIRTY project #{cred.project_id}: #{dupes} duplicates remain"
    else
      clean += 1
    end
  end
rescue => e
  puts "ERROR project #{cred.project_id}: #{e.message}"
end

puts "\nClean: #{clean}, Dirty: #{dirty}"

Expected result: Dirty: 0

Step 4: Verify unique index exists

DbCredential.includes(:project).find_each do |cred|
  next unless cred.project
  cred.with_analytics_connection do |conn|
    next unless conn.table_exists?(:checkin_submissions)
    idx = conn.fetch("SELECT indexname FROM pg_indexes WHERE tablename='checkin_submissions' AND indexname='idx_checkin_submissions_unique_id'").first
    status = idx ? "HAS INDEX" : "MISSING INDEX"
    puts "Project #{cred.project_id}: #{status}"
  end
rescue => e
  puts "Project #{cred.project_id}: ERROR #{e.message}"
end

Expected result: all projects show HAS INDEX.


6. Prevention (going forward)

Risk                                        Mitigation
Repeated populate creating duplicates       Unique index on id enforced before every upsert
Dirty DB silently getting worse             ensure_unique_id! detects existing duplicates and skips populate
One project error killing the entire batch  Per-project rescue in all admin batch actions
Same bug in morale_data table               Same safeguard applied to morale_data upserts
Same bug in ProjectGroup DB                 Same concern included in the ProjectGroup model

7. Files Changed

File                                            Change
app/models/concerns/analytics_upsert_safety.rb  New. Shared concern for unique constraint enforcement.
app/models/db_credential.rb                     Include concern; replace old guard; add morale_data guard; use next unless pattern.
app/models/project_group.rb                     Include concern; replace old guard; add morale_data guard; use next unless pattern.
app/admin/projects.rb                           Add per-project error handling in all 4 batch actions.