AI Email Intelligence & CRM: 103K Emails, Gemini Triage, Generated Columns

The Challenge

The client had a email_maildir_ingest table with 103K+ email records — messages ingested from multiple IMAP mailboxes over time. The table was being used for both storage and as the backbone of a customer intelligence system: which contacts were emailing us, what were they asking about, and which ones needed follow-up.

The CRM dashboard was loading slowly. Individual contact lookups were taking 2+ seconds. The AI classification pipeline was processing emails but the results weren't being surfaced efficiently. Each problem had a distinct technical root cause.

The Domain Lookup Problem: 2.2 Seconds

The contact discovery feature works by finding email senders from the same domain as known contacts. For example: if a contact at acme.com is in the CRM, find all emails from other acme.com addresses in the archive.

The initial query used LIKE '%@acme.com' — a leading wildcard pattern that prevents any index from being used. MySQL has to read every row and evaluate the pattern on each one:

Before — 2.2s, full scan of 722K rows

SELECT * FROM contacts
WHERE email
LIKE '%@acme.com'

-- EXPLAIN:
-- type: ALL
-- rows: 722,431
-- Extra: Using where

After — <1ms, index lookup

SELECT * FROM contacts
WHERE email_domain
= 'acme.com'

-- EXPLAIN:
-- type: ref
-- rows: 3
-- Extra: Using index

Stored Generated Column

The solution is a MySQL stored generated column. Instead of computing the domain from the email address at query time, MySQL computes it once when each row is inserted or updated — and stores the result. Combined with a regular index on the generated column, lookups become instant:

MySQL — Stored generated column for email domain

-- Add the generated column (STORED = computed once, saved to disk)
ALTER TABLE email_maildir_ingest
  ADD COLUMN email_domain VARCHAR(100)
  GENERATED ALWAYS AS (
    SUBSTRING_INDEX(`from_address`, '@', -1)
  ) STORED,
  ADD INDEX idx_email_domain (email_domain)
  ALGORITHM=INPLACE LOCK=NONE;

-- The same column on contacts table
ALTER TABLE contacts
  ADD COLUMN email_domain VARCHAR(100)
  GENERATED ALWAYS AS (
    SUBSTRING_INDEX(email, '@', -1)
  ) STORED,
  ADD INDEX idx_sender_domain (email_domain)
  ALGORITHM=INPLACE LOCK=NONE;

-- Domain lookup: 2.2s → 0.0003s (one index row vs 722K full scan)

STORED vs VIRTUAL: We chose STORED (persisted to disk) over VIRTUAL (computed on read) because this column is used in an index and in JOIN operations. Stored generated columns can be indexed; virtual ones generally cannot. The storage cost is negligible — VARCHAR(100) per row on 103K rows is ~10MB.

Covering Index for Dashboard Stats

The CRM dashboard shows per-employee email statistics: total emails, unread counts, emails by label, emails by urgency. These queries run on the email_labels table, which had 92K rows for a single employee (emp_id=3).

The original query was doing a full table scan — MySQL's EXPLAIN showed Using where with 92,000 rows examined. The fix was a covering index: an index that includes all the columns the query needs, so MySQL never has to touch the actual table rows at all. EXPLAIN then shows Using index — the entire query is answered from the index B-tree:

MySQL — Covering index for dashboard stats

-- Dashboard stats query
SELECT label, COUNT(*) as cnt
FROM email_labels
WHERE emp_id = 3
GROUP BY label;

-- Before: full scan, 92K rows examined
-- EXPLAIN: type=ALL, rows=92000, Extra=Using where

-- Covering index: includes all columns the query touches
CREATE INDEX idx_labels_cover
ON email_labels (emp_id, label)
ALGORITHM=INPLACE LOCK=NONE;

-- After: index-only scan
-- EXPLAIN: type=ref, rows=1, Extra=Using index

BEFORE: type=ALL key=NULL rows=92,431 Extra=Using where
AFTER: type=ref key=idx_labels_cover rows=1 Extra=Using index

Contact Discovery

With the domain index in place, we built the contact discovery feature: when viewing an email from an unknown sender, the system automatically checks if any other senders from the same domain are in the contacts table. If yes, it suggests adding this person to the existing company record.

The query is now a simple join on two indexed email_domain columns — millisecond response time even as both tables grow:

MySQL — Contact discovery from email domain

-- Find unknown senders from same domain as existing contacts
SELECT DISTINCT
  e.from_address,
  e.from_name,
  c.company_name
FROM email_maildir_ingest e
JOIN contacts c
  ON e.email_domain = c.email_domain -- both indexed
LEFT JOIN contacts known
  ON e.from_address = known.email
WHERE known.contact_id IS NULL -- not already in CRM
  AND e.email_domain = 'acme.com';

-- Returns unknown senders from known companies instantly

Gemini AI Classification

Every new email ingested into the system is sent to Gemini for classification. The AI extracts:

Category: Sales inquiry, support request, invoice, spam, partnership, complaint, etc.
Urgency: critical / high / medium / low
Sentiment: positive / neutral / negative
Action needed: Free-text description of what response is required
Key entities: Order numbers, product names, dollar amounts extracted automatically

Classification results are stored in a separate email_ai_labels table linked to the original email. This allows re-classification with updated models without modifying the source data.

Body Cleanup for Display

Emails stored in the archive are raw MIME — HTML bodies with full CSS stylesheets, tracking pixels, marketing layouts. Displaying raw HTML in the CRM would be a security risk (XSS) and would look terrible. Processing it for the AI with all the markup noise would waste tokens and reduce accuracy.

The cleanup pipeline strips all CSS and HTML, converting the email to clean plain text for both display and AI processing. A DOMParser-based approach handles well-formed HTML; a regex fallback handles malformed markup from older email clients.

Results

Metric	Before	After
Domain lookup query	2.2s (722K scan)	<1ms (1 row via index)
Dashboard stats query	Table scan, 92K rows	Using index, 1 row
Contact discovery	Not possible	Instant, both tables indexed
Email classification	Manual	AI-automated (Gemini)
EXPLAIN type (labels)	ALL	ref
EXPLAIN Extra (labels)	Using where	Using index
Email body in UI	Raw HTML (broken)	Clean plain text

Tech Stack

Gemini 2.0 Flash API • MySQL 8.0 (InnoDB generated columns) • PHP 8.1 • IMAP • MIME parser • DOMParser • JavaScript (vanilla) • EXPLAIN ANALYZE

‹ Previous

Server & MySQL Optimization

All Case Studies ›

View All Six Case Studies

103,000 Emails, Zero Manual Triage:
AI-Powered Email Intelligence

The Challenge

The Domain Lookup Problem: 2.2 Seconds

Stored Generated Column

Covering Index for Dashboard Stats

Contact Discovery

Gemini AI Classification

Body Cleanup for Display

Results

Tech Stack

Sitting on a Mountain of Unread Emails?

103,000 Emails, Zero Manual Triage:AI-Powered Email Intelligence

The Challenge

The Domain Lookup Problem: 2.2 Seconds

Stored Generated Column

Covering Index for Dashboard Stats

Contact Discovery

Gemini AI Classification

Body Cleanup for Display

Results

Tech Stack

Sitting on a Mountain of Unread Emails?

103,000 Emails, Zero Manual Triage:
AI-Powered Email Intelligence