The Challenge
For years, customer emails and phone numbers were typed into QuickBooks invoice descriptions instead of standard contact fields. The data existed, but no system could trust it.
Standard CSV reports truncated the critical description lines, so the contact details that mattered most never made it into any clean export. With roughly 20,000 records, opening invoices one by one to move details into the right fields would have taken months of manual work.
Without a single source of truth, teams argued over which contact list was correct, and dormant customers could not be reactivated at scale. The practical result was that 19 years of contact history was effectively unreachable.
- Contact details lived in free-text invoice notes, not standard fields
- Standard CSV exports truncated the description lines that held the data
- Roughly 20,000 records made manual cleanup a months-long effort
- No single source of truth, so dormant customers could not be reactivated
What We Built
We built a repeatable pipeline that moves customer contact data from unstructured invoice notes to a governed dataset that systems and AI agents can rely on. It works in three layers: recovery, normalization, and governance.
Recovery Layer. We tested multiple export methods and identified the one that kept full description content, including the phone and email lines that typical CSV exports dropped. This confirmed the contact details were present in the underlying data and proved the issue was extraction and modeling, not data loss. Nothing had to be recreated from scratch.
Normalization Layer. We ingested the exported data into a processing environment built for parsing and cleanup at scale. There, we parsed invoice descriptions with consistent patterns to pull email and phone values out of free text, then normalized and deduplicated the contact details while preserving record linkage, so each contact stayed tied to the correct customer. Finally, we prepared import-ready outputs to update the official contact fields.
Governance Layer. We documented the process and definitions so that 'how many reachable customers do we have?' has a single, repeatable answer. The deliverable was a governed single source of truth ready for CRM sync and AI agents, not a one-time list dump. Clear ownership and record structure mean outreach, reporting, and future automations all reference the same data.
- Export-path validation that preserved full invoice description content
- Pattern-based parsing to extract email and phone values from free text
- Normalization and dedupe with preserved record linkage
- Import-ready field mapping plus documented data definitions and ownership
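To make the parsing and dedupe steps concrete, here is a minimal Python sketch of the general technique. It is an illustration under stated assumptions, not the pipeline that was actually built: the regular expressions, the function names, the `(customer_id, description)` row shape, and the North American phone format are all hypothetical.

```python
import re

# Illustrative patterns only; the patterns used in the real project are not
# described in this case study. The phone pattern assumes 10-digit North
# American numbers with an optional leading country code 1.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def normalize_email(raw):
    """Lowercase and trim so case variants compare equal."""
    return raw.strip().lower()


def normalize_phone(raw):
    """Keep digits only and drop a leading country code 1."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def extract_contacts(rows):
    """Collect deduplicated contact values per customer.

    rows is an iterable of (customer_id, description) pairs. Keying the
    result by customer_id is what preserves record linkage: every value
    stays attached to the customer whose invoice it came from.
    """
    contacts = {}
    for customer_id, description in rows:
        entry = contacts.setdefault(customer_id, {"emails": [], "phones": []})
        for match in EMAIL_RE.findall(description):
            email = normalize_email(match)
            if email not in entry["emails"]:
                entry["emails"].append(email)
        for match in PHONE_RE.findall(description):
            phone = normalize_phone(match)
            if phone not in entry["phones"]:
                entry["phones"].append(phone)
    return contacts


# Made-up sample rows for illustration.
sample_rows = [
    ("C001", "Widget repair. Email: Jane.Doe@Example.com Ph (555) 010-1234"),
    ("C001", "Follow-up visit, call 555.010.1234 or jane.doe@example.com"),
    ("C002", "Parts order, no contact details"),
    ("C003", "Contact +1 555-010-9876"),
]
contacts = extract_contacts(sample_rows)
```

In this sketch, the two invoices for customer C001 collapse into one email and one phone number because normalization happens before the duplicate check. Real invoice notes would likely need broader patterns (extensions, international numbers) and a review step for ambiguous matches.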
How It Was Delivered
We built it like a platform, not a one-off cleanup, using a two-phase approach: prove the data is recoverable, then build the reusable asset.
Phase 1, Prove Feasibility. We tested multiple export paths and verified that emails and phone numbers were present in the underlying data even when standard reports failed to surface them, then identified the one export method that preserved full description content. This de-risked the project before any heavy build, shifting the question from 'is this data lost?' to 'how fast can we structure it?'
Phase 2, Clean, Structure, Govern. We built an automation workflow to ingest, parse, normalize, and dedupe contact details while preserving record linkage, produced import-ready outputs, and documented the definitions so the dataset stays trustworthy. End to end, the work took roughly three weeks and turned about 20,000 records into a governed contact foundation ready for CRM sync, outreach, and AI workflows.
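One way to keep a documented definition from drifting is to express it as code that every report calls. The sketch below assumes a hypothetical definition of "reachable" (at least one email or phone value on file) and a hypothetical record shape; the definitions actually agreed in the project are not stated here.

```python
def is_reachable(contact):
    """Hypothetical definition: a customer is reachable when at least one
    email or phone value is on file."""
    return bool(contact.get("emails") or contact.get("phones"))


def count_reachable(contacts):
    """Count reachable customers in a {customer_id: contact} mapping."""
    return sum(1 for contact in contacts.values() if is_reachable(contact))


# Made-up records for illustration.
example_contacts = {
    "C001": {"emails": ["jane.doe@example.com"], "phones": ["5550101234"]},
    "C002": {"emails": [], "phones": []},
    "C003": {"emails": [], "phones": ["5550109876"]},
}
reachable_total = count_reachable(example_contacts)
```

Because outreach, reporting, and automations would all call the same function, a change to the definition happens in one place and every count moves together.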
The Outcome
Contact history became usable. Data that was trapped in invoice descriptions became exportable, searchable, and structured into clean email and phone fields. Nearly two decades of customer relationships are now reachable instead of effectively written off, so dormant customers can be reactivated at scale.
One version of the truth. Outreach, reporting, and future AI workflows all reference the same governed dataset of roughly 20,000 structured records. There is no more arguing over which contact list is correct, and every downstream initiative starts from data the business can trust.
AI became realistic. In roughly three weeks, quoting assistants, outreach automations, and account workflows gained a consistent customer identity to run on. The foundation does not need to be rebuilt when tools change, so the work compounds instead of stalling on bad inputs.
- 19 years of contact history made usable and reachable
- Roughly 20,000 records unified into one governed source of truth
- A reusable, AI-ready data foundation delivered in about 3 weeks
This is not just cleaning up old data. It is building the trusted customer foundation that every future automation depends on.
Why It Matters
ROI starts before AI. If customer identity and contactability are unreliable, every downstream initiative slows down and becomes harder to measure. Recovering trapped contact history is not a cleanup chore; it is the prerequisite that makes outreach, reporting, and AI worth investing in.
A governed dataset is the real deliverable. Clean data by itself is a snapshot that drifts. Definitions, ownership, and structure mean the single source of truth survives tool changes, staff changes, and the next wave of automation.
A foundation for every future AI workflow. The same governed contact foundation can power quoting assistants, outreach automations, account workflows, and reporting, all on one consistent customer identity. Solve the data trust problem once, then stack AI capabilities on top.