PHILOSOPHY
We open-sourced our financial data schema. Here's why.
Two years ago we started building a treasury product. We spent the first three months writing normalization code: taking Plaid's snake_case responses, Stripe's amount-in-cents, QuickBooks' journal entry trees, and Xero's invoice objects, and turning them into a shape our application could reason about.
We shipped it. It worked. We moved on to building the actual treasury features.
Six months later we started building an FP&A product. On day one, our team opened a new repository and started writing the same normalization code.
Nine months later we started building a month-end close product. On day one, we opened another new repository and — you can see where this is going — started writing the same normalization code.
At some point during that third implementation, we realized we were solving the same problem for the third time across three different codebases. Worse, we were introducing the same five bugs each time. The sign flip. The pending-to-posted transition. The currency drift. The category assumptions. The type/subtype confusion. We wrote about those bugs in a separate post because they're universal: every fintech team hits them.
That's when we extracted the normalization layer, cleaned it up, published it under MIT, and called it ClareMesh.
This post explains why we open-sourced it rather than keeping it as proprietary infrastructure, and why we think more fintech infrastructure should follow the same pattern.
The closed-source era of fintech data
For fifteen years, the default architecture for financial data integration has been closed.
Plaid is closed-source. Their normalization logic is their moat. If your application needs Plaid data, you pay Plaid, and you get their shape.
Stripe is closed-source in the same way. Their data model is their data model. You adapt.
Merge.dev is closed-source by design. They unify multiple providers behind a single API. Their unification logic is the product. You rent access.
Codat, Rutter, Finch — all closed-source. All built on the premise that financial data unification is valuable enough to be a paid SaaS service and proprietary enough to be a defensible moat.
This architecture worked. It still works for many use cases. We're not going to argue it was wrong.
Why closed worked
Three conditions made closed-source fintech data unification defensible from 2010 to roughly 2022:
Integration complexity was a real moat. Building and maintaining connections to 20+ banks through fragile scraping APIs was genuinely hard. Plaid's value was that you didn't have to do it. Most development teams couldn't build what Plaid offered even if they wanted to.
Data shape was proprietary. Each provider returned something slightly different, and the intermediate shape a unification service produced was genuinely novel. There was intellectual property in the mapping logic.
Trust was transactional. Customers accepted that if they wanted Plaid's data, they'd send data through Plaid's servers. Zero-egress architecture, where financial data never leaves the customer's own infrastructure, wasn't yet an expectation.
For most of a decade, those three conditions held, and closed-source unification was the rational default.
What changed
Three things have shifted since 2022.
The integrations became commoditized. Plaid, Teller, MX, Finicity, and increasingly open banking APIs (FDX in the US, PSD2 in Europe) have largely stabilized the ability to connect to banks. The raw connection is no longer the moat. What you do with the data after you have it — that's become the harder problem.
AI-native finance teams arrived. The first generation of fintech applications consisted of spreadsheet replacements. The new generation is AI copilots, autonomous agents, and continuous reconciliation engines. These systems don't just read data; they reason over it, detect patterns, generate forecasts, and write journal entries. They need data in a canonical, typed, validated shape. Not Plaid's shape. Not Stripe's shape. A shape designed for AI consumption.
Data residency became table stakes. A decade ago, customers shrugged at "your data passes through our servers." Today, fintech CISOs reject that architecture on slide two. GDPR, CCPA, banking regulations across six jurisdictions, and enterprise data governance teams have all converged on the same requirement: if you want to process our financial data, do it on our infrastructure.
Each of these shifts weakens the closed-source unification model. The first commoditizes the raw capability. The second demands a shape the closed vendors don't ship. The third rejects the "your data passes through our servers" architecture entirely.
The open-source thesis
In this new environment, we believe the right architecture for financial data infrastructure looks like this:
The schema is open. The shape financial data takes — Account, Transaction, Entity, Balance, Forecast — should be a shared standard, not a trade secret. Every fintech team should be able to inspect, audit, extend, and fork the schema. If the canonical shape of a financial transaction is owned by one company, every downstream application is at the mercy of that company's product decisions.
The transforms are open. The logic that turns Plaid's API response into a canonical Transaction object has no legitimate reason to be closed. It's plumbing. It's the kind of code that benefits from dozens of engineers fixing edge cases rather than five engineers at one company guessing at them. Keeping transforms closed wastes human effort across the industry.
The runtime is self-hosted. The software that reads your customer's financial data should run on their infrastructure. Not on the vendor's. Not in the vendor's "secure cloud." On the customer's servers, under their compliance posture, under their data residency controls, under their audit logs.
The commercial layer sits above infrastructure. There's still room for paid services — hosted sync, conflict resolution at scale, compliance dashboards, enterprise support. But these should ride on top of the open foundation, not replace it.
This is the architecture we've bet on. The schema (@claremesh/schema) is MIT licensed. The transforms (@claremesh/transforms) are MIT licensed. The sync layer is hosted on the customer's own Supabase project — we operate the control plane, but we never touch the data. The paid tier wraps all of this with customer-grade conveniences: a dashboard, scheduled jobs, audit exports, compliance artifacts.
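To make the schema-plus-transforms idea concrete, here is a minimal sketch of what a canonical transaction and two provider transforms could look like. The names below (CanonicalTransaction, fromPlaid, fromStripe, MINOR_UNIT_EXPONENT) are hypothetical and are not the actual @claremesh/schema or @claremesh/transforms exports, and the provider types list only a subset of each provider's fields. The sketch assumes a canonical convention of integer minor units where a positive amount means money coming into the account.

```typescript
// Hypothetical canonical shape for illustration; not the real @claremesh/schema types.
type CanonicalTransaction = {
  id: string;
  provider: "plaid" | "stripe";
  amountMinor: number; // integer minor units; positive = money into the account
  currency: string; // upper-case ISO 4217 code
  pending: boolean;
  date: string; // YYYY-MM-DD
};

// Subset of the fields on a Plaid transaction object.
type PlaidTransaction = {
  transaction_id: string;
  amount: number; // decimal units; positive = money leaving the account
  iso_currency_code: string | null;
  pending: boolean;
  date: string; // YYYY-MM-DD
};

// Subset of the fields on a Stripe balance transaction object.
type StripeBalanceTransaction = {
  id: string;
  amount: number; // integer in the smallest currency unit; positive = funds added
  currency: string; // lower-case ISO code
  status: string; // "available" or "pending"
  created: number; // Unix timestamp in seconds
};

// Illustrative and incomplete; currencies not listed default to 2 decimal places.
const MINOR_UNIT_EXPONENT: Record<string, number> = { JPY: 0, KRW: 0, KWD: 3, BHD: 3 };

function toMinorUnits(amount: number, currency: string): number {
  const exponent = MINOR_UNIT_EXPONENT[currency] ?? 2;
  // Round to absorb floating-point noise such as 12.34 * 100 = 1234.0000000000002.
  return Math.round(amount * 10 ** exponent);
}

function fromPlaid(tx: PlaidTransaction): CanonicalTransaction {
  if (tx.iso_currency_code === null) {
    throw new Error(`transaction ${tx.transaction_id} has no ISO currency code`);
  }
  const currency = tx.iso_currency_code.toUpperCase();
  const outflowMinor = toMinorUnits(tx.amount, currency);
  return {
    id: tx.transaction_id,
    provider: "plaid",
    // Sign flip: Plaid reports outflows as positive; this sketch reports them as negative.
    amountMinor: outflowMinor === 0 ? 0 : -outflowMinor,
    currency,
    pending: tx.pending,
    date: tx.date,
  };
}

function fromStripe(bt: StripeBalanceTransaction): CanonicalTransaction {
  return {
    id: bt.id,
    provider: "stripe",
    amountMinor: bt.amount, // already integer minor units with inflows positive
    currency: bt.currency.toUpperCase(),
    pending: bt.status === "pending",
    date: new Date(bt.created * 1000).toISOString().slice(0, 10), // UTC calendar date
  };
}
```

Two of the recurring bugs are visible here in miniature: the sign convention differs between providers, and one provider sends decimals while the other sends integer minor units. Putting both conversions in one reviewed place is the point of an open transform layer.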
Why this is better, concretely
For developers:
- You can read the code. Every transform. Every edge case. Every assumption. When Plaid changes their API, you can see exactly what we changed in response. When we get something wrong, you can file a PR.
- You can fork it. If we shut down, go in a direction you disagree with, or get acquired by someone who raises the price, you have the source. Your business doesn't depend on our business.
- You can extend it. Adding a provider we don't support yet is a 200-line PR, not a feature request that takes six months.
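As a rough illustration of what extending it might involve, the sketch below shows one way a provider transform could be structured: a runtime shape check plus a mapping function, registered under a provider name. Everything here is an assumption made for illustration. ProviderTransform, normalize, and the made-up "examplebooks" provider with its payload fields are not part of the ClareMesh packages, and the sketch assumes a two-decimal currency and that "credit" means money coming in.

```typescript
// Hypothetical canonical shape; not the real @claremesh/schema type.
type CanonicalTransaction = {
  id: string;
  provider: string;
  amountMinor: number; // integer minor units; positive = money in
  currency: string; // upper-case ISO 4217 code
};

// Hypothetical contract a provider transform might satisfy.
interface ProviderTransform<Raw> {
  provider: string;
  isRaw(input: unknown): input is Raw; // runtime check for untyped payloads
  toCanonical(raw: Raw): CanonicalTransaction;
}

// Payload of a made-up provider, used only for this example.
type ExampleBooksEntry = {
  entry_id: string;
  total: string; // decimal string such as "12.50"
  currency_code: string;
  direction: "debit" | "credit";
};

const exampleBooks: ProviderTransform<ExampleBooksEntry> = {
  provider: "examplebooks",
  isRaw(input: unknown): input is ExampleBooksEntry {
    if (typeof input !== "object" || input === null) return false;
    const o = input as Record<string, unknown>;
    return (
      typeof o.entry_id === "string" &&
      typeof o.total === "string" &&
      typeof o.currency_code === "string" &&
      (o.direction === "debit" || o.direction === "credit")
    );
  },
  toCanonical(raw) {
    const minor = Math.round(Number(raw.total) * 100); // assumes two decimal places
    if (!Number.isFinite(minor)) {
      throw new Error(`entry ${raw.entry_id} has a non-numeric total`);
    }
    const signed = raw.direction === "credit" ? minor : minor === 0 ? 0 : -minor;
    return {
      id: raw.entry_id,
      provider: "examplebooks",
      amountMinor: signed,
      currency: raw.currency_code.toUpperCase(),
    };
  },
};

// Registry keyed by provider name; `any` is used because each entry has its own raw type.
const transforms = new Map<string, ProviderTransform<any>>([
  [exampleBooks.provider, exampleBooks],
]);

function normalize(provider: string, payload: unknown): CanonicalTransaction {
  const transform = transforms.get(provider);
  if (transform === undefined) {
    throw new Error(`no transform registered for ${provider}`);
  }
  if (!transform.isRaw(payload)) {
    throw new Error(`payload does not match the ${provider} shape`);
  }
  return transform.toCanonical(payload);
}
```

Under this kind of contract, supporting a new provider means adding one object like exampleBooks and a set of fixtures, without touching the callers of normalize.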
For customers:
- Your data doesn't leave your infrastructure. The transforms run in your Supabase edge functions. Our servers never see your customer's bank transactions.
- You own the compliance posture. You can show an auditor every line of code that processes customer data. You can host in whatever region regulators require. You can configure retention policies to whatever your DPA commitments say.
- Your costs don't scale with integration count. In the closed model, adding a new provider usually means a price tier bump. With open transforms, adding a provider is just configuration.
For the industry:
- The normalization layer stops being a secret that every team rediscovers. When one team finds an edge case in Stripe's handling of disputes, every team gets the fix. Compound progress.
- The shape of financial data becomes a shared language. If four different applications read the same schema, integrating them is trivial. If the schema is proprietary, integration is a quarterly planning meeting.
- Innovation moves up the stack. Engineers stop writing normalization code and start building actual products — AI copilots, autonomous agents, better treasury tools, better close workflows. The commodity layer becomes commodity, and the value creation moves where it should.
Why we still have a business
Some readers will ask: if the schema and transforms are free, how do you make money?
The answer is that unification logic is necessary but not sufficient. Production fintech infrastructure also needs:
- Bi-directional sync — pushing normalized data back to providers (updating QuickBooks from ClareMesh, writing to Xero invoices from an external system) is substantially harder than reading. It requires conflict resolution, change detection, idempotency, and dry-run capabilities.
- Continuous reconciliation — running scheduled jobs that flag discrepancies between sources before they become close-day fire drills.
- Compliance and audit infrastructure — 61 documented controls, framework mapping, evidence generation, sub-processor tracking, retention enforcement.
- Enterprise operations — customer-managed encryption keys, dedicated regions, SLAs, SOC 2 Type II reports, support response times.
These are the things customers pay for. They're the things that require ongoing engineering investment, compliance work, and operational commitments. The schema and transforms are the foundation that makes all of this easier to build on top of — but they're not the business.
Our pricing reflects this. The Open tier is free forever, no asterisks. The Build tier ($199/mo) adds hosted operations. The Scale tier ($799/mo) adds sync, conflict resolution, and expanded compliance. The Enterprise tier adds dedicated infrastructure and support.
If you only need the schema and transforms, you never have to pay us. That's the deal. If the hosted layer is worth more to you than operating it yourself, we're the best option. If it isn't, we've still done useful work by contributing the foundation.
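To give a sense of why write-back is harder than reading, here is a deliberately simplified sketch of the bi-directional sync concerns listed earlier in this section: change detection, conflict flagging, idempotency, and dry runs. It is not the ClareMesh sync engine. The types and functions (Invoice, planWrites, applyPlan) are hypothetical, timestamps are assumed to be ISO 8601 UTC strings in a single format so they compare correctly as strings, and a "conflict" is reduced to "the remote record differs and was edited after the last sync."

```typescript
// Hypothetical types for illustration; not the ClareMesh sync API.
type Invoice = {
  id: string;
  amountMinor: number;
  currency: string;
  updatedAt: string; // ISO 8601 UTC, e.g. "2024-03-01T12:00:00Z"
};

type PlannedWrite = {
  kind: "create" | "update";
  invoice: Invoice;
  idempotencyKey: string;
};

type SyncPlan = { writes: PlannedWrite[]; conflicts: Invoice[] };

// The same content always yields the same key, so a retried write can be recognized.
function idempotencyKey(kind: PlannedWrite["kind"], inv: Invoice): string {
  return [kind, inv.id, inv.amountMinor, inv.currency].join(":");
}

function planWrites(local: Invoice[], remote: Invoice[], lastSyncedAt: string): SyncPlan {
  const remoteById = new Map<string, Invoice>();
  for (const r of remote) remoteById.set(r.id, r);

  const plan: SyncPlan = { writes: [], conflicts: [] };
  for (const inv of local) {
    const r = remoteById.get(inv.id);
    if (r === undefined) {
      plan.writes.push({ kind: "create", invoice: inv, idempotencyKey: idempotencyKey("create", inv) });
      continue;
    }
    // Change detection: skip records whose synced fields already match.
    const changed = r.amountMinor !== inv.amountMinor || r.currency !== inv.currency;
    if (!changed) continue;
    // Conflict: the remote copy was edited after the last sync, so do not overwrite it.
    if (r.updatedAt > lastSyncedAt) {
      plan.conflicts.push(inv);
      continue;
    }
    plan.writes.push({ kind: "update", invoice: inv, idempotencyKey: idempotencyKey("update", inv) });
  }
  return plan;
}

// Returns the number of writes actually sent. In dry-run mode nothing is sent.
function applyPlan(
  plan: SyncPlan,
  send: (write: PlannedWrite) => void,
  sentKeys: Set<string>,
  dryRun: boolean,
): number {
  if (dryRun) return 0;
  let sent = 0;
  for (const write of plan.writes) {
    if (sentKeys.has(write.idempotencyKey)) continue; // already delivered; skip the retry
    send(write);
    sentKeys.add(write.idempotencyKey);
    sent++;
  }
  return sent;
}
```

A caller would typically compute the plan, show it to an operator in dry-run mode, and only then apply it. A production system would also need to persist the sent keys, handle partial failures, and resolve conflicts rather than just flag them, which is where most of the engineering effort goes.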
What this means for fintech builders
If you're building a fintech product and you're about to write your own normalization layer, stop. Use ours. Contribute back when you find edge cases. Save yourself three months of work that every team before you has also done.
If you're running a fintech data infrastructure company and you're thinking about your moat, the moat isn't the schema anymore. The moat is the operational layer on top: the sync engine, the compliance artifacts, the support relationships, the SLA commitments. Vendors who try to defend the schema and transforms will lose to the open-source version within 18 months, because a broad community of engineers can maintain it better than any single vendor.
If you're a customer evaluating fintech data vendors, ask whether their schema is open. Ask whether their transforms are open. Ask whether the runtime is self-hosted. If the answer is no to all three, ask why. The answers should be specific and defensible. "It's our IP" is not an answer. "It's how we make money" is an answer but a bad one.
How you can help
If you've read this far and think we're pointed in roughly the right direction:
- Star the repo. It's the single biggest signal to other teams that open-source fintech infrastructure is a real thing.
- Try the playground. Paste a real Plaid, Stripe, QuickBooks, or Xero response and see the normalized output. Two minutes. No signup.
- File issues. Edge cases. Bugs. Things we got wrong. The schema is better when more people audit it.
- Submit transforms for new providers. Sage Intacct, FreshBooks, Zoho Books, Wave — if you use them, a transform PR would help.
- Use it in your product. If you're building a fintech product on top of ClareMesh, we want to hear about it. Email malik@claremesh.com.
This is the start of something, not the end. The schema is at v2.4.1 today. It will be at v3 in a year, v4 the year after. Each version will be better because more people use it, contribute to it, and find edge cases we missed.
Come build it with us.
CLAREMESH
ClareMesh is an open-source financial data schema and bi-directional sync SDK. It publishes a unified schema for financial primitives and provides MIT-licensed transforms for Plaid, Stripe, QuickBooks, Xero, and NetSuite. Customer data never leaves customer infrastructure.
ClareMesh was originally built as the data infrastructure layer of a broader internal system. We extracted it, open-sourced the schema, and published the transforms because a shared data model is more valuable as a standard than as proprietary plumbing.
Questions or corrections? Email malik@claremesh.com.