your TM is rotting.
we trim it in minutes.
translation memories accumulate noise the moment you start using them. junk segments, near-duplicates, encoding artefacts from CAT-tool round-trips. left alone, your TM works against you instead of for you.
━━ four ways your TM works against you
Eight near-identical translations of the same source segment all compete for the 100% slot. Your CAT tool picks one — usually not the best one. Fuzzy match rates suffer.
Numbers, URLs, emails, repeated dashes. They count toward your 'word count' in CAT analysis but produce zero leverage on real new content.
Every TMX export → import cycle (memoQ ↔ Trados ↔ XTM) leaves behind orphan tags, encoding glitches, broken inline markup. Years of TM = years of accumulated debt.
If you ever fine-tune a translation model on your TM (and you will), every junk segment is a noisy training example. Garbage in = degraded model.
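To make the "junk segment" idea concrete, here is a minimal sketch of how segments made only of numbers, URLs, emails, or repeated dashes could be flagged. The patterns and the `junk_reason` name are assumptions made for illustration, not the rules the product actually applies.

```python
import re

# Hypothetical patterns, chosen for illustration only.
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PUNCT_RUN_RE = re.compile(r"^[\-_=~*.\s]{3,}$")
NUMERIC_RE = re.compile(r"^[\d\s.,:/%+\-()]+$")


def junk_reason(source: str):
    """Return a short reason string if the segment looks like junk, else None."""
    text = source.strip()
    if not text:
        return "empty"
    if URL_RE.match(text):
        return "url"
    if EMAIL_RE.match(text):
        return "email"
    # Checked before the numeric rule, because a run of dashes
    # would also match the numeric character class.
    if PUNCT_RUN_RE.match(text):
        return "repeated-punctuation"
    if NUMERIC_RE.match(text):
        return "numeric-only"
    return None
```

Returning a reason rather than a bare boolean means each removal can be written to a per-segment audit log.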
━━ what you get back
benchmark: 1,000,000 segments · 30% duplicates · 108 s wall-clock · 35% reduction · 373 MB peak RAM
━━ three steps. minutes. not hours.
- 01 scan
We stream your TMX through a two-pass engine and never load the whole file into RAM. 1M segments processed in ~108 seconds on a small VPS.
- 02 preview
Before we cut anything, you see exactly what's going: top duplicates with their winner/loser variations, junk samples, projected size reduction. The preview is free and doesn't consume quota.
- 03 clean
Confirm and we run the full pass. Output: cleaned.tmx + removed.tmx (re-importable for verification) + audit.csv (per-segment reasons). The original upload is purged the moment processing finishes.
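The two-pass streaming idea from the scan step can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine itself: the function names, the whitespace/case normalization, and the "last occurrence wins" rule are all hypothetical. Pass one streams the file and remembers only a small hash per distinct source segment. Pass two streams it again and labels each unit as kept or removed.

```python
import hashlib
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_units(stream, src_lang="en"):
    """Yield (index, source_text, target_text) one <tu> at a time."""
    index = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "tu":
            continue
        src = tgt = ""
        for tuv in elem.iter("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            if lang.startswith(src_lang):
                src = text
            else:
                tgt = text
        yield index, src, tgt
        index += 1
        # Drop the unit's children and text; an empty stub stays on the parent.
        elem.clear()


def key_of(text):
    """Hash of the source text with whitespace collapsed and case folded."""
    normalized = " ".join(text.split()).casefold()
    return hashlib.sha1(normalized.encode("utf-8")).digest()


def find_winners(open_tmx):
    """Pass 1: map each source key to the index of its winning unit."""
    winners = {}
    with open_tmx() as stream:
        for index, src, _tgt in iter_units(stream):
            winners[key_of(src)] = index  # assumption: last occurrence wins
    return winners


def classify(open_tmx):
    """Pass 2: yield (index, source, target, reason); reason is None if kept."""
    winners = find_winners(open_tmx)
    with open_tmx() as stream:
        for index, src, tgt in iter_units(stream):
            kept = winners[key_of(src)] == index
            yield index, src, tgt, None if kept else "duplicate-source"
```

`open_tmx` is any zero-argument callable that returns a fresh binary stream, for example `lambda: open("memory.tmx", "rb")`, so each pass reads the file from the start. Rows with a reason would go to removed.tmx and audit.csv, the rest to cleaned.tmx.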
━━ vs cleaning manually in your CAT tool
| manual | tm cleaner |
|---|---|
| open the TM in your CAT tool's editor | drag-drop a .tmx |
| click through filters · sort · select duplicates · delete batch · repeat | we surface them automatically |
| no audit trail · hope you didn't cut something important | every removed segment logged + downloadable as a TMX |
| hours per gigabyte | minutes per gigabyte |
| no MT training pipeline | clean output is reusable as training data |
━━ no vendor lock-in. no surprises.
- ✓ TMX in, TMX out. works with memoQ, Trados, XTM, Smartcat, Phrase, OmegaT, and anything else that speaks TMX 1.4
- ✓ your data stays yours. uploads strict-purged the moment processing ends · downloads expire 24 h later · stored exclusively in EU datacenters
- ✓ full audit trail. every removed segment is logged with the reason, so you can re-import the removed.tmx and verify nothing important was cut
- ✓ try it before you pay. 3 free files (≤1 GB each) · no credit card required · then $1 per GB pay-as-you-go
no signup hurdle · email magic link · clean a TM in under 2 minutes