your TM is rotting.
we trim it in minutes.
translation memories accumulate noise the moment you start using them. junk segments, near-duplicates, encoding artefacts from CAT-tool round-trips. left alone, your TM works against you instead of for you.
━━ five ways your TM works against you
Eight near-identical translations of the same source segment all compete for the 100% slot. Your CAT tool picks one — usually not the best one. Fuzzy match rates suffer.
Segments that are nothing but numbers, URLs, emails or runs of dashes. They count toward your 'word count' in CAT analysis but produce zero leverage on real new content.
Every TMX export → import cycle (memoQ ↔ Trados ↔ XTM) leaves behind orphan tags, encoding glitches, broken inline markup. Years of TM = years of accumulated debt.
If you ever fine-tune a translation model on your TM (and you will), every junk segment is a noisy training example. Garbage in = degraded model.
When duplicates collide, a single senior translator's variant loses to three freelance ones via 'most popular'. Latest-edit picks the wrong winner if a junior touched the segment last. Neither strategy knows who you trust.
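For the duplicate-collision problem above, here is a minimal Python sketch of how one winner could be picked per duplicate group under 'most popular', 'latest edit', or an ordered priority-author list. The `Variant` type, the `pick_winner` function, reading the author from TMX `changeid`/`creationid`, and "newest of the priority author's own variants wins" are all illustrative assumptions, not this tool's actual code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One target-side variant inside a duplicate group (hypothetical model)."""
    target: str
    author: str   # assumption: taken from the TU's changeid / creationid
    changed: str  # TMX date format YYYYMMDDThhmmssZ, so string order is time order


def pick_winner(variants, strategy="popular", priority_authors=()):
    """Return the target text that survives for one duplicate group."""
    # An ordered priority list overrides the strategy for any group
    # where at least one priority author has a variant.
    for author in priority_authors:
        theirs = [v for v in variants if v.author == author]
        if theirs:
            # Assumption: among one author's own variants, the newest wins.
            return max(theirs, key=lambda v: v.changed).target
    if strategy == "popular":
        # most_common breaks ties by first occurrence in the group.
        return Counter(v.target for v in variants).most_common(1)[0][0]
    if strategy == "latest":
        return max(variants, key=lambda v: v.changed).target
    raise ValueError(f"unknown strategy: {strategy!r}")


group = [
    Variant("Speichern", "senior", "20230101T090000Z"),
    Variant("Sichern", "freelancer_a", "20230201T090000Z"),
    Variant("Sichern", "freelancer_b", "20230301T090000Z"),
    Variant("Sichern", "freelancer_c", "20230401T090000Z"),
    Variant("Abspeichern", "junior", "20240101T090000Z"),
]
print(pick_winner(group, "popular"))                               # Sichern
print(pick_winner(group, "latest"))                                # Abspeichern
print(pick_winner(group, "popular", priority_authors=["senior"]))  # Speichern
```

This reproduces the scenario described: three freelance variants outvote the senior under 'popular', the junior's late edit wins under 'latest', and only the explicit priority list lets the trusted variant through.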
━━ what you get back
benchmark: 1,000,000 segments · 30% duplicates · 108 s wall-clock · 35% reduction · 373 MB peak RAM
━━ three steps. minutes. not hours.
- 01 scan
We stream your TMX through a two-pass engine and never load the whole file into RAM. 1M segments are processed in ~108 seconds on a small VPS.
- 02 preview
Before we cut anything, you see exactly what's going: top duplicates with their winning and losing variants, junk samples, and the projected size reduction. Previews are free and don't consume quota.
- 03 clean
Confirm and we run the full pass. Output: cleaned.tmx + removed.tmx (re-importable for verification) + audit.csv (per-segment reasons). Your original upload is purged the moment the job finishes.
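The two-pass idea behind the scan can be sketched with the standard library's `xml.etree.ElementTree.iterparse`, which hands over each finished `<tu>` without building the whole document first. This is a minimal illustration under stated assumptions, not the service's engine: the first `<tuv>` is treated as the source, only plain `<seg>` text is compared (inline tags are ignored), and "most popular target per source" stands in for whatever strategy is configured. `iter_tus` and `two_pass_clean` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def iter_tus(stream):
    """Yield (source, target) for each <tu>, one element at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "tu":
            continue
        # Assumption: first <tuv> is the source, second the target.
        segs = [tuv.findtext("seg") or "" for tuv in elem.findall("tuv")]
        source, target = (segs + ["", ""])[:2]
        yield source, target
        elem.clear()  # free the <tu>'s children; only empty shells accumulate


def two_pass_clean(open_tmx):
    """Pass 1 picks a winner per source; pass 2 decides keep/remove per TU."""
    # Pass 1: tally target variants per source segment.
    tallies = defaultdict(Counter)
    for source, target in iter_tus(open_tmx()):
        tallies[source][target] += 1
    winners = {src: c.most_common(1)[0][0] for src, c in tallies.items()}

    # Pass 2: stream the file again; keep one TU per source, log every removal.
    kept, removed, emitted = [], [], set()
    for index, (source, target) in enumerate(iter_tus(open_tmx())):
        if target != winners[source]:
            removed.append((index, "duplicate: lost to a more popular variant"))
        elif source in emitted:
            removed.append((index, "duplicate: identical to a kept TU"))
        else:
            emitted.add(source)
            kept.append(index)
    return kept, removed


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header srclang="en"/><body>
<tu><tuv xml:lang="en"><seg>Save</seg></tuv><tuv xml:lang="de"><seg>Speichern</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Save</seg></tuv><tuv xml:lang="de"><seg>Sichern</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Save</seg></tuv><tuv xml:lang="de"><seg>Speichern</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Cancel</seg></tuv><tuv xml:lang="de"><seg>Abbrechen</seg></tuv></tu>
</body></tmx>"""

kept, removed = two_pass_clean(lambda: io.BytesIO(SAMPLE))
print(kept)     # [0, 3]
print(removed)  # [(1, 'duplicate: lost to a more popular variant'), (2, 'duplicate: identical to a kept TU')]
```

The `(index, reason)` pairs are the raw material for something like removed.tmx and audit.csv; writing those files would also happen in pass 2 and is left out to keep the sketch short.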
━━ vs cleaning manually in your CAT tool
| manual | tm cleaner |
|---|---|
| open the TM in your CAT tool's editor | drag-drop a .tmx |
| click through filters · sort · select duplicates · delete batch · repeat | we surface them automatically |
| no audit trail · hope you didn't cut something important | every removed segment logged + downloadable as a TMX |
| hours per gigabyte | minutes per gigabyte |
| no MT training pipeline | clean output is reusable as training data |
━━ what we ship
- ★ three named presets. Lenient / Balanced / Strict: one click applies eleven options atomically. covers ~95% of jobs without touching a single toggle.
- ★ eight cleanup classifiers. duplicates, junk, untranslated, MT-flagged, length outliers (both directions), placeholder mismatch, mojibake, brand-term-translated. each individually toggleable.
- ★ priority-author ranking. an ordered list of trusted translators whose variants win when duplicates conflict; it overrides Popular / Latest for any group where a priority author has a variant.
- ★ script-aware engine. NFC code-point counts so Indic / Thai / CJK targets get a fair length comparison. word-boundary matching on short MT-engine signatures so SMTP-imported TMs don't lose every TU to a false MT flag.
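A few of the checks listed above (junk, length outliers, and the word-boundary MT-signature match) can be sketched in plain Python. This is a minimal illustration, not the product's classifiers: the junk patterns, the ratio bounds `low=0.3` / `high=3.0`, and the signature list are made-up assumptions, and `is_junk`, `nfc_len`, `is_length_outlier` and `make_signature_matcher` are hypothetical names.

```python
import re
import unicodedata

# Hypothetical junk patterns: a source that is *entirely* one of these is junk.
JUNK_PATTERNS = [
    re.compile(r"[\d\s.,:/%+-]+"),               # numbers, dates, percentages
    re.compile(r"(?:https?://|www\.)\S+"),        # bare URLs
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # bare email addresses
    re.compile(r"[-_=*.\s\u2013\u2014]+"),        # runs of dashes / rule lines
]


def is_junk(source):
    """True if the source segment is empty or matches a junk pattern in full."""
    text = source.strip()
    return not text or any(p.fullmatch(text) for p in JUNK_PATTERNS)


def nfc_len(text):
    """Count code points after NFC, so 'e' + combining acute counts as one."""
    return len(unicodedata.normalize("NFC", text))


def is_length_outlier(source, target, low=0.3, high=3.0):
    """Flag targets far shorter or far longer than their source (both directions)."""
    ratio = nfc_len(target) / max(nfc_len(source), 1)
    return ratio < low or ratio > high


def make_signature_matcher(signatures):
    """Compile MT-engine signatures with word boundaries on both sides."""
    # \b keeps a short signature such as "MT" from matching inside "SMTP".
    alternatives = "|".join(re.escape(s) for s in signatures)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


print(is_junk("1,234.56"), is_junk("info@example.com"), is_junk("Save the document"))  # True True False
print(nfc_len("cafe\u0301"), len("cafe\u0301"))                     # 4 5
print(is_length_outlier("Save the document before closing", "OK"))  # True
matcher = make_signature_matcher(["MT", "DeepL"])
print(bool(matcher.search("imported via SMTP gateway")))            # False
print(bool(matcher.search("DeepL MT")))                             # True
```

A plain substring test for "MT" would fire on "SMTP" and flag every TU carrying that metadata; the `\b` anchors are what prevent that.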
━━ no vendor lock-in. no surprises.
- ✓ TMX in, TMX out. works with memoQ, Trados, XTM, Smartcat, Phrase, OmegaT: anything that speaks TMX 1.4
- ✓ your data stays yours. uploads are purged the moment processing ends · downloads expire 24 h later · stored exclusively in EU datacenters
- ✓ full audit trail. every removed segment is logged with its reason, so you can re-import removed.tmx and verify nothing important was cut
- ✓ audit-hardened engine. repeated independent code reviews with fixes between rounds, re-audited until convergence; the most recent campaign closed 9 HIGH + 23 MED issues across five rounds. see /under-the-hood.
- ✓ try it before you pay. 250 MB free every month · no credit card required · then $1 per GB pay-as-you-go
no signup hurdle · email magic link · clean a TM in under 2 minutes