◇ why use it

your TM is rotting.
we trim it in minutes.

translation memories start accumulating noise the moment you use them: junk segments, near-duplicates, encoding artefacts from CAT-tool round-trips. left alone, your TM works against you instead of for you.

━━ four ways your TM works against you

duplicates dilute matches

Eight near-identical translations of the same source segment all compete for the 100% slot. Your CAT tool picks one, and usually not the best one. Fuzzy match rates suffer.

junk wastes your leverage

Segments that are nothing but numbers, URLs, email addresses or repeated dashes. They count toward your word count in CAT analysis but give zero leverage on genuinely new content.
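A minimal Python sketch of how segments like these could be flagged. The patterns are assumptions for illustration (a segment counts as junk only if its whole stripped text matches one of them), not the cleaner's actual rule set, and `junk_reason` is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical junk rules: each pattern must match the ENTIRE stripped segment.
JUNK_PATTERNS = {
    # digits plus separators only, e.g. "1,234.56" or "2024-01-01" (needs at least one digit)
    "number": re.compile(r"[\s.,:%+\-/]*\d[\d\s.,:%+\-/]*"),
    # a bare URL
    "url": re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),
    # a bare email address
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    # runs of dashes, underscores, equals signs, dots or asterisks
    "dashes": re.compile(r"[-\u2013\u2014_=*.\s]{3,}"),
}


def junk_reason(source: str) -> Optional[str]:
    """Return the first junk rule the segment matches, or None if it looks like real text."""
    text = source.strip()
    if not text:
        return "empty"
    for reason, pattern in JUNK_PATTERNS.items():
        if pattern.fullmatch(text):
            return reason
    return None
```

Requiring a full match keeps ordinary sentences that merely contain a number or a link, such as "Call 555-0100 for help", out of the junk bucket.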

imports compound the rot

Every TMX export → import cycle (memoQ ↔ Trados ↔ XTM) leaves behind orphan tags, encoding glitches, broken inline markup. Years of TM = years of accumulated debt.

dirty data poisons MT training

If you ever fine-tune a translation model on your TM (and you will), every junk segment is a noisy training example. Garbage in = degraded model.

━━ what you get back

35-55% smaller TM
+ higher 100% match rate
↑↑ better fuzzy matches
<10 min to clean 1 GB

benchmark: 1,000,000 segments · 30% duplicates · 108 s wall-clock · 35% reduction · 373 MB peak RAM

━━ three steps. minutes. not hours.

  1. scan

    We stream your TMX through a two-pass engine that never loads the whole file into RAM. 1M segments take ~108 seconds on a small VPS.

  2. preview

    Before we cut anything, you see exactly what will be removed: the top duplicates with their winning and losing variants, junk samples, and the projected size reduction. Previews are free and don't consume quota.

  3. clean

    Confirm and we run the full pass. Output: cleaned.tmx + removed.tmx (re-importable for verification) + audit.csv (per-segment reasons). Your original upload is purged the moment the clean finishes.
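The scan and clean steps describe a two-pass streaming design. Here is a minimal Python sketch of that idea using the standard library's `xml.etree.ElementTree.iterparse`; it is not the product's engine. Assumptions: TMX without a default namespace, the source `<tuv>` found by `xml:lang` (or TMX 1.1 `lang`) prefix, near-duplicates defined as identical source text after whitespace and case normalization, and a hypothetical "newest `changedate` wins" rule. All function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case so near-duplicate sources share one key.
    return re.sub(r"\s+", " ", text).strip().casefold()


def _source_text(tu: ET.Element, src_lang: str) -> Optional[str]:
    # Return the <seg> text of the first <tuv> whose language starts with src_lang.
    for tuv in tu.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if lang.lower().startswith(src_lang.lower()):
            seg = tuv.find("seg")
            # itertext() also collects text inside inline tags such as <ph>.
            return "".join(seg.itertext()) if seg is not None else None
    return None


def _stream_tus(path: str, src_lang: str) -> Iterator[Tuple[int, ET.Element, Optional[str]]]:
    # Yield (index, tu, source_text) one <tu> at a time, clearing each after use.
    index = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "tu":
            yield index, elem, _source_text(elem, src_lang)
            index += 1
            elem.clear()


def pass_one(path: str, src_lang: str) -> Dict[str, Tuple[str, int]]:
    # Pass 1: for each normalized source, remember the newest TU by changedate.
    # TMX dates (YYYYMMDDThhmmssZ) sort correctly as strings; ties keep the first TU.
    winners: Dict[str, Tuple[str, int]] = {}
    for index, tu, source in _stream_tus(path, src_lang):
        if source is None:
            continue
        key = _normalize(source)
        date = tu.get("changedate", "")
        if key not in winners or date > winners[key][0]:
            winners[key] = (date, index)
    return winners


def pass_two(
    path: str, src_lang: str, winners: Dict[str, Tuple[str, int]]
) -> Iterator[Tuple[int, str, str]]:
    # Pass 2: re-stream the file and emit a keep/remove decision plus a reason per TU.
    for index, _tu, source in _stream_tus(path, src_lang):
        if source is None:
            yield index, "keep", "no source segment"
            continue
        winner_index = winners[_normalize(source)][1]
        if winner_index == index:
            yield index, "keep", "winner"
        else:
            yield index, "remove", f"duplicate of TU #{winner_index}"
```

Pass one keeps one small entry per distinct source segment, so memory grows with the number of unique sources rather than with the full file contents (the cleared `<tu>` shells still linger on their parent, which a production version would also prune). Pass two is where a real implementation would write kept units to cleaned.tmx, removed units to removed.tmx, and each reason to audit.csv.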

━━ vs cleaning manually in your CAT tool

manual | tm cleaner
open the TM in your CAT tool's editor | drag-drop a .tmx
click through filters · sort · select duplicates · delete batch · repeat | we surface them automatically
no audit trail · hope you didn't cut something important | every removed segment logged + downloadable as a TMX
hours per gigabyte | minutes per gigabyte
no MT training pipeline | clean output is reusable as training data

━━ no vendor lock-in. no surprises.

clean a tmx now →

no signup hurdle · email magic link · clean a TM in under 2 minutes