five steps. no surprises.
this page is the user-friendly walkthrough. if you want the engine internals, benchmarks, and full security posture — that lives at /under-the-hood.
━━ the 5-step flow
- 01 sign in
email magic link. no passwords stored, ever. lost access? request a new link.
- 02 upload
drag-drop a .tmx (up to 2 GB). uploaded directly to encrypted object storage via a one-time signed URL — bytes never touch our API node.
- 03 preview · free
we scan and show you exactly what will be removed: duplicates, junk, projected output size. doesn't consume your quota until you proceed.
- 04 pick a preset · then clean
three named presets cover ~95% of jobs — Lenient / Balanced / Strict — one click, eleven options applied. fine-tune via the Custom rules panel if you need to. the engine streams through your file in two passes, never loading it into RAM.
- 05 download
one bundle.zip (cleaned.tmx + removed.tmx + removed.csv, max compression). individual files also available. original upload is purged the moment processing finishes; downloads stay live for 24 hours.
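a rough sketch of what "two passes, never loading it into RAM" can look like, using Python's standard-library streaming parser. this is an illustration, not the engine's actual code: the function names, the "latest wins" rule, and the use of the `tuid` / `changedate` attributes are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def stream_tus(stream):
    """Yield <tu> elements one at a time instead of building the whole tree."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "tu":
            yield elem
            elem.clear()  # drop the TU's children once the caller is done with it

def source_key(tu, src_lang):
    """Whitespace-normalized source text of one TU (hypothetical helper)."""
    for tuv in tu.findall("tuv"):
        if (tuv.get(XML_LANG) or "").lower().startswith(src_lang):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            return " ".join(text.split())
    return ""

def two_pass_latest(open_stream, src_lang="en"):
    # Pass 1: per normalized source, remember the position of the newest TU.
    winner = {}
    for pos, tu in enumerate(stream_tus(open_stream())):
        key, date = source_key(tu, src_lang), tu.get("changedate", "")
        if key not in winner or date > winner[key][0]:
            winner[key] = (date, pos)
    # Pass 2: re-read the file and split TUs into kept / removed, in order.
    kept, removed = [], []
    for pos, tu in enumerate(stream_tus(open_stream())):
        key = source_key(tu, src_lang)
        (kept if winner[key][1] == pos else removed).append(tu.get("tuid"))
    return kept, removed

SAMPLE = b"""<tmx version="1.4"><header/><body>
<tu tuid="1" changedate="20230101T000000Z"><tuv xml:lang="en"><seg>Save  file</seg></tuv><tuv xml:lang="de"><seg>Datei speichern</seg></tuv></tu>
<tu tuid="2" changedate="20240101T000000Z"><tuv xml:lang="en"><seg>Save file</seg></tuv><tuv xml:lang="de"><seg>Datei sichern</seg></tuv></tu>
<tu tuid="3" changedate="20230601T000000Z"><tuv xml:lang="en"><seg>Cancel</seg></tuv><tuv xml:lang="de"><seg>Abbrechen</seg></tuv></tu>
</body></tmx>"""

kept, removed = two_pass_latest(lambda: io.BytesIO(SAMPLE))
print(kept, removed)  # ['2', '3'] ['1']
```

the first pass only keeps one small record per distinct source, so memory grows with the number of unique segments rather than with file size. the second pass is what preserves original document order in the output.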
━━ pick a preset · or fine-tune
three named cards above the rules panel — one click applies eleven options atomically. the right shape for ~95% of jobs. power users open the “custom rules” disclosure for the granular toggle panel underneath.
Lenient: preserves untranslated entries, MT-flagged TUs, and short UI labels verbatim. Good for human review or pre-delivery inspection where you want the final say.
Balanced: drops mechanical leakage (untranslated, empty target, MT-flagged) but doesn't run the strict QA validators. Sane defaults for delivering a TM to a client.
Strict: everything Balanced does, plus all QA validators on (placeholder mismatch, mojibake, length anomalies) and case-insensitive dedup for tighter merging. Right shape for MT-training datasets.
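one way to picture "one click applies the options atomically" is a preset as an immutable bundle of flags that is swapped in whole. the sketch below is hypothetical: the option names are invented for illustration, only five of the eleven options are modelled, and the Lenient values for empty targets and QA validators are assumptions the page does not state.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # frozen: a preset is replaced as a unit, never half-edited
class Options:
    drop_untranslated: bool = False
    drop_empty_target: bool = False   # assumed off for Lenient
    drop_mt_flagged: bool = False
    qa_validators: bool = False       # assumed off for Lenient
    case_insensitive_dedup: bool = False

LENIENT = Options()
BALANCED = replace(LENIENT, drop_untranslated=True,
                   drop_empty_target=True, drop_mt_flagged=True)
# Strict is defined as "everything Balanced does plus ...".
STRICT = replace(BALANCED, qa_validators=True, case_insensitive_dedup=True)

PRESETS = {"lenient": LENIENT, "balanced": BALANCED, "strict": STRICT}

def apply_preset(name: str) -> Options:
    """Return the full option set for a preset name (case-insensitive)."""
    return PRESETS[name.lower()]

# Fine-tuning starts from a preset and overrides single flags.
custom = replace(apply_preset("Balanced"), case_insensitive_dedup=True)
print(custom)
```

deriving Strict from Balanced keeps the two from drifting apart, and custom rules become a small diff on top of a named preset.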
━━ what gets removed · or flagged
the engine ships eight cleanup classifiers, grouped into three categories. each is individually toggleable in the custom-rules panel; the presets above bundle sane defaults so you don’t have to pick.
- →exact matches by normalized source
- →whitespace-only differences
- →winner picked by Popular / Latest — or by your priority-author list
- →manual per-group override always available in the preview
- →empty source / empty target
- →single character · pure numbers
- →URLs · emails · phone numbers
- →pure tags · repeated symbols (--- / === / ***)
- →suspect markup (closing-tag corruption from CAT round-trips)
- →untranslated (target = source · case + whitespace ignored)
- →MT-flagged TUs (creationid / changeid contains MT engine signature)
- →length outliers — target much longer OR shorter than source · NFC-normalised so CJK / Indic / Thai targets are treated fairly
- →placeholder mismatch — {0} / %s / <ph/> count differs between source and target
- →mojibake (encoding corruption — 'café' → 'cafÃ©' style)
- →brand / DNT terms missing in target — you list iPhone, HIPAA, etc. and we flag any TU where the term appears in source but not target
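several of the checks listed above are simple enough to sketch in a few lines each. the versions below are minimal illustrations under stated assumptions, not the shipped classifiers: the placeholder pattern, the 0.3 / 3.0 length thresholds, and the cp1252 round-trip heuristic for mojibake are all choices made for this example.

```python
import re
import unicodedata

# Assumed placeholder syntax: {0}, %s / %d, and self-closing <ph .../> tags.
PLACEHOLDER = re.compile(r"\{\d+\}|%[sd]|<ph\b[^>]*/>")

def _fold(text: str) -> str:
    """Collapse whitespace and ignore case."""
    return " ".join(text.split()).casefold()

def is_untranslated(source: str, target: str) -> bool:
    return _fold(source) == _fold(target)

def placeholder_mismatch(source: str, target: str) -> bool:
    return len(PLACEHOLDER.findall(source)) != len(PLACEHOLDER.findall(target))

def looks_like_mojibake(text: str) -> bool:
    """True if the text reads as UTF-8 bytes that were decoded as cp1252."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return False  # not reversible this way, so not this kind of corruption
    return repaired != text

def length_outlier(source: str, target: str, low=0.3, high=3.0) -> bool:
    # NFC first, so a base letter plus combining mark counts as one character.
    src = unicodedata.normalize("NFC", source)
    tgt = unicodedata.normalize("NFC", target)
    if not src or not tgt:
        return False  # empty segments are handled by a different check
    ratio = len(tgt) / len(src)
    return ratio < low or ratio > high

def missing_dnt_terms(source: str, target: str, terms: list[str]) -> list[str]:
    """Terms present in the source but absent from the target."""
    return [t for t in terms if t in source and t not in target]

print(is_untranslated("Save  File", "save file"))         # True
print(placeholder_mismatch("Hello {0}, %s", "Hallo {0}"))  # True
print(looks_like_mojibake("caf\u00c3\u00a9"))              # True  ('cafÃ©')
print(looks_like_mojibake("caf\u00e9"))                    # False ('café')
print(missing_dnt_terms("Open iPhone settings", "Einstellungen \u00f6ffnen",
                        ["iPhone", "HIPAA"]))              # ['iPhone']
```

the mojibake heuristic only catches the common UTF-8-read-as-cp1252 case; other encoding mix-ups would need their own round-trip checks.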
━━ priority authors
when duplicates conflict, you can rank the trusted translators whose variants should win. an ordered list (5–20 names typically) overrides the Popular / Latest strategy for any group where one of those authors has a variant; everywhere else falls back to your default. catches the senior-translator-vs-freelancer case that no major CAT tool solves at TM-cleanup time.
- ★ordered list — drag to rank your trusted translators
- ★overrides Popular / Latest when a priority author has a variant in the group
- ★fallback to Popular / Latest for groups without a priority author
- ★case-insensitive matching on changeid · case preserved for display
- ★per-row warnings: ⚠ also-excluded clashes · 0-matching-TU typo detection
- ★★ priority badges in the samples view show exactly when priority kicked in
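the priority rule above can be sketched as a small winner-selection function. this is an illustration under assumptions, not the engine's code: each variant is modelled as a dict with `target`, `changeid`, and `changedate` keys, and when a priority author has several variants in one group the sketch simply takes the first, which the page does not specify.

```python
from collections import Counter

def pick_winner(group, priority_authors, strategy="latest"):
    """Return (winning variant, True if the priority list decided it)."""
    # Lower index = higher rank; matching on changeid ignores case.
    rank = {name.casefold(): i for i, name in enumerate(priority_authors)}
    ranked = [v for v in group if v["changeid"].casefold() in rank]
    if ranked:
        best = min(ranked, key=lambda v: rank[v["changeid"].casefold()])
        return best, True
    if strategy == "popular":
        # Most frequent target text wins; first variant carrying it is kept.
        top_target, _count = Counter(v["target"] for v in group).most_common(1)[0]
        return next(v for v in group if v["target"] == top_target), False
    # "latest": TMX-style timestamps (YYYYMMDDThhmmssZ) sort as plain strings.
    return max(group, key=lambda v: v["changedate"]), False

def unmatched_authors(all_changeids, priority_authors):
    """Names on the list that match no TU at all, a likely typo."""
    seen = {c.casefold() for c in all_changeids}
    return [n for n in priority_authors if n.casefold() not in seen]

group = [
    {"target": "Datei speichern", "changeid": "freelancer7", "changedate": "20240101T000000Z"},
    {"target": "Datei sichern", "changeid": "Anna.Senior", "changedate": "20220101T000000Z"},
]
winner, by_priority = pick_winner(group, ["anna.senior", "b.lead"])
print(winner["target"], by_priority)  # Datei sichern True
print(unmatched_authors([v["changeid"] for v in group], ["anna.senior", "b.lead"]))  # ['b.lead']
```

the returned flag is the kind of signal a priority badge in the samples view could be driven by, and the unmatched-name check corresponds to the 0-matching-TU warning.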
━━ what we keep, untouched
- ✓original document order
- ✓all inline tags (ph, bpt, ept, mrk, g, it)
- ✓TMX header + namespace declarations
- ✓multi-language TMs (any source / target pairing)
- ✓short UI phrases (1–2 words like “OK”, “Cancel”) — configurable
━━ privacy in plain english
- ●your original file is permanently deleted the moment cleaning finishes
- ●downloads expire 24 hours after the clean completes
- ●no passwords stored — sign in via email magic link
- ●data stays in EU datacenters (Cloudflare R2 · Hetzner)
full security posture (XXE handling, signed URLs, headers, retention policy) → /under-the-hood