five steps. no surprises.

this page is the user-friendly walkthrough. if you want the engine internals, benchmarks, and full security posture — that lives at /under-the-hood.

━━ the 5-step flow

01
sign in
email magic link. no passwords stored, ever. lost access? request a new link.
02
upload
drag-drop a .tmx (up to 2 GB). uploaded directly to encrypted object storage via a one-time signed URL — bytes never touch our API node.
03
preview · free
we scan the file and show you exactly what will be removed (duplicates, junk) along with the projected output size. the preview doesn't count against your quota; only proceeding to the clean does.
04
pick a preset · then clean
three named presets (Lenient / Balanced / Strict) cover ~95% of jobs: one click applies eleven options. fine-tune in the custom rules panel if you need to. the engine streams through your file in two passes and never loads the whole file into memory.
05
download
one bundle.zip (cleaned.tmx + removed.tmx + removed.csv, max compression). individual files also available. original upload is purged the moment processing finishes; downloads stay live for 24 hours.
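step 04's two-pass streaming, sketched with Python's standard-library `iterparse`. this is a minimal sketch under stated assumptions, not the engine's implementation: the helper names (`iter_tus`, `count_sources`, `annotate`) and the `src="en"` default are hypothetical. pass 1 keeps only a count per normalized source segment; pass 2 re-reads the file and can make keep/drop decisions with those counts in hand. each `<tu>` is dropped as soon as it has been handled, so memory grows with the number of distinct sources rather than with file size.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO, Iterator, Tuple

# ElementTree reports xml:lang under the reserved XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def normalize(text: str) -> str:
    """Collapse whitespace runs and trim the ends."""
    return " ".join(text.split())


def seg_text(tu: ET.Element, lang: str) -> str:
    """Plain text of the <seg> whose <tuv> language starts with `lang` ('' if none)."""
    for tuv in tu.findall("tuv"):
        if tuv.get(XML_LANG, "").lower().startswith(lang.lower()):
            seg = tuv.find("seg")
            return "".join(seg.itertext()) if seg is not None else ""
    return ""


def iter_tus(stream: IO[bytes]) -> Iterator[ET.Element]:
    """Yield each <tu> once fully parsed, then drop it so memory stays flat."""
    body = None
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == "body":
            body = elem
        elif event == "end" and elem.tag == "tu":
            yield elem
            if body is not None:
                body.clear()  # discard finished TUs still attached to <body>


def count_sources(stream: IO[bytes], src: str = "en") -> Counter:
    """Pass 1: keep nothing but a count per normalized source segment."""
    counts: Counter = Counter()
    for tu in iter_tus(stream):
        counts[normalize(seg_text(tu, src))] += 1
    return counts


def annotate(stream: IO[bytes], counts: Counter, src: str = "en") -> Iterator[Tuple[str, int]]:
    """Pass 2: revisit every TU, now knowing how many TUs share its source."""
    for tu in iter_tus(stream):
        key = normalize(seg_text(tu, src))
        yield key, counts[key]
```

for a real file, open it once per pass (`open(path, "rb")`) and hand one handle to each function.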

━━ pick a preset · or fine-tune

three preset cards sit above the rules panel; one click applies all eleven options in a single step, the right fit for ~95% of jobs. power users can expand the “custom rules” disclosure to reach the granular toggle panel underneath.

Lenient

keep more · QA & review

preserves untranslated entries, MT-flagged TUs, and short UI labels verbatim. good for human review or pre-delivery inspection, where you want the final say.

Balanced

default · LSP delivery

drops mechanical leakage (untranslated, empty target, MT-flagged) but doesn't run the strict QA validators. sane defaults for delivering a TM to a client.

Strict

cut deep · MT training

everything Balanced does, plus all QA validators switched on (placeholder mismatch, mojibake, length outliers) and case-insensitive dedup for tighter merging. the right fit for MT-training datasets.

━━ what gets removed · or flagged

the engine ships eight cleanup classifiers, grouped into three categories. each is individually toggleable in the custom-rules panel; the presets above bundle sane defaults so you don’t have to pick.

duplicates

→exact matches by normalized source
→whitespace-only differences
→winner picked by Popular / Latest — or by your priority-author list
→manual per-group override always available in the preview
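a minimal sketch of duplicate grouping and winner selection. it assumes "Popular" means the most frequent target variant in a group and "Latest" means the newest `changedate`; those readings, the tie-break, the `fold_case` switch (standing in for Strict's case-insensitive dedup) and the `TU` / `group_duplicates` / `pick_winner` names are all hypothetical. folding whitespace before grouping is what merges whitespace-only differences.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TU:
    source: str
    target: str
    changedate: str  # TMX dates use YYYYMMDDThhmmssZ, so string order is time order


def norm(text: str) -> str:
    # folding whitespace puts "Save  file" and "Save file" in the same group
    return " ".join(text.split())


def group_duplicates(tus: List[TU], fold_case: bool = False) -> Dict[str, List[TU]]:
    """Group TUs by normalized source, keeping first-seen order."""
    groups: Dict[str, List[TU]] = {}
    for tu in tus:
        key = norm(tu.source)
        if fold_case:  # Strict-style case-insensitive dedup
            key = key.casefold()
        groups.setdefault(key, []).append(tu)
    return groups


def pick_winner(group: List[TU], strategy: str = "popular") -> TU:
    """Choose the variant to keep; the rest of the group would be removed."""
    if strategy == "latest":
        return max(group, key=lambda tu: tu.changedate)
    # "popular": most frequent normalized target, ties broken by newest changedate
    freq = Counter(norm(tu.target) for tu in group)
    return max(group, key=lambda tu: (freq[norm(tu.target)], tu.changedate))
```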

structural junk

→empty source / empty target
→single character · pure numbers
→URLs · emails · phone numbers
→pure tags · repeated symbols (--- / === / ***)
→suspect markup (closing-tag corruption from CAT round-trips)
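a rough sketch of the structural-junk checks as source-side regexes. every pattern, the check order and the reason labels are hypothetical; the segment is treated as a raw string with its inline markup left in, and the suspect-markup check is left out because its rules aren't described on this page.

```python
import re
from typing import Optional

# Hypothetical patterns, applied to the source segment as a raw string.
TAG_ONLY_RE = re.compile(r"^\s*(<[^<>]+>\s*)+$")
SYMBOL_RUN_RE = re.compile(r"^([^\w\s])\1{2,}$")  # ---, ===, ***
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
NUMBER_RE = re.compile(r"^[\d\s.,%+-]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")


def junk_reason(source: str, target: str) -> Optional[str]:
    """Return a label for the first matching junk rule, or None to keep the TU."""
    s, t = source.strip(), target.strip()
    if not s or not t:
        return "empty"
    if len(s) == 1:
        return "single-char"
    # ordered checks: NUMBER_RE runs before PHONE_RE so "3.14159265" stays a number
    for label, pattern in (
        ("pure-tags", TAG_ONLY_RE),
        ("repeated-symbols", SYMBOL_RUN_RE),
        ("url", URL_RE),
        ("email", EMAIL_RE),
        ("number", NUMBER_RE),
        ("phone", PHONE_RE),
    ):
        if pattern.match(s):
            return label
    return None
```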

translation-quality red flags

→untranslated (target = source · case + whitespace ignored)
→MT-flagged TUs (creationid / changeid contains an MT engine signature)
→length outliers — target much longer OR shorter than source · NFC-normalized so CJK / Indic / Thai targets are treated fairly
→placeholder mismatch — {0} / %s / <ph/> count differs between source and target
→mojibake (encoding corruption, e.g. 'café' → 'cafÃ©')
→brand / DNT terms missing in target — you list iPhone, HIPAA, etc. and we flag any TU where the term appears in source but not target
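a sketch of five of the red-flag checks above. the placeholder regex, the mojibake byte-pair heuristic and the 0.3 / 3.0 length-ratio thresholds are assumptions, not the engine's values; MT-flag detection is omitted because the engine signatures it looks for aren't listed here. DNT matching in this sketch is a plain case-sensitive substring test.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable, List

# Placeholder shapes named on this page: {0}, %s, <ph .../>. The regex is an assumption.
PLACEHOLDER_RE = re.compile(r"\{\d+\}|%[sd]|<ph\b[^>]*/>")

# UTF-8 read as Latin-1/cp1252 leaves pairs like "Ã©" (for é) or "â€" (start of ’), plus U+FFFD.
MOJIBAKE_RE = re.compile(r"[\u00c3\u00c2][\u0080-\u00bf]|\u00e2\u20ac|\ufffd")


def untranslated(source: str, target: str) -> bool:
    """Target equals source once case and whitespace are ignored."""
    def fold(s: str) -> str:
        return " ".join(s.split()).casefold()
    return fold(source) == fold(target)


def placeholder_mismatch(source: str, target: str) -> bool:
    """Placeholder counts differ; <ph> attributes are ignored when counting."""
    def bag(s: str) -> Counter:
        return Counter("<ph/>" if p.startswith("<ph") else p for p in PLACEHOLDER_RE.findall(s))
    return bag(source) != bag(target)


def looks_mojibake(text: str) -> bool:
    return MOJIBAKE_RE.search(text) is not None


def length_outlier(source: str, target: str, low: float = 0.3, high: float = 3.0) -> bool:
    """Flag when len(target)/len(source) leaves [low, high]; thresholds are placeholders."""
    # NFC first, so a decomposed "e" + combining acute counts as one character, like "é"
    s = len(unicodedata.normalize("NFC", source))
    t = len(unicodedata.normalize("NFC", target))
    if s == 0 or t == 0:
        return False  # empty sides belong to the empty-source/target rule
    return not (low <= t / s <= high)


def missing_dnt(source: str, target: str, terms: Iterable[str]) -> List[str]:
    """Terms present in the source but absent from the target (case-sensitive)."""
    return [term for term in terms if term in source and term not in target]
```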

━━ priority authors

when duplicates conflict, you can rank the trusted translators whose variants should win. the ordered list (typically 5–20 names) overrides the Popular / Latest strategy for any group where one of those authors has a variant; every other group falls back to your default. this covers the senior-translator-vs-freelancer case, which no major CAT tool solves at TM-cleanup time.

★ordered list — drag to rank your trusted translators
★overrides Popular / Latest when a priority author has a variant in the group
★fallback to Popular / Latest for groups without a priority author
★case-insensitive matching on changeid · case preserved for display
★per-row warnings: ⚠ also-excluded clashes · 0-matching-TU typo detection
★★ priority badges in the samples view show exactly when priority kicked in
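a sketch of priority-author resolution as described above: case-insensitive match on `changeid`, the best-ranked author wins, otherwise fall back to the default strategy. assumptions: when one author has several variants in a group the newest wins, the fallback defaults to Latest, and all function names are hypothetical. the second return value is the kind of flag a priority badge could be driven by; `unmatched_authors` mimics the 0-matching-TU typo warning and keeps each name's original case for display.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Variant:
    target: str
    changeid: str    # author as recorded in the TMX
    changedate: str  # YYYYMMDDThhmmssZ, so string order is time order


def latest(group: List[Variant]) -> Variant:
    return max(group, key=lambda v: v.changedate)


def pick_with_priority(
    group: List[Variant],
    priority: List[str],
    fallback: Callable[[List[Variant]], Variant] = latest,
) -> Tuple[Variant, bool]:
    """Return (winner, priority_used). Names match changeid case-insensitively."""
    rank: Dict[str, int] = {}
    for i, name in enumerate(priority):
        rank.setdefault(name.casefold(), i)  # first position wins if a name repeats
    ranked = [v for v in group if v.changeid.casefold() in rank]
    if not ranked:
        return fallback(group), False  # no priority author here: default strategy
    best = min(rank[v.changeid.casefold()] for v in ranked)
    top = [v for v in ranked if rank[v.changeid.casefold()] == best]
    return latest(top), True  # same author, several variants: newest wins (assumption)


def unmatched_authors(priority: List[str], changeids: Iterable[str]) -> List[str]:
    """Names matching no TU at all (likely typos), in their original case for display."""
    seen = {c.casefold() for c in changeids}
    return [name for name in priority if name.casefold() not in seen]
```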

━━ what we keep, untouched

✓original document order
✓all inline tags (ph, bpt, ept, mrk, g, it)
✓TMX header + namespace declarations
✓multi-language TMs (any source / target pairing)
✓short UI phrases (1–2 words like “OK”, “Cancel”) — configurable

━━ privacy in plain english

●your original file is permanently deleted the moment cleaning finishes
●downloads expire 24 hours after the clean completes
●no passwords stored — sign in via email magic link
●data stays in EU datacenters (Cloudflare R2 · Hetzner)

full security posture (XXE handling, signed URLs, headers, retention policy) → /under-the-hood

clean a tmx now →