Question 1

Does case matter?

Accepted Answer

By default, yes: "Apple" and "apple" are treated as different lines. The case-insensitive toggle treats them as duplicates. For email lists, case-insensitive matching is almost always correct. RFC 5321 technically allows the local part (before the @) to be case-sensitive, but virtually all mail providers treat it case-insensitively, and the domain part is case-insensitive anyway.
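A minimal sketch of case-insensitive deduplication in Python, for readers scripting this themselves. It assumes the first spelling seen is the one kept; which variant the tool itself keeps isn't stated here. `str.casefold()` is Python's built-in caseless-matching transform (stricter than `lower()`), and `dedup_case_insensitive` is a hypothetical name.

```python
def dedup_case_insensitive(lines):
    """Keep the first occurrence of each line, comparing case-insensitively.

    Hypothetical helper: the original spelling of the first occurrence
    is what gets emitted.
    """
    seen = set()
    out = []
    for line in lines:
        key = line.casefold()  # caseless comparison key
        if key not in seen:
            seen.add(key)
            out.append(line)
    return out


emails = ["Alice@Example.com", "alice@example.com", "bob@example.com"]
print(dedup_case_insensitive(emails))  # ['Alice@Example.com', 'bob@example.com']
```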

Question 2

Does whitespace matter?

Accepted Answer

Trailing whitespace is the silent killer. "hello" and "hello " look identical and are duplicates in intent, but an exact comparison treats them as different lines. Enable whitespace trimming unless you genuinely need leading or trailing spaces preserved.
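A short Python sketch of what a trimming toggle does. Assumption: the trimmed text is both the comparison key and the emitted line; a tool could instead compare trimmed text but emit the original. `dedup_trimmed` and its `trim` parameter are hypothetical, and `str.strip()` removes tabs and other whitespace as well as spaces.

```python
def dedup_trimmed(lines, trim=True):
    """Deduplicate lines, optionally ignoring leading/trailing whitespace.

    Hypothetical helper: with trim=True the stripped text is used both
    for comparison and for output.
    """
    seen = set()
    out = []
    for line in lines:
        key = line.strip() if trim else line
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


print(dedup_trimmed(["hello", "hello ", " hello\t"]))     # ['hello']
print(dedup_trimmed(["hello", "hello "], trim=False))   # ['hello', 'hello ']
```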

Question 3

Will the order be preserved?

Accepted Answer

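First-occurrence order is the default: each distinct line appears in the output at the position where it first occurred, so line 1's text stays at position 1 even if it repeats later. Toggle alphabetical sort if you don't care about the original order. Sorting is a presentation choice rather than a speed-up, since order-preserving dedup with a hash set already takes a single pass over the input.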
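The two orderings can be sketched in a few lines of Python (function names are hypothetical, not the tool's code). `dict.fromkeys` keeps insertion order in Python 3.7+, which gives first-occurrence dedup directly; the sorted variant throws order away and sorts by code point, so uppercase letters come before lowercase.

```python
def dedup_first_occurrence(lines):
    # dict keys are unique and keep insertion order, so each distinct
    # line stays where it first appeared.
    return list(dict.fromkeys(lines))


def dedup_sorted(lines):
    # Original order is discarded; distinct lines come back sorted
    # by Unicode code point.
    return sorted(set(lines))


data = ["pear", "apple", "pear", "fig", "apple"]
print(dedup_first_occurrence(data))  # ['pear', 'apple', 'fig']
print(dedup_sorted(data))            # ['apple', 'fig', 'pear']
```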

Question 4

What about partial duplicates?

Accepted Answer

This tool does whole-line matching only. "john@example.com" and "john@example.com,subscriber" are different lines. For fuzzy or column-based deduplication, use a spreadsheet or a script — line-level dedup is the wrong tool.
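For the column-based case, a script along these lines deduplicates on a single field instead of the whole line. Assumptions: comma-separated input with no header row, the key is the first column, and the first row seen for each key wins. `dedup_by_column` is a hypothetical name; the parsing uses Python's standard `csv` module.

```python
import csv
import io


def dedup_by_column(text, column=0):
    """Keep the first CSV row for each distinct value in `column`.

    Hypothetical helper: assumes comma-separated input with no header row.
    """
    seen = set()
    out = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) <= column:
            continue  # skip blank or short rows that lack the key column
        key = row[column].strip()
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


rows = "john@example.com,subscriber\njohn@example.com\njane@example.com,admin\n"
for row in dedup_by_column(rows):
    print(",".join(row))
# john@example.com,subscriber
# jane@example.com,admin
```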

Question 5

Can it count occurrences instead of removing?

Accepted Answer

Some dedup tools include a count mode; on Unix, `uniq -c` prefixes each line with its number of occurrences. This tool only removes duplicates. Counting is a separate feature, so for counts run `sort | uniq -c` locally (the `sort` is needed because `uniq` only collapses adjacent duplicate lines) or use a spreadsheet pivot table.
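If Python is handier than a shell, `collections.Counter` produces the same counts as `sort | uniq -c`. This is a sketch outside the tool: `count_lines` is a hypothetical name, results are sorted by code point (similar to `sort` under `LC_ALL=C`, not a locale-aware sort), and the column padding only approximates `uniq -c` output.

```python
from collections import Counter


def count_lines(lines):
    """Return (count, line) pairs sorted by line text, like `sort | uniq -c`."""
    counts = Counter(lines)
    return [(counts[line], line) for line in sorted(counts)]


for n, line in count_lines(["b", "a", "b", "c", "b"]):
    print(f"{n:>7} {line}")
```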

Question 6

What about lines with different line endings?

Accepted Answer

Mixed CRLF and LF line endings can produce lines that look identical but are not byte-for-byte equal. The tool normalizes line endings before comparison, so "hello\r\n" and "hello\n" are treated as the same line. If you'll re-import the output elsewhere, verify that its line endings are consistent.
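A minimal Python sketch of normalize-then-compare. Assumptions: CRLF and lone CR are both mapped to LF, and the output is written with LF endings; the tool's actual output convention isn't stated here. The helper names are hypothetical. `str.splitlines()` would also handle CRLF, but it additionally splits on form feeds and Unicode line separators, so the explicit replace is more predictable.

```python
def split_normalized(text):
    # Turn CRLF (Windows) and lone CR (classic Mac) into LF, then split.
    # Without this, "hello\r\n" and "hello\n" would become "hello\r" and "hello".
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")


def dedup_text(text):
    lines = split_normalized(text)
    if lines and lines[-1] == "":
        lines.pop()  # a final newline ends the last line; it is not an extra empty line
    unique = list(dict.fromkeys(lines))  # first-occurrence order
    return "\n".join(unique) + "\n" if unique else ""


print(repr(dedup_text("hello\r\nworld\nhello\n")))  # 'hello\nworld\n'
```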

Question 7

How are empty lines handled?

Accepted Answer

By default, empty lines are treated like any other line: the first one is kept and later empty lines are removed as duplicates. The "ignore empty lines" toggle drops them all, however many appear. This is useful for cleaning files with stray blank lines.
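The two behaviours can be sketched in Python as follows. `dedup_lines` and its `ignore_empty` flag are hypothetical names. This sketch treats only a truly empty string as empty; whether the tool also drops whitespace-only lines isn't stated, and checking `line.strip() == ""` instead would cover that case.

```python
def dedup_lines(lines, ignore_empty=False):
    """First-occurrence dedup with an optional "ignore empty lines" switch.

    Hypothetical sketch: by default "" is an ordinary line, so only the
    first blank survives; with ignore_empty=True every blank is dropped.
    """
    seen = set()
    out = []
    for line in lines:
        if ignore_empty and line == "":
            continue
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out


data = ["a", "", "b", "", "", "a"]
print(dedup_lines(data))                     # ['a', '', 'b']
print(dedup_lines(data, ignore_empty=True))  # ['a', 'b']
```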

Question 8

Is this Unicode-aware?

Accepted Answer

Yes for basic comparison. The word "café" can be written with a precomposed é (U+00E9) or as "e" followed by a combining acute accent (U+0301). The two are different byte sequences, but NFC normalization maps both to the same string. The tool applies NFC before comparison, which matches what modern text handling generally expects.
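A Python sketch of NFC-before-compare using the standard `unicodedata.normalize`. `dedup_nfc` is a hypothetical name, and emitting the normalized text (rather than the original bytes) is an assumption about the tool's output. Note that NFC does not fold case or compatibility characters such as ligatures; that would need casefolding or NFKC.

```python
import unicodedata


def dedup_nfc(lines):
    """Deduplicate after NFC normalization, emitting the normalized text."""
    seen = set()
    out = []
    for line in lines:
        key = unicodedata.normalize("NFC", line)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


precomposed = "caf\u00e9"  # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # "e" + COMBINING ACUTE ACCENT (U+0301)
print(precomposed == decomposed)                    # False
print(len(dedup_nfc([precomposed, decomposed])))    # 1
```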

Question 9

What's the practical limit on input size?

Accepted Answer

Browser memory bounds it to roughly 100 MB raw input. Beyond that, copying the input into the tool becomes the bottleneck. For multi-gigabyte log deduplication, run `sort -u` locally or use a streaming tool — interactive browser apps are the wrong tool at that scale.
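For large files that still have a manageable number of distinct lines, a streaming script keeps first-occurrence order without loading the whole input. This is a hypothetical sketch (`stream_dedup` is not part of the tool): it reads one line at a time, but the set of distinct lines must still fit in memory.

```python
import io


def stream_dedup(infile, outfile):
    """Write each distinct line once, in first-occurrence order.

    Hypothetical sketch. Lines are read one at a time, so the whole input
    never has to sit in memory, but the set of distinct lines still does.
    """
    seen = set()
    for line in infile:
        key = line.rstrip("\r\n")  # drop the line ending (LF or CRLF) before comparing
        if key not in seen:
            seen.add(key)
            outfile.write(key + "\n")


# Demo with in-memory files; real use would pass open file objects
# (or sys.stdin / sys.stdout) instead.
demo_in = io.StringIO("a\r\nb\na\n")
demo_out = io.StringIO()
stream_dedup(demo_in, demo_out)
print(repr(demo_out.getvalue()))  # 'a\nb\n'
```

When the distinct lines alone exceed available RAM, `sort -u` is the safer choice, because GNU `sort` falls back to temporary files on disk.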

Remove Duplicate Lines