Why Your COUNTIF Misses Duplicates with Trailing Whitespace
COUNTIF sees 'Ember Lanes' and 'Ember Lanes ' as different values. One TRIM + CLEAN wrapper catches what bare COUNTIF silently misses.
On this page
You run a COUNTIF dedupe check on a title column, the formula says zero duplicates, you ship the file, and three weeks later accounting emails: “why are there two rows for ‘Ember Lanes’?” You go back to the sheet. Sure enough, there are two rows. One says Ember Lanes, the other says Ember Lanes . The trailing space is invisible to you but visible to COUNTIF, which is why it counted them as separate values.
Why COUNTIF Misses It
COUNTIF does exact-match comparison. "Ember Lanes" has 11 characters; "Ember Lanes " has 12. They are literally different strings. The human sees them as identical because the trailing space renders as nothing; the formula doesn’t.
The same problem hits:
- Leading whitespace (
" Ember Lanes"). - Internal double-spaces (
"Ember Lanes"). - Line breaks copied from PDF extracts or word-wrapping source systems (
"Ember Lanes\n"). - Non-breaking spaces (CHAR(160)) pasted from Word or web pages.
The Fix
Wrap every side of your comparison in TRIM(CLEAN(…)) before counting:
=COUNTIF(ARRAYFORMULA(TRIM(CLEAN(A:A))), TRIM(CLEAN(A2)))
TRIM removes leading/trailing spaces and collapses internal multi-spaces to single spaces. CLEAN strips non-printable characters including line breaks. Together they normalize the visual-but-different variants to the same canonical form.
If you only want to flag dupes (not count them), combine with a conditional:
=IF(COUNTIF(ARRAYFORMULA(TRIM(CLEAN(A:A))), TRIM(CLEAN(A2)))>1, "DUPE", "")
Red-fill conditional-format the DUPE values and your whitespace-shadowed duplicates appear.
Even Better — Pre-Normalize the Source
Detection is fine for finding the problem. Prevention is better. Apply TRIM+CLEAN once to the source column so the raw data stays clean:
- Add a helper column:
=TRIM(CLEAN(A2)). - Drag down.
- Copy the helper column.
- Paste values only over column A.
- Delete the helper.
Now plain COUNTIF on column A works correctly going forward — and any future comparison, sort, or lookup isn’t sabotaged by phantom whitespace.
Gotchas
- Non-breaking spaces survive CLEAN. CHAR(160) is not considered a control character; CLEAN leaves it alone. If your source was pasted from Word or a web page, you may still have NBSP pairs after TRIM+CLEAN. Use
SUBSTITUTE(A2, CHAR(160), " ")inside the TRIM to catch them. - Case differences are separate problem.
"Ember Lanes"and"ember lanes"are different to COUNTIF even after TRIM+CLEAN. If case-insensitive matching matters, addLOWER():=COUNTIF(ARRAYFORMULA(LOWER(TRIM(CLEAN(A:A)))), LOWER(TRIM(CLEAN(A2)))). - Titles with legitimate double-spaces. Some movie titles contain stylistic double spaces (rare but real). TRIM collapses them. If that’s a concern for your data, use
SUBSTITUTE(TRIM(CLEAN(A2)), " ", " ")more selectively — or don’t TRIM those titles.
If You Need the Full Workflow
For a deeper dedupe that catches typos and casing drift too, see Deduplicate Titles by EIDR/ISAN + Fuzzy Title Match. The same whitespace problem also hits contact lists — covered in Clean and Deduplicate Your Rights-Holder Contacts.
Download
The Title Deduplication Worksheet pre-wires TRIM+CLEAN into its normalized-key column. Paste your titles and the whitespace variants collapse automatically.
OpenRights Weekly
Free templates, tutorials, and data quality tips. Every week.
Need help with your catalog data?
Share your challenge and we'll help — by email or callback. 100% free, no sales pitch. Part of our mission to help rights and catalog professionals.
Get Free Help