OpenRights Entertainment Rights & Catalog Intelligence
Data Cleaning intermediate

Detect Non-ASCII Characters Breaking Your Platform Avails Ingest

Smart quotes, non-breaking spaces, and em-dashes silently break avails ingest. One regex formula surfaces every offending cell so you can fix before shipping.

By OpenRights Team · · 5 min read
availsdata cleaningvalidationencoding
On this page

You ship an avails file to a platform. The ingest rejects 40 rows. Spot-checking, you can’t see what’s wrong — the rows look identical to the ones that passed. Then you notice one Title has Jean's Story where the apostrophe is the typographic ' (U+2019) instead of the plain ' (U+0027). The platform’s parser chokes on it. Your document looked clean because the visual difference is almost invisible.

The Common Offenders

Four characters cause 90% of these ingest failures:

CharacterUnicodeOften typed asWhere it comes from
' curly apostropheU+2019straight 'Word/Google Docs autocorrect
" curly quotesU+201C, U+201Dstraight "Word/Google Docs autocorrect
em-dashU+2014hyphen -autocorrect or Mac Option+Shift+hyphen
non-breaking spaceU+00A0regular spaceHTML paste, Word paste

Zero-width characters (U+200B, U+FEFF) sneak in from rich-text paste and are completely invisible. These are the worst because you can’t even see them when you’re looking right at the cell.

The Fix — Detection Formula

Add a helper column next to your Title and Notes columns with this formula (Google Sheets):

=IF(REGEXMATCH(A2, "[^\x00-\x7F]"), "NON-ASCII", "")

The regex [^\x00-\x7F] matches any character outside the ASCII range (0–127). Any cell containing a non-ASCII character flags NON-ASCII. Conditional-format that column — amber for NON-ASCII — and the offending rows surface at a glance.

For Excel (which lacks REGEXMATCH pre-365), use:

=IF(SUMPRODUCT((CODE(MID(A2, ROW(INDIRECT("1:"&LEN(A2))), 1))>127)*1)>0, "NON-ASCII", "")

More verbose, same result.

The Fix — Bulk Replace

Once you’ve located the offending cells, fix them with find-and-replace. In Sheets:

  1. Ctrl+H for find-and-replace.
  2. Check Also search using regular expressions.
  3. Find [''] → Replace with '. Repeat for [""]", -, and so on.
  4. For NBSP, find \u00A0 → replace with a single space.

Or paste this one-shot CLEAN-like formula that strips the offenders:

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,CHAR(8216),"'"),CHAR(8217),"'"),CHAR(8220),""""),CHAR(8221),"""")

Apply to a helper column, then paste-values back over the original.

Gotchas

  • International titles with legitimately non-ASCII characters. Amélie, Pokémon, 東京 — these are valid. The detection flags them as NON-ASCII, but they’re not the problem. Before bulk-replacing, check the platform’s spec: most modern platforms accept UTF-8 and these will ingest fine. The ingest failures are almost always autocorrect artifacts, not intentional diacritics.
  • Differential handling by platform. Amazon accepts UTF-8 throughout. Some older CSV-based ingests accept ASCII-only. Know which bucket each platform is in before stripping characters.
  • Don’t strip zero-width characters blindly. REGEXREPLACE(A2, "[\x{200B}-\x{200D}\x{FEFF}]", "") removes the truly-invisible troublemakers. But run it as a pass on a copy first and diff — if it removes too much, you’ve got a bigger encoding issue than autocorrect.
  • CSV vs. XLSX. XLSX preserves characters reliably; CSVs depend on the exporter’s encoding. If you’re exporting to CSV for ingest, explicitly save as UTF-8 CSV, not “CSV (default)” which is often Windows-1252.

If You Need the Full Workflow

The Validate an Avails Sheet Against IFTA + EMA Rules tutorial includes the character check as one of its seven validation passes, plus territory/date/required-field validation alongside.

Download

The Avails Data Quality Scorecard template flags non-ASCII characters in a dedicated rule row. Paste your avails, and the check runs automatically.

OpenRights Weekly

Free templates, tutorials, and data quality tips. Every week.

Need help with your catalog data?

Share your challenge and we'll help — by email or callback. 100% free, no sales pitch. Part of our mission to help rights and catalog professionals.

Get Free Help