Removing diacritics in JavaScript — a universal solution for international applications
Let's explore the challenges of handling diacritical marks in JavaScript applications, especially for languages like Czech that rely heavily on characters such as č, š, ž, ř, and ň. Learn why traditional character-mapping approaches fail against the thousands of Unicode characters with diacritics, and how JavaScript's built-in Unicode normalization provides a universal, maintainable solution that handles all diacritical marks without manual upkeep.

As a developer working with international applications, I’ve encountered numerous challenges when dealing with diacritics — those small marks added to letters that change their pronunciation. This is particularly relevant for languages that rely heavily on diacritical marks. In this post, I’ll explore the problems with traditional approaches and present a universal solution using JavaScript’s Unicode normalization.
The problem with diacritics in web applications
When building applications that need to handle multiple languages, diacritics can cause significant issues. Some examples:
- Search functionality fails when users search for “café” but the database contains “cafe”
- URL generation becomes problematic with characters like “č” or “š”
- Sorting and filtering doesn’t work correctly with accented characters
- User experience suffers when text appears broken or inconsistent
This can be especially challenging when dealing with languages like Czech, which uses several diacritical marks: á, č, ď, é, ě, í, ň, ó, ř, š, ť, ú, ů, ý, ž. These aren’t just decorative — they’re essential for meaning. For example, “č” and “c” are completely different letters in Czech.
The traditional approach: Character mapping
Many developers start with a character mapping approach, which seems intuitive at first:
const diacriticsMap = {
  'á': 'a',
  'č': 'c',
  'ď': 'd',
  // ... and so on
}

const replaceDiacritics = (s) => {
  return s
    .split('')
    .map((letter) => diacriticsMap[letter] ?? letter)
    .join('')
}
Why this approach is not ideal
While this method works for basic cases, it has several critical flaws:
- Incomplete coverage: You can never map all possible diacritics. There are thousands of Unicode characters with diacritical marks across different languages.
- Maintenance nightmare: Every time you encounter a new character, you need to update the mapping object.
- Inconsistent results: Different developers might map the same character differently, leading to inconsistencies.
- Performance issues: Large mapping objects consume memory and slow down processing.
- Language-specific bias: The mapping often reflects the developer’s native language, missing characters from other languages.
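To see the coverage problem concretely, here is a minimal sketch with a deliberately tiny map: any character the map doesn’t know about silently passes through unchanged, so gaps only surface when real users type them.

```javascript
// A deliberately small map to illustrate the coverage problem
const partialMap = { 'á': 'a', 'č': 'c', 'ď': 'd' }

const replacePartial = (s) =>
  s.split('').map((ch) => partialMap[ch] ?? ch).join('')

console.log(replacePartial('čaj'))  // 'caj' — covered characters work
console.log(replacePartial('řeka')) // 'řeka' — 'ř' is missing, so it leaks through
```

There is no error, no warning — just quietly wrong output, which is why this class of bug tends to ship to production.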
The modern approach: Unicode normalization
Modern JavaScript provides a much more elegant and universal solution through Unicode normalization and the String.prototype.normalize() method:
const removeDiacritics = (s) => {
  return s
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
}
How it works
This solution leverages Unicode’s decomposition system:
normalize('NFD') decomposes characters into their base form plus combining diacritical marks:
- “č” becomes “c” + combining caron (U+030C)
- “é” becomes “e” + combining acute accent (U+0301)
replace(/[\u0300-\u036f]/g, '') removes those combining marks:
- The range U+0300–U+036F is the Combining Diacritical Marks block, which covers the accents used in Latin-based scripts
- This effectively strips the accents while preserving the base characters
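You can inspect the decomposition yourself in any JavaScript console; a quick sketch:

```javascript
const composed = 'č'                      // single precomposed code point, U+010D
const decomposed = composed.normalize('NFD')

console.log(composed.length)   // 1
console.log(decomposed.length) // 2 — base letter plus combining mark

// Show the code points in hex after decomposition
console.log([...decomposed].map((ch) => ch.codePointAt(0).toString(16)))
// [ '63', '30c' ] — 'c' (U+0063) followed by combining caron (U+030C)
```

The two strings render identically on screen, but only the decomposed form exposes the accent as a separate, removable code point.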
Examples
// Czech characters
console.log(removeDiacritics('č')) // 'c'
console.log(removeDiacritics('š')) // 's'
console.log(removeDiacritics('ž')) // 'z'
// French characters
console.log(removeDiacritics('é')) // 'e'
console.log(removeDiacritics('à')) // 'a'
console.log(removeDiacritics('ç')) // 'c'
// German characters
console.log(removeDiacritics('ä')) // 'a'
console.log(removeDiacritics('ö')) // 'o'
console.log(removeDiacritics('ü')) // 'u'
// Complex examples
console.log(removeDiacritics('Héllö Wörld')) // 'Hello World'
console.log(removeDiacritics('Příliš žluťoučký kůň')) // 'Prilis zlutoucky kun'
Real-world applications
1. Search functionality
const searchWithDiacritics = (searchTerm, content) => {
  const normalizedSearch = removeDiacritics(searchTerm.toLowerCase())
  const normalizedContent = removeDiacritics(content.toLowerCase())
  return normalizedContent.includes(normalizedSearch)
}
// Usage
const text = "Příliš žluťoučký kůň úpěl ďábelské ódy"
console.log(searchWithDiacritics("zluťoučký", text)) // true
console.log(searchWithDiacritics("zlutoucky", text)) // true
2. URL slug generation
const generateSlug = (title) => {
  return removeDiacritics(title)
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, '')
    .trim() // trim before hyphenation, so edge spaces don't become edge hyphens
    .replace(/\s+/g, '-')
    .replace(/-+/g, '-')
    .replace(/^-+|-+$/g, '') // strip any remaining leading/trailing hyphens
}
// Usage
console.log(generateSlug('Příliš žluťoučký kůň')) // 'prilis-zlutoucky-kun'
3. Database indexing
const createSearchIndex = (text) => {
  return {
    original: text,
    normalized: removeDiacritics(text.toLowerCase()),
    keywords: removeDiacritics(text.toLowerCase()).split(/\s+/),
  }
}
Performance considerations
The Unicode normalization approach is surprisingly efficient:
- Memory: no large mapping objects needed
- Speed: modern JavaScript engines optimize String.prototype.normalize() well
- Accuracy: handles all Unicode combining diacritical marks automatically
For high-performance applications, you can cache the normalized results:
const diacriticsCache = new Map()

const removeDiacriticsCached = (s) => {
  if (diacriticsCache.has(s)) {
    return diacriticsCache.get(s)
  }
  const result = s.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
  diacriticsCache.set(s, result)
  return result
}
// Note: this cache grows without bound; consider an LRU for long-running processes
Browser compatibility
The String.prototype.normalize() method is well-supported:
- Chrome: 34+
- Firefox: 31+
- Safari: 10+
- Edge: 12+
- Node.js: 4+
For older environments, you can use a polyfill or fall back to the mapping approach.
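One way to structure that fallback is simple feature detection at module load, so the implementation is chosen once rather than per call. A sketch (the fallback map here is intentionally minimal and would need to be extended for the languages you actually serve):

```javascript
// Choose an implementation once, based on whether normalize() exists
const hasNormalize = typeof String.prototype.normalize === 'function'

// Minimal fallback map — a placeholder, not a complete mapping
const fallbackMap = { 'á': 'a', 'é': 'e', 'č': 'c', 'š': 's', 'ž': 'z' }

const stripDiacritics = hasNormalize
  ? (s) => s.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
  : (s) => s.split('').map((ch) => fallbackMap[ch] ?? ch).join('')

console.log(stripDiacritics('čaj')) // 'caj' on either code path
```

Given how old the unsupported engines are, in most modern projects the fallback branch will never run; it exists only as insurance for legacy environments.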
UX/DX considerations
- Always normalize input: Apply diacritic removal consistently across your application
- Preserve original data: Keep the original text alongside normalized versions
- Consider user expectations: Some users might expect exact matches
- Test with real data: Use actual text from your target languages
- Document your approach: Make sure your team understands the normalization strategy
Conclusion
The Unicode normalization approach provides a universal, maintainable, and efficient solution for handling diacritics in JavaScript applications. Unlike character mapping, it automatically handles all Unicode diacritical marks without requiring manual maintenance or language-specific knowledge.
For anyone working with international applications, this method ensures your code handles all Latin-based characters correctly, providing a consistent user experience regardless of the user’s language or input method.
By embracing Unicode standards and JavaScript’s built-in normalization capabilities, you can build more robust, international-friendly applications that handle the complexity of human languages gracefully.
See more
- String.prototype.normalize() reference
- String.prototype.normalize() compatibility
- UnicodePlus.com is a free tool providing information about any Unicode character