Channel: Active questions tagged charset - Stack Overflow

How to fix the encoding of a string in JavaScript


I received a broken string from another piece of software. I would like to fix its encoding in JavaScript, but I feel I am missing something.

Here's an example of a broken string: Détectéàlors ôù
And the expected output would be: Détectéàlors ôùi

I don't know the encoding used to send me the string.

My idea is to use the TextDecoder API: convert the string to bytes, then re-encode it as UTF-8 or UTF-16.

Here's the piece of code I used to detect the charset used:

const str = 'Détectéàlors ôùi';
const str2 = 'Détectéàlors ôù';
const charsets = [
  'utf-8', 'ibm866', 'iso-8859-2', 'iso-8859-3', 'iso-8859-4', 'iso-8859-5',
  'iso-8859-6', 'iso-8859-7', 'iso-8859-8', 'iso-8859-8-i', 'iso-8859-10',
  'iso-8859-13', 'iso-8859-14', 'iso-8859-15', 'iso-8859-16', 'koi8-r',
  'koi8-u', 'macintosh', 'windows-874', 'windows-1250', 'windows-1251',
  'windows-1252', 'windows-1253', 'windows-1254', 'windows-1255',
  'windows-1256', 'windows-1257', 'windows-1258', 'x-mac-cyrillic', 'gbk',
  'gb18030', 'hz-gb-2312', 'big5', 'euc-jp', 'iso-2022-jp', 'shift-jis',
  'euc-kr', 'iso-2022-kr', 'utf-16be', 'utf-16le', 'iso-2022-cn',
];

// Bytes of the broken string (TextEncoder always produces UTF-8).
const encoder = new TextEncoder();
const view = encoder.encode(str2);

console.log('__________________');
charsets.forEach((charset) => {
  try {
    // fatal and ignoreBOM are constructor options; decode() only accepts { stream }.
    const decoder = new TextDecoder(charset, {
      fatal: false,
      ignoreBOM: true,
    });
    const fixedStr = decoder.decode(view);
    console.log(charset, fixedStr);
  } catch (e) {
    // Thrown when the label is not a supported encoding.
    console.log(charset, 'invalid');
  }
});

(the code can be tested here: https://jsfiddle.net/tashebwj/ )

The output is the following:

__________________
utf-8 Détectéàlors ôù
ibm866 D├Г┬йtect├Г┬й├Г┬аlors ├Г┬┤├Г┬╣
iso-8859-2 DĂŠtectĂŠĂ lors Ă´Ăš
iso-8859-3 D�Âİtect�Âİ� lors �´�Âı
iso-8859-4 DÊtectÊàlors ôÚ
iso-8859-5 DУТЉtectУТЉУТ lors УТДУТЙ
iso-8859-6 Dأآ�tectأآ�أآ lors أآ�أآ�
iso-8859-7 DΓΒ©tectΓΒ©ΓΒ lors ΓΒ΄ΓΒΉ
iso-8859-8 D��©tect��©�� lors ��´��¹
iso-8859-8-i D��©tect��©�� lors ��´��¹
iso-8859-10 DÃÂĐtectÃÂĐàlors ÃÂīÃÂđ
iso-8859-13 DĆĀ©tectĆĀ©ĆĀ lors ĆĀ“ĆĀ¹
iso-8859-14 Détectéàlors ÃÂṀÃÂṗ
iso-8859-15 Détectéàlors ÃŽù
iso-8859-16 DĂ©tectĂ©Ă lors ĂÂŽĂÂč
koi8-r Dц┐б╘tectц┐б╘ц┐б═lors ц┐б╢ц┐б╧
koi8-u Dц┐б╘tectц┐б╘ц┐б═lors ц┐бЄц┐б╧
macintosh D√ɬ©tect√ɬ©√ɬ†lors √ɬ¥√ɬπ
windows-874 Dรยฉtectรยฉรย lors รยดรยน
windows-1250 DĂ©tectĂ©Ă lors Ă´ĂÂą
windows-1251 DГѓВ©tectéàlors ГѓВґГѓВ№
windows-1252 Détectéàlors ôù
windows-1253 Détectéàlors ôù
windows-1254 Détectéàlors ôù
windows-1255 Dֳƒֲ©tectֳƒֲ©ֳƒֲ lors ֳƒֲ´ֳƒֲ¹
windows-1256 Dأƒآ©tectأƒآ©أƒآ lors أƒآ´أƒآ¹
windows-1257 DĆĀ©tectĆĀ©ĆĀ lors ĆĀ´ĆĀ¹
windows-1258 DĂƒÂ©tectĂƒÂ©ĂƒÂ lors ĂƒÂ´ĂƒÂ¹
x-mac-cyrillic D√Г¬©tect√Г¬©√Г¬†lors √Г¬і√Г¬є
gbk D脙漏tect脙漏脙聽lors 脙麓脙鹿
gb18030 D脙漏tect脙漏脙聽lors 脙麓脙鹿
hz-gb-2312 invalid
big5 D�穢tect�穢��饊ors �織�繒
euc-jp D�息tect�息��lors �卒�孫
iso-2022-jp D����tect��������lors ��������
shift-jis Dテδゥtectテδゥテδ�lors テδエテδケ
euc-kr D횄짤tect횄짤횄혻lors 횄쨈횄쨔
iso-2022-kr invalid
utf-16be 䓃菂ꥴ散瓃菂ꤠ쎃슠汯牳⃃菂듃菂�
utf-16le 썄슃璩捥썴슃₩菃ꃂ潬獲쌠슃쎴슃�
iso-2022-cn invalid

Why does this method not work? Is it possible to fix the string with this method, or is there another way?
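
For reference, here is a minimal sketch of the "string to bytes, then decode" idea. It assumes the damage came from UTF-8 bytes being read as windows-1252, which the "Ã©" pattern suggests but which is not confirmed for this string. TextEncoder can only produce UTF-8, so the string-to-bytes step is done by hand here with a lookup table. The helper name undoCp1252Mojibake and the table names are hypothetical and exist only for this illustration.

```javascript
// Code points that windows-1252 assigns to bytes 0x80..0x9F (index 0 is byte 0x80).
// Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned; the WHATWG mapping
// passes them through as the C1 control with the same value.
const CP1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Reverse lookup: code point -> windows-1252 byte.
const CP1252_BYTE = new Map(CP1252_HIGH.map((cp, i) => [cp, 0x80 + i]));

// Hypothetical helper: undo one round of "UTF-8 bytes read as windows-1252".
// Returns null if the string cannot be the result of that mistake.
function undoCp1252Mojibake(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80 || (cp >= 0xa0 && cp <= 0xff)) {
      bytes.push(cp); // same value in windows-1252 and Unicode
    } else if (CP1252_BYTE.has(cp)) {
      bytes.push(CP1252_BYTE.get(cp));
    } else {
      return null; // character has no windows-1252 byte
    }
  }
  try {
    // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD.
    return new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.from(bytes));
  } catch {
    return null;
  }
}

// "é" damaged twice is "Ã" "ƒ" "Â" "©", so it needs two passes.
const once = undoCp1252Mojibake('\u00c3\u0192\u00c2\u00a9'); // "Ã©"
const twice = undoCp1252Mojibake(once); // "é"
console.log(once, twice);
```

A string that went through the same mistake twice needs two passes, as in the usage lines above. A null result means the windows-1252 assumption does not hold for that input, and another source encoding would have to be tried.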

