Java - Translating bytes from Korean to UTF-8, what am I not getting here?

June 15, 2015

There is something incomplete in my understanding here. If I run the code below, I expect to see:

    translatetest:: start
     start_korean: (6)  c0 af c8 f1 c8 c6
    expected_utf8: (6)  c7 20 d7 6c d6 c8
       found_utf8: (6)  c7 20 d7 6c d6 c8
    expected utf8 matches found? true

What I actually get is:

    translatetest:: start
     start_korean: (6)  c0 af c8 f1 c8 c6
    expected_utf8: (6)  c7 20 d7 6c d6 c8
       found_utf8: (9)  ec 9c a0 ed 9d ac ed 9b 88
    expected utf8 matches found? false

I thought that by creating a String, declaring the bytes to be x-windows-949, and then getting the bytes as UTF-8, I would translate them from one encoding to the other. Apparently I am not correct about this.

    public class TranslateTest {

      public static void main(String[] argv) {
        (new TranslateTest()).translate();
      }

      void translate() {
        System.out.println("translatetest:: start");

        try {
          // The pages below are linked from http://msdn.microsoft.com/en-us/goglobal/cc305154
          // Please ignore the lame bytesToHex helper method; it is included for completeness.

          // http://msdn.microsoft.com/en-us/goglobal/gg696909
          //
          // 0xC0AF = U+C720 = HANGUL SYLLABLE IEUNG YU

          // http://msdn.microsoft.com/en-us/goglobal/gg696960
          //
          // 0xC8F1 = U+D76C = HANGUL SYLLABLE HIEUH YI

          // http://msdn.microsoft.com/en-us/goglobal/gg696960
          //
          // 0xC8C6 = U+D6C8 = HANGUL SYLLABLE HIEUH U NIEUN

          byte[] start_korean = new byte[] { (byte)0xc0, (byte)0xaf, (byte)0xc8, (byte)0xf1, (byte)0xc8, (byte)0xc6 };
          byte[] expected_utf8 = new byte[] { (byte)0xc7, (byte)0x20, (byte)0xd7, (byte)0x6c, (byte)0xd6, (byte)0xc8 };

          // Decode the bytes as x-windows-949, then re-encode the resulting String as UTF-8.
          String str = new String(start_korean, "x-windows-949");
          byte[] found_utf8 = str.getBytes("UTF-8");

          boolean isEqual = java.util.Arrays.equals(expected_utf8, found_utf8);

          System.out.println(" start_korean: " + bytesToHex(start_korean));
          System.out.println("expected_utf8: " + bytesToHex(expected_utf8));
          System.out.println("   found_utf8: " + bytesToHex(found_utf8));

          System.out.println("expected utf8 matches found? " + isEqual);

        } catch (java.io.UnsupportedEncodingException uee) {
          System.err.println(uee.getMessage());
        }
      }

      // Formats a byte array as "(length)  xx xx xx ...".
      public static String bytesToHex(byte[] b) {
        StringBuffer str = new StringBuffer("(" + b.length + ") ");
        for (int idx = 0; idx < b.length; idx++) {
          str.append(" " + byteToHex(b[idx]));
        }
        return str.toString();
      }

      // Formats one byte as two hex digits. Negative bytes are sign-extended by
      // Integer.toHexString, so only the last two digits are kept.
      public static String byteToHex(byte b) {
        String hex = Integer.toHexString(b);
        while (hex.length() < 2) hex = "0" + hex;
        if (hex.length() > 2)
          hex = hex.substring(hex.length() - 2);
        return hex;
      }
    }

Your problem is that the "expected UTF-8" values are Unicode code points, not the UTF-8 encoding of those code points. I added this code:

    StringBuilder buf = new StringBuilder();
    for (int i = 0; i < str.length(); i++)
        buf.append(", ").append(Integer.toHexString(str.codePointAt(i)));
    System.out.println("     internal: " + buf.substring(2));

It prints c720, d76c, d6c8, which are the values you listed as expected.

When these code points are UTF-8 encoded, they are rendered as the values you actually see.
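To make the difference concrete, here is a minimal sketch that encodes the same three code points in two ways. The class name `CodePointsVsUtf8` and the `hex` helper are made up for illustration, and the string is built directly from the code points so the sketch does not depend on the x-windows-949 charset being available. It suggests that the "expected" bytes in the question are what you would get from UTF-16BE, not UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class CodePointsVsUtf8 {

    // Hypothetical helper: formats bytes as space-separated two-digit hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three code points from the question: U+C720, U+D76C, U+D6C8.
        int[] codePoints = { 0xC720, 0xD76C, 0xD6C8 };
        String str = new String(codePoints, 0, codePoints.length);

        // UTF-8 uses three bytes for each of these code points.
        System.out.println("UTF-8:    " + hex(str.getBytes(StandardCharsets.UTF_8)));
        // prints "UTF-8:    ec 9c a0 ed 9d ac ed 9b 88"

        // UTF-16BE writes each of these code points as its two-byte value.
        System.out.println("UTF-16BE: " + hex(str.getBytes(StandardCharsets.UTF_16BE)));
        // prints "UTF-16BE: c7 20 d7 6c d6 c8"
    }
}
```

If the goal really is the byte sequence c7 20 d7 6c d6 c8, asking for UTF-16BE instead of UTF-8 would produce it. If the goal is valid UTF-8, the nine bytes the program already prints are the correct result.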

Use an online Unicode code converter to check this out. Enter the string c720 d76c d6c8 in the "Mixed input" box and click "Convert numbers as hex code points".
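The same check can be done locally by applying the UTF-8 bit layout by hand. The sketch below is illustrative only: `ManualUtf8` and `encodeThreeByte` are hypothetical names, and the method covers only the three-byte range U+0800 to U+FFFF (excluding surrogates), which is where all three syllables fall.

```java
public class ManualUtf8 {

    // Encodes one code point in U+0800..U+FFFF (surrogates excluded) using the
    // three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.
    public static int[] encodeThreeByte(int codePoint) {
        boolean inRange = codePoint >= 0x0800 && codePoint <= 0xFFFF;
        boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        if (!inRange || surrogate) {
            throw new IllegalArgumentException("not a three-byte code point: " + codePoint);
        }
        return new int[] {
            0xE0 | (codePoint >> 12),          // top 4 bits
            0x80 | ((codePoint >> 6) & 0x3F),  // middle 6 bits
            0x80 | (codePoint & 0x3F)          // low 6 bits
        };
    }

    public static void main(String[] args) {
        for (int cp : new int[] { 0xC720, 0xD76C, 0xD6C8 }) {
            int[] b = encodeThreeByte(cp);
            System.out.printf("U+%04X -> %02x %02x %02x%n", cp, b[0], b[1], b[2]);
        }
        // prints:
        // U+C720 -> ec 9c a0
        // U+D76C -> ed 9d ac
        // U+D6C8 -> ed 9b 88
    }
}
```

Concatenating the three results gives ec 9c a0 ed 9d ac ed 9b 88, the nine bytes reported as found_utf8 in the question.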
