I am using CharsetDetector to detect the charset of a text file.
This is the code to detect the charset of the given file:
    private String getCharset(File file) {
        String charset = "";
        // try-with-resources closes both streams even if detection throws
        try (InputStream is = new FileInputStream(file);
             BufferedInputStream bis = new BufferedInputStream(is)) {
            CharsetDetector cd = new CharsetDetector();
            cd.setText(bis);
            CharsetMatch cm = cd.detect();
            if (cm != null) {
                charset = cm.getName();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return charset;
    }
For an ASCII text file it returns UTF-8. ASCII is a subset of UTF-8, but I would like to get ASCII if the file contains only ASCII characters, and UTF-8 only if it contains at least one character outside the ASCII range.
How can I check for that?
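To make the desired behaviour concrete, here is a rough sketch of the kind of check I have in mind. It does not use CharsetDetector at all: it scans the raw bytes and reports US-ASCII only when no byte has its high bit set. The names `AsciiCheck`, `isAsciiOnly` and `refineCharset` are made up for illustration, and I am assuming that a second pass over the file is acceptable.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class AsciiCheck {

    /** Returns true if every byte in the stream is 7-bit ASCII (0x00 to 0x7F). */
    public static boolean isAsciiOnly(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                // Java bytes are signed, so a set high bit shows up as a negative value.
                if (buf[i] < 0) {
                    return false;
                }
            }
        }
        return true;
    }

    /**
     * Hypothetical helper: narrows a detected "UTF-8" result to "US-ASCII"
     * when the file contains no bytes outside the ASCII range.
     * Any other detected name is returned unchanged.
     */
    public static String refineCharset(Path file, String detected) throws IOException {
        if (!"UTF-8".equals(detected)) {
            return detected;
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            return isAsciiOnly(in) ? "US-ASCII" : "UTF-8";
        }
    }
}
```

With this, the result of `getCharset(file)` could be passed through `AsciiCheck.refineCharset(file.toPath(), charset)`. Note that an empty file would be reported as US-ASCII under this sketch, and a file starting with a UTF-8 byte order mark would stay UTF-8.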