public class IndicNormalizer
extends java.lang.Object
Normalizes text in Indic scripts. Follows the guidelines in Unicode 5.2, chapter 6 (South Asian Scripts I), and the graphical decompositions listed at http://ldc.upenn.edu/myl/IndianScriptsUnicode.html.
| Modifier and Type | Class and Description |
|---|---|
| private static class | IndicNormalizer.ScriptData |
| Modifier and Type | Field and Description |
|---|---|
| private static int[][] | decompositions: Decompositions according to Unicode 5.2 and http://ldc.upenn.edu/myl/IndianScriptsUnicode.html. Most of these are not handled by Unicode normalization anyway. |
| private static java.util.IdentityHashMap<java.lang.Character.UnicodeBlock,IndicNormalizer.ScriptData> | scripts |
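The two fields above suggest a table-driven design: a per-script lookup keyed by `Character.UnicodeBlock`, and a decomposition table whose rows apply to one or more scripts. The following is a minimal sketch of that idea, not the class's actual code. The `ScriptInfo` type, the row layout `{ first, second, result, flags }`, the flag values, and the single table entry are all assumptions made for illustration.

```java
import java.lang.Character.UnicodeBlock;
import java.util.IdentityHashMap;
import java.util.Map;

final class ScriptTableSketch {
    // Hypothetical stand-in for IndicNormalizer.ScriptData:
    // one bit per script plus the first code point of its block.
    static final class ScriptInfo {
        final int flag;
        final int base;
        ScriptInfo(int flag, int base) { this.flag = flag; this.base = base; }
    }

    // UnicodeBlock constants are singletons, so identity comparison is sufficient.
    static final Map<UnicodeBlock, ScriptInfo> SCRIPTS = new IdentityHashMap<>();
    static {
        SCRIPTS.put(UnicodeBlock.DEVANAGARI, new ScriptInfo(1, 0x0900));
        SCRIPTS.put(UnicodeBlock.BENGALI, new ScriptInfo(2, 0x0980));
        SCRIPTS.put(UnicodeBlock.GUJARATI, new ScriptInfo(4, 0x0A80));
    }

    static int flag(UnicodeBlock ub) {
        return SCRIPTS.get(ub).flag;
    }

    // Assumed row layout: { first, second, result, script flags },
    // with code points stored relative to the start of the block.
    // Illustrative entry: letter A + vowel sign AA -> letter AA.
    static final int[][] TABLE = {
        { 0x05, 0x3E, 0x06, flag(UnicodeBlock.DEVANAGARI) | flag(UnicodeBlock.GUJARATI) },
    };

    // Returns the composed code point, or -1 if no row applies.
    static int compose(int cp1, int cp2) {
        UnicodeBlock block = UnicodeBlock.of(cp1);
        ScriptInfo info = SCRIPTS.get(block);
        if (info == null || UnicodeBlock.of(cp2) != block) {
            return -1;
        }
        for (int[] row : TABLE) {
            boolean scriptMatches = (row[3] & info.flag) != 0;
            if (scriptMatches && row[0] == cp1 - info.base && row[1] == cp2 - info.base) {
                return info.base + row[2];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(compose(0x0905, 0x093E)));
    }
}
```

Storing block-relative offsets lets one row serve several scripts that share the same layout, with the flag bits selecting which scripts the row applies to.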
| Constructor and Description |
|---|
| IndicNormalizer() |
| Modifier and Type | Method and Description |
|---|---|
| private int | compose(int ch0, java.lang.Character.UnicodeBlock block0, IndicNormalizer.ScriptData sd, char[] text, int pos, int len): Composes into standard form any sequences listed in the decompositions table. |
| private static int | flag(java.lang.Character.UnicodeBlock ub) |
| int | normalize(char[] text, int len): Normalizes the input text and returns the new length. |
private static final java.util.IdentityHashMap<java.lang.Character.UnicodeBlock,IndicNormalizer.ScriptData> scripts
private static final int[][] decompositions
private static int flag(java.lang.Character.UnicodeBlock ub)
public int normalize(char[] text,
int len)
text - input text
len - valid length
private int compose(int ch0,
java.lang.Character.UnicodeBlock block0,
IndicNormalizer.ScriptData sd,
char[] text,
int pos,
int len)
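The `normalize(char[] text, int len)` contract (rewrite the buffer in place and return the new valid length) can be illustrated with a small stand-alone sketch. This is not the class's implementation: the class name, the `delete` helper, and the single hard-coded rule (U+0905 followed by U+093E becomes U+0906) are assumptions chosen to keep the example short.

```java
final class InPlaceNormalizeSketch {
    // Removes the char at pos from the first len chars and returns the new length.
    static int delete(char[] text, int pos, int len) {
        if (pos < len - 1) {
            System.arraycopy(text, pos + 1, text, pos, len - pos - 1);
        }
        return len - 1;
    }

    // Rewrites the first len chars of text in place and returns the new valid length.
    // Only one rule is applied here: U+0905 + U+093E -> U+0906.
    static int normalize(char[] text, int len) {
        for (int i = 0; i < len - 1; i++) {
            if (text[i] == '\u0905' && text[i + 1] == '\u093E') {
                text[i] = '\u0906';
                len = delete(text, i + 1, len);
            }
        }
        return len;
    }

    public static void main(String[] args) {
        char[] buf = { '\u0915', '\u0905', '\u093E', '\u0916' };
        int newLen = normalize(buf, buf.length);
        // Only the first newLen chars are meaningful after the call.
        System.out.println(new String(buf, 0, newLen));
    }
}
```

Because the buffer is reused, chars beyond the returned length are stale and should be ignored by the caller.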