Unicode System in Java
In this tutorial, we will look at what the Unicode system is, why it emerged, and how it relates to Java. Unicode is one of the important and useful features in the world of programming, so we will try to understand it precisely. We will have a glance at its utility and see a list of Unicode values along with their descriptions.
Understanding the term
Unicode is a universal, international standard for character encoding. It is capable of representing a large number of the written languages around the globe. It represents digits, uppercase and lowercase letters, and special characters.
Computers only understand numbers, so every character we type is stored as a numeric code that the computer can process.
The Unicode system assigns a unique number, called a code point, to each character, independent of the platform and the language. It serves as the common standard for representing text.
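To make the idea of "one number per character" concrete, here is a minimal sketch that looks up the number behind a few characters and prints it in the usual U+XXXX notation. The class and method names (CodePointDemo, firstCodePoint, toUPlus) are our own choices for illustration; only String.codePointAt and String.format come from the Java standard library.

```java
class CodePointDemo {
    // Returns the Unicode code point of the first character in the text.
    static int firstCodePoint(String text) {
        return text.codePointAt(0);
    }

    // Formats a code point in the conventional U+XXXX notation
    // (at least four uppercase hexadecimal digits).
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // A few sample characters; any other characters would work as well.
        for (String s : new String[] {"A", "a", "7", "?"}) {
            System.out.println(s + " -> " + toUPlus(firstCodePoint(s)));
        }
    }
}
```

The same character gives the same number on every platform, which is the property the Unicode system is built around.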
Unicode is not the first encoding system to be used. Before it, many other systems were in use, such as:
- ISO 8859-1 for Western European languages
- KOI-8 for Russian
- ASCII for the United States of America
These earlier systems had certain limitations, which led to the creation of a new system: Unicode.
The major limitations include the following:
- The same code value corresponds to different letters in different language standards.
- The encodings for languages with huge character sets have variable lengths. Some common characters are encoded as a single byte, while others require two or more bytes.
- Limitation in size. Each of these encodings can represent only a limited number of characters, which is too few for a technically growing era in which text from many languages is exchanged.
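The first limitation can be seen directly in Java. The sketch below decodes the same single byte with two of the older encodings. The class name LegacyEncodingDemo, the helper decode, and the sample byte 0xE4 are our own choices for illustration. ISO-8859-1 is guaranteed to be available on every Java platform, while KOI8-R is not, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LegacyEncodingDemo {
    // Decodes one byte using the given character encoding.
    static String decode(byte value, Charset charset) {
        return new String(new byte[] {value}, charset);
    }

    public static void main(String[] args) {
        byte sample = (byte) 0xE4; // arbitrary byte above the ASCII range

        // In ISO-8859-1 this byte is a Western European letter.
        System.out.println("ISO-8859-1: " + decode(sample, StandardCharsets.ISO_8859_1));

        // KOI8-R is an optional charset, so we check before using it.
        // There, the same byte stands for a Cyrillic letter.
        if (Charset.isSupported("KOI8-R")) {
            System.out.println("KOI8-R:     " + decode(sample, Charset.forName("KOI8-R")));
        } else {
            System.out.println("KOI8-R is not available on this JVM");
        }
    }
}
```

The byte is identical in both cases; only the assumed encoding changes the letter. With Unicode, each of those letters has its own number, so this ambiguity does not arise.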
The Unicode system solves the above problems. It was originally designed as a 16-bit encoding, so each character took 2 bytes, and in Java, too, the size of a char is 2 bytes.
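The size of a Java char can be checked from code. The following is a small sketch using the constants Character.BYTES (available since Java 8) and Character.SIZE; the class name CharSizeDemo and the helper distinctCharValues are hypothetical names used only for this example.

```java
class CharSizeDemo {
    // Number of different values a single char can hold: 2 raised to the
    // number of bits in a char.
    static int distinctCharValues() {
        return 1 << Character.SIZE;
    }

    public static void main(String[] args) {
        System.out.println("Bytes per char: " + Character.BYTES);
        System.out.println("Bits per char:  " + Character.SIZE);
        System.out.println("Distinct char values: " + distinctCharValues());
    }
}
```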
A 16-bit code can distinguish 65,536 characters, and the full Unicode code space has room for 1,114,112 code points (U+0000 to U+10FFFF), whereas ASCII can handle only 128 characters.
Unicode aims to cover all existing written text and leaves room for forthcoming characters as well.
Unicode values are needed for the standardization of text.
In Java, the range of char values starts from \u0000 and ends at \uFFFF. Thus, we can say that \u0000 is the lowest char value and \uFFFF is the highest. Unicode characters above this range (up to U+10FFFF) are represented in Java by a pair of char values, known as a surrogate pair.
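The limits of the char type, and what happens to characters beyond them, can be explored with a short sketch. The class name CharRangeDemo, the helper charsNeeded, and the sample code point 0x1F600 are our own choices; the constants and methods used (Character.MIN_VALUE, Character.MAX_VALUE, Character.MAX_CODE_POINT, Character.charCount, Character.toChars, String.codePointCount) are part of the standard library.

```java
class CharRangeDemo {
    // Number of char values Java needs to store the given code point:
    // 1 for code points that fit in a char, 2 for those above that range.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("Smallest char value: " + (int) Character.MIN_VALUE);
        System.out.println("Largest char value:  " + (int) Character.MAX_VALUE);
        System.out.println("Largest code point:  " + Character.MAX_CODE_POINT);

        // A sample code point above the range of a single char.
        int beyondChar = 0x1F600;
        String text = new String(Character.toChars(beyondChar));
        System.out.println("char values used: " + text.length());
        System.out.println("code points:      " + text.codePointCount(0, text.length()));
    }
}
```

Because of this, String.length() counts char values rather than characters, so code that may see characters above the char range is usually safer working with code points.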
Example
Now, let us have a look at examples to understand the concept even better.
Unicode Char Description
U+0000 ^@ Null character
U+0001 ^A Start of heading (^A means Ctrl+A, not the letter A)
U+0002 ^B Start of text (^B means Ctrl+B, not the letter B)
U+0003 ^C End of text
U+0004 ^D End of transmission
U+0005 ^E Enquiry
U+0006 ^F Acknowledge
U+0007 ^G Bell
U+0008 ^H Backspace
U+0009 ^I Horizontal tab
U+000A ^J Line feed
U+000B ^K Vertical tab
U+000C ^L Form feed (new page)
U+000D ^M Carriage return
U+000E ^N Shift out
U+000F ^O Shift in
U+0010 ^P Data link escape
U+0011 ^Q Device control 1
U+0012 ^R Device control 2
U+0013 ^S Device control 3
U+0014 ^T Device control 4
U+0015 ^U Negative acknowledge
U+0016 ^V Synchronous idle
U+0017 ^W End of transmission block
U+0018 ^X Cancel
U+0019 ^Y End of medium
U+001A ^Z Substitute
U+0030 0 Digit zero
U+0031 1 Digit one
U+0032 2 Digit two
U+0033 3 Digit three
U+0034 4 Digit four
U+0035 5 Digit five
U+0036 6 Digit six
U+0037 7 Digit seven
U+0038 8 Digit eight
U+0039 9 Digit nine
U+003A : Colon
U+003B ; Semicolon
U+003C < Less-than sign
U+003D = Equals sign
U+003E > Greater-than sign
U+003F ? Question mark
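Rows like the ones above can also be produced from Java itself. The following is a rough sketch, not a full reproduction of the table: it prints the code, the character where it is printable, and a simple classification. The class name UnicodeTableDemo and the helper describe are hypothetical; Character.isISOControl, Character.isDigit, Character.digit and Character.toChars are standard library methods.

```java
class UnicodeTableDemo {
    // Builds one table row: the U+XXXX code, then either a note that the
    // value is a control character or the character itself.
    static String describe(int codePoint) {
        String label = String.format("U+%04X", codePoint);
        if (Character.isISOControl(codePoint)) {
            // Control characters have no visible glyph, so we do not print them.
            return label + " (control character)";
        }
        String ch = new String(Character.toChars(codePoint));
        if (Character.isDigit(codePoint)) {
            return label + " " + ch + " (digit " + Character.digit(codePoint, 10) + ")";
        }
        return label + " " + ch;
    }

    public static void main(String[] args) {
        // The two ranges covered by the table in this tutorial.
        for (int cp = 0x00; cp <= 0x1A; cp++) {
            System.out.println(describe(cp));
        }
        for (int cp = 0x30; cp <= 0x3F; cp++) {
            System.out.println(describe(cp));
        }
    }
}
```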
Summary
In this tutorial, we tried to understand the meaning of the Unicode system in Java. We saw its evolution, need, and usage, and went through some facts about the system. Further, we had a glimpse of certain Unicode values starting from U+0000.