
Basics of Data

LEEGH 2023. 9. 9. 07:25

 

1. How to represent numbers in computers

As you know, computer language consists of only 0s and 1s. So how does a machine that recognizes only 0 and 1 handle such a vast range of numbers? The key point is that being limited to 0 and 1 does not mean a computer can express only the numbers 0 and 1: computers combine many 0s and 1s to express many numbers.

The smallest unit of data, that is, the smallest unit the computer can understand, is called a "bit". A bit is either 0 (zero) or 1 (one).

Therefore, a single bit can express two values: 0 and 1.

Each additional bit doubles the number of values that can be expressed, so n bits can express 2^n values. Even so, the unit 'bit' is too small for everyday use. (No one says, "My PDF file is 8,123,285 bits.")

For that reason, larger units of data size are used, such as the kilobyte (1,000 bytes), megabyte (1,000 kilobytes), gigabyte, terabyte, and so on.
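
To make this concrete, here is a minimal Python sketch (the function names are mine, not from the post) that computes how many distinct values n bits can express and restates a bit count in the decimal units above.

```python
def representable_values(n_bits: int) -> int:
    """Number of distinct values that n bits can express: 2^n."""
    return 2 ** n_bits


def bits_to_units(n_bits: int) -> str:
    """Restate a bit count in bytes/KB/MB, using decimal prefixes (1 KB = 1,000 bytes)."""
    n_bytes = n_bits / 8
    return f"{n_bytes:,.0f} B = {n_bytes / 1e3:,.1f} KB = {n_bytes / 1e6:,.2f} MB"


print(representable_values(1))    # 2   -> one bit expresses two values
print(representable_values(8))    # 256 -> one byte expresses 256 values
print(bits_to_units(8_123_285))   # the "8,123,285 bits" example in friendlier units
```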

 

WORD: the amount of data the CPU can process at once. This is a relatively special unit because it depends on the CPU architecture (32-bit or 64-bit) rather than being a fixed value.
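
There is no single portable way to query the word size, but the size of a pointer is a common rough proxy for it; the sketch below assumes that approximation, and its output depends on the machine and the Python build.

```python
import platform
import struct

# The size of a C pointer ("P") is a rough proxy for the machine word size.
word_bits = struct.calcsize("P") * 8
print(word_bits)             # typically 64 on a 64-bit build, 32 on a 32-bit one
print(platform.machine())    # e.g. 'x86_64' -- the value depends on the machine
```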

 

By agreeing on a number system (binary, decimal, or hexadecimal), computers can understand and represent huge numbers.
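
For example, Python can move between these representations directly:

```python
n = 171
print(bin(n), hex(n))        # 0b10101011 0xab -> the same value in binary and hexadecimal
print(int("10101011", 2))    # 171 -> parse a binary string back into a number
print(int("ab", 16))         # 171 -> parse a hexadecimal string
```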

 

"How to represent the negative binary numbers?"

→ Using two's complement: the two's complement of an n-bit number is 2^n minus that number.

It can be computed easily by inverting every bit of the binary number and adding 1.

ex) 011 (3 in decimal) → invert all bits: 100, add 1 → 101 (-3 in decimal)

      010 (2 in decimal) → invert all bits: 101, add 1 → 110 (-2 in decimal)

Figure 1. An example of two's complement (Ref: Wikipedia)
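
Below is a minimal Python sketch of the invert-and-add-1 rule (the helper name twos_complement is my own, not from the post); the result is masked to n bits so it matches the examples above.

```python
def twos_complement(x: int, n_bits: int) -> int:
    """Two's complement of x within n_bits: invert all bits, add 1, keep only n bits."""
    mask = (1 << n_bits) - 1           # e.g. 0b111 for 3 bits
    return ((~x) + 1) & mask           # equivalently: (2**n_bits - x) % 2**n_bits


print(format(twos_complement(0b011, 3), "03b"))   # 101 -> -3 in 3-bit two's complement
print(format(twos_complement(0b010, 3), "03b"))   # 110 -> -2 in 3-bit two's complement
```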

As you can see, it is difficult to tell whether a binary number is positive or negative just by looking at its bits.

For example, suppose I take the two's complement of 1011 (11 in decimal) to represent -11. The result is

$$0101_{(2)}$$ which is exactly the same bit pattern as the positive number 5 in decimal. That is obviously confusing. Therefore, the computer uses a "flag" to record whether a number is positive or negative. More details about flags will be covered in the chapter "Register". For now, it is enough to think of the numbers inside the computer as carrying a flag that distinguishes negative from positive.

 

※ The limits of two's complement

Two's complement is a useful way to represent negative numbers, but it is not perfect.

Let's think about 1000 (binary). If you apply the two's complement operation to it, the result is the same bit pattern you started with.

1000 → invert: 0111, add 1 → 1000

In other words, with n bits the two's complement method cannot represent both +2^(n-1) and -2^(n-1) at the same time: n bits only cover the range from -2^(n-1) to 2^(n-1) - 1. For example, 4 bits can hold -8 (1000) but not +8; to represent both, we need 5 bits.
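
A quick check of this limit in Python, reusing the same hypothetical helper as the earlier sketch:

```python
def twos_complement(x: int, n_bits: int) -> int:
    mask = (1 << n_bits) - 1
    return ((~x) + 1) & mask


# Negating the 4-bit pattern 1000 gives the same pattern back.
print(format(twos_complement(0b1000, 4), "04b"))   # 1000

# The representable range of n-bit two's complement is [-2^(n-1), 2^(n-1) - 1].
n = 4
print(-(2 ** (n - 1)), 2 ** (n - 1) - 1)           # -8 7 -> +8 needs one more bit
```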

 

 

(The other ways of expressing numbers are simple enough that they are omitted here.)

 

 

2. How to represent characters in computers

In order to represent characters with only 0 and 1, you need to understand the terms 'character set', 'encoding', and 'decoding'.

  • Character set
    It refers to the set of characters a computer can understand and output. For example, suppose a computer's character set is (a, b, c, d, e); then the computer cannot understand 'f' or any other character. But how does the computer actually understand the characters it does know? → Encoding
  • Encoding
    Even if a character belongs to the character set, the computer cannot understand the character itself. The situation is similar to numbers: just as computers convert decimal numbers into binary, the characters that humans understand must be converted into 0s and 1s. That process is called encoding.
  • Decoding
    On the other hand, humans cannot read characters that have been converted into 0s and 1s. The process of converting data made of 0s and 1s back into characters that humans can understand is called decoding, the opposite of encoding (both directions are sketched right after this list).
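
Here is a minimal Python sketch of both directions; the byte values in the comments assume the ASCII encoding covered next.

```python
text = "abc"

# Encoding: characters -> bytes (ultimately the 0s and 1s the computer stores).
encoded = text.encode("ascii")
print(encoded)                                  # b'abc'
print([format(b, "08b") for b in encoded])      # ['01100001', '01100010', '01100011']

# Decoding: bytes -> characters humans can read (the reverse of encoding).
decoded = encoded.decode("ascii")
print(decoded)                                  # abc
```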

※ An early character set, the ASCII code: https://en.wikipedia.org/wiki/ASCII

 


 

Each character in ASCII is represented by 7 bits, which means the total number of ASCII characters is 128 (2^7).

However, that is not enough to express Korean. (Extended ASCII can express up to 256 characters with 8 bits, but that is still not enough.)
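
For example, in Python, ord() returns a character's code point, which for ASCII characters is simply its ASCII code:

```python
for ch in "A", "a", "0":
    code = ord(ch)                        # ASCII code of the character
    print(ch, code, format(code, "07b"))  # 7 bits are enough for every ASCII character
# A 65 1000001
# a 97 1100001
# 0 48 0110000

print(2 ** 7)                             # 128 characters in total
```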

 

2-1. Korean-style encoding: EUC-KR

In Korean, one syllable is made by combining up to three parts: an initial consonant, a vowel, and a final consonant ('강' = ㄱ + ㅏ + ㅇ).

Therefore, an encoding technique specific to Korean was needed: EUC-KR.

EUC-KR is based on the character sets 'KS X 1001' and 'KS X 1003', and it needs 2 bytes (four hexadecimal digits) to represent one Korean character. In this way, about 2,350 Korean syllables can be represented, but that is not enough to express every possible Korean syllable.
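
Python ships an euc-kr codec, so the two-byte claim can be checked directly. The syllable '똠' below is my own example; it is commonly cited as falling outside the 2,350 syllables of KS X 1001, so a strict EUC-KR codec cannot encode it.

```python
syllable = "강"

encoded = syllable.encode("euc-kr")     # encode one Hangul syllable with EUC-KR
print(len(encoded))                     # 2 -> two bytes per syllable
print(encoded.hex())                    # four hexadecimal digits, per the KS X 1001 table

# A syllable outside KS X 1001 cannot be encoded with strict EUC-KR:
try:
    "똠".encode("euc-kr")
except UnicodeEncodeError as e:
    print("not representable in EUC-KR:", e.object)
```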

 

2-2. Unicode and UTF-8

Unicode is a huge character set. It contains most special characters and can express languages from all around the world. Unicode has several encoding methods, such as UTF-8, UTF-16, and UTF-32; UTF-8 is the most popular.
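
A short Python sketch of UTF-8's variable-length property (the code points shown in the comments are standard Unicode values):

```python
for ch in "A", "€", "강", "🙂":
    encoded = ch.encode("utf-8")
    print(ch, f"U+{ord(ch):04X}", len(encoded), "byte(s)")
# A   U+0041   1 byte(s) -> ASCII characters keep their 1-byte form
# €   U+20AC   3 byte(s)
# 강  U+AC15   3 byte(s) -> Hangul syllables take 3 bytes in UTF-8
# 🙂  U+1F642  4 byte(s)
```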

 

https://ko.wikipedia.org/wiki/UTF-8

 
