HGU_CSEE
Basics of Data
1. How to represent numbers in computers
As you know, computer language consists of only 0s and 1s. How, then, does a computer that can recognize only 0 and 1 understand vast numbers? What is certain is that just because a computer understands only 0 and 1 does not mean it can express only 0 and 1. Computers combine many 0s and 1s to express many numbers.
The smallest data unit, that is, the smallest unit a computer can understand, is called a "bit": 0 <zero> or 1 <one>.
Therefore, one bit can express two values of data. <0 and 1>
Each additional bit doubles the number of values that can be expressed, so n bits can represent 2^n values. As you can see, the unit 'bit' is too small to use in our daily life. <No one says "My PDF file size is 8,123,285 bits."> A group of 8 bits is called a byte.
For that reason, there are larger units of data size such as the Kilobyte <equal to 1,000 bytes>, Megabyte <equal to 1,000 Kilobytes>, Gigabyte, Terabyte, and so on.
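The growth described above can be checked directly. This is a minimal sketch in Python showing how the number of representable values doubles with each bit, and how bits relate to bytes:

```python
# With n bits you can distinguish 2**n different bit patterns.
for n in (1, 2, 4, 8):
    print(f"{n} bit(s) -> {2**n} values")

# 8 bits = 1 byte, so larger units are built from bytes.
KB = 1000          # kilobyte, in bytes
MB = 1000 * KB     # megabyte, in bytes
print(f"8,123,285 bits is about {8123285 // 8:,} bytes")
```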
WORD: It refers to the size of data that the CPU can process at once. This is a relatively special unit because it depends on the CPU (e.g., 32-bit or 64-bit) rather than being a fixed value.
By using a positional number system <binary, decimal, or hexadecimal>, computers can understand and represent huge numbers.
"How to represent the negative binary numbers?"
→ Using two's complement: the two's complement of an n-bit number x is 2^n − x, where 2^n is greater than x.
In binary, it can be computed easily by inverting every bit and adding 1.
ex) 011 <3 in decimal> -> 100 //invert all bits -> +1 -> 101 <-3 in decimal>
010 <2 in decimal> -> 101 //invert all bits -> +1 -> 110 <-2 in decimal>
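The "invert all bits and add 1" rule above can be sketched as a small Python function. The bit width `n_bits` is an assumption the caller supplies, since two's complement is only meaningful for a fixed width:

```python
def twos_complement(x: int, n_bits: int) -> int:
    """Return the n-bit two's-complement pattern of -x, as an unsigned int."""
    mask = (1 << n_bits) - 1       # e.g. 0b111 for 3 bits
    return ((~x) + 1) & mask       # invert every bit, add 1, keep n bits

print(format(twos_complement(0b011, 3), "03b"))  # 101, i.e. -3
print(format(twos_complement(0b010, 3), "03b"))  # 110, i.e. -2
```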
As you can see, it is difficult to determine whether a given binary pattern is positive or negative.
For example, if I want to represent -11 in binary <that is, -3 in decimal> through two's complement, I can express it by
$$101_{(2)}$$ This is the same bit pattern as the positive number '5' in decimal. It confuses us obviously. Therefore, in the computer, the "flag" is used for determining whether the number is positive or negative. More details about flags will be covered in the chapter "Register". For now, it is only necessary to think that the numbers inside the computer have a flag to distinguish whether they are negative or positive.
※Two's complement's limit
Two's complement is a useful method for representing negative numbers, but it is not a perfect one.
Let's think about 1000 <binary>. If you take its two's complement, the result is the same as the original pattern.
1000 -> 0111 -> +1 -> 1000 <two's complement>
In other words, with n bits the two's-complement method covers the range -2^(n-1) to 2^(n-1)-1, so it cannot represent both +2^(n-1) and -2^(n-1) at the same time. With 4 bits we can store -8 <1000>, but representing +8 requires 5 bits.
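The edge case above can be demonstrated directly. This is a minimal sketch assuming a 4-bit width: negating the most negative value (-8 = 1000) just wraps back to itself.

```python
N = 4
mask = (1 << N) - 1                 # 0b1111: keep only 4 bits

x = 0b1000                          # -8 in 4-bit two's complement
neg = ((~x) + 1) & mask             # invert all bits and add 1
print(format(neg, "04b"))           # 1000 -- the "negation" is unchanged
```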
<The other methods of expressing numbers are simple enough that they are omitted here.>
2. How to represent characters in computers
In order to represent characters with only 0 and 1, you need to understand 'Character set', 'Encoding', and 'Decoding'.
- Character set
It refers to the set of characters that a computer can understand and output. For example, suppose a computer's character set is <a, b, c, d, e>. That means the computer cannot understand 'f' or any other character. Then how can the computer understand the characters it does know? -> Encoding
- Encoding
Even if a character belongs to the character set, the computer cannot understand the character itself. It is similar to numbers: just as computers convert decimal numbers into binary, the characters humans understand must be converted into 0s and 1s. This process is called encoding.
- Decoding
On the other hand, humans cannot read characters converted to 0s and 1s. Therefore, the process of converting data consisting of 0s and 1s back into characters that humans can understand is called decoding. This is the opposite process of encoding.
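The encoding/decoding round trip above can be sketched with Python's built-in str/bytes conversion (using the UTF-8 encoding introduced later in this post):

```python
text = "abc"
encoded = text.encode("utf-8")      # encoding: characters -> bytes (0s and 1s)
print(encoded)                      # b'abc'
print([format(b, "08b") for b in encoded])  # the actual bit patterns

decoded = encoded.decode("utf-8")   # decoding: bytes -> characters again
print(decoded == text)              # True: decoding reverses encoding
```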
※ An early set of characters, the ASCII code https://en.wikipedia.org/wiki/ASCII
Each character in ASCII is represented by 7 bits, which means the total number of ASCII characters is 128.
However, that is not enough to express Korean. <The extended ASCII code can express up to 256 characters with 8 bits, but it is also not enough.>
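The ASCII mapping can be inspected directly: each character corresponds to a code point that fits in 7 bits (0–127).

```python
for ch in "A", "a", "0":
    code = ord(ch)                      # character -> ASCII code
    print(ch, code, format(code, "07b"))

print(chr(65))          # A  -- code -> character
print(2 ** 7)           # 128: all codes that fit in 7 bits
```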
2-1. Korean style encoding: EUC-KR
In Korean, one syllable is composed of a combination of up to three parts. ('강' = ㄱ + ㅏ + ㅇ)
Therefore, a unique encoding technique for Korean was needed. That is EUC-KR.
EUC-KR is based on the character sets 'KS X 1001' and 'KS X 1003', and it needs 2 bytes to represent one Korean character <four hexadecimal digits>. In that way, about 2,350 Korean syllables can be represented, but that is not enough to express all possible Korean syllables.
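Python ships an `euc-kr` codec, so the 2-byte claim above can be checked directly in a short sketch:

```python
encoded = "강".encode("euc-kr")
print(len(encoded))                # 2 bytes per Hangul syllable
print(encoded.hex())               # i.e. four hexadecimal digits

print(encoded.decode("euc-kr"))    # round-trips back to 강
```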
2-2. Unicode and UTF-8
Unicode is a huge character set. It contains most special characters and can express languages from around the world. Unicode has several encoding methods, such as UTF-8, UTF-16, and UTF-32; UTF-8 is the most popular.
https://ko.wikipedia.org/wiki/UTF-8
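UTF-8 is a variable-length encoding, which is why it handles both ASCII and Korean: ASCII characters take 1 byte each, while Hangul syllables take 3 bytes. A minimal sketch:

```python
for ch in "A", "강", "€":
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex())  # 1, 3, and 3 bytes respectively

# Every Unicode encoding of the same text decodes back to the same string.
print("강".encode("utf-32").decode("utf-32") == "강")   # True
```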