Tuesday, September 6, 2011

Practical Algorithms and Data Structures : Arrays–Part 1

To understand arrays, you first need to understand some basic concepts of how computer memory is managed. This is going to be a very high-level explanation with many over simplifications, basically I hope to share just enough to ensure that you don’t get lost when we get into the details of arrays.

You can think of computer memory as a collection of blocks which are placed sequentially one after the other in a row. Each block is uniquely identified by a number, this number in known as a memory address. For example, the first block would be at memory address 0 the next at memory address 1 and so on until you reach the last block.

Each of of these blocks or memory locations can store a single piece of data, this piece of data is known as a byte. A byte is made-up of 8 bits where each bit can have a binary state of either ‘1’ or ‘0’, 256 unique combinations of 1s and 0s can be formed for a byte. The interpretation or meaning associated to each of these patterns depends on what part of the system is looking at the data and how it has chosen to interpret the data. For example if your application is interpreting the data in the memory area as letters of the alphabet you might choose to interpret the bit pattern 01000001 as the upper case letter ‘A’. If on the other hand your application is treating the data as a numerical quantity representing someone's age for example, that same bit pattern would be interpreted as the numeric value 65. (See note 1 below)

More complex data can be represented by combining the patterns of multiple memory locations and interpreting those patterns appropriately. Let me give you an example.

Suppose you wanted to store a string of characters in memory, you could choose to store that string with a length prefix which indicates how many characters are in the string, and the subsequent memory locations would contain the bytes that represent each character of the string.

String in memory

The above image represents a string stored in memory with a length prefix of 5. Memory location 0 contains the bit pattern for the number 5 indicating that the next 5 memory locations contain the character data of the string. Memory location 1 contains the bit pattern 01100011 which has the numeric value of 99, but because we know this is a character of a string we will map this to the character ‘c’ (We are using an ASCII mapping here, see note 2 below), the ‘r at memory address 3 is encoded as 01110010 which represents the numeric value 114. This mapping is done for the 5 memory locations after the length prefix.

Similarly we need encoding mechanisms to store large numeric integer values, real numbers which can represent fractions of a whole number etc. As you might have already realized, the data stored in a single memory location can be quite limiting having only 256 possible values. That is not much to work with, it might be fine for storing your age, but what about the population of a country? As with the example above where we used multiple memory locations to store a string, a similar solution can be found for storing larger numbers that will require multiple memory locations which when interpreted as a whole represent a larger numeric value. For example, using 4 memory locations we have a total of 32 bits of data which gives you 4,294,967,295 unique bit patterns which covers us for the population for any country in the world, we need to go even bigger if we want to store the entire population of the world however.

The important thing here is that you know up front how to interpret the various pieces of information stored in the memory and how many bytes make up a single unit of information.

As you can imagine, these encodings get quite complex, fortunately you rarely need to deal with these low level details, for the most part the details are nicely taken care of for us by the higher level tools that we use to write our software.

The important take away from this post, is that more complex data representations can require multiple memory locations to store a single instance of data.

Prev – Introduction

* Notes:

  1. The bit patterns are not random, they follow a numbering scheme known as the binary system. Read more about this here.
  2. Interpreting the bit patterns as characters is done using a look-up table, in this case I have used the ASCII table which you can read more about here.

No comments:

Post a Comment