Programming Languages – Basics

To understand programming languages, one needs to know a bit about how a processor works. A Processing Unit is that part of an integrated circuit chip that actually performs computations. It needs to read instructions, interpret them, and perform them. This is all done by a set of Logic Gates implemented by sets of transistors on the chip. Modern chips have up to 7 billion transistors or more than one billion logic gates.

  • All electronic computation is controlled and computed by electronic signals passing in step through these logic gates.
  • A processing unit instruction is a digital number that triggers a set of data to pass through a particular computation unit of the processor.

Each PU has some internal memory called registers, usually some dedicated to instructions and some to data (although they could be mixed). Data and instructions are stored here for quick access by the PU. So the PU goes through a cycle of:

    1. Read an instruction from main memory into registers (if not already available)
    2. Load an instruction from instruction queue
    3. Interpret instruction
    4. Perform instruction
    5. Leave results in specified location
    6. Return to step #1

Instructions to a processor are extraordinarily simple, each performing just one very specific, elemental task. They do things like:

    1. Move 1 piece of data from main memory to some register. (Load)
    2. Move 1 datum from some register into main memory. (Store)
    3. Perform an arithmetic or logical operation on the specified datum. (Performed by ALU)
    4. Perform a comparison on data and branch to new instruction depending on the result.
    5. Send a signal to a given address (e.g. that would initiate an action on some piece of hardware)

 

Caveat:

This is an extreme simplification of what is done, particularly in modern CPUs. Still, it gives a good idea as to how they work.

So, when the instruction decoder of the PU detects a particular number, it allows data from one or two of the registers to pass through the portion of the processing circuitry that will perform the desired action. So, if you have put the number 2 in register 1 and 2 in register 2, and you give an instruction:

add   R1 R2 R6

Then the values from R1 & R2 will be routed through the adder circuitry and the resulting 4 will be moved into R6. In a sense it is rather simple. It just adds up to a lot of complexity. The devil is in the details!

Assembly language programming

Because all operations are so elemental, the simplest operations become long and tedious. Obviously today’s processors are powerful enough to combine some of these, or to move data in larger blocks, but if we assume a very simple CPU as from the 1970s, we can understand the process that is still the heart of what is done today.

Let’s suppose we have a simple processor that is connected to a streaming display that will show a series of characters moving across it. And let’s say that the display reads from a memory buffer that is accessible to the CPU beginning at address 4000. Characters are stored there and are sent to display by writing the number 1 to memory location 4099 which is a trigger to the device which will then display them. We will be sending the string “Hello world!” to the display. (Quote marks not included.) In our example, this string has already been stored beginning in memory location 1206. (The string is terminated by the value 0.)

A simple set of Assembly Language instructions would look something like this:

// Double slashes signify comments not part of program
// Take a string stored @1206 and send it to the display
// Set start of string and start of display

 

      set   str   1206        // Memory location of first character
      set   R2    4000        // Mem location of display buffer
 
      set   R1    str         // Initialize Register 1 to address
                              // of first char
 
loop1:                        // This is a label
      load  @R1   R3          // Move 1 char from string to Register 3
 
      store R3    @R2         // Put contents of R3 into memory
                              //   location address from R2
      br0   R3    Continue1   // If char is NULL (string terminator)
                              // then Branch to label “Continue1”

                              // If not end of string then...

      incr  R1                // Increment (add 1) to R1
                              // – now points to next char
      incr  R2                // Increment pointer to display buffer

      goto  loop1             // Return to top of loop and repeat
                              // until the char retrieved = 0
Continue1:
      store 1     @4099       // Tell display to run the string

      …

 

As you can see, this is very tedious! It is also extremely prone to errors.

Higher level languages

So people invented higher level languages in which they could clearly specify their programs but in a language that is more easily understood. That whole program fragment above would reduce in some C-like language to:

printf("Hello world!");

Source code written in a higher level language is then run through another program, called a compiler, which translates it into machine code. Machine code is the numeric form of instructions like those in the assembly program above (without the comments), and it is what the processor can actually execute.

The difficulty here is that when a new central processor is created, a new compiler must be written (at least the part that generates code for the target processor). But this is small work compared to rewriting all the programs. And completely new systems are very rare. New processors keep the instruction sets of previous ones, or only extend them. This is particularly true in the CPU world. For this reason, the current line of Intel processors still has an instruction set that harks back to the original 8086, though of course not completely.

Apple, on the other hand, has had to port its operating systems several times. First, from the Motorola 68000 series to the PowerPC chip family. Then, in 2006, they ported OS X to Intel’s processor family. Then, with the advent of the iPhone, they did what is essentially a port of OS X to the ARM architecture. True, iOS is not OS X, but they do share the open source XNU kernel and the Darwin operating system (derived in part from BSD Unix) as their basis. The most important difference is that the interface layer is Apple’s Cocoa Touch, not the generic Cocoa. The important point is that they have successfully ported to the ARM architecture.

 


[Image: Intel C8008-1 processor. Source: Wikipedia]

Here is an actual example source program for the Intel 8008 microprocessor, which was produced from 1972 to 1983. The original 8-bit processor contained 3,500 transistors, and ran at a blazing 500 kHz. The following 8008 assembler source code is for a subroutine named MEMCPY that copies a block of data bytes of a given size from one location to another.

                  ; MEMCPY --
                  ; Copy a block of memory from one location to another.
                  ;
                  ; Entry parameters
                  ;       SRC: 14-bit address of source data block
                  ;       DST: 14-bit address of target data block
                  ;       CNT: 14-bit count of bytes to copy
                               ORG     1700Q       ;Data at 001700q
001700 000         SRC         DFB     0           ;SRC, low byte
001701 000                     DFB     0           ;     high byte
001702 000         DST         DFB     0           ;DST, low byte
001703 000                     DFB     0           ;     high byte
001704 000         CNT         DFB     0           ;CNT, low byte
001705 000                     DFB     0           ;     high byte
                               ORG     2000Q       ;Code at 002000q
002000 066 304     MEMCPY      LLI     CNT+0       ;HL = addr(CNT)
002002 056 003                 LHI     CNT+1
002004 327                     LCM                 ;BC = CNT
002005 060                     INL
002006 317                     LBM
002007 302         LOOP        LAC                 ;If BC = 0,
002010 261                     ORB
002011 053                     RTZ                 ;Return
002012 066 300     GETSRC      LLI     SRC+0       ;HL = addr(SRC)
002014 056 003                 LHI     SRC+1
002016 347                     LEM                 ;DE = SRC
002017 060                     INL
002020 337                     LDM
002021 364                     LLE
002022 302                     LAC                 ;HL = HL+BC
002023 206                     ADL
002024 360                     LLA
002025 301                     LAB
002026 215                     ACH
002027 350                     LHA
002030 307                     LAM                 ;Load A from (HL)
002031 066 302     GETDST      LLI     DST+0       ;HL = addr(DST)
002033 056 003                 LHI     DST+1
002035 347                     LEM                 ;DE = DST
002036 060                     INL
002037 337                     LDM
002040 364                     LLE
002041 353                     LHD
002042 330                     LDA                 ;D = A
002043 302                     LAC                 ;HL = HL+BC
002044 206                     ADL
002045 360                     LLA
002046 301                     LAB
002047 215                     ACH
002050 350                     LHA
002051 373                     LMD                 ;Store D to (HL)
002052 302         DECCNT      LAC                 ;BC = BC-1
002053 024 001                 SUI     1
002055 320                     LCA
002056 301                     LAB
002057 034 000                 SBI     0
002061 310                     LBA
002062 104 007 004             JMP     LOOP        ;Repeat the loop
002065                         END

In the code above, all values are given in octal. Locations SRC, DST, and CNT are 16-bit parameters for the subroutine named MEMCPY. In actuality, only 14 bits of the values are used, since the CPU has only a 14-bit addressable memory space. The values are stored in little-endian format, although this is an arbitrary choice, since the CPU is incapable of reading or writing more than a single byte into memory at a time. Since there is no instruction to load a register directly from a given memory address, the HL register pair must first be loaded with the address, and the target register can then be loaded from the M operand, which is an indirect load from the memory location in the HL register pair. The BC register pair is loaded with the CNT parameter value, and decremented at the end of the loop until it becomes zero. Note that most of the instructions used occupy a single 8-bit opcode.
