This article explains one of the best-known attacks, the buffer overflow, at a very basic level.
What is a Buffer Overflow?
A traditional definition of a buffer overflow (BOF) reads: "In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a process stores data in a buffer outside the memory the programmer set aside for it. The extra data overwrites adjacent memory, which may contain other data, including program variables and program flow control data. This may result in erratic program behavior, including memory access errors, incorrect results, program termination (a crash), or a breach of system security." Source: Wikipedia
Simply speaking, a buffer overflow is a condition in a program whereby a function attempts to copy more data into a buffer than it can hold.
Technically speaking, a buffer overflow results from a lack of bounds checking on the size of the input being stored in a buffer array.
A SIMPLE EXAMPLE OF BOF
Suppose I've written a simple program in C. This is what the program does:
char buffer[90];                      /* room for the user's input */
printf("Please type something: ");
gets(buffer);                         /* unsafe: no bounds checking */
printf("You typed %s\n", buffer);
-It prompts the user to type something.
-It allocates a buffer [90 bytes of memory, for example] to temporarily hold the user input.
-The user types in some data.
-The program copies the user's input to the buffer.
-It reads the data in the buffer and prints it to the screen.
Everything runs smoothly and I'm a happy coder. But here's the problem: what if the user enters more than 90 bytes of data? Say, 150 bytes. Well, my program simply crashes with an error! BUFFER OVERFLOWED! But why? What went wrong?
Well, a buffer overflow is a result of careless programming, I must agree. But it can also happen because of limitations of the programming language.
What went wrong in our case is that the program was meant to accept no more than 90 bytes, but in our code we used a function named gets().
gets() is a C library function that does not perform bounds checking.
What gets() does is copy data typed in at the keyboard into a buffer. But gets() keeps on copying until it reaches a newline (or end-of-file), with no regard for the size of the buffer, and then adds a terminating NULL byte. If the first newline happens to be 150 bytes away from the beginning of the input, then 150 bytes plus the NULL byte will be copied into a 90-byte buffer, which will surely cause an overflow. That is what happened in our case: a lack of bounds checking on the size of the input being stored in a buffer. A NULL byte is a byte with a numeric value of 0. Strings in C are terminated with a NULL byte to denote the end.
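Some other functions in C which do not perform bounds checking by themselves are
strcpy(), strcat(), sprintf(), vsprintf(), scanf() [when %s is used without a field width], getchar() [when called in a loop that fills a buffer without a length check], etc.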
Now let's cover some more basic concepts.
What is a process?
A process is a program in execution. An executable program on a disk contains a set of binary instructions to be executed by the processor.
How is a Process organized in memory?
------------------
STACK              Higher Memory [0xFFFFFFFF]
------------------
HEAP
------------------
DATA
------------------
TEXT               Lower Memory [0x00000000]
------------------
TEXT: This is the area where the executable code or program code [instructions] resides. It also includes read-only data. An executable file usually has a text section, and this region includes that as well. Any attempt to write data to the text region will cause a 'segmentation violation'.
DATA: This is the region of memory where static variables are stored. An executable file has data and bss sections; this region corresponds to them and holds the program's global and static data.
HEAP: This region of memory holds dynamic-length data. It is allocated dynamically at run time for the process. We're not much concerned with the heap here, as stack BOF attacks are more common in the wild.
STACK: This region is used to dynamically allocate the local variables used in functions, to pass parameters to functions, and to return values from functions. High-level languages [HLL] like C and C++ are organized around procedures or functions, and the stack is what supports calling them.
The stack works on the LIFO [last in, first out] principle. It means the last object placed on the stack will be the first object removed. Think of a stack of books: the last book you put on the stack is the first book you pick up next time.
STACK - The real implementation
The stack is also known as the region 'where data is manipulated'. Point to be noted!
A program is a collection of lots of procedures [snippets of computer code]. Each of these procedures executes and performs its job, and once it is done the next procedure is called.
In structured programming, a procedure call alters the flow of control, and when the function has finished performing its task, it returns control to the statement or instruction following the call. This is implemented with the help of the stack.
RET: Saved Return Address: When a function or procedure is called, the system saves where it was called from. When the function ends, it reads this return address and lets the program return to where it left off. This address is also known as the "saved return address" and it's very important.
We now have a fair idea of the stack and its role.
Let's go back to our program and see how it looks on the stack. When our program calls the main() function, it makes room for BUFFER[ ] on the stack. The space is 90 bytes.
This is how it would look in memory:
BUFFER[ ] <----- 90 bytes of space allocated for BUFFER[ ]
RETURN ADDRESS <----- When the function finishes, program control comes here and follows the address stored here to go back.
But if the user inputs more than 90 bytes of data, for example XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX [user input]
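This is how it would look in memory:
BUFFER[ ] <----- XXXXXXXXXXXXXXXXXXXX [all 90 bytes filled with user input]
RETURN ADDRESS <----- XXXX [overwritten by the excess user input]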
The input overflowed the space allocated for BUFFER[ ] and even overwrote the RETURN ADDRESS! Now, when the function returns, the program no longer finds a valid return address; it tries to jump to whatever the input left there, and the system shows us a 'segmentation violation' error. Now we know what happened behind the scenes when our program crashed!
Now the question is, what could have prevented the crash? Well, simply using an alternative to gets(), namely fgets(), which takes the size of the buffer as an argument so that bounds checking can be done.
Now the second question is, how does a system compromise happen then?
Well, simply speaking, by overwriting the return address and pointing it to a different location in memory where malicious code is present. This malicious code is also known as 'shellcode'. Shellcode is usually malicious code [a payload for an attack or exploit] that spawns a shell or command prompt on a system. Hackers usually craft a BOF exploit that includes shellcode, so that after successful exploitation of the vulnerability a shell is spawned back to the hacker, and then the entire system is compromised.
Buffer overflow is called the hacker's silver bullet attack. The recent devastating worm Conficker/Downadup/Kido exploits a buffer overflow vulnerability in the Server service on Microsoft Windows systems. Approximately 20 years back, BOF was first mass-exploited by the Morris worm. It's not always within our reach to control human errors in coding, but we can protect systems from being compromised with proper preparation. During the Conficker outbreak, Symantec IPS [Intrusion Prevention System, a component installed with SEP], for instance, did a great job protecting endpoints from being exploited with BOF, and I realized the true potential of the IPS engine. The next-gen IPS engine by Symantec, called 'Krypton', is already released, and I believe it will work wonders in preventing BOF attacks.