Memory layout in C
The C language gives the programmer direct control over how much memory a program uses and where that memory comes from.
The memory of a C program consists of the following segments:
- Text segment
- Initialized data segment
- Uninitialized data segment
- Heap
- Stack
Text Segment:
- The text segment, also called the code segment, is the section of an object file, or of a program's memory, that contains the executable instructions.
- It holds the binary form of the compiled program, that is, its machine code.
- The text segment is usually placed below the heap and the stack, so that a heap or stack overflow cannot overwrite it.
- The text segment is normally read-only, which prevents the program from accidentally modifying its own instructions.
- Text segments are sharable, so only a single copy needs to be in memory for frequently executed programs such as text editors, the C compiler, and shells.
- It typically occupies the lowest addresses of the program's address space.
Initialized Data:
- Data segments store program data in the form of initialized or uninitialized variables.
- The initialized data segment, often called simply the data segment, is the part of a program's virtual address space that contains the global, static, constant, and external (declared with the extern keyword) variables that the programmer has initialized explicitly.
- The size of this segment is determined by the size of the initialized values in the program's source code and does not change at run time.
- The values in this segment can be altered at run time, so the segment is not read-only as a whole: it has a read-write area and a read-only area. Variables declared with the const keyword go into the read-only area.
Eg:
const char* string = "Java_T_Point"; // the string literal is stored in a read-only area; the pointer string itself is writable
Data segment can be further grouped into:
Initialized read-only area and
Initialized read-write area.
Eg:
#include <stdio.h>
char c[] = "Java_T_Point"; // initialized global variable in read-write area
const char ch[] = "Hello"; // initialized global variable in read-only area
int main(void)
{
    static int a = 10; // initialized static variable stored in the data segment
    return 0;
}
Uninitialized Data Segment:
- In contrast to the initialized data segment, the uninitialized data segment is generally called the "bss" segment, named after an old assembler operator that stood for "block started by symbol."
- The BSS segment contains all uninitialized global, static, and external (extern keyword) variables.
- The kernel initializes this segment to arithmetic 0 before the program starts executing, so integers start at 0 and pointers start as null pointers. Because its contents are known to be zero, the segment does not occupy actual space in the object file; only its size is recorded.
- When the program is loaded, the program loader allocates the required memory.
- BSS starts at the end of the data segment and contains all global and static variables that are initialized to zero or have no explicit initialization in the source code.
Eg:
static int j;
and
int k; //global variable
Here both will be contained in the BSS segment.
Eg:
#include <stdio.h>
char c; // uninitialized global variable stored in bss
int main(void)
{
    static int i; // uninitialized static variable stored in bss
    return 0;
}
Heap:
- The heap is the segment in which dynamic memory allocation takes place.
- It begins at the end of the BSS segment, and its addresses usually grow upward from there.
- The heap is managed by malloc, realloc, and free, which may use the brk and sbrk system calls to adjust its size. brk() sets the end of the data segment to a given address, while sbrk() moves it by a given number of bytes.
- The heap area is generally shared by the libraries and other dynamically loaded modules in a process.
- It is used to allocate memory at run time.
- The heap and the stack grow in opposite directions; they sit at opposite ends of the process's virtual address space.
- It is the part of RAM (Random Access Memory) where dynamically allocated memory is stored.
Eg:
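#include <stdio.h>
#include <stdlib.h> // declares malloc and free
int main(void)
{
    char *p = malloc(sizeof(char) * 4); // allocation of memory in heap
    if (p == NULL)
        return 1; // allocation failed
    free(p); // release the heap memory when it is no longer needed
    return 0;
}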
Stack:
- The stack stores local variables, the arguments passed to functions, and the return address of each call.
- All automatic (non-static) local variables are stored on the stack.
- The stack area traditionally adjoins the heap area and grows in the opposite direction; when the stack pointer meets the heap pointer, free memory is exhausted.
- The stack works on the LIFO (Last In First Out) principle: elements are added to the top of the stack, and the topmost element is the first to be popped. It is located in the higher parts of memory. On the standard PC x86 architecture it grows toward address zero; on some other architectures it grows in the opposite direction.
- The stack pointer register tracks the top of the stack. Its value changes every time a value is pushed onto or popped off the stack.
- The set of values pushed for one function call is called a stack frame; a stack frame contains at least a return address.
- The stack usually sits just below the OS (Operating System) kernel and grows downward to lower addresses.
- Automatic variables are stored along with the information that is saved each time a function is called: the address to return to and information about the caller's environment are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions work.
- Every time a recursive function is called, a new stack frame is used, so that interference between variables does not happen.
Memory is divided into segments mainly for protection. The OS loads an executable file and runs it as a process with its own address space, which avoids conflicts between the data and the code of a single program.