Inside C Pointers


By Anand George
SourceLens.org Copyright. All rights reserved.
Content Owner - Meera R (meera at
sourcelens.org)
Agenda
• Look at the different factors that make pointers in C difficult to learn.
• Address them one by one to build a clear, in-depth understanding of the concepts.

Why do pointers look difficult?
• Lack of understanding of memory management in the operating system.
• Lack of understanding of the assembly generated by the C compiler for the C source code.
• Non-intuitive syntax relations with arrays, structures, and other data types.
• Non-intuitive pointer arithmetic.
• Lack of debugging practice.

Note
• All the discussion refers to a modern 32-bit OS, such as 32-bit Windows 7 or a recent 32-bit Linux variant.
• Also assume that the underlying CPU is an Intel x86 with no Physical Address Extension (PAE).
• Also assume the page file (not paging itself) is turned off. So when I say memory, I mean storage on the RAM chip.
• The fact that some portion of the address space is reserved for the OS kernel is left out of the picture, as it is not relevant to the discussion.
• Don’t worry if this does not make much sense to you now.
What is computer memory?
• Normally an electronic chip connected to the motherboard of the computer, called RAM.
• Very fast at reading and writing data.
• Most importantly, the processor can read and write data in memory.
• Memory can “remember” stored 1s and 0s.
• Every 8 bits (one byte) of memory can be addressed, which means each byte of memory has an address.
• RAM is also called physical memory.
• An address in physical memory is also called a physical address.
Windows Memory Management
• Modern protected-mode operating systems like Windows or Linux use something called the flat memory model.
• In a 32-bit OS (like 32-bit Windows 7), every application process has access to 2^32 address locations of 8 bits (1 byte) each, which is 4 GB.
• Each running application in an OS like Windows 7 is a process.
• You can see the processes in the Processes tab of the Task Manager in Windows.

Demo
• Task Manager processes.
• Attach Visual Studio to different processes to see the address space of each process.
• Check a particular address in Visual Studio in each process and see whether the value is the same or different. If different, why?

Process
• 4 GB address space.
• The whole purpose is to isolate one program from another.
• Every program feels like it is the only one running.
• The address space gives it that feeling.

Windows Memory Management

Mainly two factors complicate both the implementation and the learning of memory management in a modern OS:
1. Protection via segmentation
2. Paging
These two CPU features help the OS implement the process, that is, the 4 GB virtual address space.
Segmentation and paging
• Segmentation helps mainly with protection.
• It makes sure that OS memory is not accessed by application programs.
• Paging has similar features and is mostly responsible for the implementation of the virtual address space.
• Not going into details now.

All we need to know is
• Every application has a potential 4 GB address space.
• All the address spaces are different and map to different physical memory.
• When we say memory, address, etc. in the context of a program running in Windows, we are mostly referring to some portion of the virtual address space of that program.
• All the code, data, and any other information related to that application is inside the 4 GB address space.
• From an application’s standpoint, the address space is the ‘universe’.
Protection
• In a multitasking operating system, each task needs its own memory region, which other tasks cannot access.
• Just like in a town, each family needs a house of its own to live in.
• Think about everyone in a town living in the same one big house.
• So, like every individual, every task running in an OS needs its own space.

Protection (cont)
• It is practically impossible for an OS to implement protection without assistance from the CPU.
• Modern OSes (like Windows or Linux) use the CPU features called “Paging” and “Segmentation” to implement protection.

A glimpse into Paging and Segmentation
• A feature of modern CPUs like the Intel Pentium, AMD64, or ARM (ARM uses slightly different terminology, but the concept is the same).
• Helps the OS implement mainly two things:
  ○ Protection, mostly via segmentation.
  ○ Virtual memory (to extend RAM to disk in a way that is transparent to programs and programmers).
• We are not going into too much detail.
Windows Memory Management (cont)
• Each process can potentially access 4 GB of memory.
• This does NOT mean that every process has 4 GB of physical memory.
• It just means that a process can access at most 4 GB.
• The 4 GB of one process is completely different from the 4 GB of another process.
• Any address in the 4 GB may or may not be allocated, which is why the 4 GB is called an address SPACE.
Windows Memory Management (cont)
• An application can access or use only memory that is ALLOCATED in the 4 GB address space.
• Allocation is done by the OS, either on request from the application (say, a malloc) or indirectly (on the stack, like int a[100];).
• An application process cannot access any memory outside its 4 GB address space, and it “thinks” that the 4 GB address space is the entire system, which explains the name VIRTUAL address space.

Windows Memory Management (cont)
• The (virtual) address space contains addresses.
• So an address is just an integer between 0 and 2^32 - 1.
• In other words, “an address points to a location inside the 4 GB”.
• If a variable (which is a memory location) contains an address, we normally call it a pointer in the C programming language.
• Addresses are normally shown in hexadecimal format. So a 32-bit address is any number between 0x00000000 and 0xFFFFFFFF, for example 0x1234ABCD.
Now, why did we get different values at the same address in two different processes?
• The page table of each process contains a different entry for the virtual address we looked at.
• So when the CPU translated the virtual address to a physical address, we got different physical addresses.
• Details coming up.

Mapping of address to physical memory
[Diagram: the address spaces of myapp.exe, notepad.exe and word.exe go through the paging system in the CPU, with its page tables (part of the CPU), and map onto RAM.]
What is allocation?
• More or less, adding a page table entry to the page table.
• A page table entry maps a virtual address either to a physical address or to virtual memory, which is nothing but a file on the disk.

Divisions in 4 GB

4 GB
• Allocated / Committed regions
  ○ Working set
  ○ Paged out
• Free regions
  ○ No allocation.
  ○ Just “space”.
  ○ No counters or anything else kept by the OS, the CPU, or any other program.
  ○ Any attempt to access one results in a CPU interrupt, which the OS can handle.
  ○ Don’t confuse this with a free physical page frame in Windows.
• Reserved regions
  ○ To avoid fragmentation.
  ○ Must be allocated before use; otherwise the same as free regions.
  ○ Reserving just adds a VAD (Virtual Address Descriptor) to the process.
  ○ Once a region is reserved, the OS won’t give out any address inside that region to any other allocation in that address space.
Note: “Region” means a contiguous group of addresses, say from 1000 to 2000. It is not really standard jargon, though. I have heard the word “segment” used as well. Don’t confuse it with ARM regions.
Address spaces in System
[Diagram: the address spaces of Notepad.exe and HelloWorld.exe, drawn from 0x00000000 up to 0xFFFFFFFF, inside the entire system (not necessarily memory, but more logical). Marked addresses: 0xD0000000, 0xC0000000, 0x789AC000, 0x789AB123, 0x789AB000 and 0x1234ABCD, with some ranges labeled “Allocated” or “Allocated Chunk”.]
[Legend: address space of a process with no RAM allocated to it; address space of a process which has RAM allocated.]
Demo
• Process Explorer.
  ○ See the different processes.
  ○ See the amount of RAM.
  ○ Look at the different memory counters.
• VMMap: look at the address space.
• RAMMap: see how the RAM is being used in the system.
So what is a pointer after all?
• It is an allocated memory location, 32 bits in size, in the 4 GB address space that an application process has in Windows/Linux (just like an int).
• A pointer contains a number less than 2^32, which a C programmer normally interprets as the address of some other allocation in the same 4 GB address space.

Pointer
• A pointer is a variable of size 32 bits (in a 32-bit OS) which normally contains the address of another variable or block of memory.
• In other words, we (programmers) interpret the 32-bit value in a pointer variable as an address.
• In practice, a pointer can contain any number from 0 up to 2^32 - 1.
• It is like a visiting card, which normally contains the name and address of a person.
• One could print a visiting card with a list of one’s favorite TV shows or something very different, but normally we don’t do that.
How do I declare a pointer variable in C?
• int* iptr;
• char* cptr;
• In general, xxx* xptr; where xxx stands for a data type.

Difference between a pointer and a number
• Pointer size normally equals the bitness of the program and OS: 32 bits for a 32-bit program on a 32-bit OS, 64 bits for a 64-bit program on a 64-bit OS.
• The language/compiler supports certain special operations on it, like dereferencing and pointer arithmetic.

Demo
• Pointers in a Visual Studio C program.
Understanding pointers
• A pointer behaves almost like a card or piece of paper that contains the address of a place.
• In this case the addresses are numbers, not text.
• Although it sounds weird, it can also contain the address of another set of cards.

What is a pointer pointing to?
• Anything in the 4 GB address space we saw.
• It can hold an invalid, unallocated, allocated, or random location. After int *ptr; (no initializer), ptr can contain anything.
• It can be 0/NULL: int *ptr = 0;
• It can point to an allocated chunk of uninitialized memory on the stack, for example int a[100];

What is a pointer pointing to? (cont)
• It can point to an allocated chunk of uninitialized memory on the heap, for example int *ptr = malloc(100);
• It can point to another set of pointers in allocated memory, like a visiting card that has the address of a place where another set of visiting cards is available.

What are the main uses of pointers?
• Sharing data between different parts of an application (mostly huge chunks of data).
• It is like giving out the address of a house or business location printed on a business card (the pointer).

Sharing data between different parts of an application is a common task, which is needed in other languages such as Java and C# as well. So why don’t they have pointers?

• Most languages have pointers, but they call them something else. For example, a reference in Java or C# is nothing but a pointer.
• The reason C and C++ are notorious for pointers is that they allow a lot of counter-intuitive syntax for manipulating and accessing the data pointed to by the pointer, while other languages like Java or C# normally have well-defined, intuitive functions to do the same task.
Pointer in the Address space.
[Diagram: the address space of HelloWorld.exe inside the entire system (not necessarily memory, but more logical), with the annotations below.]
• int* ptr = malloc(100); The pointer variable ptr points to a chunk of 100 bytes.
• The number inside ptr is 0x789AC000, which is the starting address of the 100-byte allocated chunk.
• Notice that ptr itself has an address (0x1234ABCD), which is in the same address space.
[Legend: address space of a process with no RAM allocated to it; address space of a process which has RAM allocated.]
What is the relation between pointers and the protection techniques we discussed earlier, like paging and segmentation?
[Diagram: a program running on the CPU executes *ptr = 100, which produces a virtual address (VA). The segmentation unit is practically turned off, so the same virtual address is given to the paging system. Paging turns it into a physical address (PA), the physical memory address in RAM.]
Demo
• Visual Studio Memory window
  ○ See the 4 GB address space.
  ○ See allocated and unallocated regions.
  ○ Scrolling the Memory window is not a good idea.
  ○ Type an address directly into the Memory window.
• A simple pointer program: look at the memory to understand the allocation and the address assignment.
Access Violation
• Any access (read or write) to an address outside a committed (or allocated) region of memory generates an access violation, which has to be handled by the OS or the application.
• Internally, this is the OS handling a CPU interrupt called a page fault. (We will get to the details later.)

Allocated / Committed: further division

Based on what backs the memory
• Working set: backed by RAM.
• Paged out: backed by the page file.

Based on programming aspects / nature of use
• Stack: the stack of a thread. For example, a local variable in C uses the stack of the thread in which the code runs.
• Heap: a malloc in C gets its memory from the heap.
• Mapped: an exe, dll, or sys file is mapped into the 4 GB address space when the program starts. So global and static variables in C are allocated from the mapped space that is part of the exe or dll.
• Shared: any region can be shared between processes or with the kernel using the Windows API (details later). Not too different from the ones already discussed.
Division of a Committed / Allocated region
[Diagram: the address space is divided into free or reserved regions and committed (allocated) regions. Labels inside the committed region: mapped region, NT heap region, thread stack, shared regions. Backing: working set (RAM) or paged out (page file).]
Summary
• A 32-bit application has a 4 GB address space.
• All the memory it accesses, and anything else that matters to the program, is inside that address space.
• Every application running on the system has a different 4 GB address space.
• A pointer is a variable (a memory location) inside that 4 GB which holds a number that the programmer interprets as the address of some location in the same 4 GB address space.

Thank you