Lecture 10 Embedded operating systems

Advanced Embedded Systems
Lecture 10
Embedded operating systems (2)
Intertask communication mechanisms





The mechanism used to implement intertask communication can
affect performance and energy consumption;
General purpose operating systems use intertask communication for
transferring large amounts of data; real-time systems may also
have to transfer large amounts of data, but they must in addition be
optimized to meet real-time requirements;
Intertask communication is carried out through semaphores, buffers,
queues and mailboxes;
A semaphore is a flag; setting a semaphore can activate a task; the
semaphores are accessed by the tasks;
A buffer is a memory area; a task may request a buffer from the RTOS,
put data in it and tell the RTOS to pass the data to another task;
the output task typically receives pointers telling it where the buffer
is in memory and how many bytes it holds;

A queue is a string of buffers; a task can place a message in a
buffer, but the output task may be busy; the sending task then asks the
RTOS to put the message in a queue behind the other messages until the
output task is ready;

A circular queue or ring buffer is used in the same way as a
first-in/first-out (FIFO) list:



Ring buffers are easier to manage;
In the ring buffer, simultaneous input and output to the list are achieved
through head and tail pointers; data are loaded at the tail and are read from
the head;
Additional code is necessary to test for the overflow condition in the ring
buffer; an overflow occurs when an attempt is made to write data to a full
queue;
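
A minimal sketch of the ring buffer described above, written in Python for readability. The class name, the method names and the use of a separate item counter to tell a full buffer from an empty one are assumptions made for illustration, not something the lecture prescribes:

```python
class RingBuffer:
    """Fixed-capacity FIFO: data are written at the tail and read from the head."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0    # index of the next item to read
        self.tail = 0    # index of the next free slot to write
        self.count = 0   # number of stored items (tells full from empty)

    def put(self, item):
        # The overflow test: writing to a full queue is refused.
        if self.count == self.capacity:
            raise OverflowError("write to a full ring buffer")
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity   # wrap around
        self.count += 1

    def get(self):
        if self.count == 0:
            raise IndexError("read from an empty ring buffer")
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.capacity   # wrap around
        self.count -= 1
        return item
```

Because the producer only moves the tail and the consumer only moves the head, input and output can proceed side by side; the extra check in put() is the overflow code mentioned above.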


An RTOS uses mailboxes to hold the messages sent to a task by
other tasks; the messages are stored until the task is ready;
The next figure shows the solutions for intertask communication:

Problems with semaphores:


Problems may arise if the operation of testing and setting a semaphore
is not atomic, that is uninterruptible;
Ex.:
    procedure P(var S: boolean);
    begin
      while S = TRUE do;
      S := TRUE
    end
and the corresponding assembly code:
    @1  LOAD   R1,S
        TEST   R1,1
        JEQ    @1       ; S = TRUE ?
        STORE  S,1      ; S := TRUE


Suppose the process using this semaphore primitive is interrupted
between the TEST and STORE instructions;
The interrupt routine, which might use the same resource, finds S to be
available and begins to use it;




If the interrupting task is then suspended, for example because its time
slice ended, and the interrupted task resumes, the latter will still see the
device as free because the old contents of S are still in R1;
Thus, two tasks attempt to use the same resource and a collision occurs;
Worse, this problem may occur only infrequently, which makes it difficult to
test for and detect;
A solution: the test-and-set instruction:






The instruction fetches a word from memory and tests one of its bits, e.g.
the most significant one;
If the bit is 0, it is set to 1 and stored again, and a condition code of 0 is
returned;
If the bit is 1, a condition code of 1 is returned and no store is performed;
The fetch, test and store operations are indivisible;
Some processors have test-and-set instructions in their instruction sets
(Freescale family); others do not;
In the latter case the feature must be implemented by other means: for
example the LOCK prefix (in the x86 family) or by disabling and enabling the
interrupt system;
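
The behavior of the test-and-set instruction can be modeled in a few lines of Python. This is only a software model under stated assumptions: the word is taken to be 8 bits wide with the tested bit in the most significant position, and a threading.Lock stands in for the indivisibility that real hardware provides; the names Memory, test_and_set and try_acquire are hypothetical:

```python
import threading

MSB = 0x80  # assumed: the tested bit is the most significant bit of an 8-bit word


class Memory:
    """Toy memory whose test_and_set() models an indivisible instruction."""

    def __init__(self):
        self.words = {}
        self._bus = threading.Lock()   # stands in for the hardware's atomicity

    def test_and_set(self, addr):
        """Return condition code 0 if the bit was clear (and set it), else 1."""
        with self._bus:                # fetch, test and store as one unit
            word = self.words.get(addr, 0)
            if word & MSB:
                return 1               # bit already set: no store is performed
            self.words[addr] = word | MSB
            return 0

    def clear(self, addr):
        """Release the semaphore by clearing the tested bit."""
        with self._bus:
            self.words[addr] = self.words.get(addr, 0) & ~MSB


def try_acquire(mem, sem_addr):
    """Non-blocking P operation: True if the caller now owns the resource."""
    return mem.test_and_set(sem_addr) == 0
```

A task that gets condition code 0 owns the resource; any other task gets 1 until the owner calls clear(), so the window between testing and setting the semaphore no longer exists.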
Power management






Hardware may provide different mechanisms to manage power,
such as sleep modes and clock rate control;
Methods that reconfigure system state to optimize power
consumption are known as dynamic power management; they are
managed by the operating system which provides a software
interface to the system tasks;
The operating system sees its own power states as a resource to be
managed along with other resources;
Centralizing the control of the power management mechanisms in
the operating system allows the OS to ensure that all necessary
components are notified of any change to the power management
state;
In PCs, power modes are managed through the Advanced Configuration
and Power Interface (ACPI);
Hardware solutions are more appropriate in ESs;
File systems in embedded operating systems



Limitations on power consumption and code size cause embedded
file systems to be designed differently from workstation-oriented
file systems;
The most important difference is caused by the fact that flash
memory is often used as the storage component for embedded file
systems;
Embedded file systems may vary according to different criteria:


Compatibility: some ESs do not directly expose their file systems to other
computers and so the file system may use any internal structure; other
devices, especially those with removable media, need compatibility with
other systems; compatibility is important particularly in some types of
consumer devices like CD/ MP3 players which must play MP3 files and
audio CDs written by other devices (PC);
Writeability: some devices, such as CD players, need only read files;
others, like digital cameras, must be able to write files as well;




Flash based memory introduces new challenges;
The first major difference is that flash cannot be written word by word
like RAM; flash memory must first be erased, at block level, and then
written; flash memory blocks may be as large as 64 KB, which is
considerable; erasing a block takes more time than reading it;
The second major difference is that a program/erase cycle wears
the device; the voltages applied during the cycle stress the circuit
and may cause the memory cell to fail; today’s flash memories
can withstand up to about a million program/erase cycles, and a careful
design of the file system can decrease the number of these operations,
increasing the lifetime of the memory device;
There are 2 technologies for flash memory: NAND and NOR; NOR
flash memories can be read much as RAMs are read, but NAND
flash memories are more prone to transient read errors and must be
accessed as block devices; as a consequence, different file system
implementations are developed for NAND based and NOR based
flash memories;





Because flash memories wear out much more quickly with writes
than other types of permanent storage, wear-leveling techniques are
used to maximize the lifetime of the flash memory;
Such a technique distributes writes around the memory, avoiding the
excessive use of any one block;
However, the file allocation table is a problem: whenever a file is
created, destroyed or changed in size, the file allocation table must
be updated, so it can wear out much more quickly than the
remainder of the flash memory;
This is why formatting or bulk erasing flash memories is recommended
rather than deleting individual files (bulk erasure performs
many fewer program/erase cycles than individual file deletion);
The fig. shows the organization of a virtual mapping based flash
memory system;


It handles wear leveling and the operations particular to flash
memory based file systems;
The file system sees the memory as a linear array of bytes
addressed with virtual addresses;



The virtual mapping system uses a virtual memory mapping table to
translate the virtual addresses into physical addresses in the flash
memory;
The virtual mapping table may be stored entirely in the flash, or it may be
cached in RAM in the host processor;
The virtual mapping system can handle several tasks:




Manage the scheduling of block program/ erase operations;
Consolidate data, moving some data in a block to a new location to make an
entire block empty so that the block can be erased and reused;
Identify bad blocks of memory, much as a magnetic disc controller substitutes
good sectors for bad sectors;
Occasionally move infrequently modified data to a new location to equalize
wear levels across the memory;
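
The idea of a virtual mapping table combined with wear leveling can be sketched as follows. This is a toy model, not a real flash translation layer: the class FlashMap, the one-block granularity and the policy of always writing to the least-worn free block are assumptions chosen for illustration, and the model assumes that at least one spare physical block is always free:

```python
class FlashMap:
    """Toy virtual-mapping layer: virtual block -> physical block, with wear counts."""

    def __init__(self, physical_blocks):
        self.erase_count = [0] * physical_blocks
        self.data = [None] * physical_blocks
        self.free = set(range(physical_blocks))
        self.table = {}                      # virtual block -> physical block

    def write(self, vblock, payload):
        # Choose the least-worn free block so that writes spread over the device.
        target = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(target)
        self.data[target] = payload
        old = self.table.get(vblock)
        self.table[vblock] = target          # update the mapping table
        if old is not None:                  # the stale copy is erased and recycled
            self.data[old] = None
            self.erase_count[old] += 1
            self.free.add(old)

    def read(self, vblock):
        return self.data[self.table[vblock]]
```

Rewriting the same virtual block many times lands on a different physical block each time, so the erase counts stay nearly equal instead of concentrating on one block.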
Memory management in embedded operating systems


Memory management is typically done by general purpose operating
systems;
Reasons for an RTOS to provide memory management:




Memory mapping hardware can protect the memory spaces of the
processes when outside programs are run on the ES;
Memory management can allow a program to use a large virtual address
space;
Memory management covers: task memory management, memory
allocation, memory loading calculation and time loading calculation;
Task memory management:


Task switching requires saving the context of each task to memory and
restoring it from memory; for that, the task-control block model or run-time stacks are used;
Each task has a task-control block and a list of these blocks is kept; the list
can be either fixed or dynamic;



In the fixed case, n task-control blocks are allocated to n tasks (all in
dormant state) at system generation; as tasks are created, their task-control
blocks enter the ready state and are used when the tasks are
executed; if a task is to be deleted, its task-control block is placed in the
dormant state; no real-time memory management is necessary;
In the dynamic case, task-control blocks are added to a dynamic data
structure (e.g. a linked list) as tasks are created; when a task is deleted,
its task-control block is removed from the data structure and the memory
becomes available again; real-time memory management is
necessary for supplying the task-control blocks;
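
The fixed case can be illustrated with a short sketch. The names FixedTcbTable, create_task and delete_task and the two-field block layout are hypothetical; a real task-control block would also hold the saved context:

```python
DORMANT, READY = "dormant", "ready"


class FixedTcbTable:
    """n task-control blocks allocated once, at system generation."""

    def __init__(self, n):
        self.blocks = [{"state": DORMANT, "name": None} for _ in range(n)]

    def create_task(self, name):
        for index, tcb in enumerate(self.blocks):
            if tcb["state"] == DORMANT:      # reuse a pre-allocated block
                tcb["state"], tcb["name"] = READY, name
                return index
        raise RuntimeError("no dormant task-control block left")

    def delete_task(self, index):
        # The block is not freed; it only returns to the dormant state.
        self.blocks[index]["state"] = DORMANT
        self.blocks[index]["name"] = None
```

No memory is allocated or released after start-up, which is why this variant needs no real-time memory management; the dynamic case would instead append to and remove from a linked structure at run time.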
Using a run-time stack requires several conditions to be met:

Two routines, “save” and “restore”, are necessary; the “save” routine saves
the current context onto a stack; this must be done immediately after
the interrupts have been disabled; the “restore” routine should be called just
before interrupts are enabled and before returning to the main program;
The maximum stack size needs to be known in advance; if it is not known, a
catastrophic memory allocation can occur and event determinism will be
lost; ideally, provision for at least 1-2 more tasks than anticipated should
be allocated to the stack for spurious interrupts and time overloading;
Often a single run-time stack is not enough in a multitasking ES; advantages
of using multiple stacks:

It permits tasks to interrupt themselves (e.g. in case of spurious interrupts);
Languages which support reentrancy and recursion (such as C) can be used; a
single stack model is suited only for non-re-entrant languages, such as
assembly language;

The fig. shows the possible effect of a “save” and a “restore” routine:

Memory allocation:





Dangerous allocation must be avoided; it precludes system determinism:
it can destroy event determinism (e.g. by overflowing the stack) or temporal
determinism (e.g. by entering a deadlock situation);
There are two types of memory allocation, static and dynamic; only the
dynamic one will be discussed;
Different types of memory may coexist in ESs; they must be known by
the RTOS and by the tasks; if a task that issues a memory request
does not know the memory characteristics, the performance may be
affected (e.g. internal versus external memory in microcontrollers);
Memory is allocated in contiguous blocks called segments or
pages; the segment and page notions are specific to 16 and 32 bit
processors, which manage them and verify the access to them through
the OS and hardware;
However, they exist in 8 bit processors too, with no difference being made
between them; there, the physical memory space is extended using I/O ports
or I/O bits (in microcontrollers) without the processor being aware of it;

Techniques for memory allocation:





Swapping,
MFT,
MVT,
Demand paging;
Swapping: the simplest scheme for allocating memory to 2 processes;

The OS is memory resident and one process co-resides with it in the memory
space not required by the OS, called user space; when a second process needs
to run, the first process is suspended and swapped, along with its context, to a
secondary storage device (disk); the second process, along with its context, is
loaded into the user space and started by the dispatcher;
The solution can be used along with round-robin or preemptive techniques, but
the execution time of each process would be long because of the swap time;
the principal component of the swap time is the access time of the secondary
storage device;
Overlaying is a special case of swapping; it allows a single program to be
larger than the allowable user space; the program is broken up into code and
data sections called overlays, which fit into the available memory and which
are swapped;

MFT (Multiprogramming with a Fixed number of Tasks):


The user space is divided into a number of fixed-size partitions, allowing
more than one process to be memory-resident at any one time; it is useful when
the number of tasks to be executed is known and fixed, as in many ESs;
MFT uses memory inefficiently because of the fixed partition size; external
fragmentation occurs when a memory request cannot be satisfied because a
contiguous block of the needed size does not exist even if the total amount of
available memory is enough; internal fragmentation occurs if a process needs
less memory than the partition size; it can be reduced by creating fixed
partitions with different sizes and then allocating the smallest partition that is
large enough for the required amount; real-time performance is degraded
because of the associated overhead;
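
The rule of allocating the smallest partition that is large enough can be sketched as below. The function name allocate_mft and the representation of a partition as a (size, used) pair are assumptions made for this illustration:

```python
def allocate_mft(partitions, request):
    """Pick the smallest free partition that fits.

    partitions: list of (size, used) pairs, updated in place.
    Returns (partition index, wasted locations) or None if nothing fits.
    """
    candidates = [(size, i) for i, (size, used) in enumerate(partitions)
                  if not used and size >= request]
    if not candidates:
        return None                      # request cannot be satisfied
    size, index = min(candidates)        # smallest partition that is big enough
    partitions[index] = (size, True)
    return index, size - request         # the waste is internal fragmentation
```

With partitions of 16, 64 and 32 locations, two requests of 20 locations take the 32 and then the 64 partition, wasting 12 and 44 locations; a third request fails although 16 locations are still free.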

MVT (Multiprogramming with a Variable number of Tasks):



Memory is allocated in variable amounts, determined by the requirements of
the process to be loaded into memory; this solution is more appropriate when
the number of tasks is unknown or varies; little or no internal fragmentation
occurs, so memory utilization is better than for MFT;
External fragmentation can still occur because of the dynamic nature of
memory allocation and deallocation and because memory must be allocated
to a process contiguously; it can be mitigated by compaction, the process of
compressing fragmented memory; compaction is a CPU intensive process
and is not encouraged in ESs; if it must be performed, it should be done in the
background with the interrupt system disabled;
MVT is not appropriate for ESs because its context switching overhead is
much higher than in MFT;
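
External fragmentation and compaction can be illustrated with a small model. The first-fit search is an assumed allocation policy (the lecture does not name one), and the function names and the (start, size) representation of regions are hypothetical:

```python
def first_fit(holes, request):
    """holes: list of (start, size) free regions, updated in place.

    Returns the start address of the allocated region, or None.
    """
    for i, (start, size) in enumerate(holes):
        if size >= request:
            if size == request:
                holes.pop(i)
            else:
                holes[i] = (start + request, size - request)
            return start
    return None                           # no single hole is large enough


def compact(allocated, memory_size):
    """Slide allocated (start, size) regions down to address 0.

    Returns (relocated regions, list holding the single remaining hole).
    """
    moved, next_free = [], 0
    for _start, size in sorted(allocated):
        moved.append((next_free, size))
        next_free += size
    return moved, [(next_free, memory_size - next_free)]
```

With 100 locations, regions at (0, 30) and (50, 20) leave holes of 20 and 30 locations; a request for 40 fails although 50 locations are free, and succeeds only after compaction has merged the holes.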

Demand paging:








Program segments can be loaded into noncontiguous memory as they are
requested, in fixed-size chunks called pages;
External fragmentation is minimized;
Program code that is not held in main memory is swapped to secondary
storage, usually a disk; if a memory request is made to a location within a
page not loaded in main memory, a page fault exception is generated;
The interrupt handler looks up the requested page in secondary storage
and loads it into main memory if there is free space, or swaps it with an already
loaded page; a replacement algorithm must be implemented, the most widespread
being LRU (Least Recently Used);
Paging is advantageous because it allows nonconsecutive references to
pages via a page table;
Paging can be used in conjunction with switching hardware to extend the
virtual address space;
Pointers are used to access the desired page; they may represent memory-mapped
locations to map into the desired hard-wired memory bank;
Pointers may be implemented through associative memory or may be simple
offsets into memory;

In the latter case, the actual address in main memory needs to be calculated
with each memory reference;

The technique uses memory efficiently, but it is not appropriate in ESs
because of the great overhead caused by page swapping (the disk access
time) and because the associated hardware support (disk) is not usually available;
Another disadvantage: the lack of predictable execution times, again because of
the page swapping overhead; the solution consists of locking into main memory
certain code and data segments or pages, along with their run-time stack;
they will not be swapped out, so the execution times for the locked processes
will decrease and, more importantly, the execution times will be guaranteed; but
fewer pages will be available for the application;
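
The LRU replacement rule used with demand paging can be simulated in a few lines. The function name count_page_faults and the modeling of main memory as a fixed number of page frames are assumptions for illustration; disk access time and locked pages are not modeled:

```python
from collections import OrderedDict


def count_page_faults(references, frames):
    """Simulate demand paging with LRU replacement; return the number of page faults."""
    resident = OrderedDict()              # pages ordered from least to most recently used
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)    # hit: mark as most recently used
            continue
        faults += 1                       # page fault: the page must be loaded
        if len(resident) == frames:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = None
    return faults
```

For the reference string 1, 2, 3, 1, 4, 2 with three frames the model reports five page faults. With a single frame, alternating references to pages 1 and 2 fault on every access, which is the unpredictable behavior that makes paging problematic for real-time work.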








Another disadvantage: thrashing, which is a very high paging activity;
Example: consider the execution of an instruction that reads a source operand
and performs a memory write; the code and the source operand are read, but a
page fault is generated at the write because the needed location is not resident;
the interrupt handler looks for a page in main memory to be put on the disk
to free space for the incoming page; many algorithms choose a page
that was not modified, thus saving the disk access time; a good candidate may
even be the page which contains the source operand; it is replaced with the
page containing the destination and the instruction is restarted; a new page
fault will be generated because the source operand is now missing, and so on;
Thrashing can be eliminated if the instruction continues its execution instead
of being restarted, but this is difficult because a large amount of context
must be saved at the swap;
A remedy: memory locking;
In a real-time system it is often desirable to lock all or certain pages of a
process into memory in order to reduce the overhead involved in paging and
to make the execution times more predictable;
Any process with one or more locked pages is prevented from being swapped
out to disk;
Advantage: decreases execution times for the locked modules and
guarantees execution times;


Disadvantage: fewer pages are available for the application;
Paging in brief:









Paging is more efficient when supported by the appropriate hardware;
Paging allows multitasking and extension of the address space;
When a page is referenced that is not in main memory, a page fault occurs,
causing an interrupt;
The hardware registers that are used to do page frame address translation
are part of a task’s context and give additional overhead when doing a context
switch;
If hardware page mapping is not used, then additional overhead is incurred in
the physical address calculation;
The least recently used (LRU) rule is the best nonpredictive page swapping algorithm;
The main disadvantages for real-time systems are thrashing and the lack of
predictable execution times;
All the dynamic memory allocation techniques (swapping, MFT, MVT and
demand paging) must be avoided in time critical applications because of the
overhead they introduce and the additional hardware they need;
Time critical applications must be implemented as single-task ESs;

Memory load calculation:


Memory load is important in an ES because it shows how efficiently the memory
is used; reducing it may lead to savings in space, power consumption and cost,
which is desirable in all ESs;
One can consider that in an ES the memory is divided into: stack or
system area, program area and RAM area; the total memory loading is
typically the weighted sum of the three individual memory loadings, that is:
$$M_T = M_P \cdot P_P + M_R \cdot P_R + M_S \cdot P_S$$


where $M_T$ is the total memory loading, $M_P$, $M_R$ and $M_S$ are the memory
loadings for the program, RAM and stack areas and $P_P$, $P_R$ and $P_S$ are
the percentages of the total memory allocated for the program, RAM and
stack areas, respectively;
Memory mapped I/O and DMA memory were not included since they
are fixed in hardware and generally need few locations;
The program area contains executable code of the real time program,
including the application software and the RTOS; in addition, fixed
constants can be stored in this area;

Program memory loading is calculated as follows:
$$M_P = U_P / T_P,$$
where $M_P$ is the memory loading for the program area, $U_P$ is the number
of locations used in the program area and $T_P$ is the total number of available
locations in the program area; the linker provides these numbers;


The RAM area stores global variables, data and, sometimes,
instructions for increased fetching speed and modifiability; although the
size of this area is determined at system design time, the loading factor
for this area is not determined until the application program has been
completed;
The RAM memory loading can be computed as:
$$M_R = U_R / T_R,$$
where $M_R$ is the memory loading for the RAM area, $U_R$ is the number of
locations used in the RAM area and $T_R$ is the total number of available
locations in the RAM area; again, the linker provides these numbers;


The stack area is used for context saving and automatic variables; one
or more stacks may be kept in this area;
The maximum stack size is:
$$U_S = c_S \times t_{max},$$

where $U_S$ is the stack size, $c_S$ is the maximum number of locations for
the context of a task (locations for registers, program counter, automatic
variables etc.) and $t_{max}$ is the maximum number of tasks that can be in the
system at any time;
Hence, the memory loading factor will be:
$$M_S = U_S / T_S,$$

where $M_S$ is the memory loading for the stack area, $U_S$ is the number of
locations used in the stack area and $T_S$ is the total number of available
locations in the stack area;
If $M_T \geq 100\%$ the memory is overloaded and the system cannot operate;
the same happens even if $M_T < 100\%$ when any of $M_P$, $M_R$ or $M_S$ is
$\geq 100\%$;
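
A worked example may help to see how the loading figures combine. The numbers below are invented for illustration, and the sketch assumes that each percentage is the share of that area in the sum of all three areas:

```python
def memory_loading(used, total):
    """Loading of one area: locations used / locations available."""
    return used / total


def max_stack_size(context_locations, max_tasks):
    """Stack locations needed: context size times maximum number of tasks."""
    return context_locations * max_tasks


def total_memory_loading(areas):
    """areas: name -> (used, total); each area is weighted by its share of all memory."""
    whole = sum(total for _used, total in areas.values())
    return sum(memory_loading(used, total) * (total / whole)
               for used, total in areas.values())


# Hypothetical system: 32 K program area, 4 K RAM area, 1 K stack area.
stack_used = max_stack_size(context_locations=64, max_tasks=8)   # 512 locations
areas = {
    "program": (24576, 32768),      # loading 0.75
    "ram": (3072, 4096),            # loading 0.75
    "stack": (stack_used, 1024),    # loading 0.50
}
total = total_memory_loading(areas)
overloaded = total >= 1.0 or any(u / t >= 1.0 for u, t in areas.values())
```

Here the total loading is 28160 / 37888, about 74 %, and no single area reaches 100 %, so the hypothetical system is not overloaded. Note that the check looks at every area separately, because one full area stops the system even when the total is below 100 %.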




There are a few solutions for reducing memory loading: variable selection,
memory fragmentation, variable reuse and self-modifying code;
Variable selection: memory loading in one area can be reduced at the
expense of another; for example, all automatic variables (variables that
are local to procedures) increase the loading in the stack area, whereas
global variables appear in the RAM area; by forcing variables to be either local
or global, the memory load can be balanced between the two areas; in addition,
intermediate results that are computed explicitly require a
variable either in the stack or in the RAM area, depending on whether it
is local or global;
Memory fragmentation: the memory loading is reduced by increasing the
$T_P$ factor or decreasing the $U_P$ factor; memory fragmentation works
against this and aggravates memory loading, because although sufficient
memory is available it is not contiguous and cannot be used;
Reuse variables: global variables that are used only once, for example
during initialization, can be reused later for other purposes; the variable
names must be generic since they will be playing a dual role; care
must be taken that a process does not destroy the content of a variable that
is still needed by another process;

Self-modifying code: it is a dangerous method;




It is based on the fact that the opcodes of certain instructions differ by only
one bit; for example, by modifying one bit in the opcode of a JUMP instruction
an ADD instruction is created;
The method relies on coincidence and its main disadvantage is that it
destroys the program’s determinism;
In addition, many processors include on-chip caches; the cache may not be
updated when the code is modified, so the unmodified code is executed;
modifying code within the cache causes performance degradation;
Time load calculation:


It is necessary to know the execution time of various modules and the
overall system time loading for choosing the adequate design solutions,
including the hardware solutions and the testing and debugging
operations;
Several methods can be used to predict or measure module execution
time and system time loading:


Using specific measurement instruments;
Instruction counting;

Using specific measurement instruments:

The logic analyzer and the oscilloscope; the best results are given by the logic
analyzer;
The logic analyzer takes into account hardware latencies and other delays; its
drawbacks are that the software must be coded (at least partially) and the target
hardware must be available; thus it is usually employed in late stages: the testing
phase and the system integration phase;
The oscilloscope is cheaper but offers less information;
Instruction counting: when it is too early for the logic analyzer, or if one is
not available, it is the best method for determining the time loading due to
code execution time; it requires that the code is already written; the
approach involves tracing the longest path through the code, counting
the instruction types along the way and adding their execution times;

The time loading can be calculated as follows:
$$T = \frac{A_1}{T_1} + \frac{A_2}{T_2} + \dots + \frac{A_n}{T_n},$$

where $T$ is the time loading, $n$ is the number of tasks, $A_i$ is the execution
time of task $i$ and $T_i$ is the execution period (cycle time) of task $i$;
The instruction execution times are required beforehand; they can be
obtained from data sheets, simulators or direct measurements; the
number of wait states, if any, must be known in advance;
The method assumes that there is no overlap between the instruction
execution times;
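
Both steps, instruction counting and the time loading sum, can be written down directly. The sketch assumes that the denominator for each task is its execution period, expressed in the same time unit as the execution time; the instruction mix and the per-instruction times are invented values:

```python
def path_time(instruction_counts, instruction_times):
    """Instruction counting: sum of count x execution time over the longest path."""
    return sum(count * instruction_times[kind]
               for kind, count in instruction_counts.items())


def time_loading(tasks):
    """tasks: list of (execution_time, period) pairs in the same time unit."""
    return sum(execution / period for execution, period in tasks)


# Hypothetical longest path of one task, times in microseconds.
counts = {"load": 10, "add": 5, "branch": 2}
times_us = {"load": 2, "add": 1, "branch": 3}
task_time_us = path_time(counts, times_us)       # 10*2 + 5*1 + 2*3

# Three hypothetical tasks, each using 10 % of its period.
load = time_loading([(1, 10), (2, 20), (5, 50)])
```

In this example the longest path takes 31 microseconds and the three tasks together give a time loading of about 0.3, that is, 30 % of the processor time. As stated above, the figure ignores any overlap between instruction execution times.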

Memory management in Windows CE





Windows CE is a real-time, full-featured operating system for lightweight
consumer devices;
Windows desktop applications cannot run on Windows CE directly, but
the operating system is designed to simplify porting Windows
applications to Windows CE;
Windows CE supports virtual memory; the paging memory can be
supplied by a flash memory or by a disk;
The operating system supports a flat 32 bit virtual address space; the
bottom 2 GB of the address space is for user processes, while the top
2 GB is for the kernel; the kernel address space is statically mapped into
the address space;
The user address space is dynamically mapped; it is divided into 64 slots
of 32 MB each:

Slot 0 holds the currently running process;
Slots 1 – 32 are for the processes, with slot 1 containing the Dynamic Link
Libraries (DLLs); only 32 processes can run at any one time;
Slots 33 – 62 are for memory mapped files, operating system objects etc.;
Slot 63 holds resource mappings;

Each process slot is divided into several sections; the bottom 64 KB is
used as a guard section; the user code grows from the bottom up; the
memory required for DLLs that are called by the processes grows from
the top of the memory space down;
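
The slot arithmetic follows directly from the figures above: 64 slots of 32 MB each cover exactly the 2 GB user address space. The helper names slot_of and slot_base below are hypothetical and are not Windows CE API calls:

```python
SLOT_SIZE = 32 * 1024 * 1024    # 32 MB per slot
SLOT_COUNT = 64                 # 64 slots make up the user address space


def slot_of(address):
    """Return the slot number that contains a user-space virtual address."""
    if not 0 <= address < SLOT_SIZE * SLOT_COUNT:
        raise ValueError("address is outside the user address space")
    return address // SLOT_SIZE


def slot_base(slot):
    """Return the first virtual address of a slot."""
    return slot * SLOT_SIZE
```

For instance, address 0x02000000 is the first byte of slot 1 and 0x7FFFFFFF is the last byte of slot 63; anything from 0x80000000 upward is outside the user slots.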
Scheduling and interrupts in RTOSs

The two key elements of an RTOS which determine the real-time
behavior of the OS:






The scheduler;
The interrupt handling mechanism;
The interrupt system provides its own priorities for the interrupt
handlers; the interrupt handlers can be seen as a distinct set of
processes that are separate from the operating system’s regular
processes;
The interrupt system priorities are determined by the hardware and
internally;
All interrupt handlers have priority over the operating system
processes, since an interrupt will be automatically fielded unless the
interrupts are masked;
The interrupts must not subvert the operating system’s scheduler; ensuring
this is the task of the operating system designer;


Interrupt handlers that are dispatched by the hardware interrupt
system must be fast;
In many real-time systems, interrupt handling is split between two
types of handlers:

The Interrupt Service Routine, ISR: it is dispatched by the hardware
interrupt system;
The Interrupt Service Thread, IST: it is a user mode process; the ISR
performs the minimum work necessary to field the interrupt; it then
passes data to the IST, which finishes the task;
Interrupts in Windows CE:

Windows CE divides interrupt handling into an ISR and an IST;
It provides two types of ISRs:

Static ISRs, which are built into the kernel; they provide one-way
communication to their IST;
Installable ISRs, which can be dynamically loaded into the kernel; they are
processed in the order in which they were installed; they use shared memory
to communicate;
An interrupt is processed over several stages;

The main components of ISR latency are:

The time required to turn off interrupts;
The time needed for vectoring the interrupt, saving registers and accessing
the ISR’s starting address;
Both of these factors depend on the CPU platform;
The main components of IST latency are:

The ISR latency;
The time spent in the kernel;
The thread scheduling time;
The Windows CE scheduler provides two types of preemptive
multitasking;
In the more general purpose style of scheduling, a thread runs until the
end of its allocated time slice;
In the more real-time style, a thread runs until a higher priority thread is
ready to run;
Windows CE provides 256 priority levels; within a priority level, threads run in
round robin mode;
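
The combination of priority levels and round robin within a level can be sketched with one ready queue per level. This is a simplified model, not the Windows CE scheduler itself: the class and method names are hypothetical, and the sketch assumes that a lower number means a higher priority:

```python
from collections import deque


class PriorityRoundRobin:
    """256 priority levels; threads at the same level take turns in FIFO order."""

    LEVELS = 256

    def __init__(self):
        self.ready = [deque() for _ in range(self.LEVELS)]

    def make_ready(self, thread, priority):
        self.ready[priority].append(thread)

    def pick_next(self):
        """Return (thread, priority) for the best ready thread, or None if idle."""
        for priority, queue in enumerate(self.ready):   # assumed: 0 is the highest priority
            if queue:
                return queue.popleft(), priority
        return None

    def quantum_expired(self, thread, priority):
        """A thread that used up its time slice goes to the back of its level."""
        self.make_ready(thread, priority)
```

Two threads at the same level alternate as their time slices expire, while a thread that becomes ready at a higher priority is picked ahead of both, which corresponds to the two scheduling styles described above.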
Operating system overhead





Context switching time can be neglected if it is small compared to
the execution time of the process;
It becomes important for very short processes or at very high
utilizations;
The effects of context switching overhead can be studied using
simulators; they model the CPU time for interrupts, operating system
calls and so forth; they can be fed with traces and provide timing
diagrams that show not only the total execution time but the actions
performed by the CPU over time;
Simulators can be very helpful in debugging real time systems;
The fig. shows the effects of context switching overhead:



The results refer to a single processor system;
100 random task graphs were generated, together with a schedule for each;
Two design parameters were varied:




The time of the interrupt service;
The context switching time;
The deadlines of the tasks were adjusted to provide different amounts
of slack: no slack, 10 %, 20 % and 40 % slack;
The results show that, in each plot, the system is highly schedulable
when the interrupt service and context switching times are small and
much less schedulable when these times are large;