Anti-Virus
Product Development
Cliff Penton
Head of Software Development
Sophos Plc
Slides © 1999 Sophos Plc
http://www.sophos.com/
Who are Sophos?
Founded in 1980 as an electronic design partnership
Moved into data security in 1985
In 1989, among the first to respond to computer viruses
Anti-Virus is the main focus of the business
World leading enterprise-wide anti-virus software
Cover more platforms than any other anti-virus vendor
What do we make?
Conventional product development
[Figure: Word 95, Office 97, SR1, SR2, Office 2000]
Anti-virus product development...
[Figure: product features, OS developments, virus research; months Nov to Apr]
Anti-virus product development...
Presents simultaneous development challenges:
Complexity
Transparency
Quality
Regularity
Anti-virus product development...
Coping with the complexity
User Interface
Virus Detection Engine
Virus Descriptions
Anti-virus product development...
There are many issues, but I will focus on two today...
Multiple operating systems:
DOS/Windows 3.x, Windows 95/98, Windows NT, OS/2,
NetWare, Macintosh, OpenVMS, Unix...
Dealing with multiple languages:
English, French, German, Spanish, Japanese...
Multiple operating systems
The key issues in cross-platform development are:
Endianism
Packing and alignment
Multitasking
Memory management
File I/O
Endianism
Different hardware platforms store numbers in memory
in a different order
Big endian (e.g. SPARC)
Little endian (e.g. Intel)
When exchanging data between platforms, be aware of
endian-related problems
Endianism
The same four bytes in memory: 01 02 03 04
Read as big endian: 0x01020304
Read as little endian: 0x04030201
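One common way to sidestep the problem is to assemble values from individual bytes instead of copying memory directly. The sketch below assumes a 32-bit value; the function names read_be32 and read_le32 are illustrative only and do not come from the slides.

```c
#include <stdint.h>

/* Read a 32-bit value stored most significant byte first (big endian).
   Works the same way on any host, whatever its native byte order. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit value stored least significant byte first (little endian). */
uint32_t read_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Because only shifts and ORs are used, the result depends on the file's byte order, not on the processor running the code.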
Packing and alignment
Some platforms strictly enforce data alignment when
reading and writing memory
Careless memory references may lead to disaster
(SIGBUS, or GPF)
Usually happens when reading structures from a file
with packing set to single byte
Better to read/write struct elements by assignment
Packing and alignment
How big is this structure?
typedef struct {
    long a;
    char b;
    short c;
} x;
Compiling with Visual C++ 6.0:
8 bytes with default packing
7 bytes with 1-byte packing
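Reading struct elements by assignment might look like the following sketch. It assumes a 7-byte on-disk record (a 4-byte, a 1-byte and a 2-byte field, little endian); the type record and the function record_unpack are hypothetical, and fixed-width types stand in for long and short.

```c
#include <stdint.h>

/* In-memory form: the compiler may pad this however it likes. */
typedef struct {
    int32_t a;
    char    b;
    int16_t c;
} record;

/* Size of the record in the file (assumed layout, no padding). */
enum { RECORD_DISK_SIZE = 7 };

/* Fill the struct field by field from a byte buffer, so no misaligned
   multi-byte access ever happens and struct packing does not matter. */
void record_unpack(record *r, const unsigned char *buf)
{
    r->a = (int32_t)((uint32_t)buf[0]         | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24));
    r->b = (char)buf[4];
    r->c = (int16_t)(uint16_t)(buf[5] | (buf[6] << 8));
}
```

The file is read into a plain byte array of RECORD_DISK_SIZE bytes first, then unpacked; the struct itself is never read or written directly.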
Multitasking
Different operating systems use different scheduling
schemes
Cooperative/competitive multitasking
Preemptive multitasking
Tight loops and other compute-bound operations need
careful tweaking (yielding regularly) to maintain performance on
cooperative multitasking systems
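A compute-bound loop can be made friendlier by giving up the processor at intervals. The sketch below is only an illustration: the yield callback stands in for whatever yield call a given operating system provides, and counting_yield is a dummy used to show the call pattern.

```c
#include <stddef.h>

typedef void (*yield_fn)(void);

/* Dummy yield routine for demonstration: counts how often it is called.
   A real port would call the platform's own yield primitive here. */
unsigned yield_calls = 0;
void counting_yield(void) { yield_calls++; }

/* Sum the bytes of a buffer, calling yield() after every `interval` bytes.
   Passing NULL or an interval of 0 disables yielding, which suits
   preemptive systems. */
unsigned long checksum_with_yield(const unsigned char *data, size_t len,
                                  size_t interval, yield_fn yield)
{
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        sum += data[i];
        if (yield != NULL && interval != 0 && (i + 1) % interval == 0)
            yield();
    }
    return sum;
}
```

The interval is the tuning knob: too small wastes time in the scheduler, too large makes the rest of the system sluggish.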
Memory management
Not all operating systems have virtual memory, so we
cannot rely on malloc() and free()
Some require explicit virtual memory management,
such as DOS and NetWare
Need to use an intermediate layer to conditionally
choose between implicit and explicit virtual memory
management
Memory management
Explicit virtual memory management involves:
Allocating a handle to a memory block
Locking the handle to get a pointer to physical memory
Using the memory as usual
Unlocking the handle, releasing physical memory
Deallocating the handle when finished
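The five steps above can be hidden behind a small intermediate layer. This is a minimal sketch, not Sophos's implementation: the names mem_alloc, mem_lock, mem_unlock and mem_free are hypothetical, and this version simply maps onto malloc() and free() as a platform with implicit virtual memory would. A platform with explicit management would supply the same four functions on top of its own handle API.

```c
#include <stdlib.h>

/* Handle to a memory block; callers never touch the pointer directly. */
typedef struct mem_handle {
    void *block;
    int   locks;   /* number of outstanding locks */
} mem_handle;

/* Step 1: allocate a handle to a memory block. */
mem_handle *mem_alloc(size_t size)
{
    mem_handle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->block = malloc(size);
    if (h->block == NULL) {
        free(h);
        return NULL;
    }
    h->locks = 0;
    return h;
}

/* Step 2: lock the handle to obtain a usable pointer. */
void *mem_lock(mem_handle *h)
{
    h->locks++;
    return h->block;
}

/* Step 4: unlock; the pointer must not be used afterwards. */
void mem_unlock(mem_handle *h)
{
    if (h->locks > 0)
        h->locks--;
}

/* Step 5: deallocate. Refuses (returns -1) while the block is locked. */
int mem_free(mem_handle *h)
{
    if (h->locks != 0)
        return -1;
    free(h->block);
    free(h);
    return 0;
}
```

Code written against this interface follows the lock, use, unlock discipline everywhere, so it behaves correctly on both kinds of platform.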
File I/O
File I/O primitives differ between operating systems
File security considerations need to be taken into
account
Standard library calls may not provide the required
functionality
Multiple languages
Our Windows products ship in five languages:
English, French, German, Spanish, and Japanese
Introduces issues of character encoding:
UNICODE vs. SBCS vs. MBCS
Adds the overhead of translation to the development
process, which can be significant
Internationalisation
Character sets, alphabets and character encoding
Code pages
Dates and times
Generic coding techniques
Adding resources for multiple languages
English language
26 characters plus others < 256
7 bits == ASCII or 8 bits == ANSI
1 character == 1 byte
SBCS or Single Byte Character Set
Very familiar to anyone who has used strxxx()
functions
European languages
Accented characters are part of many languages
French: à, ô
Spanish: ñ, ¡, ¿
German: ö, ß
Characters 0-127 are the same (ASCII)
Characters 128-255 are called extended characters
Still SBCS, but requires code pages...
Code pages
The extended characters of each language are
supported via code pages.
The code pages in DOS and Windows are different!
DOS - English (British) code page 850 (Latin 1)
DOS - English (US) code page 437 (Latin US)
Windows Latin 1 (ANSI) code page 1252
Example code page problem
DOS CP 850
Windows CP 1252
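To illustrate the kind of problem involved: the byte 0x9C is the pound sign in DOS CP 850 but a lowercase oe ligature in Windows CP 1252, so text moved between the two needs translating. The sketch below maps only a handful of characters and is illustrative, not a complete table; the function name cp850_to_cp1252 is an assumption.

```c
/* Translate one CP 850 byte to its CP 1252 equivalent.
   ASCII (0x00-0x7F) is identical in both code pages. Only a few
   extended characters are mapped in this sketch; every other
   extended byte is replaced by '?'. */
unsigned char cp850_to_cp1252(unsigned char c)
{
    if (c < 0x80)
        return c;
    switch (c) {
    case 0x81: return 0xFC;   /* u with diaeresis */
    case 0x82: return 0xE9;   /* e with acute     */
    case 0x9C: return 0xA3;   /* pound sign       */
    case 0xA4: return 0xF1;   /* n with tilde     */
    case 0xE1: return 0xDF;   /* sharp s          */
    default:   return '?';
    }
}
```

A production version would use a full 128-entry lookup table for each direction.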
Far East languages
Now the fun begins…
Chinese has more than 10,000 characters
Japanese has several character types:
Hiragana
phonetic characters
Katakana
phonetic characters, used to spell words
taken from foreign languages
Kanji
characters of Chinese origin
Double byte character encoding
Say hello to DBCS, or Double Byte Character Set, where:
0x00 -> 0x7F is ASCII as usual
0x80 -> 0xFF is a combination of Kana (single-byte)
and Kanji lead bytes
Used on Win95, WinNT, Mac, NetWare, OS/2
Double byte character encoding
Programming for DBCS
1 character != 1 byte
If the character is double-byte, both bytes of the
character must be dealt with together
0x00 is always NUL, so it is safe to scan a string for '\0'
Trail byte values can be confused with other characters
(e.g. \) if not handled properly
Never scan with pointer arithmetic (i.e. ptr++)
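The trail-byte problem shows up when searching a path for a backslash. The sketch below assumes Japanese Shift-JIS lead-byte ranges (0x81-0x9F and 0xE0-0xFC) hard-coded for portability; on Windows the OS call IsDBCSLeadByte() would normally be used instead. The function names are hypothetical.

```c
#include <stddef.h>

/* Assumed lead-byte ranges for Shift-JIS (Japanese). */
int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Advance by one whole character: two bytes if s points at a lead
   byte followed by a trail byte, otherwise one byte. */
const char *dbcs_next(const char *s)
{
    if (is_lead_byte((unsigned char)s[0]) && s[1] != '\0')
        return s + 2;
    return s + 1;
}

/* Find the last real backslash, ignoring 0x5C trail bytes. */
const char *dbcs_last_backslash(const char *s)
{
    const char *found = NULL;
    while (*s != '\0') {
        if (*s == '\\')
            found = s;
        s = dbcs_next(s);
    }
    return found;
}
```

For the bytes 'C' ':' '\' 0x83 0x5C, a plain strrchr() for '\\' would stop at the final 0x5C, which is really the second half of a Katakana character; the function above returns the separator at offset 2.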
UNICODE
Instead of using 1 byte per character, Unicode uses 2
bytes per character
65536 possible characters in one character set
No need for code pages
Word breaking
Sentences in Japanese do not have spaces between
words.
Sentences can be broken at any Japanese character.
Break sentences on spaces and lead bytes.
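The rule "break on spaces and lead bytes" could be sketched as follows. The Shift-JIS lead-byte ranges and the function name find_line_break are assumptions made for this example, and real line breaking has further rules (for instance, punctuation that may not start a line) that are ignored here.

```c
#include <stddef.h>

/* Assumed Shift-JIS lead-byte ranges. */
static int is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Return the largest byte offset <= max_bytes at which the text may be
   broken: at a space, before a double-byte character, or at the end of
   the string. Returns 0 if there is no such offset. */
size_t find_line_break(const char *s, size_t max_bytes)
{
    size_t i = 0, best = 0;
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        size_t width = (is_lead(c) && s[i + 1] != '\0') ? 2 : 1;
        if (i > max_bytes)
            break;
        if (i > 0 && (c == ' ' || width == 2))
            best = i;
        i += width;
    }
    if (s[i] == '\0' && i <= max_bytes)
        best = i;
    return best;
}
```

Because the scan always steps over both bytes of a double-byte character, a break can never land between a lead byte and its trail byte.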
Dates and times
Date and time representations are not universal
UK: 22/05/98
USA: 05/22/98
Japan: 10Y 05M 22D
Either use an OS call (e.g. GetDateFormat() on
Windows), or
Embed a date format string in a language-dependent
resource
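The second option can be sketched with the standard C strftime() function. The language tags and the function date_format_for are hypothetical; in a real product the format string would be loaded from the language resource, not hard-coded.

```c
#include <string.h>
#include <time.h>

/* Stand-in for a resource lookup: returns the date format string
   belonging to a language. Unknown languages get the UK format. */
const char *date_format_for(const char *lang)
{
    if (strcmp(lang, "en-US") == 0)
        return "%m/%d/%y";
    return "%d/%m/%y";
}

/* Format a date using the language's own format string.
   Returns the number of characters written, or 0 on overflow. */
size_t format_date(char *out, size_t cap, const char *lang,
                   const struct tm *when)
{
    return strftime(out, cap, date_format_for(lang), when);
}
```

The calling code never mentions day/month order, so adding a language means adding a format string, not changing code.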
Generic coding techniques
Use the libraries available, e.g. for Win32
_tcsinc() maps to:
_strinc() for SBCS
_mbsinc() for MBCS
_wcsinc() for UNICODE
Use TCHAR not char
Always enclose literal text with the _T() macro
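The idea behind TCHAR and _T() can be shown with a portable imitation. The names xchar, XT and xstrlen, and the USE_WIDE_CHARS switch, are invented for this sketch; on Win32 the real equivalents are TCHAR, _T() and _tcslen() from tchar.h.

```c
#include <string.h>
#include <wchar.h>

/* One compile-time switch selects the character type, the literal
   prefix and the matching library routine together. */
#ifdef USE_WIDE_CHARS
typedef wchar_t xchar;
#define XT(s)   L##s
#define xstrlen wcslen
#else
typedef char xchar;
#define XT(s)   s
#define xstrlen strlen
#endif

/* Source written once in terms of xchar/XT compiles either way. */
size_t label_length(void)
{
    const xchar *label = XT("Scanning");
    return xstrlen(label);
}
```

The same source gives the same character count in both builds, even though the string occupies a different number of bytes.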
Formatting messages
Never concatenate two strings to form a sentence
Take care when using printf(), as language
variations may dictate reordering of insertion objects
Win32 can use FormatMessage()
NetWare can use NWprintf() etc.
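Numbered insertion points let a translator reorder the inserts without touching the code. The sketch below is a small portable formatter written for this example: it uses the %1, %2 placeholder style that FormatMessage() also uses, but format_positional is a hypothetical function, not a Win32 or NetWare call.

```c
#include <stddef.h>

/* Copy tmpl to out, replacing %1..%9 with args[0]..args[8].
   Output is truncated to fit cap bytes and always NUL-terminated.
   Returns the number of characters written. */
size_t format_positional(char *out, size_t cap, const char *tmpl,
                         const char *const *args, size_t nargs)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    while (*tmpl != '\0') {
        if (tmpl[0] == '%' && tmpl[1] >= '1' && tmpl[1] <= '9' &&
            (size_t)(tmpl[1] - '1') < nargs) {
            const char *a = args[tmpl[1] - '1'];
            while (*a != '\0' && n + 1 < cap)
                out[n++] = *a++;
            tmpl += 2;
        } else {
            if (n + 1 < cap)
                out[n++] = *tmpl;
            tmpl++;
        }
    }
    out[n] = '\0';
    return n;
}
```

With the arguments "EICAR" and "test.com", the template "Virus %1 found in %2" and a reordered template such as "%2: virus %1" both work with the same call.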
External resources
Avoid hard-coding text into application source code
Win32, Mac and OS/2 use resource files
NetWare uses message databases
DOS, VMS, etc. have to store strings in separate
modules, which are linked individually or loaded at run
time
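On platforms without resource files, a separate string module might look like the sketch below. The identifiers, languages and strings are invented for illustration; the point is that application code asks for a message by number and never contains the text itself.

```c
#include <stddef.h>

enum language { LANG_EN, LANG_FR, LANG_DE, LANG_COUNT };
enum message  { MSG_FILE, MSG_YES, MSG_COUNT };

/* One row per language. This table would live in its own module,
   linked or loaded separately from the application code. */
static const char *const messages[LANG_COUNT][MSG_COUNT] = {
    { "File",    "Yes" },   /* English */
    { "Fichier", "Oui" },   /* French  */
    { "Datei",   NULL  },   /* German; MSG_YES left untranslated to
                               demonstrate the fallback */
};

/* Look up a message, falling back to English when a translation is
   missing. Returns NULL only for an invalid message id. */
const char *get_message(enum language lang, enum message id)
{
    const char *text = NULL;
    if (id >= MSG_COUNT)
        return NULL;
    if (lang < LANG_COUNT)
        text = messages[lang][id];
    if (text == NULL)
        text = messages[LANG_EN][id];
    return text;
}
```

Falling back to English keeps a partly translated build usable while translation is still in progress.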
Delivering multiple languages
Multiple language resources linked into executable -- good for small programs with limited text
Multiple executables -- e.g. SWEEP for DOS
Multiple resource-only DLLs -- extremely flexible
solution if OS supports DLLs
Multiple text-only message files for text-only operating
systems
Cliff Penton
Sophos Plc
Oxford
England
Tel +44 1235 559933
Email [email protected]
http://www.sophos.com/