Anti-Virus
Product Development
Cliff Penton
Head of Software Development
Sophos Plc
Slides © 1999 Sophos Plc
http://www.sophos.com/
Who are Sophos?
 Founded in 1980 as an electronic design partnership
 Moved into data security in 1985
 In 1989, among the first to respond to computer viruses
 Anti-Virus is the main focus of the business
 World leading enterprise-wide anti-virus software
 Cover more platforms than any other anti-virus vendor
What do we make?
Conventional product development
[Timeline: Word 95, Office 97, SR1, SR2, Office 2000]
Anti-virus product development...
[Timeline, NOV to APR, with three parallel tracks: product features, OS developments, virus research]
Anti-virus product development...
Presents simultaneous development challenges:
 Complexity
 Transparency
 Quality
 Regularity
Anti-virus product development...
Coping with the complexity
User Interface
Virus Detection Engine
Virus Descriptions
Anti-virus product development...
There are many issues, but I will focus on two today...
 Multiple operating systems:
DOS/Windows 3.x, Windows 95/98, Windows NT, OS/2,
NetWare, Macintosh, OpenVMS, Unix...
 Dealing with multiple languages:
English, French, German, Spanish, Japanese...
Multiple operating systems
The key issues in cross-platform development are:
 Endianism
 Packing and alignment
 Multitasking
 Memory management
 File I/O
Endianism
 Different hardware platforms store the bytes of a
multi-byte number in memory in different orders:
 Big endian (e.g. SPARC): most significant byte first
 Little endian (e.g. Intel): least significant byte first
 When exchanging information between platforms, we
must be aware of endian-related problems
Endianism
Bytes in memory: 01 02 03 04
Read as big endian: 0x01020304
Read as little endian: 0x04030201
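
A minimal sketch of one way to avoid endian problems when exchanging data: assemble each value from individual bytes with shifts, so the result does not depend on the byte order of the host. The function names are hypothetical and are not taken from the Sophos code base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant
 * byte first (big endian). Works on any host byte order. */
uint32_t read_u32_be(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Assemble a 32-bit value from four bytes stored least significant
 * byte first (little endian). */
uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Store a 32-bit value most significant byte first. */
void write_u32_be(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

Given the byte sequence 01 02 03 04, read_u32_be() yields 0x01020304 and read_u32_le() yields 0x04030201, whatever the host processor.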
Packing and alignment
 Some platforms strictly enforce data alignment when
reading and writing memory
 Careless memory references may lead to disaster
(SIGBUS, or GPF)
 Usually happens when a structure is read from a file
as one block with packing set to a single byte
 Better to read/write structure members one at a time
by assignment
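
A sketch of reading structure members by assignment, assuming a hypothetical 7-byte on-disk record (4-byte little-endian integer, 1-byte character, 2-byte little-endian integer). The record layout and all names are illustrative assumptions. Because each member is built from individual bytes, the buffer may sit at any address without risking a misaligned access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECORD_DISK_SIZE 7  /* assumed file layout: 4 + 1 + 2 bytes, no padding */

typedef struct {
    int32_t a;
    char    b;
    int16_t c;
} record;

/* Fill an in-memory record from a raw byte buffer, one member at a
 * time. Only byte accesses touch the buffer, so its alignment and
 * the compiler's packing setting do not matter. */
void record_from_bytes(record *out, const unsigned char *buf)
{
    uint32_t a;
    uint16_t c;

    assert(out != NULL && buf != NULL);
    a = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
        ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    c = (uint16_t)((unsigned)buf[5] | ((unsigned)buf[6] << 8));

    out->a = (int32_t)a;
    out->b = (char)buf[4];
    out->c = (int16_t)c;
}
```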
Packing and alignment
How big is this structure?
typedef struct {
long a;
char b;
short c;
} x;
Packing and alignment
Compiling with Visual C++ 6.0, the structure above is:
 8 bytes with default packing
 7 bytes with 1-byte packing
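
The effect of packing can be checked directly. The sketch below declares the same structure twice, once with default packing and once with 1-byte packing via #pragma pack, which Visual C++, GCC and Clang all accept. The type names x_default and x_packed are assumptions for illustration. The printed sizes depend on the platform: where long is 4 bytes they match the 8 and 7 quoted for Visual C++ 6.0, but they are larger where long is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Default packing: the compiler may insert padding after b so that
 * c is suitably aligned, and after c to round up the total size. */
typedef struct {
    long a;
    char b;
    short c;
} x_default;

/* 1-byte packing: members follow each other with no padding. */
#pragma pack(push, 1)
typedef struct {
    long a;
    char b;
    short c;
} x_packed;
#pragma pack(pop)

/* Print both sizes and the offset of c in each layout. */
void print_sizes(void)
{
    assert(sizeof(x_packed) <= sizeof(x_default));
    printf("default packing: %lu bytes, c at offset %lu\n",
           (unsigned long)sizeof(x_default),
           (unsigned long)offsetof(x_default, c));
    printf("1-byte packing:  %lu bytes, c at offset %lu\n",
           (unsigned long)sizeof(x_packed),
           (unsigned long)offsetof(x_packed, c));
}
```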
Multitasking
 Different operating systems use different scheduling
schemes
 Cooperative/competitive multitasking
 Preemptive multitasking
 Tight loops and other compute-bound operations need
careful tweaking (for example, yielding regularly) to
stay responsive on cooperative multitasking systems
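
One common way to tame a tight loop on a system without preemption is to hand control back to the scheduler every so often. The sketch below passes the yield routine in as a function pointer, since the real call differs per operating system. The yield_fn type, YIELD_INTERVAL and counting_yield() are all hypothetical stand-ins, not real OS APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: on a real system this would wrap the
 * platform's own "give up the processor" call. */
typedef void (*yield_fn)(void);

#define YIELD_INTERVAL 4096  /* assumed tuning value: bytes between yields */

/* Stand-in hook used for demonstration: it only counts calls. */
unsigned long yield_calls = 0;
void counting_yield(void)
{
    yield_calls++;
}

/* Compute-bound loop that yields every YIELD_INTERVAL iterations.
 * Passing NULL for yield skips yielding, e.g. on a preemptive OS. */
unsigned long checksum_with_yield(const unsigned char *data, size_t len,
                                  yield_fn yield)
{
    unsigned long sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        sum += data[i];
        if (yield != NULL && (i + 1) % YIELD_INTERVAL == 0)
            yield();
    }
    return sum;
}
```

The interval is the tuning knob: yielding too often wastes time in the scheduler, too rarely makes the rest of the system sluggish.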
Memory management
 Not all operating systems have virtual memory, so we
cannot rely on malloc() and free()
 Some require explicit virtual memory management,
such as DOS and NetWare
 Need to use an intermediate layer to conditionally
choose between implicit and explicit virtual memory
management
Memory management
Explicit virtual memory management involves:
 Allocating a handle to a memory block
 Locking the handle to get a pointer to physical memory
 Using the memory as usual
 Unlocking the handle, releasing physical memory
 Deallocating the handle when finished
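
The five steps above can be hidden behind a small intermediate layer. The sketch below shows the implicit case, where the layer simply wraps malloc() and free(). On a platform that needs explicit management, the same four functions would call the OS handle routines instead. All names (vm_handle, vm_alloc, vm_lock, vm_unlock, vm_free) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Handle to a memory block. Callers never keep the raw pointer
 * across an unlock, so an explicit implementation is free to move
 * or swap out the block while it is unlocked. */
typedef struct vm_block {
    void  *ptr;
    size_t size;
    int    locks;
} vm_block;
typedef vm_block *vm_handle;

/* Step 1: allocate a handle to a memory block. */
vm_handle vm_alloc(size_t size)
{
    vm_handle h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->ptr = malloc(size != 0 ? size : 1);
    if (h->ptr == NULL) {
        free(h);
        return NULL;
    }
    h->size = size;
    h->locks = 0;
    return h;
}

/* Step 2: lock the handle to obtain a usable pointer. */
void *vm_lock(vm_handle h)
{
    assert(h != NULL);
    h->locks++;
    return h->ptr;
}

/* Step 4: unlock; the pointer must not be used after this. */
void vm_unlock(vm_handle h)
{
    assert(h != NULL && h->locks > 0);
    h->locks--;
}

/* Step 5: deallocate the handle once it is fully unlocked. */
void vm_free(vm_handle h)
{
    if (h == NULL)
        return;
    assert(h->locks == 0);
    free(h->ptr);
    free(h);
}
```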
File I/O
 File I/O primitives differ between operating systems
 File security considerations need to be taken into
account
 Standard library calls may not provide the required
functionality
Multiple languages
 Our Windows products ship in five languages:
English, French, German, Spanish, and Japanese

Introduces issues of character encoding:
UNICODE vs. SBCS vs. MBCS

Adds the overhead of translation to the development
process, which can be significant
Internationalisation
 Character sets, alphabets and character encoding
 Code pages
 Dates and times
 Generic coding techniques
 Adding resources for multiple languages
English language
 26 letters plus other characters, fewer than 256 in total
 7 bits == ASCII or 8 bits == ANSI
 1 character == 1 byte
 SBCS or Single Byte Character Set
 Very familiar to anyone who has used strxxx()
functions
European languages

Accented characters are part of many languages:
French: à, ô
Spanish: ñ, ¡, ¿
German: ö, ß
 Characters 0-127 are the same (ASCII)

Characters 128-255 are called extended characters
 Still SBCS, but requires code pages...
Code pages
 The extended characters of each language are
supported via code pages.
 The code pages in DOS and Windows are different!
 DOS - English (British) code page 850 (Latin 1)
 DOS - English (US) code page 437 (Latin US)
 Windows Latin 1 (ANSI) code page 1252
Example code page problem
DOS CP 850
Windows CP 1252
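
To make the mismatch concrete, the sketch below lists a few byte values that mean different things in DOS code page 850 and Windows code page 1252, and converts between them by matching Unicode code points. The table holds only four sample entries chosen for illustration, and the names are hypothetical. A real converter would carry the full 128-entry tables or use an OS service.

```c
#include <assert.h>
#include <stddef.h>

/* One byte value and the Unicode code point it stands for in each
 * code page. */
typedef struct {
    unsigned char byte;
    unsigned      cp850;
    unsigned      cp1252;
} cp_entry;

static const cp_entry cp_samples[] = {
    { 0x82, 0x00E9, 0x201A },  /* 850: e acute;    1252: low quotation mark */
    { 0x9C, 0x00A3, 0x0153 },  /* 850: pound sign; 1252: oe ligature */
    { 0xA3, 0x00FA, 0x00A3 },  /* 850: u acute;    1252: pound sign */
    { 0xE9, 0x00DA, 0x00E9 }   /* 850: U acute;    1252: e acute */
};
#define CP_SAMPLE_COUNT (sizeof cp_samples / sizeof cp_samples[0])

/* Convert one CP 850 byte to CP 1252. Returns 1 on success, or 0 if
 * the byte (or its CP 1252 equivalent) is not in the sample table. */
int cp850_to_cp1252(unsigned char in, unsigned char *out)
{
    size_t i, j;

    assert(out != NULL);
    if (in < 0x80) {          /* ASCII range is identical in both */
        *out = in;
        return 1;
    }
    for (i = 0; i < CP_SAMPLE_COUNT; i++) {
        if (cp_samples[i].byte != in)
            continue;
        for (j = 0; j < CP_SAMPLE_COUNT; j++) {
            if (cp_samples[j].cp1252 == cp_samples[i].cp850) {
                *out = cp_samples[j].byte;
                return 1;
            }
        }
    }
    return 0;
}
```

A pound sign typed under DOS is stored as 0x9C. If it is displayed unconverted under Windows it appears as an oe ligature, and it has to become 0xA3 to stay a pound sign.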
Far East languages
Now the fun begins…
 Chinese has more than 10,000 characters
 Japanese has several character types:
Hiragana: phonetic characters
Katakana: phonetic characters, used to spell words taken from foreign languages
Kanji: characters of Chinese origin
Double byte character encoding
Say hello to DBCS, or Double Byte Character Set, where:
 0x00 -> 0x7F is ASCII as usual
 0x80 -> 0xFF is a combination of single-byte Kana
and Kanji lead bytes
 Used on Win95, WinNT, Mac, NetWare, OS/2
Double byte character encoding
Programming for DBCS
 1 character != 1 byte
 If the character is double-byte, both bytes of the
character must be dealt with together
 0x00 is always NUL, so it is safe to scan a string for '\0'
 Trail byte values can be confused with other characters
(e.g. \) if not handled properly
 Never step through a string with raw pointer arithmetic
(i.e. ptr++); advance one whole character at a time
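
A sketch of DBCS-aware scanning, assuming Shift-JIS encoding with lead bytes in the ranges 0x81-0x9F and 0xE0-0xFC. The helper names are hypothetical. On Windows the system routines IsDBCSLeadByte() and CharNext() serve the same purpose and follow the active code page. The example looks for a backslash without being fooled by a trail byte that happens to equal 0x5C.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding: Shift-JIS lead byte ranges. */
int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Advance by one character: two bytes after a lead byte, one byte
 * otherwise. Never steps past the terminating NUL. */
const char *dbcs_next(const char *p)
{
    assert(p != NULL);
    if (*p == '\0')
        return p;
    if (is_lead_byte((unsigned char)p[0]) && p[1] != '\0')
        return p + 2;
    return p + 1;
}

/* Find the first occurrence of a single-byte character, skipping
 * over the trail bytes of double-byte characters. */
const char *dbcs_find_char(const char *s, char target)
{
    assert(s != NULL);
    for (; *s != '\0'; s = dbcs_next(s)) {
        if (!is_lead_byte((unsigned char)*s) && *s == target)
            return s;
    }
    return NULL;
}

/* Count characters rather than bytes. */
size_t dbcs_char_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s = dbcs_next(s))
        n++;
    return n;
}
```

For the byte string 95 5C 61 5C 62 (a Kanji whose trail byte is 0x5C, followed by a\b), a byte-wise search stops at offset 1 inside the Kanji, while dbcs_find_char() correctly reports offset 3.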
UNICODE
 Instead of using 1 byte per character, Unicode uses 2
bytes per character
 65536 possible characters in one character set
 No need for code pages
UNICODE
Word breaking
 Sentences in Japanese do not have spaces between
words.
 Sentences can be broken at any Japanese character.
 Break sentences on spaces and lead bytes.
Dates and times
 Date and time representations are not universal
UK: 22/05/98
USA: 05/22/98
Japan: 10Y 05M 22D (year 10 of the Heisei era is 1998)
 Either use an OS call (e.g. GetDateFormat() on
Windows), or
 Embed a date format string in a language-dependent
resource
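
The second option, a date format string stored per language, can be sketched with the standard strftime() function. The table below stands in for a language-dependent resource. The enum, table and function names are hypothetical, and only the UK and USA orderings are shown because era-based Japanese dates need locale support beyond a plain format string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef enum { DATE_LANG_UK, DATE_LANG_US, DATE_LANG_COUNT } date_lang;

/* Stand-in for a per-language resource: one format string each. */
static const char *const date_formats[DATE_LANG_COUNT] = {
    "%d/%m/%y",   /* UK:  day/month/year */
    "%m/%d/%y"    /* USA: month/day/year */
};

/* Build a struct tm for a calendar date (other fields zeroed). */
struct tm make_date(int year, int month, int day)
{
    struct tm when;
    memset(&when, 0, sizeof when);
    when.tm_year = year - 1900;
    when.tm_mon  = month - 1;
    when.tm_mday = day;
    return when;
}

/* Format a date using the format string for the given language.
 * Returns the number of characters written, or 0 if the buffer is
 * too small (as strftime() does). */
size_t format_date(date_lang lang, const struct tm *when,
                   char *out, size_t out_size)
{
    assert((int)lang >= 0 && lang < DATE_LANG_COUNT);
    assert(when != NULL && out != NULL);
    return strftime(out, out_size, date_formats[lang], when);
}
```

The code that formats the date never changes; adding a language only means adding a format string.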
Generic coding techniques
 Use the libraries available, e.g. for Win32
_tcsinc() maps to:
_strinc() for SBCS
_mbsinc() for MBCS
_wcsinc() for UNICODE
 Use TCHAR not char
 Always enclose literal text with the _T() macro
Formatting messages
 Never concatenate two strings to form a sentence
 Take care when using printf(), as language
variations may dictate reordering of insertion objects
 Win32 can use FormatMessage()
 NetWare can use NWprintf() etc.
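
The reordering problem can be illustrated with numbered insertion markers, in the spirit of the %1, %2 markers that FormatMessage() accepts. The function below is a simplified, hypothetical stand-in, not the Win32 API. It handles only %1 to %9 with string arguments, but it shows how a translator can reorder insertions without touching the code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand %1..%9 in tmpl with the matching entry of args (so %1 is
 * args[0]). Returns the length written, or -1 if an index is out of
 * range or the output buffer is too small. */
int format_inserts(char *out, size_t out_size, const char *tmpl,
                   const char *const *args, size_t nargs)
{
    size_t used = 0;

    assert(out != NULL && tmpl != NULL && out_size > 0);
    while (*tmpl != '\0') {
        const char *piece = tmpl;   /* default: copy one literal byte */
        size_t piece_len = 1;

        if (tmpl[0] == '%' && tmpl[1] >= '1' && tmpl[1] <= '9') {
            size_t index = (size_t)(tmpl[1] - '1');
            if (index >= nargs)
                return -1;
            piece = args[index];
            piece_len = strlen(piece);
            tmpl += 2;
        } else {
            tmpl += 1;
        }
        if (used + piece_len >= out_size)   /* keep room for NUL */
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }
    out[used] = '\0';
    return (int)used;
}
```

With the arguments "EICAR" and "report.doc", the template "Virus %1 found in file %2" and an illustrative German template "In Datei %2 wurde Virus %1 gefunden" both work with the same call, even though the insertions appear in opposite order.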
External resources
 Avoid hard-coding text into application source code
 Win32, Mac and OS/2 use resource files
 NetWare uses message databases
 DOS, VMS, etc. have to store strings in separate
modules, which are linked individually or loaded at run
time
Delivering multiple languages
 Multiple language resources linked into executable -- good for small programs with limited text
 Multiple executables -- e.g. SWEEP for DOS
 Multiple resource-only DLLs -- extremely flexible
solution if OS supports DLLs
 Multiple text-only message files for text-only operating
systems
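
As a rough illustration of the first option (all languages linked into one executable), the sketch below keeps a message table indexed by language and message identifier, and falls back to English when a translation is missing. The identifiers, table contents and fallback policy are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { UI_LANG_EN, UI_LANG_FR, UI_LANG_DE, UI_LANG_COUNT } ui_lang;
typedef enum { MSG_FILE_NOT_FOUND, MSG_ACCESS_DENIED, MSG_COUNT } msg_id;

/* All text lives in one table, outside the program logic. A NULL
 * entry marks a message that has not been translated yet. */
static const char *const message_table[UI_LANG_COUNT][MSG_COUNT] = {
    { "File not found",       "Access denied" },       /* English */
    { "Fichier introuvable",  NULL },                  /* French  */
    { "Datei nicht gefunden", "Zugriff verweigert" }   /* German  */
};

/* Look up a message, falling back to English if the requested
 * language has no translation for it. */
const char *get_message(ui_lang lang, msg_id id)
{
    const char *text;

    assert((int)lang >= 0 && lang < UI_LANG_COUNT);
    assert((int)id >= 0 && id < MSG_COUNT);
    text = message_table[lang][id];
    return text != NULL ? text : message_table[UI_LANG_EN][id];
}
```

The other delivery options change only where the table comes from (a separate executable, a resource-only DLL, or a message file); the lookup interface can stay the same.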
Cliff Penton
Sophos Plc
Oxford
England
Tel +44 1235 559933
Email [email protected]
http://www.sophos.com/