Advanced Scalability and Reliability


EM419
MobiLink Advanced
Scalability and Reliability
Reg Domaratzki
Sustaining Engineering
iAnywhere Solutions
[email protected]
Do you use MobiLink yet?
Which version of MobiLink?
Which type of clients?
Adaptive Server Anywhere (ASA) or UltraLite
Which type of consolidated database?
ASA, ASE, MS SQL Server, Oracle8, IBM DB2 UDB
How many remote databases?
Common questions
We are often asked the following:
What is the maximum number of remote
users?
How scalable is MobiLink?
An example of scalability
Goals of this presentation
I will try to convince you that MobiLink:
Scales ideally with increasing remote databases
Makes efficient use of hardware
Has modest hardware requirements
I want you to:
Use MobiLink for a large number of remote databases
Get the best performance
Benefits
You can:
Support a large number of remote
databases
Predict performance for a large number
of remote databases from tests with a
small number
Maximize throughput by following
performance tips
Performance of MobiLink
MobiLink overview
What takes time in a MobiLink synchronization?
How performance was measured
Results of performance testing
Optimum number of worker threads
Number of clients
Size of synchronizations
Parallel efficiency
Recommendations and next steps
What is MobiLink?
A two-way synchronization technology for
large scale mobile database
deployment
remote database (mobile, embedded, or workgroup server
database)
consolidated database (enterprise, workgroup, or desktop
database)
A server that processes synchronization
requests from mobile databases
What is MobiLink?
[Diagram: mobile or remote databases exchange data with MobiLink over the communication infrastructure (Internet / dial-up / wireless); MobiLink connects to the consolidated database]
MobiLink design goals
Heterogeneous consolidated database
Scalable and robust
(tens of thousands of remote
databases)
Manageable in large deployments
Support handheld and wireless devices
Flexible
Designed for scalability
Connection pooling
Worker threads
Little or no disk access
Almost no contention in MobiLink
What takes time in a
synchronization?
Connections
Upload
Download
Connections
Remote database (client) to MobiLink
Overhead of creating network connection
Client may have to wait for available MobiLink worker
thread.
MobiLink to consolidated database
Worker thread uses database connection from pool
Each database connection is tied to a sync version
Reconnection on error or change in sync version
Tip: # db connections ≥ # versions × # workers
Upload
1. Client to MobiLink (remote database → MobiLink)
2. MobiLink to consolidated (MobiLink → consolidated DB)
Upload: client to MobiLink
Data transfer from client to MobiLink worker
thread
upload size ÷ bandwidth
packing reduces transfer of zero-valued bytes
some client processing with UltraLite clients
worker does character set translation to Unicode
all in memory, unless upload or BLOB cache overflow to disk
Tip: upload cache (-u) ≥ largest upload × # workers
Tip: BLOB cache (-bc) ≥ 2 × largest BLOB data in a row × # workers
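The two cache tips can be turned into a quick sizing calculation. The sketch below is only illustrative: it assumes the tips mean "at least largest upload times workers" and "at least twice the largest per-row BLOB data times workers", and the function names and example byte counts are hypothetical, not MobiLink settings.

```python
def min_upload_cache_bytes(largest_upload_bytes: int, workers: int) -> int:
    """Lower bound for the upload cache (-u): every worker may hold the largest upload."""
    return largest_upload_bytes * workers


def min_blob_cache_bytes(largest_blob_row_bytes: int, workers: int) -> int:
    """Lower bound for the BLOB cache (-bc): twice the largest per-row BLOB data per worker."""
    return 2 * largest_blob_row_bytes * workers


if __name__ == "__main__":
    # Hypothetical workload: 1,000 rows of 92 bytes per upload, 64 KB of BLOB data per row.
    workers = 5
    print(min_upload_cache_bytes(1000 * 92, workers))   # 460000
    print(min_blob_cache_bytes(64 * 1024, workers))     # 655360
```

Treat the results as a floor; anything smaller risks the cache overflowing to disk during a synchronization.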
Upload: MobiLink to consolidated
MobiLink worker thread applies upload to
consolidated database
via your upload synchronization scripts
time dictated by consolidated database performance
• simultaneous connections
• concurrency
• size of transactions
• network bandwidth
Download
1. Consolidated to MobiLink (consolidated DB → MobiLink)
2. MobiLink to client (MobiLink → remote database)
Download: consolidated to
MobiLink
MobiLink worker thread fetches data to
be downloaded
via your download synchronization scripts
time dictated by consolidated database performance
MobiLink uses same BLOB cache as for upload
Download: MobiLink to client
Data transferred from MobiLink worker thread to
client
worker does character set translation from Unicode
more client processing than in upload
download size ÷ bandwidth + client processing
MobiLink worker thread waits for client
acknowledgement
This is optional in v8
We’ve found that, with very slow clients, a MobiLink worker
thread would spend the majority of its time waiting for an
acknowledgement of the download stream
Scaling up to more clients
More worker threads allow more simultaneous
syncs
Ideally:
total time ≈ single sync time × # clients ÷ # workers
(assuming # clients ≥ # workers)
Neglects contention and multitasking overhead
In practice, should hit limit where increasing
worker threads does not reduce total time
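To see where the ideal estimate comes from, the following sketch simulates a pool of worker threads that each take the next queued synchronization as soon as they are free. It is a toy model, not MobiLink's scheduler: it ignores contention and multitasking overhead, and the 2-second sync time is a made-up figure.

```python
import heapq


def simulate_total_time(sync_times, workers):
    """Return the finish time when each sync is handed to the first free worker."""
    # Each heap entry is the time at which a worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finish = 0.0
    for duration in sync_times:
        start = heapq.heappop(free_at)      # earliest available worker
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def ideal_total_time(single_sync_time, clients, workers):
    """Ideal estimate: single sync time x clients / workers."""
    return single_sync_time * clients / workers


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 clients, 2 s per sync, 5 workers.
    print(simulate_total_time([2.0] * 1000, 5))   # 400.0
    print(ideal_total_time(2.0, 1000, 5))         # 400.0
```

With identical sync times the simulation and the formula agree; in a real deployment the measured total will drift above the estimate once the server or the consolidated database saturates.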
Potential bottlenecks
Throughput may be limited by:
client processing speed
bandwidth for client-to-MobiLink communications
speed of the computer running MobiLink
number of MobiLink worker threads
bandwidth for communication between MobiLink and the
consolidated database
performance of the consolidated database
contention in your synchronization scripts
Performance tests
Determine performance characteristics
of MobiLink
optimal number of worker threads for many clients
differing number of clients
synchronization size
parallelism
Testing methodology
vary one thing at a time
stress MobiLink and/or consolidated database
keep it simple
Schema
Single table
Two-column primary key to avoid primary
key pool
Representative data types
CREATE TABLE Purchase (
    emp_id      INT        NOT NULL,
    purch_id    INT        NOT NULL,
    cust_id     INT        NOT NULL,
    cost        NUMERIC    NOT NULL,
    order_date  TIMESTAMP  NOT NULL,
    notes       VARCHAR(64),
    PRIMARY KEY ( emp_id, purch_id )
)
Values
Emp_id maps to remote client via
employee table (which is not
synchronized)
Mutually exclusive partitioning of data between clients (to
avoid contention and conflicts)
Large values chosen for integer data
(so packing would not shrink data
transferred)
Each row is 92 bytes when transferred
Timing framework
Extra tables in consolidated database
MobiLink synchronization scripts
Small, efficient client application
Win32 console application
Spawns multiple child processes that act as clients
UltraLite with no file-based persistent storage
Supervisor program
Coordinates clients on different computers
Ensuring simultaneous
synchronizations
Clients kept in step via gates
At a gate, each client waits for all the others
Win32 event objects for clients on one computer
Named pipes to supervisor for multiple computers
Efficient (1 to 2 seconds for 1000 clients on 10 PCs)
Gates before and after each
synchronization
Times recorded between gates and
synchronization on both client and
server
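The gate idea maps naturally onto a barrier. The original framework used Win32 event objects and named pipes; the sketch below substitutes Python threads in a single process purely to illustrate the pattern, and `run_clients` and `do_sync` are hypothetical names.

```python
import threading
import time


def run_clients(n_clients, do_sync):
    """Run n_clients threads that all start do_sync() together; return per-client times."""
    gate = threading.Barrier(n_clients)
    elapsed = [0.0] * n_clients

    def client(i):
        gate.wait()                      # gate before the synchronization
        start = time.perf_counter()
        do_sync(i)
        elapsed[i] = time.perf_counter() - start
        gate.wait()                      # gate after the synchronization

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return elapsed


if __name__ == "__main__":
    # Stand-in for a real synchronization call: sleep for a short, fixed time.
    times = run_clients(4, lambda i: time.sleep(0.05))
    print(len(times))
```

Because the barrier resets once every client has arrived, the same object serves as both the "before" and the "after" gate.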
Timing a synchronization
1. Client: prepare for synchronization
2. Client: wait for all other clients (“gate”)
3. Client: record client start time
4. Client: start synchronization, via ULSynchronize()
5. ML: record start (begin_synchronization script)
6. Perform synchronization
7. ML: record end (end_synchronization script)
8. Client: record client end
9. Client: wait for all other clients (“gate”)
Times and throughput definitions
Client-measured time (for a single synchronization):
t_client_end - t_client_start
Server-measured time (for a single synchronization):
t_server_end - t_server_start
Total server time (for a set of simultaneous syncs):
max(t_server_end) - min(t_server_start)
Throughput:
total # rows ÷ total server time
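These definitions translate directly into code. The sketch below assumes the server start and end times have already been collected as pairs of seconds; the sample timestamps are invented for illustration.

```python
def total_server_time(syncs):
    """syncs: list of (server_start, server_end) pairs for one set of simultaneous syncs."""
    starts = [s for s, _ in syncs]
    ends = [e for _, e in syncs]
    return max(ends) - min(starts)


def throughput(total_rows, syncs):
    """Rows per second over the whole set of synchronizations."""
    return total_rows / total_server_time(syncs)


if __name__ == "__main__":
    # Hypothetical timestamps (seconds) for three overlapping synchronizations.
    syncs = [(0.0, 4.0), (1.0, 6.0), (2.0, 10.0)]
    print(total_server_time(syncs))      # 10.0
    print(throughput(3000, syncs))       # 300.0
```

Note that total server time spans from the earliest start to the latest end, so it is not the sum of the individual synchronization times.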
Test environment
Sybase SQL Anywhere Studio 7.0.1 and 8.0.0
Isolated test rack
MobiLink and ASA on Dell PowerEdge 6300/550
(4P3-550, 512 MB, database file on array
drive, database log file on separate drive)
Clients on 10 Dell Optiplex GXa 266Mbr
(P2-266, 64 MB)
100 Mbps Fast Ethernet hub (with utilization
gauge)
Results of performance testing
Four main tests:
1. Number of worker threads
• Fast clients
• Slower clients
• Slowest clients
• Upload cap
2. Number of clients
3. Size of synchronizations
4. Number of server processors
Test 1-A: Varying worker threads
Constants:
1,000 clients
1,000 rows per client synchronization
(92 bytes per row)
total of 1,000 synchronizations
Varied ML worker threads
2, 4, 5, 10, 20, 50
Throughput vs. worker threads for fast clients
[Chart: throughput (rows/s) against number of MobiLink worker threads; series: Downloads, Inserts, Deletes, Updates]
Optimal number of worker threads
Throughput rises then drops with increasing
workers
Two likely causes for drop:
Hardware contention due to CPU or disk access saturation on server
computer
Software contention between connections in the consolidated
database (blocking)
In this case, 100% CPU utilization reached with 5
worker threads
Clients fast enough to saturate ML/ASA (no
difference increasing from 10 to 12 computers
running clients)
Client perspective
0.5% of syncs active at any time with 5
worker threads
Rest are either queued waiting, or
already finished
Client times:
Longest client time ≈ total server time
Average client time ≈ ½ total server time
Maximizing throughput also minimizes
average and longest client sync times
Test 1-B : Varying worker
threads with slower clients
Constants as before, except client
hardware and network
clients now run on 15 P-75 computers
10 Mbps Ethernet hub
Varied ML worker threads
5, 10, 20, 50, 100
Throughput vs. worker threads for slower clients
[Chart: throughput (rows/s) against number of MobiLink worker threads; series: Downloads, Inserts, Deletes, Updates]
Effects of slower clients
All types of synchronization slowed
Downloads depend more on client speed
than uploads
With 5 MobiLink worker threads, downloads slowed by
46%, deletes slowed by 18%, updates and inserts
slowed by 10%
Adding worker threads reduces shortfall
Uploads best with 10, download best with 50
High variability for downloads
 25-30% instead of usual  2%
Timings vs. worker threads
for slower clients
You may not want to optimize for download
add ~400 s to upload to save 20 s in download!
Action     Time with 10 workers (s)   Time with 50 workers (s)   Difference (s)   % difference
download   128                        109                        -19              -15%
insert     490                        836                        346              71%
delete     649                        1061                       412              64%
update     1074                       1589                       515              48%
Test 1-C : Simulating very slow
clients
Wanted to simulate 1000 Palm devices on
wireless WAN network
Actual timings with Palm IIIx connected at 4800 baud
Single Win32 client slowed to match or exceed Palm
timings (using special UL runtime with optional delays)
Use same delays for 1000 Win32 clients to simulate 1000
Palm devices connecting at 4800 baud
Varying worker threads with very
slow clients
Constants:
1,000 clients (delayed to match Palm timings)
1,000 rows per client synchronization
(92 bytes per row)
total of 1,000 synchronizations
Varied ML worker threads
5, 10, 20, 50, 100, 200, 500
Throughput vs. worker threads for very slow clients
[Chart: throughput (rows/s) against number of MobiLink worker threads; series: Downloads, Inserts, Deletes, Updates]
Optimal number of worker threads
for very slow clients
Download improves almost linearly
Long times to apply downloads are overlapped more with
more workers
Uploads best at 100 or 200 worker
threads
Optimal # of workers very different for
upload and download!
Upload cap
Limits number of worker threads that can
apply uploads simultaneously
Referred to as “uploaders”
Other worker threads can still download
or receive upload
Allows independent optimization of
worker threads for upload and
download throughput
Test 1-D : Varying uploaders with
very slow clients
Constants:
1,000 clients (delayed to match Palm timings)
1,000 rows per client synchronization
(92 bytes per row)
total of 1,000 synchronizations
500 ML worker threads
Varied ML upload cap
2, 5, 10, 20, 50, 100
Upload throughput vs. uploaders for very slow clients
[Chart: throughput (rows/s) against max number of simultaneously uploading worker threads (out of 500); series: Inserts, Deletes, Updates]
Test 1-E : Varying worker threads
with upload cap and very slow
clients
Constants:
1,000 clients (delayed to match Palm timings)
1,000 rows per client synchronization
(92 bytes per row)
total of 1,000 synchronizations
5 for upload cap (i.e. 5 uploaders)
Varied ML worker threads
50, 100, 200, 334, 500
Throughput vs. worker threads for upload cap and very slow clients
[Chart: throughput (rows/s) against number of MobiLink worker threads (with upload cap of 5); series: Downloads, Inserts, Deletes, Updates]
Upload cap improves upload
throughput with very slow
clients
Upload type   No cap (rows/s)   With cap (rows/s)   % difference
Inserts       589               1419                141%
Deletes       551               1118                103%
Updates       309               616                 99%
Optimum number of worker
threads and uploaders
Best throughput with relatively small number of
worker threads for upload
For fast clients, small number of worker threads (5 is best here)
For slower clients, need more worker threads and upload cap
• Higher number of total worker threads for slower clients
• Small upload cap to maximize upload throughput (depends on
consolidated speed, around 5 best here)
May need more worker threads and uploaders when ML and
consolidated on different computers
Tip: Add workers or uploaders until server
saturated or contention in the consolidated
limits throughput
Test 2: Varying number of clients
Constants:
5 MobiLink worker threads
1,000 rows per client synchronization
(92 bytes per row)
# clients × # synchronizations per client
total of 1,000 synchronizations
Varied:
number of clients (20, 50, 100, 200, 500, 1000)
number of synchronizations per client adjusted to fix total
number of synchronizations
Constant amount of data
1000 Clients:
Total server time
• Each client synchronizes once
• One set of simultaneous synchronizations
500 Clients:
Total server time
• Each client synchronizes twice
• Two sets of simultaneous synchronizations
Throughput vs. number of clients
[Chart: throughput (rows/s) against number of clients; series: Downloads, Inserts, Deletes, Updates]
Client scalability
total time ≈ single sync time × # clients ÷ # workers
Scenario                  Action     Sum of server times (s)   Estimated total time (s)   Actual time (s)   Difference
1000 clients × 1 sync     delete     2732                      546                        570               4%
                          download   371                       74                         77                3%
                          insert     2192                      438                        450               3%
                          update     4637                      927                        1044              13%
20 clients × 50 syncs     delete     2843                      569                        570               0%
                          download   422                       84                         77                -9%
                          insert     2053                      411                        450               10%
                          update     4893                      979                        1044              7%
20 clients × 1 sync       delete     57                        569                        570               0%
                          download   9                         90                         77                -15%
                          insert     40                        396                        450               14%
                          update     98                        981                        1044              6%
Client scalability
MobiLink scales linearly with additional
clients
Tests with a small number of clients can
effectively predict performance with a
much larger number of clients
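The extrapolation behind the client scalability figures can be reproduced with a few lines. This is a sketch of the arithmetic only; the function names are hypothetical, the times are assumed to be in seconds, and the `scale` factor of 50 is the ratio between 1,000 synchronizations and the 20 that were measured.

```python
def estimate_total_time(sum_of_server_times, workers, scale=1.0):
    """Estimate total time for a run from the summed per-sync server times.

    scale multiplies the measured workload, e.g. 50 to go from 20 to 1,000 syncs.
    """
    return sum_of_server_times * scale / workers


def percent_difference(actual, estimate):
    """Signed difference of the actual time from the estimate, in percent."""
    return 100.0 * (actual - estimate) / estimate


if __name__ == "__main__":
    # Delete figures quoted in the slides: 2732 s summed over 1,000 syncs, 570 s actual.
    est = estimate_total_time(2732, workers=5)
    print(round(est), round(percent_difference(570, est)))   # 546 4
    # Same action measured with only 20 single syncs (57 s summed), scaled by 50.
    print(estimate_total_time(57, workers=5, scale=50))      # 570.0
```

The second call shows the practical payoff: a 20-client pilot is enough to estimate the 1,000-client delete time to within a second or so in this data set.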
Test 3:
Varying size of synchronizations
Constants:
200 clients
5 MobiLink worker threads
# rows × # synchronizations per client
total of 1,000 synchronizations
Varied:
number of rows per sync (100, 200, 500, 1000, 2500,
5000)
number of synchronizations per client adjusted to fix total
number of rows synchronized
Effect of synchronization size
[Chart: throughput (rows/s) against rows per synchronization; series: Downloads, Inserts, Deletes, Updates]
Effect of synchronization size
Smallest synchronizations slowest
Levels out with larger synchronizations
Greatest effect on downloads; almost no
effect on updates
Suggests per-synchronization overhead
MobiLink has some per-synchronization
overhead
Timing framework adds more
Effect of synchronization size
Above 2500 rows, some performance
drop
Download reduced 12%
Uploads reduced around 2%
Why?
Increased contention in consolidated database
(0.5 MB per sync with 5000 rows)
Disk access becoming bottleneck
(lower CPU utilization observed with 5000 rows)
Synchronization size
Some per-synchronization overhead, so
throughput lower for smaller-sized
synchronizations
Throughput might be reduced with larger
synchronizations, particularly for
downloads
Likely to depend on consolidated database
Test 4:
Varying number of server CPUs
Constants:
200 clients
5 MobiLink worker threads
5 synchronizations per client
total of 1,000 synchronizations
1,000 rows per synchronization (92 bytes each)
Varied:
CPUs in use on ML/ASA server (1, 2, 3, 4)
Parallel scalability
[Chart: throughput (rows/s) against number of server processors; series: Downloads, Inserts, Deletes, Updates]
Parallel scalability (uploads only)
[Chart: throughput (rows/s) against number of server processors; series: Inserts, Deletes, Updates]
Parallel speedups
CPUs   Downloads   Inserts   Deletes   Updates
1      1.00        1.00      1.00      1.00
2      1.59        1.49      1.50      1.50
3      2.11        1.82      1.90      1.95
4      2.34        1.93      2.01      2.05
Parallel efficiency
Improved performance with additional
processors
Speedups less than ideal, especially going from 3
to 4
Best speedup for downloads, least for inserts
Note: Results are for MobiLink and ASA on same
server computer, not for MobiLink on its own
Contention is much more likely in consolidated database than in
MobiLink
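One way to quantify "less than ideal" is parallel efficiency, the measured speedup divided by the number of processors. The sketch below applies that standard definition to the speedups reported in the slides; the efficiency figures themselves are derived here and were not part of the original results.

```python
SPEEDUPS = {
    # CPUs: (downloads, inserts, deletes, updates), as reported in the speedup table
    1: (1.00, 1.00, 1.00, 1.00),
    2: (1.59, 1.49, 1.50, 1.50),
    3: (2.11, 1.82, 1.90, 1.95),
    4: (2.34, 1.93, 2.01, 2.05),
}


def parallel_efficiency(speedup, cpus):
    """Efficiency = measured speedup / ideal speedup (the CPU count)."""
    return speedup / cpus


if __name__ == "__main__":
    for cpus, row in SPEEDUPS.items():
        print(cpus, [round(parallel_efficiency(s, cpus), 2) for s in row])
```

An efficiency of 1.0 would mean perfect scaling; the values fall as processors are added, which matches the observation that the step from 3 to 4 CPUs gains the least.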
Hardware requirements
CPU utilization usually 98% to 100%, except for:
Slower clients (especially for downloads)
Downloads with few clients
Few worker threads
Points where consolidated database was committing data to disk
(checkpoints)
MobiLink fluctuates from 25 to 35%, ASA 65 to
75%
MobiLink needs less processing power than consolidated database
(less than ½ with ASA).
Recommendations
Summary of performance tips
Applicability to your situation
Large deployments
Summary of performance tips
Avoid contention in your scripts
Use smallest number of worker threads
that gives you optimum throughput
Set connection pool size if using multiple
versions
Set upload and BLOB cache sizes to avoid
disk access
Dedicate enough processing power to
MobiLink so that it can saturate your
consolidated database
Applicability
YMMV (your mileage may vary)
You should test with your
schema
data
consolidated database
synchronization scripts
clients
Test with relatively small number of
clients
Suggested test procedure
1. Determine your synchronization needs
2. Set up a pilot implementation with a few
clients
(e.g. 20 clients and test with 5, 10, 20 worker
threads)
3. Could run ML on same server as consolidated
4. Enable minimal verbose logging (-v)
5. Perform test synchronizations
6. From times in log, estimate times for more
clients
total server time, maximum and average client times
If you want to use our timing
framework
To use it as-is, or modify it yourself, we
intend to make it available through
Sybase Developer Network:
http://www.sybase.com/developer/mobile
For help in adapting it to your needs, or
for help in setting up efficient
deployments with MobiLink, email our
Solution Services group at:
[email protected]
Large deployments
A single MobiLink server can handle tens
of thousands or hundreds of thousands of
remote databases
These tests are equivalent to 80,000 to 1.4 million synchronizations per day
For higher scalability or availability, can
use multiple MobiLink servers with
single consolidated database
Use load balancer to make them appear as one, and provide
failover and load balancing
Can also use multiple MobiLink servers in
a synchronization hierarchy
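A per-day figure can be obtained by extrapolating a timed test run to 24 hours of sustained load. The sketch below shows that arithmetic; it is an assumption that the daily range quoted in the slides was derived this way, and the times are assumed to be in seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def syncs_per_day(syncs_in_test, total_server_time_s):
    """Extrapolate a sustained daily rate from one timed test run."""
    return syncs_in_test * SECONDS_PER_DAY / total_server_time_s


if __name__ == "__main__":
    # 1,000 update synchronizations took 1044 s in the client scalability results.
    print(round(syncs_per_day(1000, 1044)))   # 82759
```

The slowest action (updates) lands near the low end of the quoted range, which is what one would expect if the range spans the slowest and fastest synchronization types.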
Large deployment with load
balancing and failover
[Diagram components: WAN/Internet, router, two load balancers, four MobiLink servers, one consolidated DB]
Other consolidated types?
Have done some testing with Oracle (8
and 8i), ASE (11.9.2 and 12), and MS
SQL Server (7 and 2000).
All require considerable tuning
MobiLink scalability unchanged
Similar download throughput to ASA
Slower uploads, best with fewer worker threads and
uploaders
• Upload performance better for non-ASA
consolidateds in Vail; Oracle on par with ASA
MobiLink is scalable!
These tests show that MobiLink:
Scales ideally with increasing remote databases
Makes efficient use of hardware
Has modest hardware requirements
I want you to:
Use MobiLink for a large number of remote databases
Get the best performance
Benefits
You can:
Support a large number of remote
databases
Predict performance for a large number
of remote databases from tests with a
small number
Maximize throughput by following
performance tips
Q&A
White papers of interest at
http://my.sybase.com/detail?id=XXX
1009664 – MobiLink Performance
1011880 – Recommended ODBC Drivers for MobiLink
1010502 – Synchronizing with Oracle and ASA
1012973 – Using Different Script Versions in MobiLink
1009621 – MobiLink Transport Layer Security
1013181 – Mobilink and TCP/IP Keep-Alive
1012332 – Details on how the non-DMC Conduit Works
1002288 – ASA Supported Platforms and Support Status
Q&A
Award winning newsgroup support at
forums.sybase.com
sybase.public.sqlanywhere.mobilink
sybase.public.sqlanywhere.ultralite
sybase.public.sqlanywhere.general
Q&A
Scalable fast-food dining available at:
http://www.webersrestaurants.com/