Distributed Data Mining System in Java
Group Members
王春笙,林俊甫,王慧芬
Overview of Project
• Project participants
– 王春笙,林俊甫,王慧芬
Project Programming Tasks
• D92725002 林俊甫
– Polling and reply multicast between client and server
– Client/Server socket programming
– Client dynamic join and leave mechanism
– Multi-thread programming
– Synchronization mechanism
– Data chunk maintenance and dispatching mechanism
– Client/Server communication link control
Project Programming Tasks (cont’d)
– Client failure handling
• Reassign the backup server if the failed client is the backup
• Restore the failed client’s work (with 王春笙)
– Server failure handling
• Backup server designation mechanism and logic design
– RMI mechanism (with 王春笙)
– Basic GUI
System Infrastructure
• System diagram
(Diagram: several clients connected over a LAN to the Server/Coordinator; mining data chunks go to the clients and mining results come back to the server.)
Basic Operation
(Sequence diagram with a Server timeline and a Client timeline.)
• Server
– Listens for multicast group queries and replies
– Forks a thread to handle each client connection
– Waits for the client’s processed result, then orders the client to get another file chunk
• Client
– Once the server is found, connects to the server
– On receiving the server’s instruction, invokes RMI to get the file chunk
• Messages exchanged
– 1. Client: polling on port 4444, group 230.0.0.1, “@: who is server?”
– 2. Server: “Servername: I am the server”
– 3. Client: connect to <servername, port 4445>
– 4. Server: “Client do: filechunk#”
– 5. Client: “ok”
– 6. Server: “Client do: next filechunk#”
– 7. …
Port Assignment
• Port 4444: for multicast
• Port 4445: for TCP/IP socket connection
• Port 4446: for RMI services
Finding A Server
• Once a client starts up, it queries periodically, every 3 sec., over the multicast group 230.0.0.1 port 4444 by sending the 1-byte string “@” to locate the server host.
• Once a server starts up, it forks a thread to deal with the queries.
• Client steps
– 1. Client query: who is the server now?
– 2. Listen for the server response
– 3. Connect to the server on port 4445
– 4. Use RMI to get a file chunk from the server
– 5. Process data mining and return the result to the server
– 6. Server failure detected -> if I am the backup, go to the backup server procedure; otherwise go to step 1.
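The query datagram from step 1 can be sketched as follows. This is a minimal illustration, not the project's code: the class name `ServerQuery` and its members are hypothetical, and only the group address, port, payload, and 3-second period come from the slides.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the 1-byte "@" discovery datagram. */
final class ServerQuery {
    static final String GROUP = "230.0.0.1"; // multicast group from the slides
    static final int PORT = 4444;            // multicast port from the slides
    static final long PERIOD_MS = 3000;      // the client repeats the query every 3 s

    private ServerQuery() {}

    /** Returns a datagram carrying "@" addressed to the multicast group and port. */
    static DatagramPacket build() {
        byte[] payload = "@".getBytes(StandardCharsets.US_ASCII);
        try {
            // A literal IP address is parsed locally, so no DNS lookup happens here.
            InetAddress group = InetAddress.getByName(GROUP);
            return new DatagramPacket(payload, payload.length, group, PORT);
        } catch (UnknownHostException e) {
            throw new IllegalStateException("invalid multicast group literal", e);
        }
    }
}
```

A client would pass the packet to `DatagramSocket.send` once per `PERIOD_MS` until a server replies; the reply format is not specified on the slides, so it is left out here.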
File Dispatching
• The server maintains a file chunk pool.
– FileChunks states: -1: empty, 0: available, 1: using, 2: used
• The server finds an available file chunk for the client, sets it to 1, and orders the client to get this file chunk by RMI. The file chunk is updated to 2 when the client returns the result.
• Recovery: when the server detects that a client’s link is broken, it restores the file chunk allocated to that client to 0.
• The file chunk class is declared as Serializable for RMI message passing to the backup server.
• The file chunk class uses synchronization for concurrency control.
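The state transitions above can be sketched as a small synchronized pool. This is an assumption-laden illustration: the class `ChunkPool` and its method names are hypothetical, and only the four state codes and the transitions between them come from the slides.

```java
import java.util.Arrays;

/** Hypothetical file chunk pool using the state codes from the slides. */
final class ChunkPool {
    static final int EMPTY = -1, AVAILABLE = 0, USING = 1, USED = 2;

    private final int[] state;

    /** Creates a pool with the first loadedChunks slots available and the rest empty. */
    ChunkPool(int capacity, int loadedChunks) {
        state = new int[capacity];
        Arrays.fill(state, EMPTY);
        Arrays.fill(state, 0, loadedChunks, AVAILABLE);
    }

    /** Marks the first available chunk as using; returns its index, or -1 if none is left. */
    synchronized int acquire() {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == AVAILABLE) {
                state[i] = USING;
                return i;
            }
        }
        return -1;
    }

    /** The client returned a result: using -> used. */
    synchronized void complete(int chunk) {
        if (state[chunk] == USING) state[chunk] = USED;
    }

    /** The client's link broke: using -> available, so another client can redo the chunk. */
    synchronized void release(int chunk) {
        if (state[chunk] == USING) state[chunk] = AVAILABLE;
    }

    synchronized int stateOf(int chunk) {
        return state[chunk];
    }
}
```

Every method is synchronized on the pool because each client connection is handled by its own server thread, so two threads may try to acquire a chunk at the same time.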
Backup Server Selection
• The server maintains and assigns a unique id to each individual client.
• The unique id is incremented as a serial number.
• The client with the smallest id is assigned as the backup server.
• When a client fails, the server checks whether it was the backup server; if so, it restarts the selection process.
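The selection rule can be sketched in a few lines. The class `BackupSelector` and its methods are hypothetical names, and starting the serial numbers at 1 is an assumption; the slides only state that ids increase and that the smallest connected id becomes the backup.

```java
import java.util.Collection;
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of id assignment and backup server selection. */
final class BackupSelector {
    // Assumption: serial numbers start at 1; the slides do not give the first value.
    private final AtomicInteger nextId = new AtomicInteger(1);

    /** Hands out the next serial id to a newly connected client. */
    int assignId() {
        return nextId.getAndIncrement();
    }

    /** The connected client with the smallest id becomes the backup server. */
    static OptionalInt selectBackup(Collection<Integer> connectedIds) {
        return connectedIds.stream().mapToInt(Integer::intValue).min();
    }

    /** Selection is redone only when the failed client was the current backup. */
    static boolean needsReselection(int failedId, OptionalInt currentBackup) {
        return currentBackup.isPresent() && currentBackup.getAsInt() == failedId;
    }
}
```

Because ids only grow, the smallest id belongs to the client that has been connected the longest among those still online.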
Nodes Maintenance
• The server maintains the connected clients’ records in an ArrayList.
• The ArrayList is composed of Nodes objects, which record each client’s detailed information.
(Diagram: ArrayList “ht”, Key -> Value, where each value is a Nodes object with the fields Id, Address, Port, Work on, Status.)
RMI Services
• The RMI services are written as an independent program because both the server and the client (when it acts as backup server) use them.
• The RMI services provide:
– Back up server data to the backup server
– Get a file chunk from the server
– Return the mining result to the server
– Receive nodes information from the server
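One way to express these four services is a remote interface. The sketch below is purely illustrative: the interface name, method names, parameters, and return types are all assumptions, since the slides list the services but not their signatures.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

/** Hypothetical remote interface covering the four listed services. */
interface MiningService extends Remote {
    /** Backup server pulls a snapshot of the server data. */
    Serializable getServerBackup() throws RemoteException;

    /** Client fetches the contents of the chunk it was ordered to process. */
    byte[] getFileChunk(int chunkId) throws RemoteException;

    /** Client hands its mining result for a chunk back to the server. */
    void returnResult(int chunkId, Serializable result) throws RemoteException;

    /** Backup server receives the current node records (shown here as plain strings). */
    List<String> getNodes() throws RemoteException;
}
```

Every remote method must declare `RemoteException`, and argument and return types must be serializable, which matches the slides' note that the file chunk class is declared Serializable.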
Client Failure
• Server’s actions:
– Recovery
– Reassignment
– Redo the backup server selection if the failed node is the backup
• Client’s actions:
– Do nothing, unless the client is told by the server to act as backup
Server Failure
(Timeline diagram with three timelines: Server S, Client A, Client B.)
• Server S
– Runs the backup selection and chooses A as backup
– Replies to A’s RMI requests (“RMI Get file” / “RMI reply”)
– Keeps sending “Client do #” instructions and receiving “do reply”
– Server crash (X)
• Client A
– 1. A is told by S that it is the backup (“A: Do backup”); A invokes RMI to get all server data
– 2. A periodically gets the server services and file chunk data
– 3. Comm. link broken is detected; A starts the ServerAction class
– 4. A creates a server socket at 4445 and forks a thread to listen to queries and wait for connections
• Client B
– 1. B receives instructions as discussed before
– 2. Comm. link broken is detected; B sends the multicast query “who is the server now?”
• B polling “@: who is server?”
• A reply: “I am the server”
• B connects to A:4445
– 3. B knows A is the backup and reconnects to A
Server/Client Life Cycle
(State diagram: a Server ends in normal/abnormal termination; a Client either ends in normal/abnormal termination or evolves into a Server.)
Project Programming Tasks
• D91725001 王春笙
– Web log file preprocessing and separating
– Web pages traversal sequences parsing
– Page items transferring and mapping
– Web pages sequential patterns mining
– Mining results maintenance
– RMI mining results transfer
– Mining results lookup and display
Project Programming Tasks (cont’d)
– Backup mechanism
• A separate thread backs up the server files and in-memory data
• Restore the failed client’s work (with 林俊甫)
– RMI mechanism (with 林俊甫)
– GUI global state refresh
– System integration
• Testing and debugging
Web Log File Format
• User IP
• Date
• Time
• Web page URL
Web File Preprocessing
• Select *.htm and *.html pages
• First sort by user ID
• Second sort by time
• Page sequences are separated by time
– a gap of more than 30 seconds starts a new sequence
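The last step can be sketched as follows, assuming one user's requests are already sorted by time. The class `SequenceSplitter`, the array-based input, and timestamps in seconds are assumptions for illustration; only the 30-second threshold comes from the slides.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: cut one user's page visits into sequences at long time gaps. */
final class SequenceSplitter {
    static final long GAP_SECONDS = 30; // threshold from the slides

    private SequenceSplitter() {}

    /**
     * times: request timestamps in seconds for one user, sorted ascending.
     * pages: the page ID requested at each timestamp (same length as times).
     */
    static List<List<Integer>> split(long[] times, int[] pages) {
        List<List<Integer>> sequences = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int i = 0; i < pages.length; i++) {
            // A gap of more than 30 s closes the current sequence.
            if (i > 0 && times[i] - times[i - 1] > GAP_SECONDS) {
                sequences.add(current);
                current = new ArrayList<>();
            }
            current.add(pages[i]);
        }
        if (!current.isEmpty()) sequences.add(current);
        return sequences;
    }
}
```

For example, visits at 0, 10, 50, 60 and 100 seconds to pages 1, 2, 8, 3 and 9 are split after the two 40-second gaps, giving three sequences.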
Chunk Data Files
• Part*.ppp
6023 2 1 1 2 8
6024 1 1 206
6025 7 1 1 1 1 1 1 1 2 5 17 18 19 20 11
6026 3 1 1 1 144 145 338
6027 2 1 1 2 9
6028 3 1 1 1 2 8 3
• Items.ppp
/~visualdep/htm/p5b.htm 168
/~businessdep/student/picture.html 169
/~comedu/inde.htm 170
/~account/91tuition.htm 171
/~stuaffair/life/procedure-17.htm 172
/~stuaffair/life/procedure-25.htm 173
Apriori algorithm
• 1: find all L1
• 2: generate C2 from L1
• 3: count C2 and find all L2
• 4: k=3
• 5: generate & prune Ck from Lk-1
• 6: count Ck and find all Lk
• 7: if Lk not empty then k++, goto 5
Apriori algorithm (cont’d)
• join phase: s1 joins s2 if s1 (drop first) = s2 (drop last)
– s1 = {a, b}, s2 = {b, a}
– s1 join s2 => {a, b, a}
• prune phase: delete a k-candidate if any of its (k-1)-subsequences is not large
• C & L are stored in a hash data structure
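The join and prune phases can be sketched for sequences of single page items. This is a minimal illustration under stated assumptions: the class `SequenceCandidates`, the use of `List<String>` for a sequence, and a `HashSet` as the hash structure are hypothetical choices, not the project's actual classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of candidate generation for sequential patterns. */
final class SequenceCandidates {
    private SequenceCandidates() {}

    /** Join: returns s1 extended by the last item of s2, or null if they do not join. */
    static List<String> join(List<String> s1, List<String> s2) {
        if (s1.isEmpty() || s1.size() != s2.size()) return null;
        // s1 without its first item must equal s2 without its last item.
        if (!s1.subList(1, s1.size()).equals(s2.subList(0, s2.size() - 1))) return null;
        List<String> joined = new ArrayList<>(s1);
        joined.add(s2.get(s2.size() - 1));
        return joined;
    }

    /** Prune: a k-candidate survives only if every (k-1)-subsequence is large. */
    static boolean survivesPrune(List<String> candidate, Set<List<String>> large) {
        for (int skip = 0; skip < candidate.size(); skip++) {
            List<String> sub = new ArrayList<>(candidate);
            sub.remove(skip); // remove by index: drops one item, keeps the order
            if (!large.contains(sub)) return false;
        }
        return true;
    }

    /** Generates Ck from Lk-1 by joining every ordered pair and pruning the result. */
    static Set<List<String>> generate(Set<List<String>> large) {
        Set<List<String>> candidates = new HashSet<>();
        for (List<String> s1 : large) {
            for (List<String> s2 : large) {
                List<String> joined = join(s1, s2);
                if (joined != null && survivesPrune(joined, large)) candidates.add(joined);
            }
        }
        return candidates;
    }
}
```

With L2 = {ab, ba, aa}, the pair (ab, ba) joins to aba, which survives pruning because ba, aa and ab are all large; bab is produced by the join but pruned because bb is not large.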
Mining Result Display
• Client frequent patterns
– Web page ID
– Support
– Saved as *.pppl files
• Client frequent patterns
– Web page ID
– Support
– Web page name
Backup Mechanism
• When a backup server is selected, that client starts a backup thread
• The backup thread loops every 0.5 seconds
• RMI data transfer
– Chunk data files (part*.ppp, items.ppp)
– Client information
– File chunk information
• determine MaxID and set “in use” to “available”
– Frequent patterns information
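The "determine MaxID and set in use to available" step can be sketched as two small functions applied to the backed-up data. The class `TakeoverState` and its methods are hypothetical, and the reading that MaxID is the largest client id in the backup is an assumption; the state codes follow the file chunk pool (-1 empty, 0 available, 1 using, 2 used).

```java
/** Hypothetical sketch of normalizing backed-up state on the backup server. */
final class TakeoverState {
    private TakeoverState() {}

    /** Largest client id seen in the backup, or 0 if there are no clients (assumed meaning of MaxID). */
    static int maxId(int[] clientIds) {
        int max = 0;
        for (int id : clientIds) {
            if (id > max) max = id;
        }
        return max;
    }

    /** Returns a copy of the chunk states with every "using" (1) reset to "available" (0). */
    static int[] resetInUse(int[] chunkStates) {
        int[] copy = chunkStates.clone();
        for (int i = 0; i < copy.length; i++) {
            if (copy[i] == 1) copy[i] = 0;
        }
        return copy;
    }
}
```

Resetting in-use chunks means work that was in flight is handed out again after a takeover, and finished chunks (state 2) are left alone.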
System Integration
• Java class integration
– Server component
– Client component
– Data mining component
– GUI component
• Testing
• Debugging
Project Programming Tasks
• D92725001 王慧芬
– Graphical User Interface
• Since this system performs the data mining task in a distributed way, its GUI provides four panels:
– A system console
– A result window
– A connection table
– A graphical network configuration
GUI
• The system console shows how the system proceeds
GUI (cont’d)
• The result window displays the progress
and results of data mining
GUI (cont’d)
• A connection table lists all of the on-line
client connection information
GUI (cont’d)
• A connection table consists of 5 fields
– NO: client-server connection id
– IP address: client’s IP address
– Port: client’s port number
– Status: connection status, which can be
• 0: offline
• 1: online
• 2: file transfer from server to client
• 3: client is doing data mining
• 4: client returns the result to the server once data mining has finished
• 5: client is doing the backup and data mining at the same time
– # chunk works on: when the status is data mining or backup and mining, the chunk number that the connection works on
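The status codes map naturally onto an enum. The sketch below is illustrative only: the enum `ConnectionStatus`, its constant names, and the helper methods are assumptions; the numeric codes and their meanings come from the slide.

```java
/** Hypothetical enum for the Status field of the connection table. */
enum ConnectionStatus {
    OFFLINE(0),
    ONLINE(1),
    FILE_TRANSFER(2),      // file transfer from server to client
    MINING(3),             // client is doing data mining
    RETURNING_RESULT(4),   // client returns the result to the server
    BACKUP_AND_MINING(5);  // client is doing backup and data mining together

    final int code;

    ConnectionStatus(int code) {
        this.code = code;
    }

    /** Looks up the status for a numeric code stored in the table. */
    static ConnectionStatus fromCode(int code) {
        for (ConnectionStatus s : values()) {
            if (s.code == code) return s;
        }
        throw new IllegalArgumentException("unknown status code: " + code);
    }

    /** True for the statuses in which the "# chunk works on" field is meaningful. */
    boolean worksOnChunk() {
        return this == MINING || this == BACKUP_AND_MINING;
    }
}
```

Keeping the numeric code on each constant lets the server and client modules keep writing plain integers into the table while the GUI reads them back through `fromCode`.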
GUI (cont’d)
• A graphical network configuration follows the
connection table to depict the dynamic
network configuration
GUI (cont’d)
• In the dynamic network configuration, we use different client GIFs to express the status:
– Offline
– Data mining
– Backup and mining
– On-line
GUI interface
• mw.showMsg()
– provided by GUI for server/client module to show the
console message
• mw.showResultString()
– provided by GUI for server/client module to show the
results of data mining
• Connection table
– modified by server/client module for connection
information
– read by GUI every 0.01 second to depict the dynamic
network configuration
GUI design
• Java Swing is used to generate labels, text, scrollbars, tables, etc.
• Java AWT 2D painting is used to form the animation of the connection lines in the dynamic configuration panel
• ‘Photo Impact’ and ‘GIF animator’ are used to generate the node icons
• EasyRGB is used to tune the color harmonies.
GUI design (cont’d)
• A new thread is forked from the GUI task to work on the animation of the connection lines in the dynamic configuration panel,
– to read the table every 0.03 second and to show the connection status with a moving ball.
(Diagram: the GUI generates the system console, the result panel, the connection table, and the connection table animation.)
Installation
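• Example: running one server and two clients
– Create three folders: Ser (Server), Cli (Client 1), Cli2 (Client 2)
– Extract the attached archive into the Ser folder; weblog10.zip must also be downloaded into this folder and extracted
– Extract the attached archive into the empty Cli and Cli2 folders
– Open two DOS windows (windows 1 and 2) and change into the Ser folder
– Open three DOS windows (windows 3, 4 and 5); windows 3 and 4 change into the Cli folder, window 5 into the Cli2 folder
– In window 1, run compile.bat, then run rmi.bat
– In window 2, run server.bat
– In window 3, run compile.bat, then run rmi.bat
– In window 4, run client.bat
– In window 5, run compile.bat, then run client.bat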