GIST Arch Build Deploy Commercial websites

Architecting, Building and Deploying
Successful Commercial Websites
Gist.com case study
Paul Finster – Chief Technology Officer
Dave Ekhaus - Director, Platform Engineering
NYU
Feb 29, 2000

Agenda

Part I – Hardware Configurations
 - Server Farms
 - Databases
 - Paradigms
Part II – Software Technology
 - Architectural elements
 - Relational database
 - Java Server Pages (JSP)
 - Java Beans
 - Testing & Deploying
Part III – Questions & Answers
Part I
Hardware Configurations
The Internet is here!

Informational websites are big today!
 - Yahoo!
 - Snap.
 - MarketWatch
 - Gist.com
Even 1% of Yahoo! traffic is a lot of traffic
 - Gist.com runs tv.yahoo.com on 6 servers
Like e-Commerce, informational websites are mission-critical applications for the businesses and individuals that rely on them
 - These are enterprise-class applications!
 - Denial of Service attacks proved popular need
What is the Commercial Website landscape?

The scale and dynamic nature of the web
changes everything




Common platforms in use



TV Listings – built in-house
Generic Content Applications

XML, WAP, HTTP
XSL, CSS
Custom Applications


Possibly hundreds of thousands of hits per day
Dynamic customized content
Huge peaks during certain times of day
Newsfeeds
What is the right Hardware Architecture?

Two major hardware philosophies/paradigms

Many cheap redundant machines
 - Example: Yahoo.com
 - 100s of Intel BSD machines with a specially modified Apache web server
 - Content stored in huge memory caches
 - Cost estimate: $2,000 per server
Few expensive highly-reliable machines
 - Example: IWON.com
 - 12 high-end Sun Solaris web servers
 - Content stored in 2 parallel Oracle databases running on Sun E10000 servers
 - Cost estimate: $20,000-$100,000 per server
Common Hardware Requirements

Co-location at data centers
 - Exodus
 - GlobalCenter
 - Level3
 - AboveNet
Hardware
 - Load balancing: Cisco, F5, Radware
 - Application level switches
 - Hi-speed virtual networks
 - Firewalls
 - Network monitoring software
 - Enterprise storage devices
Typical N-tier Hardware Architecture
(diagram)
Firewall
Load Balancer
Web Servers (five shown)
Database Servers (two shown)
Enterprise Storage
Legacy Applications (if any)
Gist’s Hardware Architecture
(diagram)
Cisco PIX Firewall
RadWare: WSD Load Balancer
NT Web Servers (five shown), ISAPI DLL
SQL Server 7.0 Database Servers (two shown)
Enterprise Storage: EMC
Part II
Software Technology
Gist’s Application Framework
(diagram labels: Website, Production, TV Listings, GRID, Gist Dev Tools, Java Beans, Other Applications, Bulletin Boards, Process Step, Custom App, Scripting Tool: JSP, Templates, Cookie-based Server-side Sessions, Services & APIs, Data Sources)
System Independence
(diagram)
JSP Adapter
 - OS Drivers: NT, Solaris
 - Web Drivers: IIS, NSAPI
SQL Abstraction
 - JDBC Drivers: Oracle & Sybase, SQL Server & Informix
Content Interface
 - Article Drivers: File system
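One way to read the "Content Interface" and "Article Drivers" boxes is as a small interface with interchangeable back ends. The sketch below is only an illustration of that idea, not Gist's actual code: the names `ArticleSource`, `InMemoryArticleSource` and `ArticlePage` are hypothetical, and the in-memory driver stands in for a file-system or JDBC driver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical content interface: page code depends only on this type.
interface ArticleSource {
    // Returns the article body, or null when the id is unknown.
    String load(String articleId);
}

// Stand-in driver backed by a map; a file-system or JDBC driver would
// implement the same interface (not shown here).
final class InMemoryArticleSource implements ArticleSource {
    private final Map<String, String> articles;

    InMemoryArticleSource(Map<String, String> articles) {
        this.articles = new HashMap<>(articles); // defensive copy
    }

    @Override
    public String load(String articleId) {
        return articles.get(articleId);
    }
}

// Page logic that never learns which driver it is talking to.
final class ArticlePage {
    private final ArticleSource source;

    ArticlePage(ArticleSource source) {
        this.source = source;
    }

    String render(String articleId) {
        String body = source.load(articleId);
        return body == null ? "<p>Article not found</p>" : "<p>" + body + "</p>";
    }
}
```

Swapping the storage back end then means writing one new `ArticleSource` implementation; the page code is untouched.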
Java Code Samples
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<%@ page import="gist.external.gistcom.*,gist.*" %>
<jsp:useBean id="adf" scope="session" class="gist.internal.publishing.ADFObject"></jsp:useBean>
<%
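    // Look up the current user; default users are sent to the login page first.
    UserObject u = UserObject.getUser(request, response);
    if (u.isDefault())
    {
        response.sendRedirect("/tv/login.jsp?nexturl=/tv/channels.jsp");
        return;
    }
    // Page to continue to afterwards; falls back to the channel list.
    String nexturl = request.getParameter("nexturl");
    if (nexturl == null)
        nexturl = "/tv/channels.jsp";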
%>
<%@ include file="/tv/templates/global.jsp" %>
.
.
    Channel channelsDisplay[] = u.getSortedChannels();
    String c_name = null;
    for (int i = 0; i < channelsDisplay.length; i++)
    {
        // Mark the entry as CHECKED when the user has the channel visible.
        if (channelsDisplay[i].isVisible())
        {
            chanchecked = " CHECKED ";
        }
        else
        {
            chanchecked = "";
        }
    }
Technology
Architecting
Architecture Requirements

Scalability - performance, growth
Security - authentication, access control, privacy
Management - monitoring, dynamic configuration
Availability - fault tolerance
Stability - data integrity
Portability - OS, DB, WS independence
Extensibility - ability to adapt to changes in technology
Integration - integration of RDBMS and legacy systems
Scalability

1000’s of “transactions” per minute

Sub-second response time


Database connection pooling
Performance

Concurrency – multithreading
 - Inherent support in Java
Growth

Load balancing

Want to be able to “Throw” hardware at the
problem
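The "Database connection pooling" bullet can be illustrated with a minimal sketch. It assumes a fixed-size pool that opens its resources once and lends them to request threads; `SimplePool` and its method names are hypothetical, and the type parameter stands in for `java.sql.Connection` so the example runs without a database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; on a real site T would be java.sql.Connection.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the connection cost once, up front
        }
    }

    // Borrow a resource, waiting up to timeoutMillis for one to be returned.
    T acquire(long timeoutMillis) {
        try {
            T resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (resource == null) {
                throw new IllegalStateException("pool exhausted");
            }
            return resource;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Hand the resource back so the next request can reuse it.
    void release(T resource) {
        idle.offer(resource);
    }

    int available() {
        return idle.size();
    }
}
```

A request handler would call `acquire`, run its query, and call `release` in a `finally` block, so each page view reuses an open connection instead of opening a new one.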
Security

Authentication
 - Identifying the user via cookies
 - Architecture supports “cookie-less” mode with URL rewriting of session parameters
Access Control
 - What is the user permitted to do?
 - Ordering PPV over the web; credit card numbers
 - Attributes of our UserObject
Privacy (if required)
 - Encryption - RC4, MD5
 - SSL
 - URL rewriting
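The "cookie-less" mode above amounts to carrying the session identifier in every link. The following is a stand-alone sketch of that idea under stated assumptions: the class `SessionUrl` and the parameter name `sid` are hypothetical, and `URLEncoder.encode(String, Charset)` needs Java 10 or later. Servlet containers offer `HttpServletResponse.encodeURL` for the same purpose.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: appends the session id to a link when cookies are unavailable.
final class SessionUrl {
    static final String PARAM = "sid"; // assumed parameter name

    static String rewrite(String url, String sessionId, boolean cookiesEnabled) {
        if (cookiesEnabled || sessionId == null) {
            return url; // the cookie already identifies the user
        }
        // Keep any #fragment at the very end of the rewritten URL.
        String base = url;
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            base = url.substring(0, hash);
            fragment = url.substring(hash);
        }
        // Use '&' when the URL already has a query string, '?' otherwise.
        char separator = base.indexOf('?') >= 0 ? '&' : '?';
        String encoded = URLEncoder.encode(sessionId, StandardCharsets.UTF_8);
        return base + separator + PARAM + "=" + encoded + fragment;
    }
}
```

Every link a page emits would pass through `rewrite`, so a browser that rejects cookies still returns the session identifier on its next request.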
Management

Monitoring
 - Consistent logging and reporting of system activity
    - One way: extensive use of site-wide email diagnostics
 - Enterprise integration – SNMP
    - Integration with load balancer
 - Rebooting crashed servers (NT primarily)
    - Automated
    - Manual (if all else fails)
Dynamic Configuration
 - Adding/removing new features on-the-fly
 - Incremental updating of site content
 - Incremental database updates
Availability

100% Availability
 - 24x7x365 Operations
Fault Tolerance
 - Hardware solutions
    - Backup servers
 - Software solutions
    - Dynamic database connections
Stability

Data Integrity
 - Support for “transactions”
 - Redundant databases
Protection
 - Isolation of subsystems and application execution
    - Java sandbox and exception handling
Resource Recovery
 - Database connectivity
 - System resources – memory
    - Java memory garbage collection
Portability

OS independence
 - NT
 - Various flavors of UNIX
 - LINUX
DB independence
 - SQL Server, Sybase, Oracle, Informix
WS independence
 - Netscape Enterprise Server, Microsoft IIS, Apache
Extensibility

Adapt to Changes in Business
 - How well does the architecture allow you to change the content or navigation of your commercial website?
 - Does the architecture support your current legacy systems?
 - Does the architecture provide for content and/or editorial changes?
Adapt to Changes in Technology
 - How quickly can you leverage new standards?
    - XML
    - WAP
    - WDL
Integration

Partner Advertising


Statistical Processing/Analysis



Partner cookies versus Gist.com cookies
URL links back and forth between partners
Billing Partners

MarketWave statistics
Navigation controls


Co-branded websites with differing ad serving ratios
Measuring Page views
What we’ve learned


Prototype ASAP in order to discover architectural dependencies
Database statements




Test more: Then test again!
Keep objects as light as possible
Never store moving data




Member Age vs. Birthdate
Channels Change: Excluded channels
User Migration is HARD!

Be specific in your SQL
Incremental vs. batch
Get training
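The "Member Age vs. Birthdate" lesson is that a stored age goes stale while a stored birthdate does not. The sketch below illustrates it with a hypothetical `Member` class; it uses the modern `java.time` API, which postdates this talk.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical member record: only the stable fact (birthdate) is stored.
final class Member {
    private final LocalDate birthdate;

    Member(LocalDate birthdate) {
        this.birthdate = birthdate;
    }

    // Age is derived on demand, so it is never out of date in the database.
    int ageOn(LocalDate today) {
        return Period.between(birthdate, today).getYears();
    }
}
```

For a member born on 1 March 1970, `ageOn` gives 29 on 29 February 2000 and 30 one day later, with no update job needed.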
Technology
Building
Development

Prototypes
 - Work closely with partners to determine functionality
 - Prototype Deep, not Wide
Process
 - Small Development Teams (2-4 people)
    - Include: Designers, Technical producers
 - Develop Components in Parallel
Frequent Releases
 - 3-6 day development cycles
 - Scoping - Controlled Feature Set
Technology
Testing & Deploying
Testing

Quality
 - Unit Test - thread safety, code coverage
 - Smoke Test - quick validation
 - BVT - build validation test
 - Full Functional Test
 - Regression Test - consistent functionality
 - Load Test - high availability
 - Benchmark - by Platform
 - Installation Test - by Platform
Automated Tools
Repeatability
 - Consistent and Isolated Environment
Metrics
 - Measure real world scenarios
 - Load test specific subsystems
Deploying

Capacity Planning - sizing exercise
Beta Testing - early, well defined subset
Focus Groups – early feedback
Performance - simulating real world load
Benchmarking - critical areas to measure
Maintenance - staging environment, versioning
Capacity Planning

Sizing Exercise
 - What workloads run at each node?
 - What hardware is needed to maintain service as the workload grows?
 - How many more users can each existing server support?
 - How will server utilization be impacted if the number of transactions increases by n%?
 - What are those “transactions” doing?
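The sizing questions above reduce to simple arithmetic once a few measurements exist. The sketch below assumes CPU cost grows linearly with transaction volume, which real systems only approximate; the class `CapacityPlan`, its method names and any numbers fed to it are illustrative, not Gist's figures.

```java
// Hypothetical sizing helper; assumes load and CPU cost scale linearly.
final class CapacityPlan {
    // Utilization (0.0-1.0) after transaction volume grows by growthPercent.
    static double projectedUtilization(double currentUtilization, double growthPercent) {
        return currentUtilization * (1.0 + growthPercent / 100.0);
    }

    // Servers needed for a peak load when each server is kept at or below
    // targetUtilization of its measured per-minute capacity.
    static int serversNeeded(double peakPerMinute, double perServerPerMinute,
                             double targetUtilization) {
        double usablePerServer = perServerPerMinute * targetUtilization;
        return (int) Math.ceil(peakPerMinute / usablePerServer);
    }
}
```

For example, a server running at 50% utilization would sit near 70% after 40% growth, and a peak of 6,000 transactions per minute on servers measured at 1,500 per minute, held to a 0.8 target, needs 5 servers.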
Performance

Simulating Real World Load


“Transactions” per minute
Database access requirements
Legacy connection requirements
Networking requirements

Principles of Algorithms really matter



This is a Challenge
 - Analyze existing system (web server logs)
 - Forecast activity by looking at competitors
 - Number of registered users
Performance (continued)

Scalability and fail-over are required for 24x7x365 availability

Determine appropriate hardware architecture - maximum acceptable response time, target server CPU utilization at 80% (leave room for growth)

Determine # and type of transactions - reading web pages, executing a query, updating a database, searching, sorting
Benchmarking “Transactions”

Home Page


Grid Page


Pre-compile pages if possible
User Registration Transactions

Remove archive links
Soaps Updates Pages


Many hits, as fast as possible
Article Pages


Many hits, as light as possible
As clear as possible
The answer to many problems is “caching”!
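To make the caching point concrete, here is a minimal sketch of a bounded page cache that renders a page once and serves repeat hits from memory. It is an assumption-laden illustration, not the site's implementation: `PageCache` is a hypothetical name, eviction is least-recently-used via `LinkedHashMap` in access order, and expiry of stale content is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache of rendered pages, keyed by URL.
final class PageCache {
    private final Map<String, String> pages;
    private final Function<String, String> renderer;
    private int renders = 0;

    PageCache(final int maxEntries, Function<String, String> renderer) {
        this.renderer = renderer;
        // accessOrder=true keeps the least recently used entry first.
        this.pages = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict once the cache is over capacity
            }
        };
    }

    // Returns the cached page, rendering it only on a miss.
    synchronized String get(String url) {
        String html = pages.get(url);
        if (html == null) {
            html = renderer.apply(url); // the expensive step (database, templates)
            renders++;
            pages.put(url, html);
        }
        return html;
    }

    synchronized int renderCount() {
        return renders;
    }
}
```

Pages with many hits and slowly changing content benefit most; pages customized per user need the user as part of the key or should bypass the cache.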
Maintenance

Staging Environment
 - Mirrored hardware/software
 - Separate Database
 - Migration strategy
Versioning
 - Component Version Control
 - Change Management
Part III
Questions & Answers