Architecting, Building and Deploying
Successful Commercial Websites
Gist.com case study
Paul Finster – Chief Technology Officer
Dave Ekhaus - Director, Platform Engineering
NYU
Feb 29, 2000
Agenda
Part I – Hardware Configurations
Server Farms
Databases
Paradigms
Part II – Software Technology
Architectural elements
Relational database
Java Server Pages (JSP)
Java Beans
Testing & Deploying
Part III - Questions & Answers
Part I
Hardware Configurations
The Internet is here!
Informational websites are big today!
Even 1% of Yahoo! traffic is a lot of traffic
Gist.com runs tv.yahoo.com on 6 servers
Like e-Commerce, informational websites are
mission critical applications for those
businesses and individuals that rely on them
Yahoo!
Snap.
MarketWatch
Gist.com
These are enterprise class applications!
Denial of Service attacks proved the popular need
What is the Commercial Website landscape?
The scale and dynamic nature of the web
changes everything
Common platforms in use
TV Listings – built in-house
Generic Content Applications
XML, WAP, HTTP
XSL, CSS
Custom Applications
Possibly hundreds of thousands of hits per day
Dynamic customized content
Huge peaks during certain times of day
Newsfeeds
What is the right Hardware Architecture?
Two major hardware philosophies/paradigms
Many cheap redundant machines
Example: Yahoo.com
Hundreds of Intel BSD machines with a specially
modified Apache web server
Content stored in huge memory caches
Cost Estimate: $2,000 per server
Few expensive highly-reliable machines
Example: IWON.com
12 high-end Sun Solaris web servers
Content stored in 2 parallel Oracle
databases running on Sun E10000 servers
Cost Estimate: $20,000-$100,000 per server
Common Hardware Requirements
Co-location at data centers
Hardware
Exodus
GlobalCenter
Level3
AboveNet
Load Balancing: Cisco, F5, Radware
Application level switches
High-speed virtual networks
Firewalls
Network Monitoring software
Enterprise Storage devices
Typical N-tier Hardware Architecture
Firewall
Load Balancer
Web Server
Web Server
Web Server
Database Server
Web Server
Web Server
Database Server
Enterprise
Storage
Legacy
Applications
(if any)
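The load balancer tier in this architecture spreads incoming requests across identical web servers. A minimal sketch of the simplest policy, round-robin with a health flag, is shown here in Java. The class name and the server names are hypothetical; the commercial balancers named earlier do this in dedicated hardware and offer more policies than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: names are hypothetical, not taken from the talk.
class RoundRobinBalancer {
    private final List<String> servers;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        this.servers = Collections.unmodifiableList(new ArrayList<>(servers));
    }

    // Health monitoring would call these when a server stops or resumes answering.
    void markDown(String server) { down.add(server); }
    void markUp(String server) { down.remove(server); }

    // Returns the next healthy server in rotation, or null if every server is down.
    String pick() {
        for (int tries = 0; tries < servers.size(); tries++) {
            int i = Math.floorMod(next.getAndIncrement(), servers.size());
            String candidate = servers.get(i);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

Because a server marked down is simply skipped, adding or removing a web server does not interrupt the rotation for the others.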
Gist’s Hardware Architecture
Cisco PIX Firewall
NT Web Server
NT Web Server
RadWare: WSD Load Balancer
NT Web Server
SQL Server 7.0
Database Server
NT Web Server
SQL Server 7.0
Database Server
Enterprise
Storage
EMC
NT Web Server
ISAPI DLL
Part II
Software Technology
Gist’s Application Framework
Website
Production
TV Listings
GRID
Gist Dev
Tools
Java Beans
Other Applications
Bulletin Boards
Process Step
Custom App
Scripting Tool: JSP
Templates
Cookie-based Server-side Sessions
Services
& APIs
Data
Sources
System
Independence
JSP
Adapter
OS
Drivers
Web
Drivers
NT
Solaris
IIS
NSAPI
SQL
Abstraction
Content
Interface
JDBC
Drivers
Oracle
& Sybase
SQL Server
& Informix
Article
Drivers
File
system
Java Code Samples
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<%@ page import="gist.external.gistcom.*,gist.*" %>
<jsp:useBean id="adf" scope="session" class="gist.internal.publishing.ADFObject"></jsp:useBean>
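<%
    // Resolve the user for this request.
    UserObject u = UserObject.getUser(request, response);
    if (u.isDefault()) {
        // A default user (presumably not signed in) is sent to the login page first.
        response.sendRedirect("/tv/login.jsp?nexturl=/tv/channels.jsp");
        return;
    }
    // Page to visit next; fall back to the channel list.
    String nexturl = request.getParameter("nexturl");
    if (nexturl == null) {
        nexturl = "/tv/channels.jsp";
    }
%>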
<%@ include file="/tv/templates/global.jsp" %>
.
.
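Channel[] channelsDisplay = u.getSortedChannels();
String c_name = null;
for (int i = 0; i < channelsDisplay.length; i++) {
    // Pre-check the checkbox for channels that are visible to this user.
    // chanchecked is presumably declared in the part of the page elided above.
    chanchecked = channelsDisplay[i].isVisible() ? " CHECKED " : "";
}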
Technology
Architecting
Architecture Requirements
Scalability - performance, growth
Security - authentication, access control, privacy
Management - monitoring, dynamic configuration
Availability - fault tolerance
Stability - data integrity
Portability - OS, DB, WS independence
Extensibility - ability to adapt to changes in technology
Integration - integration of RDBMS and legacy systems
Scalability
Thousands of “transactions” per minute
Sub-second response time
Database connection pooling
Performance
Concurrency – multithreading
Inherent support in Java
Growth
Load balancing
Want to be able to “Throw” hardware at the
problem
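Database connection pooling, listed above, avoids opening a new connection for every request. The following is a minimal sketch of a fixed-size pool, not Gist's implementation. It is written generically so it runs without a database driver; in a real site the pooled type would be java.sql.Connection and the factory would open connections through JDBC. The class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative fixed-size pool; names are hypothetical.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the connection cost once, up front
        }
    }

    // Borrows a resource, waiting up to timeoutMillis; returns null on timeout.
    T acquire(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    // Returns a borrowed resource so another request thread can reuse it.
    void release(T resource) {
        idle.offer(resource);
    }

    int available() {
        return idle.size();
    }
}
```

The timeout on acquire matters under peak load: a request that cannot get a connection quickly can fail fast instead of piling up behind the database.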
Security
Authentication
Access Control
What is the user permitted to do?
Ordering PPV over the web; credit card numbers
Attributes of our UserObject
Privacy (if required)
Identifying the user via Cookies
Architecture supports “cookie-less” mode with
URL re-writing of session parameters
Encryption - RC4, MD5
SSL
URL rewriting
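The “cookie-less” mode mentioned above carries the session identifier in the URL instead of a cookie. A minimal sketch of that rewriting step follows. The parameter name sid and the helper are hypothetical; for container-managed sessions the Servlet API offers HttpServletResponse.encodeURL for the same purpose.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper; the parameter name "sid" is an assumption.
class SessionUrls {
    static final String PARAM = "sid";

    // Appends the session id to a URL when the browser sent no session cookie.
    static String rewrite(String url, String sessionId, boolean hasCookie) {
        if (hasCookie || sessionId == null) {
            return url; // the cookie carries the session; leave the URL untouched
        }
        // Keep any #fragment at the very end of the rewritten URL.
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            fragment = url.substring(hash);
            url = url.substring(0, hash);
        }
        String separator = url.indexOf('?') >= 0 ? "&" : "?";
        try {
            return url + separator + PARAM + "="
                    + URLEncoder.encode(sessionId, "UTF-8") + fragment;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Every link a page emits has to pass through such a helper, which is why it helps to have the templates generate links centrally.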
Management
Monitoring
Dynamic Configuration
Consistent logging and reporting of system activity
One way: Extensive use of site-wide email
diagnostics
Enterprise Integration – SNMP
Integration with load balancer
Rebooting crashed servers (NT primarily)
Automated
Manual (if all else fails)
Adding/Removing new features on-the-fly
Incremental updating of site content
Incremental Database updates
Availability
100% Availability
Fault Tolerance
24x7x365 Operations
Hardware solutions
Backup servers
Software solutions
Dynamic database connections
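One way to read “dynamic database connections” is that the application can switch to a backup database when the primary stops answering. The sketch below shows that idea under stated assumptions: each source is a hypothetical callback that either returns a result or throws, and the sources are tried in order. It is an illustration, not a description of Gist's code.

```java
import java.util.List;
import java.util.function.Supplier;

// Illustrative fail-over helper; names are hypothetical.
class Failover {
    // Tries each source in order and returns the first successful result.
    static <T> T firstAvailable(List<Supplier<T>> sources) {
        RuntimeException last = null;
        for (Supplier<T> source : sources) {
            try {
                return source.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the next source
            }
        }
        throw new IllegalStateException("all sources failed", last);
    }
}
```

A production version would also remember which source is down for a while, so that every request does not first wait on a dead primary.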
Stability
Data Integrity
Protection
Isolation of subsystems and application execution
Java sandbox and exception handling
Resource Recovery
Support for “transactions”
Redundant databases
Database Connectivity
System Resources – memory
Java memory garbage collection
Portability
OS independence
DB independence
SQL Server, Sybase, Oracle, Informix
WS independence
NT
Various flavors of UNIX
LINUX
Netscape Enterprise Server, Microsoft IIS, Apache
Extensibility
Adapt to Changes in Business
How well does the architecture allow you to
change the content or navigation of your
commercial website?
Does the architecture support your current legacy
systems?
Does the architecture provide for Content and/or
Editorial changes?
Adapt to Changes in Technology
How quickly can you leverage new standards?
XML
WAP
WDL
Integration
Partner Advertising
Statistical Processing/Analysis
Partner cookies versus Gist.com cookies
URL links back and forth between partners
Billing Partners
MarketWave statistics
Navigation controls
Co-branded websites with differing ad serving
ratios
Measuring Page views
What we’ve learned
Prototype ASAP in order to discover
architectural dependencies
Database statements
Test more: Then test again!
Keep objects as light as possible
Never store moving data
Member Age vs. Birthdate
Channels Change: Excluded channels
User Migration is HARD!
Be specific in your SQL
Incremental vs. batch
Get training
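The “never store moving data” lesson can be made concrete with the age example from the list: an age stored in the database is wrong after the next birthday, while a birthdate never changes. A minimal sketch, with a hypothetical Member class, stores the birthdate and derives the age on demand.

```java
import java.time.LocalDate;
import java.time.Period;

// Illustrative only: the class and its fields are hypothetical.
class Member {
    private final LocalDate birthdate; // stable fact, safe to store

    Member(LocalDate birthdate) {
        this.birthdate = birthdate;
    }

    // Age is "moving" data, so it is computed from the stored birthdate.
    int ageOn(LocalDate today) {
        return Period.between(birthdate, today).getYears();
    }
}
```

For a member born on 1975-03-01, ageOn returns 24 for 2000-02-29 and 25 for 2000-03-01, with no database update in between.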
Technology
Building
Development
Prototypes
Work closely with partners to determine
functionality
Prototype Deep, not Wide
Process
Small Development Teams (2-4 people)
Include: Designers, Technical producers
Develop Components in Parallel
Frequent Releases
3-6 day development cycles
Scoping - Controlled Feature Set
Technology
Testing & Deploying
Testing
Quality
Automated Tools
Repeatability
Consistent and Isolated Environment
Metrics
Unit Test - thread safety, code coverage
Smoke Test - quick validation
BVT - build validation test
Full Functional Test
Regression Test - consistent functionality
Load Test - high availability
Benchmark - by Platform
Installation Test - by Platform
Measure real world scenarios
Load test specific subsystems
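Load testing a specific subsystem can start very small. The sketch below, with hypothetical names, runs one “transaction” from several simulated users at once and reports transactions per minute. It assumes the transaction is any Runnable, for example a call into one subsystem; dedicated load-testing tools do far more than this.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative load driver; names are hypothetical.
class LoadTest {
    // Runs `transaction` from `users` threads, `perUser` times each,
    // and returns the measured transactions per minute.
    static double run(int users, final int perUser, final Runnable transaction) {
        final AtomicLong done = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        long start = System.nanoTime();
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                for (int i = 0; i < perUser; i++) {
                    transaction.run();
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new work; let the submitted loops finish
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        double minutes = (System.nanoTime() - start) / 60e9;
        return done.get() / minutes;
    }
}
```

Running the same driver against the same build in an isolated environment is what makes the resulting number repeatable from release to release.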
Deploying
Capacity Planning - sizing exercise
Beta Testing - early, well defined subset
Focus Groups – early feedback
Performance - simulating real world load
Benchmarking - critical areas to measure
Maintenance - staging environment,
versioning
Capacity Planning
Sizing Exercise
What workloads run at each node?
What hardware is needed to maintain service as the
workload grows?
How many more users can each existing server
support?
How will server utilization be impacted if the
number of transactions increases by n%?
What are those “transactions” doing?
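The utilization question can be answered with simple arithmetic if one assumes, as a first approximation, that CPU utilization grows linearly with the number of transactions. That assumption and the names below are illustrative; real systems often degrade faster than linearly near saturation.

```java
// Illustrative sizing arithmetic under a linear-scaling assumption.
class Utilization {
    // Utilization (0.0 to 1.0) after transactions grow by growthPercent.
    static double projected(double current, double growthPercent) {
        return current * (1.0 + growthPercent / 100.0);
    }

    // Largest growth, in percent, before utilization reaches the target.
    static double headroomPercent(double current, double target) {
        return (target / current - 1.0) * 100.0;
    }
}
```

For example, a server that is 50% busy today would be about 70% busy after 40% growth, and could absorb about 60% growth before reaching an 80% target.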
Performance
Simulating Real World Load
“Transactions” per minute
Database access requirements
Legacy connection requirements
Networking requirements
Principles of Algorithms really matter
This is a Challenge
Analyze existing system (web server logs)
Forecast activity by looking at competitors
Number of registered users
Performance (continued)
Scalability and Fail-over are required for
24x7x365 availability
Determine appropriate hardware architecture:
maximum acceptable response time, target
server CPU utilization at 80% (leave room for
growth)
Determine # and type of transactions:
reading web pages, executing a query,
updating a database, searching, sorting
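The 80% utilization target turns into a server count once a benchmark gives the throughput of one server. A minimal sketch follows, assuming hypothetical inputs: peak transactions per minute, the benchmarked per-server maximum, the target utilization, and a number of spare machines held back for fail-over.

```java
// Illustrative sizing arithmetic; all inputs are assumptions to be measured.
class ServerCount {
    // Servers needed so that peak load keeps each one at or below
    // targetUtilization, plus `spares` extra machines for fail-over.
    static int needed(double peakTpm, double perServerTpm,
                      double targetUtilization, int spares) {
        double usablePerServer = perServerTpm * targetUtilization;
        return (int) Math.ceil(peakTpm / usablePerServer) + spares;
    }
}
```

With a peak of 10,000 transactions per minute, 1,500 per server at full load, an 80% target and one spare, the result is 10 servers.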
Benchmarking “Transactions”
Home Page
Grid Page
Pre-compile pages if possible
User Registration Transactions
Remove archive links
Soaps Updates Pages
Many hits, as fast as possible
Article Pages
Many hits, as light as possible
As clear as possible
The answer to many problems is “caching”!
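Caching here means keeping recently generated pages or query results in memory so that repeated hits skip the database. A minimal sketch of a least-recently-used cache built on the standard LinkedHashMap is shown below. The class name, the capacity and the loader callback are hypothetical; it does not expire stale entries, which a real content site would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative LRU cache; names are hypothetical.
class PageCache<K, V> {
    private final Map<K, V> lru;

    PageCache(final int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once the map outgrows the capacity
            }
        };
    }

    // Returns the cached value, calling the loader only on a miss.
    synchronized V get(K key, Function<K, V> loader) {
        V value = lru.get(key);
        if (value == null) {
            value = loader.apply(key);
            lru.put(key, value);
        }
        return value;
    }

    synchronized int entries() {
        return lru.size();
    }
}
```

Pages with many hits, such as a home page or a listings grid, stay in memory because each hit marks them as recently used.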
Maintenance
Staging Environment
Versioning
Mirrored hardware/software
Separate Database
Migration strategy
Component Version Control
Change Management
Part III
Questions & Answers