Application Architecture for the rest of us
APPLICATION ARCHITECTURE
FOR THE REST OF US
Presented by
M N Islam Shihan
Introduction
Target Audience
What is Architecture?
Architecture
is the foundation of your application
Applications are not like Skyscrapers
Enterprise Vs Personal Architecture
Why look ahead in Architecture?
Adaptability
with Growth
Maintainability
Requirements never end
Enterprise Architecture (cont…)
Security
Responsiveness
Extendibility
Availability
Load Management
Distributed Computation
Caching
Scalability
Security
Security (cont…)
Think about Security first of all
Network Security: Implement Firewall &
Reverse Proxy for your network
SQL Injection: Never forget to escape
field values in your queries
XSS (Cross Site Scripting): Never trust user provided (or
grabbed from third party data sources) data and display
without sanitizing/escaping
CSRF (Cross Site Request Forgery): Never let your forms be
submitted from third-party sites
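A minimal sketch of the SQL injection and XSS points above, assuming Python's standard sqlite3 and html modules. It uses placeholder binding rather than manual escaping of field values, which is a swapped-in technique with the same goal. The users table and the function names are hypothetical.

```python
import html
import sqlite3


def find_user(conn, name):
    # Placeholder binding: the driver passes `name` as data, never as SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def render_comment(text):
    # Escape &, <, > and quotes before placing untrusted text into HTML.
    return "<p>" + html.escape(text) + "</p>"


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
```

With this setup, find_user(conn, "' OR '1'='1") matches no rows, because the whole string is compared as a name instead of being parsed as SQL.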
Security (cont…)
DDoS (Distributed Denial of Service): Enable real-time
monitoring of access to detect and prevent DDoS attacks
Session fixation: Implement session key regeneration for
every request
Always hash your security tokens/cookies with a new random
salt per request or per session (or at a fixed interval)
Stay tuned and up-to-date with security news and releases
of all of your used tools and technologies
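One possible reading of the salted token advice, sketched with Python's standard hmac and secrets modules. SECRET_KEY, issue_token and verify_token are hypothetical names, and a real application would also need token expiry and key storage. The same kind of token could be embedded in forms as a CSRF check.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it would be loaded from config.
SECRET_KEY = secrets.token_bytes(32)


def _sign(salt, session_id):
    message = f"{salt}:{session_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(session_id):
    # A fresh random salt makes every issued token different.
    salt = secrets.token_hex(16)
    return f"{salt}:{_sign(salt, session_id)}"


def verify_token(session_id, token):
    salt, _, mac = token.partition(":")
    expected = _sign(salt, session_id)
    # Constant-time comparison on bytes avoids timing leaks.
    return hmac.compare_digest(mac.encode(), expected.encode())
```

A token issued for one session fails verification for any other session, and two tokens issued for the same session differ because of the salt.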
Responsiveness
Responsiveness (cont…)
Web applications should be as responsive as Desktop
Applications
Plan well and apply good use of JavaScript to achieve
Responsiveness
Detect browsers and provide separate response/interface
depending on detected browser type
Implement unobtrusive use of JavaScript
Implement optimal use of Ajax
Use Comet Programming instead of Polling
Implement deferred/asynchronous processing of large
computations using Job Queue
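The job queue idea can be sketched in-process with Python's standard queue and threading modules. This is only an illustration: a production setup would normally use a separate job server, and the job name and the sum-of-squares workload are made up.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def worker():
    # Pull jobs until a None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, n = job
        # Stand-in for a large computation.
        results[job_id] = sum(i * i for i in range(n))
        jobs.task_done()


def submit(job_id, n):
    # Returns immediately; the request handler does not wait for the result.
    jobs.put((job_id, n))
    return job_id


thread = threading.Thread(target=worker, daemon=True)
thread.start()
submit("report-1", 1000)
jobs.join()  # only for this demo: wait until the worker has finished
```

The request path only pays for the put() call; the expensive work happens on the worker.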
Extendibility
Implement and use robust data access interface, so
that they can be exposed easily via web services
(like REST, SOAP, JSONP)
Use architectural patterns & best practices
SOA
(Service Oriented Architecture)
MVC (Model View Controller)
Modular architecture with plug-ability
Allow hooks and overrides through Events
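Hooks and overrides through events might look like the following sketch. EventDispatcher and the "post.before_save" event name are hypothetical and not taken from any particular framework.

```python
from collections import defaultdict


class EventDispatcher:
    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event, listener):
        # Plugins register listeners without touching core code.
        self._listeners[event].append(listener)

    def emit(self, event, payload):
        # Each listener may return a modified payload (an "override").
        for listener in self._listeners[event]:
            payload = listener(payload)
        return payload


dispatcher = EventDispatcher()
dispatcher.on(
    "post.before_save",
    lambda post: {**post, "title": post["title"].strip()},
)
dispatcher.on(
    "post.before_save",
    lambda post: {**post, "slug": post["title"].lower().replace(" ", "-")},
)

saved = dispatcher.emit("post.before_save", {"title": "  Hello World "})
```

Listeners run in registration order, so the slug listener sees the already trimmed title.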
Availability
Availability (cont…)
Implement well planned Disaster Recovery policy
Use version control for your sources
Use RAID for your storage devices
Keep hot standby fallback for each of your primary
data/content servers
Perform periodical backup of your source repository, files &
data
Implement periodical archiving of your old data
Provide mechanism to the users to switch between current and
archived data when possible
Load Management
Load Management (cont…)
Monitor and benchmark your servers periodically and find
the peak usage time
Optimize to support at least 150% of peak-time load
Use web servers with high I/O performance
Introduce load balancer to distribute loads among multiple
application Servers
Start with a software load balancer (a.k.a. reverse proxy), then
grow to a hardware load balancer only if necessary
Use CDNs to serve your static contents
Use public CDNs to serve the open source JavaScript or CSS
files when possible
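The 150% rule above is simple arithmetic. The sketch below assumes hourly request counts gathered from monitoring and a per-server capacity measured by benchmarking; the sample numbers are invented.

```python
import math


def peak_hour(requests_per_hour):
    # Returns (hour, load) for the busiest hour.
    hour = max(requests_per_hour, key=requests_per_hour.get)
    return hour, requests_per_hour[hour]


def servers_needed(peak_load, per_server_capacity, headroom=1.5):
    # headroom=1.5 means "support 150% of the peak load".
    return math.ceil(peak_load * headroom / per_server_capacity)


# Invented sample: hour of day -> requests served in that hour.
sample = {9: 1200, 13: 4000, 20: 2600}
hour, load = peak_hour(sample)
```

With a peak of 4,000 requests per hour and servers that each handle 1,000, the target is 6,000 requests per hour, or 6 servers.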
Caching
To Cache Or Not to Cache?
Analyze the nature of content and response generated by your
application very well
What to cache?
Analyze and set proper expiry time
Invalidate cache whenever content changes
Partial caching will also bring you speed
When is caching bad?
Understand various types of web caches
Browser cache
Proxy cache
Gateway cache
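Expiry time and invalidation can be illustrated with a small in-memory cache. TTLCache is a hypothetical class; the clock is injectable only so that the behaviour can be checked without waiting.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, compute):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: serve from cache
        value = compute()  # missing or expired: regenerate
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        # Call this whenever the underlying content changes.
        self._store.pop(key, None)
```

A typical call would be cache.get("homepage", render_homepage), with cache.invalidate("homepage") in the code path that edits the page; both names are placeholders.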
Caching (cont…)
Implement server side caching
Runtime in-memory cache
Per request: Global variables
Shared: Memcached
Persistent Cache
Per Server: File based, APC
Shared: Db based, Redis
Optimizers and accelerators: eAccelerator, XCache
Reverse proxy/gateway cache
Varnish cache
Distributed Computing
Scalability
What the heck is this?
Scalability is the soul of enterprise architecture
Scalability pyramid
Scalability (cont…)
Vertical Scalability (scaling up)
Scalability (cont…)
Horizontal Scalability (scaling out)
Scalability (cont…)
Scalability
Scaling up (vertical) vs. Scaling out (horizontal)
Scalability
Database Scalability
Vertical:
Add resources to the server as needed
In most cases produces a single point of failure
Horizontal:
Distribute/replicate data among multiple
servers
Cloud Services: Store your data to third party data
centers and pay with respect to your usage
Scalability (cont…)
Scaling Database
Scaling options
Master/Slave: Master for Write, Slaves for Read
Cluster Computing: Single storage with multiple server nodes
Table Partitioning: Large tables are split among partitions
Federated Tables: Tables are shared among multiple servers
Distributed Key Value Stores
Distributed Object DB
Database Sharding
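The "master for write, slaves for read" option can be sketched as a tiny query router. ReplicatedRouter is hypothetical, and the connection objects are plain strings here; note that reads from a replica may lag behind the master.

```python
import itertools


class ReplicatedRouter:
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, slaves):
        self.master = master
        # Rotate over read replicas; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        # Assumes a non-empty statement; the first word decides the target.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS or self._slaves is None:
            return self.master
        return next(self._slaves)


router = ReplicatedRouter("master-db", ["slave-1", "slave-2"])
```

Here router.route("SELECT ...") alternates between the two replicas, while any INSERT, UPDATE or DELETE goes to the master.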
Scalability (cont…)
Database Sharding
Smaller databases are
easier to manage
Smaller databases are
faster
Database sharding can
reduce costs
Need one or multiple well-defined
shard functions
"Don't do it, if you don't
need to!" (37signals.com)
"Shard early and often!"
(startuplessonslearned.blogspot.com)
Scalability (cont…)
Database Sharding
When appropriate?
High-transaction database applications
Mixed workload database usage
Frequent reads, including complex queries
and joins
Write-intensive transactions (CRUD
statements, including INSERT, UPDATE,
DELETE)
Contention for common tables and/or rows
General Business Reporting
Typical "repeating segment" report
generation
Some data analysis (mixed with other
workloads)
What to analyze?
Identify all transaction-intensive tables in
your schema.
Determine the transaction volume your
database is currently handling (or is
expected to handle).
Identify all common SQL statements
(SELECT, INSERT, UPDATE, DELETE), and
the volumes associated with each.
Develop an understanding of your "table
hierarchy" contained in your schema; in
other words the main parent-child
relationships.
Determine the "key distribution" for
transactions on high-volume tables, to
determine if they are evenly spread or
are concentrated in narrow ranges.
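The "key distribution" check in the last point can be approximated by bucketing keys into ranges and looking at the largest bucket. The bucket size and the 50% threshold below are arbitrary assumptions.

```python
from collections import Counter


def key_distribution(keys, bucket_size):
    # Maps each key to the start of its range, e.g. 150 -> 100 for size 100.
    return Counter((k // bucket_size) * bucket_size for k in keys)


def is_concentrated(distribution, threshold=0.5):
    # True when a single range holds more than `threshold` of all keys.
    total = sum(distribution.values())
    return max(distribution.values()) / total > threshold
```

For keys 1, 2, 3, 150 and 250 with a bucket size of 100, the range starting at 0 holds three of the five keys, so the distribution counts as concentrated.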
Scalability (cont…)
Database Sharding
Challenges
Reliability
Automated backups
Database Shard redundancy
Cost-effective hardware redundancy
Automated failover
Disaster Recovery
Distributed queries
Aggregation of statistics
Queries that support comprehensive reports
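Aggregating statistics across shards usually means asking each shard for a partial result and combining the partials. In this sketch each shard is just a list of order amounts, which is an assumption made for illustration.

```python
def aggregate_orders(shards):
    # Each shard reports (sum, count); in practice these would be the
    # results of SELECT SUM(amount), COUNT(*) run on every shard.
    partials = [(sum(rows), len(rows)) for rows in shards]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # The average must come from combined sums and counts,
    # not from averaging the per-shard averages.
    average = total / count if count else 0.0
    return total, count, average
```

For shards [10, 20] and [30] the combined average is 20.0, whereas averaging the two per-shard averages (15 and 30) would give 22.5.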
Scalability (cont…)
Database Sharding
Challenges (cont…)
Avoidance of cross-shard joins
Auto-increment key management
Support for multiple Shard Schemes
Session-based sharding
Transaction-based sharding
Statement-based sharding
Determine the optimum method for sharding the data
Shard by a primary key on a table
Shard by the modulus of a key value
Maintain a master shard index table
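Two of the shard methods above, sketched under simple assumptions: integer keys, a fixed number of shards, and an in-memory dictionary standing in for the master shard index table. ShardIndex is a hypothetical name.

```python
def shard_by_modulus(key, shard_count):
    # Shard function: the key's remainder selects the shard.
    return key % shard_count


class ShardIndex:
    """Stand-in for a master shard index table (key -> shard id)."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self._index = {}

    def assign(self, key, shard_id):
        # Explicit placement, e.g. after moving a large customer.
        self._index[key] = shard_id

    def lookup(self, key):
        # Fall back to the modulus rule for keys without an explicit entry.
        return self._index.get(key, shard_by_modulus(key, self.shard_count))
```

The modulus rule needs no lookup but makes it hard to change the shard count later; the index table costs an extra lookup but lets individual keys be moved.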
Scalability (cont…)
Database Sharding
Example Bookstore schema showing how data is sharded
Tools
Application framework
Load balancer with multiple application servers
Continuous integration
Automated Testing
TDD (Test Driven Development)
BDD (Behavior Driven Development)
Monitoring
Services
Servers
Error Logging
Access Logging
Content Delivery Networks (CDN)
FOSS
Think Ahead
Think Ahead (cont…)
Understand business model
Analyze requirements in the greatest detail
Plan for extendibility
Be agile, do incremental architecture
Create/use frameworks
SQL or NoSQL?
Sharding or clustering or both?
Cloud services?
Guidelines
Enrich your knowledge: Read, read & read. Read
anything available: from jokes to religions.
Follow patterns & best practices
Mix technologies
Don’t let your tools/technologies limit your vision
Invent/customize technology if required
Use FOSS
Don’t expect ready solutions
Find the closest match
Customize as needed
Guidelines (cont…)
Database Optimization
Use established & proven solutions
MySQL
PostgreSQL
MongoDB
Redis
Memcached
CouchDB
Understand and utilize indexing & full-text search
Use optimized DB structure & algorithms
Modified Preorder Tree Traversal (MPTT)
Map Reduce
ORM or not?
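For readers new to MPTT, the sketch below numbers a small category tree with left/right values and then finds all descendants of a node by range comparison alone. The tree and the function names are made up, and a real implementation would store the two values as table columns.

```python
def assign_mptt(tree, root):
    # tree maps a node to its list of children.
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (left, counter)  # right value assigned on the way back
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    # A descendant's (left, right) pair lies strictly inside its ancestor's.
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)


tree = {"Food": ["Fruit", "Meat"], "Fruit": ["Red", "Yellow"]}
bounds = assign_mptt(tree, "Food")
```

In SQL the same lookup becomes a single range condition on the left and right columns, with no recursion.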
Guidelines (cont…)
Database Optimization
Optimize your queries
Never be lazy to write optimized queries
One big query is faster than repetitive smaller queries
One Ring to Rule `em All
Use Runtime In Memory Cache
Filtering in-memory cached dataset is much faster than
executing a query in DB
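A small illustration of the two points above, using Python's sqlite3 module with a hypothetical authors/books schema. It shows that one joined query returns the same rows as a query per author, and that a cached result set can then be filtered in memory. It does not measure speed; against a real database server each extra query adds a round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")


def titles_repetitive(conn):
    # One query for the authors, then one more query per author.
    out = []
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        out.extend((name, title) for (title,) in rows)
    return out


def titles_single(conn):
    # One big query that returns everything at once.
    return conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()


cached = titles_single(conn)  # runtime in-memory cache of the result set
ann_titles = [title for name, title in cached if name == "Ann"]
```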
Guidelines (cont…)
One Ring to Rule `em All
Perform Selection, then Projection, then Join
Table sizes: A has 1,000 records, B (linked to A by a_id) has 1,000,000 records, C has 1,000,000,000 records
A simple example
Write a standard SQL query to find all records with fields A.a1, B.b1 and C.c1 from
tables A (id, a1, a2, a3, …, aP), B (id, a_id, b1, b2, b3, …, bQ), and C (id, b_id,
c1, c2, c3, …, cR), given that A.aX, B.bY and C.cZ match the values 'X', 'Y' and 'Z'
respectively.
Assume tables A, B and C all have primary keys defined by the id column, and that a_id
and b_id are the foreign keys in B from A and in C from B respectively.
Guidelines
One Ring to Rule `em All (cont…)
Solution 1
SELECT A.a1, B.b1, C.c1
FROM A, B, C
WHERE A.id = B.a_id AND B.id = C.b_id
AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it suck?
• Remember the sizes of tables A, B and C?
• A cross product of tables is always memory intensive. Why?
• Evaluated naively, A x B x C has 1,000 x 1,000,000 x 1,000,000,000 records with
(P + 1) + (Q + 2) + (R + 2) fields
• Can you imagine the size of the in-memory result set of the joined tables?
• It will be HUGE
Guidelines
One Ring to Rule `em All (cont…)
Solution 2
SELECT A.a1, B.b1, C.c1
FROM A
INNER JOIN B ON A.id = B.a_id
INNER JOIN C ON B.id = C.b_id
WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it still suck?
• A ⋈ B ⋈ C will produce (1,000 x 1,000,000) records to perform A ⋈ B, then
produce another (1,000 x 1,000,000,000) records to compute (A ⋈ B) ⋈ C, and only
then filter the records by the WHERE clause.
• The number of fields (P + 1 in A, Q + 2 in B and R + 2 in C) also contributes to
memory consumption.
• It is better, but will still be HUGE with respect to memory consumption and computation
Guidelines
One Ring to Rule `em All (cont…)
Optimal Solution
SELECT A.a1, B.b1, C.c1
FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id
Why does this solution outperform the others?
•Let’s keep the explanation as an exercise
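As a starting point for the exercise, the sketch below runs all three solutions against a tiny SQLite database to confirm that they return the same rows. The sample data and the reduced column lists (only a1/aX, b1/bY, c1/cZ) are assumptions. Many modern query planners push filters down on their own, so how much the rewrite helps depends on the engine; comparing the plans with the database's EXPLAIN facility is the reliable way to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, a1 TEXT, aX TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER, b1 TEXT, bY TEXT);
    CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER, c1 TEXT, cZ TEXT);
    INSERT INTO A VALUES (1, 'a-one', 'X'), (2, 'a-two', 'other');
    INSERT INTO B VALUES (1, 1, 'b-one', 'Y'), (2, 1, 'b-two', 'other'),
                         (3, 2, 'b-three', 'Y');
    INSERT INTO C VALUES (1, 1, 'c-one', 'Z'), (2, 1, 'c-two', 'other'),
                         (3, 3, 'c-three', 'Z');
""")

SOLUTION_1 = """
    SELECT A.a1, B.b1, C.c1
    FROM A, B, C
    WHERE A.id = B.a_id AND B.id = C.b_id
      AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
"""

SOLUTION_2 = """
    SELECT A.a1, B.b1, C.c1
    FROM A
    INNER JOIN B ON A.id = B.a_id
    INNER JOIN C ON B.id = C.b_id
    WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
"""

# Selection and projection happen inside each subquery, before the joins.
OPTIMAL = """
    SELECT A.a1, B.b1, C.c1
    FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
    INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
    INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id
"""

results = [conn.execute(q).fetchall() for q in (SOLUTION_1, SOLUTION_2, OPTIMAL)]
```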
Reference : Tools
Security
Nmap: http://nmap.org/
Nikto: http://cirt.net/Nikto2
List of Tools: http://sectools.org/
Caching
APC: http://php.net/manual/en/book.apc.php
XCache: http://xcache.lighttpd.net/
eAccelerator: http://sourceforge.net/projects/eaccelerator/
Varnish Cache: https://www.varnish-cache.org/
MemCached: http://memcached.org/
Redis: http://redis.io/
Load Balancer
HAProxy: http://haproxy.1wt.eu/
Pound: http://www.apsis.ch/pound/
Reference : Tools (cont…)
NoSQL
MongoDB: http://www.mongodb.org/
CouchDB: http://couchdb.apache.org/
A complete list: http://nosql-database.org/
Distributed Computing
Message Queue/Job Server
RabbitMQ: http://www.rabbitmq.com/
ActiveMQ: http://activemq.apache.org/
GearMan: http://gearman.org/
Monitoring
Nagios: http://www.nagios.org/
Testing
Selenium: http://seleniumhq.org/
Cucumber: http://cukes.info/
Watir: http://watir.com/
PhpUnit: http://www.phpunit.de/manual/3.7/en/
MPTT
Shameless Promotion: https://github.com/mnishihan/phpMptt
Reference : Articles
Scalability & Architecture
http://www.diranieh.com/DistributedDesign_1/Scalability.htm
http://www.infoq.com/presentations/Facebook-Software-Stack
http://99designs.com/tech-blog/blog/2012/01/30/infrastructure-at-99designs/
http://bit.ly/16cKu
Database Sharding
http://www.codefutures.com/database-sharding/
http://bit.ly/Y3b3J
http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html
Load Balancing
http://en.wikipedia.org/wiki/Load_balancing_%28computing%29
http://1wt.eu/articles/2006_lb/index.html
Caching
CDN
http://www.mnot.net/cache_docs/
http://bit.ly/9cTJfA
http://bit.ly/sMRyxC
MPTT
http://www.sitepoint.com/hierarchical-data-database/
Thank You
Join phpXperts [http://bit.ly/phpxperts]
Follow me on twitter [http://twitter.com/mnishihan]
Subscribe on Facebook [http://fb.me/mnishihan]
Questions???
I will be glad to answer