By: Uday Kumar

 What is .htaccess?
 How to use .htaccess?
 Error documents
 Redirects & Rewrites
 Password protection
 Deny visitors by IP address
 DirectoryIndex uses
 Adding MIME types
 Activate Caching
 Disable directory listings
 Setting server timezone
 Changing server signature
 Preventing access to your PHP includes files
 Prevent access to php.ini
 Ensuring media files are downloaded instead of played
 Setting up Associations for Encoded Files
 Preventing requests with invalid characters
 Regular Expressions
 Useful Resources

Hypertext Access, or .htaccess, is an Apache web server
configuration file. It is loaded by the server and allows
you to significantly modify its behaviour by specifying
redirects, turning features on and off, or protecting
special sections of your site. In the most literal sense,
.htaccess is simply a small text file whose entire
filename is '.htaccess'.
'.htaccess' is the filename in full; it is not a file
extension. For instance, you would not create a file
called 'file.htaccess'; it is simply called '.htaccess'.
The file takes effect when placed in any directory
served by the Apache Web Server. It applies to the
directory it is placed in and to all files and
subdirectories within that directory.
##Rewrite Engine on code - MUST BE ACTIVE for rewrites##
RewriteEngine on
AuthName "Member's Area Name"
AuthUserFile /path/to/password/file/.htpasswd
AuthType Basic
require valid-user
ErrorDocument 401 /error_pages/401.html
AddHandler server-parsed .html
One of the most common uses for .htaccess is handling
HTTP status errors. These are the numbers that come
back from the server when a client makes a request; for
example, you will all be familiar with the error status
404, commonly called “page not found”. By default
your browser will show an ugly generic 404 page in the
event of this error, but with some .htaccess magic we
can customise this page!
 You need to report the correct status code to search
engines. This means if a page has moved for good you
report 301, if it’s not there and has never been there it
should be 404, and if it’s there and working it should be
200. Incorrect reporting of errors can lead to duplicate
crawling and indexing problems.
 It helps your users. They will know when they are in the
wrong section of a site and can easily follow some
recommended links or go back. If you just bounce
them back to the homepage or throw up a generic
browser 404, it can shake their confidence and force
them to leave the site.
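Do not point ErrorDocument at a full URL:
ErrorDocument 404 http://www.example.com/404.html
With a full URL, Apache redirects the client to that
address, so the response fails to return the 404 status
and the client typically sees a 302 followed by a 200.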
## Error Handling - note: to preserve the error status, DO NOT use full URLs ##
ErrorDocument 401 /401.html
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html
ErrorDocument 400 /400.html
## Avoid (a full URL loses the 404 status):
## ErrorDocument 404 http://www.example.com/404.html
A full list of error codes is available on Wikipedia.
 404 – Not Found (the file is missing or could not be
accessed)
 401/403 – Unauthorized/Forbidden (you are not
allowed to access the content; entering the correct
details may fix a 401 but not a 403)
 400 – Bad Request (something is wrong with the
syntax of your request, usually a typo in the URL)
 500 – Internal Server Error (a generic server error,
frequently caused by careless edits to .htaccess)
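The same ErrorDocument pattern extends to the 500 status in the list above. A minimal sketch, assuming a page named /500.html exists at the site root (the filename is a placeholder, not taken from the slides). The last line shows that ErrorDocument can also take a short quoted message instead of a file:

```apache
## Custom page for internal server errors (a local path keeps the 500 status)
ErrorDocument 500 /500.html
## Alternative for 403: send a short inline message instead of a file
ErrorDocument 403 "Sorry, you are not allowed to view this page."
```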
Redirects enable us to direct web site visitors from one
document within your web site to another. This is
useful, for example, if you have moved your web site
content and would like to redirect visitors from old
links to the new content location.
Redirect /old_dir/ http://www.yourdomain.com/new_dir/index.html
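The above line tells the Apache Web Server that if a
visitor requests the directory 'old_dir', it should
redirect them to the document 'index.html' located in
the directory 'new_dir'. Note that Redirect appends
anything after the matched '/old_dir/' prefix to the
target URL, so this form suits requests for the directory
itself rather than for individual files inside it.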
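When every request under the old directory should land on one page, RedirectMatch (also part of mod_alias) is one way to do it, because its target is used exactly as written. A sketch using the same placeholder domain as the line above; the choice of a permanent 301 status is an assumption:

```apache
## Send every request under /old_dir/ to one page as a permanent (301) redirect
RedirectMatch 301 ^/old_dir/ http://www.yourdomain.com/new_dir/index.html
## Or map each old file to the same name in the new directory
## (use one of the two rules, not both)
# Redirect 301 /old_dir/ http://www.yourdomain.com/new_dir/
```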
 Another frequent duplicate index problem occurs
when search engines index the default file of a
directory, most frequently the home page in both its
file and root form, causing www.example.com/ and
www.example.com/index.php to both be indexed.
## Redirect index to root ##
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]
 With this piece of code you will need to modify both
the name of the index file (99% of the time this is just
index) and the extension (usually html, htm or php).
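As a sketch of that modification, here is the same rule adapted for a site whose default file is index.php, matching the www.example.com/index.php case mentioned earlier. It assumes the rule lives in the .htaccess file at the site root and that mod_rewrite is enabled:

```apache
RewriteEngine On
## Only act when the browser literally asked for .../index.php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.php\ HTTP/
## Redirect permanently to the same directory without the file name
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
```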
 While not the most useful code for every affiliate, this
code can still be very useful for secure subsections of a
site. However, it will move everything in the folder and
below to https, so it’s not a good idea to use this one in
your root .htaccess.
## Redirect all Pages to Secure ##
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}
 In the index rule above, note the backslash preceding
the period ("."): as in Regular Expressions, "\."
denotes a literal period.
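The HTTPS redirect above relies on mod_rewrite issuing a redirect implicitly because the target is a full URL. A slightly more explicit sketch of the same idea, assuming mod_rewrite is enabled. The permanent 301 status is an assumption here, not something the slides specify:

```apache
RewriteEngine On
## Only act on requests that did not arrive over HTTPS
RewriteCond %{HTTPS} !=on
## Redirect to the same host and path over https, then stop processing
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```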
 The password protection and authentication systems
offered by the Apache Web Server are probably the
most important use of .htaccess files.
AuthName "Member's Area Name"
AuthUserFile /path/to/password/file/.htpasswd
AuthType Basic
require valid-user
 The password file would contain something similar to
the following text:
username:encryptedpassword
fred_smith:oCF9Pam/MXJg2
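The same directives can be narrowed to a single file by wrapping them in a <Files> block. A minimal sketch, assuming a hypothetical file named admin.php and the same placeholder password file path as above:

```apache
## Ask for a login only when admin.php is requested
<Files "admin.php">
AuthName "Member's Area Name"
AuthUserFile /path/to/password/file/.htpasswd
AuthType Basic
Require valid-user
</Files>
```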
 The visitor blocking facilities offered by the Apache
Web Server enable us to deny access to specific visitors,
or allow access to specific visitors. This is extremely
useful for blocking unwanted visitors, or to only allow
the web site owner access to certain sections of the
web site, such as an administration area.
order allow,deny
deny from 255.0.0.0
deny from 123.45.6.
allow from all
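The lines above let everyone in except the listed addresses. The opposite case mentioned earlier, allowing only the site owner into an area, can be sketched by reversing the order. The address 203.0.113.10 is a documentation-range placeholder, not a value from the slides. On Apache 2.4 these Order/Allow/Deny directives are provided by mod_access_compat, and "Require ip" is the newer form:

```apache
## Block everyone except one trusted address
## (203.0.113.10 is a placeholder; substitute the owner's real address)
order deny,allow
deny from all
allow from 203.0.113.10
```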
 The DirectoryIndex directive allows you to specify the
default page to display when a visitor requests a
directory on your web site rather than a specific file.
DirectoryIndex index.html
DirectoryIndex index.html index.cgi index.php
 The above lines tell the Apache Web Server to display
the 'index.html' file as the directory index; if this file
is not available then display 'index.cgi', and if this is
not available then display 'index.php'.
 MIME types set what a file is, or rather which file
extensions refer to which file types. For example, a
'.html' file extension refers to an HTML document, and
a '.zip' file extension refers to a ZIP archive file. The
server needs to know this so it knows how to deal with
the file. This is often used to create custom file
extensions for common file types.
AddType text/html .html .htm
AddType text/plain .txt
AddType text/x-setext .etx
AddType application/pdf .pdf
## AddType application/slate   (no file extension given; AddType needs at least one)
AddType application/zip .zip
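To illustrate the custom-extension use mentioned above, here is a minimal sketch. The '.page' extension is hypothetical and chosen only for the example:

```apache
## Serve files ending in .page as ordinary HTML documents
AddType text/html .page
```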
 Caching is a way to stop repeat visitors from completely
redownloading every element of your site.
## Enable Caching (requires mod_headers) ##
## Files to Cache for One Month
<FilesMatch "\.(flv|gif|jpg|jpeg|png|ico|swf)$">
Header set Cache-Control "max-age=2592000"
</FilesMatch>
## Files to Cache for 12 Hours
<FilesMatch "\.(html|htm)$">
Header set Cache-Control "max-age=43200"
</FilesMatch>
## Disable cache for script files
<FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$">
Header unset Cache-Control
</FilesMatch>
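Another common way to get a similar effect is mod_expires, which sets cache lifetimes per MIME type rather than per file extension. This is a different technique from the Header lines above, shown only as a sketch; it assumes mod_expires is available on the server, and the <IfModule> wrapper skips the block if it is not:

```apache
<IfModule mod_expires.c>
ExpiresActive On
## Images: cache for one month, counted from the time of the request
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
## HTML: cache for 12 hours
ExpiresByType text/html "access plus 12 hours"
</IfModule>
```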
 Preventing directory listings can be very useful if, for
example, you have a directory containing important
'.zip' archive files, or to prevent viewing of your image
directories. Alternatively, it can also be useful to enable
directory listings if they are not available on your
server, for example if you wish to display directory
listings of your important '.zip' files.
IndexIgnore *.zip
The above line tells the Apache Web Server to list all
files except those that end with '.zip'.
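IndexIgnore only hides matching files from a listing. Turning listings off or on for a directory is usually done with the Options directive. A minimal sketch, assuming the server's AllowOverride setting permits Options in .htaccess files:

```apache
## Turn directory listings off for this directory and below
Options -Indexes
## To turn them on instead, use this line in place of the one above:
# Options +Indexes
## Or keep listings enabled but hide every file from them:
# IndexIgnore *
```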
 To set your web server's time zone:
SetEnv TZ America/Indianapolis
SetEnv TZ America/Los_Angeles
Example time zone values:
America/New_York - Eastern Time
America/Detroit - Eastern Time - Michigan (most locations)
America/Louisville - Eastern Time (Louisville, Kentucky)
America/Indianapolis - Eastern Standard Time (Indiana, most locations)
America/Indiana/Knox - Eastern Standard Time (Indiana, Starke County)
America/Chicago - Central Time
 To change the server signature which is displayed as
part of the default Apache error documents, use the
following code:
ServerSignature EMail
SetEnv SERVER_ADMIN [email protected]
 To remove the server signature completely, use the
following code:
ServerSignature Off
 If you have a directory containing PHP includes that
you do not wish to be accessed directly from the
browser, there is a way of disabling access to the
directory using mod_rewrite.
## Enable Mod Rewrite, this is only required once in each .htaccess file
RewriteEngine On
RewriteBase /
## Test for access to includes directory
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /includes/.*$ [NC]
## Test that file requested has php extension
RewriteCond %{REQUEST_FILENAME} ^.+\.php$
## Forbid Access
RewriteRule .* - [F,NS,L]
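A simpler alternative, swapped in here for comparison rather than taken from the slides, is to place a separate .htaccess file inside the includes directory itself and refuse all web requests. PHP's include and require read files from disk, so they are not affected. A sketch, assuming the Apache 2.2-style access directives used elsewhere in this deck:

```apache
## Contents of includes/.htaccess: refuse every direct HTTP request
Order Deny,Allow
Deny from all
```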
 If you run the risk of someone accessing your php.ini
or php.cgi files directly through their browsers, you
can limit access to them using .htaccess.
<FilesMatch "^php5?\.(ini|cgi)$">
Order Deny,Allow
Deny from All
Allow from env=REDIRECT_STATUS
</FilesMatch>
 It is possible to ensure that any media files are treated
as a download, rather than played by the browser.
AddType application/octet-stream .zip .mp3 .mp4
 This tells the Apache Web Server to treat .zip, .mp3,
and .mp4 files as downloadable, and should be used
instead of specifying them as audio/video/zip files in
your MIME types section.
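A different way to reach the same goal is to leave the MIME types alone and send a Content-Disposition header instead. This is not from the slides; it is a sketch that assumes mod_headers is available, with the <IfModule> wrapper skipping the block if it is not:

```apache
<IfModule mod_headers.c>
## Ask browsers to save these files rather than play them inline
<FilesMatch "\.(mp3|mp4)$">
Header set Content-Disposition "attachment"
</FilesMatch>
</IfModule>
```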
 Some browsers are capable of uncompressing encoded
information as they receive it.
AddEncoding x-gzip .gz .tgz
AddEncoding x-compress .Z
 This tells the Apache Web Server to treat .gz and .tgz
files as encoded by x-gzip, and .Z files as encoded by
x-compress.
 If you wish, you can use mod_rewrite to deny requests
containing invalid characters. Please be aware that
with certain site setups this may break links.
RewriteEngine On
RewriteBase /
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ [a-zA-Z0-9\.\+_/\-\?\=\&]+\ HTTP/ [NC]
RewriteRule .* - [F,NS,L]
 A regular expression is basically a small piece of code
that checks for patterns. A pattern can match anything
from a single character to absolutely everything.
 There are some predefined 'terms' in regular
expressions:
 [ ] enclose the expression or a portion of the expression.
(Used for determining the characters, or range of
characters to be matched.)
 letter-letter (EG [a-z] matches any single lowercase
alphabetical character in the range of a to z), so [c-e] will
match any single character that is the lowercase letter c,
d, or e.
 LETTER-LETTER (EG [A-Z] matches any single capital
alphabetical character in the range of A to Z), so [C-E]
will match any single character that is the capital letter
C, D, or E.
 number-number (EG [0-9] matches any single number
in the range of 0 to 9), so [4-6] would match any single
number 4, 5, or 6.
 character list (EG [dog123] matches any single
character, either d, o, g, 1, 2, or 3.)
 ^ has two purposes. When used as the first character
inside of [ ] it designates 'not'. (EG [^0-9] would match
any character that is not 0 to 9 and [^abc] would match
any character that is not a lowercase a, b, or c.) When
used at the beginning of a pattern in mod_rewrite, it
designates the beginning of a 'line'.
 It is very important to understand and remember
[dog] does not match the word 'dog', it matches any
individual lowercase letter d, o, or g anywhere in the
comparison. In the same way, [^dog] does not exclude
the word 'dog' from matching, it excludes the
lowercase letter d, o, or g from matching individually.
 To match a 'word' or a group of characters in order, you
do not need to use [] so ^dog$ would match the word
dog, and not d, o, or g as a single character.
 . (a dot) matches any single character, except the
ending of a line.
 ? matches 0 or 1 of the characters or set of characters in
brackets or parentheses immediately before it. (EG a?
would match the lowercase letter 'a' 0 or 1 time, (abc)?
would match the phrase 'abc' 0 or 1 time, while [a-z]?
would match any lowercase letter from 'a to z' 0 or 1
time.)
 + matches 1 or more of the characters or set of
characters in brackets or parentheses immediately
before it. (EG a+ would match the lowercase letter 'a' 1
or more times, (abc)+ would match the phrase 'abc' 1
or more times, while [a-z]+ would match 1 or more
lowercase letters from 'a to z'.)
 * matches 0 or more of the characters or set of
characters immediately before it. (EG a* would match
the lowercase letter 'a' 0 or more times, (abc)* would
match the phrase 'abc' 0 or more times, while [a-z]*
would match 0 or more lowercase letters from 'a to z'.)
 RewriteRule tells the server to interpret the following
information as a rule.
 RewriteCond tells the server to interpret the
following information as a condition of the rule that
immediately follows it.
 ^ defines the beginning of a 'line' (starting anchor).
Remember, ^ also designates 'not' inside [ ] in a regular
expression, so please don't get confused.
 ( ) creates a variable to be stored and possibly used
later, and is also used to group text for use with the
quantifiers ?, +, and * described above.
 $ defines the ending of a 'line' (ending anchor), and
when followed by a number from 1 to 9, also references
a variable defined in the RewriteRule pattern (used for
variables on the right side of the equation or to match
a variable from the rule in a condition, see example
below).
 % references a variable defined in a preceding rewrite
condition. (used for variables on the right side of the
equation only, see example below)
 *note* - The right side of the equation is the
substitution, i.e. everything that follows the pattern
(which usually ends in $) in a RewriteRule.
RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-$2
The final result would look like this:
to-use-variables-type-var1-and-var2
RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^no-var/no-var/no-var$ /to-use-variables-type-%1-and-%2
The final result would look like this:
to-use-variables-type-var1-and-var2
RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)
RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-%2-$2
The final result would look like this:
to-use-variables-type-var1-and-var2-var2
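The placeholder examples above can be made concrete. In this sketch the host example.com and the script view.php are hypothetical names chosen for illustration. The condition captures a word from the Host header (available as %1) and the rule captures a number from the path (available as $1):

```apache
RewriteEngine On
## Capture the site section from the host name, e.g. "shop" in shop.example.com
RewriteCond %{HTTP_HOST} ^([a-z]+)\.example\.com$ [NC]
## $1 is the number captured by the rule, %1 the word captured by the condition
RewriteRule ^item/([0-9]+)$ /view.php?section=%1&id=$1 [L]
```

Under these assumptions, a request for /item/42 on shop.example.com would be rewritten internally to /view.php?section=shop&id=42.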
 ** Do not use the [R] flag if you are trying to make a
'silent' redirect. (EG proxy /outlook mailing)
 Flags, in mod_rewrite are what give you the control of
the response sent by the server when a specific URL is
requested. They are an integral part of the rule writing
process, because they designate any special
instructions that might be needed. (EG If I want to tell
everyone a page is moved permanently, I can add
R=301 to my rule and they will know.)
 Flags follow the rule and are enclosed in [ ]. (Not all
flags are covered here, but the main and most widely
used ones are.)
 [R] stands for Redirect. The default is 302-Temporarily
Moved. This can be set to any number between 300
and 400, by entering it as [R=301] or
[R=YourNumberHere], but 301 (Permanently Moved)
and 302 (Temporarily Moved) are the most common.
 (If you just use [R] this will work, and defaults to 302-
Temporarily Moved)
 [F] stands for Forbidden. Any URL or file that matches
the rule (and condition(s) if present) will return a
403-Forbidden response to anyone who tries to access
it. (Useful for files that you would like to keep
private, or you do not want indexed prior to 'going live'
with them.)
 [G] stands for Gone. (Similar to 404-Not Found, but it
indicates that a resource was intentionally removed.)
Not recommended for use unless you test the HTTP
protocol level used by the client and return 410-Gone
only to HTTP/1.1 or enhanced HTTP/1.0 clients. Older
true HTTP/1.0 clients will treat 410-Gone as 400-Bad
Request.
 [P] stands for Proxy. This creates a type of 'silent
redirect' for files or pages that are not actually part of
your site and can be used to serve pages from a
different host, as though they were part of your site.
(DO NOT mess with copyrighted material, some of us
get very upset.)
 [NC] stands for No Case as applied to letters, so if you
use this on a rule, MYsite.com, will match
mysite.com... even though they are not the same case.
(This can also be used with regular expressions, so
instead of [a-zA-Z], you can use [a-z] and [NC] at the
end of the rule for the same effect.)
 [QSA] stands for Query String Append. This means
the 'query string' (stuff after the ?) should be passed
from the original URL (the one we are rewriting) to the
new URL, in addition to any query string the new URL
already specifies.
 [L] stands for Last rule. As soon as this flag is read, no
other following rules are processed. (Every rule should
contain this flag, until you know exactly what you are
doing.)
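The flags described above can be seen side by side in a short sketch. All file and directory names here (old-page.html, drafts/, retired.html, find.php) are hypothetical, and the rules assume a .htaccess file at the site root with mod_rewrite enabled:

```apache
RewriteEngine On
## [R=301,L]: tell clients the page has moved permanently, then stop
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
## [F]: answer 403 Forbidden for anything under drafts/
RewriteRule ^drafts/ - [F]
## [G]: answer 410 Gone for a page that was removed on purpose
RewriteRule ^retired\.html$ - [G]
## [NC,QSA,L]: ignore case, keep the visitor's query string, then stop
RewriteRule ^search$ /find.php?source=rewrite [NC,QSA,L]
```

A dash as the substitution means "leave the URL unchanged", which is why it appears with [F] and [G], where only the response status matters.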
Expression:
[a-z]+
Explanation: [a-z] matches any single letter. + matches 1 or more of the
previous character or string of characters. When you put the two together
you have a regular expression that matches any single letter from a to z over
and over, until it runs into a character that is not a letter.
Expression:
([a-z]+) [NC]
Explanation: Same as above with the addition of () and [NC]. In
mod_rewrite, () creates a single variable out of the regular expression, so the
word matched is now in a variable. [NC] stands for 'No Case' ( from
mod_rewrite) specifying that the regular expression or regular text strings
match both upper and lowercase letters. With this expression you can match
any single word.
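Placed in an actual rule, the second expression might be used as in this sketch. The script name page.php and its "name" parameter are hypothetical. The ^ and $ anchors are added so that the whole path must be a single word:

```apache
RewriteEngine On
## Match a one-word path such as "About" or "contact", in any letter case,
## and hand the matched word to a script as $1
RewriteRule ^([a-z]+)$ /page.php?name=$1 [NC,L]
```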
 http://www.htaccess-guide.com/
 http://www.twisttraining.com/archive/mn/htaccess-guide.pdf
 http://www.webmasterworld.com/forum92/4332.htm
Thanks & Regards,
Uday Kumar