Saturday, June 27, 2009

DAG repo to yum

You can make yum more useful by adding extra repositories such as DAG, UPDATE and RPMforge. To add an extra repository to yum, do the following.

cd /etc/yum.repos.d
vi dag.repo // then add the following lines to that file //

[dag]
name=Dag RPM Repository for Red Hat Enterprise Linux
baseurl=http://apt.sw.be/redhat/el$releasever/en/$basearch/dag
gpgcheck=1
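
After this, save the file. Because gpgcheck=1 is set, import the repository's GPG key by running the following as a shell command (it is not a line of dag.repo):

rpm --import http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt

Then run the following command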

yum check-update

yum will now have the additional repository available.

Thursday, June 18, 2009

MySQL Tweak[core level]

These my.cnf values run on a dual Xeon with 2 GB of RAM. This is a shared hosting machine that runs both MySQL and a web server, so not all of the memory is allocated to MySQL.
------------------------------------------------
/etc/my.cnf

datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
skip-locking
skip-innodb
query_cache_limit=1M
query_cache_size=32M
query_cache_type=1
max_connections=900
interactive_timeout=100
wait_timeout=100
connect_timeout=10
thread_cache_size=128
#key_buffer=16M
key_buffer=200M
join_buffer=1M
max_allowed_packet=16M
table_cache=1536
sort_buffer_size=1M
read_buffer_size=1M
read_rnd_buffer_size=1M
max_connect_errors=10
# Try number of CPU's*2 for thread_concurrency
thread_concurrency=4
myisam_sort_buffer_size=64M
#log-bin
server-id=1

Query caching was added in MySQL version 4. The following three directives can greatly improve MySQL server performance.

query_cache_limit=1M
query_cache_size=32M
query_cache_type=1

Query caching is a server-wide setting, so set these generously. I have found the above levels are generally best if your server has at least 512 MB of RAM. If you run a server just for databases with a lot of RAM, you can raise these quite a bit, for example a 2M limit and a cache size of 64M or more.

The key buffer is shared among all MySQL clients on the server. A large setting is recommended; it is particularly helpful with tables that have unique keys (most do).

key_buffer=150M

The next set of buffers is allocated per client. It is important to experiment with these and get them right for your machine. With the settings below, every active MySQL client can use close to 3 MB in buffers (1M + 1M + 768K = 2.75 MB), so 100 clients need almost 300 MB. Giving too much to these buffers is worse than giving too little: nothing kills a server quite like memory swapping.

sort_buffer_size=1M
read_buffer_size=1M
read_rnd_buffer_size=768K
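
The per-client arithmetic above can be checked with a short calculation. This is only a sketch in Python: the helper names and the figure of 100 clients are illustrative assumptions, and it models the worst case in which every client uses all three buffers at once.

```python
# Rough worst-case estimate of per-client buffer memory.
# The values mirror the three settings listed above.
KIB = 1024
MIB = 1024 * KIB

per_client_buffers = {
    "sort_buffer_size": 1 * MIB,
    "read_buffer_size": 1 * MIB,
    "read_rnd_buffer_size": 768 * KIB,
}


def per_client_bytes(buffers):
    """Sum the buffers a single busy connection could allocate."""
    return sum(buffers.values())


def total_mib(buffers, clients):
    """Worst case in MiB if every client uses all of its buffers at once."""
    return per_client_bytes(buffers) * clients / MIB


if __name__ == "__main__":
    print(per_client_bytes(per_client_buffers) / MIB)  # MiB per client
    print(total_mib(per_client_buffers, 100))          # MiB for 100 clients
```

Comparing that total against the RAM left over after the key buffer and query cache gives a quick sanity check before raising max_connections.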

The following directive should be set to 2X the number of processors in your machine for best performance.

thread_concurrency=2

Here are a few example configurations for servers running MySQL and web for common memory sizes. These are not perfect, but they are good starting points.

Server with 512MB RAM:

thread_cache_size=50
key_buffer=40M
table_cache=384
sort_buffer_size=768K
read_buffer_size=512K
read_rnd_buffer_size=512K
thread_concurrency=2

Server with 1 GB RAM:

thread_cache_size=80
key_buffer=150M
table_cache=512
sort_buffer_size=1M
read_buffer_size=1M
read_rnd_buffer_size=768K
thread_concurrency=2

########################################################

For optimizing mysql, first we need to know the values of mysql variables and status.
The following are some commands used for this purpose:
# mysqladmin processlist extended-status

or

mysql> show status;
mysql> show variables;

To get a more specific answer, narrow the commands as follows:

mysql> show status like '%Open%_tables';

mysql> show variables like 'table_cache';

1. The most important variables in mysql are table_cache and key_buffer_size

a) Run the above two commands and check Open_tables and Opened_tables. If Opened_tables is large, then your table_cache variable is probably too small.

In that case increase the table_cache variable. Open /etc/my.cnf and change or add table_cache=newvalue

b) Run the following commands to check key_buffer_size, key_read_requests and key_reads

mysql> show variables like '%key_buffer_size%';
mysql> show status like '%key_read%';

If key_reads / key_read_requests is < 0.01, key_buffer_size is enough. Otherwise key_buffer_size should be increased.

Also run the following command to check key_write_requests and key_writes

mysql> show status like '%key_write%';

If key_writes / key_write_requests is not less than 1 (near 0.5 seems to be fine), increase key_buffer_size.

Check the total size of all .MYI files. If it is larger than key_buffer_size, raise key_buffer_size toward the total size of the .MYI files, as far as the memory on the machine allows.
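
The two ratio checks in step b) can be written down as a small calculation. This is a minimal sketch: the function names are hypothetical, the thresholds (0.01 for reads, 1 for writes) are the ones quoted above, and the counter values would be copied by hand from the SHOW STATUS output.

```python
# Hypothetical helpers; the four counters come from
# SHOW STATUS LIKE '%key_read%' and SHOW STATUS LIKE '%key_write%'.

def ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when nothing has been counted yet."""
    return numerator / denominator if denominator else 0.0


def key_buffer_advice(key_reads, key_read_requests, key_writes, key_write_requests):
    """Return (read_ratio, write_ratio, needs_more) using the thresholds in the text."""
    read_ratio = ratio(key_reads, key_read_requests)
    write_ratio = ratio(key_writes, key_write_requests)
    needs_more = read_ratio >= 0.01 or write_ratio >= 1.0
    return read_ratio, write_ratio, needs_more


if __name__ == "__main__":
    # Made-up sample counters, for illustration only.
    print(key_buffer_advice(500, 100000, 40, 80))
```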

2. wait_timeout, max_connections, thread_cache_size

If you want to allow more connections, reduce wait_timeout to 15 seconds and increase max_connections as needed.

Check the number of idle connections. If it is too high, reduce wait_timeout and use thread_cache_size.

thread_cache_size sets how many threads are kept in a cache for reuse. When a client disconnects, the client's threads are put in the cache if there aren't more than thread_cache_size threads from before. All new threads are first taken from the cache, and only when the cache is empty is a new thread created. This variable can be increased to improve performance if you have a lot of new connections. (Normally this doesn't give a notable performance improvement if you have a good thread implementation.) By examining the difference between Connections and Threads_created you can see how efficient the current thread cache is for you.

If Threads_created is large, you may want to increase the thread_cache_size variable. The cache miss rate can be calculated as Threads_created / Connections.
The default thread_cache_size may be 0; if so, increase it to 8.
You may also try this rule of thumb: table_cache = Opened_tables / Max_used_connections
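
As a sketch of the Threads_created / Connections check, the snippet below reads the two counters out of table-style status output and computes the ratio. The sample text and its numbers are made up for illustration, and the function names are hypothetical; paste in the real output of mysqladmin extended-status to use it.

```python
# Made-up sample in the tabular layout printed by mysqladmin extended-status.
SAMPLE = """
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Connections       | 2000  |
| Threads_created   | 50    |
+-------------------+-------+
"""


def parse_status(text):
    """Turn '| Name | Value |' rows into a dict, skipping borders and the header."""
    status = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        if len(parts) != 2 or parts[0] == "Variable_name":
            continue
        status[parts[0]] = parts[1]
    return status


def thread_cache_miss_rate(status):
    """Fraction of connections that needed a brand-new thread."""
    connections = int(status["Connections"])
    created = int(status["Threads_created"])
    return created / connections if connections else 0.0


if __name__ == "__main__":
    print(thread_cache_miss_rate(parse_status(SAMPLE)))
```

A ratio close to 0 means most connections reused a cached thread; a ratio close to 1 means nearly every connection created a new one.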

Monday, June 15, 2009

Hub, Switches, and Routers


Hub
A common connection point for devices in a network. Hubs are commonly used to connect segments of a LAN. A hub contains multiple ports. When a packet arrives at one port, it is copied to the other ports so that all segments of the LAN can see all packets.

Switch
In networks, a device that filters and forwards packets between LAN segments. Switches operate at the data link layer (layer 2) and sometimes the network layer (layer 3) of the OSI Reference Model and therefore support any packet protocol. LANs that use switches to join segments are called switched LANs or, in the case of Ethernet networks, switched Ethernet LANs.

Router
A device that forwards data packets along networks. A router is connected to at least two networks, commonly two LANs or WANs or a LAN and its ISP's network. Routers are located at gateways, the places where two or more networks connect. Routers use headers and forwarding tables to determine the best path for forwarding the packets. They use routing protocols such as RIP, OSPF and BGP to exchange route information with each other and work out the best route between any two hosts, and ICMP to report errors and control conditions.

The Differences Between These Devices on the Network
Today most routers have become something of a Swiss Army knife, combining the features and functionality of a router and switch/hub into a single unit. So conversations regarding these devices can be a bit misleading — especially to someone new to computer networking.

The functions of a router, hub and a switch are all quite different from one another, even if at times they are all integrated into a single device. Let's start with the hub and the switch since these two devices have similar roles on the network. Each serves as a central connection for all of your network equipment and handles a data type known as frames. Frames carry your data. When a frame is received, it is amplified and then transmitted on to the port of the destination PC. The big difference between these two devices is in the method in which frames are being delivered.

In a hub, a frame is passed along or "broadcast" to every one of its ports. It doesn't matter that the frame is only destined for one port. The hub has no way of distinguishing which port a frame should be sent to. Passing it along to every port ensures that it will reach its intended destination. This places a lot of traffic on the network and can lead to poor network response times.

Additionally, a 10/100Mbps hub must share its bandwidth with each and every one of its ports. So when only one PC is broadcasting, it will have access to the maximum available bandwidth. If, however, multiple PCs are broadcasting, then that bandwidth will need to be divided among all of those systems, which will degrade performance.

A switch, however, keeps a record of the MAC addresses of all the devices connected to it. With this information, a switch can identify which system is sitting on which port. So when a frame is received, it knows exactly which port to send it to, without significantly increasing network response times. And, unlike a hub, a 10/100Mbps switch will allocate a full 10/100Mbps to each of its ports. So regardless of the number of PCs transmitting, users will always have access to the maximum amount of bandwidth. It is for these reasons that a switch is considered a much better choice than a hub.
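
The difference in how a hub and a switch deliver frames can be shown with a toy model. This is a simplified sketch, not how any real device is implemented: the class names, port numbers and short MAC strings are made up, and the switch learns addresses from the source of each frame and floods when it has not yet seen the destination, which is the usual learning-switch behaviour.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def __init__(self, num_ports):
        self.num_ports = num_ports

    def forward(self, in_port, src_mac, dst_mac):
        return [p for p in range(self.num_ports) if p != in_port]


class Switch:
    """Learns which MAC address sits on which port and forwards selectively."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port number

    def forward(self, in_port, src_mac, dst_mac):
        # Learn (or refresh) the sender's location.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Destination not seen yet: behave like a hub for this frame.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same port, nothing to forward
        return [out_port]


if __name__ == "__main__":
    hub, switch = Hub(4), Switch(4)
    print(hub.forward(0, "aa", "bb"))     # hub: all other ports, every time
    print(switch.forward(0, "aa", "bb"))  # switch: floods, "bb" not learned yet
    print(switch.forward(1, "bb", "aa"))  # switch: only the port where "aa" lives
    print(switch.forward(0, "aa", "bb"))  # switch: only the port where "bb" lives
```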

Routers are completely different devices. Where a hub or switch is concerned with transmitting frames, a router's job, as its name implies, is to route packets to other networks until that packet ultimately reaches its destination. One of the key features of a packet is that it not only contains data, but the destination address of where it's going.

A router is typically connected to at least two networks, commonly two Local Area Networks (LANs) or Wide Area Networks (WANs), or a LAN and its ISP's network, for example your PC or workgroup and EarthLink. Routers are located at gateways, the places where two or more networks connect. Using headers and forwarding tables, routers determine the best path for forwarding the packets. Routers use routing protocols such as RIP, OSPF and BGP to exchange route information with each other and work out the best route between any two hosts, and ICMP to report errors and control conditions.

Today, a wide variety of services are integrated into most broadband routers. A router will typically include a 4 - 8 port Ethernet switch (or hub) and a Network Address Translator (NAT). In addition, they usually include a Dynamic Host Configuration Protocol (DHCP) server, Domain Name Service (DNS) proxy server and a hardware firewall to protect the LAN from malicious intrusion from the Internet.

All routers have a WAN Port that connects to a DSL or cable modem for broadband Internet service and the integrated switch allows users to easily create a LAN. This allows all the PCs on the LAN to have access to the Internet and Windows file and printer sharing services.

Some routers have a single WAN port and a single LAN port and are designed to connect an existing LAN hub or switch to a WAN. Ethernet switches and hubs can be connected to a router with multiple PC ports to expand a LAN. Depending on the capabilities (kinds of available ports) of the router and the switches or hubs, the connection between the router and switches/hubs may require either straight-thru or crossover (null-modem) cables. Some routers even have USB ports, and more commonly, wireless access points built into them.

Some of the more high-end or business class routers will also incorporate a serial port that can be connected to an external dial-up modem, which is useful as a backup in the event that the primary broadband connection goes down, as well as a built in LAN printer server and printer port.

Besides the inherent protection features provided by the NAT, many routers will also have a built-in, configurable, hardware-based firewall. Firewall capabilities can range from the very basic to quite sophisticated devices. Among the capabilities found on leading routers are those that permit configuring TCP/UDP ports for games, chat services, and the like, on the LAN behind the firewall.

So, in short, a hub glues together an Ethernet network segment, a switch can connect multiple Ethernet segments more efficiently and a router can do those functions plus route TCP/IP packets between multiple LANs and/or WANs; and much more of course.

DNS(Domain Name Service)

Domain Name Service
Host Names

Domain Name Service (DNS) is the service used to convert human readable names of hosts to IP addresses. Host names are not case sensitive and can contain letters, digits, or the hyphen. Avoid the underscore. A fully qualified domain name (FQDN) consists of the host name plus domain name as in the following example:

computername.domain.com

The part of the system sending the queries is called the resolver and is the client side of the configuration. The nameserver answers the queries. Read RFCs 1034 and 1035. These contain the bulk of the DNS information and are supplemented by RFCs 1535-1537. Naming is in RFC 1591. The main function of DNS is the mapping of human readable names to IP addresses.

Three main components of DNS

1. resolver
2. name server
3. database of resource records (RRs)

Domain Name System

The Domain Name System (DNS) is basically a large database that resides on various computers and contains the names and IP addresses of hosts and domains on the internet. The Domain Name System provides the information that the Domain Name Service uses when queries are made. The service is the act of querying the database, and the system is the data structure and the data itself.

The Domain Name System is similar to a file system in Unix or DOS, starting with a root. Branches attach to the root to create a huge set of paths. Each branch in the DNS is called a label, which is the text between the dots in a name. Each label can be up to 63 characters long, but most are shorter, and the total domain name (all the labels) is limited to 255 bytes in overall length.

The domain name system database is divided into sections called zones. A zone is a subtree of DNS and is administered separately. The name servers in their respective zones are responsible for answering queries for their zones. There are usually multiple name servers for a zone: one primary name server and one or more secondary name servers. A name server may be authoritative for more than one zone.
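
The two length limits can be checked mechanically. The sketch below is an illustration with hypothetical function names; it assumes plain ASCII names and counts the 255-byte limit the way a name is encoded in a DNS message, where each label is preceded by one length byte and the name ends with a zero byte for the root.

```python
MAX_LABEL = 63
MAX_NAME_BYTES = 255


def split_labels(name):
    """Split a domain name into labels, ignoring one trailing root dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".")


def wire_length(name):
    """Bytes the name occupies in a message: length byte + label, plus the root byte."""
    return sum(len(label) + 1 for label in split_labels(name)) + 1


def validate_domain_name(name):
    """Return a list of problems; an empty list means the lengths are fine."""
    problems = []
    for label in split_labels(name):
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL:
            problems.append("label longer than 63 characters: " + label[:10] + "...")
    if wire_length(name) > MAX_NAME_BYTES:
        problems.append("name longer than 255 bytes")
    return problems


if __name__ == "__main__":
    print(validate_domain_name("myhost.mycompany.com."))
    print(validate_domain_name("a" * 64 + ".com"))
```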

DNS names are assigned through the Internet Registries by the Internet Assigned Numbers Authority (IANA). The domain name is a name assigned to an internet domain. For example, mycollege.edu represents the domain name of an educational institution. The names microsoft.com and 3Com.com represent the domain names at those commercial companies. Naming hosts within the domain is up to the individuals who administer that domain.

Access to the Domain name database is through a resolver, which may be a program or part of an operating system that resides on users' workstations. In Unix the resolver is accessed by using the library functions "gethostbyname" and "gethostbyaddr". The resolver will send requests to the name servers to return information requested by the user. The requesting computer connects to the name server using its IP address rather than its name.

Structure and message format

The drawing below shows a partial DNS hierarchy. At the top is what is called the root and it is the start of all other branches in the DNS tree. It is designated with a period. Each branch moves down from level to level. When referring to DNS addresses, they are referred to from the bottom up with the root designator (period) at the far right. Example: "myhost.mycompany.com.".

[Figure: Partial DNS Hierarchy]

DNS is hierarchical in structure. A domain is a subtree of the domain name space. From the root, the assigned top-level domains in the U.S. are:

* GOV - Government body.
* EDU - Educational body.
* INT - International organization
* NET - Networks
* COM - Commercial entity.
* MIL - U. S. Military.
* ORG - Any other organization not previously listed.

Outside this list are top level domains for various countries.

Each node on the domain name system is separated by a ".". Example: "mymachine.mycompany.com.". Note that any name ending in a "." is an absolute domain name since it goes back to root.

DNS Message format:

Bits     Name                     Description
0-15     Identification           Used to match responses to requests. Set by client and returned by server.
16-31    Flags                    Tells if query or response, type of query, if authoritative answer, if truncated, if recursion desired, and if recursion is available.
32-47    Number of questions
48-63    Number of answer RRs
64-79    Number of authority RRs
80-95    Number of additional RRs
96-??    Questions                Variable length. There can be variable numbers of questions sent.
??-??    Answers                  Variable length. Answers are variable numbers of resource records.
??-??    Authority                Variable length.
??-??    Additional Information   Variable length.
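
The fixed part of the message (bits 0-95, that is six 16-bit fields) can be packed and unpacked with a few lines of code. This is a minimal sketch using Python's struct module; the function and field names are made up for illustration, and the flag masks used are the query/response bit (0x8000) and the recursion-desired bit (0x0100).

```python
import struct
from collections import namedtuple

# Six unsigned 16-bit fields in network byte order, 12 bytes in total.
HEADER_FORMAT = "!6H"
QR_RESPONSE = 0x8000         # set in responses, clear in queries
RECURSION_DESIRED = 0x0100   # set by a client that wants a recursive lookup

DnsHeader = namedtuple(
    "DnsHeader", "ident flags questions answers authority additional"
)


def build_query_header(ident, recursion_desired=True):
    """Header for a query carrying one question and no resource records."""
    flags = RECURSION_DESIRED if recursion_desired else 0
    return struct.pack(HEADER_FORMAT, ident, flags, 1, 0, 0, 0)


def parse_header(message):
    """Decode the first 12 bytes of a DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    return DnsHeader(*struct.unpack(HEADER_FORMAT, message[:12]))


if __name__ == "__main__":
    header = parse_header(build_query_header(0x1234))
    print(header)
    print("response" if header.flags & QR_RESPONSE else "query")
```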

Question format includes query name, query type and query class. The query name is the name being looked up. The query class is normally 1 for internet address. The query types are listed in the table below. They include NS, CNAME, A, etc.

The answers, authority and additional information are in resource record (RR) format which contains the following.

1. Domain name
2. Type - One of the RR codes listed below.
3. Class - Normally indicates internet data which is a 1.
4. Time to live field - The number of seconds the RR is saved by the client.
5. Resource data length - Specifies the amount of data that follows.
6. Resource data - The data is dependent on its type, such as CNAME, A, NS or others as shown in the table below. If the type is "A" the data is a 4 byte IP address.

The table below shows resource record types:

Type     RR value   Description
A        1          Host's IP address
NS       2          Host's or domain's name server(s)
CNAME    5          Host's canonical name, host identified by an alias domain name
PTR      12         Host's domain name, host identified by its IP address
HINFO    13         Host information
MX       15         Host's or domain's mail exchanger
AXFR     252        Request for zone transfer
ANY      255        Request for all records

Usage and file formats

If a domain name is not found when a query is made, the server may search for the name elsewhere and return the information to the requesting workstation, or return the address of a name server that the workstation can query to get more information. There are special servers on the Internet that provide guidance to all name servers. These are known as root name servers. They do not contain all information about every host on the Internet, but they do provide direction as to where domains are located (the IP address of the name server for the uppermost domain a server is requesting). The root name server is the starting point to find any domain on the Internet.

Name Server Types

There are three types of name servers:

1. The primary master builds its database from files that were preconfigured on its hosts, called zone or database files. The name server reads these files and builds a database for the zone it is authoritative for.
2. Secondary masters can provide information to resolvers just like the primary masters, but they get their information from the primary. Any updates to the database are provided by the primary.
3. Caching name server - It gets all its answers to queries from other name servers and saves (caches) the answers. It is a non-authoritative server.

The caching-only name server generates no zone transfer traffic. A DNS server that can communicate outside of the private network to resolve a DNS name query is referred to as a forwarder.

DNS Query Types

There are three types of queries issued:

1. Recursive - A recursive query forces the server that receives it to find the requested information or to report back to the querier that the information cannot be found.
2. Iterative - An iterative query allows the server to search for the information and pass back the best information it knows about. This is the type that is used between servers. Clients use the recursive query.
3. Reverse - The client provides the IP address and asks for the name. In other queries the name is provided, and the IP address is returned to the client. The reverse lookup zone for the network 192.168.100.0 is "100.168.192.in-addr.arpa".
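
The reverse-lookup naming rule can be sketched in a few lines. The function names below are hypothetical and the first helper only handles /24 networks such as the one in the example; the per-host name comes from the reverse_pointer attribute in Python's standard ipaddress module.

```python
import ipaddress


def reverse_zone_for_24(network):
    """Reverse zone name for an IPv4 /24 network, e.g. '192.168.100.0/24'."""
    net = ipaddress.ip_network(network)
    if net.version != 4 or net.prefixlen != 24:
        raise ValueError("this sketch only handles IPv4 /24 networks")
    octets = str(net.network_address).split(".")
    # Drop the host octet and reverse the remaining three.
    return ".".join(reversed(octets[:3])) + ".in-addr.arpa"


def ptr_name(address):
    """Name that a reverse (PTR) query asks about for a single address."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(reverse_zone_for_24("192.168.100.0/24"))
    print(ptr_name("192.168.100.5"))
```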

Generally (but not always), a server-to-server query is iterative and a client-resolver-to-server query is recursive. You should also note that a server can be queried or it can be the party placing a query. Therefore, a server contains both the server and client functions. A server can transmit either type of query. If it is handed a recursive query from a remote source, it must transmit other queries to find the specified name, or send a message back to the originator of the query that the name could not be found.

DNS Transport protocol

DNS resolvers first attempt to use UDP for transport, then use TCP if UDP fails, for example when the UDP response comes back truncated.

The DNS Database

The DNS is a database, and a database is made up of records. Common resource record types in the DNS database are:

* A - Host's IP address. Address record allowing a computer name to be translated into an IP address. Each computer must have this record for its IP address to be located. These names are not assigned for clients that have dynamically assigned IP addresses, but are a must for locating servers with static IP addresses.
* PTR - Host’s domain name, host identified by its IP address
* CNAME - Host’s canonical name allows additional names or aliases to be used to locate a computer.
* MX - Host’s or domain’s mail exchanger.
* NS - Host’s or domain’s name server(s).
* SOA - Indicates authority for the domain
* TXT - Generic text record
* SRV - Service location record
* RP - Responsible person
* HINFO - Host information record with CPU type and operating system.

When a resolver requests information from the server, the DNS query message indicates one of the preceding types.

DNS Files

* CACHE.DNS - The DNS Cache file. This file is used to resolve internet DNS queries. On Windows systems, it is located in the WINNTROOT\system32\DNS directory and is used to configure a DNS server to use a DNS server on the internet to resolve names not in the local domain.

Example Files

Below is a partial explanation of some records in the database on a Linux based system. The reader should view this information because it explains some important DNS settings that are common to all DNS servers. An example /var/named/db.mycompany.com.hosts file is listed below.

mycompany.com. IN SOA mymachine.mycompany.com. root.mymachine.mycompany.com. (
1999112701 ; Serial number as date and two digit number YYYYMMDDXX
10800 ; Refresh in seconds 10800=3H
3600 ; Retry in seconds 3600=1H
604800 ; Expire in seconds 604800=1 week
86400 ) ; Minimum TTL 86400=24Hours
mycompany.com. IN NS mymachine.mycompany.com.
mycompany.com. IN MX 10 mailmachine.mycompany.com.
mymachine.mycompany.com. IN A 10.1.0.100
mailmachine.mycompany.com. IN A 10.1.0.4
george.mycompany.com. IN A 10.1.3.16

A line-by-line description is as follows:

1. The entries on this line are:
   1. mycompany.com. - Indicates this server is for the domain mycompany.com.
   2. IN - Indicates the Internet class.
   3. SOA - Indicates this server is the authority for its domain, mycompany.com.
   4. mymachine.mycompany.com. - The primary nameserver for this domain.
   5. root.mymachine.mycompany.com. - The person to contact for more information (the mailbox root@mymachine.mycompany.com, with the first dot standing in for the @).
   The values in the parentheses, listed below, are for the secondary nameserver(s), which run as slave(s) to this one (since it is the master).
2. 1999112701 - Serial number - If the slave's serial number is less than the master's, the slave will get a new copy of this file from the master.
3. 10800 - Refresh - The time in seconds between checks in which the slave compares its serial number with the master's.
4. 3600 - Retry - The time in seconds the slave should wait before asking again if the master fails to respond to a refresh check (SOA request).
5. 604800 - Expire - The time in seconds the slave server can keep responding even though it cannot get an updated zone file.
6. 86400 - TTL - The time to live (TTL) in seconds that a resolver will use data received from a nameserver before it will ask for the same data again.
7. This line is the nameserver resource record. There may be several of these if there are slave name servers.

mycompany.com. IN NS mymachine.mycompany.com.

Add any slave server entries below this like:

mycompany.com. IN NS ournamesv1.mycompany.com.
mycompany.com. IN NS ournamesv2.mycompany.com.
mycompany.com. IN NS ournamesv3.mycompany.com.

8. This line indicates the mailserver record.

mycompany.com. IN MX 10 mailmachine.mycompany.com.

There can be several mailservers. The numeric value on the line indicates the preference or precedence for the use of that mail server. A lower number indicates a higher preference. The range of values is from 0 to 65535. To enter more mailservers, enter a new line for each one similar to the nameserver entries above, but be sure to set the preferences value correctly, at different values for each mailserver.
9. The rest of the lines are the name to IP mappings for the machines in the organization. Note that the nameserver and mailserver are listed here with IP addresses along with any other server machines required for your network.

mymachine.mycompany.com. IN A 10.1.0.100
mailmachine.mycompany.com. IN A 10.1.0.4
george.mycompany.com. IN A 10.1.3.16
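
The MX preference rule from item 8 (a lower number means a higher preference, in the range 0 to 65535) amounts to a sort. The following is a small sketch of how a sending mail server would order the candidates; the function name is hypothetical and backupmail.mycompany.com. is an invented second mail server used only for the example.

```python
def order_mail_exchangers(mx_records):
    """Return mail server names, most preferred (lowest value) first.

    mx_records is a list of (preference, host) pairs.
    """
    for preference, host in mx_records:
        if not 0 <= preference <= 65535:
            raise ValueError("preference out of range for " + host)
    return [host for preference, host in sorted(mx_records)]


if __name__ == "__main__":
    records = [
        (20, "backupmail.mycompany.com."),   # hypothetical backup server
        (10, "mailmachine.mycompany.com."),  # the server from the zone file
    ]
    print(order_mail_exchangers(records))
```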

Domain names written with a dot on the end are absolute names which specify a domain name exactly as it exists in the DNS hierarchy from the root. Names not ending with a dot may be a subdomain to some other domain.

Aliases are specified in lines like the following:

mymachine.mycompany.com. IN CNAME nameserver.mycompany.com.
george.mycompany.com. IN CNAME dataserver.mycompany.com.
Linux1.mycompany.com. IN CNAME engserver.mycompany.com.
Linux2.mycompany.com. IN CNAME mailserver.mycompany.com.

When a client (resolver) sends a request, if the nameserver finds a CNAME record, it replaces the requested name with the CNAME, then finds the address of the CNAME value, and returns this value to the client.

A host that has more than one network card which is set to address two different subnets can have more than one address for a name.

mymachine.mycompany.com. IN A 10.1.0.100
                         IN A 10.1.1.100

When a client queries the nameserver for the address of a multi homed host, the nameserver will return the address that is closest to the client address. If the client is on a different network than both the subnet addresses of the multi homed host, the server will return both addresses.

For more information on practical application of DNS, read the DNS section of the Linux User's Guide.

Sunday, June 14, 2009

ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

[root@sylesh ~]# mysql -u root
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

>>disabling password authentication
service mysql stop

Wait until MySQL shuts down. Then run

mysqld_safe --skip-grant-tables &

You will then be able to log in as root with no password.

mysql -uroot mysql

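At the MySQL command line prompt, select the mysql system database (it is already selected if you connected with the command above) and issue the following commands:
use mysql;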

UPDATE user SET password=PASSWORD("abcd") WHERE user="root";
FLUSH PRIVILEGES;
EXIT

/etc/init.d/mysqld restart

At this point your root password is reset to "abcd", MySQL is enforcing privileges again, and you will be able to log in with your new password:

mysql -uroot -p mysql

Friday, June 12, 2009

How to enable SSI On Your Server with .htaccess and XBitHack apache directive

>>>

The notes below demonstrate how to enable SSI on your server using .htaccess.

If you are paying for hosting services you may need to get permission from your host to make sure you are not violating their Terms of Service which could result in you getting the boot! Every decent host supports SSI but double-check to make sure.

To enable SSI, either create a file simply called .htaccess or edit your existing .htaccess file and place the following code in it:

AddType text/html .shtml
AddHandler server-parsed .shtml
Options Indexes FollowSymLinks Includes

Note: to enable SSI for your full web site place the .htaccess in the root directory of your site; to enable it for just a certain directory place the .htaccess file only in that particular directory.

The first line of the code above tells the server that .shtml is a valid extension. The second line adds a handler to all pages with the .shtml extension which tells the server to parse (process) the document for server side includes.

If you prefer you can use a different file extension for your files which you want parsed for server side includes. Simply change the .shtml to .shtm etc. If you also want your .htm documents parsed by the server (so you don't need to rename all your files) simply add the following after the first line of the code above:

AddHandler server-parsed .htm

If you want to use SSI in your default directory page, such as index.shtml you may (but normally won't) need to add the following to the .htaccess file:

DirectoryIndex index.shtml index.htm

This means that index.shtml can be your default page. If this page is not found the server will look for index.htm etc. More on this in the .htaccess guides section.

>>SSI Without .shtml

In order to understand what this use of htaccess can do for you, you have to understand what SSI directives are. (SSI directives are covered in the How To Use Your CGI-BIN page.) You can put an SSI directive tag in your Web page, but that doesn't mean the server will look for it. Looking through an html file for SSI directives is called "parsing", and by default a server doesn't parse every html file. It only parses pages that have a .shtml extension.

Dilemma:

You want to start using SSI directives in your Web pages to call a script or display certain things on the pages. Your host requires that pages with SSI directives have a .shtml extension. However, over time all of your pages have been linked to and indexed by search engines using their current .html extensions. If you change the extensions to comply with your host, a lot of people will start getting 404 errors.

htaccess to the rescue! Certain htaccess statements allow you to tell the server to parse certain pages that don't have a .shtml extension.

If you created the htaccess.txt file above, simply add the statements given below to it and re-ftp/rename it. If you didn't, here are the steps:

1. Use a text editor to create an htaccess.txt file and enter the following statements into it:

AddType text/html .html
AddHandler server-parsed .html

replacing .html with .htm if that's what you are using for your pages.

2. Save the file and ftp it (using ASCII mode) to your Web root directory (or whatever directory your index.html file is in).

3. Rename the htaccess.txt file on the server to .htaccess

4. Try it out by entering a URL for one of the pages that contains an SSI directive and see if it's working.

The above can be thought of as the "directory method" for enabling SSI parsing because all files in the directory with the specified extension will be parsed, including files in any sub-directories. SSI parsing does have a small performance price due to all this parsing. If your site has a lot of traffic and a lot of pages that performance price could add up. What if you have a lot of traffic and a lot of pages but you only have a few files that you want parsed? Then you'd want to use XBitHack which is covered in the next section.

Not all hosts allow you to use a .htaccess file. They have to use an AllowOverride statement in one of the global configuration files. Ask your host, or a potential host, if they allow the use of .htaccess files. If so, also ask if they allow the use of XBitHack. If they say 'No' to the question of htaccess, pleading with them to enable it on your server may work, especially if you sound like you know what you're talking about (which this page will help you to do).

A .htaccess file is a very powerful tool. You can use it to set up password-protected directories, change the way Apache responds to certain events, etc. The flip side of that is that you can really hose things up or give unintended access to visitors if you're not careful. You may want to try out your attempts with .htaccess during low-traffic times on your Website so that any problems can be corrected without affecting too many visitors.

Note also that the very fact that this is a very powerful tool may be reason enough for some hosting services not to allow you to use it. A hosting service sets up multiple "virtual" Web servers so multiple domains can be hosted on a single system (each domain having its own virtual Web server). They do this by adding statements (aka directives) to the main Apache configuration file (named httpd.conf). When they add these virtual server directives they must include the directive to enable htaccess functionality. If you try the above and it doesn't work, chances are good your host doesn't have the htaccess function enabled.

What is XBitHack
----------------

XBitHack (pronounced "X bit hack") is simply one of those htaccess configuration statements mentioned above. If you're not willing to put up with the performance costs of the "directory method" for enabling parsing of non-.shtml pages covered above, think of XBitHack as a "file method". This is because you can specify on a file-by-file basis which non-.shtml files get parsed.

Using XBitHack for this "file method" has two steps:

* turn on XBitHack by adding the statement to your .htaccess file
* "flag" the html pages you want parsed by changing their permissions to something a little out of the ordinary

If you created the htaccess.txt file above, simply add the statement given below to it and re-ftp/rename it to enable XBitHack. If your .htaccess file contains the AddType and AddHandler statements from above, REMOVE THEM. If you didn't create the file earlier, here are the steps to enabling XBitHack:

1. Use a text editor to create an htaccess.txt file and enter the following statement into it:

XBitHack on

2. Save the file and ftp it (using ASCII mode) to your Web root directory (or whatever directory your index.html file is in).

3. Rename the htaccess.txt file to .htaccess

4. CHMOD the page files, and only the page files, that you want parsed (i.e. that will contain SSI directives) to 744 (instead of 644). This is what tells the server to parse the page.

5. Try it out by entering a URL for one of the pages that contains an SSI directive and see if it's working.
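Step 4 can be sketched from a shell prompt, if your host gives you one (most FTP clients offer the same CHMOD operation). This is only an illustration: the page below is a throwaway file in a temporary directory, not one of your real pages. With "XBitHack on", it is the owner-execute bit that flags a page, which is the only difference between 644 and 744.

```shell
# Hypothetical page in a temp directory so the sketch is safe to run anywhere.
workdir=$(mktemp -d)
page="$workdir/index.html"
printf '<!--#echo var="DATE_LOCAL" -->\n' > "$page"

chmod 644 "$page"   # ordinary permissions: the page is not flagged
chmod 744 "$page"   # adds the owner-execute bit, which flags the page

# Show the result; the owner column should now read rwx.
ls -l "$page"

# Confirm the execute bit really is set.
if [ -x "$page" ]; then
    flagged=yes
else
    flagged=no
fi
echo "flagged for parsing: $flagged"
```

Going back to 644 un-flags the page again, so you can switch parsing on and off one file at a time.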

If it doesn't work, check your error log for a message like

XBitHack not allowed here

It is possible that your host allows htaccess but not XBitHack. If you don't find the above error, you'll have to contact your host's technical support operation. However, by knowing what htaccess and XBitHack are, you can ask them intelligent questions regarding your problem. When they realize you know what you are talking about, they will be less likely to feed you a line of BS. Also, don't be surprised if the support person you speak to doesn't know what you are talking about. First-line technical support and sales people are usually entry-level jobs in an organization. If you get the sense they don't know what you are talking about, ask to speak to a more senior support person who does.
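If you have shell access to your error log, a quick search saves reading it by eye. The sketch below is an assumption-heavy illustration: the log location varies from host to host, so it builds a stand-in log containing only the message quoted above. Point the log variable at your real error log instead.

```shell
# Stand-in log so the sketch runs anywhere; replace with your real error log path.
log=$(mktemp)
printf '%s\n' 'XBitHack not allowed here' > "$log"

# Look for the message that means XBitHack is blocked in .htaccess files.
if grep -q 'XBitHack not allowed here' "$log"; then
    result=blocked
    echo "Your host allows .htaccess but not XBitHack."
else
    result=unknown
    echo "No XBitHack error found; time to call technical support."
fi
```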

Wednesday, June 10, 2009

dbmmanage - Manage user authentication files in DBM format (apache binary)

DBM User Authentication

This week, we explain how to store user authentication information in DBM files for faster access when you have thousands of users.

The feature on User Authentication shows how to restrict pages to selected people. We showed how to use the htpasswd program to create the necessary .htpasswd files, and how to create group files to provide more control over the users. We also said that .htpasswd files and group files like this are not very efficient when a large number of users are involved. This is because these are plain text files and for every request in the authenticated area Apache has to read through the file looking for the user. A much faster way to store the user information is to use files in DBM format. This article explains how to create and manage DBM format user authentication files.

What is DBM?

DBM files are a simple and relatively standard method of storing information for quick retrieval. Each item of information stored in a DBM file consists of two parts: a key and a value. If you know the key you can access the value very quickly. The DBM file maintains an 'index' of the keys, each of which points to where the value is stored within the file, and the index is usually arranged such that values can be accessed with the minimum number of file system accesses even for very large numbers of keys.

In practice, on many systems a DBM 'file' is actually stored in two files on the disk. If, for example, a DBM file called 'users' is created, it will actually be stored in files called users.pag and users.dir. If you ever need to rename or delete a DBM from the command line, remember to change both the files, keeping the extensions (.pag and .dir) the same. Some newer versions of DBM only create one file.
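As a rough sketch of renaming a two-file DBM safely, the loop below moves both halves together and keeps the extensions intact. The files here are empty stand-ins created in a temporary directory, and the new name "members" is made up for the example.

```shell
# Empty stand-ins for a real DBM pair, created in a temp directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > users.pag
: > users.dir

old=users
new=members   # hypothetical new name

# Rename both halves, keeping .pag and .dir as they are.
for ext in pag dir; do
    if [ -e "$old.$ext" ]; then
        mv "$old.$ext" "$new.$ext"
    fi
done

ls
```

On a system whose DBM library writes a single file, the loop would find neither extension and do nothing, so check what is actually on disk first.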

Provided the key is known in advance, DBM format files are a very efficient way of accessing information associated with that key. For web user authentication, the key will be the username, and the value will store their (encrypted) password. Looking up usernames and their passwords in a DBM file will be more efficient than using a plain text file when more than a few users are involved. This will be particularly important for sites with lots of users (say, over 10,000) or where there are lots of accesses to authenticated pages.

Preparing Apache for DBM Files

If you want to use DBM format files with Apache, you will need to make sure it is compiled with DBM support. By default, Apache cannot use DBM files for user authentication, so the optional DBM authentication module needs to be included. Note that this is included in addition to the normal user authentication module (which uses plain text files, as explained in the previous article). It is possible to have support for multiple file formats compiled into Apache at the same time.

To add the DBM authentication module, edit your Configuration file in the Apache src directory. Remove the comment from the line which currently says

  # Module dbm_auth_module     mod_auth_dbm.o

To remove the comment, delete the # and the space character at the start of the line. Now update the Apache configuration by running ./Configure, then re-make the executable with make.

However, before compiling you might also need to tell Apache where to find the DBM functions. On some systems this is automatic. On others you will need to add the text -lndbm or -ldbm to the EXTRA_LIBS line in the Configuration file. (Apache 1.2 will attempt to do this automatically if needed, but you might still need to configure it manually in some cases). If you are not sure what your system requires, try leaving it blank and compiling. If at the end of the compilation you see errors about functions such as _dbm_fetch() not being found, try each of these choices in turn. (Remember to re-run ./Configure after changing Configuration). If you still cannot get it to compile, you might have a system where the DBM library is installed in a non-standard directory, or where there is no DBM library available. You could either contact your system administrator, or download and compile your own copy of the DBM libraries.
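If you would rather script the two edits than make them by hand, something like the following might do. It is a sketch only: it works on a two-line stand-in for the Configuration file in a temporary directory, and it assumes your system needs -lndbm (yours may need -ldbm, or nothing at all).

```shell
# Two-line stand-in for the real Configuration file.
workdir=$(mktemp -d)
conf="$workdir/Configuration"
cat > "$conf" <<'EOF'
EXTRA_LIBS=
# Module dbm_auth_module     mod_auth_dbm.o
EOF

# 1. Remove the leading "# " from the dbm_auth_module line only.
sed 's/^# *\(Module dbm_auth_module\)/\1/' "$conf" > "$conf.new" &&
    mv "$conf.new" "$conf"

# 2. Optional: append -lndbm to EXTRA_LIBS (only needed on some systems).
sed 's/^EXTRA_LIBS=\(.*\)$/EXTRA_LIBS=\1 -lndbm/' "$conf" > "$conf.new" &&
    mv "$conf.new" "$conf"

cat "$conf"
```

After editing the real file you would still run ./Configure and make as described above.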

Creating A DBM Users File

For standard (htpasswd) user authentication password files, the program htpasswd is used to add new users and set their passwords. To create and manage DBM format user files, another program from the Apache support directory is used. The program is called dbmmanage and is written in perl, so you will need perl on your system, and it will need to have been compiled with support for the same DBM library you compiled into Apache. If you have only just installed DBM on your system, you might need to re-compile perl to build in DBM support.

This program can be used to create a new DBM file, add users and passwords to it, change passwords, or delete users. To start by creating a new DBM file and adding a user to it, run the command:

  dbmmanage /usr/local/etc/httpd/usersdbm adduser martin hamster

This creates the DBM file /usr/local/etc/httpd/usersdbm (which might actually consist of /usr/local/etc/httpd/usersdbm.dir and /usr/local/etc/httpd/usersdbm.pag), if it does not already exist. It then adds the user 'martin' with password 'hamster'. This command can be used with other usernames and passwords to add more users, or with an existing username to change that user's password. A user can be deleted from the password file with

   dbmmanage /usr/local/etc/httpd/usersdbm delete martin

You can get a list of all the users in the DBM file with

   dbmmanage /usr/local/etc/httpd/usersdbm view

Restricting a Directory

Now that you have a DBM user authentication file with some users in it, you are ready to create an authenticated area. You can restrict a directory either using a section in access.conf or by using a .htaccess file. The feature on user authentication explained how you can set up a basic .htaccess file, using this example:

  AuthName "restricted stuff"
  AuthType Basic
  AuthUserFile /usr/local/etc/httpd/users

  require valid-user

To use DBM files, the only change is to replace the AuthUserFile directive line with

  AuthDBMUserFile /usr/local/etc/httpd/usersdbm

This single change tells Apache that the user file is now in a DBM format, rather than plain text. All the rest of the user authentication setup remains the same (so the authentication type is still Basic, and the syntax of require is the same as before).

Using Groups

Each user can be in one or more "groups", and you can restrict access to people just in a specified group. This makes it possible to manage all your users on your site in a single database, and customise the areas that each can access. The use of DBM files for storing group information is particularly efficient because you can use the same file to store both password and group information.

The dbmmanage command can be used to set group information for users. For example, to add the user "martin" to the group "staff", you would use

  dbmmanage /usr/local/etc/httpd/users adduser martin hamster staff

You can put a user into multiple groups by listing them, separated by commas. For example,

  dbmmanage /usr/local/etc/httpd/users adduser martin hamster staff,admin

Note that dbmmanage has to be told the password as well, and there is no way to set or change group information for a user without knowing their password. This means in practice that dbmmanage is not suitable for managing users in groups, and you will have to write your own management scripts. Some help writing perl to manage DBM files is given later in this article.

After creating a user and group file containing details of which users are in which groups, you can restrict access by these groups. For example, to restrict access to an area to only people in the group staff, you could use:

  AuthName "restricted stuff"
  AuthType Basic
  AuthDBMUserFile /usr/local/etc/httpd/users
  AuthDBMGroupFile /usr/local/etc/httpd/users

  require group staff

Custom Management of DBM Files

The supplied dbmmanage script to manage DBM files is adequate for basic editing, but cannot handle advanced use, such as managing group information. It is also command line driven, while a Web interface might be a better choice in many situations. To do either of these things you will have to write programs to manage DBM files yourself. Using perl this is not too difficult.

As a simple example, say you have an existing .htpasswd file and you want to convert it to a DBM file, putting all the users in a specific group. We will introduce the concepts here, and there is a link below to the completed program for you to download. It will be written in Perl which is quick to write and easy to customise, although the principles of DBM use are the same whatever language is used.

The basic way to look in a DBM file is given here. DBM files are opened in Perl as 'hashed arrays'. The "key" is the user name, and the value is the encrypted password and optionally group information. A simple script to look up all the keys and values in a DBM is:

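  # Open the DBM file (name given without the .pag/.dir extension)
  dbmopen(%DBM, "/usr/local/etc/httpd/usersdbm", 0644) ||
      die "Cannot open file: $!\n";
  # Print every username (key) with its stored value
  while (($key, $value) = each %DBM) {
    print "key=$key, value=$value\n";
  }
  dbmclose(%DBM);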

Note that if the given DBM file does not exist, it will be created. This script will work with both perl 4 and perl 5 (although Perl 5 users might prefer to use the new tie facility instead of dbmopen). To look up a known key you would use:

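  $key = "martin";

  # Open the DBM file (name given without the .pag/.dir extension)
  dbmopen(%DBM, "/usr/local/etc/httpd/usersdbm", 0644) ||
      die "Cannot open file: $!\n";
  # Fetch the value stored for this username, if any
  $value = $DBM{$key};
  if (!defined($value)) {
    print "$key not stored\n";
  } else {
    print "key=$key, value=$value\n";
  }
  dbmclose(%DBM);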

Now we can write a script to convert an htpasswd file into a DBM database, optionally putting each user into one or more groups. The script is htpasswd2dbm.pl, and is used like this:

  cd /usr/local/etc/httpd
  htpasswd2dbm.pl -htpasswd users usersdbm

The -htpasswd option specifies the htpasswd file to be read, and the final argument is the DBM file to create (or add to). To set a group, use the -group argument. For example, to put all the users from this file into the groups admin and staff, use

  htpasswd2dbm.pl -htpasswd users -group admin,staff usersdbm

The program will add users to an existing DBM database, so it can be used to merge multiple htpasswd files. If you give users from different files different groups, you will be able to set up access restrictions on a group-by-group basis, and manage all your users in one database. Note that if there is already a user with the same username in the DBM file it will be overwritten by the new information.

Group information is stored in a DBM file as part of the value. If no group information is stored, the value associated with a username just consists of the encrypted password. To store group information, the encrypted password is followed by a colon, then a list of groups that the user is in, each separated by a comma. So a typical value might look like this:

  E7yT67YGht65:admin,staff

A program written in perl can easily extract the group information, for example:

  $value = $DBM{$key};
  ($enc, $groupfield) = split(/:/, $value);
  @groups = split(/,/, $groupfield);
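If you only want to eyeball a value at the command line, the same split can be sketched in plain shell. This is just an illustration of the value layout described above, using the sample value from the text; the variable names are made up for the example.

```shell
value='E7yT67YGht65:admin,staff'   # sample value from the text above

# Split on colons: encrypted password, group list, then anything that follows.
IFS=: read -r enc groupfield extra <<EOF
$value
EOF

echo "encrypted password: $enc"

# Walk the comma-separated group list.
old_ifs=$IFS
IFS=,
for group in $groupfield; do
    echo "group: $group"
done
IFS=$old_ifs
```

The third variable picks up whatever comes after the groups list, so it stays empty for a value like this one.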

It is also possible to store additional information in the DBM file, by following the groups list with a colon. Apache will ignore any data after a colon following the groups list, so it could be used, for example, to store the real name and contact details for the user, and an expiry date. This could be stored in the DBM like this:

  $DBM{$key} = join(":", $enc, join(",", @groups),
                          $realname, $company, $emailaddr,
                          $expdate);

Keeping all the user information together in a database like this, which Apache can also use for user authentication, can make administering a site with many users simpler.