
Security issues related to the robots.txt file

Did you ever say "I didn't steal any cookies" out of the blue, and then wonder how your mom found out that you had in fact been stealing cookies before dinner?
 
Some of the security issues related to robots.txt can be summarized with the "cookies" example above: by volunteering a denial, you give away the very information that you don't want others to have. In other words, since the robots.txt file is accessible to everyone, it should not be used to hide specific files or directories on your server.
 
For example, if you're trying to stop search engines from indexing a file named "list_of_my_passwords.txt" and a folder with sensitive information named "secrets_folder", adding their full names as follows should be avoided whenever possible:
 
Directory structure:
/list_of_my_passwords.txt
/secrets_folder/
Listing #1
 
robots.txt:
User-agent: *
Disallow: /list_of_my_passwords.txt
Disallow: /secrets_folder/
Listing #2
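
To see why this is risky, here is a minimal Python sketch of how little effort it takes to turn such a robots.txt file into a list of interesting targets. The function name disallowed_paths and the ROBOTS_TXT constant are made up for this illustration; the file content is the one from the listing above, embedded as a string so that nothing is downloaded.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow line excludes nothing
                paths.append(value)
    return paths


# Hypothetical copy of the robots.txt file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /list_of_my_passwords.txt
Disallow: /secrets_folder/
"""

if __name__ == "__main__":
    for path in disallowed_paths(ROBOTS_TXT):
        print(path)
```

Anyone can request the robots.txt file, so every full name listed in it should be treated as published.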
 
Instead, move your sensitive files and directories into a subdirectory and exclude only that subdirectory. As in the following example, excluding a non-specific directory name such as "folder_a" is a better solution, because the robots.txt file then reveals nothing about what the directory contains.
 
New directory structure:
/folder_a/list_of_my_passwords.txt
/folder_a/secrets_folder/
Listing #3
 
New robots.txt:
User-agent: *
Disallow: /folder_a/
Listing #4
 
If you're unable to reorganize your directory structure, yet have a strong need to exclude certain directories from indexes, use only partial names in the robots.txt file. A Disallow entry matches every path that starts with the given characters, so a short prefix is enough. Although this may not be the best solution, it will at least make the full names much harder to guess from the robots.txt file alone. For example, to exclude "secrets_folder" and "list_of_my_passwords.txt", use the following entries (assuming that no other files or directories in the web root start with those characters):
 
robots.txt:
User-agent: *
Disallow: /se
Disallow: /li
Listing #5
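
The effect of these partial names can be checked with Python's standard urllib.robotparser module. The following is only a sketch: the helper is_allowed and the extra paths "/search.html" and "/index.html" are assumptions added for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The partial-name rules from the listing above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /se
Disallow: /li
"""


def is_allowed(path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a generic robot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


if __name__ == "__main__":
    # "/search.html" and "/index.html" are hypothetical paths.
    for path in ("/secrets_folder/", "/list_of_my_passwords.txt",
                 "/search.html", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Both sensitive names are blocked, but so is the hypothetical "/search.html", because it also starts with "/se". This is why the partial names only work as intended when nothing else in the web root shares the same starting characters.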
 
     Do not use the robots.txt file to protect or hide information.
 
Misunderstanding and misuse of any technology or tool, including paperclips, can be a security risk. Use the robots.txt file only for what it was intended to do: suggesting to robots how to index the content on your web server. Well-behaved robots follow these suggestions, but nothing forces a robot or a person to do so. If you really want to protect data on your web servers, you must use other security methods, such as not using default pages for sensitive directories, removing "allow directory browsing" attributes, password protection, or even a firewall, depending on the desired level of security.
 
 
 
 
 
Copyright © 2009 Chami.com. All Rights Reserved.