Why should you help out web bots that are trying to scrape your data? Most site owners try to make their websites easy for search engines to crawl, but search engines usually only care about the words on a page and cache it whole. Other bots look for specific data based on patterns they can find. These bots tend to get a bad reputation, since you mainly hear about them when one gets in trouble for harvesting email addresses that users leave openly on a website. Some site owners go overboard and lock away all sorts of information that bots (including Google) would love to have, such as user pages holding post counts, game scores, or other useful statistics.
Making your HTML readable for web bots is generally easy. Most of the time you are already iterating over the data and outputting it in a form bots can traverse. Use plenty of id attributes to clearly identify the parts of the page. This is especially important when you are not iterating over data, such as on a user page where you run one query at the top and place the results throughout the markup. If no part of the user profile has an id or other distinguishing feature, it can be difficult for bots to find the right data. Also, keep values separate from their labels: putting both in one tag makes parsing difficult, while separate tags under the same parent are ideal.
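To illustrate why per-field ids help, here is a minimal sketch using Python's standard-library `html.parser`. The markup, ids (`post-count`, `game-score`), and profile fields are hypothetical examples of the pattern described above, not from any real site:

```python
from html.parser import HTMLParser

# Hypothetical profile markup following the article's advice: each value
# has its own id, and labels live in separate sibling tags.
PROFILE_HTML = """
<div id="user-profile">
  <span class="label">Posts</span><span id="post-count">128</span>
  <span class="label">Score</span><span id="game-score">4200</span>
</div>
"""

class IdScraper(HTMLParser):
    """Collects the text inside any tag that carries an id attribute."""
    def __init__(self):
        super().__init__()
        self.current_id = None
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # Remember the id of the tag we just entered (None if it has no id).
        self.current_id = dict(attrs).get("id")

    def handle_data(self, data):
        if self.current_id and data.strip():
            self.values[self.current_id] = data.strip()

    def handle_endtag(self, tag):
        self.current_id = None

scraper = IdScraper()
scraper.feed(PROFILE_HTML)
print(scraper.values)  # → {'post-count': '128', 'game-score': '4200'}
```

Because each value carries its own id, the bot never has to guess which number belongs to which label; without the ids, it would be left matching adjacent text nodes by position.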
Web APIs are also a great way for bots to use your data. Most sites already have RSS feeds that bots and users can consume freely. Some sites even let bots use their site search engine, while others ban that due to possible abuse. Because APIs like these are easy to parse and send less data over the wire, they help both the website and the bot. If you build an API, make sure all of the important information can be obtained through it.
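As a sketch of how little work a structured feed demands of a bot, here is an RSS 2.0 snippet parsed with Python's standard-library `xml.etree.ElementTree`. The feed contents and URLs are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed a site might serve; titles and links are
# made up for this example.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <item>
      <title>First Post</title>
      <link>http://example.com/posts/1</link>
    </item>
    <item>
      <title>Second Post</title>
      <link>http://example.com/posts/2</link>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(RSS)
# Every entry is a predictable <item> element, so extraction is one line.
items = [(i.findtext("title"), i.findtext("link")) for i in root.iter("item")]
for title, link in items:
    print(title, "->", link)
```

Compare this with scraping the same list out of a rendered HTML page: the feed version needs no guesswork about layout, and the site serves far fewer bytes per request.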
(Source: Skynet Solutions)
By Blaine Schmeisser