Have you ever wished you could have an RSS feed for sites that don’t have RSS feeds? Well, I can’t give you that. However, I can give you an RSS feed that tells you when any particular page is updated. All you have to do is use my new script “Update Checker” (which REALLY needs a NEW name).
This script merely uses md5() on the contents of the page to determine if it’s changed. This means that, if anything on the page changes, you’ll be notified. If the time is displayed (in something other than JavaScript), if there are comments on the page (that aren’t provided by a JavaScript include), if there is a constantly updating list of weblogs.com pings (again, not provided by JavaScript), or if there is any other information on the page that updates even when the page author has not added any new content, it will be counted as an update. This isn’t exactly desirable. However, without inventing a scraper for each site (and updating it when the author changes their layout), there aren’t many other ways.
However, if the page you want to monitor doesn’t contain that kind of constantly changing content, then this tool may be for you.
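To make the md5() comparison described above concrete, here is a minimal sketch in Python. It is not the script’s actual source; the function names page_fingerprint and has_changed are made up for illustration, and it assumes the previous hash is stored somewhere between checks.

```python
import hashlib


def page_fingerprint(body: bytes) -> str:
    """Return the MD5 hex digest of the raw page contents."""
    return hashlib.md5(body).hexdigest()


def has_changed(previous_fingerprint: str, body: bytes) -> bool:
    """True if the page body no longer matches the stored fingerprint.

    Any byte-level difference counts, which is why a server-side
    timestamp or a new comment shows up as an "update".
    """
    return page_fingerprint(body) != previous_fingerprint
```

Something like has_changed(stored_hash, new_body) would then decide whether a new item goes into the feed. Note that two pages differing only by a printed clock time produce different hashes, which is exactly the limitation mentioned above.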
It supports conditional GET on both sides of the communication. If the site you’re trying to check supports conditional GET, the script will recognize that and save bandwidth on both ends. If your RSS reader supports conditional GET, the script will recognize that too and save even more bandwidth.
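Here is a rough sketch of what both sides of conditional GET could look like. It assumes the standard If-None-Match and If-Modified-Since request headers; which validators the script really uses isn’t stated here, and the function names are hypothetical.

```python
def conditional_request_headers(etag=None, last_modified=None):
    """Headers to send to the monitored site, based on what it gave us last time."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def feed_response_status(request_headers, feed_etag):
    """Status to send back to the RSS reader asking for the feed.

    304 means "you already have the current feed", so no body is sent.
    """
    if request_headers.get("If-None-Match") == feed_etag:
        return 304
    return 200
```

The first function covers the script acting as a client of the monitored site, the second covers it acting as a server for your RSS reader. In both directions a 304 response carries no body, which is where the bandwidth saving comes from.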
The script will also cache the page being checked in order to lessen the bandwidth blow. If the site being checked supports conditional GET, it will be cached for 5 minutes. If it does not, it will be cached for 30 minutes. If the site being checked is broken for some reason (500 error, 404 error, etc.), another attempt to retrieve it won’t be made for 12 hours. These times may be altered to provide the best performance and the most flexibility.
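The caching rules above boil down to picking one of three waiting periods. A small sketch, assuming that any status of 400 or higher counts as “broken” (the constant and function names are illustrative, not taken from the script):

```python
from datetime import datetime, timedelta

# Waiting periods taken from the description above.
CACHE_WITH_CONDITIONAL_GET = timedelta(minutes=5)
CACHE_WITHOUT_CONDITIONAL_GET = timedelta(minutes=30)
RETRY_AFTER_ERROR = timedelta(hours=12)


def next_check(fetched_at, status, supports_conditional_get):
    """Return the earliest time the monitored page should be fetched again."""
    if status >= 400:  # assumption: 4xx and 5xx both mean "broken"
        return fetched_at + RETRY_AFTER_ERROR
    if supports_conditional_get:
        return fetched_at + CACHE_WITH_CONDITIONAL_GET
    return fetched_at + CACHE_WITHOUT_CONDITIONAL_GET
```

Sites that support conditional GET get the shortest interval because re-checking them is cheap: most of the time the answer is just a 304 with no body.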
The script also supports HTTP redirects (both 301 and 302). In the event of a permanent redirect, the feed itself will notify you, the reader, that the URL has moved. Additionally, if the site being monitored is broken for some reason, the feed will also note that and let you, the reader, know when it will try to retrieve it again. The script also does its best to fix broken URLs (missing end slash, no http:// provided, etc.).
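As an illustration of the URL fixing mentioned above, here is a minimal sketch covering just the two cases named: a missing http:// and a missing end slash. It assumes “missing end slash” means a bare hostname with no path, and fix_url is a hypothetical name; the real script may handle more cases.

```python
from urllib.parse import urlsplit, urlunsplit


def fix_url(raw: str) -> str:
    """Best-effort cleanup of a URL typed in by hand."""
    url = raw.strip()
    # No scheme given: assume plain HTTP.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # A bare hostname gets the root path, e.g. example.com -> example.com/
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this, fix_url("example.com") comes back as "http://example.com/", while a URL that is already complete passes through unchanged.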
I’ve put up a VERY UGLY page that will allow you to enter the URL and the NAME of the site you’d like to check. It will provide you with the URL to use as the RSS feed. I’ll make the page look nicer later on.
So give it a shot and let me know if you like it. If it tells you a page has updated when it hasn’t, let me know so I can figure out why. And if you can think of a better name, by all means, let me know. Additionally, let me know if you can think of any improvements. Later today I’ll release the source so you can see how it works.
Remember, this isn’t a perfect solution. It’s merely a way of getting around the limitations of other people’s sites.