revjim.net

May 9th, 2003:

URI issues: solved

I think it was Mart’s comment that really pushed this idea into my head. Inklog will be a CMS first, and a weblog second. The goal here is to make everything in the system accessible, and easily accessible, via the same methods. It isn’t to turn everything into a weblog entry. Therefore, ID-numbered URLs are out. Every item in the CMS will have a name that is unique within its path. If a document is moved and a redirect is desired for old links, a redirect must be placed into the system. An entire category/path can be redirected if needed. If a weblog entry is posted, it should be titled as such (e.g. /weblog/20030509_Inklog). The module that handles weblog entries CAN provide predated and ID-numbered titles if that is desired by the user. However, the ID number will not be part of the Node system and will be generated independently.
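To make the redirect idea concrete, here is a rough sketch in Python of how a single moved document and an entire redirected category might both be looked up. None of this is Inklog code: the `REDIRECTS` table, the `/journal` and `/weblog/old_entry` paths, and the `resolve_redirect` function are all made up for illustration.

```python
from typing import Optional

# Hypothetical redirect table: old path (or old category) -> new location.
REDIRECTS = {
    "/weblog/old_entry": "/weblog/20030509_Inklog",  # one moved document
    "/journal": "/weblog",                           # an entire category/path
}


def resolve_redirect(path: str, redirects: dict = REDIRECTS) -> Optional[str]:
    """Return the redirect target for path, or None if no redirect applies."""
    # Treat "/journal/" and "/journal" as the same item.
    path = path.rstrip("/") or "/"

    # An exact match (a single moved document or category root) wins.
    if path in redirects:
        return redirects[path]

    # Otherwise walk up the path, longest parent first, looking for a
    # redirected category and re-attaching the remainder of the path.
    parts = path.split("/")
    for i in range(len(parts) - 1, 1, -1):
        prefix = "/".join(parts[:i])
        if prefix in redirects:
            return redirects[prefix] + "/" + "/".join(parts[i:])
    return None
```

With this table, `resolve_redirect("/journal/20030101_Hello")` gives `/weblog/20030101_Hello`, while a path with no redirect anywhere above it gives `None` and would be served normally.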

Thank you to those of you who offered your comments and suggestions.

UpdateChecker: RSS based update notification

Have you ever wished you could have an RSS feed for sites that don’t have RSS feeds? Well, I can’t give you that. However, I can give you an RSS feed that tells you when any particular page is updated. All you have to do is use my new script “Update Checker” (which REALLY needs a NEW name).

This script merely uses md5() on the contents of the page to determine if it’s changed. This means that, if anything on the page changes, you’ll be notified. If the time is displayed (in something other than JavaScript), if there are comments on the page (that aren’t provided by a JavaScript include), if there is a constantly updating list of weblogs.com pings (again, not provided by JavaScript), or if any other information on the page updates even when the page author has not added any new content, it will be counted as an update. This isn’t exactly desirable. However, without inventing a scraper for each site (and updating it when the author changes their layout), there aren’t many other ways.
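The check described above boils down to something like the following. This is a Python sketch of the idea rather than the script itself, and the function names `page_fingerprint` and `has_changed` are invented for this example.

```python
import hashlib
from typing import Optional, Tuple


def page_fingerprint(body: bytes) -> str:
    """Return the MD5 hex digest of the raw page contents."""
    return hashlib.md5(body).hexdigest()


def has_changed(body: bytes, previous: Optional[str]) -> Tuple[bool, str]:
    """Compare the page against the fingerprint stored on the last check.

    Returns (changed, current_fingerprint). A page that has never been
    checked before (previous is None) counts as changed.
    """
    current = page_fingerprint(body)
    return current != previous, current
```

Because the digest covers every byte, two copies of a page that differ only in a displayed clock time produce different fingerprints, which is exactly the false-positive case described above.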

However, if the page you want to monitor works within these limitations, then this tool may be for you.

It supports conditional GET on both sides of the communication. If the site you’re trying to check supports conditional GET, the script will recognize that and save bandwidth on both ends. If your RSS reader supports conditional GET, the script will recognize that too and save even more bandwidth.
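For anyone unfamiliar with conditional GET, here is a simplified Python sketch of both halves. The header names (`If-None-Match`, `If-Modified-Since`) and the 304 status come from HTTP itself; the function names are hypothetical, and comparing the validators as plain strings is a shortcut I'm assuming for brevity (a real server would parse dates and handle lists of ETags).

```python
from typing import Optional


def upstream_request_headers(etag: Optional[str] = None,
                             last_modified: Optional[str] = None) -> dict:
    """Headers to send to the monitored site, built from the validators
    (ETag / Last-Modified) it returned on the previous fetch, if any."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def upstream_unchanged(status: int) -> bool:
    """A 304 Not Modified reply means the site sent no body at all."""
    return status == 304


def feed_status(reader_headers: dict, feed_etag: str,
                feed_last_modified: str) -> int:
    """Status to send back to the RSS reader: 304 if its copy of the feed
    is still current, otherwise 200 along with the full feed."""
    if_none_match = reader_headers.get("If-None-Match")
    if if_none_match is not None:
        # When an ETag is supplied it takes precedence over the date.
        return 304 if if_none_match == feed_etag else 200
    if reader_headers.get("If-Modified-Since") == feed_last_modified:
        return 304
    return 200
```

The first function covers the script talking to the site being checked, and `feed_status` covers the RSS reader talking to the script. In both directions the saving comes from the 304 reply carrying no body.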

The script will also cache the page being checked in order to lessen the bandwidth blow. If the site being checked supports conditional GET, it will be cached for 5 minutes. If it does not, it will be cached for 30 minutes. If the site being checked is broken for some reason (500 error, 404 error, etc.), another attempt to retrieve it won’t be made for 12 hours. These times may be altered to provide the best performance and the most flexibility.
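Those three cache times amount to a small policy function. The sketch below is in Python with made-up names (`next_check` and the `*_TTL` constants), and it assumes that "broken" means any HTTP status of 400 or above.

```python
from datetime import datetime, timedelta

# Cache lifetimes as described above; adjust to taste.
CONDITIONAL_GET_TTL = timedelta(minutes=5)   # site supports conditional GET
PLAIN_TTL = timedelta(minutes=30)            # site does not
ERROR_TTL = timedelta(hours=12)              # site is broken (404, 500, ...)


def next_check(fetched_at: datetime, status: int,
               supports_conditional_get: bool) -> datetime:
    """Return the earliest time the page should be fetched again."""
    if status >= 400:  # assumption: any 4xx/5xx reply counts as broken
        return fetched_at + ERROR_TTL
    if supports_conditional_get:
        return fetched_at + CONDITIONAL_GET_TTL
    return fetched_at + PLAIN_TTL
```

Sites that support conditional GET get the short lifetime because re-checking them is cheap: an unchanged page costs only a 304 reply with no body.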

The script also supports HTTP redirects (both 301 and 302). In the event of a permanent redirect, the feed itself will notify you, the reader, that the URL has moved. Additionally, if the site being monitored is broken for some reason, the feed will also note that and let you, the reader, know when it will try to retrieve it again. The script also does its best to fix broken URLs (missing end slash, no http:// provided, etc.).
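As an illustration of the URL fixing and redirect handling, here is a Python sketch under a few assumptions of my own: "missing end slash" is taken to mean a bare host name with no path, a 301 is assumed to replace the stored URL while a 302 is followed without replacing it, and the names `fix_url` and `handle_redirect` are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def fix_url(raw: str) -> str:
    """Repair the two problems mentioned above: no scheme, no end slash."""
    url = raw.strip()
    if "://" not in url:
        url = "http://" + url          # no http:// provided
    scheme, netloc, path, query, fragment = urlsplit(url)
    if path == "":
        path = "/"                     # bare host name: add the end slash
    return urlunsplit((scheme, netloc, path, query, fragment))


def handle_redirect(status: int, stored_url: str,
                    location: str) -> Tuple[str, str, Optional[str]]:
    """Return (url_to_fetch, url_to_store, notice_for_feed)."""
    if status == 301:
        # Permanent: remember the new address and tell the reader.
        notice = f"{stored_url} has moved permanently to {location}"
        return location, location, notice
    if status == 302:
        # Temporary: follow it this time but keep the original address.
        return location, stored_url, None
    return stored_url, stored_url, None
```

For example, `fix_url("revjim.net")` gives `http://revjim.net/`. A path such as `/weblog` is left alone, since there is no reliable way to tell a directory from a document by its name.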

I’ve put up a VERY UGLY page that will allow you to enter the URL and the NAME of the site you’d like to check. It will provide you with the URL to use as the RSS feed. I’ll make the page look nicer later on.

So give it a shot and let me know if you like it. If it tells you a page has updated when it hasn’t, let me know so I can figure out why. And if you can think of a better name, by all means, let me know. Additionally, let me know if you can think of any improvements. Later today I’ll release the source so you can see how it works.

Remember, this isn’t a perfect solution. It’s merely a way of getting around the limitations of other people’s sites.