revjim.net

February, 2003:

GoFoR the aggregator: it’s getting close

For those of you (and you and you) who’ve been waiting patiently for a public release of GoFoR the aggregator, have no fear: it will be soon (read: within the next 20 years).

Today I made some modifications to keep the aggregator from temporarily dropping all the posts from a particular feed when that feed can’t be reached on a given run. There are a few more things I’d like to do before it’s ready to release, however.
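The fix described above amounts to keeping the last successfully fetched items whenever a fetch fails. Here is a minimal Python sketch. It assumes a hypothetical `fetch` callable that raises `OSError` when a feed is unreachable (`urllib`’s `URLError` is a subclass), and it uses an in-memory dictionary in place of whatever storage the aggregator actually uses:

```python
from typing import Callable, Dict, List

# Hypothetical store: feed URL -> list of item dicts from the last good fetch.
ItemList = List[dict]


def refresh_feed(url: str,
                 fetch: Callable[[str], ItemList],
                 store: Dict[str, ItemList]) -> ItemList:
    """Fetch a feed, falling back to the previously stored items if the fetch fails."""
    try:
        items = fetch(url)
    except OSError:
        # Feed unreachable on this run: keep showing what we had last time
        # instead of dropping every post from this feed.
        return store.get(url, [])
    # Successful fetch: this becomes the new "last good" copy.
    store[url] = items
    return items
```

A feed that has never been fetched successfully simply yields an empty list until it can be reached.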

First of all, I’d like an “add Feed” page. As it stands, adding a feed to the aggregator requires manually appending a line to a file on the server. I’d like a user to be able to simply fill in a box with the URL of the feed she wishes to add, alter a few preferences for that feed if she doesn’t like the defaults, and then see that feed in the aggregator. However, allowing additions via the web requires an authentication interface, which adds another layer of complexity.

I’d also like the “add Feed” page to be able to auto-discover the URL of an RSS feed. This way, merely entering the URL to the front page of a news source will generally be enough to find the feed and add it to the system. With this feature in place, a bookmarklet could be created to make adding Feeds even easier.
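Auto-discovery usually relies on the convention of advertising a feed in the page’s `<head>` with a `<link rel="alternate" type="application/rss+xml" href="...">` tag. The Python sketch below uses only the standard library’s `html.parser`. The class and function names are hypothetical, and the set of accepted MIME types is an assumption:

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds via <link rel="alternate">.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/rdf+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values from <link rel="alternate" type="<feed type>"> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        if ("alternate" in rels
                and attr.get("type", "").lower() in FEED_TYPES
                and attr.get("href")):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html: str, base_url: str) -> List[str]:
    """Return every feed URL advertised by the given HTML page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```

Relative `href` values are resolved against the page URL. For example, a page at `http://revjim.net/` that advertises a hypothetical `/index.rdf` yields `http://revjim.net/index.rdf`.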

The aggregator currently has a “bookmark” feature. This makes GoFoR remember where the user last left off reading, so that it can visually indicate when she has reached posts she’s already seen. However, this feature currently works by storing the position in a variable in the URL. With the authentication system in place, GoFoR should instead store this information on the server so that the “bookmark state” is maintained from browser to browser, whether the user is at home or at work.
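Moving the bookmark to the server could be as simple as keeping a per-user record of the last-read time. The sketch below makes two assumptions: the bookmark is a timestamp, and it lives in a hypothetical JSON file keyed by user name. The real GoFoR bookmark may work differently:

```python
import json
import os
from typing import Optional


class BookmarkStore:
    """Keeps each user's "last read" timestamp on the server in a JSON file.

    The file layout ({"username": timestamp}) is a hypothetical choice for
    illustration; any per-user store would do.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def get(self, user: str) -> Optional[float]:
        return self._load().get(user)

    def set(self, user: str, timestamp: float) -> None:
        data = self._load()
        data[user] = timestamp
        # Write to a temporary file, then rename, so a crash can't leave a half-written file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)


def is_new(post_timestamp: float, bookmark: Optional[float]) -> bool:
    # With no bookmark yet, everything counts as new.
    return bookmark is None or post_timestamp > bookmark
```

Because the bookmark is looked up by user name rather than read from the URL, any browser the user logs in from sees the same state.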

GoFoR currently uses feedParser to actually pick out the interesting tidbits of information. Unfortunately, feedParser requires the document to be basically (there are a few exceptions) well-formed. I’d rather the parser be a bit more liberal, doing the best that it can in the event of a malformed RSS feed. This, however, brings lots of new problems to the table. Unless this parser is capable of knowing whether the corrections it has made are appropriate, it is possible that the user will miss posts in the feed. This happens only in a very specific circumstance, so it might not even be worth worrying about.

If a feed is “correct” when it is retrieved, its items will be added to GoFoR’s list of posts. However, if the feed becomes malformed before the user gets a chance to view the most recent updates, and the parser’s attempts to correct it cause it to miss a few items, then those items will disappear from the list. Let’s assume that the user then views the GoFoR list while it is in this state. She will reset the bookmark when she is finished. Between this and her next visit, let’s say the malformed feed gets corrected. GoFoR will see that the missing items have already been parsed and therefore will not present them to the user as “new” items when she returns, even though she never saw them.

I know that’ll most likely never occur, but I’d prefer it not be an issue at all. I’m not entirely sure how to avoid it, however.
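For the more liberal parser, one common approach is to try a strict XML parse first and fall back to pattern matching only when that fails. The sketch below assumes plain RSS 0.9x/2.0 `<item>` elements without namespaces. (RSS 1.0 puts items in a namespace, which `root.iter("item")` would not match.) It uses regular expressions as a crude stand-in for a real lenient parser. It also reports which path was taken, so a caller can tell results produced by guesswork from results produced by a clean parse:

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Lenient fallback: grab <item>...</item> blocks with a regex.
ITEM_RE = re.compile(r"<item\b[^>]*>(.*?)</item>", re.S | re.I)


def _field(block: str, name: str) -> str:
    """Pull the text of <name>...</name> out of a raw item block, or return ""."""
    m = re.search(r"<{0}\b[^>]*>(.*?)</{0}>".format(name), block, re.S | re.I)
    return m.group(1).strip() if m else ""


def parse_items(xml_text: str) -> Tuple[List[dict], bool]:
    """Return (items, was_well_formed)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed feed: do the best we can with pattern matching.
        items = [{"title": _field(b, "title"), "link": _field(b, "link")}
                 for b in ITEM_RE.findall(xml_text)]
        return items, False
    items = [{"title": (i.findtext("title") or "").strip(),
              "link": (i.findtext("link") or "").strip()}
             for i in root.iter("item")]
    return items, True
```

For example, an unescaped `&` in a title makes the strict parse fail, and the fallback still recovers that item’s title and link.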

Currently, GoFoR uses the link tag to decide whether an item is new. This means that if an RSS author changes the location of her permalinks, all of those items appear to be new to the GoFoR user. It also means that if a post is updated, the item is not flagged as “new” again. However, if a link tag is not available, GoFoR does what I believe to be the right thing by showing the user these posts again when they change. I think the best solution would be to show changed items again regardless of the presence of the link tag, and, if the link tag (or guid tag) is present and GoFoR has seen that URL before, visually mark the item as “updated”. This is actually fairly easy to do.
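The proposed rule could look something like the sketch below. Items are identified by `guid`, then `link`, and, when neither exists, by a hash of their content. A content hash is kept for each identifier, so a changed item with a known URL can be marked “updated”. The field names and the choice of SHA-1 over title and description are assumptions made for illustration:

```python
import hashlib
from typing import Dict


def content_hash(item: dict) -> str:
    """Fingerprint the parts of an item a reader would notice changing."""
    text = (item.get("title", "") + "\0" + item.get("description", "")).encode("utf-8")
    return hashlib.sha1(text).hexdigest()


def item_key(item: dict) -> str:
    """Prefer guid, then link; fall back to the content hash itself."""
    return item.get("guid") or item.get("link") or "content:" + content_hash(item)


def classify(item: dict, seen: Dict[str, str]) -> str:
    """Return "new", "updated", or "unchanged", and record the item in `seen`.

    `seen` maps an item key to the content hash recorded last time.
    """
    key = item_key(item)
    digest = content_hash(item)
    previous = seen.get(key)
    seen[key] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"
```

Items without a `guid` or `link` get a new identifier whenever their content changes, so they show up as new again, which matches the behavior described above.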

GoFoR currently stores cached copies of all the feeds it downloads. This allows it to use Conditional GET when the remote server supports it. It also keeps files that hold metadata for items, and files that contain the final parsed data (so it doesn’t have to parse the XML file again when the data hasn’t changed). To save disk space, GoFoR could instead store only the response headers needed to perform Conditional GET, along with the time the file was last updated, in the metadata store. This would remove the need to save copies of the feeds themselves. My current feed cache occupies about 2MB of space.

Regardless, a garbage collection routine is needed. If a feed is removed from GoFoR, the cached feeds, metadata, and parsed data files currently remain indefinitely. Keeping the metadata around longer than the cached XML files and the cached parsed data is a good idea, just in case the user ever re-adds that feed. However, keeping it around forever is a little overkill.
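Storing just the validators might look like the following Python sketch. `ETag`/`If-None-Match` and `Last-Modified`/`If-Modified-Since` are the standard HTTP header pairs behind Conditional GET. The `meta` dictionary layout is a hypothetical stand-in for GoFoR’s metadata store, and the time of the last update is left out for brevity:

```python
import urllib.request
from typing import Optional, Tuple
from urllib.error import HTTPError


def conditional_headers(meta: dict) -> dict:
    """Build request headers from the validators saved after the last fetch."""
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def remember_validators(meta: dict, response_headers) -> dict:
    """Keep only the response headers needed for the next conditional GET."""
    updated = dict(meta)
    for header, key in (("ETag", "etag"), ("Last-Modified", "last_modified")):
        value = response_headers.get(header)
        if value:
            updated[key] = value
    return updated


def fetch_if_changed(url: str, meta: dict) -> Tuple[Optional[bytes], dict]:
    """Return (body, new_meta); body is None when the server answers 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(meta))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read(), remember_validators(meta, response.headers)
    except HTTPError as err:
        if err.code == 304:
            # Unchanged: reuse the previously parsed data; no cached XML copy is needed.
            return None, meta
        raise
```

`urlopen` reports a 304 response by raising `HTTPError`, which is why the not-modified case is handled in the `except` clause.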

When GoFoR adds a new feed, all the items in that feed show up as new items. I’d like to give users the option to add a feed and mark the existing items as “read”. This way, the user doesn’t have to see a bunch of “new” items in her list when they really aren’t new at all. This is also fairly easy to do.
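Adding a feed with its existing items marked as read just means recording those items as seen before anything is displayed. In the sketch below, the identity rule (guid, then link, then title) is hypothetical, and a plain set stands in for the aggregator’s record of seen items:

```python
from typing import Iterable, List, Set


def item_id(item: dict) -> str:
    # Hypothetical identity rule: guid, then link, then title.
    return item.get("guid") or item.get("link") or item.get("title", "")


def add_feed(items: Iterable[dict], seen: Set[str], mark_existing_read: bool) -> List[dict]:
    """Register a newly added feed's current items.

    Returns the items that should be shown as "new". When `mark_existing_read`
    is True, everything already in the feed is recorded as seen and nothing is
    flagged as new until the feed actually changes.
    """
    items = list(items)
    for item in items:
        seen.add(item_id(item))
    return [] if mark_existing_read else items
```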

That’s all the changes I can think of for now. If you have any comments, suggestions, or words of encouragement, I’d love to hear them.

Canon will release a 6MP digital SLR for under $1500

Canon’s soon-to-be-released EOS 10D [reviewed by dpreview.com] looks incredibly nice: a 6MP digital SLR that accepts the same lenses as its film counterparts, all for around $1500.

As much as I love my Nikon gear, I might be willing to sell it so I can use the same lenses for film and digital photography without having to sell my kidneys on the black market to afford the digital body. For about $2000 I could probably pick up a film body, this digital body and a lens.

I wonder what I could get for my current gear (a Nikon N90s, several lenses, and a Minolta dImage 7)? I’d really prefer that Nikon come out with a good digital SLR for under $1500, but I don’t think that’s going to happen any time soon.

untitled

[city portraits]

Near Downtown
Sudbury, ON
Canada

Minolta dImage 7

A quiet Sunday

[city portraits]

Camp
Noelville, ON
Canada

Minolta dImage 7

how to block spambots

Mark Pilgrim writes about how to block spambots, ban spybots, and tell unwanted robots to go to hell. Even if you aren’t having trouble with evil website robots (programs that download every page of your website in search of various pieces of information that are rarely of any value to you) or don’t intend to fight back, it’s worth reading just to see how many of them are out there. Mark has done a great job of compiling a list of such offenders and giving detailed instructions for forbidding them to access the data on your webserver. After all, why should we pay for the bandwidth they use to eventually provide services that hurt us?
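Mark’s instructions work at the web server level, but the core idea (refusing requests whose `User-Agent` header matches a known offender) can also be sketched as Python WSGI middleware. The agent names below are placeholders, not entries from his list:

```python
from typing import Callable, Iterable

# Placeholder names for illustration only; a real list would come from your logs.
BLOCKED_AGENT_SUBSTRINGS = ("ExampleHarvester", "ExampleSiteGrabber")


def block_bad_robots(app: Callable,
                     blocked: Iterable[str] = BLOCKED_AGENT_SUBSTRINGS) -> Callable:
    """WSGI middleware that answers 403 Forbidden to requests from listed user agents."""
    patterns = [b.lower() for b in blocked]

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(p in agent for p in patterns):
            # Matched a blocked robot: refuse without ever calling the real app.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Matching on substrings of the lowercased header keeps the list short, at the cost of occasionally catching a legitimate agent with a similar name.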

Comment: sticky notes for the UNIX shell

If you live in the UNIX shell and are very directory oriented, then Comment may be a useful tool. It allows you to leave yourself notes in various directories. It stores its data in plain text, so you can use regular UNIX tools to search, sort, and rummage through the notes you have left.

  • Comment stores comments on a per-directory basis. Your comments are relevant to your location.
    Leave reminders for…

    • how to compile a source file
    • what state a particular document was in before you had to leave
    • which parameters to use in order to invoke a program to do what you want
    • collaborative working notes
    • …the possibilities go on
  • Comment also keeps a central store of all your own personal comments in your home directory, available at any time for you to search.
  • Comment stores data in a plain text format, allowing you to use existing text tools to search and manipulate as you need.
  • Comment uses UNIX file access rights for easy control of permissions.

Unfortunately, no matter how hard I try not to, I tend to save everything to the root of my home directory and then forget to delete it later. Therefore, this tool isn’t very helpful for me.

Python Desktop Server

The Python Desktop Server, aka Radio Killer, looks very promising. If you’re a Python lover, or are looking for a full-featured website management system (which includes weblog functionality), then this might be the tool for you.

I deserve it

It’s 22°F out. 15°F if you count the wind chill. But it’s pretty warm in the parking garage where I work simply because it’s underground. So, even though I brought my jacket this morning, I opted to leave it in the truck knowing I wouldn’t need it in the building.

Someone pulled the fire alarm.

“There has been an alarm in the building. Please walk slowly to the nearest exit. Do not use the elevators. BEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEP!”

I figured I’d trick them and hide out in the parking garage until the “fire” went away. But one of the appointed “Emergency Response Team” members approached me and informed me that where I was standing was not safe in the event of a fire and that I needed to evacuate the facility.

“There has been an alarm in the building. Please walk slowly to the nearest exit. Do not use the elevators. Yes, we’re doing this just because it’s cold and you didn’t bring your jacket. Sucker. BEEEEEEEEEEEEEEEEEEEEEEEEEEEEEP!”

It took me a while to figure out what the blue things attached to my body were. Oh yeah… arms.

A few more bricks

[city portraits]

Heritage Park
Fort Worth, TX

Minolta dImage 7

Don’t be alarmed

Don’t be alarmed if things seem a little strange around here. revjim.net is currently moving servers. I’ll have more details regarding our new setup in a little while. Until then, please let me know if you experience any problems.