For those of you (and you and you) who’ve been waiting patiently for a public release of GoFoR the aggregator, have no fear: it will be soon (read: within the next 20 years).
Today I made some modifications to prevent the aggregator from temporarily removing all the posts from a particular user if that feed can’t be reached on a particular run. There are a few more things I’d like to do before it’s ready to release, however.
First of all, I’d like an “add Feed” page. As it stands currently, adding a feed to the aggregator requires manually appending a line to a file on the server. I’d like a user to be able to simply fill in a box with the URL of the feed she wishes to add, alter a few preferences for that feed if she doesn’t like the defaults, and then see that feed in the aggregator. However, allowing additions via the web requires an authentication interface, which adds another layer of complexity.
I’d also like the “add Feed” page to be able to auto-discover the URL of an RSS feed. This way, merely entering the URL of a news source’s front page will generally be enough to find the feed and add it to the system. With this feature in place, a bookmarklet could be created to make adding feeds even easier.
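Auto-discovery usually means looking in the page’s HTML for a `<link rel="alternate">` tag that advertises a feed. Here is a rough sketch of that idea using only the Python standard library. The names `FeedLinkFinder` and `discover_feeds` are made up for illustration and are not part of GoFoR, and I’m assuming the page HTML has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> that advertises a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised by an HTML page, in document order."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

A page that doesn’t advertise its feed this way would simply yield an empty list, in which case the user would still have to enter the feed URL by hand.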
The aggregator currently has a “bookmark” feature. This causes GoFoR to remember where the user last left off reading so that it can show her visually when she has reached posts she’s already seen. However, this feature currently works using a variable in the URL. With the authentication system in place, GoFoR should instead store this information on the server so that the “bookmark state” is maintained from browser to browser, whether the user is at home or at work.
GoFoR currently uses feedParser to actually pick out the interesting tidbits of information. Unfortunately, feedParser requires the document to be basically (there are a few exceptions) well-formed. I’d rather the parser be a bit more liberal, doing the best that it can in the event of a malformed RSS feed. This, however, brings lots of new problems to the table. Unless this parser is capable of knowing whether the corrections it has made are appropriate, it is possible that the user will miss posts in the feed. This happens only in a very specific circumstance, so it might not even be worth worrying about.
If a feed is “correct” and it is retrieved, its items will be added to GoFoR’s list of posts. However, if the feed becomes malformed before the user gets a chance to view the most recent updates, and the parser’s attempts to correct the malformed feed cause it to miss a few items, then those items will disappear from the list. Let’s assume that the user then views the GoFoR list while it is in this state. She will reset the bookmark when she is finished. Between this and her next view, let’s say the malformed feed gets corrected. GoFoR will see that some items in that feed have already been parsed and therefore will not present them to the user as “new” items when she returns the next time.
I know that’ll most likely never occur, but I’d prefer it not be an issue at all. I’m not entirely sure how to avoid it, however.
Currently, GoFoR uses the link tag to determine when it sees a new item. This means that if an RSS author changes the location of her permalinks, then these items all appear to be new to the GoFoR user. Additionally, this means that if an update to a post is made, this item is not flagged as “new” again. However, if a link tag is not available, GoFoR does what I believe to be the right thing by showing the user these posts again when they change. I think the best solution would be to show new items regardless of the presence of the link tag, and, if the link tag (or guid tag) is present and GoFoR determines that it has seen that URL before, visually mark the item as “updated”. This is actually fairly easy to do.
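To make the “new” versus “updated” idea concrete, here is a small sketch of how the check might look. It isn’t GoFoR’s actual code: the `classify` function, the dict-shaped items, and the `seen` mapping (standing in for the metadata store) are all assumptions for illustration. Items are identified by guid or link when one is present, and by a hash of their content otherwise.

```python
import hashlib


def classify(item, seen):
    """Return 'new', 'updated', or 'seen' for an item, and record it in `seen`.

    `item` is a dict with optional 'guid' and 'link' keys plus 'title' and
    'description'. `seen` maps an identity key to a content digest
    (a hypothetical stand-in for the metadata store).
    """
    content = item.get("title", "") + "\n" + item.get("description", "")
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()

    # Prefer guid, then link; with neither, the content itself is the identity,
    # so any change to the item makes it look like a brand-new item.
    key = item.get("guid") or item.get("link") or digest

    previous = seen.get(key)
    seen[key] = digest
    if previous is None:
        return "new"
    return "seen" if previous == digest else "updated"
```

With this approach a changed permalink would still make an old item look new, unless the feed also supplies a stable guid.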
GoFoR currently stores cached copies of all the feeds it downloads. This allows it to use Conditional GET when the remote server supports it. Additionally, it has files that hold metadata for items, and files that contain the final parsed data (to keep from having to parse the XML file when the data hasn’t changed). In order to save disk space, GoFoR could merely store the response header elements needed to perform Conditional GET, as well as the time that the file was last updated, in the metadata store. This would remove the need to save copies of the feeds themselves. My current feed cache occupies about 2MB of space. Regardless, a garbage collection routine is needed. If a feed is removed from GoFoR, the cached feeds, metadata, and parsed data files currently remain indefinitely. Keeping the metadata around longer than the cached XML files and the cached parsed data is a good idea, just in case the user should ever re-add that feed. However, keeping them around forever is a little overkill.
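Conditional GET only needs the `ETag` and `Last-Modified` values from the previous response, which is why the full cached copy isn’t strictly required. A minimal sketch of the idea with Python’s standard `urllib` follows. The function names and the `meta` dict (a stand-in for whatever the metadata store keeps per feed) are assumptions, not GoFoR’s real interface.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, meta):
    """Build a request carrying the validators saved from the last response."""
    request = urllib.request.Request(url)
    if meta.get("etag"):
        request.add_header("If-None-Match", meta["etag"])
    if meta.get("last_modified"):
        request.add_header("If-Modified-Since", meta["last_modified"])
    return request


def fetch_if_changed(url, meta):
    """Return (body, new_meta); body is None when the feed has not changed."""
    request = build_conditional_request(url, meta)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            new_meta = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), new_meta
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, meta
        raise
```

When the body comes back as `None`, the previously parsed data can be reused as-is, so only the parsed data and the two header values need to live on disk.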
When GoFoR adds a new feed, all the items in that feed show up as new items. I’d like to give users the option to add a feed and mark the existing items as “read”. This way, the user doesn’t have to see a bunch of “new” items in her list when they really aren’t new at all. This is also fairly easy to do.
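One way this option might work is to record every existing item as already seen at the moment the feed is added, and then hand nothing back for display. The sketch below assumes a hypothetical `add_feed` function, a `seen` set of item links, and items shaped as dicts with a `link` key. None of these are GoFoR’s real names.

```python
def add_feed(seen, items, mark_read=False):
    """Register a newly added feed's items and return those to display as new.

    `seen` is a set of item links already shown to the user (hypothetical
    store). With mark_read=True the items are recorded but none are returned.
    """
    fresh = [item for item in items if item["link"] not in seen]
    seen.update(item["link"] for item in fresh)
    return [] if mark_read else fresh
```

Because the items are still recorded in `seen`, anything published after the feed was added shows up as new on the next run.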
That’s all the changes I can think of for now. If you have any comments, suggestions, or words of encouragement, I’d love to hear them.