revjim.net

October, 2002:

yesterday; today; help!!!

After waking up way too early for a Saturday morning, Jess and I headed out to the Chef’s Catalog warehouse sale with my parents. We acquired lots of high-quality kitchen goodies at very good prices. Jess convinced me that I didn’t need to spend $70 on another knife. I don’t really need the best in forged, German high-carbon steel. Instead, I can make do with the best in economical, forged, Spanish high-carbon steel. The knives I have work great, hold a wonderful edge, and sharpen easily. There is no need to fix something that isn’t broken.

We then went to brunch with my parents at Le Peep (whose slogan is “Le Breakfast, Le Lunch, Le Brunch”) and then came back to our place, since my parents had never seen it. Then we ran to the liquor store so my parents could get some tequila, and then to Sam’s Club so my mom could pick up a set of the mixing bowls I have that she envied so much. After that, we stopped by Tuesday Morning for more housewares.

Next, Jess and I went to Thrift Town to look for Halloween costumes. We decided on what we are going to be, and purchased the things we needed to make our costumes. There was one part of her costume that we couldn’t find there for a reasonable price. We then went to my parents’ house to see if they had what we were looking for, but they didn’t. We decided that we would stay for supper. While my parents went to church, Jess and I went to Target, Dollar Tree, and Michael’s, but still couldn’t find what we were looking for at a reasonable price. We picked up some Rose’s Lime and Sweet and Sour at Target, stopped by Linens ’n Things to get some pasta containers, and then met my parents back at their place.

We made carne asada burritos for dinner. They were great. After cleaning the kitchen, we sat down for coffee and a game of Phase 10, which Jess won. In the true spirit of a good winner, Jess took my face into her hands and told me, “You did very well. You tried really hard and played a very good game. I am very proud of you.” Immediately following the last word of her last sentence, she turned around, stuck her butt in my face, shook it from side to side and proclaimed, “I won. I won. I won.”

We went to bed around 11:30pm, and, thanks to the conditioning a wonderful 8am – 5pm bill-paying job provides you, I’ve been up since 6:30am (which actually means I got about 8 hours of sleep, since Daylight Saving Time ended last night). Jess is still asleep, though if Toby has things his way, she won’t be for much longer.

There are a few more items I’d like to acquire to complete everything, though probably not today: some magnetic hooks to hang utensils from the side of the refrigerator; some wall mountable hooks to hang a few more utensils on the side of the cabinet; a utensil container to put on the counter top and help clear out the utensils drawer; a ceiling mountable pots and pans rack; a dish drying rack; a rug for the living room; a better garlic press; a few picture frames; a few wall tapestries; a nicer couch; a chair; a love-seat.

I’d like to spend today improving the apartment. I’d like to rearrange the Living Room, the Bedroom, and possibly even the Den. I’d like to hang pictures and plants, organize cabinets and drawers, arrange bookshelves and counter tops, lay down rugs, hang up towels, and just make everything look nicer. I’d like to hang the paper towel holder (finally) and the coffee organizer. I’d like to mount my rear-channel speakers. I could easily spend three or four days improving the apartment, and, if given an unlimited budget, several more days picking out the things I want and placing them appropriately.

Also, today, I think we are going to attempt to look for that last piece to Jess’ costume. Personally, I don’t think we’ll have any luck, so I’m enlisting your help. We are looking for large (life size, or slightly smaller) baby dolls. The condition they are in is not important. If they are missing legs or arms, that’s fine. If they don’t have any clothes, that’s fine too. If they are dingy, broken, or drawn on with permanent marker, they’ll work just fine. We need about 7 or 8 of these babies. If you have any we could borrow, or know where we could find some for less than $2 or $3 each, please let me know. If you want to lend us your dolls, we’ll take good care of them. However, it’s probably best not to lend us your million dollar collectible dolls, or that family heirloom that was originally the play-toy of your great-great-great-great-great grandmother that was once kissed by Jesus while he was hanging on the cross. You know… just in case something unfortunate should happen. Thanks in advance.

content-type for RSS elements

Phil Ringnalda talks about the problem of HTML data being present in an RSS feed. He suggests modifying the way aggregators work and the way publishers publish in order to solve this problem. “The only workable solution [...] is for all RSS producers [...] to include a content:encoded element, and for all aggregators to use content:encoded rather than description whenever it’s present. A plain text aggregator should be able to get away with just checking for < in the description, and only turning to stripping HTML from content:encoded if it’s present (an approach which isn’t likely to please a developer who only wants plain text in the first place, I’m afraid).” This sounds dirty to me.

I experienced the same frustration (for different reasons) last month. It seems to me the most proper solution would be to implement Chuck Shotton’s suggestion, originally made for RSS 0.94.

By allowing a content-type attribute for description, content:encoded, and possibly even title, and by setting reasonable defaults (text/plain for title and text/html for content:encoded and description), aggregators can make more informed choices about how to handle the data they are seeing. If an aggregator doesn’t know how to display a particular content-type, it can ask the operating system (or browser) to display the information for it. If a publisher chooses to do so, it can make life easier on the aggregator by providing multiple description (for instance) elements with different content-type attributes. Caution should be taken, however, to ensure that this doesn’t break current aggregators.
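Here is a minimal sketch of how an aggregator might apply those defaults, written in Python purely for illustration. The function names, the dictionary of attributes, and the list of "supported" types are all my own assumptions; nothing here comes from an actual RSS specification.

```python
# Assumed defaults, mirroring the suggestion above: plain text for titles,
# HTML for description and content:encoded.
DEFAULT_TYPES = {
    "title": "text/plain",
    "description": "text/html",
    "content:encoded": "text/html",
}


def effective_content_type(element_name, attributes):
    """Return the declared content-type, or the assumed default for the element."""
    declared = attributes.get("content-type")
    if declared:
        return declared.strip().lower()
    return DEFAULT_TYPES.get(element_name, "text/plain")


def choose_rendering(element_name, attributes,
                     supported=("text/plain", "text/html")):
    """Decide whether to render the element ourselves or hand it off.

    "delegate" stands in for asking the operating system or browser
    to display a type the aggregator does not understand.
    """
    ctype = effective_content_type(element_name, attributes)
    if ctype in supported:
        return ("render", ctype)
    return ("delegate", ctype)
```

An element with no attribute at all falls back to the default, so feeds that never adopt the attribute would keep working as they do today.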

vim folding

I can’t believe I’ve been using VI/VIM for over 10 years and am just now finding out about folding.

human

Sometimes, I have trouble being human. It seems that those parts of me are the ones I dislike the most. And yet, of all the qualities contained within me, those are the ones I can’t readily change. Being stagnant disgusts me.

5.1 sound from a single source

A few days ago, some friends and I were talking about this device. Early in 2003, Pioneer will release the PDSP-1, a single source speaker system capable of producing discrete 5.1 channel surround sound audio. The device does not use psycho-acoustic techniques to simulate 5.1 channels. Instead, it actually bounces sound off of the ceiling and walls in order to direct the sound to the proper locations in the room. [via Gizmodo]

damned if I do, and damned if I don’t

In my quest to develop a good, web-based RSS aggregator, I’ve learned one thing: RSS is fucked. Not RSS itself, but RSS as it is used. There are just too many variations of everything to accommodate anything. For instance, one complaint I’ve always had about the available RSS readers is that feeds were always jumbled together with, seemingly, no regard for any kind of date order. In my aggregator, I set out to correct this.

First, I try to look for a date on the item. Maybe a dc:date in Dublin Core format, maybe an rss:pubDate in standard format. I try to accommodate multiple timezones and formats. If, for some reason, a feed author doesn’t include item-level dates, I fall back to channel-level dates. If an item is new to a channel, and the channel has an rss:pubDate, a dc:date, or an rss:lastBuildDate, then I can assume that that date can be safely applied to the new item. This isn’t always the case, but it’s a fairly safe assumption. Then, in case that isn’t provided, I also record the date/time when the item is parsed. When a feed is first parsed, all the items will have the same date (which is annoying). However, as new items are added, the date/time becomes closer and closer to accurate (+/- the amount of time between updates). This seemed to be working okay, except, when you look into it deeper, it’s still really screwy.
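The fallback chain above can be sketched roughly like this. It is in Python for illustration only; representing items and channels as plain dictionaries keyed by element name is my assumption, as are the function names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_date(text):
    """Try an RFC 822 date (pubDate style), then ISO 8601 (dc:date style).

    Returns a timezone-aware UTC datetime, or None if neither format fits.
    Dates without a timezone are assumed (arbitrarily) to be UTC.
    """
    if not text:
        return None
    text = text.strip()
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        parsed = None
    if parsed is None:
        try:
            parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
        except ValueError:
            return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def best_item_date(item, channel, first_seen):
    """Item-level date first, then channel-level date, then the parse time."""
    for source, fields in (
        (item, ("dc:date", "pubDate")),
        (channel, ("pubDate", "dc:date", "lastBuildDate")),
    ):
        for field in fields:
            found = parse_date(source.get(field))
            if found is not None:
                return found
    return first_seen
```

Because parse_date() tries both formats no matter which field the text came from, a Dublin Core style date sitting in a pubDate field still gets read. It does nothing about a publisher whose clock is wrong.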

If the system clock of the publisher is off a little (or if they should choose to date an entry 12 years into the future), things get skewed. Some users incorrectly(?) put a Dublin Core style date in an rss:pubDate field. It’s just annoying, more than anything.

I guess I jumped the gun a bit. RSS isn’t really fucked, and neither is the use of it. I just wish there were some consistency to make life a bit easier. I’m now realizing that all of the time and effort I’ve spent attempting to deduce a date for each item was futile because, in some cases, it just isn’t possible.

Aside from that, the RSS aggregator is coming along nicely. I actually use it to do most of my daily blog reading. The HTTP GET portion of it doesn’t play very nicely just yet. It fetches a new copy of the feed every time, without bothering to see whether the feed has changed. This wastes bandwidth. The HTTP GET portion also doesn’t support basic authentication, though I am uncertain there is actually a need for this (unless, of course, LiveJournal were to modify its code to allow users to supply Basic HTTP authentication without requesting it). Then, by hardcoding an LJ username and password into the RSS reader, one could even get updates from those who post “friends only”.
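One common way to avoid re-downloading an unchanged feed is an HTTP conditional GET. This is not something the aggregator does yet, and the helper names below are made up for illustration. The idea is to send back the ETag and Last-Modified values the server gave last time, and to treat a 304 Not Modified response as "nothing new". A minimal Python sketch of just that bookkeeping, with no actual network code:

```python
def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def handle_response(status, body, cached_body):
    """Return (feed_text, changed).

    A 304 status means the server sent no body and the cached copy
    is still current; any other status is treated here as fresh content.
    """
    if status == 304:
        return cached_body, False
    return body, True
```

Servers that send neither validator simply get a plain GET every time, which is no worse than the current behavior.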

One idea I am tossing around is adaptive update frequencies. Generally, if an item is added to a feed, more items will be added soon. This isn’t always the case, but it is likely. Additionally, those who update frequently tend to continue to update frequently. Again, not always the case, but very likely. With this knowledge, one could adjust the update interval for each feed based on whether there were new items the last time we checked. For instance, if the interval is currently at 60 minutes, and an update is found, cut it in half, making it 30 minutes. Every time the interval elapses and an update is NOT found, increase the interval by 50% or so. Of course, the “reward” and “penalty” amounts would need to be fine-tuned to find what most closely reflects the real world. The reason I suggest this is that I, as I’m sure most users would, like to have the most recent information possible, knowing INSTANTLY when someone updates, without having to check for RSS updates every 10 seconds. I guess I’ve been spoiled by the INSTANT nature of LiveJournal.
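A minimal sketch of that adjustment, in Python for illustration. The halving and the 50% increase come from the example above. The minimum and maximum bounds are my own assumption, added so that a busy feed isn't polled constantly and a quiet one isn't forgotten.

```python
# Assumed bounds, in minutes; these would need tuning like everything else.
MIN_INTERVAL = 10.0
MAX_INTERVAL = 1440.0


def next_interval(current, found_update, reward=0.5, penalty=1.5):
    """Return the next polling interval in minutes.

    An update shrinks the interval by the "reward" factor; a miss
    stretches it by the "penalty" factor. The result is clamped to
    the assumed bounds above.
    """
    factor = reward if found_update else penalty
    return max(MIN_INTERVAL, min(MAX_INTERVAL, current * factor))
```

Starting at 60 minutes, a hit gives 30 and a miss gives 90. Tuning then comes down to trying different reward and penalty values against real feeds.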

Comments and thoughts are always appreciated. And stay tuned for a software release, hopefully today or tomorrow.

a rose by any other name…

I need a name for a piece of software, and I’m enlisting you — yes YOU — for help.

This software, to summarize quickly, will bring its users news. And I use the term “news” loosely. It will bring actual, real news, yes. It will also bring stock quotes, weather information, software announcements, and your friends’ journals. For those slightly more technical, it is a news aggregator that manages RSS subscriptions. In the best possible case, I would have two names: one for the part that fetches the news, and another for the part that displays the news.

Please keep these guidelines in mind. The name should be short, or have a usable acronym. For instance, SCUBA is an acronym for “Self Contained Underwater Breathing Apparatus”. The longer name is descriptive; the acronym is easy to pronounce. Recursive acronyms are even better. For example, PHP stands for “PHP: Hypertext Preprocessor”. Catchy and/or silly names are also good. For instance, “Rover” might be a good name for the part of this program that fetches the news. Unfortunately, that name is taken already, which brings me to my final point. If at all possible, do a quick check to make sure the name isn’t already being used.

The submitter with the best name (as judged by me) will be honored in as many ways as possible on the software’s homepage.

Oh yeah… one more thing. I need the name today. So hurry.

feedParser v0.3

feedParser v0.3 has been released.

entities will be my death

PHP’s xml_parse_into_struct() function is severely broken, but I can’t tell whose fault it is. It uses the expat library for parsing, which has become a sort of standard amongst UNIX-based SAX parsers. The problem could be there. Or it could be in PHP’s implementation of expat. Or it might not really be broken at all, merely unusable. The problem has to do with entity parsing.

A little XML background, first. For our purposes, we’ll say an entity can be declared in two places: in the DTD referenced by that ugly “DOCTYPE” line you see at the top of many XML and HTML documents, or in the document itself. Here are the symptoms of this problem:

If a DOCTYPE declaration is given, entities declared in the document are substituted with their replacement values. Entities that are not declared are simply removed from the document.

If a default handler is set with xml_set_default_handler(), both defined and (locally) undefined entities are left unsubstituted and passed to the handler. This means the handler can do its own entity parsing.

If a DOCTYPE declaration is not given (which is the case with almost EVERY RSS feed out there), all non-default entities result in a parser error.

This means that, if a DOCTYPE is present, one can either parse ALL entities manually, or have them thrown out. It is possible to use the parser to detect DOCTYPE declarations and fetch the DTDs in order to obtain the additional entities. However, this means that all entities will have to be handled manually, because the DOCTYPE declaration triggers the default handler, and, when a default handler is set, no entities are parsed. Expat has provisions to allow a default handler to be set and still have entities parsed (XML_SetDefaultHandlerExpand()), but PHP does not expose this. If a DOCTYPE is not present, one can only pray that entities are not present. That is unusable, in my book, as MANY, MANY documents do not have a DOCTYPE declaration.

Sure, a DOCTYPE is required in order for an XML document to be valid, but this isn’t about valid XML. This is about being able to parse what users are actively creating on a daily (if not hourly) basis. All HTML documents are supposed to have a DOCTYPE in order to be valid, but if your browser REFUSED to parse them when an entity showed up, I’d guess well over 80% of the web would disappear. Even Mark Pilgrim’s and Dave Winer’s feeds don’t have a DOCTYPE declaration. That doesn’t mean their feeds aren’t well-formed; it just means they aren’t valid. The fact that PHP’s implementation of expat (and perhaps expat itself) dies with a parser error on an undefined entity is absurd. Expat is supposed to be non-validating.

I think I’ve devised a less than perfect solution. I’ll define a set of entities within feedParser and I’ll parse out the entities before parsing the document for data. This way I can ensure that no entities in the XML data will break the parser, and I can ensure that all entities are parsed.
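To make the idea concrete, here is a rough sketch of that pre-pass. It is written in Python rather than as part of feedParser, and it is only an illustration of the approach, not the code I will actually ship. It borrows the HTML entity table from Python's standard library as the "set of entities" and rewrites each named entity as a numeric character reference, which the parser accepts without any DOCTYPE. The choice to escape unknown names so they show up as literal text is my own assumption.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser already understands; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_entities(xml_text):
    """Rewrite named entities so a DOCTYPE-less document still parses."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: escape the ampersand so the text survives
            # literally instead of killing the parser.
            return "&amp;" + name + ";"
        return "&#%d;" % codepoint
    return ENTITY_RE.sub(replace, xml_text)


feed = "<rss><title>Caf&eacute; &amp; bar&nbsp;&bogus;</title></rss>"
root = ET.fromstring(expand_named_entities(feed))
title = root.find("title").text
```

One known weakness is that the regular expression does not know about CDATA sections, so entity-looking text inside one would be rewritten too. Less than perfect, as promised.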

Wish me luck.

freedom

My country loves me so much that they protect me from hackers and thieves by forbidding me from obtaining important security information that may help me defend my electronic resources, and the electronic resources of my corporation, from hackers, thieves, and other mischief makers. In order to assure my safety, my country will even arrest innocent people performing actions considered perfectly legal in their own countries to ensure that this security information never falls on my ears, or the ears of any other member of my country. As long as my country is protecting me, I don’t care if the rest of the world takes us off the map.

I’m sure that, since my country is so concerned about me, one of its citizens, my country is also doing something to protect me from being harmed by the malicious acts of citizens of other nations who might have access to this, clearly, useless, for anything other than illegal activities, information. I’m also sure that, in the event that my electronic resources, or those electronic resources of my corporation are penetrated and/or stolen due to my country forbidding me from obtaining the free and public information I need to protect myself, they will be more than happy to accept all financial, criminal, civil, and moral responsibility in a court of law.

America stands for freedom. Freedom that is handed out by my country as it sees fit and taken away when my country decides it should be.

Isn’t America great?

[via kasia]