revjim.net

September, 2002:

pseudo-code: web application framework

I’ve been going over some more of my ideas for this web application framework I’m attempting to design. Based on how I see the system working, I’ve developed some pseudo-code. There are some huge chunks of the code missing, like exactly what it is that processes the incoming HTTP request, and exactly what happens when the core.* functions are called (aside from what is obvious from reading the code), but it’s a start. Comments and feedback are greatly appreciated.

I’ve never programmed anything quite like this before. So, I’m looking at this from the perspective of a programmer who might use it as a toolkit to develop future applications. I’m attempting to make it easy on that person, while still forcing programmers/developers to adhere to certain rules to ensure consistency, interoperability, and ease of use for the end user.


time module class:

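// Template tag: returns the current Unix timestamp.
function gettime($args) {

	return time();
}

// Page handler: assigns the timestamp to $time in template space, then
// displays the template named by the "template" parameter.
function dopage($args) {

	$tpl = new Template();
	$tpl->assign("time",time.gettime());
	$tpl->display($args["template"]);

}

// Export gettime() as a template tag, and export dopage() as a mappable
// page under the name "time.gettime" that requires a "template" parameter.
core.register_tag("time.gettime","time.gettime","");
core.register_page("time.gettime","time.dopage",array("template"));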

page display class:

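// Page handler: displays the template named by the "template" parameter
// without assigning any variables of its own.
function showpage($args) {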

	$tpl = new Template();
	$tpl->display($args["template"]);

}

// Export showpage() as a mappable page that requires a "template" parameter.
core.register_page("page.showpage","page.showpage",array("template"));

URL association:

node       method          parameters
time       time.gettime    template=>timefrompage
showtime   page.showpage   template=>timefromtag

timefrompage template file:

Hello. The time is {$time}.

timefromtag template file:

Hello. The time is {time.gettime()}.
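To make the flow above concrete, here is a minimal sketch in Python rather than PHP, so it runs without the unfinished core. The registries, `register_tag`, `register_page`, `render`, `dispatch`, and the `{$var}` / `{tag()}` substitution rules are all hypothetical stand-ins for the core.* functions and the template engine. They are not an existing API. `register_tag` also drops the unused third argument from the pseudo-code.

```python
import re
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for core.register_tag / core.register_page.
TAGS: Dict[str, Callable[[dict], object]] = {}
PAGES: Dict[str, Tuple[Callable[[dict], str], List[str]]] = {}

# The two template files from the post, inlined for the sketch.
TEMPLATES: Dict[str, str] = {
    "timefrompage": "Hello. The time is {$time}.",
    "timefromtag": "Hello. The time is {time.gettime()}.",
}


def register_tag(name: str, func: Callable[[dict], object]) -> None:
    TAGS[name] = func


def register_page(name: str, func: Callable[[dict], str], required: List[str]) -> None:
    PAGES[name] = (func, list(required))


def render(template_name: str, variables: Dict[str, object]) -> str:
    """Fill {$var} from assigned variables, then {tag()} by calling the tag."""
    text = TEMPLATES[template_name]
    text = re.sub(r"\{\$(\w+)\}", lambda m: str(variables[m.group(1)]), text)
    return re.sub(r"\{([\w.]+)\(\)\}", lambda m: str(TAGS[m.group(1)]({})), text)


# time module
def gettime(args: dict) -> int:
    return int(time.time())


def dopage(args: dict) -> str:
    # The page handler assigns $time before the template is parsed.
    return render(args["template"], {"time": gettime({})})


# page display module
def showpage(args: dict) -> str:
    # No variables assigned; the template must pull data through tags.
    return render(args["template"], {})


register_tag("time.gettime", gettime)
register_page("time.gettime", dopage, ["template"])
register_page("page.showpage", showpage, ["template"])

# URL association: node -> (registered page name, parameters)
URLS: Dict[str, Tuple[str, Dict[str, str]]] = {
    "time": ("time.gettime", {"template": "timefrompage"}),
    "showtime": ("page.showpage", {"template": "timefromtag"}),
}


def dispatch(node: str) -> str:
    page_name, params = URLS[node]
    handler, required = PAGES[page_name]
    missing = [p for p in required if p not in params]
    if missing:
        raise ValueError(f"{page_name} is missing parameters: {missing}")
    return handler(params)
```

`dispatch("time")` goes through the page handler, which assigns `$time` before rendering. `dispatch("showtime")` renders a template that pulls the same value through the tag. Both produce "Hello. The time is " followed by the current timestamp.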

RSS 2.0 feed

revjim.net has replaced its RSS 0.91 feed with an RSS 2.0 feed using Mark’s RSS 2.0 template for MovableType. The 2.0 format should be compatible with RSS 0.91 readers. If any of you experience difficulty, please let me know.

that’s MR grump to you, asshole

I’m broke. That alone sucks. As of about 6 hours ago, I’m also out of cigarettes. That sucks worse. In fact, if there were some device that recorded everything I have ever said, ever, one might hear this interesting tidbit coming from my mouth less than a year ago: “If I were poor, I mean really poor, I think I’d rather go hungry than go without a cigarette”.

Wish me (and Jess… sorry, babe) luck. I’ll have money again on Friday.

MovableType and the encoding of special characters

In my frustration with RSS I’ve learned a few things.

MovableType, regardless of your instructions to “encode_xml”, will not detect international characters (entered as 8-bit, or “extended ASCII”, characters) and convert them into entities. This means that if you type an 8-bit character into your “subject” field, you will have an 8-bit character sent to the browser, an 8-bit character in your static files, and an 8-bit character in your RSS feed. This is not good. By making certain that the proper character set is declared in all the right places at all the right times, it may be possible to send an occasional international character in a form that the majority of readers can decode and display. However, I know very little about how to do this and I don’t feel like learning.

RSS is XML. Period. Because of that, proper XML entity encoding can be used. However, the proper XML encoding of the letter “e” with an acute accent over it is not “&eacute;”. That is the HTML encoding. It is not one of XML’s predefined entities, so an XML parser will reject it unless a DTD declares it. Instead, the character’s decimal or hexadecimal code point (233, or E9 in hexadecimal, in both Unicode and ISO-8859-1) should be encoded as a character reference: “&#233;” and “&#xE9;”, respectively. Doing this, however, does not cause MovableType to handle the situation any more elegantly. MovableType still encodes the & at the beginning of the reference (because of the “encode_xml” directive), turning “&#233;” into “&amp;#233;”, so the data is still encoded after the client decodes it.
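Here is a minimal Python sketch of the encoding described above. It assumes the standard library's `xml.sax.saxutils.escape` can stand in for MovableType's “encode_xml” and `xml.etree.ElementTree` can stand in for an RSS reader's XML parser. `to_xml_text` and `parse_title` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_xml_text(s: str) -> str:
    """Escape &, < and >, then turn each non-ASCII character into a
    decimal character reference (é -> &#233;)."""
    return escape(s).encode("ascii", "xmlcharrefreplace").decode("ascii")


def parse_title(fragment: str) -> str:
    """Parse <title>fragment</title> and return the decoded text."""
    return ET.fromstring(f"<title>{fragment}</title>").text


encoded = to_xml_text("café")               # 'caf&#233;'
hex_ref = f"&#x{ord('é'):X};"                # '&#xE9;'
decoded = parse_title(encoded)              # 'café'
decoded_hex = parse_title("caf" + hex_ref)  # 'café'

# &eacute; is an HTML entity, not one of XML's five predefined ones,
# so a parser without a DTD declaring it refuses the document.
try:
    parse_title("caf&eacute;")
    eacute_accepted = True
except ET.ParseError:
    eacute_accepted = False

# An encode_xml-style pass over a reference typed by hand escapes its
# leading &, so after decoding the reader sees the literal text "&#233;".
double = escape("caf&#233;")                # 'caf&amp;#233;'
still_encoded = parse_title(double)         # 'caf&#233;'
```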

There is no real way to know if HTML is allowed in “title” or “description” sections of RSS. RSS 0.92 (which is supposed to merely be the documentation and specification of RSS 0.91 as it was being used) states that HTML can exist in either of these fields provided it is entity-escaped. RSS 0.91 specifies that HTML cannot exist, and it isn’t mentioned in RSS 0.9 though we can assume no. RSS 1.0 might allow it, but only with the content module. However, in practice, things are much different. HTML is used in “description” sections. HTML is not used in “title” sections. So, to be safe, don’t include HTML in your “title” or “description” sections if you want to ensure that everyone, everywhere, can read everything you produce. If you don’t care about that, then go do whatever you want and stop reading now.

It seems that using UTF-8 (or some other widely accepted international character set) is the most proper way to handle international characters, even on an occasional basis. Further research is required.

MovableType does not currently handle the use of XML/HTML entities gracefully in any context. This means that, if the user chooses not to use UTF-8, or something similar, the only option is to turn off the “encode_xml” attribute on all fields and ENSURE that ALL data that is intended to be delivered via XML is properly XML encoded by hand. This is possible only because HTML and XML have similar entity encoding methods. If they didn’t, it might very well be impossible.

It is my opinion that MovableType should either perform an XML decode (which includes international character entities) before performing an XML encode (not a good idea, because data can be lost) or that it learn how to properly encode 8-bit ASCII characters. But it really isn’t their fault.

The whole situation is a mess, because it is unclear which content-type should be used to enter data in the various fields of MovableType. For instance, in the entry field, one generally types (or should type) properly encoded HTML. If MovableType then sends the full content of a post in an RSS feed, it should either perform an HTML decode and then an XML encode, or just send it in a CDATA container and let the client know that it is text/html. MovableType is doing the right thing. The data is being encoded to be safe to transmit via XML.

Take this as an example. Let’s assume that I want the title of a post to read “& and its friend &”. This particular line of text looks very different in various encoding methods. In plain text, it appears as above: “& and its friend &”. In HTML and XML encoding, it appears differently: “&amp; and its friend &amp;”. If a client isn’t decoding the XML data that is being sent to it, then it is just plain stupid. After decoding the XML representation of the post’s title, it should read as we intended: “& and its friend &”. Unfortunately, it isn’t obvious how this should be entered into MovableType. If we type the data as HTML, then MovableType can display it straight to the client browser with no modification. However, if we type it as plain text, it must first be HTML encoded. Additionally, if we type the data as plain text, it must first be XML encoded before it is safe to travel via RSS. However, if we type the data as HTML, then it must first be decoded into plain text and then reencoded into XML (which, in this case, results in the same string, but might not always).
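The two input conventions can be sketched as two small Python functions. Here Python's `html.escape`/`html.unescape` and `xml.sax.saxutils.escape` stand in for MovableType's “encode_html” and “encode_xml”. That mapping is an assumption, not MovableType's actual code, and `from_plain_text` and `from_html` are hypothetical names.

```python
import html
from xml.sax.saxutils import escape as xml_escape


def from_plain_text(title: str) -> dict:
    """Field typed as plain text: encode it separately for each output."""
    return {"html": html.escape(title, quote=False), "rss": xml_escape(title)}


def from_html(title_html: str) -> dict:
    """Field typed as HTML: usable on the page as-is; for RSS, decode it to
    plain text first and then XML-encode the result."""
    return {"html": title_html, "rss": xml_escape(html.unescape(title_html))}


plain = from_plain_text("& and its friend &")
typed_as_html = from_html("&amp; and its friend &amp;")
# Mixing the two up: HTML input pushed through the plain-text path
# (encoding directives left on) comes out double-encoded.
mixed_up = from_plain_text("&amp; and its friend &amp;")
```

For this title, both paths produce the same feed text. They stop agreeing once the HTML holds a named entity: `from_html("caf&eacute;")["rss"]` is `"café"`, a raw 8-bit character that the feed's character set must then be able to carry.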

Since MovableType doesn’t dictate that ANY field be of any certain content-type, the user is left to choose. Here is what you need to remember, as a user: whatever you do, do it with consistency. The same way, every time, all the time.

If you choose to enter HTML in your fields, then always encode your fields properly. MovableType can’t decode and reencode all characters properly, so we are lucky that HTML and XML encode in similar fashion. The rule is: all XML encoding (except the CDATA container, and the “&apos;” entity, which HTML 4 does not define) works in HTML. Not all HTML encoding works in XML. So, use XML encoding and don’t use CDATA containers. Make sure that your templates do not contain “encode_html” or “encode_xml” directives for those fields you choose to enter as HTML, or you will end up with double-encoded data, which is very bad. Additionally, you should make certain your templates contain the content-type information whenever possible.

If you choose to enter plain text into your fields, then use ONLY plain text. Don’t type any HTML. Don’t use any entity encoding. Let MovableType do it all. This means you must use UTF-8 or some other acceptable international character set, and alter your templates to reflect the use of that character set. Make certain that you are using the “encode_html” and “encode_xml” directives for fields that are being entered as plain text.
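The following Python sketch shows why the character set declaration matters. The same title, stored as UTF-8 or ISO-8859-1 bytes, only comes back intact when the declared charset matches the bytes. `parse_feed_title` is a hypothetical helper standing in for the XML prolog at the top of a feed template.

```python
import xml.etree.ElementTree as ET

title = "français"
utf8_bytes = title.encode("utf-8")       # b'fran\xc3\xa7ais'
latin1_bytes = title.encode("latin-1")   # b'fran\xe7ais'

# Reading UTF-8 bytes with the wrong charset garbles the text.
misread = utf8_bytes.decode("latin-1")   # 'franÃ§ais'


def parse_feed_title(raw: bytes, charset: str) -> str:
    """Wrap raw bytes in a document whose prolog declares `charset`."""
    prolog = f'<?xml version="1.0" encoding="{charset}"?><title>'
    return ET.fromstring(prolog.encode("ascii") + raw + b"</title>").text


# Declared correctly, either charset parses back to the same text.
from_utf8 = parse_feed_title(utf8_bytes, "UTF-8")
from_latin1 = parse_feed_title(latin1_bytes, "ISO-8859-1")

# Declared wrongly, the parser rejects the feed outright.
try:
    parse_feed_title(latin1_bytes, "UTF-8")
    mislabeled_ok = True
except ET.ParseError:
    mislabeled_ok = False
```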

You can choose different fields to be entered in different fashions, as long as you always do it the same way. You cannot, however, choose different fields to be in different character sets. This means that, if you choose to use plain text for any field, then you should always use a UTF-8 or other international character set for all fields. Either that, or never use ANY special characters in the fields entered as plain text (aside from those that can be typed in plain 7-bit ASCII).

My recommendation is as follows. All titles and category names should be entered as plain text. Special characters should not be used in them unless you intend to use a character set designed for those characters. Because MovableType automatically strips HTML tags from the content of your post if you don’t provide an excerpt, it is best to use plain text when providing an excerpt of your own. This means no HTML can be included, and new lines are useless. The actual body of the post should be entered in HTML as anything else would really defeat the purpose of all this.

My head hurts.

HTML entities in RSS titles and descriptions

I attempted to add my RSS 1.0 feed to Syndic8 and it was rejected because my feed contains a “rogue & in titles”. The problem they are describing comes from my post about the French language, because of the “&” used to make the international entity in the title (&ccedil; makes a ç).

First of all, I disagree with the rejection of my feed. What if “fran&ccedil;ais: partie un” was my desired title? My feed validates as proper RSS/XML, so why is this reviewer judging my feed based on content (aside from my feed simply being full of test posts)? From what I can tell, it is allowable, and in fact the default, to encode RSS items as “text/html”. The data should be decoded (therefore, changing “fran&amp;ccedil;ais” back into “fran&ccedil;ais”) and sent to the client that way. The client should then determine, using the specifications of RSS, how to display the content: as text/plain, as text/html, or in some other encoding altogether.
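Here is a minimal Python sketch of the decoding I'm describing. It assumes a client that first XML-decodes the title and then, only if told the content is text/html, HTML-decodes it. `html.unescape` covers just the entity-decoding step of HTML rendering, and `display_title` is a hypothetical name.

```python
import html
import xml.etree.ElementTree as ET

# The title as it sits in the feed: "&ccedil;" with its & escaped by encode_xml.
ITEM = "<item><title>fran&amp;ccedil;ais: partie un</title></item>"


def display_title(item_xml: str, content_type: str) -> str:
    """Decode the XML, then interpret the result according to the content
    type the client has been told to expect."""
    text = ET.fromstring(item_xml).find("title").text   # XML decode
    if content_type == "text/html":
        return html.unescape(text)                       # HTML entity decode
    return text                                          # text/plain: show as-is


as_plain = display_title(ITEM, "text/plain")   # 'fran&ccedil;ais: partie un'
as_html = display_title(ITEM, "text/html")     # 'français: partie un'
```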

Am I incorrect? Should all RSS data be designed to decode into “text/plain”? Is HTML not allowed in an article title or description? Is it common practice to use HTML inside those RSS tags? Is this strictly forbidden in the RSS specification in some place I haven’t found (which is possible since the specification is a lengthy read, and I haven’t gone over all of it)?

If I am incorrect, and RSS data should not be encoded as “text/html”, then what is the proper way to include an international character in the title of a post, or in the description of a post? If I am typing in plain text, how would I create any of the international characters? I could enter them XML encoded, which I do anyway, and then tell my blogging engine NOT to encode the data as XML. Then, I would be storing XML in my database, and decoding and reencoding for HTML (easy as pie since the two are basically synonymous) would be done during page generation.

The funniest part is that Syndic8.com doesn’t even remain consistent here. On the notes page for my feed, you’ll see that the note is encoded into HTML before display, signifying that the note itself should be entered in “text/plain”. However, on the action log page you’ll see that that same data has not been encoded into HTML, but instead sent straight to the browser, indicating that notes should be entered in “text/html”. Well, which is it? It’s hard to tell exactly how the note author actually typed the note, and how Syndic8 is processing it. Either way, Syndic8 is not doing the right thing, and it’s possible that the note author wasn’t doing the right thing either.

With closer inspection, it appears that Syndic8 is really screwed up. On the “action log” page, looking at the HTML source shows the note as “Rogue & in titles”. However, on the “notes” page it shows “Rogue &amp;amp; in titles”. If we assume that Syndic8 stores the note exactly as it was typed by the author, then no encoding takes place on the “action log” page, and the data is encoded twice on the “notes” page. Either that, or the note is decoded on the “action log” page and encoded on the “notes” page.

Update: I went ahead and entered my own “note” to test Syndic8’s handling of ampersands in note fields. I typed it as “& &”. On the “notes” page, Syndic8 shows that it has indeed encoded what I typed twice. Perhaps it is encoding before it stores it in the database, and then again before it displays it. I attempted to check the “action log” page, but it appears there is a bug in the Syndic8 code, because it is showing the same note twice there. However, based on how the “notes” page is displaying my data, and how the other data was displayed, I can make this fairly safe assumption: Syndic8 encodes the data before it stores it. On the “action log” page, it decodes the data (why, I’m not sure) and then offers it to the browser. On the “notes” page, it encodes the data (again) and then offers it to the browser. Therefore, if my assumptions are correct, my note would display as “& &” on the “action log” page, but only because browsers are not strict. In the HTML source my note would display as “& &”, which is not valid, as the page would then contain a “rogue &”. That is the same thing they are accusing me of when, in actuality, mine does not contain a “rogue &”, merely an “&” that the reviewer did not believe should be there. Their output would not validate. Mine does.
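My guess at Syndic8's pipeline can be written down as a short Python sketch. `store`, `action_log_source`, and `notes_source` encode my assumptions about what Syndic8 does, not its real code. `has_rogue_amp` is a simplified well-formedness check that flags any & that does not start a named, decimal, or hexadecimal reference.

```python
import html
import re

# &name; / &#123; / &#x7B; are references; any other & is "rogue".
ROGUE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def store(note: str) -> str:
    return html.escape(note, quote=False)       # assumed: encoded before storage


def action_log_source(stored: str) -> str:
    return html.unescape(stored)                # assumed: decoded for display


def notes_source(stored: str) -> str:
    return html.escape(stored, quote=False)     # assumed: encoded a second time


def has_rogue_amp(markup: str) -> bool:
    return bool(ROGUE_AMP.search(markup))


stored = store("Rogue & in titles")
log_page = action_log_source(stored)     # 'Rogue & in titles'
notes_page = notes_source(stored)        # 'Rogue &amp;amp; in titles'
```

Under these assumptions, the “action log” output fails the check, while my feed's title, `fran&amp;ccedil;ais: partie un`, passes it.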

I think I’ve spent way too much time on this.

subconscious statement

I can’t think of a UNIX command that starts with these letters, or an application that contains these letters in this order in its name. I also cannot remember typing these letters in this order in the past 24 hours or so. Yet somehow, in my command-line bar, they sit there:

numb

scratchpad: application framework

I’m working on an application framework to foster the development of cleaner code and give PHP programmers incentive to do things the right way. When my application framework is complete, I will develop (over time) three modules for it: weblog, gallery, forum.

I really didn’t want to reinvent the wheel on this one, but it doesn’t seem like anything out there is doing things the right way. PHP SiteManager came close. However, their engine assumes that the underlying structure of everything is HTML. They have special constructs in their code to strip javascript code, and to generate forms. This is not wanted because it doesn’t place the flexibility in the hands of the module authors and the website administrators.

Below are my current notes on the design of this system. Comments and/or assistance are appreciated.


  • Applications are made into modules.
  • All modules extend the same class which provides core functionality.
  • Modules can export Template tags.
  • Modules can export mappable functions.
  • Template tags are called during runtime while processing a template to gather additional data.
  • Mappable functions can be associated with a URL.
  • All GET variables are available as $_GET. All POST variables are available as $_POST. All SERVER variables are available as $_SERVER. All COOKIE variables are available as $_COOKIE. (This is standard PHP behavior. These variables will be exported into template space.)
  • $_SERVER['PATH_INFO'] will be automatically made available in $_PATHINFO.
  • When providing parameters to mappable functions through a URL, GET, POST, COOKIE, SERVER, and PATH_INFO variables will be made available.
  • Mappable functions can assign variables to template space.
  • Mappable functions should output something: a redirect, an error, or a template reference to parse and display.
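The request handling in these notes might look like the following minimal Python sketch. It assumes $_PATHINFO is PATH_INFO split on “/”, which matches how $_PATHINFO[0] is used for the offset below. `Redirect`, `Error`, `TemplateRef`, `build_request`, and `recent_entries` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Redirect:
    location: str


@dataclass
class Error:
    status: int
    message: str


@dataclass
class TemplateRef:
    name: str
    variables: Dict[str, object] = field(default_factory=dict)


Output = Union[Redirect, Error, TemplateRef]


def parse_path_info(path_info: str) -> List[str]:
    """'/10/extra' -> ['10', 'extra'] (assumed shape of $_PATHINFO)."""
    return [part for part in path_info.split("/") if part]


def build_request(server: dict, get: dict, post: dict, cookie: dict) -> dict:
    """Gather the variables a mappable function may read."""
    return {
        "_GET": get,
        "_POST": post,
        "_COOKIE": cookie,
        "_SERVER": server,
        "_PATHINFO": parse_path_info(server.get("PATH_INFO", "")),
    }


def recent_entries(request: dict) -> Output:
    """A hypothetical mappable function: it returns exactly one of a
    redirect, an error, or a template reference to parse and display."""
    pathinfo = request["_PATHINFO"]
    if not pathinfo:
        return Redirect("/recent/0")          # canonical URL carries an offset
    if not pathinfo[0].isdigit():
        return Error(400, "offset must be a number")
    return TemplateRef("recent", {"offset": int(pathinfo[0])})
```

Returning a value instead of printing keeps the decision about what to send to the browser in the engine, which is the point of the last bullet.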

Because of mappable functions and template tags, it is possible that there will be more than one way to do the same thing. For instance, a URL “recent” might be mapped to a function called “core:display” with a parameter of “recent” that merely parses and outputs the given template. This template might call a “blog:recent_entries” tag and use $_PATHINFO[0] to determine the offset. Alternatively, a URL “recent” might be mapped to a function called “blog:recent_entries” with a parameter of $_PATHINFO[0] (which determines the offset), and this function might parse a template called “recent”. Both of these methods COULD provide the same functionality. However, in the first case, the template is parsed BEFORE the engine knows it will be getting blog entries. In the second case, the template is parsed AFTER the engine knows what it is doing. The biggest difference here is that, in the second case, a different template could be used based on the time of day, or the offset of the query. In the first case, this COULD be accomplished only with LOTS of logic in template code, where it really doesn’t belong.

Example of a template that calls “blog:recent_entries” to produce a weblog:

[html]
[body]
[blog:recent_entries lastn='5' offset='$_PATHINFO[0]' r='data']
	[$data.date][br /]
	[$data.entry][br /]
	posted by [$data.author][br /]
	[hr /]
[/blog:recent_entries]
[/body]
[/html]

Example of the template that is called by the mapped function “blog:recent_entries”:

[html]
[body]
[foreach var='blogentries' r='data']
	[$data.date][br /]
	[$data.entry][br /]
	posted by [$data.author][br /]
	[hr /]
[/foreach]
[/body]
[/html]

In the second case, the mapped function could, say, determine what language the blog being referenced was in, and then call, say, a French template instead of an English one.
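Here is a minimal Python sketch of that second case. It uses hypothetical template names (`recent`, `recent_fr`) and a hypothetical blog record carrying a `lang` field. The entries are fetched first, and the template is chosen afterwards. The `blogentries` variable name matches the `foreach` in the template above.

```python
from typing import Dict, List, Tuple

# Hypothetical template names; a real module would read these from its config.
TEMPLATES_BY_LANG: Dict[str, str] = {"en": "recent", "fr": "recent_fr"}


def recent_entries_localized(blog: Dict, pathinfo: List[str],
                             lastn: int = 5) -> Tuple[str, Dict]:
    """Mapped-function version of blog:recent_entries. The entries are fetched
    BEFORE any template is chosen, so the choice can depend on the blog's
    language (or on the offset). Returns (template name, template variables)."""
    offset = int(pathinfo[0]) if pathinfo else 0
    entries = blog["entries"][offset:offset + lastn]
    # Fall back to the English template for languages without their own.
    template = TEMPLATES_BY_LANG.get(blog["lang"], TEMPLATES_BY_LANG["en"])
    return template, {"blogentries": entries}
```

Because the language check lives in the function, each template stays a plain loop like the one above, with no `[if]` blocks.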

This could be FORCED into the template system like this:

[html]
[body]
[blog:get_lang r='lang' /]
[blog:recent_entries lastn='5' offset='$_PATHINFO[0]' r='data']
	[$data.date][br /]
	[$data.entry][br /]
	[if $lang == "es"]
		es de
	[else]
		posted by
	[/if]
	[$data.author][br /]
	[hr /]
[/blog:recent_entries]
[/body]
[/html]

This last method is considered hackish, and is not recommended.

buffy

The season premiere of Buffy is tonight featuring a special performance by the Goo Goo Dolls. According to various fan sites, this is the first episode Whedon, himself, has written since the musical last November, and it’s the first season opener he’s written since season 4.

Joel, do you want to host a little get together?

Other Buffy fans might be interested in Whedonesque: a community blog about Joss Whedon and his work.

writing better software: a world of a million wheels

My children, listen closely, for the ritual practice of today’s message will bring you all one step closer to salvation. The software world of today (specifically, web-software) sucks. There are hundreds — no, thousands — of web “programmers” creating scripts and programs that automate every little task, and implementing every small and large application imaginable. There are guestbooks, webstats, photo galleries, weblogs, directories, email gateways, paging gateways, news readers, friends lists, web-comic archives, update notification scripts, discussion forums, chat clients, online journals, stock quote grabbers, weather information retrievers, bathroom occupancy notifications, webcams, and many many more. This is great. This is wonderful. This is what God wants. Many of these scripts and applications are being released to the Internet public as shareware, under the GPL, or as open-source software in one way or another. This, too, pleases the Lord. However, all this effort is futile when the finished product is incapable of integrating or communicating with anything else, and when it is designed in such a fashion that, unless a user chooses to rewrite massive amounts of the code, all instances of the application look exactly the same, or very similar. Nothing angers our God more than this.

There are hundreds of programming languages employed to make websites come alive with less effort, more dependability, and higher usability: languages like PHP, Perl, Python, Ruby, C, bash, and others. Some of these languages make developing reusable, customizable code a breeze. Yet the programmers do not use these features, or use them so poorly that they may as well not use them at all. Our Lord understands that applications developed in different programming languages may not communicate and integrate with one another. He accepts this, for now, as a deficiency in our technology and will provide us with proper tools when we are ready. But applications built in advanced, modular programming languages should be capable of integrating with other applications of the same language with ease. And when they don’t, God cries. Listen to me, my children. Do you want to make God cry? Of course not.

Modern programming languages are Object Oriented, at least to some degree. If you are building code that you intend to release to the public, and you aren’t authoring it using such modern concepts, you are doing the software world an injustice and, because of this, God hates you. Yes, it may seem like a lot of extra work to write your small script, the one that steals the current stock price of RedHat (RHAT) from Yahoo!’s finance page to display proudly on your own page, in a way that allows customization and modularity, but it is worth it.

You may be thinking, “Why bother? Plenty of people will benefit from the use of this script as it is.” This may be true, but think ahead just a little, my child. After your first 10 users follow your instructions for mashing your script into their web root, changing the extension of their homepage to .php and adding <?php include("myshittyscript.php"); ?>, you’ll, no doubt, receive an email asking how to make it pull a different stock quote. This will be no problem. You’ll modify your script to read the ticker symbol from another file and tell the users to place their ticker symbol in there. But the requests and alterations won’t end. After several months, and many more users, your script will most likely enable users to get information on any stock they want, display historical data and graphs on its progress, cache the information to promote quicker display times, and display all of this information in pretty HTML. You’ll even include a nice admin interface for adding new stocks and deciding what information should and shouldn’t be shown.

Your script is large, very useful, and you’ve put a lot of work into it. Go you. But it still sucks. And God still hates you. Why? Because, your script only works your way, it only looks the way you want it to look, and anyone who uses your script must do things your way. If there is a user who only wants to look up the current market value of RedHat, they will be forced to have this information formatted your way, and configured as you see fit. Additionally, let’s assume that another programmer is working on a piece of stock portfolio management software. With your script as it is, it is useless to him. He must reimplement all of your hard work in order to provide stock lookup features in his application. However, had you, from the beginning, written your application modularly, and allowed the output of it to be customizable, the world would be much happier. Stupid users would still have the stock quote, and it would still look exactly as you designed it. More advanced users would retemplate the stock quote and customize it to include the information they want to see, and the programmer creating the stock portfolio management software would be able to reuse your lookup code in his own application with little to no work. You’re happy, your users are happy, programmers are happy, website developers with dreams to include stock quotes on their pages choose your script over others, and, most importantly, God doesn’t hate you any more.

Today’s message is this, my children: modularity and customization are the keys to good, reusable programming. They make the world happy, and they put a nice big smile on God’s face.

For a more realistic example, take forum software written in PHP. There are at least 50 different software packages to choose from. That means at least 50 people (most likely many more) have taken the time and effort to develop a very valuable tool. They have invested many hours of thought, programming, and debugging to create an application that does everything they need it to do. The funny thing is, all 50 of these applications do just about the same thing. If the very first application of this type had been written modularly, it is possible that every author thereafter would have merely used this initial piece of code, customizing it for their own use. If one of those authors found a better, faster way to do something, those changes could be made and then communicated back to the original author, who would upgrade the software package and put an even better tool in the world’s hands. Users who just want a forum would download the software and use it right out of the box. Those who want the forum to integrate more closely with their website would customize it appropriately. Programmers could reuse the forum code to enable a commenting system in weblog software, or implement a ticket tracking system for helpdesks.

The world is better when code is reusable. This allows all users to be happy, and allows programmers to make the wheel better, as opposed to inventing it all over again. And, most importantly, if you write reusable code, God won’t hate you anymore.

Think about it.

one spot of light among billions

She lies asleep in the sky
buried under a pile of
thick black blankets.
Her face is hidden, leaving only
her long, pale fingers to extend,
out of habit, resting on the tops
of buildings and trees.

Her breath is cool and inviting,
giving rise to goosebumps on my skin.
I watch her sleep. I watch the clouds,
barely visible, rise and fall
with each breath she takes.
Though she is unaware, I know that,
in minutes, the sky will wake up,
and her with it.

And only moments later,
she will fade into the daylight,
just one spot of light among billions
until the night comes again
where she hovers over all the world
as it sleeps.