revjim.net

Inklog theory: the storage system

This has been consuming me for quite some time. I guess I just need to know that other people understand it, and agree that it is both doable and sensible. So, here’s an explanation of the theory of Inklog. I’m sure you’ve all been waiting for this for a while.

There’s a lot of data here. So please read over it carefully. At this point, I envision it all in a relational database. It doesn’t HAVE to be that way. I’m open to any and all suggestions, comments, improvements, ideas, and criticisms.

First, let’s lay out a table in which to store data. This is a UNIVERSAL table which can store ANY data, as you’ll see by it’s design. It has the following fields:

idparenttitledatatypeformatloc_nameloc_path

Let me explain each field one at a time.

id: This is a unqiue identification number. It is generated automatically and is unimportant aside from the fact that it is unique and unchanging. And ID number can be destroyed, however, that number will never be reused.

parent: This is the id of the parent. Used in order to maintain tree structure.

title: This is a nice friendly title for an item. It is indexable, and therefore searchable.

data: This is the data of an item. It can be in any format. It can be a weblog entry, a calendar item, a digital photograph, a user comment, or a page on your website. It’s anything and everything. It is not indexable and, therefore, not very easily searchable. This item could use improving.

type: This is the type of item we’re encountering. It describes which portions of the system should be called to handle this particular item. This might be "weblog" for a weblog entry, or "photo" for a digital photograph.

format: This is the type associated with the data. It is very similar (if not exactly like) a MIME Type. The field is indexable and searchable. At this point, its exact contents are undefined. Perhaps it’ll hold actual MIME Types (like text/html, image/jpeg, and others) or shorter, system specific types (like txt, html, jpeg, etc). In the case of this discussion, I won’t use MIME types. This item could use improving.

loc_name: This tells how an item is identified. Using ID numbers is cumbersome, ugly, and unintuitive. This allows the user to customize the name used to reference an item. It is changable (though that is not recommended) and MUST be unique within each parent. In other words, no two children of the same parent can have the same loc_name.

loc_path: This is the full path (using the loc_names of its ancestors) to the item. It is useful only to speed up the locating of an object. For instance, if we were addressing these items in a web browser, and wanted to find the item located at /revjim/weblog/private/2003-10-02/inklog we could find it in ONE query. As opposed to starting at the /revjim level and progressing forward one level at a time until inklog was reached. This item will not be indexable, as its length could be far too great. Therefore, its not easily searchable thus defeating the purpose of having it. It is possible that, in the end, it may be more beneficial to actually look for an item node by node, in which case, this field could be eliminated. Additionally, it should be noted that if any of an items ancestors’ loc_names are updated, the loc_path on the item will have to be updated to reflect that. This could be a time consuming update operation not worthy of the small amount of time savings this field might add. This item could use improving.

So far, this is fairly simple. In fact, the entire system is fairly simple to understand, it’s the implementation that is difficult.

Now, allow me to explain how this table would be used. Remember that ANY user can tailor the system to work EXACTLY the way that they want it to. For the purposes of this example, I’ll put all documents into the root node. This isn’t really how it should work, but it’ll save some confusion.

So, let’s say you want to make a single page website that contains a page named "hello". So, you’d need to create a single node. That node would, most likely, look like this:

id: 1parent: 0title: My Home Pagedata: hellotype: sitepageformat: txtloc_name: homeloc_path: /

The type field is the real trigger here. It tells the system how to display the data being presented to it. Based on this information, the system would know that it needs to use the "sitepage" module to render this data. It also knows that the data is in "txt" format, and will pass that as a parameter to the module. If one were trying to access this page via a URL, that URL might be http://me.com/home.

The system is smarter than that, however. Let’s say you wanted to access this item in plain text, as opposed to the default format of HTML. That’s easy, in your browser you might access http://me.com/home.txt. Of course this is all adjustable and fine tunable. It’s up to the "sitepage" module to decide how to render this item. However, modules can take another parameter that informs them of which format the user is requesting them in. Don’t get this confused with the "txt" entry in the format field. That tells the module what format it is IN, not what format the user would like to see. The HTTP server agent would be responsible for detecting the ".txt" extention on this request and passing it on the module. If it were desired (by the module designer, or the HTTP server agent designer) it could also be passed in this fashion: http://me.com/home?format=txt. However, in my mind that doesn’t look as pretty. What’s important to remember, however, is that the "sitepage" module acts independantly of the HTTP handling portions. So, if someone wanted to write an EMAIL handling server agent that would email documents to to users that requested them via email, this would be entirely possible using the same set of modules. Only the server agent would have to be written.

Let’s attempt a more complicated example. Let’s store a photograph. It might look like this:

id: 2parent: 0title: a picture of medata: :::JPEG DATA HERE:::type: photoformat: jpeg<loc_name: mypicloc_path: /

This is probably pretty self-explanatory. The "photo" module would be called and passed "jpeg" as a parameter. It would be responsible for rendering the data. However, it seems like a good idea to have "html" be the default output type for this item as well. In the case of an HTTP server agent, when http://me.com/mypic is requested it would call the "photo" module with an input type of "jpeg" and an output type of "html". Which means that, really, the picture isn’t even being sent in this request. However, when that HTML is generated an image is called which is accessed via the URL http://me.com/mypic.jpg. In this case, the same "photo" module is called. The input type is still "jpeg". However, the output type (as set by the HTTP server agent) is now "jpg". This tells the "photo" module to output the actual JPEG data.

Let’s complicate this further. Let’s say that I wanted a nice description to go along with my image. Lucky for me, the "photo" module builder implemented this. If a "photo" item has a child at loc_name description, it will be used as such. The entry for this description in the database might look like this:

id: 3parent: 2title:data: This is me in Italy. Check out the hot girl I'm with.type: photo/descriptionformat: txtloc_name: descriptionloc_path: /mypic/

This one gets a little weirder. Notice that the parent is set to the id of the "mypic" item. This is important. Secondly, this item doesn’t have a title. In this case, a title is unimportant. Also the type, in this case, is really not useful. In fact, I imagine I could just leave it blank. Maybe that’s a better idea. This area could use some improvement

Now let’s make a weblog entry:

id: 4parent: 0title: my first entrydata: this is my first entrytype: weblogformat: txtloc_name: myfirstentryloc_path: /

By now, you should understand what this would do. But I’ll go over it one more time. Let’s assume this is being handled by the HTTP server agent. The URL requested was http://me.com/myfirstentry. This caused the "weblog" module to be called. It was passed two parameters, one to inform it of the input type "txt", and the other to request the output type "html".

But wait. What if I want my entrys to be sorted nicer. By date perhaps. So, I’ll make an item to hold it:

id: 5parent: 0title: October 2003data:type: containerformat:loc_name: 200310loc_path: /

Now I’ll move the weblog entry into it, adjusting it to look like so:

id: 4parent: 5title: my first entrydata: this is my first entrytype: weblogformat: txtloc_name: myfirstentryloc_path: /200310/

Now, this item is accessed via the URL http://me.com/200310/myfirstentry. I hope that’s pretty clear.

Let’s say I also have a "comment" module installed. The "comment" module knows that it should only allow comments on items that have a metadata value for "allowcomments" set to "yes". Let’s add that metadata value to our weblog entry:

id: 6parent: 4title: allowcommentsdata: yestype: metadataformat:loc_name: m_allowcommentsloc_path: /200310/myfirstentry/

In this case, the loc_name is really unimportant. The "metadata" module would never allow this item to be displayed independantly. However, it must be unique to the parent, so I made one up. It doesn’t really matter, however. The html renderer is also smart enough to notice that this item accepts comments, and will provide a link to do so when rendering the page.

I’m sure it’s very easy to see how a comment would look once it was left. But I’ll give you that too:

id: 7parent: 4title: great sitedata: I love your site. It's great.type: commentformat: txtloc_name: c_1loc_path: /200310/myfirstentry/

Now, the portion of the system that actually allows the comment to get into the system… well… that’s a whole other entry.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>