revjim.net

October 2nd, 2003:

Inklog theory: the storage system

This has been consuming me for quite some time. I guess I just need to know that other people understand it, and agree that it is both doable and sensible. So, here’s an explanation of the theory of Inklog. I’m sure you’ve all been waiting for this for a while.

There’s a lot of data here. So please read over it carefully. At this point, I envision it all in a relational database. It doesn’t HAVE to be that way. I’m open to any and all suggestions, comments, improvements, ideas, and criticisms.

First, let’s lay out a table in which to store data. This is a UNIVERSAL table which can store ANY data, as you’ll see by it’s design. It has the following fields:

idparenttitledatatypeformatloc_nameloc_path

Let me explain each field one at a time.

id: This is a unqiue identification number. It is generated automatically and is unimportant aside from the fact that it is unique and unchanging. And ID number can be destroyed, however, that number will never be reused.

parent: This is the id of the parent. Used in order to maintain tree structure.

title: This is a nice friendly title for an item. It is indexable, and therefore searchable.

data: This is the data of an item. It can be in any format. It can be a weblog entry, a calendar item, a digital photograph, a user comment, or a page on your website. It’s anything and everything. It is not indexable and, therefore, not very easily searchable. This item could use improving.

type: This is the type of item we’re encountering. It describes which portions of the system should be called to handle this particular item. This might be "weblog" for a weblog entry, or "photo" for a digital photograph.

format: This is the type associated with the data. It is very similar (if not exactly like) a MIME Type. The field is indexable and searchable. At this point, its exact contents are undefined. Perhaps it’ll hold actual MIME Types (like text/html, image/jpeg, and others) or shorter, system specific types (like txt, html, jpeg, etc). In the case of this discussion, I won’t use MIME types. This item could use improving.

loc_name: This tells how an item is identified. Using ID numbers is cumbersome, ugly, and unintuitive. This allows the user to customize the name used to reference an item. It is changable (though that is not recommended) and MUST be unique within each parent. In other words, no two children of the same parent can have the same loc_name.

loc_path: This is the full path (using the loc_names of its ancestors) to the item. It is useful only to speed up the locating of an object. For instance, if we were addressing these items in a web browser, and wanted to find the item located at /revjim/weblog/private/2003-10-02/inklog we could find it in ONE query. As opposed to starting at the /revjim level and progressing forward one level at a time until inklog was reached. This item will not be indexable, as its length could be far too great. Therefore, its not easily searchable thus defeating the purpose of having it. It is possible that, in the end, it may be more beneficial to actually look for an item node by node, in which case, this field could be eliminated. Additionally, it should be noted that if any of an items ancestors’ loc_names are updated, the loc_path on the item will have to be updated to reflect that. This could be a time consuming update operation not worthy of the small amount of time savings this field might add. This item could use improving.

So far, this is fairly simple. In fact, the entire system is fairly simple to understand, it’s the implementation that is difficult.

Now, allow me to explain how this table would be used. Remember that ANY user can tailor the system to work EXACTLY the way that they want it to. For the purposes of this example, I’ll put all documents into the root node. This isn’t really how it should work, but it’ll save some confusion.

So, let’s say you want to make a single page website that contains a page named "hello". So, you’d need to create a single node. That node would, most likely, look like this:

id: 1parent: 0title: My Home Pagedata: hellotype: sitepageformat: txtloc_name: homeloc_path: /

The type field is the real trigger here. It tells the system how to display the data being presented to it. Based on this information, the system would know that it needs to use the "sitepage" module to render this data. It also knows that the data is in "txt" format, and will pass that as a parameter to the module. If one were trying to access this page via a URL, that URL might be http://me.com/home.

The system is smarter than that, however. Let’s say you wanted to access this item in plain text, as opposed to the default format of HTML. That’s easy, in your browser you might access http://me.com/home.txt. Of course this is all adjustable and fine tunable. It’s up to the "sitepage" module to decide how to render this item. However, modules can take another parameter that informs them of which format the user is requesting them in. Don’t get this confused with the "txt" entry in the format field. That tells the module what format it is IN, not what format the user would like to see. The HTTP server agent would be responsible for detecting the ".txt" extention on this request and passing it on the module. If it were desired (by the module designer, or the HTTP server agent designer) it could also be passed in this fashion: http://me.com/home?format=txt. However, in my mind that doesn’t look as pretty. What’s important to remember, however, is that the "sitepage" module acts independantly of the HTTP handling portions. So, if someone wanted to write an EMAIL handling server agent that would email documents to to users that requested them via email, this would be entirely possible using the same set of modules. Only the server agent would have to be written.

Let’s attempt a more complicated example. Let’s store a photograph. It might look like this:

id: 2parent: 0title: a picture of medata: :::JPEG DATA HERE:::type: photoformat: jpeg<loc_name: mypicloc_path: /

This is probably pretty self-explanatory. The "photo" module would be called and passed "jpeg" as a parameter. It would be responsible for rendering the data. However, it seems like a good idea to have "html" be the default output type for this item as well. In the case of an HTTP server agent, when http://me.com/mypic is requested it would call the "photo" module with an input type of "jpeg" and an output type of "html". Which means that, really, the picture isn’t even being sent in this request. However, when that HTML is generated an image is called which is accessed via the URL http://me.com/mypic.jpg. In this case, the same "photo" module is called. The input type is still "jpeg". However, the output type (as set by the HTTP server agent) is now "jpg". This tells the "photo" module to output the actual JPEG data.

Let’s complicate this further. Let’s say that I wanted a nice description to go along with my image. Lucky for me, the "photo" module builder implemented this. If a "photo" item has a child at loc_name description, it will be used as such. The entry for this description in the database might look like this:

id: 3parent: 2title:data: This is me in Italy. Check out the hot girl I'm with.type: photo/descriptionformat: txtloc_name: descriptionloc_path: /mypic/

This one gets a little weirder. Notice that the parent is set to the id of the "mypic" item. This is important. Secondly, this item doesn’t have a title. In this case, a title is unimportant. Also the type, in this case, is really not useful. In fact, I imagine I could just leave it blank. Maybe that’s a better idea. This area could use some improvement

Now let’s make a weblog entry:

id: 4parent: 0title: my first entrydata: this is my first entrytype: weblogformat: txtloc_name: myfirstentryloc_path: /

By now, you should understand what this would do. But I’ll go over it one more time. Let’s assume this is being handled by the HTTP server agent. The URL requested was http://me.com/myfirstentry. This caused the "weblog" module to be called. It was passed two parameters, one to inform it of the input type "txt", and the other to request the output type "html".

But wait. What if I want my entrys to be sorted nicer. By date perhaps. So, I’ll make an item to hold it:

id: 5parent: 0title: October 2003data:type: containerformat:loc_name: 200310loc_path: /

Now I’ll move the weblog entry into it, adjusting it to look like so:

id: 4parent: 5title: my first entrydata: this is my first entrytype: weblogformat: txtloc_name: myfirstentryloc_path: /200310/

Now, this item is accessed via the URL http://me.com/200310/myfirstentry. I hope that’s pretty clear.

Let’s say I also have a "comment" module installed. The "comment" module knows that it should only allow comments on items that have a metadata value for "allowcomments" set to "yes". Let’s add that metadata value to our weblog entry:

id: 6parent: 4title: allowcommentsdata: yestype: metadataformat:loc_name: m_allowcommentsloc_path: /200310/myfirstentry/

In this case, the loc_name is really unimportant. The "metadata" module would never allow this item to be displayed independantly. However, it must be unique to the parent, so I made one up. It doesn’t really matter, however. The html renderer is also smart enough to notice that this item accepts comments, and will provide a link to do so when rendering the page.

I’m sure it’s very easy to see how a comment would look once it was left. But I’ll give you that too:

id: 7parent: 4title: great sitedata: I love your site. It's great.type: commentformat: txtloc_name: c_1loc_path: /200310/myfirstentry/

Now, the portion of the system that actually allows the comment to get into the system… well… that’s a whole other entry.

screw you and your database

I’ve spent the last day or so writing, testing, and benchmarking a few different methods for data storage. The horrible part is, I’m not any closer to having any real solution. Well, maybe a little.

Through the process of elimination I’m left with two different database schemas and can’t really decide which would be better suited for my purposes since I haven’t any real data to use yet. I generate a bunch of random data but, it’s just that… random. It doesn’t really reflect, at all, how the data might be used in the real world.

Let me explain the two methods. I’m sure you’ve seen them before.

I’m trying to make a universal storage mechanism. Yes, I know that means I’m getting most of the shitty features and very few of the good features that an RDBMS provides me. Oh well.

The first method is the Modifed Preorder Tree Traversal method. The second method is it’s opposite, the Adjacency list method. Both are methods for tracking child -> parent data relationships.

The MPTT method, is fairly complex. The basic version of it is, by adding two additional id numbers to each item in the list, it is possible to predetermine the order (or… preorder) the list items. Every time a new item is added to list it’s position is determined and all of the items to the “right” of it are moved over to make a space for it. It’s mostly equivilent to giving each item two pieces of paper and lining the papers up in a straight line. Whenever you want to add an item inside of another, the older that item is (the more left on the line it sits) the more papers have to moved over and renumbered to make a space for it. However, the advantage is that, once everyone is lined up, figuring out who belongs to who is easy as pie — if number “2″ is to the left AND to the right of you, then you are a member of it.

The AL method is a bit easier to understand. Every child is told who its parent is. Then, in order to traverse the list, you start at the top and ask, “who belongs to person A”. When those are determined you, again, ask for their children to come forward and you continue until everyone’s been asked. This means that no reordering is done when a new item is added. You merely tell that item who its parent is and move on. However, when you want to see who belongs to who, you have to ask a lot of questions because noone knows who their grandfather is… they only know their own parent.

In my test case, with 20,000 random rows in the database, I’m able to obtain the entire tree (which happens very rarely) in 2.1568 seconds using the MPTT method. However, it takes a whopping 22.1202 seconds to gather the same tree using the AL method. Almost seems like a no-brainer, right? Not exactly. You have to remember that very rarely (never?) will someone request to know ALL of the children of the ROOT node. That’s just too much data to process. If, in cases of small amounts of data, it wasn’t too much to process, it also wouldn’t take nearly as long. You see, the length of time required to read using the AL method is directory proportional to the amount of recursion in that list. If there were only ONE parent and everyone else was its child, both methods would be equally as fast.

Writing, however, is another story entirely. With the same test data present, a write on the LEFT side of the tree (the more time consuming side) takes 3.6466 seconds using the MPTT method. Using the AL method, however, only 0.0179 seconds are required. However, one might still be inclined to say that, since reads are generally MUCH more frequent than writes, the MPTT method is worth the wait. However, there are more things to consider.

When the MPTT method is writing, the tree is unreadable. That means that for, in this case, 3.6466 seconds, no one can read anything. Additionally, the AL method’s read time was a truly worst case situation for it. In MOST cases, the entire tree will not be read. A more average example of a query shows the MPTT method taking 0.0094 seconds, and the AL method requiring 0.0680 seconds for the same read operation. These are much more livable numbers.

The real kicker comes when we analyze an actual USE of the data. For instance, because this is a universal data storage method, it stands to reason that it should be able to hold its own permission data (read, write, etc)… and it can. However, this introduces layers of complexity in the MPTT method that are difficult to overcome. It also means that, in some cases, the MPTT method gives more data than it should requiring longer query times and post-query processing. It’s possible that my SQL skills are not up to par and that there is a better way to do this, however. In many cases, once permissions are factored in, the same read is faster using the AL method than it is with the MPTT method simply because of the additional, unneeded, data being returned by the latter.

20,000 rows may seem like a lot of data for the possible uses of a system like this but, in reality, it isn’t. Consider using it for a weblog. Every post is an item. Every post item has several permissions, which are each items. Every comment is an item. Comments also have permissions, which are more items. By design (which could possibly be improved), metadata for items are stored as additional items belonging to the items they describe. This means, even more items in the system. Consider a simple blog like mine. I have about 3000 entries. I also have about 3000 comments. That’s 6000 items. Plus their related permissions…. adds 6000 “everbody can read” permissions, plus 3000 “everybody can comment” permissions, plus 3000 “jim can edit” permissions. That’s 18,000 items, already. They add up quickly, you see. Waiting 22 seconds to get a list of EVERYTHING I have to offer is not too long considering I have 6000 items to offer. However, waiting 3.6466 seconds to post a comment the first post I ever made is starting to look bad.

MPTT has the advantage that, newer items are updated quicker than older items. And, since newer items are most likely the ones being updated or commented on, this makes a lot of sense. However, AL has lighting fast write times and read times that, when coupled with permissions and reality checks, are fast enough that time doesn’t really matter.

I’ve considered placing the permissions in another table. This might increase the write time of the MPTT method as MANY rows would be removed from the system making its writes faster. However, since permissions items don’t have children, they aren’t very hard to move. Additionally, since items are always added at the END of a branch (right side) instead of at the beginning (left side), the case where a lot of things would need to be moved gets less and less likely. Additionally, since permission items aren’t placed on permission items themselves, it really wont save much in terms of the AL method’s read time since it’ll never return or check an item that it can’t find permission to read.

With the MPTT, the worst case would be a root node with two children in which the second (more RIGHT) child has MANY children of its own and an item is added to the first (more LEFT) child. Then all of the children of the second item have to move over. That’s time consuming. In this, very limited, case, adding a child to the root node would be the best case as it would only have to alter the root node itself. Second to that would be adding an item to the second child, as it would only have to move the end points on that second child and the root node.

With AL, the time required to perform a query grows slightly with the number of rows being returned, and grows GREATLY with the amount of recursion invovled in the queries. So the only real determining factor for speed is, how many levels deep is the tree?

It should also be noted that the MPTT method lends itself to doing lots of other less common tasks VERY easily. Fining the ancestoral chain for an item is easy as pie, for instance. Doing so in the AL method requires more recursive queries.

So… what do I do? I have no clue.

Comments and criticism are greatly appreciated. If you need more information on how the particular models are being used in order to provide optimization ideas, or alternate methods of operation, please let me know.