revjim.net

December 21st, 2002:

how to write better code

Writing good, clean PHP code isn’t hard. One just needs a few simple guidelines.

Start every project by building a “Model”.

Before you even consider what it will look like, or even think of one snippet of HTML, think of what your program will actually do. Then, begin to write small snippets of code that do those things. If you have to write the same control structure (the same if statement, for example) more than once, it should probably be in a function of its own.

Don’t use global variables. Ever. If you need a piece of data, provide a way to set it in the class, or pass it to the function as a parameter. Don’t even be tempted to use PHP’s built-in globals ($_GET and friends). They have a purpose, but they don’t belong here.

As you write the various functions, consider whether or not they would work in a different environment. What if they were being called from another instance of PHP running on someone else’s server? What if you decided you wanted to output everything in Flash, or in a PDF file? What if they were going to be used completely outside of a web browser?

Give these little snippets of code useful names. If it gets the time, call it getTime(). If it gets weblog entries, call it getEntries(). If it adds a user to the database, call it addUser(). I think you get the idea.

After you build all of these little functions, store them away in a file all by themselves and include it at the top of the other scripts in the application.

That step alone will make your code much more usable. You can now build many little PHP scripts that access this “Model” to get and store the data that is needed to make the application work. If you find yourself storing any data or getting any data and not using the “Model” you built, chances are it should be in there. Don’t be afraid to build more than one “Model”. For instance, if you are building a weblogging application, you might have one “Model” handle the weblog data, and another “Model” handle the user data (authentication and such).

However, if you’re doing things right, why stop there? The next step is to build “Controllers”.

Begin writing individual PHP scripts that call the functions in your “Model” and prepare the data to be used. They might handle checking to see if a user is logged in and starting a session. Some of these scripts might even access more than one “Model” — a weblog “Model” and a user management “Model” for example.

Some scripts will get data from the “Model”. Others will put data submitted via the web into the “Model”. You should not write any script that does both.

When writing a script that puts data into the “Model”, when the script is complete, it should issue a header('Location: http://something.com/some/path') command to forward the action on to a page that will get data from the “Model”. It should not output anything to the browser. In the case of an error, forward the user to a page (using the header() function) that will display that error message and allow the user to try again.

When getting data from the “Model” simply build up the data that you are getting and it in arrays. When the scripts nears completion it should have done nothing but build arrays of data.

At the end of the scripts you should either call a templating engine (like Smarty), or use include to include a PHP file that contains the actual HTML you wish to display.

Now that wasn’t too hard. You’ve only got one step left: building the “Views”.

If you are using a templating engine that allows logic (if structures, and variable formatters), then you merely need to build an HTML template that displays your data as you desire (Smarty works in this fashion). Since the “Controllers” call your templating engine, the templating engine and the associated templates serve as your view.

If your templating engine doesn’t allow logic (like FastTemplate, for example), then instead of calling the templating engine at the end of your “Controllers”, include another PHP script that will perform layout logic using the constructs for your templating engine. These small PHP scripts should do nothing but use the variables you built in the “Controllers” and call the various functions provided by your templating engine. They should never call the “Model”.

If you choose not to use a templating engine, then your “Controllers” should merely include another PHP script. This script will have HTML in it and should access only the variables built in the “Controllers”. Feel free to use PHP control structures (like if, while, foreach and family), and PHP functions that format data (like htmlentities() and date()). However, do not call the “Model”, or use any PHP functions that store or retrieve data (like mysql_query() and fopen()).

That’s it. You’re done.

What you have now is the same application, but much more usable. If I want to make your application fit the theme of my webpage, I merely change the “Views”. If I want to change the way the URLs look, I change the “Controllers”. If I want to display additional information from a completely unrelated application on the same page, I just alter the “Controllers”. If I want to change the real meat of the application, then I can get dirty and muck around in the “Model”. This allows the code to be separated into its components. The “Model” holds the part that actually does something. The “View” displays the data to the web browser. The “Controller” ties the two together while handling the incoming data that might have been submitted via form or via a QUERY_STRING.

There are other ways to produce clean code. Some of them involve incorporating the “Controllers” and the “Views” into the same step. Others introduce even more levels of separation. In some cases, it’s even okay to not separate code at all. Regardless of what method you use, unless your script is less than 50 lines of code, you should always have a “Model”.

These ideas aren’t difficult. The code works the same, and does the exact same things. It just allows each piece to be isolated so that they can be easily altered, debugged, or included in some other application.

cafelog: a look at bad code

If you program in PHP (or any language, really), and are developing something that you intend to release to the public, please, please, please spend the little bit of extra time it takes to develop your application properly. This means using function or classes to help break down and simplify the logic in your application, using meaningful variable names, and a directory structure that makes sense. When you do, you encourage other developers to improve your product, you encourage the use of your product in ways that you never intended, and you obtain a following from developers and users alike.

Because finding/writing a PHP based weblog application that works the way I think it should work is high on my currently priority list, I’m going to pick on Cafelog. I don’t have anything against its users or its developers aside from the use of poor coding practices. If you are a user of this product, please don’t tell me about how cool it is, or about how well it works. If you read a site powered by this engine, please don’t tell me about how easy you find it to use. If you are the developer of this product, please feel free to email me.

cafelog/b2 supports multiple authors, archives, templates, trackback, pingback, xmlrpc, categories, an RSS feed, and posting via email. It’s open source, written in PHP, and uses MySQL as a backend. It used fried pages (not baked) and a has a growing community. It’s probably a GREAT weblogging application. Unfortunately, I’ll never know because I refuse to use it.

There are some things about it that I don’t like, but they are merely a matter of taste (in most cases). These things are not the point of this article. However, I will enumerate them so that there is a better understanding of why I would want to modify the codebase to begin with. I should also point out that these features might actually exist in cafelog already, and I was just so unwilling to read every line of code written in order to see that.

It doesn’t cache calls to the database. This means that every time the page is requested, regardless of whether new content was published one second ago or one decade ago, my webserver takes a hit, and the database takes a hit. I’d like to see it cache the requests it makes to the database, and then not make them again until it is informed that the data has changed.

It doesn’t support Last-Modified and Etag headers. This means that if a user pounds on the “reload” button on their browser, it will have to pull all of the data from my server every time resulting in greater bandwidth usage. I’d like to see it use HTTP 1.1 features and send a 304 - Not Modified status to the client browser instead of the entire page when the content hasn’t changed.

I don’t care for the template system. I would prefer to use another template system allowing me more flexibility. By using my own template system, I can use the same layout files with this application as I use on the rest of my site allowing me to change the entire look of my site by editing a single layout template.

I don’t like the URLs it generates. I prefer search-engine friendly URLs that are void of question marks and ampersands. This allows the search-engines to index all of my pages, provides cleaner looking URLs, and makes cutting and pasting URLs much easier.

I don’t expect the author to share my vision of features. It’s his project, and he can have it work anyway that he likes. Because I am a PHP programmer, I am able to modify and/or reuse his code to suit my needs and my design ideas. While I would like to see these features in this product, I understand that he may want things his way for a reason. And I have no problem with that. What I do have a problem with, however, is my inability to use his code (without extensive reading and rewriting) to implement my own features.

Let’s dig into this a little bit deeper. From what little bit of code I did read I manage to deduce that most (all?) of the scripts begin with an include of blog.header.php. Opening that file makes me want to scream. This file is responsible for looking at all of the arguments that have been passed to the script and deciding exactly how the SQL query should look to generate the requested page. It then performs the query, and stuffs the results into $results which I can only imagine is used globally (AAAAAAAHHHHH!) in the remainder of the code, which is considered by Cafelog to be a “template”. And, since this file is included just about everywhere, there is actually provision at the end to issue an SQL statement of “SELECT 1=1” in the event that an SQL query is not needed. This page is where the most changes are needed, and therefore it is where we will focus.

It would be fairly trivial to wrap this code in a function that took one “struct” as an argument and returned an SQL query. And while this makes things much easier on a developer trying to use this code, there is an even better way. It should be broken up into several pieces. A class with many functions would be best, however, I’d settle for many functions outside of a class. In fact, though this would not be the best way to go, I would even take one big function taking many arguments, or even one function taking one giant struct as its parameter. These functions (or function) should return the actual MySQL result rows, however, even if it merely returned the SQL query string it could be wrapped a bit easier.

After rewriting this code, we would have a class (perhaps called Cafelog) with many functions. This can be done in many ways. One of which would be to make functions that would return different types of queries. This would result in functions like getRecentPosts($limit,$offset) and getPostsFromCategory($category). Another method would be functions that imposed various restrictions on the returned data with a final execute() function to perform the work. This might involve functions like $cl->limit($number,$offset), $cl->category($category), $cl->daterange($startts,$endts). In the end, however, one would find flexible, easy to use code, ready for modification, improvement, or alternate uses.

Building clean, reusable code is important. It means that you’ll implement new features quicker, require less work, see fewer bugs, and attract more developers. Writing dirty code is the easy way. Given enough examples to cut and paste from, my mom could manage dirty code. (Well maybe that’s giving her too much credit). Consider this the next time you pull out your favorite text editor to write some code.