Next step MySQL: using views to improve performance
August 14, 2010,
When developers start using MySQL as a storage backend, they quickly find out how to insert, update, delete, and retrieve data. But often enough, progress tends to stop there, and problems arise when an application needs to scale. This article demonstrates how creating views may be a life saver for an underperforming application.
One of the first major steps for a developer in the process towards writing web applications is learning to use a database as a storage backend, typically MySQL. Many of these applications are described as CRUD, not because of shoddy work, but because they create, read, update, and delete objects in a database. For many projects, these operations suffice, and developers may not continue their development in using databases. This may lead to problems with their projects when the amount of data and/or users continues to grow. Many times it is a matter of learning a little more about using databases to alleviate scaling issues. For example, views were introduced in MySQL 5, and they provide a relatively easy way of increasing performance and simplifying queries at the same time. The best way to illustrate this, is to use a real-life example.
The blog you're reading at this moment is an application we developed ourselves as part of a larger framework. For the purpose of this article we'll simplify the internals a little bit. The table that holds blog articles essentially looks like below:
CREATE TABLE blog ( bid int(10) unsigned NOT NULL auto_increment, author int(10) unsigned NOT NULL, published datetime NOT NULL, title varchar(255) NOT NULL, summary text NOT NULL, content mediumtext NOT NULL, status enum('edit','publish','hide') NOT NULL default 'edit', PRIMARY KEY (`bid`), KEY `status` (`status`), KEY `author` (`author`), KEY `published` (`published`) );
We've left out a few columns and indexes that are not pertinent to the example, but the core datamodel is very simple. Since we like to track the popularity of our articles, we've also implemented a table in which we track access. This table is defined as
CREATE TABLE tracker ( system varchar(64) NOT NULL, id int(10) unsigned NOT NULL, sid varchar(64) NOT NULL, PRIMARY KEY (system,id,sid), KEY id (id,system) );
This table can also be used for other systems, hence the 'system' column, which contains 'blog' for rows tracking blog article access. The id then refers to the bid from the blog table, while the sid stores the session id of the user requesting a blog article. This last feature is intended to counter duplicates, but of course it is not completely effective.
As an example of a performance problem that grows over time, consider the query that selects the 5 most read articles from a blog. Based on the table structure described above, you could use a query like
SELECT b.*, COUNT(t.id) AS views FROM blog b JOIN tracker t ON b.bid=t.id AND t.system='blog' WHERE (b.status='publish' AND b.published<=NOW()) GROUP BY t.id ORDER BY views DESC LIMIT 5;
At first glance not much seems wrong with this, apart from selecting the full content for an index. The reason for that is that we may be displaying part of the content on a homepage. Now consider the fact that the tracker table will continue to grow over time. To illustrate, this blog is not exceptionally popular, but in little over a year the tracker table has still gathered more that 100,000 entries for blog articles. As a result, the query above takes 1.91 seconds to evaluate, timed in phpMyAdmin on the system that is hosting the blog. That's painful.
Now let's add a view that does the adding up of the views for us:
CREATE VIEW aggregate AS SELECT id, system, COUNT(id) AS views FROM tracker GROUP BY system,id;
If you're not familiar with views, consider them a stored result of a select query, that we can access as if it were a table. MySQL takes care of updating the view when new rows are entered in the tracker table. Now we can rewrite our selection as:
SELECT b.*, a.views FROM blog b JOIN aggregate a ON b.bid=a.id AND a.system='blog' WHERE (b.status='publish' AND b.published<=NOW()) ORDER BY a.views DESC LIMIT 5;
We're no longer grouping the views in the query, but we can simply join on the aggregate view, making the query a bit simpler. But far more importantly, this query gives the same results as the first, but evaluates in 0.16 seconds.
The results presented here are just a simple example of the potential performance benefit of using views, and your mileage will definitely vary. But views not only serve to enhance performance and scalability of your applications. They also help to simplify queries, and reduce the need for consecutive queries in your code. All in all, views are a very useful tool for web developers.