Next step MySQL: using views to improve performance

Joor Loohuis, August 14, 2010, 21255 views.

When developers start using MySQL as a storage backend, they quickly find out how to insert, update, delete, and retrieve data. But often enough, progress tends to stop there, and problems arise when an application needs to scale. This article demonstrates how creating views may be a life saver for an underperforming application.

Tags: , ,

One of the first major steps for a developer in the process towards writing web applications is learning to use a database as a storage backend, typically MySQL. Many of these applications are described as CRUD, not because of shoddy work, but because they create, read, update, and delete objects in a database. For many projects, these operations suffice, and developers may not continue their development in using databases. This may lead to problems with their projects when the amount of data and/or users continues to grow. Many times it is a matter of learning a little more about using databases to alleviate scaling issues. For example, views were introduced in MySQL 5, and they provide a relatively easy way of increasing performance and simplifying queries at the same time. The best way to illustrate this, is to use a real-life example.

The blog you're reading at this moment is an application we developed ourselves as part of a larger framework. For the purpose of this article we'll simplify the internals a little bit. The table that holds blog articles essentially looks like below:

CREATE TABLE blog (
  bid int(10) unsigned NOT NULL auto_increment,
  author int(10) unsigned NOT NULL,
  published datetime NOT NULL,
  title varchar(255) NOT NULL,
  summary text NOT NULL,
  content mediumtext NOT NULL,
  status enum('edit','publish','hide') NOT NULL default 'edit',
  PRIMARY KEY  (`bid`),
  KEY `status` (`status`),
  KEY `author` (`author`),
  KEY `published` (`published`)
);

We've left out a few columns and indexes that are not pertinent to the example, but the core datamodel is very simple. Since we like to track the popularity of our articles, we've also implemented a table in which we track access. This table is defined as

CREATE TABLE tracker (
  system varchar(64) NOT NULL,
  id int(10) unsigned NOT NULL,
  sid varchar(64) NOT NULL,
  PRIMARY KEY  (system,id,sid),
  KEY id (id,system)
);

This table can also be used for other systems, hence the 'system' column, which contains 'blog' for rows tracking blog article access. The id then refers to the bid from the blog table, while the sid stores the session id of the user requesting a blog article. This last feature is intended to counter duplicates, but of course it is not completely effective.

As an example of a performance problem that grows over time, consider the query that selects the 5 most read articles from a blog. Based on the table structure described above, you could use a query like

SELECT b.*, COUNT(t.id) AS views
    FROM blog b
    JOIN tracker t ON b.bid=t.id AND t.system='blog'
    WHERE (b.status='publish' AND b.published<=NOW())
    GROUP BY t.id
    ORDER BY views DESC
    LIMIT 5;

At first glance not much seems wrong with this, apart from selecting the full content for an index. The reason for that is that we may be displaying part of the content on a homepage. Now consider the fact that the tracker table will continue to grow over time. To illustrate, this blog is not exceptionally popular, but in little over a year the tracker table has still gathered more that 100,000 entries for blog articles. As a result, the query above takes 1.91 seconds to evaluate, timed in phpMyAdmin on the system that is hosting the blog. That's painful.

Now let's add a view that does the adding up of the views for us:

CREATE VIEW aggregate AS
    SELECT id, system, COUNT(id) AS views
    FROM tracker
    GROUP BY system,id;

If you're not familiar with views, consider them a stored result of a select query, that we can access as if it were a table. MySQL takes care of updating the view when new rows are entered in the tracker table. Now we can rewrite our selection as:

SELECT b.*, a.views
    FROM blog b
    JOIN aggregate a ON b.bid=a.id AND a.system='blog'
    WHERE (b.status='publish' AND b.published<=NOW())
    ORDER BY a.views DESC
    LIMIT 5;

We're no longer grouping the views in the query, but we can simply join on the aggregate view, making the query a bit simpler. But far more importantly, this query gives the same results as the first, but evaluates in 0.16 seconds.

The results presented here are just a simple example of the potential performance benefit of using views, and your mileage will definitely vary. But views not only serve to enhance performance and scalability of your applications. They also help to simplify queries, and reduce the need for consecutive queries in your code. All in all, views are a very useful tool for web developers.

Social networking: Tweet this article on Twitter Pass on this article on LinkedIn Bookmark this article on Google Bookmark this article on Yahoo! Bookmark this article on Technorati Bookmark this article on Delicious Share this article on Facebook Digg this article on Digg Submit this article to Reddit Thumb this article up at StumbleUpon Submit this article to Furl

Talkback

respond to this article

Re: Next step MySQL: using views to improve performance (Frank, 2010-08-16 12:02 CEST)
A VIEW is _not_ a stored result of a query, it's just a stored query that gets executed every time you select this view. Use EXPLAIN and see how it works. MySQL is a bit limited in the usage of VIEW's, don't JOIN on VIEW's, it will kill performance.

If you need a stored result, you need a materialized view:
http://en.wikipedia.org/wiki/Materialized_view

This is not supported by MySQL but you can create it with the use of a table and some triggers.
Re: Next step MySQL: using views to improve performance (Bart McLeod, 2012-10-30 15:15 CET)
> A VIEW is _not_ a stored result of a query, it's just a stored
> query that gets executed every time you select this view. Use
> EXPLAIN and see how it works. MySQL is a bit limited in the usage
> of VIEW's, don't JOIN on VIEW's, it will kill performance.
>
> If you need a stored result, you need a materialized view:
> http://en.wikipedia.org/wiki/Materialized_view
>
> This is not supported by MySQL but you can create it with the use
> of a table and some triggers.

I read the same thing and I am surprised that this article is not updated. I thougth I would get an enormous gain from creating a view, but all you get is a simplified query. I tested a very complex query with five joins against 1450 rows. I also had three subqueries with joins in them.

What happened was this (timed in MySQL workbench):
With view:
Duration: 0.031 sec / Fetch Time 0.001 sec
Without view:
Duration: 0.003 sec / Fetch Time 0.023 sec

So without the view, the duration is 10 times faster, but the fetch time is 23 times slower (sometimes 12 times slower)
In any case, either can be neglected, which means the longest of the two is significant, which makes the view about a third slower than without the view.