About Scatter Plots

The Qlik platform is all about analyzing data and making discoveries. However, in order to get valuable insights for your organization, you can’t just go around loading any data source and creating random charts. On the contrary, a good QlikView developer will always strive to use the most appropriate objects for each type of analysis.

Even though classic visualizations such as bar, line or pie charts are essential components of most applications, complex inquiries usually require more sophisticated tools to gain a full understanding of the situation and make the best decisions possible. In this regard, one of my favorite visualizations is the scatter plot (well, scatter plots and histograms, but we’ve already talked about those).

Although not very common, these charts can be real eye-openers when used properly. Sadly, their usage is still shrouded in a veil of mystery for the majority of business users who, for some strange reason, seem to fear their power. But anyway, back to the story…

This chart stands out due to its ability to elegantly handle large amounts of data. Though its simplest form only combines one dimension and two expressions plotted along the x and y axes, you can enrich it in several ways. Let’s start with an easy example:


Each bubble in this chart represents one of On Nom Nom Nom’s food trucks. As the y-axis represents the sales amount, the higher the bubble is, the “stronger” the food truck. The x-axis, on the other hand, represents the Margin %. Therefore, a bubble far to the right could be categorized as “more intelligent” due to its higher profitability. In this case, the best scenario for the company would be to have most of the bubbles in the upper right corner, meaning that all the food trucks sell a lot but also have good margins.

To make this visualization clearer, we can add reference lines and define static or dynamic thresholds with variables and traditional expressions:


With this minor tweak, it becomes easier to see which bubbles are meeting the objectives and which ones are located below the average. Marketing guys LOVE this kind of diagram, especially because they can give fancy names to each quadrant like “cash-cows” or “question-marks”.
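To make the quadrant idea concrete, here is a minimal sketch in Python (not QlikView syntax) that uses the averages as dynamic thresholds, much like the reference lines would. The truck names, the figures and the quadrant labels are all made up for illustration.

```python
from statistics import mean

# Hypothetical figures: (food truck, sales amount, margin %)
trucks = [
    ("Truck A", 120_000, 0.32),
    ("Truck B", 45_000, 0.41),
    ("Truck C", 98_000, 0.18),
    ("Truck D", 30_000, 0.15),
]

# Dynamic thresholds: the averages play the role of the reference lines.
avg_sales = mean(sales for _, sales, _ in trucks)
avg_margin = mean(margin for _, _, margin in trucks)


def quadrant(sales, margin):
    """Tell on which side of each reference line a bubble falls."""
    vertical = "upper" if sales >= avg_sales else "lower"      # y-axis: sales
    horizontal = "right" if margin >= avg_margin else "left"   # x-axis: Margin %
    return f"{vertical} {horizontal}"


quadrants = {name: quadrant(sales, margin) for name, sales, margin in trucks}
for name, label in quadrants.items():
    print(f"{name}: {label}")
```

Swapping the averages for fixed targets (say, a sales goal defined by management) would give the static flavor of the same thresholds.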

One of my favorite things about scatter plots is that they offer new perspectives on the information and let you get involved with the data not only by providing answers, but also by helping you formulate new questions. For instance, why is the purple dot further to the right? Even though it’s not one of the top sellers, its Margin % is remarkable. All our food trucks have the same menus and prices, so what could be the reason for this phenomenon? Maybe the people who visit this location prefer the gourmet dishes (which we know have a higher margin). Or maybe they’re doing something different in order to reduce their costs. If so, can we replicate that behavior in other food trucks to improve their margins too? Which one could be a good candidate to try it out? Maybe the one that is geographically closer, or perhaps one that shares a similar sales volume. Hey, we could even start an initiative to move up all the food trucks in the lower right corner by getting new customers or increasing the sales of the existing ones. Or would it be better to rescue the lower left quadrant somehow? Sorry, I got a little carried away… but you get the idea. This is a normal reaction when you use a well-designed QlikView app. And yes, that’s also the start of your data discovery journey.

So far this chart looks interesting, but what if we included another metric? The third expression in a scatter diagram changes the size of the bubbles. For instance, let’s add the number of customers to this equation:


Now, the context has slightly changed and new questions will surely arise. For example, why is the orange bubble in the top section so small? Its position implies that it has big sales, but its size points out that it has few customers. In other words, its average ticket must be higher. In contrast, the greenish bubble in the lower-left corner must have a lower average ticket (not-so-amazing sales but a lot of customers), so it might benefit from introducing special deals like “Buy three burritos and your soda is free!” (If we have the customers, we only need to convince them to buy more stuff). Pretty neat, right?
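The reasoning about the average ticket boils down to dividing sales by customers. Here is a small Python sketch of that arithmetic; the two trucks and their numbers are hypothetical and only mimic the orange and green bubbles described above.

```python
# Hypothetical figures: (food truck, sales amount, number of customers)
trucks = [
    ("Orange", 110_000, 2_000),  # big sales, few customers
    ("Green", 40_000, 8_000),    # modest sales, lots of customers
]


def average_ticket(sales, customers):
    """Average amount spent per customer; None when there are no customers."""
    return sales / customers if customers else None


tickets = {
    name: average_ticket(sales, customers)
    for name, sales, customers in trucks
}
for name, ticket in tickets.items():
    print(f"{name}: {ticket:.2f} per customer")
```

With these made-up numbers, the orange truck earns far more per customer than the green one, which is exactly what the combination of position and bubble size suggests at a glance.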

But wait, there’s more! Let’s say we implement all these new strategies in On Nom Nom Nom Food Trucks Inc. Surely, after a couple of weeks, the top management will want to measure their effectiveness. But how can we accomplish this? Well, by adding a second dimension, it’s possible to animate the chart in order to perceive the changes in size and position over the last few weeks.

Though I could explain to you how an animated scatter plot works, I think it’s better to let the real master do it:

To be honest, I don’t like animated charts. Usually they’re more of a distraction than a visual aid. However, I have to admit that this is one of the few examples where a data-driven business user can really benefit from this functionality.

So that’s 2 dimensions and 3 expressions in a single chart! Furthermore, depending on the nature of the data, this object can help you:

  • Easily visualize clusters (a.k.a. Hey, it looks like these elements somehow belong together!)


  • Spot outliers (a.k.a. What the hell are you doing all the way over there?)


  • See correlations (a.k.a. Interesting… when x goes up, y goes up as well!)
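If you want to back those visual impressions with numbers, the last two ideas can be sketched in plain Python (again, not QlikView syntax). The data is invented, and flagging anything beyond two standard deviations as an outlier is just a common rule of thumb that I am assuming here, not a fixed standard.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the spread is zero).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


def outliers(values, threshold=2.0):
    """Positions of values farther than `threshold` std deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]


# Hypothetical data: customers vs. sales per food truck
customers = [500, 800, 1200, 1500, 2100]
sales = [11_000, 16_500, 25_000, 30_500, 43_000]
r = pearson(customers, sales)  # close to 1: when x goes up, y goes up as well

# Hypothetical Margin % values; the last truck sits far from the rest
margins = [0.20, 0.22, 0.21, 0.19, 0.23, 0.20, 0.21, 0.45]
odd_ones = outliers(margins)

print(f"correlation: {r:.3f}")
print(f"outlier positions: {odd_ones}")
```

A coefficient near 1 or -1 backs up what the eye sees as a diagonal cloud of bubbles, while the flagged positions are the "all the way over there" elements worth a closer look.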


As you can see, good ol’ scatter plots can help you better understand your business in several ways. In my opinion, this visualization has a lot more to give to QlikView users worldwide, so why not try it in your next app?


By the way, if you want to learn more about data visualization, don’t forget to read my latest (and only) book, Creating Stunning Dashboards with QlikView. Our friends from PACKT Publishing are in a good mood and have given us a 40% discount on this title until December 20th, so don’t miss out!



3 thoughts on “About Scatter Plots”

  1. LYC says:

    Nice topic !
    Do you have a particular view or recommendation on the usage of colors in scatter charts? (smarties/rainbow colors vs limited number of different colors?)

    • I usually try to give color a meaning in all my visualizations. For instance, in this scatter plot, you could base your palette on a categorical value like the type of food truck (Red=Burritos, Yellow=Pizza, Blue=Kebab). With this, it’s easy to see if a certain category tends to have better margins or higher sales. In other situations (and knowing that it’s one of the least reliable ways to present data), I use color encoding to display a metric. For example, a gradient between red and blue using the colormix1() function that evaluates customer satisfaction.

      If you are dealing with a manageable number of bubbles (let’s say, fewer than 20), I think it’s OK to use a rainbow palette so the users can easily identify each element. Of course, I’d also recommend using the Persistent Colors option so that the bubbles keep the same colors regardless of the selections.

      On the other hand, if I’m working with dozens of bubbles, I prefer to use a single, transparent color so the user can perceive the density of elements in certain zones. If you use solid colors and there are a lot of bubbles in the same area it’s impossible to interpret the data accurately.

      Now that you mention it, managing colors in this kind of chart is a great topic for a new post. I think I’ll start working on it right away! 🙂

  2. John says:

    Just getting to grips with QlikView and would love to see more on the use of colormix for scatter charts.
    The default colour palettes are nice to look at but not helpful for analysis.
