About Scatter Plots

The Qlik platform is all about analyzing data and making discoveries. However, in order to get valuable insights for your organization, you can’t just go around loading any data source and creating random charts. On the contrary, a good QlikView developer will always strive to use the most appropriate objects for each type of analysis.

Even though classic visualizations such as bar, line or pie charts are essential components of most applications, complex inquiries usually require more sophisticated tools to gain a full understanding of the situation and make the best decisions possible. In this regard, one of my favorite visualizations is the scatter plot (well, scatter plots and histograms, but we’ve already talked about those).

Although not very common, these charts can be real eye-openers when used adequately. Sadly, their usage is still shrouded in mystery for the majority of business users, who, for some strange reason, seem to fear their power. But anyway, back to the story…

This chart stands out due to its ability to elegantly handle large amounts of data. Though its simplest form only combines one dimension and two expressions plotted along the x and y axes, you can enrich it in several ways. Let’s start with an easy example:


Each bubble in this chart represents one of On Nom Nom Nom’s food trucks. Since the y-axis shows the sales amount, the higher the bubble, the “stronger” the food truck. The x-axis, in turn, represents the Margin %, so a bubble far to the right could be categorized as “more intelligent” due to its higher profitability. In this case, the best scenario for the company would be to have most of the bubbles in the upper right corner, meaning that the food trucks sell a lot and also have good margins.
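Behind each bubble sits one value of the dimension (a food truck) and two aggregated expressions. As a rough sketch outside QlikView, the Python below aggregates hypothetical transaction rows into one (x, y) point per truck; the row layout, the sample numbers, and the choice to compute Margin % as total margin divided by total sales are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical transaction rows: (food_truck, sales, margin).
# These are made-up values, not real On Nom Nom Nom data.
transactions = [
    ("Truck A", 1200.0, 300.0),
    ("Truck A", 800.0, 200.0),
    ("Truck B", 500.0, 50.0),
    ("Truck C", 2500.0, 250.0),
]

def scatter_points(rows):
    """Return {truck: (margin_pct, sales)}: one (x, y) bubble per dimension value."""
    sales = defaultdict(float)
    margin = defaultdict(float)
    for truck, amount, profit in rows:
        sales[truck] += amount
        margin[truck] += profit
    # x = Margin % from totals (sum of margin / sum of sales), y = total sales.
    # Guard against trucks with zero sales to avoid dividing by zero.
    return {
        t: ((100.0 * margin[t] / sales[t]) if sales[t] else 0.0, sales[t])
        for t in sales
    }

for truck, (x, y) in sorted(scatter_points(transactions).items()):
    print(f"{truck}: margin {x:.1f}%, sales {y:.0f}")
```

Aggregating the totals first and dividing afterwards avoids averaging per-row percentages, which would give small transactions the same weight as large ones.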

To make this visualization clearer, we can add reference lines and define static or dynamic thresholds with variables and traditional expressions.
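Reference lines effectively split the plot into quadrants. The following Python sketch labels each bubble against one vertical and one horizontal threshold. The “dynamic” thresholds here are assumed to be the plain averages of each axis (in QlikView they might instead come from variables or chart expressions), and the truck names and values are made up.

```python
from statistics import mean

# Hypothetical bubbles: truck -> (margin_pct, sales). Values are illustrative only.
bubbles = {
    "Truck A": (25.0, 2000.0),
    "Truck B": (10.0, 500.0),
    "Truck C": (10.0, 2500.0),
    "Truck D": (30.0, 400.0),
}

def quadrant(x, y, x_ref, y_ref):
    """Label a bubble relative to a vertical line at x_ref and a horizontal line at y_ref."""
    horizontal = "right" if x >= x_ref else "left"
    vertical = "upper" if y >= y_ref else "lower"
    return f"{vertical} {horizontal}"

# Dynamic thresholds: recomputed from the data every time (assumed: simple averages).
x_ref = mean(x for x, _ in bubbles.values())
y_ref = mean(y for _, y in bubbles.values())
# A static threshold would just be a constant, e.g. x_ref = 20.0.

for truck, (x, y) in sorted(bubbles.items()):
    print(truck, quadrant(x, y, x_ref, y_ref))
```

With averages as thresholds, the lines move whenever the selection changes, which is what makes them useful for comparing each truck against its peers rather than against a fixed target.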


It’s not a bar chart!

A professional is only as good as their toolset, and as a QlikView developer, there’s always room for one more trick up your sleeve. Today I will show you one of the most powerful, yet underused, charts for analyzing data: the histogram. Even though it is easy to create, I haven’t seen many developers take advantage of it.

The important lesson here is that histograms are not exactly bar charts. The main difference is that bar charts are used to compare categorical variables, whilst histograms represent distributions. Sound interesting? No? Well, here’s an example.

A couple of days ago I was looking for a data set to try out some functions, and I got my hands on the ENEM results for 2011. The ENEM is a national exam taken by Brazilian high school graduates that evaluates each institution (private and public) in subjects like mathematics, language and natural sciences.

With a traditional bar chart, you can address some questions like: Are there more public or private schools? Which state has the most schools? Which schools are the best ranked in mathematics?
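Each of these questions maps to one bar per category value. A minimal Python sketch of the counts and ranking behind such bar charts, using a handful of hypothetical school records (the names, states and scores are invented, not taken from the ENEM 2011 data set):

```python
from collections import Counter

# Hypothetical school records: (name, type, state, math_score).
schools = [
    ("School 1", "Private", "SP", 640.5),
    ("School 2", "Public", "SP", 480.2),
    ("School 3", "Public", "RJ", 502.0),
    ("School 4", "Private", "RJ", 590.3),
    ("School 5", "Public", "MG", 415.8),
]

# "Are there more public or private schools?" -> one bar per school type.
by_type = Counter(kind for _, kind, _, _ in schools)
# "Which state has the most schools?" -> one bar per state.
by_state = Counter(state for _, _, state, _ in schools)
# "Which schools are the best ranked in mathematics?" -> one bar per school, sorted by score.
top_math = sorted(schools, key=lambda s: s[3], reverse=True)[:3]

print(by_type.most_common())
print(by_state.most_common(1))
print([name for name, *_ in top_math])
```

In every case the x-axis holds discrete categories; nothing here says how the scores themselves are distributed.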


But as your inner data analyst grows more interested, you will start asking more complex questions. When we create a bar chart for the top 10 schools in mathematics, we may realize that there’s a big discrepancy between the best and the worst elements:


Are there some extremely good schools raising the national average? Or alarmingly bad institutions bringing it down? Are they all consistent? Can we separate them into groups (good, normal, bad)? Where’s the majority of the schools? Is there a significant difference between public and private institutions? Or between states?

Let’s see what QlikView can do for us. First, we’re going to change our classic conception of a bar chart by using the X-axis for the grade and the Y-axis for the number of schools that got it. As you can see, the data adopts a shape that gives us a better perspective of the situation:
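Under the hood, a histogram is just a count per value range (bin). Here is a minimal Python sketch, assuming 5-point bins that include their lower bound and exclude their upper bound; that width matches the “500 to 505” bar mentioned in the discussion of this chart, but the boundary convention and the scores are assumptions. In QlikView itself, a calculated dimension built with the `Class()` function is one common way to produce such bins.

```python
from collections import Counter
import math

def histogram(scores, width=5.0):
    """Count how many scores fall in each [start, start + width) bin.

    Returns a dict ordered by bin start: the key is the x-axis (grade range),
    the value is the y-axis (number of schools).
    """
    counts = Counter(math.floor(s / width) * width for s in scores)
    return dict(sorted(counts.items()))

# Hypothetical math scores, one per school.
scores = [498.0, 500.0, 501.7, 504.9, 505.0, 512.3, 612.0]
for start, n in histogram(scores).items():
    print(f"[{start:.0f}, {start + 5:.0f}): {'#' * n}")
```

Empty bins simply do not appear in the result; a chart would show them as gaps on a continuous x-axis.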


Far to the right, we’ve got the great schools (there aren’t many of them, but their scores are pretty high). At the other end are those that might need a little help (grades below 420 points), and in the middle sits the majority of the schools. Notice that the curve is not symmetric: most of the schools score between 440 and 560, while a thin tail of high scorers stretches out to the right. Remember, a taller bar represents a larger number of schools. For example, the red bar (the tallest in the histogram) is made up of 296 schools that scored between 500 and 505 points.
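Reading a histogram like this one numerically usually comes down to two quick summaries: which bin is tallest, and what share of schools falls inside a score range. A Python sketch with hypothetical scores, again assuming half-open 5-point bins and treating the 440 to 560 range as [440, 560):

```python
from collections import Counter
import math

def tallest_bin(scores, width=5.0):
    """Return (bin_start, count) for the bin holding the most scores."""
    counts = Counter(math.floor(s / width) * width for s in scores)
    return max(counts.items(), key=lambda item: item[1])

def share_in_range(scores, low, high):
    """Fraction of scores with low <= score < high."""
    inside = sum(1 for s in scores if low <= s < high)
    return inside / len(scores)

# Hypothetical scores: a cluster around 500 plus a couple of high performers.
scores = [398.0, 452.5, 480.0, 501.0, 503.5, 504.0, 530.0, 557.0, 690.0, 720.0]
print(tallest_bin(scores))               # (500.0, 3)
print(share_in_range(scores, 440, 560))  # 0.7
```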

Is there a difference between public and private high schools? Well, if we separate our histogram using a second dimension we’ll get something like this:
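Adding a second dimension simply means counting per (group, bin) pair instead of per bin. A Python sketch with hypothetical public/private records and the same assumed 5-point bins:

```python
from collections import Counter
import math

def histogram_by_group(records, width=5.0):
    """Count schools per (group, bin_start) pair: one bar segment per group and bin."""
    counts = Counter(
        (group, math.floor(score / width) * width) for group, score in records
    )
    return dict(sorted(counts.items()))

# Hypothetical (school_type, math_score) records.
records = [
    ("Public", 481.0), ("Public", 483.2), ("Public", 502.4),
    ("Private", 503.9), ("Private", 598.0), ("Private", 601.5),
]
for (group, start), n in histogram_by_group(records).items():
    print(f"{group:<8} [{start:.0f}, {start + 5:.0f}): {n}")
```

Because both groups share the same bin edges, their bars line up and can be stacked or placed side by side for comparison.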


Some thoughts that will probably cross your mind are:
