Image

QlikFreak Football

What happens when you mix QlikView with the best football leagues in the globe? In today’s post, I’ll share a little app I created to analyze data about my favorite soccer tournaments like the English Premier League, the German Bundesliga or the Serie A from Italy. So let’s take a break of business oriented dashboards and spend some time reviewing the wins, losses and goals of the last few years.

If you’re a football fan, you’ll stay for the discoveries. If you’re a QlikView enthusiast, you’ll stay for the tips regarding scripting and visualization. If you’re not neither of those, you’ll visit our Random page in order to see funny videos like a pug playing ‘Enter Sandman’ from Metallica in the drums.

As usual, you can download all the related files here. Ready? Let’s get started!

23_01

The first challenge in this endeavor was to find a good data source. However, a static XLS file wouldn’t do the trick. Since we’re in the middle of the season and there are matches every week, we need a way to update our file constantly. Even though it’s not the most common way to extract tables, don’t forget that QlikView can retrieve records from external websites (Data from Files > Web Files).

23_02

After browsing for a while, I found a great sports portal called SkySports where we could get all the information we needed. In order to load it to our model (and since I didn’t want to copy and paste the same code dozens of times), I relied on one of our oldest friends: the FOR… NEXT loop. The structure is not very complex, so I’ll let the code speak for itself:

23_03

With all the data on hand, it was time to define our KPIs. Whilst most of them were already suitable for analysis, I had a little problem: I couldn’t compare directly some of the metrics. For instance, the French Ligue 1 has 20 teams, but Eredivisie (Netherlands) has only 18. In this regard, it’s also unfair to compare the points obtained by Borussia Dortmund and Lyon since they have played 27 and 31 games respectively. Therefore, I came up with a simple calculation to ‘standardize’ the scales called QlikFreak Score (you know marketing is not my strong suit, but bear with me). This KPI is the result of taking the points obtained by a team and dividing them by the total points disputed. For instance, if I got 6 points after playing 5 games, my QlikFreak Score would be 40% (6 out of 15 possible points).

That being said, lets now focus on the design! The layout is fairly simple: the filters are located on top and the analytical objects in the bottom. We start our journey with an overview of the 116 teams in our model by classifying them into different categories: the ones that classify to the Champions or Europa League, the ‘normal’ teams, and the ones that are on risk of relegation.

23_01

The table in the bottom displays the most relevant metrics but it uses the Gestalt principle of similarity (yeah, the cell background) to divide them in four groups:

  • About the games: Games played, wins, draws, and losses.
  • About the goals: Goals for, goals against and goal difference.
  • About the points: Points obtained, points disputed and QlikFreak Score.
  • About the rank: Team’s rank in the respective league and status.

Since I’m using abbreviations, it won’t hurt to add a little Help Text (Caption > Help Text) so the user can easily remember their meaning.

23_04

Another interesting thing about this app is that it uses two color references throughout all its tabs. The first one refers to the tournaments. As you can see in the filter pane, each league has an associated color chip. Accordingly, each team in the table is accompanied by a colored cell so that even the less savvy users can relate Arsenal with the Premier League and Real Madrid with La Liga (This tip was brought to you by the man with the hat. Thanks Rob!)

23_05

The other one has to do with the status. It associates the teams that classify to the Champions League with a golden star, the Europa League with a green triangle and the endangered squads (relegation) with a red cross. [I don’t really know how to call the color I used for the ‘normal’ teams]

23_06


The second tab focuses on analyzing the performance of a specific team:

23_07.png

In the left, you can see the team’s crest alongside a gauge that displays its QlikFreak Score. Just below those objects, there’s a trend line with the performance of the last 5 seasons (Expressions > Text on axis) and in the bottom, you’ll find a table that provides detailed figures about each one of them. Cool thing, the selected season is always highlighted in the table:

23_08

By the way, did you notice that the color reference for each league prevails in this tab?

23_09

Since I loaded an INLINE table with the RGB codes, it’s pretty easy to assign dynamic colors to all the charts:

23_10.png

Another interesting feature here is the usage of images. I selected two different techniques for this purpose:

  • Team Crest: Reference to an external file (preferably using Relative Paths)
  • League Logo: Bundled images using a SUB (I explain this one in my book, but I’ll try to write a post about it in the following weeks)

23_11.png

Now, let me take a moment to explain the scatter plot in the right (I really like this visualization). The x axis represents the team’s rank in the selected season. For instance, speaking about the Bundesliga, it goes from 1 to 18 (since there are only 18 teams). On the other hand, the y axis depicts the QlikFreak Score, so a higher bubble can be interpreted as a better performance.

23_12.png

The selected squad will be highlighted so you can easily see where it stands and there are three reference lines that let you know who will attend the international tournaments or who’s in the red zone (relegation). See that color association again? In this example, it’s not hard to see that the Germans will be represented by 2 teams in the Europa League, being Bayer Leverkusen one of them.

However, this visualization can tell you more stories. Let’s take a look at Ligue 1:

23_13.png

Can you see the amazing gap between Paris St. Germain and the rest of the teams? Clearly, they’re playing on a different level! It looks like we won’t need Nostradamus to predict where the ‘Coup de la Ligue’ will go this year.

Or let’s go back to the 2014 / 2015 championship in Spain. The roles we had that year were very clear:

  • The Leaders: Barcelona and Real Madrid
  • The Contenders: Atlético, Valencia and Sevilla
  • The rest of the league

23_14.png

However, some might prefer to see a more balanced tournament like the Premier League, displaying almost perfectly staggered bubbles.

23_15.png

Just a little bump in the end, which explains why Aston Villa’s fans are so sad lately.


The third tab highlights the goal difference per team (scored minus received). It also has secondary (yet quite interesting) metrics like the average goals per match.

23_17.png

With the flexibility provided by the Interactive Sort option, we can easily see how Barcelona and Real Madrid score almost 3 goals per match while Troyes, Udinese or Hannover 96 are struggling to get at least one. On the other end, Gianluigi Buffon (Juventus) and Manuel Neuer (Beyern Munich) have helped their squads by limiting their opponents’ strikers to one goal every two games.

When applying visual cues to a column, I recommend you to neutralize the RGBs as QlikView’s default colors are a bit too intense. The classic green / red backgrounds work fine in this scenario.

23_16.png

The twin bar chart in the left-hand side deserves a little warning. Since its axis in located in the middle of the graph, we can accurately compare the blue and orange bars against themselves. However, the total size of the bar (goals scored and received altogether) is harder for us to read correctly because we can only rely on its area (there’s no shared axis, so our perception of distance is not an option here). Nevertheless, it can tell us curious facts. For instance, by the time I’m writing this post, there are 5 teams that share the same goal difference in Italy (-5). However, if we only relied on this figure, we might disregard the fact that Sampdoria has almost 1.5 times more goals than Bologna!

Before moving forward, I’d like to remind you that there’s a Multibox in the filter pane that lets you select the team’s rank. For instance, you could pick only the best team of each tournament and evaluate their offensive / defensive ends by comparing the average goals per match. Even though Lionel Messi, Luis Suarez and Neymar are the best offensive party (2.9 goals per game), Paris St. Germain, Bayern Munich and Juventus display a better defensive performance.

23_18.png


Our fourth tab is also one of my favorites (you know how much I like Scatter Plots). This time, the axes represent the goals scored and received. I included a diagonal line so you could easily see who has a positive or negative goal difference (above or below the line). [See how the color reference prevails?]

23_19.png

This type of visualization is great for spotting outliers and finding weird behaviors. For instance, remember when I said that Paris St. Germain was playing on a whole different level? Look at this:

23_20.png

The distance between Ibrahimovic’s team and its rivals is ridiculous! Not only it’s the highest bubble by far (great offense), but it’s also the closest to the y axis (they don’t receive much goals).

However, more goals not always translates into better results. Just look at the Spanish championship:

23_21.png

Despite having some of the most effective strikers, Real Madrid (87 goals) ranks third in La Liga. Actually, their sworn enemies –Atlético de Madrid– have earned more points and rank second as of march, 21st. However, the funny part is that Cristiano Ronaldo and company have scored almost twice as goals as Simeone’s pupils. It looks like keeping low scoreboards is not a bad strategy after all!


The following tab shows general data about all the tournaments. It contains the number of teams alongside their status and interesting metrics such as Goals per Game. To be honest, I didn’t think Eredivisie would be on top of this list, but it claimed the first spot in the last 5 seasons.

23_22.png

Finally, I tried to explain the behavior of each league by calculating some figures around the QlikFreak Score like max and min values, average, median and standard deviation. At first, I thought that they might tell fascinating tales, but I soon realized that these metrics are easily distorted by the outliers (I also built a Box Plot, but I ended up deleting it due to its lack of relevance).

For instance, I thought that a smaller standard deviation could be interpreted as a more competitive league, meaning that all the teams have more or less the same level and get similar amounts of points. In the same manner, a large standard deviation would imply a bigger difference between the qualities of the squads.

Nevertheless, one cannot blindly assume this kind of things. Just look at the 2012 / 2013 Bundesliga: taking into account all the teams, its standard deviation is 16%. But if you take Bayern Munich out of the equation, it drops down to 12%. These metrics are quite sensitive and since we only have a handful of observations, I’d rather not use them. But hey, it was a good exercise!


Finally, this app wouldn’t be complete without one of my favorite charts: a histogram. After failing with my standard deviation attempt, I thought that a histogram (in fact, several of them) would help me visualize how these tournaments behave. I started by creating simple chart based on the QlikFreak Score. With this, you can see that the majority of the teams get, let’s say, between 30% and 55%.

23_23

In the left tail, we find the not-so-lucky teams who have obtained less than 20% of the points they have disputed so far like Aston Villa, Cambuur Leeuwarden and Troyes. On the other end, the right tail embodies the most efficient teams like Ajax, Barcelona and Paris St. Germain. Just yesterday, Bayern Munich beat Cologne and became the leader in this area, with an amazing 85% of points obtained.

However, this chart described the 6 tournaments altogether. As I wanted a comparison between them, I used an uncommon approach: a Trellis chart.

23_25.png

To be honest, I’m not the biggest fan of Trellis Charts (or small multiples, as some may call them). I’ve always thought that the idea is really good, but QlikView doesn’t allow you to customize a lot of its features, so I usually end up using other tricks. However, this time it worked pretty well. With a continuous (and shared) x axis in the resulting charts, you can clearly see where the majority of the teams are located. Also, since I added an extra expression, if you hover the cursor over a bar, you can  see which teams reside in that particular bin.

23_24.png


So that’s about it, I hope you liked it as much as I did! To be honest, I spent a lot of time playing with this app and picturing new ways of presenting the data. I guess it can’t be helped, maybe it’s one of the side effects of working with QlikView 😛

By the way, I’d love to see how YOU visualize this data as well! Any Dutch team who would like to talk about the Eredivisie (Barry / Frédérique / Wesley)? I bet everyone will go crazy if Ralf Becher used an extension to explain one or two things about the Bundesliga. Perhaps Steve Dark or Richard Pearce can give us some insights about the Premier League as well. And even though it’s not rugby, Byron Van Wyk can probably upload a couple of charts. So, if you have some spare time, why don’t you create your own version of this app?

If you have any comments, be sure to post them in the section below. Thanks for reading. Until next time!

Advertisements

6 thoughts on “QlikFreak Football

  1. evaristo says:

    Great!!!! I did something like that 10 years ago, only for the Spanish league. Also, I uploaded data to fill up the ‘Quiniela’ every week (weekly bets).
    It’s so funny.
    There’s only one thing about the web connection, where it says table$. The format of the web page changes once in a while. They add some new publicity and the table number changes to, making the script to fail. It happens every 6 months or so.
    Congrats¡¡¡

  2. Chirag Perla says:

    Really interesting Julian! Being a football fan and a Qlikview enthusiast, I loved the way you presented the data, especially the scatter plot with a diagonal reference line for Goal differences. Keep it up!

  3. Bravo!!

    Suggestions for improvement:

    1. Color references can be changed to country flag icon for better visual effect.
    2. Data source fetched from OPTA to enable players analysis. I will love to see it happen and use it for fantasy football.
    3. The last enhancement might be a tall order but I’m sure someone will take up this challenge. Build predictive model by using other external app like R to forecast upcoming fixtures result. R was used to predict World Cup 2014 : https://grollchristian.wordpress.com/2014/06/12/world-cup-2014-prediction.

  4. Martin Blanchet-Pedersen says:

    Hi Julian,

    Very nice app – and thanks for the great inspiration. I have extended the model with player stats from current premier league teams, and perhaps other people will find this useful:

    The stat are divided into 5 different tables of the different player types. To split these i just make a tempory type list to get the correct lables.

    //**************************************************************************************//

    Types:
    Load * Inline [
    Id, _Type
    1, Goalkeeper
    2, Defender
    3, Midfielder
    4, Attacking Midfielder
    5, Striker];

    Map_Types:
    Mapping Load * Resident Types;

    Set Teams = “‘arsenal’, ‘bournemouth’,’burnley’,’chelsea’,’crystal-palace’,’everton’,’leicester-city’,’liverpool’,’manchester-city’,’manchester-united’,’southampton’,’stoke-city’,’swansea-city’,’tottenham-hotspur’,’watford’,’west-bromwich-albion’,’west-ham-united'”;

    Set ErrorMode = 0; //Some teams doesn’t have eg. Attacking midfelder – so i just skip these.

    For each Team in $(Teams)
    Let SPLink = ‘http://www.skysports.com/$(Team)-stats’;

    For Type = 1 to 5
    Let P = Peek(‘_Type’, $(Type)-1, ‘Types’);
    Trace Get $(Team): $(P);
    SkySport_Players:

    LOAD
    Replace(Capitalize(‘$(Team)’), ‘_’, ‘ ‘) as Team,
    applymap(‘Map_Types’, $(Type), ‘n.a’) as Type,
    [$(P)] as Player,
    Pld as PlaysDesc,
    subfield(Pld, ‘ ‘, 1) as Plays,
    Textbetween(Pld, ‘(‘, ‘)’) as PlaysPossible,
    G as Goals,
    YC as YellowCards,
    RC as RedCards,
    FF as Fantasy,
    $(D) as %Date,
    ‘SkySport.com’ as Source,
    ‘$(SPLink)’ as SourceLink
    FROM
    $(SPLink)
    (html, codepage is 1252, embedded labels, table is @$(Type));
    Next
    Next

    Set ErrorMode = 1;

    //**************************************************************************************//

    For anyone looking to include other leagues > use the Team list from the competion table and translate it to links by LinkKey = Replace(Lower(Team), ‘ ‘, ‘-‘)

    Best Regards

    Martin

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s