- What we do
- Who we work with
- Who we are
One of the coolest and most powerful data visualisations available in Microsoft Power BI is the Scatter Chart. In this walkthrough, I’m going to show you how you use this chart type to bring together a very complex set of data in a beautifully clear and intuitive visual. Along the way, I’ll also illustrate a few useful tricks for data modelling in Power BI. I’m going to do all of this in the Power BI Desktop app, but it could be done in the web interface just as easily.
I have to give some credit for the ideas in this post to the excellent BBC Four documentary “The Joy of Stats”, with Swedish statistician Hans Rosling. That documentary was where I first saw the type of data visualisation; it was very engagingly demonstrated with a similar, but much longer set of data. You can watch the segment which inspired me here: 200 countries, 200 years, 4 minutes. It’s amazing to me that a data visualisation which took a team of BBC special effects specialists to produce in 2009 can now be created by almost anyone in a matter of minutes!
Here’s what we’re going to create:
For this walkthrough, I’m going to use several datasets from the World Bank Open Data project relating to population, life expectancy, and per capita GDP. Each of these is available in CSV format, broken down by country and year since 1960.
Having downloaded and extracted the datasets from their Zip files, the first thing to do is pull them into Power BI. Clicking “Get Data” gives you many import options, one of which is the option to import from a CSV or text file. I’ll illustrate the process with the Population dataset – all of them share the same layout, so the procedure is the same.
On importing the dataset, the first thing you’ll find is that it’s not in a particularly useful format. There are several rows before we get to the actual data, and it’s laid out in a table that makes perfect sense in a spreadsheet, but isn’t very helpful for analysis in Power BI.
Fortunately, fixing all these issues is really easy. Clicking “Edit” in the import dialog lets you modify the dataset you’re importing in the Query Editor:
The first thing to do is to get rid of those useless header rows. Click “Reduce Rows” and choose “Remove Rows”, then “Remove Top Rows”, and enter “4” when asked how many rows to remove.
Next, use the first row (of the remaining rows) to provide our column headers – simply click “Use First Row as Headers”:
That gives us a much more useful data table, but there’s still one problem. Each year appears as a separate column in this dataset, and what we need for the analysis is columns for the year, and for the data value for that year, for each country. To do that, you can hold Ctrl and click on each of the first four column headers to select the columns. Right-click on them and choose “Unpivot Other Columns” from the context menu:
This gives us a table with the columns Country Name, Country Code, Indicator Name, Indicator Code, Attribute, and Value, with the years in the Attribute column, and the values for each year in the Value column:
Now, all you need to do is rename the ‘Attribute’ column to ‘Year’, and the ‘Value’ column to ‘Population’, and you have exactly the dataset you want:
We also need to ensure that the data value column (Population) is set to the correct data type – here it’s been incorrectly detected as having mixed valued. Click on the datatype indicator in the column header and change it to “Decimal Number”.
At this point, you can click “Close & Apply” in the ribbon to import our Population data, and then repeat the process to import our other two datasets (Life Expectancy and GDP Per Capita). You might also want to rename the imported tables, since the original filenames aren’t particularly useful.
You should now have three data tables available to us in the Power BI designer. We could easily chart each one on a line chart, showing (for example) Life Expectancy vs Year by country. To do this, click the ‘line chart’ button under ‘Visualizations’, then drag ‘Year’, ‘Country Name’ and ‘Life Expectancy’ from Fields into ‘Axis’, ‘Legend’ and ‘Values’ respectively:
However, if you try to use this type of chart to see how the three variables relate to each other over time, you’ll find it very difficult to get any feel for the data. Instead, a scatter plot lets us show all three variables on a single chart, and then animate it over time to see how they evolve.
To relate all three datasets together, you need to define some relationships between them. Go to the Modelling ribbon tab, choose “Manage Relationships”, and try to set up some new relationships – for example, to relate each of the “Year” columns in our tables to each other. Unfortunately, if you try to do this with our data as-is, you’ll quickly encounter an error that one of the columns needs to supply unique values:
What we need to do is create what is known in Data Modelling terms as a dimension table, which provides all of the unique values which our dimension (Year, in this case) can take. Fortunately, we can do this quite easily using Data Analysis Expressions (DAX).
On the Modelling ribbon tab, click “New Table”. This creates a new calculated table in our data model, and places the cursor in the Formula bar where you can enter a DAX expression to define it. We want to take the distinct Year values from one of the existing datasets. To do this, enter this expression:
This gives you a Years table, with one Year column. You can now use the Manage Relationships window to relate the Year column in each of our data tables to the new Year dimension:
You can accept all the default options here, and continue to create relationships for the other data tables.
Let’s define a similar dimension table and set of relationships for our countries. This time, I’ll use a slightly more complicated DAX expression to create a table which contains both the country name and the country code. This is a useful technique if you have data where the labels (country names in this case) are not necessarily unique, but have unique codes or IDs associated with them. Fortunately, Power BI assists you with creating the query by autocompleting fields and functions:
This expression uses the SELECTCOLUMNS function to return a table with named columns.
You can now go ahead and create relationships between the Country Code column in our Countries dimension table and the corresponding columns in the data tables:
Now, you should have all of the relationships we need between our tables:
It’s time to start creating the visualisation!
What we’re going to do is to add a Scatter Chart visual to the report page, stretch it out to cover the full page, and set the appropriate fields. Use the Country Name (from the Countries dimension table) for the Legend, put GDP Per Capita on the X-axis, Life Expectancy on the Y-axis, and Population for the marker size.
Finally, the real magic happens when you set Year (from the Years dimension table) for the Play axis – this creates a slider for Years across the bottom of our chart, with a “Play” button. If you hit play now, you can watch our chart evolve over time!
However, a small problem quickly becomes apparent – most of the data points cluster very close to the left-hand side of the X-axis. The obvious issue here is that some countries (like Lichtenstein) are such outliers on per-capita GDP that they compress the entire axis. The solution to this is to use a logarithmic scale – by selecting the formatting options for the visual, you can change the type of the X axis.
Now, it’s easy to see the evolution of life expectancy, GDP, and population from 1960 to 2014, in a very clear and visually arresting way:
As a final bonus, if you click on any of the bubbles, you can see a trend line and the complete history of its development:
We’re running a special Power BI Breakfast Briefing in Edinburgh on 19th April. We’ll introduce Power BI, show how easy it is to use, and demonstrate some of the powerful business intelligence and reporting features you can use today. Interested? Sign up for this free event now.
Future thinking Cloud Microsoft Collective intelligence Intranets Business intelligence Office 365 Office 20th Anniversary Technical Digital Transformation CRM Gold Partner WPC Business Transformation Privacy Partners Change Management Director's Briefing Public Sector Websites