I updated this post on the 25th May as my code for the split axis was improved following a recommendation from Adam Pearce via Twitter.
For my makeover, I wanted to create a chart that would also allow people to search for a channel and see it in the context of the overall dataset. In trialling different charts, I experimented with linear, log and split scales for the axes.
The data about the channels' video views and subscribers was heavily skewed towards the lower values on both dimensions, which results in a scatter plot that is dense in one quadrant:
The scatter plot also features convex hulls (the colour shaded areas), which group channels based on the ranking provided by SocialBlade.
As the data points were so dense at the lower values, I wanted to see whether other scales might help users locate their data point when it appears at the lower end of the scales.
The chart below shows the same dataset, but both the X axis (subscribers) and Y axis (video views) use a logarithmic ('log') scale. On a log scale, each step along the axis multiplies the value by a constant factor rather than adding a constant amount. The factor is often 10 (an order of magnitude); in the chart below, the scale doubles at each step.
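To make the "constant factor" idea concrete, here is a minimal sketch in plain JavaScript of how a log axis places values. The function name logPosition, the 800px axis and the subscriber numbers are all made up for illustration; this is not the code behind the chart, and it is not how D3 implements its log scale internally.

```javascript
// Hypothetical helper: map a value to a pixel position on a log axis.
// The domain must be strictly positive, because log(0) is undefined.
function logPosition(value, domainMin, domainMax, rangeMin, rangeMax) {
  // Fraction of the way along the axis, measured in log space
  const t = (Math.log(value) - Math.log(domainMin)) /
            (Math.log(domainMax) - Math.log(domainMin));
  return rangeMin + t * (rangeMax - rangeMin);
}

// Made-up example: an 800px axis covering 1,000 to 16,000 subscribers
const positions = [1000, 2000, 4000, 8000, 16000].map(
  (v) => logPosition(v, 1000, 16000, 0, 800)
);

// Each doubling of the value moves the point the same distance
// along the axis (about 200px here), however large the value is.
console.log(positions);
```

This is also why distances on a log axis are hard to read as differences: the same 200px represents 1,000 extra subscribers at one end and 8,000 at the other.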
Using a log scale does allow you to see the data points more clearly at the lower ends of the scale. The drawbacks with a log scale are that:
- It is unfamiliar to some people
- Personally, I find it hard to compare values as the relationship between the distance on the page/screen and the relative value changes as you move along the axes.
- At the higher end, the channels with very high subscriber or video view counts lose their visual impact.
An alternative I tried is to split both the X axis and Y axis scales in half. The first half of each scale covers a small range of the lower values, and the second half covers a large range of the higher values.
My aim was to allow users to see more clearly the datapoints at the lower end of the scales, and:
- not lose the visual scale at the higher end
- keep the scales simple to understand
I think the scale does succeed in meeting my aims. There are drawbacks to this approach:
- It may not be obvious that the scales abruptly change partway along the axis. This could be addressed with clear labelling and gridlines. I'm not sure the gridlines alone are enough.
- It is harder to compare data points which appear in different quadrants of the scatter plot. This may be less of an issue where people want to compare channels with those close to them. Knowing that DanTDM and PopularMMOs are way above them in terms of views and subscribers may be enough.
Code for the split axis
I have updated my code for this based on Adam Pearce's recommendation via Twitter. Thank you! I should have realised D3 could handle this out of the box.
My code for this type of split scale creates a linear scale with three values each for the range and the domain. The middle value of each sets the cut-off point between the two halves, e.g.:
xScale = d3.scaleLinear() .range([0,width/2,width]) .domain([0,xAxisCutOff,maxSubscribers]);
width/2 sets the scale to divide exactly half way along the axis. This could be anywhere along the axis if you wish.
The xAxisCutOff sets which datapoints will be in the first or second half of the scale.
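If it helps to see what the three-value range and domain are doing, here is a minimal sketch in plain JavaScript of the same piecewise mapping. The splitScale function is a hypothetical stand-in written for this illustration, not D3's implementation, and the width, cut-off and maximum values are assumed numbers rather than the ones from my chart.

```javascript
// Hypothetical stand-in for a linear scale with a three-value
// domain and range: two straight-line segments joined at the cut-off.
function splitScale(domain, range) {
  const [d0, d1, d2] = domain;
  const [r0, r1, r2] = range;
  return function (value) {
    if (value <= d1) {
      // First half: small range of values spread over r0..r1
      return r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
    }
    // Second half: large range of values spread over r1..r2
    return r1 + ((value - d1) / (d2 - d1)) * (r2 - r1);
  };
}

// Assumed values, for illustration only
const width = 800;
const xAxisCutOff = 100000;
const maxSubscribers = 10000000;

const xScale = splitScale(
  [0, xAxisCutOff, maxSubscribers],
  [0, width / 2, width]
);

console.log(xScale(50000));    // 200: halfway through the first half
console.log(xScale(100000));   // 400: the cut-off sits mid-axis
console.log(xScale(5050000));  // 600: halfway through the second half
console.log(xScale(10000000)); // 800: the maximum sits at the far end
```

With these numbers, the first 400px cover 100,000 subscribers and the second 400px cover 9,900,000, which is why points in different halves are hard to compare by eye.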
I'm not 100% sold on either of the plots shown in this posting. I would love to hear your approach to similar datasets.