Data Viz 101: Always use accurate, compelling data

Welcome back to Data Viz 101, Beutler Ink’s introductory course on the wonderful world of data visualization.

In our last installment, we looked at the bar chart, one of the most commonly used types of data visualizations. Today, we're going to transition from writing about form to execution and ethics.

Transforming static figures into a professional-looking visualization confers a degree of authority on the data. After all, if somebody has taken the time to create an eye-catching graph or bar chart, then the numbers must be meaningful. But too often, the sort of graphics you may see in presentations or shared on social media are created using cherry-picked figures from sources that may be unreliable or out-of-date.

Below, we identify the components of good data, whether for use in a visualization or in any other sort of brand messaging or strategic communication.


Reliable 

Good data should come from a publisher or organization with a reputation for fact-checking and accuracy. Strong sources include government agencies (e.g., Census Bureau, Energy Information Administration, Department of Labor), major newspapers and media outlets (The New York Times, The Economist, BBC), university and academic publishers (Harvard University Press, Routledge, Palgrave Macmillan), and industry-leading research outlets (Nielsen, comScore, Deloitte).

This should go without saying, but avoid data from any source that seems remotely untrustworthy. That includes any source perceived as partisan enough to raise the eyebrows of an impartial observer.


Primary 

If you come across an interesting statistic or chart, always track down the original data before you even think about creating your own visualization. Note that interesting stats will often circulate around the web long after they stop being timely or relevant. This is especially true for eye-catching numbers about marketing and social media. You need to find the original survey or data (or a write-up about said survey or data from the organization that collected or produced it) to ensure that it is recent and that the methodology seems solid. You also need to ensure that important nuances weren't lost in translation as the figure moved from site to site. Remember: It's your responsibility to represent the data as close to its original context as possible.

Charts included in published articles will usually have a citation at the bottom listing the data source. Again, you need to be able to find and review that original information to make sure it's correct. Then, if and when you reproduce the visualization, your cited source should be the primary source (e.g., Energy Information Administration), not the secondary one (e.g., The New York Times).

Also note that many common claims accepted as conventional wisdom are not always backed up by research. For instance, the claim that "nine out of ten businesses fail" (or "nine out of ten startups fail") is often stated in articles as though it were an established fact, but actual surveys looking into the matter found mixed and contradictory results.


Verifiable

Per the suggestion above, you should always cite the original source of data. But beyond that, you should also ensure that said data is available for others to find and analyze on their own. The easiest way to do this is by clearly labeling the data source. In most cases, a simple "Source (Year)" citation at the bottom of the graphic is sufficient. But when you're dealing with granular data, or citing a government agency that produces reams of data, try to be more specific, e.g., "United States Energy Information Administration, 'Annual Energy Review 2018,' Table 1.2, 'Primary Energy Consumption by Source, 1949-2018.'" You want to make it as easy as possible for readers to track down the information.

Showing the whole picture

If the data you're using isn't publicly available (say, if it's proprietary or otherwise confidential), you should clearly state that in your presentation. In instances where audiences can't check your work, always ensure that your visualization includes all relevant contextual data points. Be wary, for example, of zooming in on a narrow range of dates to depict a trend or spike that might not seem significant when a larger range of dates is shown. (This is a good practice generally, but especially so when you're holding all of the data cards.)
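To make the date-range point concrete, here is a minimal Python sketch. The monthly figures are hypothetical, invented purely for illustration, and the `percent_change` helper is our own name rather than anything from a particular library. It compares the change over a hand-picked three-month window with the change over the full year.

```python
# Hypothetical monthly figures, invented purely for illustration.
monthly_values = {
    "2018-01": 100, "2018-02": 104, "2018-03": 98, "2018-04": 101,
    "2018-05": 97, "2018-06": 103, "2018-07": 99, "2018-08": 102,
    "2018-09": 96, "2018-10": 105, "2018-11": 112, "2018-12": 101,
}


def percent_change(series, start, end):
    """Percent change between two labeled points in the series."""
    first, last = series[start], series[end]
    return (last - first) / first * 100


# A cherry-picked window that starts at a low point and ends at a high point.
narrow = percent_change(monthly_values, "2018-09", "2018-11")

# The same series viewed across the whole year.
full = percent_change(monthly_values, "2018-01", "2018-12")

print(f"Sep-Nov only: {narrow:+.1f}%")  # Sep-Nov only: +16.7%
print(f"Full year:    {full:+.1f}%")    # Full year:    +1.0%
```

With these made-up numbers, the narrow window suggests a dramatic surge, while the full year shows the series ending roughly where it started. Running a comparison like this before settling on a date range is one way to check whether a "trend" survives added context.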


Current

Using out-of-date data is by far the most common and (in our opinion) most negligent error, not only in visualizations but also in blog posts and other types of content marketing. It's so easy to latch onto a data point that seems compelling, especially if you see it on what appears to be a reliable website. But, per our advice above, it's always worth the effort to click through to the original data source and check the date.

Ideally, you want to avoid data that's more than a few years old—especially if you're talking about social media, marketing, or other industries that are constantly in flux. When more recent data isn't available, you should at minimum note that you're using the most recent figures available. 

Keep in mind that government agencies like the Bureau of Labor Statistics frequently update their data (often monthly or annually). 
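If you track your sources in a spreadsheet or script, a quick age check can catch stale figures before they reach a graphic. The sketch below is only one way to do it. The function name `needs_review` is hypothetical, and the three-year default is our assumption about what "a few years" means, so adjust it for your industry.

```python
def needs_review(data_year, current_year, max_age_years=3):
    """Return True when the data is older than the chosen threshold.

    max_age_years=3 is an assumed stand-in for "a few years";
    fast-moving fields such as social media may warrant a lower value.
    """
    return current_year - data_year > max_age_years


# Hypothetical examples: a 2014 survey and a 2018 survey, checked in 2019.
print(needs_review(2014, 2019))  # True
print(needs_review(2018, 2019))  # False
```

A flagged figure isn't automatically unusable. It is a prompt to look for a newer release or, failing that, to note that you're using the most recent figures available.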
