A number of times, when I have been talking with people about their data, I have been told “I don’t want you to do anything complicated – I just want you to do something simple so that I can have a result, or story, that everyone can understand”.
This is often said when people have a large (and often unwieldy) set of data from which they want to extract a clear message to give to the public, the press, policy makers or others in their own organization.
I agree entirely that we want to present results that tell the story of the data clearly and simply. But it is important to separate out the message you present to others from the methodology that is used to create that message.
Your message could be presented using an indicator, graph, table or headline statement – something that is as easy to interpret as possible whilst staying true to the story in the data. But the methods should be whatever is needed to best make sense of the data; be it a simple tabulation of the data, complex statistical modelling, data mining or whatever else is necessary to extract that story.
Without using the most appropriate methods to understand the data our message may be clear… but wrong!
For example, last week I showed that simply describing percentage changes over time in the number of seizures of illegal goods can be a misleading way to learn about how illegal trade is changing. Instead, we have had to use complex statistical methods to learn about trends in the illegal ivory trade. From these complex methods we are able to produce simple, interpretable results: specifically, indicators such as the Global Transactions Index, which allowed us to provide simple messages such as “the illegal ivory trade tripled between 1998 and 2011”.
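To see why a raw percentage change in seizures can mislead, here is a minimal sketch. The numbers and the simple proportional seizure model are invented for illustration (this is not the actual ivory-trade data or the statistical model behind the Global Transactions Index): seizure counts confound the level of trade with enforcement effort, so the raw figure can show a dramatic rise even when the underlying trade is flat.

```python
# Hypothetical illustration (invented numbers, not real seizure data):
# seizure counts mix together the true level of illegal trade and the
# enforcement effort, so a raw percentage change can point the wrong way.

def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old

# Suppose the true illegal trade is flat across two years...
trade = {"year1": 1000, "year2": 1000}
# ...but enforcement effort (chance a shipment is seized) doubles.
seizure_rate = {"year1": 0.05, "year2": 0.10}

seizures = {y: trade[y] * seizure_rate[y] for y in trade}

raw = pct_change(seizures["year1"], seizures["year2"])
print(f"Raw change in seizures: {raw:+.0f}%")   # +100%, yet trade is flat

# Adjusting for enforcement effort recovers the true (flat) trend.
adjusted = {y: seizures[y] / seizure_rate[y] for y in seizures}
adj = pct_change(adjusted["year1"], adjusted["year2"])
print(f"Effort-adjusted change: {adj:+.0f}%")   # 0%
```

In reality, of course, enforcement effort is not observed directly, which is exactly why the complex statistical modelling is needed in the first place.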
We see examples like this all the time: the many different indicators of the socio-economic status of a country, such as the Human Development Index (HDI) of UNDP; estimates of the number of individuals of an animal species – African elephants, for example – since you can’t count them all(!); long-range weather forecasts to help farmers decide on their planting regime for the season. In all of these cases, large amounts of information are combined using the most appropriate methodology. The results are presented to the public in a simple and straightforward manner and quoted to make comparisons between years or countries. Discussion focuses on the outputs; the methods used to produce them are not really discussed at this point.
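As a flavour of how such a composite indicator boils many numbers down to one: since 2010 the HDI has been the geometric mean of three normalised dimension indices (health, education, income). The sketch below uses that geometric-mean idea, but the dimension values are made up and the real UNDP calculation first normalises each dimension against fixed goalposts, which is omitted here.

```python
# Simplified sketch of a composite indicator in the style of the HDI.
# The post-2010 HDI is the geometric mean of three dimension indices,
# each normalised to [0, 1]. The values below are invented; the real
# UNDP calculation also normalises raw data against fixed goalposts.

def composite_index(dimensions):
    """Geometric mean of normalised dimension indices."""
    product = 1.0
    for value in dimensions.values():
        product *= value
    return product ** (1.0 / len(dimensions))

country = {"health": 0.85, "education": 0.70, "income": 0.60}
print(f"Composite index: {composite_index(country):.3f}")
```

The single resulting number is what gets quoted and compared between countries; the choice of geometric mean (which penalises imbalance across dimensions) lives in the methodological reports.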
Of course, this doesn’t mean that the methodology can be a black box, brushed aside, or used as an excuse to confuse and confound people. Instead, the methods are presented separately from the results, sometimes in peer-reviewed journals or in online technical and methodological reports. Ideally there should be a clear, non-technical explanation so that anyone who is interested can understand and question the methodology and, where necessary, improve on it. It is, of course, also important that people understand the caveats and limitations of the results. But within these limitations the message must not get lost in what can often be a technical debate on methods.
To return to my main point, though: there is nothing wrong with employing a complex method if it is the best way to untangle a messy set of data. That way, complexity can lead you through the muddle to a clear and simple message.
If you have an example where complex methods have been used to produce simple results, or where simple methods have been used to produce the wrong answer, I’d be interested to hear about it in the comments below.