Science, evidence and data in government
Speech by Sir Mark Walport at the 2013 Royal Statistical Society annual conference.
Thank you very much indeed John [Pullinger]. It’s a great pleasure to be here. Indeed, it’s a pleasure also to give a named lecture, because the opportunity to give a named lecture means that you need to find out something about the individual behind the name. In this case Sir Harry Campion.
And in delivering this lecture I shall use the typical approach of the historian for the first part and of the scientist for the second, in that I shall read from a script for the first part and for the second will speak more freely to slides (or, to be precise, to PowerPoint).
History of statistics
Before I come to Campion himself I want to make a few remarks from a historical perspective about statistics. I have always liked collecting and classifying things, and statistics and statistical analysis provide an enormously powerful toolbox for such activities.
As with many activities there is evidence that, even if they didn’t get there first, the Romans were amongst the pioneers of collecting statistics for administrative purposes: they conducted quinquennial censuses of both people and their property. The etymology of the word statistics is especially relevant to a scientific adviser to government. Anders Hald traced it to 16th-century Italy and the assembly of data of interest to a statesman (statista).
But of course modern statistics is much more than the collation of facts and figures - it is about analysis and reasoning about data, it is about sampling and inference from incomplete data sets. It is about the analysis of probability.
Patrick Geddes, a man with a big vision, took an optimistic view of the role of statistics when he wrote a paper in 1881 entitled ‘The classification of statistics and its results’, published in the ‘Proceedings of the Royal Society of Edinburgh’:
All social phenomena of every kind may be investigated by comparisons of the different causes from which they arise, under different conditions, and in countries presenting wide spheres of observation and opposing influences at work. Knowledge will thus be increased, laws of social life eliminated, true scientific enquiries promoted, the work of government simplified, and the progress and prosperity of nations fixed upon sure bases of observation and reason, instead of dangerous experiments or doubtful theories.
We can all sign up to that. Geddes was a very unusual polymathic combination of zoologist, statistician, social scientist and city planner, and, as part of the great Victorian enterprise of taxonomy, he applied a taxonomist’s mind to the organisation of statistical knowledge.
I think it is interesting that in biology, taxonomy has once again become rather fashionable in the context of genome data, and indeed the availability of huge data sets in other areas gives renewed opportunities to think about how data can be used to provide taxonomic clarity.
Sir Harry Campion
Harry Campion cut his statistical teeth in the textile industry and in the collection and analysis of economic statistics. He moved from an academic position in Manchester to government to lead the Central Economic Statistical Service, and the Second World War provided the need and the impetus for an integrated statistical service for the War Cabinet. Harry Campion was the man who led it: the Central Statistical Office. War and other types of crisis have, sadly, often provided an important impetus for collaboration, innovation and tackling really difficult questions in science, engineering and technology.
After the war Campion was recognised internationally, and he led the formation of the United Nations Statistical Office and was its first Director. I was taken by one of his obituaries, which noted that he was taciturn, a man of few words, but that one of his favourite words was ‘thing’. I agree this is a useful word, particularly for deciding on the nature of an organisation: is it a thing, an entity in its own right? But I am not sure it is a word I would have guessed to be a statistician’s favourite. He left half of his estate to the Royal Statistical Society, and I am honoured to be able to commemorate him by delivering this lecture, named in his memory.
Government in the UK has been interested in statistics for a long time, and a number of eminent politicians have addressed this society over the years. Harold Macmillan delivered a speech in 1959 to commemorate the 125th anniversary of the Society. He said something that is certainly as important today as it has ever been:
To have any hope of carrying out their policies, the government of the day must have knowledge of the facts as they are, together with such information as will help it to deduce future trends.
He went on to say:
I have myself ventured to urge the need to get these statistics quickly as well as accurately - so that we can keep pace with changes in economic conditions - so that, as I put it, we do not always have to be looking up our trains in last year’s ‘Bradshaw’.
Here he was quoting his own budget speech of 1956. He went on to say:
Of course, statisticians are faced with difficult decisions in determining how much detail is really necessary for each exercise. But it has always been my belief that statistics must be available quickly if they are to be of real use in guiding those concerned with policy questions. Hence the importance of ‘sampling’ and the ‘spot check’: a sample is an instrument of policy, when the whole record may merely be a piece of economic history.
These remarks chime well with a reference by Patrick Geddes, in his 1881 paper, to a famous aphorism by the 18th-century German historian August Ludwig von Schlözer:
Statistics is history in repose, history is statistics in movement.
If this was famous then, I am not sure that it is any more, because searching for it on Google turns up only the paper by Geddes. But I would emphasise the point, from my own experience since I started to work in the Civil Service, that to be of use, advice has to be both of high quality and timely.
In 1972 another British Prime Minister and another Harold, Harold Wilson, gave a speech to the Royal Statistical Society subtitled ‘Bradshaw revisited’ (I suspect there is little of inferential value in the fact that both were named Harold). Wilson had been a professional statistician: during the war he was the first head of the Manpower Statistics Branch of the Ministry of Labour, and he noted that he:
…temporarily presided over the largest computation then known in Britain, with 15 million punched cards at Wembley, slow but sure.
This comment diverts me briefly to another statistician, M G Kendall, who read a paper before this society in 1942 in which he noted:
We must, I think, look for some inventive endeavours by statisticians themselves, if there is to be any further saving in labour. In this connection I may mention the remarkable machine invented by Mr. Mallock for solving linear equations. Mallock’s machine abandons the cog-wheel entirely in favour of electrical circuits… There will never be any large commercial demand for such machines, but perhaps as computing becomes more centralized we may expect to see some of these specialist engines come into existence.
This is a much earlier version of the possibly apocryphal comment, attributed to a head of IBM, that the world would need only a handful of computers.
But back to Wilson, who in what was his presidential address to the Society made another very important point when he quoted John Jewkes, who was Professor of Economic Organisation at Oxford, and who wrote a book appealingly titled ‘Ordeal by planning’, which is another matter altogether:
What I learnt in those days, in the immortal words of John Jewkes - and it is true in every area of our national life - was this: you can always get someone to find the answers to the questions, what you need in government (it is equally true in industry, and the social services) is the man who knows the questions to the answers.
I will come back to this in the next part of my talk. It is something that really came home to me during my time at the Wellcome Trust: the common denominator of the best scientists is that they ask important questions, and this remains the case in my present job as the Government Chief Scientific Adviser.
And just before I move to the present and to the future, I commend to you the speech that Klaus Moser, the successor to Harry Campion and a predecessor of Jil Matheson, made in 1980, in which he highlighted a problem that still bedevils us today. I was struck, whilst fossicking around in the past, that many policy issues that were tough or intractable then remain so. They include problems in the education of statisticians and in the sharing of administrative datasets, a problem that Richard Thomas and I tried to address in the Data Sharing Review that we conducted for a previous government.
Progress in social statistics has been inhibited by the failure to make data linkages between departments. For policy purposes, one may wish to link, for example, income with education, social security with work absenteeism and so forth. If these are all statistical data, no problems arise. But if it is a question of linking administrative data between two departments, or administrative data in one with statistical data in another, present confidentiality constraints will prevent linkage.
So far, political sensitivities have stood in the way of progress, and even feasibility studies to explore methods and safeguards have been ruled out. This is a pity. Our statistical system could be made more efficient, and the public saved from too many questionnaires and forms, if data linkages were permitted; and enough work has been done abroad to show how easily the confidentiality and anonymity of individual information can be safeguarded.
In this area, the opportunities for data linkages have increased massively with the use of modern IT and this delivers both opportunities and challenges. Sir Alan Langlands led the Administrative Data Taskforce last year to look at this question yet again and I commend his report to you.
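Moser's point that confidentiality can be safeguarded while linking can be made concrete. One widely used technique, shown here purely as an illustration and not as what Moser or the Administrative Data Taskforce proposed, is keyed pseudonymisation: a trusted party replaces the shared identifier in each department's records with a keyed hash (HMAC-SHA256 below), so records can be joined without analysts ever handling the identifier itself. The key, the identifiers and the field values are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key, held only by a trusted linkage service.
LINKAGE_KEY = b"example-secret-key"


def pseudonymise(identifier: str, key: bytes = LINKAGE_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    # Normalise so trivial formatting differences do not break the link.
    normalised = identifier.strip().upper().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def link(dept_a: dict, dept_b: dict) -> dict:
    """Join two {identifier: value} tables using pseudonyms only."""
    pseud_a = {pseudonymise(k): v for k, v in dept_a.items()}
    pseud_b = {pseudonymise(k): v for k, v in dept_b.items()}
    # The linked table is keyed by pseudonym; original identifiers are dropped.
    return {p: (pseud_a[p], pseud_b[p]) for p in pseud_a.keys() & pseud_b.keys()}


# Hypothetical records: income from one department, education from another.
income = {"AB123456C": 21000, "CD654321E": 34000}
education = {"ab123456c ": "degree", "EF111111G": "A-level"}
linked = link(income, education)
print(len(linked))  # 1
```

Keyed hashing on its own is not full anonymisation: the linked attributes may still identify people, so in practice it would sit alongside access controls and checks on what is released.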
So moving to the here and now - and abandoning my text - let me turn to the present and the future.
Government Office for Science
I’ll start by just saying a little bit about my role in the provision of scientific advice to government.
The job of the Government Chief Scientific Adviser is to advise the government on all aspects of science, engineering, technology and social science as they apply to government policy. This is how the Government Office for Science is thinking about the role at the moment. But you could look at it another way, which is to ask: ‘what are the major concerns of government?’
Government has 2 over-arching concerns. The first of these is for the health, the wellbeing, and the resilience of its citizens. The second - and it is not un-related - is the economy. And that drives a significant part of my agenda, because on the economy it is about bringing together science, engineering, technology and social science with academia and government to ensure that our strong intellectual and research roots are turned to economic advantage.
The area around health, wellbeing and resilience I tend to think of in terms of infrastructure. It is infrastructure on which modern societies depend. If you look around this building there is all sorts of infrastructure. In some ways the most crucial infrastructure is the energy infrastructure: if the lights go out then you get a very different sort of talk! So I tend to think of infrastructure as being our engineered, our built infrastructure, and then the natural infrastructure of the planet: the environment, climate and our relationship with other species. Science, engineering and technology are crucial to how we maintain the resilience of both our built and our natural infrastructure.
A third part of that resilience and wellbeing is about providing the best scientific advice in emergencies.
So those are the 3 underpinning scientific parts of the advice. But then there is a general issue, which is that we will only have the best policy if we collect and use the best evidence. It is about working across government to make sure that policies are underpinned by the best evidence.
Finally there is a role for the Government Office for Science, and for the GCSA, in providing advocacy and leadership for science.
That is the job, and I am very ably supported by the splendid Government Office for Science.
There are several themes which are emerging as cross-cutting issues. We are bringing together the different activities of the Government Office for Science, including Foresight, to look at issues such as:
- ageing, cities and demography
- energy, climate change and the natural world
- the economy, the City, and the world in trade
- manufacturing, materials science and innovation
- data, infrastructure, identity
- principles of government decision-making: risk, resilience, contingency
I am only as good as the network of advice that I have. One of the things that has happened in the last few years is that there is now a network of Chief Scientists in every department of government. They bring a whole range of different domain skills: David MacKay is a distinguished physicist and engineer, Robin Grimes is a nuclear physicist, and Bernard Silverman is a statistician. Each Chief Scientist not only advises their own department but also uses their domain skills to provide advice across the whole of government.
At the centre of the network is the analytical community. This is the community that is broadly employed by government, some of them in Whitehall, but some of them in laboratories around the country. There are roughly 12,000 scientists and engineers employed across government, including the scientists at GCHQ, at Porton Down, in Public Health England etc. Among the analytical and scientific professions in government we have statisticians, operational researchers, economists, social researchers and scientists and engineers. Then there are the scientific advisory committees, academia and business. I can only do my job effectively if I am working with that whole network, in partnership, to provide the best advice.
From data to knowledge to society
There is a view that we must open up all data sources. But it is very important to recognise that there is a distinction between data, which is the substrate for information, and information, which is the substrate for knowledge.
Of course knowledge is only really valuable when it is applied. One can look at the application of that sort of knowledge in terms of societal benefit. I look at it in terms of 3 big areas:
- policy and service delivery
- data for resilience (Met Office data, Ordnance Survey data etc)
- data knowledge and its application for economic growth
This slide shows an example of how data can be turned into knowledge and then into practical application. Here you have a genome sequence, which requires an awful lot of interpretation: it is the graphic representation of the sequence from a cancer, a malignant melanoma. The benefit of that knowledge is that, by identifying a sequence difference in a protein called BRAF in malignant melanomas, it was possible to turn that into drug discovery. This shows the use of a drug called vemurafenib in 2 patients with malignant melanoma. You don’t need to be a professional radiologist to appreciate that these are the tumour deposits in yellow and red. This is the patient before treatment and after treatment.
So this is an example of how you take a sequence (the data) and turn it into information about the tumour, into knowledge about tumours, and then into practical application. That is ultimately what data analysis is all about.
Now the role of statisticians is in all stages of this process. I’ve already talked about the importance of defining the question. In medicine, statistics is extremely important in working out how to design the study. If you are doing a clinical trial, there is no point in doing it if it is not powered to give you an answer to the question you are trying to ask. It is about:
- data collection
- the analysis of that data
- the visualisation of the data
- the inferences from those data and how you turn them into information
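The point about powering a trial can be illustrated with the textbook normal-approximation formula for comparing two proportions. This is a minimal sketch, assuming a two-sided test, equal-sized arms and hypothetical event rates; real trial design would use more careful methods and software.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Patients needed per arm to detect a difference between two proportions
    (two-sided test, normal approximation, equal allocation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for significance level
    z_beta = z.inv_cdf(power)  # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical trial: the event rate falls from 50% to 40%.
print(sample_size_per_arm(0.50, 0.40))  # 385
```

A trial much smaller than this would have a poor chance of detecting a real 10-point difference, which is exactly the wasted effort the power calculation is meant to prevent.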
One of the challenges is that work is not completed until it is communicated. If the statistical world is going to have the maximum impact then it is going to be through the effectiveness of your communication.
Old ways of collecting data
Of course great advances in science come when new techniques and new ways of approaching questions become available. New ways of collecting data allow us to ask new questions. That is what has happened on a grand scale as a result of the revolution in IT. However, one shouldn’t neglect old ways of collecting data.
This [the Rothamsted Broadbalk winter wheat experiment] was a fascinating experiment. One of the privileges of my job is the opportunity to visit interesting scientific and engineering establishments around the country. At a recent visit to Rothamsted I saw an experiment which has been continuously in progress since the 1840s.
On the right you can see a strip of land which has been treated with no herbicides or fertilizers. The wheat yield is lower than the equivalent plot which has been treated with herbicides and the plot which has been treated with herbicides and fertilizers.
The yield of untreated land in the 1840s was about 1 tonne of wheat per hectare and it is still 1 tonne of wheat per hectare on precisely the same piece of land. But, with the introduction of new crop varieties, new fertilizers, herbicides and pesticides, the yield of wheat has gone up to 8.5 tonnes per hectare.
This is a beautiful, very easy to understand, experiment, which has been going on for a very long time. So, with the excitement of the new, we mustn’t throw out the old.
New ways of collecting data
But there are now amazing new ways of collecting data - and they can make phone calls too. We all carry in our pockets geo-positioning devices and accelerometers – the most amazing tools. They are now being used around the world to collect important scientific information.
Ash Tag is an example of an app which is being used to collect data on, and understand, how Chalara fraxinea, the fungus affecting ash trees, is spreading around the UK.
Another rather attractive example is an app called Street Bump, which the City of Boston is using in the United States. It takes advantage of the accelerometers and GPS in smartphones. You put your phone on the dashboard of your car, and when the car goes over a bump the app sends a signal to the Boston Mayor’s Office. What this does is to crowd-source the information: if lots of cars send back a bump signal from a particular place, that is a pretty good sign that there is a pothole in the road. So here is a mechanism for the crowd-sourced detection of potholes.
But this does raise interesting privacy questions. By telling the City of Boston where a pothole is, you are also telling it that your particular phone went over that pothole at a particular time. Now of course the City is not interested in that information, but it is a very good example of the sorts of things that we need to consider.
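The crowd-sourcing logic can be sketched as: snap each bump report to a small grid cell, then flag cells reported by several different phones. This is not Street Bump's actual algorithm; the cell size (0.0005 degrees, a few tens of metres), the threshold of 3 phones and the coordinates are all hypothetical.

```python
from collections import defaultdict


def snap_to_cell(lat, lon, cell_deg=0.0005):
    """Map a GPS fix to an integer grid cell (hypothetical cell size)."""
    return (round(lat / cell_deg), round(lon / cell_deg))


def likely_potholes(reports, min_phones=3, cell_deg=0.0005):
    """Flag cells where at least `min_phones` different phones reported a bump."""
    phones_by_cell = defaultdict(set)
    for phone_id, lat, lon in reports:
        phones_by_cell[snap_to_cell(lat, lon, cell_deg)].add(phone_id)
    # Counting distinct phones stops one car's repeated bumps from dominating.
    return {cell for cell, phones in phones_by_cell.items() if len(phones) >= min_phones}


reports = [
    ("phone-a", 42.36010, -71.05890),
    ("phone-b", 42.36012, -71.05888),
    ("phone-c", 42.36008, -71.05891),
    ("phone-a", 42.35500, -71.06000),  # the same phone at the same spot twice
    ("phone-a", 42.35501, -71.06001),
    ("phone-d", 42.37000, -71.05000),  # a single isolated report
]
print(likely_potholes(reports))  # one cell: the spot reported by three phones
```

Note that counting distinct phones requires each report to carry a phone identifier along with a location and time.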
Here is another example taken from a talk that Stuart Peach from the MOD gave the other day. Mapping and geo-positional data underpins a lot of the important data sets that people use today. We use it through the devices that tell us where the nearest restaurant is etc.
This is an example from Helmand Province, where people wanted to detect the illicit movement of narcotics. It is important in a war situation to provide real-time advice for soldiers to help them cross dangerous terrain: where IEDs might be found, where they can crawl through culverts, where they can duck into ditches and so on. You can now pin all sorts of layers on to map data:
- demographic information
- tribal distribution
- population density
- population movements
- topographical information
- terrain elevation data
- hydrographic data
- meteorological data
- aeronautical data
By using all of this you can now get a common operating picture for the military of a sort that was not conceivable before.
This is the extraordinary power of very large data sets combined together, particularly when you include mapping information with it.
Analysing big data
In health, one of the important uses of data is that it holds the system to account and helps us to deliver better health care.
This is a good example from Scotland. The Scottish population is about 5 million and there are about a quarter of a million people with diabetes, of whom only a small fraction have type 1 (insulin-dependent) diabetes. These patients are now registered on a single database used in 38 hospitals, where the data is captured nightly. This enables accountability and the delivery of better services.
Looking at 2003, I find it slightly astonishing that fewer than 50% of patients with diabetes were having their blood pressure, their cholesterol or their haemoglobin A1c (HbA1c) measured; HbA1c is a measure of the chronic exposure of haemoglobin to high blood sugar levels. But by using data you can see how each of these measures has been driven up to completion rates of 90% or above, and that has turned into health benefit. When you now look at the need to give laser therapy to people with eye problems from their diabetes, you can see that it is going down. The number of limb amputations due to vascular disease has also gone down by about 40%.
Here are examples of how raw data is turned into accountability, is turned into knowledge, is turned into health improvement. This is the fundamental argument for the use, and sharing, of health data in medicine. It holds the system to account and it delivers better health care.
Now, I’ve already intimated that the important part of all of this is how you turn the data into information. The visualisation of the data is overwhelmingly important. I would argue that you are not doing your jobs properly unless you enable the visualisation of your data in a manner that the relevant audiences can understand.
Here is a slide looking at carbon emissions. This shows countries scaled, not according to their geographical area, but by their total carbon emissions. If you look at the UK on this map you can see it is quite large compared to other countries - we amount to about 2% of global carbon emissions.
But if you look historically then you see a very different picture. Because we kicked off the industrial revolution our total contribution to the carbon dioxide that is in the atmosphere is very substantial indeed. It puts us about fifth in the league in terms of our historical contribution to carbon emissions. I think that provides a much greater imperative for there to be a bit of leadership from the UK in responding to the challenges of climate change.
But the real point of this slide is not to make that point, but to show that how you visualise data is critically important.
This next slide will be extremely familiar to this audience. This is Gapminder, created and led by Hans Rosling. You can plot this over time and there is an enormous amount of information here.
The challenges throughout all of this are how we:
- communicate and avoid the misuse of statistics
- deal with privacy issues
- manage the skills gap
There is a need for us all to be extremely clear in our communications. The times we need to communicate most clearly are the situations where it is hardest to do so. A lot of my work is about communicating to government at times of uncertainty, when we don’t necessarily know many of the answers to the questions, especially:
- the science or statistics are complicated
- there are uncertainties
- there is media interest
- during emergencies
It is about being clear, honest, and speaking in very direct language - but trying to get across the nuance and the uncertainty as well.
Here is an example. These are both figures of the possible paths of hurricanes, and they are both perfectly valid ways of expressing the data. The first is looking at it in terms of a ‘cone of uncertainty’. Here is what meteorologists think is the most likely path of the hurricane, and the confidence limits of that. So it could be anywhere within this ‘cone of uncertainty’. The problem that they found with this type of presentation was that people living away from the central line looked at it and thought they didn’t need to do anything. They tended to look and say; ‘That is where the line is. I don’t need to worry out here’.
What is now being tried – and I don’t know if it is more effective or not – is to present hurricane data in terms of possible paths. You have now got a series of possible paths, which don’t look too dissimilar to the limits of the ‘cone of uncertainty’. The question is whether this will be more effective at getting people to take notice.
Risk vs hazard
A problem that I am constantly coming across is a failure to distinguish between hazard and risk. We live in a world surrounded by hazards. Our kitchens are full of hazards, such as knives and bleach. The point is that risk = hazard × exposure, and that is very difficult to get across. We are moving towards a world where things are regulated according to whether they are hazardous or not. The important thing is to minimise exposure to hazards appropriately, and that is how we need to regulate.
The communication of radiation risk is a good example of where people think about the hazard without understanding the importance of different exposures. This slide shows one way of trying to get it across. Up here we have levels of exposure in microsieverts (µSv):
- 0.1 microsieverts is what you get from eating a banana
- 1.0 microsieverts is what you get from using a cathode ray tube monitor for a year
- 3.5 microsieverts is the extra dose from 1 day in an average town near Fukushima
- 10 microsieverts is the background dose received by an average person on an average day
- 40 microsieverts is a flight from New York to Los Angeles
- 100 microsieverts is a chest x-ray
- 400 microsieverts is the yearly dose per person from food
- 1,000 microsieverts is the yearly limit on radiation exposure for a member of the public
This starts to put radiation hazards in relation to exposure. You can see what the real risk associated with living near Fukushima actually was.
Here is an example from David Spiegelhalter, who has tried to communicate transport risks in terms of micromorts (a one-in-a-million chance of death) per 100 miles travelled. You could look at this and say that cycling and walking are about equally dangerous, because there is a 4 in a million chance of dying per 100 miles walked compared with 5 in a million per 100 miles cycled. But of course the difference is that cyclists tend to cycle a lot further than people walk, so, looked at in terms of overall risk, cycling is a bit more dangerous than walking.
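One simple way to read the radiation slide is to express every dose as a multiple of the 10 µSv an average person receives on an average day. The figures below are the ones quoted on the slide; the "days of background" framing is an illustrative choice, not the slide's own.

```python
# Doses quoted on the slide, in microsieverts (µSv).
DOSES_USV = {
    "eating a banana": 0.1,
    "a year using a cathode ray tube monitor": 1.0,
    "one extra day in a town near Fukushima": 3.5,
    "a flight from New York to Los Angeles": 40,
    "a chest x-ray": 100,
    "a year's dose from food": 400,
    "yearly limit for a member of the public": 1000,
}
DAILY_BACKGROUND_USV = 10  # average person, average day (from the slide)


def days_of_background(dose_usv):
    """Express a dose as the number of days of normal background radiation."""
    return dose_usv / DAILY_BACKGROUND_USV


for source, dose in DOSES_USV.items():
    print(f"{source}: {dose} µSv = {days_of_background(dose):g} days of background")
```

On this scale, a day in a town near Fukushima added about a third of an ordinary day's background dose, and a single transcontinental flight added about four days' worth.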
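The walking and cycling comparison is risk = hazard × exposure in miniature: the per-distance rate is the hazard and the distance covered is the exposure. The rates below are the ones quoted in the talk; the journey lengths are hypothetical, chosen only to reflect that people typically cycle further than they walk.

```python
# Micromorts per 100 miles, as quoted in the talk.
MICROMORTS_PER_100_MILES = {"walking": 4, "cycling": 5}


def trip_micromorts(mode, miles):
    """Risk = hazard x exposure: per-distance rate times distance travelled."""
    return MICROMORTS_PER_100_MILES[mode] * miles / 100


# Hypothetical journeys: a 2-mile walk and an 8-mile cycle ride.
walk = trip_micromorts("walking", 2)
cycle = trip_micromorts("cycling", 8)
print(walk, cycle)  # 0.08 0.4
```

The per-mile rates differ by only a quarter, but because the cycle ride covers four times the distance, its overall risk is five times that of the walk.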
The theme of my message is that communication matters hugely. We have got to work constantly on how to do it in the best possible way.
Understand your audience
One of the challenges is that we talk about public communication as though there is a single public, when there are many different publics. One of the things Nick Pidgeon, from Cardiff, has shown is that public values play a key part in how people respond, and we need to understand those public values. The predictable fright factors are:
- feelings of a lack of control in the face of monopolies
- challenges of building trust
- the impartiality and credibility of a commentator
And we have to respect the feelings of those that we are working with. It is about public engagement: it is two-way communication.
This slide shows work that he has done on different public values around energy. There is a general consensus that we:
- should be efficient and not wasteful
- need to take care of the environment and nature
- need security and stability
- worry about autonomy and freedom, choice and control
- worry about social justice, fairness, honesty and transparency
- worry about the trajectory of the future, interconnected improvement and quality
Only if we start understanding those values are we going to understand how people think about energy.
- 73% of respondents in a survey said that we should reduce our energy use and use of fossil fuels; people recognise the unsustainable nature of fossil fuels and their environmental harm
- 83% of respondents to the survey were fairly or very concerned that in the next 10 or 20 years electricity and gas would become unaffordable for them
If we are going to present and debate energy policy, it has to be in the context of these public values.
Misuse of statistics
This slide shows how not to visualise data. This is a figure from an Australian health authority, showing how they were recruiting new nurses. The problem here is that they are not showing the baseline properly: the axis does not start at zero. The figure at the start is 43,000 and the end figure is 47,000, so the bottom of the bar chart would actually appear somewhere down here. We have to be honest in our communication. Not showing the true origin of the axis is an extremely common crime.
We have an increasing number of safeguards in the UK against the misuse of statistics. Sir Andrew Dilnot’s job, as Chair of the UK Statistics Authority, is to tell people off for the egregious misuse of government statistics.
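How much a truncated axis exaggerates can be worked out directly from the bar heights. The 43,000 and 47,000 figures are from the slide; the 42,000 axis minimum is a hypothetical value chosen for illustration, since the slide's actual baseline is not given here.

```python
def apparent_vs_actual_change(start, end, axis_min=0):
    """Return (ratio of drawn bar heights, true ratio of the values)."""
    drawn = (end - axis_min) / (start - axis_min)  # what the eye compares
    actual = end / start  # what the numbers say
    return drawn, actual


# Slide figures; the truncated axis starting at 42,000 is hypothetical.
drawn, actual = apparent_vs_actual_change(43_000, 47_000, axis_min=42_000)
print(f"bars suggest {drawn:.1f}x, reality is {actual:.2f}x")
# bars suggest 5.0x, reality is 1.09x
```

With the axis starting at zero the two ratios coincide, which is why an honest bar chart starts from the origin.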
Increasingly we have a community who use social media to comment much more actively about the abuse of statistics. People like Ben Goldacre do a very good job.
I think we have to promote a culture of open access so that people can get at the raw data. There is work being done to improve the accessibility of the ONS website, for example. You have already heard about the letter that the Council for Science and Technology sent to the Prime Minister on algorithms. We have got to work on the skills pipeline, and on improving statistical skills, not only among professionals but among the public.
The privacy point is worth reiterating. The privacy argument and debate does matter. We have to balance the right to privacy and the necessity to hold and share data. We need an effective framework to:
- protect individuals
- build and maintain confidence
- facilitate research
I think that you and the Royal Statistical Society need to contribute to that debate.
My final message is that if we are going to work effectively we need to work together. I need to work with you, with the learned academies, with research councils, with universities, and with industry.
My overarching question when I have meetings with people is: ‘how can we help each other?’
Thank you very much indeed for your attention.