How do I verify worded forecasts?

Ian Jolliffe, University of Aberdeen

The simple answer to this question is ‘with difficulty and with caution’. A sample of extracts from worded forecasts is ‘a shower or two but mainly fine’, ‘best of sunshine in southeast’, ‘partly cloudy’, ‘le temps restera très orageux sur le sud Auvergne’. These are taken from Australia, the UK, the USA and France on one day in July 2003. What is common to all is that the forecast as it stands is not quantitative or even categorical, and different users of the forecast can, and most likely will, interpret it differently. Hence there is inevitable subjectivity in deciding whether or not the forecast was a good one. Such forecasts are often aimed at a general audience – users with specific needs will usually be given specific quantifiable information relevant to those needs.

Many worded forecasts can be made more definite. For example, ‘windy in the northwest later’ could mean ‘at some time in a 12-hour period, the mean wind speed at station A in the northwest will exceed a specified threshold’; ‘frost is likely’ may mean ‘the probability of frost exceeds 0.7’, and so on. The forecaster may, or may not, have such a definition in mind, but to issue the more technical version would be unattractive for a general audience. In circumstances where a technical definition underlies a worded forecast, the worded forecast can be verified by going back to the technical definition. Depending on the nature of that definition (binary, continuous, probabilistic, …) an appropriate verification strategy can be chosen.
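As an illustration of the binary case, the following minimal Python sketch turns ‘windy in the northwest later’ into a yes/no event and tabulates forecasts against observations. The threshold value, the wind speeds and the function names (`windy_event`, `contingency_table`) are hypothetical and chosen only for the example; they are not taken from any operational definition.

```python
from typing import Dict, Sequence


def windy_event(mean_speeds: Sequence[float], threshold: float) -> bool:
    """Return True if the mean wind speed exceeds the threshold
    at any time in the period (the assumed technical definition)."""
    return any(speed > threshold for speed in mean_speeds)


def contingency_table(forecasts: Sequence[bool],
                      observations: Sequence[bool]) -> Dict[str, int]:
    """Count the four outcomes of a binary forecast/observation comparison."""
    table = {"hits": 0, "false_alarms": 0, "misses": 0, "correct_negatives": 0}
    for forecast, observed in zip(forecasts, observations):
        if forecast and observed:
            table["hits"] += 1
        elif forecast and not observed:
            table["false_alarms"] += 1
        elif not forecast and observed:
            table["misses"] += 1
        else:
            table["correct_negatives"] += 1
    return table


# Hypothetical data: one entry per forecast day.
THRESHOLD = 13.0  # assumed threshold for 'windy' (m/s), illustrative only
forecast_windy = [True, False, True, True]  # was 'windy' in the worded forecast?
observed_speeds = [                          # mean speeds at station A in the period
    [8.0, 12.5, 15.1],
    [5.0, 6.2, 7.9],
    [9.0, 10.0, 11.0],
    [14.0, 16.0, 13.0],
]

observed_windy = [windy_event(speeds, THRESHOLD) for speeds in observed_speeds]
table = contingency_table(forecast_windy, observed_windy)
print(table)
```

Any standard score for binary forecasts could then be computed from the four counts. A probabilistic definition such as ‘the probability of frost exceeds 0.7’ would call for a different strategy, for example checking how often frost occurred on the days when ‘frost is likely’ was issued.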

If no underlying technical definition is available, verification is inevitably subjective. Consider again ‘windy in the northwest later’, and a number of possible outcomes. It may be windy everywhere, not only in the northwest, or it may be windy later in the forecast period, but even windier earlier. It is quite possible to treat these outcomes as corresponding to ‘good’ forecasts, ‘poor’ forecasts, or somewhere in between.

A typical worded forecast is several sentences or paragraphs in length. To get a measure of the overall skill of the forecast it is necessary to divide it into simple phrases and assess each. To get a full picture it may be desirable to assess the implicit, as well as the explicit, parts of the forecast. For example, to tackle the problem described in the previous paragraph, the phrase ‘not windy later in areas other than the northwest’, although not explicit in the forecast, is implicit and should be assessed.

To decrease the subjectivity involved in assessing a forecast phrase, an approach that dates back to Wright and Flood (1973) is to create an ‘anti-forecast’ that is notionally the opposite of the forecast phrase. Comparing the forecast with a baseline (the anti-forecast) that should be much worse (have negative skill) gives a more objective way of assessing the forecast.

Another approach is to use as a baseline a forecast/observation pair for which there should be zero skill. For example, the current weather could be compared with the forecast made for today, and with one or more forecasts made for completely different days, say one year ago. The proportion of times that the relevant forecasts appear to be better than the irrelevant ones gives a measure of the forecasts’ skill. Jolliffe and Jolliffe (1997) considered two variants of this approach. Without a technical definition it is still impossible to get away from subjectivity, and the person assessing the forecasts would need to be blind to which of the forecasts was the relevant one, if potential bias is to be avoided. The same study showed that two independent, but ‘non-blind’, assessors came to substantially different conclusions for the same set of forecasts, even though both thought that their views were free from bias.
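The proportion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assessor is assumed to give each forecast a numerical rating of how well it matched the observed weather (higher is better), ties are counted as half a win, and the ratings and the function name `proportion_better` are hypothetical. It is not the specific procedure of Jolliffe and Jolliffe (1997).

```python
from typing import Sequence


def proportion_better(relevant_scores: Sequence[float],
                      irrelevant_scores: Sequence[Sequence[float]]) -> float:
    """Proportion of pairwise comparisons in which the relevant forecast
    is rated better than an irrelevant one (ties count as half, an
    assumption made here so that zero skill corresponds to about 0.5)."""
    wins = 0.0
    comparisons = 0
    for relevant, irrelevants in zip(relevant_scores, irrelevant_scores):
        for irrelevant in irrelevants:
            comparisons += 1
            if relevant > irrelevant:
                wins += 1.0
            elif relevant == irrelevant:
                wins += 0.5
    if comparisons == 0:
        raise ValueError("no comparisons to evaluate")
    return wins / comparisons


# Hypothetical ratings for four days: the forecast actually made for each day,
# and two forecasts made for completely different days.
relevant = [4, 3, 5, 2]
irrelevant = [[2, 3], [3, 1], [4, 4], [3, 2]]

skill_proportion = proportion_better(relevant, irrelevant)
print(skill_proportion)
```

A value close to 0.5 would suggest that the relevant forecasts are no better than irrelevant ones. The calculation does nothing to remove assessor bias: the ratings themselves would still need to be produced by someone who does not know which forecast is the relevant one.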

To re-iterate, if underlying technical definitions are unavailable and subjectivity therefore remains, worded forecasts are difficult to verify without the potential for bias, and claims for the skill of worded forecasts therefore need to be treated with caution.


Jolliffe I.T. and Jolliffe N.M.N., 1997: Assessment of descriptive weather forecasts. Weather, 52, 391-396.

Wright P.B. and Flood C.R., 1973: A method of assessing long-range weather forecasts. Weather, 28, 178-187.


July 2003