
The RTVS Statistical Tool

The RTVS Statistical Tool is an interface that allows a user to query a wide variety of statistics and summary information using arbitrary date ranges and stratification options (e.g., regions, forecast issuance times).

The following diagram provides an overview of the RTVS convective Statistical Tool:
Convective stats tool

  1. Product - The product for which you wish to see verification results.
  2. Beginning Date - The beginning date of the verification period.
  3. Ending Date - The ending date of the verification period.
  4. Temporal Window - A time window about the forecast valid time over which observations are collected.
  5. Output - The type of output you would like to produce (e.g., a time series).
  6. Group results - How to aggregate the statistics.
  7. Forecast Length - The forecast length (or projection time) of interest.
  8. X-Axis - The statistic for the x-axis. Applies to certain displays such as scatterplots.
  9. Y-Axis - The y-axis statistic. The user will change this for most displays.
  10. Grid - The horizontal grid that is used for verification.
  11. Motion - Certain forecasts have motion attributes for the forecast area. Whether or not to move them in time is chosen here.
  12. Region - The forecast region of interest.
  13. Output Data - The button to press to generate output.
  14. Add Data - Allows a user to add additional datasets for comparison purposes. Note: not all comparisons are meaningful!
  15. Clear Form - This resets the interface to its default state.
  16. Observation - The observation dataset used for verification.
  17. Issuance Time - The time at which the forecast is produced and disseminated to users.

How to use the RTVS Statistical Tool

The basic usage of the RTVS Statistical Tool is quite simple:

  1. Choose the product and date range that you want verification results for
  2. Choose an output type (scatterplot, summary table, etc.)
  3. Choose the relevant statistic(s)
  4. Press the Output Data button

In many situations that is all the action that is required to start generating results.

There may be occasions when it is meaningful to intercompare one forecast product with another and this is when the Add Data button becomes important. The figure below highlights what happens to the interface after the user has pressed the Add Data button and selected a different forecast product.
Adding a dataset
All selections are copied (where possible) to initialize a new dataset interface that looks much like the initial interface, with two important differences. First, as labeled 1 in the figure, the output type and X-axis are determined by the first interface. If you start to produce a time series and realize that you want a scatterplot instead, change the output type on the first interface. Second, a Remove Data button appears on the right side of each dataset interface (labeled 2 on the right side of the figure), allowing the user to quickly remove individual datasets from the output.

Tool options

The following information provides more details about choices for the various selections that affect output from the tool.

Dates
Beginning Date

The Beginning Date will default to either the previous date chosen by the user or to the earliest date for which data are available.

Ending Date

The Ending Date will default to either the previous date chosen by the user or to the latest date for which data are available.

In some situations you may notice that the date that you had selected is changed when you choose a different product. This behavior occurs if the date you had selected is not a valid date for the new product chosen. The interface chooses the closest available date in the new product to the date that was previously selected for the old product.
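The closest-date rule can be sketched as a nearest-neighbor lookup in the sorted list of dates available for the new product. This is a minimal illustration, not the tool's code: the function name `snap_to_available` is hypothetical, and the tie-breaking rule (prefer the earlier date when two dates are equally close) is an assumption.

```python
from bisect import bisect_left
from datetime import date


def snap_to_available(selected: date, available: list) -> date:
    """Return the date in `available` closest to `selected`.

    `available` must be sorted ascending and non-empty. Ties go to the
    earlier date (an assumption; the tool's tie rule is not documented).
    """
    i = bisect_left(available, selected)  # first index with date >= selected
    if i == 0:
        return available[0]
    if i == len(available):
        return available[-1]
    before, after = available[i - 1], available[i]
    # Compare distances (timedelta objects) to the neighbors on each side.
    if (selected - before) <= (after - selected):
        return before
    return after


# Example: dates available for a hypothetical new product.
available = [date(2007, 3, 1), date(2007, 3, 10), date(2007, 4, 10)]
print(snap_to_available(date(2007, 3, 4), available))
```

Here 4 March is 3 days from 1 March and 6 days from 10 March, so the selection snaps to 1 March.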

Times

The Time option refers to the time the forecast was issued (or produced). All times are UTC.

The forecast length is the lead time of the forecast (it is sometimes also referred to as the projection time). For example, the CCFP has forecast lengths of 2, 4, and 6 h: the 2-h forecast is valid 2 hours after the issuance time, the 4-h forecast 4 hours after, and the 6-h forecast 6 hours after.

The time window specifies a period around the forecast valid time over which observations are collected for verification. Any forecast/observation pairs that are collected in that time window are used in the statistical analysis.
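A minimal sketch of how the valid time and the time window combine to select observations. It assumes a symmetric, inclusive window of plus or minus `half_window_minutes` around the valid time; the tool's exact window definition is not given here, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def valid_time(issue: datetime, lead_hours: float) -> datetime:
    """Valid time = issuance time + forecast length (all times UTC)."""
    return issue + timedelta(hours=lead_hours)


def obs_in_window(obs_times, issue, lead_hours, half_window_minutes):
    """Return observation times within +/- half_window_minutes of the valid time.

    A symmetric, inclusive window is an assumption made for illustration.
    """
    vt = valid_time(issue, lead_hours)
    half = timedelta(minutes=half_window_minutes)
    return [t for t in obs_times if vt - half <= t <= vt + half]


# Example: a 4-h forecast issued at 13 UTC is valid at 17 UTC.
issue = datetime(2007, 3, 10, 13, tzinfo=timezone.utc)
# Hypothetical observations every 15 minutes from 16:00 to 18:00 UTC.
obs = [datetime(2007, 3, 10, 16, tzinfo=timezone.utc) + timedelta(minutes=15 * k)
       for k in range(9)]
selected = obs_in_window(obs, issue, lead_hours=4, half_window_minutes=30)
```

With a 30-minute half-window, the observations at 16:30, 16:45, 17:00, 17:15, and 17:30 UTC are paired with the forecast.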

Region

Statistics are generated for several specific regions. Each of the regions has been defined to follow AWC requirements. The regions are shown in the figure below. The National region is the entire domain represented by the combination of the East, Central, and West Regions. The Northeast Corridor is shown in light blue and contains portions of the East and Central regions.


Grid

Grid refers to the spatial resolution of the grid that is used for verification. This is not necessarily the native resolution of the forecast or observation.

Motion

This option is only available for the C-SIGMET product. Options for the Motion parameter are either Included or Not Included. This refers to whether the motion of the convective forecasts is included or not in the evaluation of the forecast. When motion is included, each forecast shape is moved as specified by the forecast speed and direction from initial location to its position at the valid time. The motion is applied for the time period between the issuance time and the forecast length.
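Including motion amounts to translating each forecast shape by speed multiplied by elapsed time along its direction of motion. The sketch below is illustrative only and assumes vertices on a flat x/y grid in km (x east, y north), speed in km/h, and a heading given as the compass bearing the shape moves toward. C-SIGMET motion is reported in knots, so real use would need unit and direction-convention conversions. `move_shape` is a hypothetical name.

```python
import math


def move_shape(vertices, speed_kmh, heading_deg, lead_hours):
    """Translate polygon vertices along a heading for the forecast length.

    vertices: list of (x, y) in km, x east and y north (assumed projection).
    heading_deg: bearing the shape moves toward, 0 = north, 90 = east (assumed).
    lead_hours: time between issuance and valid time.
    """
    dist = speed_kmh * lead_hours          # total displacement in km
    theta = math.radians(heading_deg)
    dx = dist * math.sin(theta)            # eastward component
    dy = dist * math.cos(theta)            # northward component
    return [(x + dx, y + dy) for x, y in vertices]


# Example: a shape moving east at 40 km/h for a 2-h forecast shifts 80 km east.
moved = move_shape([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)],
                   speed_kmh=40.0, heading_deg=90.0, lead_hours=2.0)
```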

Group Results

Users can choose the time period used to aggregate the forecast/observation pairs. Choices include daily, weekly, monthly, quarterly, or yearly. For example, if weekly is chosen, the forecast/observation pairs are summed over each 7-day period starting from the beginning date. If data are missing within a 7-day period, only the available data within that period are used to compute the weekly statistics. The same approach is taken for the larger aggregates (e.g., monthly).
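The weekly grouping can be sketched as binning each day's contingency counts by its offset from the beginning date. This assumes daily counts are already available as (YY, YN, NY, NN) tuples and handles only fixed-length periods (daily, weekly); calendar-based months, quarters, and years would need a different bin key. `group_counts` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date, timedelta


def group_counts(daily, begin, period_days=7):
    """Sum daily contingency counts into consecutive periods starting at `begin`.

    daily maps date -> (YY, YN, NY, NN). Missing days are simply absent,
    so each period uses only the data that are available.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for day, counts in daily.items():
        if day < begin:
            continue  # outside the requested date range
        k = (day - begin).days // period_days          # index of the period
        start = begin + timedelta(days=k * period_days)
        for i, c in enumerate(counts):
            totals[start][i] += c
    return {start: tuple(v) for start, v in sorted(totals.items())}


# Example: 2-4 March are missing; the first week uses only 1 and 5 March.
daily = {
    date(2007, 3, 1): (1, 2, 3, 4),
    date(2007, 3, 5): (1, 1, 1, 1),
    date(2007, 3, 9): (2, 0, 0, 5),
}
weekly = group_counts(daily, begin=date(2007, 3, 1))
```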

Output Options
Output Type

The following types of output may be generated through the RTVS Statistical Tool:

Time Series

Time series plot for weekly Heidke Skill Score for the CCFP forecast, verified with NCWD for the period 1 March 2005 - 10 April 2007.

Scatterplot

Scatterplot of PODy vs Bias for Convective SIGMETs (C-SIGMETs) using NCWD from 1 May - 31 August 2005.


Box Plot

The box portion of a box plot encloses the region between the 0.25 and 0.75 quantiles (the 25th and 75th percentiles), and the line inside the box marks the median, the value with 50% of the values above it and 50% below it. The whiskers extend from the box to the minimum and maximum values.
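The five values drawn for each box can be computed as below. Quantile definitions vary between packages; this sketch assumes linear interpolation between order statistics (`method="inclusive"` in Python's `statistics.quantiles`), which may not match the tool's exact choice. `box_summary` is a hypothetical name.

```python
import statistics


def box_summary(values):
    """Return (minimum, 0.25 quantile, median, 0.75 quantile, maximum)."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# Example: weekly scores for one forecast product (made-up values).
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

For the nine values 1 through 9, the box spans 3 to 7 with the median at 5, and the whiskers reach 1 and 9.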

Box plot of the 2-h CCFP Final and 3-h CCFP Preliminary forecasts, both verified with the NCWF Detection Field (NCWD) for the period 10 March - 10 April 2007.

Forecast Length

The forecast length plot allows users to plot a statistic as a function of forecast length. An example of the forecast length plot is shown below and illustrates Percent Area for the 2-, 4-, and 6-h forecast lengths of the CCFP.

Forecast length plot of Percent Area for CCFP from 1 March - 10 April 2007.

Summary Table

The summary table allows users to view tabular statistical results for a large number of summary statistics (described below) for the selected date range and period.

Example summary table of weekly statistics for NCWF 1-h forecasts verified with NCWD for the period 1 - 30 March 2007.

Statistics

The forecast/observation pairs used to create the statistics are summarized in the table below. The rows in the table represent the forecasts, the columns in the table represent the observations, and the elements in the cells represent the counts of forecast/observation pairs. Each grid box represents a location for which a forecast and observation pair is defined. If, for example, a given grid box had no forecast of hazardous weather, but had an observation of hazardous weather, a Forecast No/Observed Yes (NY) event would be added to the appropriate cell in the contingency table.

Table 5.1 Contingency table for evaluation of dichotomous (Yes/No) forecasts. Elements in the cells are the counts of forecast-observation pairs.

                    Observation
Forecast     Yes          No           Total
Yes          YY           YN           YY+YN
No           NY           NN           NY+NN
Total        YY+NY        YN+NN        YY+YN+NY+NN
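Filling the contingency table amounts to classifying every grid box into one of the four cells. A minimal sketch, assuming the forecast and observation fields are same-shaped 2-D grids of Yes/No (True/False) values; `contingency_counts` is a hypothetical name.

```python
def contingency_counts(forecast, observed):
    """Count forecast/observation pairs over matching grid boxes.

    Returns (YY, YN, NY, NN): the first letter is the forecast and the
    second the observation, following the table above.
    """
    if len(forecast) != len(observed):
        raise ValueError("grids must have the same shape")
    yy = yn = ny = nn = 0
    for f_row, o_row in zip(forecast, observed):
        if len(f_row) != len(o_row):
            raise ValueError("grids must have the same shape")
        for f, o in zip(f_row, o_row):
            if f and o:
                yy += 1   # hit
            elif f:
                yn += 1   # false alarm
            elif o:
                ny += 1   # miss
            else:
                nn += 1   # correct No
    return yy, yn, ny, nn


# Example on a tiny 2 x 3 grid.
fcst = [[True, True, False], [False, False, False]]
obsv = [[True, False, True], [False, False, False]]
counts = contingency_counts(fcst, obsv)
```

The top-left box is a hit, the next a false alarm (YN), the third a miss (NY), and the remaining three boxes are correct No forecasts.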

Bias

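Bias is the ratio of the number of Yes forecasts to the number of Yes observations. It is a measure of over- or under-forecasting. An unbiased forecast has a bias value of 1. Note that an unbiased forecast can still be very inaccurate, because bias compares only how often Yes is forecast and observed, not whether the forecast and observed events coincide.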

Bias = (YY + YN) / (YY + NY)

CSI

The CSI, or Critical Success Index, is the number of hits (YY) divided by the number of events that were forecast, observed, or both. It is also known as the Threat Score.

CSI = YY / (YY + NY + YN)

FAR

The FAR (False Alarm Ratio) is the proportion of Yes forecasts that were incorrect.

FAR = YN /(YY + YN)

Gilbert

The Gilbert Skill Score is the Critical Success Index (CSI) corrected for the number of hits expected by chance.

Gilbert = (YY - C2) / (YY - C2 + YN + NY)

where C2 = (YY + YN)*(YY + NY) / N and N = YY + YN + NY + NN

Heidke

The Heidke Skill Score (Heidke) is the percent correct (Forecast Yes, Obs Yes or Forecast No, Obs No) adjusted by the number expected to be correct by chance.

Heidke = (YY + NN - C1) / (N - C1)

where N = YY + NY + YN + NN
C1 = [ (YY + YN)*(YY + NY) + (NY + NN)*(YN + NN) ] / N

Percent Area

The Percent Area is the percentage of the forecast domain area where convection is expected to occur. It is the percent of the total area that had a Yes forecast. This measure does not depend on the observations.

% Area = (Forecast Area / Total Area) * 100

PODn

The PODn is defined as the proportion of No events that were correctly forecast.

PODn = NN / (NN + YN)

PODy

The PODy (Probability of Detection) is the proportion of Yes events that were correctly forecast. This is commonly referred to as the POD.

PODy = YY / (YY + NY)

TSS

The True Skill Statistic (TSS; Doswell et al. 1990) is a measure of the ability of the forecast to discriminate between "Yes" and "No" observations. It is also known as the Hanssen-Kuipers discrimination statistic (Wilks 1995).

TSS = PODy + PODn - 1
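All of the statistics above follow directly from the four contingency counts. The sketch below transcribes the formulas as written in this section; it assumes equal-area grid boxes (so Percent Area can be taken as the share of boxes with a Yes forecast) and does not guard against zero denominators, which occur when, for example, no Yes events are observed. `scores` is a hypothetical name.

```python
def scores(yy, yn, ny, nn):
    """Compute the summary statistics defined above from contingency counts."""
    n = yy + yn + ny + nn
    c1 = ((yy + yn) * (yy + ny) + (ny + nn) * (yn + nn)) / n  # correct by chance (Heidke)
    c2 = (yy + yn) * (yy + ny) / n                           # hits by chance (Gilbert)
    pody = yy / (yy + ny)
    podn = nn / (nn + yn)
    return {
        "Bias": (yy + yn) / (yy + ny),
        "CSI": yy / (yy + ny + yn),
        "FAR": yn / (yy + yn),
        "Gilbert": (yy - c2) / (yy - c2 + yn + ny),
        "Heidke": (yy + nn - c1) / (n - c1),
        "PercentArea": 100 * (yy + yn) / n,  # assumes equal-area grid boxes
        "PODn": podn,
        "PODy": pody,
        "TSS": pody + podn - 1,
    }


s = scores(yy=20, yn=30, ny=10, nn=940)
```

For YY = 20, YN = 30, NY = 10, NN = 940, this gives Bias ≈ 1.67 (over-forecasting), CSI ≈ 0.33, FAR = 0.6, Heidke ≈ 0.48, PODy ≈ 0.67, and TSS ≈ 0.64.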

Where to get more information

Doswell, C.A., R. Davies-Jones, and D.L. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Wea. Forecasting, 5, 576-585.

Hudson, H.R. and F.P. Foss, 2002: The Collaborative Convective Forecast Product from the Aviation Weather Center's Perspective. Preprints, 10th Conference on Aviation, Range, and Aerospace Meteorology, Portland, OR, Amer. Meteor. Soc., 73-76.

National Weather Service, 1991: National Weather Service Operations Manual, D-22. National Weather Service. (Available at this web site: www.nws.noaa.gov)

Weather Applications Workgroup, 2003: Statement of User Needs CCFP/2003. FAA, CDM, CR-Workgroups. February 2003. 26 pp.

Wilks, D.S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.