Documentation

The following documentation pertains to running CVDP v6. For instructions for previous CVDP versions, consult the CVDP_readme.pdf file distributed within the codebases.

To view details of the calculations used within the CVDP, simply click on the Methodology and Definitions link at the top of any CVDP comparison. Note that as of CVDP version 6.0.0, the stated methodology changes depending on the detrending methods selected.

Step-by-Step Running Instructions for CVDP v6

The CVDP was designed with ease of use in mind. There are only three text files that need to be set up and/or edited. The required underlying software packages (NCL and ImageMagick) are free.

1) Set up paths for the reference datasets in the namelist_obs file. The data specified in namelist_obs are usually (but not necessarily) observations. Each row of text in namelist_obs must follow the same format:

Variable | Dataset name | Dataset path | Start year of analysis | End year of analysis

Variable must be set to one of the following: TS, TREFHT, PSL, PRECT, aice_nh, aice_sh, MOC or SSH. TS equals (sea) surface temperature (CMIP name = ts). TREFHT equals 2m air temperature (tas). PSL equals sea level pressure (psl). PRECT equals total precipitation (pr). aice_nh equals Northern Hemisphere sea ice area (siconc), and aice_sh equals Southern Hemisphere sea ice area (siconc). MOC equals the ocean meridional overturning mass streamfunction for the Atlantic/Arctic oceans (msftmz). SSH equals the sea surface height above geoid (zos).

Dataset name is simply the name the dataset is referred to as. 

Dataset path is simply the path of the dataset. Wildcards are allowed, as the dataset can be spread amongst multiple files.

Start year of analysis is the start year of the analysis for this dataset. Note that while the first year of a dataset can be incomplete (short of 12 months), the first year of analysis must be in a year where 12 months of data are present.

End year of analysis is the end year of the analysis for this dataset. Note that while the last year of a dataset can be incomplete (short of 12 months), the last year of analysis must be in a year where 12 months of data are present.

An example of a namelist_obs file:

TS | ERSSTv5 | /project/cas/DSets/ersstv5.185401-202412.nc | 1900 | 2023
PSL | ERA20c_ERA5_comb | /project/cas/ECMWF_reanalysis_comb/era20c_era5.mon.msl.190001-202312.nc | 1900 | 2023
TREFHT | GISTEMP | /project/cas/observations/gistemp.tas.188001-202402.nc | 1900 | 2023
PRECT | GPCC | /project/cas/observations/gpcc.pr.10.v2022_monitoring_2021-2024.189101-202402.nc | 1900 | 2023
aice_nh | NASA CDR | /project/cas/seaice_conc_nh_NASA_Team.nsidc.v04r00.197811-202312.nc | 1979 | 2023
aice_sh | NASA Bootstrap | /project/cas/seaice_conc_sh_NASA_Team.nsidc.v04r00.197811-202312.nc | 1979 | 2023
MOC | CESM1 Forced Ocean Sim | /p/cas/g.e11_LENS.GECOIAF.T62_g16.009.pop.h.MOC.194801-201512.nc | 1979 | 2015
SSH | ORAS4 | /project/observations/ORAS4/zos_oras4_1m_grid_1x1.195801-201712.nc | 1979 | 2017

TS | ERSSTv5 | /project/cas/DSets/ersstv5.185401-202412.nc | 1950 | 2023
PSL | ERA5 | /project/cas/observations/ERA5_monthly/era5.msl.194001-202312.nc | 1950 | 2023
  • Filenames must end with the syntax "YYYYMM-YYYYMM.nc" following CMIP naming conventions. 
  • For CVDP version 6, at least one reference dataset must be specified for each of TS, TREFHT, PSL and PRECT. Observational datasets are available on the CVDP website.
  • Reference datasets for aice_nh, aice_sh, MOC and SSH can be on rectilinear or curvilinear grids, while datasets for TS, TREFHT, PSL and PRECT must be on rectilinear grids.  
  • One can specify as many reference datasets for each variable as one would like, and the same dataset can be specified twice to cover different time periods. As the example above shows, the number of datasets does not have to be equal across variables.
  • In order to get complete ENSO metrics for reference datasets specified in namelist_obs, the start analysis/end analysis years should match for each set of reference datasets for the TS, PSL, TREFHT, PRECT and SSH variables. For example, if the first supplied TS (ts) dataset in namelist_obs is set to be analyzed from 1900-2023, set the first supplied PSL (psl), TREFHT (tas), PRECT (pr) and SSH (zos) datasets to be analyzed from 1900-2023 as well. In the namelist_obs example above, the analysis years match for the first specified TS, TREFHT, PSL and PRECT files, but SSH is different. This will result in ENSO composite plots not showing up for the SSH variable. 
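
The row format and filename rules above can be checked before a run with a short script. The following is a minimal Python sketch, not part of the CVDP; the function names parse_obs_row and check_obs_row are hypothetical, and the date check assumes a path without wildcards that names a single file.

```python
import re

# Variable names accepted in the first column, as listed in the text above.
VALID_VARIABLES = {"TS", "TREFHT", "PSL", "PRECT",
                   "aice_nh", "aice_sh", "MOC", "SSH"}

# Filenames are expected to end with "YYYYMM-YYYYMM.nc".
DATE_SUFFIX = re.compile(r"(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$")


def parse_obs_row(row):
    """Split one namelist_obs row into a dict (hypothetical helper)."""
    fields = [field.strip() for field in row.split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, found {len(fields)}: {row!r}")
    variable, name, path, start, end = fields
    return {"variable": variable, "name": name, "path": path,
            "start": int(start), "end": int(end)}


def check_obs_row(entry):
    """Return a list of problems found in one parsed row (empty if none)."""
    problems = []
    if entry["variable"] not in VALID_VARIABLES:
        problems.append(f"unknown variable {entry['variable']!r}")
    if entry["start"] > entry["end"]:
        problems.append("start year is after end year")
    # Assumption of this sketch: wildcard paths are skipped, because the
    # date range cannot be read from a pattern that spans several files.
    if "*" in entry["path"] or "?" in entry["path"]:
        return problems
    match = DATE_SUFFIX.search(entry["path"])
    if match is None:
        problems.append("path does not end with YYYYMM-YYYYMM.nc")
        return problems
    first_year, first_month, last_year, last_month = map(int, match.groups())
    # An analysis year needs all 12 months, so a file that starts after
    # January (or ends before December) cannot supply that year.
    earliest = first_year if first_month == 1 else first_year + 1
    latest = last_year if last_month == 12 else last_year - 1
    if entry["start"] < earliest or entry["end"] > latest:
        problems.append(
            f"analysis years fall outside complete years {earliest}-{latest}")
    return problems
```

Calling check_obs_row(parse_obs_row(row)) on each non-blank line of namelist_obs gives a quick list of rows to fix. The sketch only inspects the text of each row; it does not open the files or confirm that they exist.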

2) Set up paths for the simulations in the namelist file. Each row in namelist must follow the same format:

Dataset name | Dataset path | Start year of analysis | End year of analysis | Ensemble (optional)

Dataset name is the name the dataset is referred to as. 

Dataset path is the path that points to the variable files (specifically CVDP variables ts, tas, psl, pr, siconc, zos, msftmz) for a simulation. Wildcards are allowed, but not required. In general, it is best to specify the narrowest path possible. For example, if all variables for a run are in a single directory, specify the directory. If the dataset path ends with a subdirectory, it should end with a "/". One can differentiate a simulation's data from other simulations in the same directories if needed. See examples below.

Start year of analysis is the start year of the analysis for this dataset. Note that while the first year of a dataset can be incomplete (short of 12 months), the first year of analysis must be in a year where 12 months of data are present.

End year of analysis is the end year of the analysis for this dataset. Note that while the last year of a dataset can be incomplete (short of 12 months), the last year of analysis must be in a year where 12 months of data are present.

Ensemble is a number followed by a dash and the name of the ensemble that the simulation is a part of (e.g., 1-CESM2 Large Ensemble). This setting is only required when the driver.ncl setting runstyle is set to 2. (See driver.ncl setup details below.) The number and name should be the same for each member of an ensemble.

An example of a namelist file with the driver.ncl setting of runstyle = 1:

CESM2 Control | /project/cesm2/b.e21.B1850.f09_g17.CMIP6-piControl.001/ | 600 | 1199
CanESM5 r1i1p1f1 | /project/cmip6/piControl/{Amon,LImon,SImon,Omon}/*/CanESM5/r1i1p1f1/gn/ | 5201 | 6200
CESM2-LENS 1161.009 | /project/cesm2/LENS/{atm,ice,ocn}/month_1/*/*1161.009.{cam.h0,pop.h,cice.h}* | 1850 | 1915

In the first line above, all variables for that simulation are in the stated directory and no other simulations' data are present, so specifying a single directory is all that is needed. In the second line above, data files for the simulation are organized by variable subdirectories, so specifying "/*/" tells the CVDP to check all variable subdirectories for files required by the CVDP. In the third line above, data from multiple simulations are organized by variable subdirectories (hence the /*/ syntax again), but as multiple simulations are present in each directory, additional syntax ("*1161.009.{cam.h0,pop.h,cice.h}*") is required to isolate the particular simulation.

An example of a namelist file with the driver.ncl setting of runstyle = 2:

CESM2 1151.008 | /p/LENS2/{atm,ice,ocn}/month_1/*/*1151.008.* | 1900 | 2100 | 1-CESM2
CESM2 1171.009 | /p/LENS2/{atm,ice,ocn}/month_1/*/*1171.009.* | 1900 | 2100 | 1-CESM2
CESM2 1191.010 | /p/LENS2/{atm,ice,ocn}/month_1/*/*1191.010.* | 1900 | 2100 | 1-CESM2
CESM1-1 | /p2/*/b.e11.B*.f09_g16.001.{cam.h0,pop.h,cice.h}* | 1920 | 2100 | 2-CESM1
CESM1 2 | /p2/*/b.e11.B*.f09_g16.002.{cam.h0,pop.h,cice.h}* | 1920 | 2100 | 2-CESM1
CAS r1i1p1f1 | /pc6/{historical,ssp585}/{Amon,SImon,Omon}/*/CAS-ESM2-0/r1i1p1f1/ | 1900 | 2100 | 3-CAS
CAS r3i1p1f1 | /pc6/{historical,ssp585}/{Amon,SImon,Omon}/*/CAS-ESM2-0/r3i1p1f1/ | 1900 | 2100 | 3-CAS

The above namelist file indicates that there are three ensembles (CESM2, CESM1 and CAS). Note that ensembles can be of any size and can span different years from one another, and any number of ensembles can be specified in the namelist file. "/*/" syntax is again used when files are organized by variable subdirectories. 
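
To confirm that the Ensemble column is consistent before running with runstyle = 2, the rows can be parsed and grouped. This is a minimal Python sketch, not part of the CVDP; parse_namelist_row and group_by_ensemble are hypothetical names, and the rule that one number must always pair with one name is taken from the Ensemble description above.

```python
from collections import defaultdict


def parse_namelist_row(row):
    """Split one namelist row into a dict; the Ensemble column is optional."""
    fields = [field.strip() for field in row.split("|")]
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, found {len(fields)}: {row!r}")
    entry = {"name": fields[0], "path": fields[1],
             "start": int(fields[2]), "end": int(fields[3]),
             "ensemble": None}
    if len(fields) == 5:
        # "1-CESM2 Large Ensemble" -> (1, "CESM2 Large Ensemble");
        # only the first dash separates the number from the name.
        number, dash, label = fields[4].partition("-")
        if not dash or not number.isdigit() or not label:
            raise ValueError(f"Ensemble must look like '1-Name': {fields[4]!r}")
        entry["ensemble"] = (int(number), label)
    return entry


def group_by_ensemble(rows):
    """Map each 'number-name' ensemble to the dataset names that belong to it."""
    groups = defaultdict(list)
    labels = {}
    for row in rows:
        entry = parse_namelist_row(row)
        if entry["ensemble"] is None:
            raise ValueError(f"no Ensemble column in row: {row!r}")
        number, label = entry["ensemble"]
        # The same number must always be paired with the same name.
        if labels.setdefault(number, label) != label:
            raise ValueError(
                f"ensemble {number} is named both {labels[number]!r} and {label!r}")
        groups[f"{number}-{label}"].append(entry["name"])
    return dict(groups)
```

Applied to the runstyle = 2 example above, group_by_ensemble returns three groups keyed "1-CESM2", "2-CESM1" and "3-CAS". The sketch checks only the text of the namelist and does not resolve the wildcard paths.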

3) Set desired options in driver.ncl. Note that while there are 21 available options, only a small number need to be changed from run to run. The following explains each option in detail.

runstyle = 1    

(Commonly altered setting.) The runstyle option selects the mode in which the CVDP is run. Setting runstyle = 1 runs the CVDP in individual simulations mode. Ensemble metrics are not calculated in this mode, and the output is a single webpage that shows metrics for each simulation and the differences between that simulation and the first set of reference data for each variable listed in namelist_obs. Setting runstyle = 2 runs the CVDP in ensemble mode. Ensemble metrics are calculated, and the output is two webpages: one showing ensemble metrics and the other showing individual metrics for each simulation. Note that when runstyle is set to 2, an extra column of information must be present in the namelist file identifying which ensemble each simulation belongs to. See the namelist instructions above.

outdir = "/project/diagnostics/external/Multi-Case/TestComp1/" 
webpage_title     = "Title goes here"                          

(Commonly altered settings.) The outdir setting is the local directory path where one would like all CVDP output to be stored. Note that the total size of all CVDP output for a single comparison can be greater than 10GB. webpage_title is the title shown at the top of all CVDP generated webpages.

remove_trend_obs  = "QuadraticTrend"  
remove_trend_model= "QuadraticTrend" 
ensemble_mean_dir = "/project/CVDP-EM_ncl/"     

(Commonly altered settings.) remove_trend_obs and remove_trend_model set the type of detrending to be applied to the reference datasets (specified in namelist_obs) and simulations (specified in namelist). The detrending method will be applied after the annual cycle has been removed for that dataset. (Note #1: The selected detrending methods are not applied to Climatological Averages or Linear Trends metrics. Note #2: For non-anomalous sea ice extent calculations, prior to the extent being calculated, the annual cycle is first removed, the selected detrending methods are applied, and the annual cycle is then added back in.)

The valid settings for remove_trend_obs and remove_trend_model are:

"None" : No form of detrending will be applied.

"LinearTrend" : The linear trend is removed by month.

"QuadraticTrend" : The quadratic trend is removed by month. 

"30yrRunningMean" : A weighted running mean is applied by month using low-pass filter weights from the IPCC-AR4. The 30 weights can be found in the CVDP codebase within the functions.ncl file under the remove_trend function. These weights are roughly equivalent to a 30-yr low pass smoother. After the weighted running average is applied, the weighted array is subtracted from the raw array to effectively remove the long-term trend.

"rmGMST_EM" : The ensemble mean global mean tas array is regressed upon the variable relevant for that particular metric, and then the regression map is scaled by the value of the ensemble mean global mean tas value at each timestep. This new scaled regression array is then subtracted from the raw array to effectively remove the long-term trend. This method is only available when setting remove_trend_model.

"rmEM": The ensemble mean is removed at every timestep. This method is only available when setting remove_trend_model.

ensemble_mean_dir is the local directory where created ensemble mean files will be placed. This option only needs to be set when remove_trend_model is set either to "rmGMST_EM" or "rmEM".
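
To illustrate what removing a trend "by month" means for the "LinearTrend" option, here is a minimal pure-Python sketch. It is not the CVDP's NCL implementation, and the function name remove_linear_trend_by_month is hypothetical. The sketch assumes a monthly series that starts in January and covers whole years, and it subtracts only the slope term so that each calendar month keeps its mean; whether the CVDP treats the mean the same way is not stated here.

```python
def remove_linear_trend_by_month(values):
    """Detrend a monthly series separately for each calendar month.

    values: sequence of monthly values, starting in January, whose
    length is a multiple of 12 (assumptions of this sketch).
    """
    if len(values) == 0 or len(values) % 12 != 0:
        raise ValueError("series must cover a whole number of years")
    n_years = len(values) // 12
    years = list(range(n_years))
    mean_x = sum(years) / n_years
    var_x = sum((x - mean_x) ** 2 for x in years)

    detrended = list(values)
    for month in range(12):
        # All Januaries, then all Februaries, and so on.
        series = values[month::12]
        mean_y = sum(series) / n_years
        if var_x == 0:
            slope = 0.0  # a single year has no trend to remove
        else:
            # Ordinary least-squares slope of value against year index.
            slope = sum((x - mean_x) * (y - mean_y)
                        for x, y in zip(years, series)) / var_x
        for x in years:
            detrended[month + 12 * x] = series[x] - slope * (x - mean_x)
    return detrended
```

Fitting each calendar month on its own allows, for example, January and July to carry different trends, which a single fit over the whole series would not capture.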

namelists_only       = "False"   

(Commonly altered setting.) The namelists_only setting (valid options = "True" or "False") allows one to run the CVDP to create the individual variable namelists and then stop. This allows one to examine the derived namelists found in the namelist_byvar subdirectory, and to verify that the correct files have been identified for each simulation and variable. The hardest aspect of running the CVDP is setting the syntax within the namelist file to correctly identify all the files for a particular simulation. As this can take some trial and error, setting namelists_only = "True" allows one to quickly tweak the namelist syntax and test it. Once one is comfortable with the files identified in the namelist_byvar directory, setting namelists_only = "False" will result in the complete CVDP package being run.

create_graphics      = "True"  

(Not commonly altered.) Setting create_graphics = "True" tells the CVDP to create graphics and form ensemble mean files (when runstyle = 2). Setting create_graphics = "False" instructs the CVDP to calculate metrics for individual simulations and to stop. netCDF files for individual simulations and reference datasets are created with either setting.  

max_num_tasks = 6           

(Not commonly altered.) max_num_tasks is the number of processes that one would like to run at once. The CVDP employs a simple parallelization scheme (using Python's subprocess functionality) to run multiple metric scripts at once.
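
The general idea of capping the number of simultaneous processes can be sketched in Python as follows. This is an illustration only and is not taken from the CVDP codebase; run_commands is a hypothetical name, and the use of a thread pool to launch the subprocesses is an assumption of this sketch.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, max_num_tasks):
    """Run each command as a subprocess, at most max_num_tasks at a time.

    commands: list of argument lists, e.g. [["ncl", "some_script.ncl"], ...]
    Returns the return codes in the same order as the commands.
    """
    def run_one(command):
        # Each worker thread blocks until its own subprocess has finished.
        return subprocess.run(command, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=max_num_tasks) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere Python is installed.
    demo = [[sys.executable, "-c", "print('task %d')" % i] for i in range(4)]
    print(run_commands(demo, max_num_tasks=2))
```

A larger max_num_tasks shortens the wall-clock time only while enough processor cores and memory are available for the concurrent scripts.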

modular =  "True"            
modular_list = "amoc,amv,pr.trends_timeseries"  

(Commonly altered settings.) Setting modular = "False" instructs the CVDP to run every metrics script. Setting modular = "True" tells the CVDP to run only those metric scripts specified in the modular_list setting. The following metric scripts can be listed in modular_list, separated by commas: amv, sst.indices, siconc.trends_timeseries, psl.trends_timeseries, pdv, pr.trends_timeseries, siconc.mean_stddev, soi, nam, nao, sam_psa, pna_npo, amoc, tas.trends_timeseries, ipv, sst.mean_stddev, psl.mean_stddev, pr.mean_stddev, sst.trends_timeseries, tas.mean_stddev, zos.mean_stddev, zos.trends. Note that the CVDP writes out netCDF files of the calculated metrics, and will reuse the calculations if they are present in the output netCDF files. Thus, if the CVDP fails in the middle of a script (say when computing zos trends), but runs all the other metrics scripts successfully to completion, one can set modular = "True" and modular_list = "zos.trends", and resubmit driver.ncl. The zos trends that have already been calculated will be read in and used, while new zos trends will be created where needed. modular and modular_list are frequently used to fill in gaps in the calculated metrics, or to run the CVDP for only a specified set of metrics.
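
A misspelled entry in modular_list is easy to miss, so a quick check against the script names listed above can help. This is a minimal Python sketch, not part of the CVDP; parse_modular_list is a hypothetical name, and stripping spaces around the commas is an assumption of the sketch rather than documented CVDP behavior.

```python
# Metric script names as listed in the modular_list description above.
METRIC_SCRIPTS = {
    "amv", "sst.indices", "siconc.trends_timeseries", "psl.trends_timeseries",
    "pdv", "pr.trends_timeseries", "siconc.mean_stddev", "soi", "nam", "nao",
    "sam_psa", "pna_npo", "amoc", "tas.trends_timeseries", "ipv",
    "sst.mean_stddev", "psl.mean_stddev", "pr.mean_stddev",
    "sst.trends_timeseries", "tas.mean_stddev", "zos.mean_stddev",
    "zos.trends",
}


def parse_modular_list(modular_list):
    """Split a comma-separated modular_list string and reject unknown names."""
    names = [name.strip() for name in modular_list.split(",") if name.strip()]
    unknown = [name for name in names if name not in METRIC_SCRIPTS]
    if unknown:
        raise ValueError(f"unknown metric script(s): {', '.join(unknown)}")
    return names
```

For example, parse_modular_list("amoc,amv,pr.trends_timeseries") returns the three names unchanged, while a misspelling such as "zos.trend" raises a ValueError.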

zp = "scripts/"

(Not commonly altered.) zp is the directory path (full or relative) to where the CVDP metrics scripts are located.

png_scale         = 3.0               

(Commonly altered setting.) png_scale should be set to a value between 0.1 and 5.0, and corresponds to the resolution of the output .png files (if output_type = "png"). The recommended setting is between 2 and 3.

tar_output = "False"   

(Not commonly altered.) Setting tar_output = "True" will tar up all the contents of outdir once the CVDP comparison is complete.

regrid_check      = "False"    
regrid_to_res     = "/project/mojave/cesm2/PSL.206501-210012.nc"  
regrid_dir        = "/project/CVDP-regrid/"           

(Not commonly altered.) These settings are used to check whether the input data is on an unstructured grid. regrid_check should be left = "False" unless one knows that some of the input data is on an unstructured grid. (Reason: There is a time hit for the CVDP to examine every input file.) If one knows there is unstructured data present, set regrid_check = "True", which will cause the CVDP to check input data for unstructured grids, regrid the unstructured data to the rectilinear grid found in the file specified via the regrid_to_res setting, and write the final regridded dataset to the directory specified in regrid_dir.

colormap          = 0       
output_type       = "png"   
ncl_exec          = "ncl"                                                          
machine_casesen   = "True"  

(Not commonly altered.) Setting colormap = 0 results in the CVDP using its default colormaps, while setting colormap = 1 results in the CVDP using colormaps that are better suited to those with color blindness. Setting output_type = "png" results in the CVDP creating .png files directly, while setting output_type = "ps" results in the CVDP creating ps files first and then using ImageMagick to convert those ps files to .png files (for viewing on the web). If one wishes to change the NCL executable command to something other than ncl, one can set ncl_exec. If your system is case sensitive, set machine_casesen = "True". For those with case-insensitive systems, set machine_casesen = "False".

4) Make sure that namelist_obs, namelist and driver.ncl are in the same directory. Once one is ready to start, one can submit the comparison a number of ways from the terminal command line.

A - "ncl driver.ncl" - this submits the job in the foreground.

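B - "ncl driver.ncl >&! a.out &" - this submits the job in the background and directs all terminal output to a.out. (The >&! redirection is csh/tcsh syntax, and the trailing & is what places the job in the background.)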

C - One can submit the comparison as a batch script, specifying "ncl driver.ncl" as the command to be run. Note that on some systems you may need to load the NCL and ImageMagick libraries when you submit the batch job.

5) Once the comparison is complete, all output (graphics, netCDF files, metrics, and webpages) is written to the outdir path specified in driver.ncl.