Do the I/O routines in module M3UTILIO run slowly?

Hello,
I am writing Fortran code with the I/O API to handle gridded emission netCDF files. Using READ3() and WRITE3() to read and write data seems slower than using netcdf.getVar() and ncwrite() in MATLAB, and at the end it takes a few seconds for M3EXIT() to shut the program down. Is this speed normal for the I/O routines in module M3UTILIO, and is there any way to speed up code that uses the I/O API?

My gridded emission files have 184 columns, 124 rows, 13 layers, and 69 variables.

Below is my code:

PROGRAM region_cut
!  Purpose:
!    To cut emission for specific regions.
!
!  Record of the revisions:
!      Date         Programmer            Description of change
!      ====         ==========            =====================
!   2021/06/11      Peng Zimu             Original code
!
USE M3UTILIO
IMPLICIT NONE

! Data dictionary:
REAL, ALLOCATABLE, DIMENSION(:,:,:,:) :: indata, outdata
REAL, ALLOCATABLE, DIMENSION(:,:) :: ratiodata
CHARACTER(16) :: INPUTFILE = 'INPUTFILE'
CHARACTER(16) :: OUTPUTFILE = 'OUTPUTFILE'
CHARACTER(16) :: PGNAME = 'region_cut'
CHARACTER(256) :: XMSG, CUTORPRSV
INTEGER :: i, j, k
INTEGER :: istat, LOGDEV
INTEGER :: gdate, gtime
REAL :: start, finish

! Initialize I/O API
LOGDEV = INIT3()

! Open ratio file
IF (.NOT. OPEN3('ratiofile', FSREAD3, PGNAME)) THEN
  XMSG = 'ERROR: Could not open ratio file'
  CALL M3EXIT(PGNAME, 0, 0, XMSG, 2)
ENDIF

! Open netCDF input file
IF (.NOT. OPEN3(INPUTFILE, FSREAD3, PGNAME)) THEN
  XMSG = 'ERROR: Could not open ' // INPUTFILE // ' for input'
  CALL M3EXIT(PGNAME, 0, 0, XMSG, 2)
ENDIF

! Get the file description
IF (.NOT. DESC3(INPUTFILE)) THEN
  XMSG = 'ERROR: Could not get ' // INPUTFILE // ' file description'
  CALL M3EXIT(PGNAME, 0, 0, XMSG, 2)
ENDIF

! Allocate data arrays using the grid dimensions from DESC3
ALLOCATE( indata(NCOLS3D, NROWS3D, NLAYS3D, NVARS3D), STAT = istat)
ALLOCATE( outdata(NCOLS3D, NROWS3D, NLAYS3D, NVARS3D), STAT = istat)
ALLOCATE( ratiodata(NCOLS3D, NROWS3D), STAT = istat)

! Initialize date and time from the input file header
gdate = SDATE3D
gtime = STIME3D

! Read in ratio
IF (.NOT. READ3('ratiofile', ALLVAR3, ALLAYS3, gdate, gtime, ratiodata)) THEN
  XMSG = 'ERROR: Could not read ratio file'
  CALL M3EXIT(PGNAME, 0, 0, XMSG, 2)
ENDIF

! If CUT_RATIO is set, convert the cutting ratio into a preserving ratio
CALL GET_ENVIRONMENT_VARIABLE('CUT_RATIO', CUTORPRSV)
IF (CUTORPRSV == 'T') THEN
  ratiodata = 1.0 - ratiodata
ENDIF
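! NOTE (hedged alternative, not part of the original logic): M3UTILIO also
! provides ENVYN() for reading YES/NO environment variables, with logging
! and a default value; e.g. (CUTFLAG is an illustrative name of mine):
!   LOGICAL :: CUTFLAG
!   CUTFLAG = ENVYN( 'CUT_RATIO', 'Cut (T) or preserve (F)?', .FALSE., istat )
!   IF ( CUTFLAG ) ratiodata = 1.0 - ratiodata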

! Open Outputfile
IF (.NOT. OPEN3(OUTPUTFILE, FSCREA3, PGNAME)) THEN
  XMSG = 'ERROR: Could not open ' // OUTPUTFILE // ' for output'
  CALL M3EXIT(PGNAME, 0, 0, XMSG, 2)
ENDIF

! Loop over all time steps in the input file
DO i = 1, MXREC3D
  
  CALL CPU_TIME(start)
  ! Read all variables and layers for this time step
  IF (.NOT. READ3(INPUTFILE, ALLVAR3, ALLAYS3, gdate, gtime, indata)) THEN
    XMSG = 'ERROR: Could not read ' // INPUTFILE // ' for all vars input'
    CALL M3EXIT(PGNAME, 0, 0, XMSG, 2)
  ENDIF

  CALL CPU_TIME(finish)
  WRITE(*, 90) finish - start
  90 FORMAT ("Time for READ3 = ",F6.3," seconds.")

  CALL CPU_TIME(start)

  DO k = 1, NVARS3D
    DO j = 1, NLAYS3D
      outdata(:,:,j,k) = ratiodata * indata(:,:,j,k)
    END DO
  END DO

  CALL CPU_TIME(finish)
  WRITE(*, 100) finish - start
  100 FORMAT ("Time for calculating = ",F6.3," seconds.")

  CALL CPU_TIME(start)

  ! Write output file
  IF (.NOT. WRITE3(OUTPUTFILE, ALLVAR3, gdate, gtime, outdata)) THEN
    XMSG = 'ERROR: Could not write all vars to ' // OUTPUTFILE // ' for output'
    CALL M3EXIT(PGNAME, 0, 0, XMSG, 2)
  ENDIF

  CALL CPU_TIME(finish)
  WRITE(*, 110) finish - start
  110 FORMAT ("Time for WRITE3 = ",F6.3," seconds.")

  ! Advance date and time by one hour (time increment in HHMMSS)
  CALL NEXTIME(gdate, gtime, 10000)

END DO

CALL CPU_TIME(start)

DEALLOCATE(indata, STAT = istat)
DEALLOCATE(outdata, STAT = istat)
DEALLOCATE(ratiodata, STAT = istat)

CALL CPU_TIME(finish)
WRITE(*, 120) finish - start
120 FORMAT ("Time for DEALLOCATE = ",F6.3," seconds.")

! Normal termination and exit I/O API
XMSG = 'I/O API normal termination'
CALL M3EXIT(PGNAME, 0, 0, XMSG, 0)

END PROGRAM region_cut
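
One caveat about the timings above: CPU_TIME() reports processor time, not wall-clock time, so it can badly understate time spent waiting on I/O. A minimal wall-clock timing sketch using the standard SYSTEM_CLOCK() intrinsic (the program and variable names here are my own illustration):

PROGRAM wall_timer
  IMPLICIT NONE
  INTEGER(8) :: count0, count1, count_rate
  REAL :: wall

  CALL SYSTEM_CLOCK(count0, count_rate)
  ! ... section to be timed, e.g. a READ3()/WRITE3() call ...
  CALL SYSTEM_CLOCK(count1)
  ! Convert clock ticks to seconds; wall-clock time includes the I/O
  ! waits that CPU_TIME() does not see
  wall = REAL(count1 - count0) / REAL(count_rate)
  WRITE(*, '("Wall-clock time = ",F8.3," seconds.")') wall
END PROGRAM wall_timer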

Thanks!

No, it is not normal. The I/O API is (or should be) calling the same netCDF library as MATLAB, for that matter. There is a little overhead from the fact that M3EXIT() waits while it flushes data to disk, but that should be less than a tenth of a second on most systems.
A few questions:

  • What platform are you running on? (details, please…)
  • What netCDF version are you using? …and which is MATLAB using? (a quick way to check is sketched after this list)
  • How is your netCDF compiled (is it a “debug/check-everything” compile, while MATLAB is using an optimized-compiled package)?
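
For the version question, here is a minimal sketch (my own, assuming the netCDF-Fortran module is available) that prints the version of the netCDF library actually linked in; on the MATLAB side, netcdf.inqLibVers() reports the corresponding information:

PROGRAM nc_version
  USE NETCDF          ! netCDF-Fortran module
  IMPLICIT NONE
  ! NF90_INQ_LIBVERS() returns the linked library's version string
  WRITE(*,*) 'netCDF library version: ', TRIM(NF90_INQ_LIBVERS())
END PROGRAM nc_version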

Carlie, thanks a lot for your reply! Following your questions, I did some tests, and I now suspect the problem comes from the NAS.

Yesterday, I ran the Fortran program on CentOS 6.5 with Intel 11.1, on which the netCDF and I/O API were installed by my former group members. The netCDF version is 3.6.3 and the I/O API version is 3.1, but I do not know the details of how they were compiled. When running the program, input/output was stored on a NAS.

Today, I did some tests on my own Linux workstation. The distribution is CentOS 8.3 and the compiler is Intel oneAPI 2021.2.0. The netCDF-C version is 4.8.0 and the netCDF-Fortran version is 4.5.3; when compiling them I added the options --disable-netcdf-4 --disable-dap. The I/O API version is 3.2. I first ran the program with input/output on the local disk. The result of the time command is

real	0m4.017s
user	0m1.855s
sys	0m2.155s

Then I mounted the NAS, set the input/output path to the NAS, and reran the program. It took much longer than with input/output on the local disk:

real	0m29.052s
user	0m1.769s
sys	0m3.789s

So I think the problem comes from the NAS: the user and sys times barely change between the two runs, while the real (wall-clock) time grows about sevenfold, which means the extra time is spent waiting on the network.

The network status of the NAS shows ‘1000 Mbps, Full duplex, MTU 1500’, and I have enabled the asynchronous (async) option for NFS.

As for READ/WRITE speed, it seems I had a wrong impression of the comparison between the I/O API and MATLAB; they seem to have similar speed. I do not know whether MATLAB is using an optimized-compiled package, and I have not installed netCDF on my Windows machine.

I’m not an expert on NFS-mounted NAS, but… a little googling turns up:

For metadata-intensive workloads NFS is often quite slow, due to the need for cache revalidation and commit-on-close.

which describes what is happening with netCDF (and it is probably made worse by the fact that the I/O API syncs after writes, to avoid the possibility of losing data): not only do you have the data reads and writes to do, but also file-header updates with each write.

And be aware that almost always the bottleneck in workstation/desktop GigE is the network card – either on the workstation, the server, or both – which is generally much slower than the disk controller.


Thank you very much for your explanation!

And as an aside: since Linux uses extra available RAM as disk buffer, you can often improve performance by adding memory. I recall benchmarking, a few years ago, a 1.3-GB-resident job on otherwise identical machines with 8 GB and 32 GB of RAM; the 32-GB machine was 20% faster…

So if you’re specifying a modeling machine, make sure you get plenty of RAM.
