Saturday, November 14, 2009

Notes Intel Fortran


msvcrtd.dll issue
dumpbin, editbin, stack
Fortran routines for DLL
Example Fortran routines for DLL called by C#
Fortran C# wrappers and data compatibility
Setup IMSL
Setup MKL
Setup - Intel Fortran / IMSL Environment Variables
Intel Fortran 10.1 and VS. Net 2005
Intel Fortran 11.0
Managed Code
Building DLLs (Fortran DLLs used in Fortran apps)
!DEC$ ATTRIBUTES directives
Passing module variables and functions in DLL
Best Practice
Errors - Debugging
How to Add Version and other Metadata to DLL or EXE
Using VTune
Compiler Options
Build Macros (eg $(OUTDIR))
Using MKL
Using LAPACK95 & General Comment on DLLs, LIBs, Mod files
Mixed language programming
Stack Checking
Enable Vectorization and Report
Enable OpenMP
Using Thread Profiler
Using Thread Checker
Profile Guided Optimization
Using code coverage

msvcrtd.dll issue

Performance Tools for Software Developers
libmmdd.dll is dependent on msvcrtd.dll which is no longer distributed.

Note: This only applies to the compilers for Intel® Extended Memory 64 Technology and for the Itanium® Architecture.
Applications or DLLs that are built with /MDd, or that link directly against Intel's libmmdd.dll, may emit the runtime error:
This application has failed to start because msvcrtd.dll was not found. Re-installing the application may fix this problem.

The Platform SDK distributed with Microsoft* Visual Studio* 2005 does not contain msvcrtd.dll. Using /MDd links against the Intel math library libmmdd.dll which has a dependency on msvcrtd.dll.

This is a known issue that may be resolved in a future product release. As a work-around, use the msvcrtd.dll distributed with the Microsoft* Platform SDK available at  † .

You may need to obtain msvcrtd.dll from elsewhere and place it in c:\windows\system32.

dumpbin, editbin, stack
To run these command line tools,
- go to "Start" button -> "All Programs"
  -> "Intel Software Development Tools"
  -> "Intel Compiler 8.0"
  -> "Build Environment for Fortran IA-32 Applications"

To check the stack size of a program:
Run "dumpbin /headers executable_file", and look for the "size of stack reserve" entry under "optional header values".

To enlarge the stack of a program:
Run "editbin /STACK:reserve[,commit] program.exe", where reserve is the new stack size in bytes.

The Easiest Way ( .NET 2.0 )
In .NET 2.0 and newer you can simply specify the stack size in a thread's constructor. Unfortunately, this method works only on Windows XP and newer operating systems. On older platforms you can still specify this parameter, but it will have no effect; the stack size in the binary header will be used.
using System.Threading;

Thread T = new Thread(threadDelegate, stackSizeInBytes);

Fortran routines for DLL

  interface
    subroutine my_sub(I)
        !DEC$ ATTRIBUTES C, ALIAS:"_My_Sub" :: my_sub
        integer i
    end subroutine
  end interface

- Case sensitivity: Fortran is not case sensitive, C/C++ is.
- Arrays are always passed by reference.
- ATTRIBUTES for an argument may be: VALUE, REFERENCE
- C or STDCALL causes all arguments to be passed by value, except arrays.
- The VALUE or REFERENCE argument options override the routine option
  of C or STDCALL.
- On IA-32 systems, a leading underscore is needed on the name of a routine that is called from C.
- Internal procedures cannot be called from outside the program unit that contains them.
- To pass a variable number of arguments, use C and VARYING, not STDCALL.

Example Fortran routines for DLL called by C#
! Public wrapper for status_msg_get_code
integer*4 pure function StatusMsgGetCode(msg)
    StatusMsgGetCode = status_msg_get_code(msg)
end function StatusMsgGetCode

- uses STDCALL (which cannot handle optional arguments; see the Intel Fortran User Guide)
- uses alias with leading underscore
- wrap and rename code to get rid of underscore in function name, eg.
    status_msg_get_code  --> StatusMsgGetCode
- uses REFERENCE to pass arguments

Fortran C# wrappers and data compatibility
This is best illustrated by example:

Fortran function in Fort.dll:
    subroutine Foo_dll()
    end subroutine

C# declaration
#if x64
        [DllImport("Fort.dll", EntryPoint = "Foo_dll")]
#else
        [DllImport("Fort.dll", EntryPoint = "_Foo_dll")]
#endif
        private static extern void Foo_dll();

        public static void CallFoo_dll()
        {
            Foo_dll();
        }

Below are the Fortran to C# data type declarations, with X as the variable name:
Fortran          C# (DllImport declaration)   C# (wrapper signature)   C# (call)
integer(4) X     [In, Out] ref int X          ref int X                ref X
real(8)    X     [In, Out] ref double X       ref double X             ref X
real(8)    X(N)  [In, Out] double[] X         ref double[] X           X

Setup IMSL
Documentation -
1. Start -> Programs -> IMSL Fortran Library 5.0. This contains:
QuickStart, Readme, User's Guide
2. PDF docs contains
Math Library V1, V2, Statistical Libraries, Special Functions

IMSL is not Thread safe. It is still safe to use, provided that calls to the
IMSL routines are made from a single thread.

VS.Net integration
1. In VS.Net, go to Tools -> Options -> Intel(R) Fortran -> Project Directories ->
type in the Include and Libraries directory paths.
2. Specify one of the following include statements:
   include 'link_f90_static.h'
   include 'link_f90_dll.h'
   include 'link_f90_static_smp.h'
   include 'link_f90_dll_smp.h'
or go to Projects -> Add Existing Item ... and browse to add the library.

The link*.h files contain directives that name the import libraries (*.lib) for the corresponding
*.dll files. For example, the contents of link_f90_dll.h are:
!dec$objcomment lib:'imsl_dll.lib'
!dec$objcomment lib:'imslscalar_dll.lib'
!dec$objcomment lib:'imslblas_dll.lib'

3. Inside the code, in addition to the include directives in step 2, need to include
some USE statements. For example, to use the random number generator rnun, we need:
i) use rnun_int; or
ii) use imsl_libraries; or
iii) use numerical_libraries

iii - is used to provide backward compatibility with previous IMSL libraries and Fortran77
version of the library. It may not be necessary to use iii and calling the functions as before
will continue to work.

Using ii provides access to all the IMSL functions, so individual use statements are not needed.
However, some may choose to use i because it shows explicitly which functions are called.
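
As a minimal sketch of steps 2 and 3 together, the program below links against the IMSL DLL libraries and calls rnun through option (i). It assumes the IMSL 5.0 header name used above and the Fortran 90 calling form of rnun (the vector to fill as the only required argument); the program name demo_rnun is made up for illustration, and the code will not build without IMSL installed.

```fortran
! Sketch only: requires the IMSL Fortran library to compile and link.
program demo_rnun
    use rnun_int                  ! option (i): interface for rnun only
    include 'link_f90_dll.h'      ! objcomment directives naming the IMSL import libs
    implicit none
    real :: r(5)

    call rnun(r)                  ! fill r with uniform (0,1) pseudorandom numbers
    print *, r
end program demo_rnun
```

Replacing "use rnun_int" with "use imsl_libraries" (option ii) should leave the rest of the program unchanged.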

Using BLAS
1. Intel MKL Blas library used automatically when IMSL is linked with the
SMP (ie. multiprocessing) option.
2. See ia32 or ia64 Readme to link 3rd party blas with IMSL.

IMSL version 6.0
- Env Var - run ia32\bin\fnlsetup.bat .
- MUST remove old references, eg. include 'link_f90_dll.h'   (because the new headers have different names)
- MUST rename the directories of older installations of IMSL, so that any old env vars cannot
  accidentally point to them.
- Add one of these include statements in the relevant source files:
     include 'link_fnl_shared.h'       ! for dynamic dlls
     include 'link_fnl_shared_hpc.h'   ! for dynamic dlls and SMP (OpenMP)
- Add include directory in VS.Net
    Project - Properties - Fortran - Include Directories: $(FNL_DIR)\ia32\include\dll
- Add library directory in VS.Net
    Project - Properties - Fortran - Library Directories: $(FNL_DIR)\ia32\lib
- Run the ASSURANCE tests provided by IMSL in ...\examples\eiat. Note that
  in run_test.bat, need to use %LINK_FNL_STATIC%

Setup MKL
Linking to MKL can be done either statically *.lib or dynamically *.dll

For ia32 apps, when linking statically, link to mkl_c.lib or mkl_s.lib
For ia32 apps, when linking dynamically, link to these STATIC libs:
   mkl_c_dll.lib or mkl_s_dll.lib
that will provide interfaces to the correct DLLs.

For MKL v 10.0
- Major changes, MKL divided into layers: Interface, Threading, Computation, RTL.
- Support for 64bit via ILP64/LP64.
- Use of OpenMP for threading, and MPI and ScaLAPACK for distributed computing.
- Env Vars: The following variables would have been set by running tool/environment/mklvars32.bat
  $(MKLPATH) = root location of MKL directory - eg D:\programs\Intel\MKL\10.0...
- Visual Studio config
  Project -> Properties -> Linker -> General -> Additional Library Directories
  Project -> Properties -> Fortran -> General -> Additional Include Directories

- Linking
  Intel advises linking libguide and libiomp dynamically even if the other libraries are
  linked statically (MKL User Guide, Chap 5).
  Link items to consider:
      Interface:            Threading:               Computation:       RTL:   Description:
      mkl_intel_c_dll.lib   mkl_sequential_dll.lib   mkl_core_dll.lib   -      Dynamic, non-parallel, 32bit

  Actual linking is done in code by using !dec$ directives such as:
      !dec$objcomment lib:'mkl_intel_c_dll.lib'
      !dec$objcomment lib:'mkl_sequential_dll.lib'
      !dec$objcomment lib:'mkl_core_dll.lib'
      !dec$objcomment lib:'mkl_lapack95.lib'

- To use THREADED / PARALLEL / OPENMP Intel MKL, it is highly recommended to compile your code with the /MT
option. The compiler driver will pass the option to the linker and the latter will load
multi-thread (MT) run-time libraries.
- For multi-threading based on Intel OpenMP
(many many bins....)
lib\libguide40.lib, OR lib\libiomp5md.lib,
bin\libguide40.dll, OR bin\libiomp5md.dll

Setup - Intel Fortran / IMSL Environment Variables
user defined:
C:\Program Files\VNI\CTT5.0\CTT5.0\INCLUDE\IA32;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\include\
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Lib\
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;%PATH%;d:\DATA\UsercheeOnD\tools\NixTools\bin;%MSNET_C%\bin;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE;C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin;%PStill%;D:\Program\UnderstandF90\bin\pc-win95

system variables:
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;%PATH%;d:\DATA\UsercheeOnD\tools\NixTools\bin;%MSNET_C%\bin;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE;C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin;%PStill%;D:\Program\UnderstandF90\bin\pc-win95
"C:\Program Files\VNI\CTT5.0\CTT5.0\examples\IA32"
Intel(R) Fortran Compiler for 32-bit applications, Version 8.1
Microsoft Windows XP/2000/2003
/w /I:"C:\Program Files\VNI\CTT5.0\CTT5.0\include\IA32" /fpe:3 /nologo
/w /I:"C:\Program Files\VNI\CTT5.0\CTT5.0\include\IA32" /fpe:3 /nologo
C:\Program Files\VNI\CTT5.0\CTT5.0\INCLUDE\IA32;%INTEL_FORTRAN80%\ia32\include;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\include\
"C:\Program Files\VNI\CTT5.0\CTT5.0\include\IA32"
C:\Program Files\Intel\Fortran\Compiler80
C:\Program Files\Common Files\Intel\Licenses
C:\Program Files\VNI\CTT5.0\CTT5.0\LIB\IA32;%INTEL_FORTRAN80%\ia32\lib;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Lib\
imsl_dll.lib imslscalar_dll.lib imslblas_dll.lib
imsl_dll.lib imslscalar_dll.lib imslblas_dll.lib
/Qopenmp /F6000000 /fpp imsl_dll.lib imslsmp_dll.lib mkl_c_dll.lib /link /nodefaultlib:libc.lib
/Qopenmp /F6000000 /fpp imsl_dll.lib imslsmp_dll.lib mkl_c_dll.lib /link /nodefaultlib:libc.lib
imsl.lib imslscalar.lib imslblas.lib imsls_err.lib
/Qopenmp /F6000000 /fpp imsl.lib imslsmp.lib mkl_c_dll.lib imsls_err.lib /link /nodefaultlib:libc.lib
d:\Program\Intel\VTune\CGGlbCache;d:\Program\Intel\VTune\Analyzer\Bin;d:\Program\Intel\VTune\Shared\Bin;C:\Program Files\PC Connectivity Solution\;c:\program files\vni\ctt5.0\ctt5.0\lib\ia32;%systemroot%\system32;%systemroot%;%systemroot%\system32\wbem;c:\program files\ibm\trace facility;c:\program files\personal communications;c:\program files\ati technologies\ati control panel;c:\program files\common files\adaptec shared\system;c:\program files\ibm\trace facility\;c:\program files\intel\fortran\idb80\bin;%intel_fortran80%\ia32\bin;c:\program files\host integration server\system;c:\program files\ibm\personal communications\;c:\progra~1\ca\shared~1\scanen~1;c:\program files\ca\sharedcomponents\scanengine;c:\program files\ca\sharedcomponents\caupdate\;c:\program files\ca\sharedcomponents\thirdparty\;c:\program files\ca\sharedcomponents\subscriptionlicense\;c:\progra~1\ca\etrust~1;C:\Program Files\MATLAB\R2007a\bin;C:\Program Files\MATLAB\R2007a\bin\win32;C:\Program Files\Common Files\Roxio Shared\DLLShared\;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE

C:\Program Files\VNI\CTT5.0\CTT5.0\..
C:\Program Files\VNI\CTT5.0\CTT5.0\BIN\IA32

Intel Fortran 10.1 and VS. Net 2005
Manually add this to SYSTEM VARIABLE -> Path from Control Panel

C:\Program files\MPICH2\bin;
C:\Program Files\Common Files\Intel\Shared Files\Ia32\Bin;
D:\Program Files\Microsoft Visual Studio 8\Common7\IDE;
D:\Program Files\Microsoft Visual Studio 8\VC\BIN;
D:\Program Files\Microsoft Visual Studio 8\Common7\Tools;
D:\Program Files\Microsoft Visual Studio 8\Common7\Tools\bin;
D:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\bin;

Manually add this to SYSTEM VARIABLE -> Lib from Control Panel
C:\Program files\MPICH2\LIB;%IFORT_COMPILER10%Ia32\Lib;%MSVS8%\VC\atlmfc\lib;%MSVS8%\VC\lib;%MSVS8%\VC\PlatformSDK\lib;%FNL_DIR%\IA32\lib;

Intel Fortran 11.0
1. New: Floating Point Model, some are not compatible with Floating Point Speculation
2. New: OpenMP 3.0 standard included
3. New: Fortran 2003 features included
4. Some functions may fail -> use macro like CBAEXPMODTEST=1 to mark out certain things.
5. See Fortran User / Ref Guide -> Building Apps -> Using Libraries -> Using IMSL
6. IMSL Readme.txt -> KAPMR does not behave in thread safe manner.
        Use OpenMP critical region around KAPMR to be safe.
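
A minimal sketch of the critical-region idea in item 6. The routine not_thread_safe below is a made-up stand-in (the real KAPMR call and its arguments are not shown here); the point is only that a named critical section lets one thread at a time into the guarded call.

```fortran
program critical_demo
    implicit none
    integer :: i
    real :: total

    total = 0.0
    !$omp parallel do
    do i = 1, 100
        ! Only one thread at a time may execute the guarded call.
        !$omp critical (unsafe_lib)
        call not_thread_safe(real(i), total)
        !$omp end critical (unsafe_lib)
    end do
    !$omp end parallel do

    print *, 'total =', total     ! sum of 1..100 is 5050
contains
    ! Hypothetical stand-in for a library routine that is not thread safe
    subroutine not_thread_safe(x, acc)
        real, intent(in)    :: x
        real, intent(inout) :: acc
        acc = acc + x
    end subroutine not_thread_safe
end program critical_demo
```

Build with /Qopenmp to enable the directives; without it they are treated as comments and the loop runs serially with the same result.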

Managed Code
Mixed-Language Programming and Intel Visual Fortran Project Types
This version of Intel Visual Fortran produces only unmanaged code, which is architecture-specific
code. You cannot create an Intel Visual Fortran main program that directly calls a
subprogram implementing managed code. To call managed code, you can call an unmanaged
code subprogram in a different language that does support calling managed code.

Blas is implemented by IMSL - details are found in Chapter 9: Basic Matrix/Vector Operations.
Blas is also implemented by the hardware vendor - in this case Intel - in Intel's MKL library,
which may be written in machine code.

The BLAS API, i.e. the calling convention of the routines, is the same whether it is
implemented by MKL or IMSL. For example, SDOT is the routine that finds the dot product of two
vectors.

To use different implementation, the program has to link with different libraries.
For IMSL: imslblas_dll.dll
For MKL: mkl_p4.dll
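
To illustrate that the call itself does not change with the implementation, here is a small sketch using the standard BLAS level-1 function SDOT. The program name and data are made up; which BLAS actually runs depends only on the library it is linked against.

```fortran
program sdot_demo
    implicit none
    real, external :: sdot        ! standard BLAS level-1 function
    real :: x(3), y(3)

    x = (/ 1.0, 2.0, 3.0 /)
    y = (/ 4.0, 5.0, 6.0 /)
    ! sdot(n, x, incx, y, incy): dot product of two single-precision vectors
    print *, sdot(3, x, 1, y, 1)  ! 1*4 + 2*5 + 3*6 = 32
end program sdot_demo
```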

By default, link_f90_dll.h includes IMSL's BLAS (see section "Setup IMSL").
By default, link_f90_dll_smp.h includes MKL's BLAS (see section "Setup IMSL").

If we want to use MKL without the SMP (parallel processing) feature, then instead of using
link_f90_dll.h, we have to manually add the directives and point to the correct BLAS, eg:

!dec$objcomment lib:'imsl_dll.lib'
!dec$objcomment lib:'imslscalar_dll.lib'
!dec$objcomment lib:'mkl_ia32.lib'

The DLL (*.dll) can be placed anywhere the system knows of, eg:
c:\windows\system32\ mkl_def.dll, mkl_p3.dll, mkl_p4.dll
(IMSL provides these 3 dlls from the MKL package)

The mkl_ia32.lib contains STATIC INTERFACES to dlls including BLAS, cblas, FFTs, VML.
However, there is no corresponding single mkl_ia32.dll. Instead it is spread over a few DLLs,
such as mkl_def.dll, mkl_vml_def.dll, mkl_lapack32.dll, etc.

If a function (eg vsinv from VML package of MKL) is included in the library mkl_ia32.lib,
but the dll does not exist, then the code WILL COMPILE. But during runtime, a fatal error
would occur because it cannot find and use the dll.

NOTE: the IMSL dlls and libs are installed in
C:\Program Files\VNI\CTT5.0\CTT5.0\lib\IA32

Building DLLs (Fortran DLLs used in Fortran apps)

When a DLL is built the output are two files:
1) *.dll - has the library's executable code
2) *.lib - the import library providing interface between the program and the dll.

The notes here present two cases:
Case A: DLL to be created in its own separate VS solution, called solnA, in project projA.
        The two generated output will be projA.dll and projA.lib
Case B: DLL to be created in a project (projB) in the same solution (solnB) , as the
        application project (projC).
(The application project contains the code that uses the DLL.)
The two generated output will be projB.dll and projB.lib

1. Build DLL project in its own solution
- Say we call this Solution solnFoo, and Project projFoo

Case A:
- From VS.Net - in new solution, create a new DLL project by:
  File -> New -> Project -> Intel Fortran projects -> Dynamic link library

Case B:
- From VS.Net - in existing solution, create a new DLL project by:
  File -> New -> Project -> Intel Fortran projects -> Dynamic link library

2. Write a subroutine and expose it, eg:
subroutine hello()
    !DEC$ ATTRIBUTES DLLEXPORT, STDCALL, ALIAS:'_hello' :: hello
    (do blah blah)
end subroutine hello

- put this subroutine by itself into a file (eg. hello.f90) or into
a module (eg hello_mod.f90)

- DLLEXPORT needed to expose the name of the routine
- alias is needed for compatibility with Intel Fortran and VS.NET environment

3. Build the DLL in VS.NET by:
- Build (menu) -> Build or Build Solution
- Copy the *.lib and *.dll files and put them into same directory as the
executable code for the application; i.e. same directory as projC.exe

4. Link the DLL via the lib file by:
- Go to the application project "program" file or "module" file and put this near the
start of the file:
CASE A:       !dec$objcomment lib:'projA.lib'
CASE B:       !dec$objcomment lib:'projB.lib'

CASE B only:
- In the Solution Explorer in VS.NET, click on the application's
project name, eg projC.
- From the Project menu, or by right-clicking on the project, go to "Add Existing Item ..."
- Browse and choose "projB.lib" to add. The lib file should appear under Solution Explorer.
- From the Project menu, or by right-clicking on the project, go to "Project Dependencies..."
- Ensure that the dependency on projB is UNchecked in the Project Dependencies dialog box of projC.
- An alternative to the "Add Existing Item..." way is to specify the library through the linker:
  with the project name highlighted, go to Project menu -> Properties -> Linker
  -> "Additional Library Directories" -> type in the directory path where the *.lib is located.
5. Add interface to DLL routine in the application.
- go into the subroutine of the application and add the following:

module projC_app
contains
    subroutine app()
        interface
            subroutine hello()
            !DEC$ ATTRIBUTES DLLIMPORT, STDCALL, ALIAS:'_hello' :: hello
            end subroutine hello
        end interface
        call hello()
    end subroutine
end module

- DO NOT ADD the interface on the top level, eg DO NOT add in the starting part of a module. Instead
add the interface inside the module's subroutine that makes the call to the DLL routine.

- compile and run. Ensure that building mode is RELEASE, not DEBUG.

!DEC$ ATTRIBUTES directives
1. C vs STDCALL - for controlling how arguments are passed on the stack.
- both of these make variables pass by value, rather than the Fortran default of
passing by reference.
- arrays are always passed by reference
- C -> the calling routine cleans up the stack, which gives larger code.
- C -> it is possible to pass a variable number of arguments; MUST use "C, VARYING" to let
  Fortran know that a variable number of arguments is passed.
- C -> is the default in C/C++ code. To use it with Fortran code, either
  i) change the C code to STDCALL:
     extern void __stdcall foo_f90(int n);
  or
  ii) change the f90 code to use the C convention:
     !DEC$ ATTRIBUTES C :: foo_f90
- STDCALL -> the called routine cleans up the stack.

- for fortran, C or STDCALL will change default to passing by value, except arrays which will
  be passed by reference
- But, each argument of the subroutine can be declared with VALUE or REFERENCE to override the
  default mode, eg:
     subroutine foo(a, b)
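
The per-argument override can be sketched as a complete routine. This is an illustration under assumptions, not the original example: the argument types and the body are made up, and the matching C prototype would be something like void foo(int a, int *b).

```fortran
subroutine foo(a, b)
    !DEC$ ATTRIBUTES C :: foo          ! C convention: by value unless overridden
    integer :: a
    integer :: b
    !DEC$ ATTRIBUTES VALUE     :: a    ! passed by value (already the default under C)
    !DEC$ ATTRIBUTES REFERENCE :: b    ! overrides the C default: passed by address
    b = 2 * a                          ! result is returned through b
end subroutine foo
```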

Passing module variables and functions in DLL
Consider passing the variable 'foo' and calling method fooA() defined in a module 'mod_foo'

1. Expose the variable foo and fooA()

Do not use ALIAS.
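
A minimal sketch of what step 1 could look like, assuming foo is an integer and fooA is a simple function (both made up for illustration). DLLEXPORT is applied to the module variable and to the module procedure, with no ALIAS.

```fortran
module mod_foo
    implicit none
    integer :: foo = 42
    !DEC$ ATTRIBUTES DLLEXPORT :: foo        ! expose the module variable
contains
    function fooA(x) result(y)
        !DEC$ ATTRIBUTES DLLEXPORT :: fooA   ! expose the module procedure
        real, intent(in) :: x
        real :: y
        y = x + real(foo)
    end function fooA
end module mod_foo
```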

2. Build and Copy the following files from the DLL build directory to the application directory.
mod_foo.dll, mod_foo.lib, mod_foo.mod

3. In the application that uses 'foo', add the statement:
use mod_foo

This technique is only useful when both the application and the DLL are written in Fortran. The variable
names will have a leading underscore "_". This is transparent to the user who uses "use mod_foo".
Such DLLs are not convenient to use from other languages because of the leading
underscore on variable names.

Best Practice
1. For optimized code:
- use /fast
- use "Configuration Properties -> Fortran -> Optimization -> Require Intel Processor Extension"

2. To check for stack overflow
- /Qfpstkchk
- /Ge, /Gsn, /Fn

3. Fortran DLL structure
- Put constant data into a module, say mod_consts, and expose the data.
  Note: do not use ALIAS.
- Put subroutines into another module, say mod_funcs, and expose the subroutines:
        use mod_consts
        subroutine blah()
  Note: use ALIAS so that it is accessible from outside.
- Construct an interface module for the application:
        module interface_mod
        use mod_consts
        subroutine blah()
- Include the interface in the application:
        use interface_mod

This technique allows other Fortran projects to make use of both data and functions in DLLs.
However, other languages will not be able to make use of the data directly (may need to have underscore
for variable names in the other languages calling this Fortran DLL).

Errors - Debugging

General Sources:
"List of Run-Time Error Messages", Intel Visual Fortran compiler doc
- from Building Applications -> Error Handling -> Handling Run Time Errors ->

Cryptic LNK errors
1. When using a function from another place, eg a DLL, ensure that an "interface" block is written
in the code which calls the function.
2. Ensure the library path is defined. Eg. In VS.Net -> right click project -> Properties -> Linker
-> General ->  "Additional Library Directories"

Access Violation
1. Passing integer*4 into a subroutine with parameter declared as integer*8
2. Subroutine A in a module is DLL exported. If another subroutine within the same project uses subroutine A from another module, this WILL cause a CONFLICT. Because it is being used within the same project, subroutine A needs a wrapper which is NOT DLL exported. This wrapper can then be called by other module subroutines within the same project.
3. When an ARRAY of derived type contains components which are also derived types, it must be declared with a fixed size (i.e. a hardcoded dimension) or be a dynamic array (i.e. declared ALLOCATABLE). It cannot be declared with its size specified by a parameter.
function foo(a, b)
  real :: NestedDerivedTypeA(4)                ! GOOD
  real, allocatable :: NestedDerivedTypeB(:)   ! GOOD
  real :: NestedDerivedTypeC(b)                ! BAD
4. Crash pointing to problem with allocatable arrays which are used in OpenMP region. Message: "Subscript #x of the array has value xxxx which is greater than the upper bound of ..."
Reason: Known bug in Intel Fortran Compiler that occurs when code compiled using the /check:pointer option (under the Runtime category in project properties).

Derived Data Type - Nested
1. Complicated derived data types that involve nested derived types cannot be displayed correctly in the debugger / variable watch window. The displayed numbers are grossly in error.

DLL not exposed properly -
When calling a function in a dll, but that function has not been exposed, then the following error may occur:
"The procedure entry point ..... could not be located in the dynamic link library ....dll"

VSL/MKL errors
MKL ERROR : Parameter 2 was incorrect on entry to vslNewStre
This occurs when using MKL, VSL, VML routines from Intel with directives like:
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
while the path to ...mkl\ia32\lib is missing.
In VS.Net, within the dll/console project that uses them, add the path to the library files in:
Project -> Properties -> Linker -> General -> Additional Library Directories

IMSL Errors
Error: There is no matching specific subroutine for this generic subroutine call.
   The IMSL documentation shows the Fortran90 name D_RNCHI, but unless the Fortran90 interface
   is somehow brought in, the call still follows Fortran77. So use the Fortran77 name, DRNCHI:
        instead of the Fortran90 style -> D_RNCHI
        we use -> DRNCHI

ThreadChecker Errors:
Problem Description: We recently received several problem reports. When the user's application is extremely large, the application (launched by Thread Profiler) runs slowly.
Cause: Thread Profiler's engine uses 600MB (default) of heap. If the application also needs a large amount of heap memory and the user works on a machine with little memory, this problem occurs.
Resolution: Use Configure -> Modify -> "Execution" tab -> "Limit the size of the heap used by the analysis engine to [ ] MB", and adjust to a smaller number. Note that when Thread Checker reaches the memory limit, it may discard older statistics, causing some loss of results.

How to Add Version and other Metadata to DLL or EXE
Assume platform is Intel Fortran 8.1 and VS.Net 2003, but may apply to later versions too.
1. Go to Solutions Explorer and right click on the project name.
2. Choose Add New Item. In the Add New Item dialog, choose resource. A resourceX.rc will be created in the "Resource Files" folder directly under the project directory. If this file already exists, we can probably skip to the next step.
3. Double click to open the resourceX.rc file.
4. In the resourceX.rc file, right click on the name resourceX.rc and choose "Add resource..."
5. In the "Add Resource" dialog, choose Version.
6. Fill in the relevant versioning and metadata info that is required.
7. Then build the project.
8. Check by right-clicking on the dll or exe file.

Using VTune
To use VTune, the following needs to be set up:
1. From VS.NET -> Project -> Properties -> Linker -> Debug -> Generate Program Database File
.... ensure this pdb file is defined.
From VS.NET -> Project -> Properties -> Fortran -> Debugging -> Debug Information Format
.... Full(/Zi)

2. Put this "/FIXED:NO" in:
VS.NET -> Project -> Properties -> Linker -> Command Line -> Additional Options
.... this is to ensure that VTune's Call Graph can be used. This only applies to the executable project.

3. Application to Launch - select an app or driver/dll that is already running.
Call Graph - must specify application to Launch.
Sampling and Counter may select "No App to launch"

4. Counter Monitor - Intel recommends using this first.
- uses native Windows performance counters, eg. processor queue, memory, processor time
- Has the following info:
   - the Logged Data view
   - the Legend
   - the Summary view
   - the Data Table - click on Logged Data View first to access
- Two main monitors to check are:
   - %Processor Time: The closer to 100% the better. This is calculated by taking amount
     of time spent in the Idle thread and subtracting from 100%
   - System Processor Queue length - There is a single queue for processor time even on
     multiprocessor systems. This counter should be less than 2. It measures how many
     threads are waiting to execute.
- Intel Tuning Advice - to get the advice, from the Logged Data View, highlight the
  section of the graph of interest. Then click on the Tuning Assistant button.
- Drill Down to Correlated Sampling Data View.
   - To use sampling data, need to collect sampling data when collecting counter data.

5. Sampling Mode
- Look at Samples or Events of CPU_CLK_UNHALTED.CORE --- CPU cycles when a core is active
This shows where most cpu cycles are used.

CPU_CLK_UNHALTED.CORE
Event Code: Counted by fixed counter number 1
Category: Basic Performance Tuning Events;Multi-Core Events;
Definition: Core cycles when core is not halted.
Description: This event counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.
In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency.

INST_RETIRED.ANY
Event Code: Counted by fixed counter number 0
Category: Basic Performance Tuning Events;
Definition: Instructions retired.
Description: This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.

Clocks per Instructions Retired - CPI
Category: Basic Performance Tuning Ratios; Ratios for Tuning Assistant Advice;
Definition: High CPI indicates that instructions require more cycles to execute than they should. In this case there may be opportunities to modify your code to improve the efficiency with which instructions are executed within the processor. CPI can get as low as 0.25 cycles per instructions.

SAV = Sample After Value
This is the sampling frequency used for the sampling process. Typically it is 2,000,000.
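
The arithmetic behind these numbers can be sketched as follows. The sample counts are made up for illustration, the same SAV is assumed for both events, and the 3.8 GHz clock is taken as constant: event count is roughly samples times SAV, CPI is clock events divided by instructions retired, and unhalted time is clock events divided by core frequency.

```fortran
program cpi_demo
    implicit none
    integer, parameter :: dp = selected_real_kind(15)
    real(dp), parameter :: sav     = 2.0e6_dp   ! Sample After Value (typical value above)
    real(dp), parameter :: freq_hz = 3.8e9_dp   ! assumed constant core frequency
    real(dp) :: clk_samples, inst_samples
    real(dp) :: clk_events, inst_events, cpi, busy_seconds

    ! Made-up sample counts, for illustration only
    clk_samples  = 1900.0_dp    ! samples of CPU_CLK_UNHALTED.CORE
    inst_samples =  950.0_dp    ! samples of instructions retired

    clk_events   = clk_samples  * sav      ! estimated unhalted core cycles
    inst_events  = inst_samples * sav      ! estimated instructions retired
    cpi          = clk_events / inst_events
    busy_seconds = clk_events / freq_hz

    print *, 'CPI          =', cpi           ! 2.0 for these inputs
    print *, 'busy seconds =', busy_seconds  ! 1.0 for these inputs
end program cpi_demo
```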

Compiler Options
Default: /iface:nomixed_str_len_arg

Specifies the type of argument-passing conventions used for general arguments and for hidden-length character arguments.
Possible values are:
/iface:mixed_str_len_arg: The hidden lengths should be placed immediately after their corresponding character argument in the argument list, which is the method used by Microsoft* Fortran PowerStation.
/iface:nomixed_str_len_arg: The hidden lengths should be placed in sequential order at the end of the argument list. When porting mixed-language programs that pass character arguments, either this option must be specified correctly or the order of hidden length arguments changed in the source code.

See also Programming with Mixed Languages Overview and related sections.

Compiling - Diagnostics.
To perform diagnostics such as using Vtune, Thread Profiler or Thread Checker, some of these options may be needed:
/Zi - include symbols   = /debug:full
/Od - disable optimization
/fixed:no - linker option to make the code relocatable
/MDd - to build with thread safe libraries =   /libs:dll /threads /dbglibs

Build Macros (eg $(OUTDIR))
See Intel Visual Fortran - User Guide - Volume I: - Building apps from MS Visual Studio.Net - Supported Build Macros.

   In the project properties - Linker - Output File, the value is "$(OUTDIR)/xxx.dll".
   The macro $(OUTDIR) has a value defined in:
       project properties - Output Directory
   Similarly $(INTDIR) is defined in
       project properties - Intermediate Directory

Using MKL
1. CBA desktop PC - Pentium 4 CPU 3.8GHz
   - from intel website:
       CPU No.: 670;   90 nm;   Cache: 2 MB L2;   Clock Speed: 3.80 GHz;  FSB: 800 MHz
  Hyperthreading, Enhanced SpeedStep, Intel64 (need Bios and OS), ExecuteBit Enabled

2. Installation Directories:
   - c:\Program Files\intel\mkl\8.1.1
   - tools\environment -> mklvarsXXX.bat to build environment variables.
   - 3 options: ia32, em64t, ia64; within these are dlls and libs files
   - for the ia32 option, ia32\bin contain:
     mkl_lapack_YY.dll, mkl_XXX.dll, mkl_vml_XXX.dll, mkl_ias.dll, libguide40.dll
       YY = 32, 64
       XXX = def, p3, p4, p4p, p4m
   - for the ia32 option, ia32\lib contains:
     mkl_X.lib, mkl_X_dll.lib, mkl_lapack.lib, mkl_solver.dll, mkl_ia32.lib, libguide40.lib, libguide.lib
       X = c (for c), s (for Fortran)

3. Configuring to use MKL
- at installation time, say yes to add vars to PATH, LIB, INCLUDE.
- alternatively, run mklvars32.bat

4. Using Fortran95 BLAS or LAPACK
- Need to build from Intel's sources; go to mkl\8.1.1\interfaces\blas95,lapack95
- nmake PLAT=win32 lib -> a *.mod file will be created
- or go to the INCLUDE directory and run: ifort -c mkl_lapack|blas.f90
- Or to make it in the user's directory:
  1. copy mkl\8.1.1\interfaces\blas95,lapack95 into
  2. copy from INCLUDE to these files: mkl_lapack|blas.f90
  3. run in the blas,lapack directories: nmake PLAT=win32 INTERFACE=mkl_blas|lapack.f90 lib
For 64 bit:
- nmake can be found in C:\Program Files\Microsoft Visual Studio 8\VC\bin\
- from the Start Menu, open Intel Visual Fortran Build Environment using Intel 64.
- nmake PLAT=win32e lib
- mod files will be automatically copied to ..../em64t

5. Linking to library:
a) see "Linking your application with Intel MKL" in "Getting Started with the Intel Math
Kernel Library 8.1.1 for Windows" for reference.
b) In VS.Net, go to Project menu -> Properties -> Linker -> General -> Additional Library Directories
   and put:
C:\Program Files\Intel\MKL\8.1.1\ia32\lib

6. Errors
a) Link error (reported during the build):
SortProj1  error LNK2019: unresolved external symbol _VSLNEWSTREAM referenced in function _MAIN__.L
   1. put the following in the code at start of module or program, NOT subroutine or function
    use MKL_VSL_TYPE
    use MKL_VSL
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
   2. Sometimes DLLIMPORT may be needed rather than DLLEXPORT, especially in the RELEASE version (not confirmed).
   3. If the function is a Fortran95 function, such as gemv, then the solution is to "call dgemv.." rather
      than "call gemv..."

b) Runtime error:
MKL ERROR: Parameter 2 was incorrect on entry to vslNewStre
   In VS.Net, go to Project menu -> Properties -> Linker -> General -> Additional Library Directories
   and put:
C:\Program Files\Intel\MKL\8.1.1\ia32\lib

7. Prerequisite Directories - these need to be put in Project -> Properties or command line or etc...
  1. Include Directories: C:\Program Files\Intel\MKL\8.1.1\include
  2. Library Directories: C:\Program Files\Intel\MKL\8.1.1\ia32\lib
  3. Put the following line at the start of one of the source files, before the program or module keyword.
 include ''    ! This is a full-fledged module by MKL
  4. Put the following at the start of a module or program, not within a function or subroutine
    use MKL_VSL_TYPE
    use MKL_VSL
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
    implicit none

Using LAPACK95 & General Comment on DLLs, LIBs, Mod files
   To illustrate the usage of Lapack functions with Fortran95 interface,
 suppose we want to use subroutine GESV
Fortran77 call: sgesv, dgesv, cgesv, zgesv
Fortran95 call: gesv

gesv is an Interface in mkl_lapack.f90(module MKL95_LAPACK)
gesv interface overloads wrappers like DGESV_MKL95, etc....

Only two items are needed by the user -> *.lib and *.mod

- not needed because we will be using explicit interfaces.
- Also F95 lapack routines have optional arguments which REQUIRE interfaces (eg gesv).

- mkl_lapack95.lib needed (created once off by administrator or first user)
- Use in the code as:
    !dec$objcomment lib:'mkl_lapack95.lib'
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
- Don't need
    !dec$objcomment lib:'mkl_lapack.lib'
- must be linked at build time either by
    i) ifort ..... mkl_lapack95.lib; or
    ii) specifying the path in "Additional Library Directories"

- mkl95_lapack.mod needed (created once off by administrator or first user from mkl_lapack.f90)
- contains the collection of interfaces, made available in the code by a use statement for module MKL95_LAPACK
- must be present during compile time in the directory path of either:
    i) same location as application source files.f90
    ii) INCLUDE directories as specified in VS.Net as "Additional Include Directories"
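
As an illustration of the Fortran95 interface described above, the following is a minimal sketch of a call to gesv. It assumes the module is named MKL95_LAPACK as in these notes (MKL 8.1.1), that mkl95_lapack.mod and mkl_lapack95.lib have already been built, and that the library directories are configured. The program name, matrix values and variable names are made up for the example.

```fortran
program gesv_demo
    ! Sketch only: assumes mkl95_lapack.mod is on the include path and
    ! mkl_lapack95.lib is on the library path (see notes above).
    use mkl95_lapack, only: gesv
    !dec$objcomment lib:'mkl_lapack95.lib'
    !dec$objcomment lib:'mkl_c_dll.lib'
    !dec$objcomment lib:'mkl_ia32.lib'
    implicit none

    double precision :: a(2,2)   ! coefficient matrix, overwritten by its LU factors
    double precision :: b(2,1)   ! right-hand side, overwritten by the solution
    integer          :: ipiv(2)  ! pivot indices (optional argument)
    integer          :: info     ! status code (optional argument), 0 = success

    ! Illustrative system:  2x + y = 3,  x + 3y = 5  (stored column-major)
    a = reshape((/ 2.0d0, 1.0d0, 1.0d0, 3.0d0 /), (/ 2, 2 /))
    b = reshape((/ 3.0d0, 5.0d0 /), (/ 2, 1 /))

    ! Generic name gesv resolves to the double precision wrapper
    ! because a and b are double precision.
    call gesv(a, b, ipiv, info)

    if (info /= 0) then
        print *, 'gesv failed, info = ', info
    else
        print *, 'solution: ', b(:,1)
    end if
end program gesv_demo
```

The Fortran77 equivalent would call dgesv directly and pass the dimensions, leading dimensions, ipiv and info explicitly; the Fortran95 interface infers the sizes from the array arguments, which is why the explicit interface in the .mod file is required.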

Mixed language programming
Hi Clinton,
It looks like a library format incompatibility problem. We adhere to the Microsoft format.
Please follow these steps as a work-around.
Once you generate the .dll from the Intel Fortran compiler, do the following:

1. D:\>pedump /exp MatlabFunctions.dll > MatlabFunctions.exp

D:\>notepad MatlabFunctions.exp (Edit this file and replace MATEXP with _MATEXP)

D:\>buildlib MatlabFunctions.exp MatlabFunctions.lib

D:\> lcc hello.c

D:\>lcclnk hello.obj MatlabFunctions.lib


Stack Checking
Checking and Setting Space
The following options perform checking and setting space for stacks (these options are supported on Windows only):

The /Gs0 option enables stack-checking for all functions.
The /Gsn option checks by default the stack space allocated for functions with more than 4KB.
The /Fn option sets the stack reserve amount for the program. The /Fn  option passes /stack:n to the linker.

Enable Vectorization and Report
To enable automatic vectorization, use these switches:
   /Qx...  or /Qax....
To enable report, use:

Enable OpenMP
1. To enable OpenMP:
  by Command line: /Qopenmp /Qfpp
  by  Project -> Properties -> Preprocessor -> OpenMP conditional compilation -> Yes
              Project -> Properties -> Preprocessor -> Preprocess source file -> Yes (/fpp)
              Project -> Properties -> Language -> Process OpenMP directives -> Generate Parallel code (/Qopenmp)

Note: preprocessor must be enabled for the OpenMP directives to be processed.

2. For diagnostic report:
   by Command line: /Qopenmp-report

3. Compile OpenMP but in sequential mode;
   by Command line: /Qopenmp-stubs

or to Compile for single thread, use the preprocessor /Qfpp, but not the OpenMP /Qopenmp.

4. DO NOT USE /Qprof-genx with OpenMP - spurious errors like array out of bounds will result.

5. To use OpenMP functions like, omp_get_num_threads(), instead of using
     include "omp_lib.h",
   better to use:
        external omp_get_num_threads
        integer omp_get_num_threads
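
A minimal sketch of item 5, assuming compilation with /Qopenmp (and /Qfpp as noted above). The program name and variable names are made up for the example. The call is guarded with the !$ conditional compilation sentinel so that the same source still builds as a single-threaded program when /Qopenmp is omitted.

```fortran
program omp_threads_demo
    implicit none
    ! Declare the OpenMP function explicitly instead of
    ! using include "omp_lib.h", as suggested in item 5.
    external omp_get_num_threads
    integer omp_get_num_threads
    integer :: nthreads

    nthreads = 1   ! value kept when compiled without OpenMP

!$omp parallel
!$omp master
    ! Only the master thread writes the shared variable.
    ! The line below is compiled only when OpenMP is enabled.
!$  nthreads = omp_get_num_threads()
!$omp end master
!$omp end parallel

    print *, 'threads in parallel region: ', nthreads
end program omp_threads_demo
```

The number printed depends on the machine and on the OMP_NUM_THREADS environment variable.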

Using Thread Profiler
1. Compiler options to enable Thread Profiling:
a) /Zi         - full debugging format
b) /fixed:no   - linker option to make code relocatable
c) /MDd        - option tells the linker to search for unresolved references
               in a multithreaded, debug, dynamic-link (DLL) run-time library.
               This is the same as specifying options /libs:dll /threads /dbglibs.
d) /Qopenmp-profile - enable profiling of OpenMP.
   WARNING: this option should not be used with IMSL since IMSL will link to libguide or libguide40, but
   this option creates code that will link to libguide_stats or libguide40_stats

Using Thread Checker

Add the following library path:
Without it, the build fails with an error stating that libassuret40.lib was not found.

Options for Thread Checker
/ZI, /Z7 (Fortran - General - Debug Information Format - Full)
/Od (Fortran - Optimization - Optimization - Disable)
/libs:dll /threads /dbglibs (Fortran - Libraries - Runtime Library - debug Multithreaded Dll)
/Qtcheck - to enable use by Thread Checker

To run Intel Thread Checker, run VTune first in a NEW project. When VTune has finished analysing, run Thread Checker from the SAME project as a NEW Activity.

- ensure /Qtcheck is only on the EXECUTABLE, not other dlls.
- check working directory is correct.
- when EXECUTABLE has /Qtcheck, it cannot be run from console mode.

Profile Guided Optimization
This is a 3 step process:
1. Compile with /Qprof-gen. Using /Qprof-genx allows Code Coverage tool to be used.
   DO NOT USE /Qprof-genx WITH OPENMP.
   Note: For Code Coverage, new option is /Qprof-gen:srcpos

2. Run the code one or many times with different data sets.
   This will create .dyn files.
3. Compile with /Qprof-use. This uses the .dyn file created in step 2.
4. Usually specify /O2 in step 1, and the more aggressive /Qipo in step 3.
5. Need the following:
C:\Program Files\Microsoft Visual Studio 8\Common7\IDE;
C:\Program Files\Microsoft Visual Studio 8\VC\BIN;
C:\Program Files\Microsoft Visual Studio 8\Common7\Tools;
C:\Program Files\Microsoft Visual Studio 8\Common7\Tools\bin;
C:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\bin;
msvcr80d.dll -> C:\Program Files\Microsoft Visual Studio 8\VC\redist\Debug_NonRedist\x86\Microsoft.VC80.DebugCRT

Using code coverage
Ref: Intel_compiler_code-coverage.pdf

To use code coverage, which is available for Intel compilers, the code needs to be prepared during compilation and the application then needs to be run. The following is the general method.

1. Compile source code with /Qprof-gen:srcpos option.
By default, pgopti.spi, a static profile file is created. This name can be changed using the -prof-file option.
2. Run the application. This will create multiple dyn files for dynamic profile information.
3. Use the profmerge tool to merge dyn files into pgopti.dpi file.
     profmerge -prof_dpi
4. Run code coverage using both static and dynamic files
     codecov -spi -dpi
5. The results are published into CODE_COVERAGE.HTML

Note that these commands should be run in the directory that holds the source code, which should also be the execution directory.
