Department of Mathematics - University of Utah

Using the NAG Fortran libraries at the University of Utah Mathematics Department from C and C++

Last update: Tue Nov 19 18:03:58 2002

Comments, and reports of errata or bugs, are welcome via e-mail to the author, Nelson H. F. Beebe <beebe@math.utah.edu>. In your report, please supply the full document URL, and the title and Last update time stamp recorded near the top of the document.

This document discusses issues in calling Fortran code from C and C++ programs. However, before you start, please consider these possibly simpler alternatives:

writing new code entirely in C and C++, or
translating your existing Fortran code to C, using either the freely-available f2c translator, or the commercial Cobalt Blue translator,

and then using the NAG C Library so that your code avoids language mixing entirely.

If your Fortran code will never be modified again, and contains no I/O statements (for example, the EISPACK, LINPACK, and LAPACK libraries are I/O-free), then you may be satisfied with the translation to C. However, if you anticipate having to modify that code, you should look at the translated code carefully to decide whether or not you are introducing a future maintenance nightmare: machine-translated code is never as clean as good hand-written code could be, and if there were I/O statements, their translation requires additional support code that must be carried around.

If you are unfamiliar with interlanguage calling issues, then you should definitely start by reading the NAG tutorial on the subject.

What follows is a summary of the major points that need to be considered by C and C++ programmers who wish to call routines written in Fortran.

Reader comments are invited, and may be communicated via e-mail to the author.

Why this discussion is necessary
In what language is the main program?
Array storage order
Array indexing
External names
Data structures
Data types: general issues
Data types: real floating-point
Data types: complex floating-point
Data types: integer
Data types: logical
Data types: character
Data types: nonstandard
I/O: language view
I/O: file handles
I/O: who manages I/O buffers
I/O: formatted (text) files
I/O: unformatted (binary) files
Exception handling
Argument passing conventions
Recursion

Why this discussion is necessary

Since there are several ISO Standards for Fortran, C, and C++ that have been published between 1966 and 1998, you might wonder why these Standards have not specified precise rules for interlanguage calling. Several such requests have certainly come before their respective ISO Committees, but nothing has been done, in fear of putting an unfair Standards-compliance burden on some vendors, of hindering future language development, and of opening up a Pandora's box of interlanguage compliance issues: if those three languages should interoperate, what about the several other programming languages covered by ISO Standards? It appears unlikely that these issues will ever be resolved, and so interlanguage calling is likely to remain an issue requiring benevolent vendor support, and tutorials like this one.

In what language is the main program?

The choice of language for your main program influences how you must compile and link your program: you will probably have to provide additional library names on the compiler command line so that it can find the runtime routines needed by the other language. These names are strongly compiler-dependent, and their locations are frequently nonstandard too. The easiest way to find out what they are is to compile and link a small test program in the subordinate language, supplying a compiler option that asks for verbose output of the program stages. This flag is frequently called -v, or sometimes, -#. On UNIX, you then look for -L and -l options in that verbose output.

Array storage order

Fortran arrays are stored in column order, with the first subscript varying most rapidly. C and C++ arrays are stored in row order, with the last subscript varying most rapidly. The C/C++ program may therefore need to transpose multidimensional arrays before and after the call to a Fortran routine.
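
A minimal sketch of that transposition follows, assuming a dense matrix of double held in a flat buffer. The function names to_column_major and from_column_major are invented for this illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a row-major (C order) nrow-by-ncol matrix into column-major
   (Fortran order) storage.  Element (i,j), counted from zero, moves
   from c[i*ncol + j] to f[j*nrow + i]. */
void to_column_major(const double *c, double *f, size_t nrow, size_t ncol)
{
    size_t i, j;

    assert(c != NULL && f != NULL);
    for (i = 0; i < nrow; ++i)
        for (j = 0; j < ncol; ++j)
            f[j * nrow + i] = c[i * ncol + j];
}

/* The inverse copy, for results coming back from a Fortran routine. */
void from_column_major(const double *f, double *c, size_t nrow, size_t ncol)
{
    size_t i, j;

    assert(f != NULL && c != NULL);
    for (i = 0; i < nrow; ++i)
        for (j = 0; j < ncol; ++j)
            c[i * ncol + j] = f[j * nrow + i];
}
```

For a C array double a[2][3] holding the rows {1,2,3} and {4,5,6}, the column-major copy is 1, 4, 2, 5, 3, 6, which is the layout that a Fortran routine declaring a(2,3) expects.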

Array indexing

Fortran arrays are normally indexed from 1, although other values are possible in Fortran 77 or later if the array has dimensions of the form of ranges, start:end. C/C++ arrays are always indexed from 0. Thus, for a vector v(1:n), a Fortran loop will begin DO 10 j = 1,n, while the corresponding C/C++ loop will begin for (k = 0; k < n; ++k). Fortran v(j) corresponds to C/C++ v[k], with k = j - 1.

Fortran is distinctly superior to C in its handling of multidimensional arrays, because Fortran routines can be passed dynamic array dimensions, while C functions cannot. The C programmer is either forced to live with static compile-time dimensions, or to put ugly, and highly-error-prone, multidimensional array subscripting code inline, or inside macros, or inside element access functions. C++ programmers would probably define an array class with suitable element access functions.

If C/C++ functions call Fortran routines, dynamic dimensioning is not an issue, but it may be a significant problem for C/C++ functions called from Fortran.
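
One of the options mentioned in this section, hiding the subscript arithmetic inside a macro, might look like the following sketch. The macro name A2 and the function fill_hilbert are made up for illustration; lda plays the role of the leading dimension that a Fortran routine would receive as an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Fortran-style element access: a(i,j) with subscripts starting at 1,
   stored by columns, with leading dimension lda. */
#define A2(a, lda, i, j) \
    ((a)[((size_t)(j) - 1) * (size_t)(lda) + ((size_t)(i) - 1)])

/* Fill the leading n-by-n part of a with the Hilbert matrix,
   a(i,j) = 1/(i + j - 1), using Fortran-style subscripts. */
void fill_hilbert(double *a, int lda, int n)
{
    int i, j;

    assert(a != NULL && n >= 0 && n <= lda);
    for (j = 1; j <= n; ++j)        /* columns outermost: unit stride */
        for (i = 1; i <= n; ++i)
            A2(a, lda, i, j) = 1.0 / (double)(i + j - 1);
}
```

Because the dimensions are ordinary run-time values, the same function serves any matrix size, at the cost of writing A2(a, lda, i, j) instead of a[i][j].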

External names

Starting with Stuart Feldman's original AT&T UNIX System V f77 compiler (the first Fortran 77 compiler ever written on any operating system), Fortran compilers from most UNIX vendors transform Fortran external names by appending an underscore. The motivation for this was admirable: because Fortran and C/C++ have different argument passing conventions, different names for the same thing are required. Thus, both C/C++ and Fortran programmers could use library calls like fgetc(handle): the runtime library for C/C++ would have a routine fgetc, and the Fortran runtime library (which is written in C) would have one named fgetc_. Mixed-language programs could be created without confusion between the two versions of that library routine.

Regrettably, vendor-provided Fortran compilers on Hewlett-Packard HP-UX and IBM AIX do not follow this practice: they use the same external names for both Fortran and C/C++, with no trailing underscore. Not only does this complicate mixed-language programming, it is also a nuisance for software portability of mixed-language programs. To make matters even more complex, the GNU g77 compiler on these systems does supply the trailing underscore, unless it is suppressed with the -fno-underscoring compiler option.

One historical UNIX vendor, Ardent, later renamed Stardent after its merger with Stellar in the early 1990s, used yet another variant: Fortran fgetc() was mapped to the uppercase external name FGETC.

These variants are best handled by concealing the name mappings in C/C++ header files. Here is an example from a real program: the named routines are all written in Fortran, and this extract is from the C/C++ header file that defines the interface to them.

#if defined(ardent)
    /* Stardent (now defunct) uppercased
       Fortran names */
#define gjqf    GJQF
#define gjqfd   GJQFD
#define glqf    GLQF
#define glqfd   GLQFD
#define deps    DEPS
#define dgamma  DGAMMA
#define dpsi    DPSI
#define dpsum   DPSUM
#elif defined(_AIX) || defined(__hpux)
    /* IBM RS/6000 AIX and HP HP-UX use
       identical names in C and Fortran */
#else
    /* Everyone else adds a trailing underscore
       to Fortran names */
#define gjqf    gjqf_
#define gjqfd   gjqfd_
#define glqf    glqf_
#define glqfd   glqfd_
#define deps    deps_
#define dgamma  dgamma_
#define dpsi    dpsi_
#define dpsum   dpsum_
#endif

Data structures

Fortran (at least before the Fortran 90 Standard) has only two main data structures: scalars, and arrays. C and C++ have those too.

Fortran offers two other statements that provide control over storage order: common blocks, for global data, and equivalence, for local data.

While common blocks are essential for information hiding in library routines, modern Fortran code seldom uses them for communication at the user level, and C/C++ code should be able to remain ignorant of how they are implemented. Consult your Fortran compiler documentation for details. On many systems, you can make the correspondence like this:

Fortran:

      double precision a, b, c
      common /cshare/ a, b, c

C/C++ (drop the trailing underscore on cshare_ on HP and IBM systems):

struct
{
    double a;
    double b;
    double c;
}
cshare_;

equivalence statements were widely used in Fortran code to reduce memory requirements before 32-bit (and larger) address spaces became available, because memory storage was expensive (about US$1/byte in 1965, and a million times cheaper in 2000), and therefore a scarce resource. Modern Fortran code should rarely use the equivalence statement, and then only to get at the details of storage bits, much like the C/C++ union structure, which is how you would map that Fortran statement for interlanguage use.
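
As a small sketch of that mapping, the Fortran declarations shown in the comment below can be mirrored by a C union. The type name real_bits and the function real_bits_of are invented here, and the code assumes that Fortran real and integer correspond to C float and int of the same size. Reading a union member other than the one last stored is acceptable in C, but is not sanctioned by the C++ Standard.

```c
#include <assert.h>

/* Fortran:
         real x
         integer ix
         equivalence (x, ix)
*/
typedef union
{
    float x;
    int ix;
} real_bits;

/* Return the storage bits of a float, viewed as an int. */
int real_bits_of(float value)
{
    real_bits u;

    assert(sizeof(float) == sizeof(int));   /* assumption of this sketch */
    u.x = value;
    return u.ix;
}
```

On systems with IEEE 754 arithmetic, real_bits_of(1.0f) yields hexadecimal 3F800000.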

Data types: general issues

We discuss below specific data types, and show their typical correspondence between Fortran and C/C++. However, you should never hard code assumptions about this data-type correspondence. Instead, you should always introduce new data types using the C/C++ typedef statement, or perhaps the preprocessor #define directive, to provide synonyms, and then use suitable type casts using your new type names when calling Fortran routines. Here is a short example of both approaches:

typedef double fortran_double_precision;
typedef int fortran_integer;

#if !defined(fortran_double_precision)
#define fortran_double_precision double
#endif

#if !defined(fortran_integer)
#define fortran_integer int
#endif

Although C++ strongly deprecates use of the C preprocessor, #define offers one advantage over typedef: you can test whether a name is defined, allowing a user to override a definition, either by a prior definition in the code, or by a compile-time definition passed on the compiler command line.

In C and C++, a typedef only introduces a synonym, not a new type, so neither scheme is superior from the point of view of catching type errors at compile time.

Data types: real floating-point

Fortran has real floating-point data types real and double precision, which correspond exactly to C/C++ float and double. Prior to the 1989 C Standard, float scalar arguments were always promoted to double by the compiler, but for Standard-conforming code with proper function prototypes, this is no longer the case, and float now receives no special handling compared to other scalar data types.

Data types: complex floating-point

Fortran complex is equivalent to an array, or C/C++ structure, of two floating-point values, the real part, followed by the imaginary part. In C/C++, you probably want to access it via a structure type declared like this:

typedef struct
{
    float re;
    float im;
} fortran_complex;

Objects of this type can be assigned, passed as function arguments, and returned as function values. Had you used a two-element array instead, you would have lost assignment and function return of these objects.
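
The following sketch illustrates those three properties with a function that takes two such structures by value and returns a third. The function name complex_multiply is an invention for this example.

```c
#include <assert.h>

typedef struct
{
    float re;
    float im;
} fortran_complex;

/* Product of two complex values, passed and returned by value. */
fortran_complex complex_multiply(fortran_complex a, fortran_complex b)
{
    fortran_complex c;

    /* The structure must contain no padding if it is to overlay a
       Fortran complex, which is two adjacent reals. */
    assert(sizeof(fortran_complex) == 2 * sizeof(float));
    c.re = a.re * b.re - a.im * b.im;
    c.im = a.re * b.im + a.im * b.re;
    return c;
}
```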

Prior to Fortran 90, the language stupidly lacked a double precision complex data type, but almost all compiler vendors provided it. Most followed IBM in allowing it to be declared as complex*16, and most also permitted it to be called double complex. You can best represent this in C/C++ as

typedef struct
{
    double re;
    double im;
} fortran_double_complex;

The NAG tutorial notes that several C/C++ compilers are unable to handle Fortran functions returning double complex values. If you have such functions, you should provide a Fortran subroutine wrapper for them that provides the function result in an argument.

Data types: integer

Fortran has only one integer data type, integer. On all current UNIX architectures, this corresponds to the C/C++ data type int. However, on some personal computer operating systems based on the Intel x86 architecture, it may correspond to C/C++ data type long. Further confusing the matter is that some Fortran compilers on those systems may map Fortran integer to a C/C++ int of size 16 bits. The only way to tell for sure on such systems is to read your Fortran and C/C++ compiler documentation carefully, or resort to compilation experiments with small test programs.

Data types: logical

The Fortran logical data type is a definite barrier to interlanguage calling.

The reason is that the Fortran language requires that data of types integer, logical, and real occupy one storage `location' (in 1956, when Fortran was first defined, all computers were word-addressed; byte addressing did not appear until IBM's System/360 in 1964). Data of types complex and double precision each occupy two successive storage locations, exactly twice as much as the other three types.

Since a logical value holds only two distinct values, .true. and .false., a single bit of storage is sufficient, yet the Fortran Standards mandate that such a value occupies an entire word of storage. So, which bit in that word should be used? Some compilers use the sign bit, others use the least-significant bit (corresponding to odd/even), and still others use zero/nonzero. Because C and C++ also use zero/nonzero, you can expect Fortran compilers on UNIX systems to uniformly follow that practice. Even then, there are differences: GNU, HP, IBM, NAG, SGI, and Sun Fortran 77, 90, 95, and HPF compilers use 1 for .true., and 0 for .false., while Compaq/DEC and PGI compilers use -1 for .true. and 0 for .false.. Thus, on Compaq/DEC OSF/1 (now called Tru64) and GNU/Linux systems, both forms are found, depending on which compiler you use.
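
One way to isolate C/C++ code from these differences is sketched below. It assumes the zero/nonzero convention described above and a logical that occupies the same storage as a C int; the names fortran_logical, FORTRAN_TRUE, FORTRAN_FALSE, to_fortran_logical, and from_fortran_logical are all invented for this illustration.

```c
#include <assert.h>

typedef int fortran_logical;    /* assumed same size as Fortran integer */

/* Value stored for .true.; compile with -DFORTRAN_TRUE=-1 for a
   Fortran compiler that uses -1 instead of 1. */
#if !defined(FORTRAN_TRUE)
#define FORTRAN_TRUE 1
#endif
#define FORTRAN_FALSE 0

/* C truth value to Fortran logical. */
fortran_logical to_fortran_logical(int flag)
{
    /* Guard against an override that makes the two values identical. */
    assert(FORTRAN_TRUE != FORTRAN_FALSE);
    return flag ? FORTRAN_TRUE : FORTRAN_FALSE;
}

/* Fortran logical to C truth value: test zero/nonzero rather than
   comparing with FORTRAN_TRUE, so that both 1 and -1 are accepted. */
int from_fortran_logical(fortran_logical value)
{
    return value != 0;
}
```

Testing for nonzero on the way back means that only values passed to Fortran depend on the compiler-specific choice of .true..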

Data types: character

Fortran 77 (and later) character data pose the biggest barrier to interlanguage calling, because they are handled so differently by various compilers. The original AT&T UNIX f77 compiler had to deal with legacy Fortran code containing Hollerith data. In order to make call foo(5Hhello) work exactly like call foo('hello'), it passed Fortran character data by the address of the first byte.

Unfortunately, the Fortran 77 Standard made character data unlike all other Fortran data types, in that it magically carries around its length.

Feldman's compiler handled this by passing additional arguments at the end of the argument list, one for each character string, passing them by value. Thus, Fortran call bar('one', 'two', 'three') would be handled in C and C++ by void bar_(char *a, char *b, char* c, int lena, int lenb, int lenc). This was a perfectly sensible solution, in that it handled both Hollerith and character data, and communicated the needed string lengths between the Fortran and C/C++ routines.

Unfortunately, IBM's AIX/370 mainframe compilers did not follow this sensible practice. Instead, they interspersed the length arguments with the normal arguments, following each character argument, and passing the address, rather than the value, so the C/C++ routine in the previous paragraph must be rewritten as void bar_(char *a, int *lena, char *b, int *lenb, char* c, int *lenc). Hewlett-Packard also did this up to HP-UX version 8, but with version 9 and later (10.20 is current), changed to the AT&T style for character arguments. IBM's RS/6000 AIX C and C++ compilers also use the AT&T style for character arguments.

Still other Fortran compilers have used a different scheme. For each character argument, they pass a pointer to a structure that contains a pointer to the string, and a maximum length. The details of this scheme vary between compilers, so once again, you must consult your Fortran compiler documentation for details.

Finally, it should be remembered that Fortran character data are of fixed length: they are blank padded on the right when assigned a shorter value, and silently truncated on the right when assigned a longer value. C/C++ char strings are of varying length, up to some compile-time or run-time maximum; a trailing NUL character ('\0') terminates the string. Thus, C/C++ char* strings always contain at least one more character than their length (as returned by strlen()). C/C++ strings include the empty string, "", but Fortran does not allow one. This is somewhat akin to defining an integer arithmetic system without a zero! Fortran programmers are therefore forced to simulate empty strings by blank ones.

The best way to handle passing character strings to Fortran from C/C++ is to define a new Fortran string data type in C/C++, and create a set of primitives to handle the blank padding.
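
A minimal sketch of two such primitives follows. The function names c_to_fortran_string and fortran_to_c_string are invented for this example, and the code deals only with the padding rules, not with how the length is communicated to the Fortran routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the C string src into the fixed-length Fortran character
   variable dst of length dstlen, following Fortran assignment rules:
   blank-pad on the right if src is shorter, truncate on the right if
   it is longer.  No NUL terminator is stored. */
void c_to_fortran_string(char *dst, size_t dstlen, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n > dstlen)
        n = dstlen;
    memcpy(dst, src, n);
    memset(dst + n, ' ', dstlen - n);
}

/* Copy the Fortran character variable src of length srclen into the
   C buffer dst of size dstsize, dropping trailing blanks and adding
   the terminating NUL.  Returns the length of the resulting C string. */
size_t fortran_to_c_string(char *dst, size_t dstsize,
                           const char *src, size_t srclen)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    while (srclen > 0 && src[srclen - 1] == ' ')
        --srclen;
    if (srclen > dstsize - 1)
        srclen = dstsize - 1;
    memcpy(dst, src, srclen);
    dst[srclen] = '\0';
    return srclen;
}
```

With a compiler that follows the AT&T convention, dstlen is the value that would be supplied as the extra length argument at the end of the argument list. A blank Fortran string comes back as the empty C string.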

Data types: nonstandard

Nonstandard data types with a byte-length modifier frequently appear in carelessly-written Fortran code: avoid them like the plague. Write the Standard double precision instead of real*8. If you unavoidably have integer*1 or byte, these may map to C/C++ signed char and unsigned char. Fortran integer*2 may map to C/C++ short int. Fortran integer*8 may map to C/C++ long long int.

In the other direction, there are no Fortran equivalents of the unsigned types of C and C++, or of integer bitfields inside struct and union.

I/O: language view

Fortran views files as data streams containing Fortran records, where a record is either a text line for formatted files, or whatever is written by a single write statement for binary unformatted files. The record is an identifiable object, so that a read statement with an empty I/O list will skip one record, and a backspace statement can successfully move backwards over it. The analogy with magnetic tapes, card readers, and line printers is very strong.

C and C++ view files as data streams containing bytes, the smallest amount of storage capable of holding one item of type char. All I/O is done at the byte level, although for text files, higher-level primitives can give the illusion of block- and line-structuring. This model is notably more powerful than Fortran's, because it imposes no structure on files. In Standard Fortran, you simply cannot write an arbitrary stream of bytes to a file: there will always be additional material surrounding what you wrote that is compiler-dependent, beyond your control, and invisible to you in Fortran.

I/O: file handles

Fortran, C, and C++ all refer to files through a small object, called a file handle, or file descriptor, or in Fortran terminology, unit number. In Fortran, that value is a small integer that must be set by the user. Consequently, its choice can lead to a loss of portability if integer handle values acceptable on one system are found to be out-of-range on another.

The architect of C therefore chose to have the handle returned by the open-file system call. Later, it was found convenient to store more information about the file in a FILE object, invariably defined as a C struct, one of whose elements is the integer file handle. That form was adopted in Standard C, and the older one was not. However, UNIX systems at least, still have low-level system calls that require the integer file handle, so a macro, or function, fileno(), is provided to extract it from the FILE structure.

Since the Fortran library is implemented in C or C++ on UNIX systems, there has to be a correspondence, somewhere, between a Fortran unit number and a C/C++ integer file handle. Unfortunately, there is no consistent way to find this across platforms. For example, Compaq/DEC and Sun provide getfd() to map a Fortran unit number to the file handle, but HP and IBM hide the relation entirely.

These considerations strongly suggest that you should restrict I/O activity to a single language.

I/O: who manages I/O buffers

In general, the runtime library for the language in which a file was first opened has control over its I/O buffers, and maintains additional state information about the file. You should not reference the file in the other language.

However, in UNIX and IBM PC DOS, all processes start with at least three standard files already open and ready to use. In UNIX, these are called stdin, stdout, and stderr, and their respective file handles are guaranteed to be 0, 1, and 2. It is quite possible that you will need to refer to these standard files from both languages, even though it is always best to restrict their use to just one language. To avoid confusion from I/O buffering in each runtime library, it is best to force those buffers to be emptied before beginning I/O in the other language. C and C++ always have fflush() available, but only some Fortran vendors provide a flush() routine, so mixed-language use of the standard files can never be made fully portable.

I/O: formatted (text) files

Fortran views formatted (text) files as a series of records, each of which corresponds to a single line. C and C++ view them as byte streams. While all three languages can produce such files, you may have trouble communicating those files between Fortran and C/C++ programs, for at least these reasons:

Fortran will asterisk-fill formatted fields that are too small, while C/C++ will expand them to fit the data; in either direction, there will be confusion and input errors.
Fortran floating-point output may contain D exponents or omitted exponent letters, and neither of these is recognized by C/C++ input routines. You can avoid this second problem by using E-style format items with explicit exponent lengths: that is, use e25.15e3 instead of d25.15. On Cray systems, which have a wider exponent range, you should increase the exponent width from 3 to 4.
Fortran list-directed and namelist I/O have no counterparts in C/C++, and files with such contents will be difficult to impossible to deal with easily in C/C++ programs.
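
When you cannot change the Fortran program that wrote the file, the D-exponent problem can be worked around on the C/C++ side. The sketch below, with the invented function name fortran_field_to_double, replaces a D or d exponent letter by E before handing the field to the Standard C strtod() function. It does not attempt to handle fields in which the exponent letter has been omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Convert a Fortran-formatted floating-point field to double.
   A D or d exponent letter is replaced by E before calling strtod().
   Fields with the exponent letter omitted (such as 1.5+300) are not
   handled here. */
double fortran_field_to_double(const char *field)
{
    char buf[64];
    char *p;
    size_t n;

    assert(field != NULL);
    n = strlen(field);
    if (n >= sizeof(buf))
        n = sizeof(buf) - 1;        /* overlong fields are truncated */
    memcpy(buf, field, n);
    buf[n] = '\0';
    for (p = buf; *p != '\0'; ++p)
        if (*p == 'D' || *p == 'd')
            *p = 'E';
    return strtod(buf, NULL);
}
```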

I/O: unformatted (binary) files

Because of the record structure discussed earlier, Fortran unformatted (binary) files must contain additional material prefixing and suffixing the record. This material is compiler-specific, and you cannot even expect to read binary files on the same system when two different Fortran compilers have been used for the reading and writing programs.

It will be very difficult to deal with such files in C/C++ programs, and you are likely to have difficulty in even finding vendor documentation of what unformatted Fortran files look like.
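
Purely as an illustration of what such record framing can look like, the sketch below assumes one particular layout: a 4-byte record length, the data, and the same 4-byte length repeated, with the lengths in the native byte order. That layout is an assumption made for this example, not something guaranteed by any Standard or by your compiler, and the function name fortran_record is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMED layout (check your compiler documentation):
       [4-byte length n] [n bytes of data] [4-byte length n]
   with both lengths stored as native-endian 32-bit integers. */

/* Locate the data of the record that starts at buf[0].  On success,
   store the data address and size, and return the total number of
   bytes occupied by the record.  Return 0 if the buffer is too short
   or the two length markers disagree. */
size_t fortran_record(const unsigned char *buf, size_t buflen,
                      const unsigned char **data, size_t *datalen)
{
    unsigned int head, tail;

    assert(buf != NULL && data != NULL && datalen != NULL);
    assert(sizeof(unsigned int) == 4);  /* assumption of this sketch */
    if (buflen < 8)
        return 0;
    memcpy(&head, buf, 4);
    if (head > buflen - 8)
        return 0;
    memcpy(&tail, buf + 4 + head, 4);
    if (tail != head)
        return 0;
    *data = buf + 4;
    *datalen = head;
    return (size_t)head + 8;
}
```

The check that the two markers agree is a cheap way to detect that the assumed layout does not match the file at hand.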

Exception handling

When runtime exceptions, notably floating-point ones, occur, which language handles them? In general, they are handled by the language in which the main program was written. You can confuse this issue, however, by using system-specific calls to supply your own error handlers.

Historically, most Fortran runtime libraries provided fixups for numerical exceptions, flushing underflows to zero, and setting overflows to the largest floating-point number, or Infinity if supported (as in IEEE 754 arithmetic). The practice in C and C++ implementations has been to call an error handler which prints a message and terminates. You may have to compile with special options, or call nonstandard library routines, to control this behavior.

Argument passing conventions

Fortran passes all arguments by reference (by address), while C and C++ pass scalars and structures by value, and arrays by reference. Thus, scalar arguments to Fortran routines will require an ampersand prefix to pass their address instead of their value.

Fortran character data require special treatment, as discussed in an earlier section.
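
The ampersand rule for non-character arguments can be sketched as follows. The Fortran subroutine vscale shown in the comment is hypothetical, its external name is assumed to carry a trailing underscore, and a C stand-in is supplied so that the example can be compiled without a Fortran compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Fortran source assumed for this sketch:
         subroutine vscale(n, alpha, x)
         integer n
         double precision alpha, x(n)
   Every argument arrives as an address. */
void vscale_(int *n, double *alpha, double *x);

/* C stand-in for the Fortran routine; in a real program this body
   would come from the compiled Fortran code. */
void vscale_(int *n, double *alpha, double *x)
{
    int k;

    assert(n != NULL && alpha != NULL && x != NULL);
    for (k = 0; k < *n; ++k)
        x[k] *= *alpha;
}

/* C-friendly wrapper: scalars come in by value and go out by address. */
void scale_vector(int n, double alpha, double *x)
{
    vscale_(&n, &alpha, x);     /* the array x is already an address */
}
```

Wrapping each Fortran routine once in this way keeps the ampersands, and any name mapping, out of the rest of the C/C++ code.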

Historically, compiler writers have used at least four different mechanisms for argument passing:

On the IBM System/360 mainframes introduced in 1964, stack instructions were absent, and compilers generally constructed a vector of addresses, with the high-order bit set in the last address to mark the end of the list. Since addresses were limited to 24 bits, and words were 32 bits, the high-order byte of each word holding an address was `wasted', and software architects therefore made use of it, for type flags, and end-of-list markers.

In 1981, when the S/360 architecture was extended to support larger address spaces, and renamed S/370-XA (for eXtended Architecture), that flag bit was so ingrained in existing software that the IBM architects could only extend addressing to 31 bits. [There was also an important loop instruction that assumed signed arithmetic on addresses, again limiting them to 31 bits.]

It was not until 1988 that the Enterprise Systems Architecture, ESA/370, got around that problem, extending addressing to 44 bits, but even then, that argument list flag bit still interferes, and extended addressing requires a complicated remapping of 2GB (31-bit address) memory segments with hidden base registers.

You can read more about this topic in a separate (lengthy) document, The Impact of Memory and Architecture on Computer Performance.
The DEC VAX architecture also used an argument vector, but stored a separate argument count, thereby preserving all 32 bits of each word for addressing. Importantly, the VAX was the first to specify an architecture-defined calling sequence for all languages, making it relatively easy to mix languages on VAX (Open)VMS and VAX UNIX.
On stack architectures, on which all current personal computer and UNIX workstation systems run, argument lists are created on the stack by pushing one argument after another. Some compilers push from first to last, and others, last to first.

On all UNIX systems, the argument order is first to last, and interlanguage calling is relatively feasible.

On personal computer operating systems on the Intel x86 architecture, however, each compiler and assembler is free to choose its own argument-passing scheme, with the result that it is usually impossible to mix object code compiled with different compilers, even for the same language!
As the performance gap between memory and registers grew with computer systems in the 1990s, it became desirable to avoid storing arguments in memory, because that typically is 30 to 50 times slower than registers on machines of the late 1990s. UNIX vendors on RISC systems therefore began to store arguments in a specified set of registers, spilling to memory for long argument lists. Fortunately, this was done the same way for all languages, so interlanguage calling remained possible. However, it did complicate programs that have variable numbers of arguments, like C's printf() and scanf() family. The 1989 C Standard addressed this by introducing the <stdarg.h> header file, with va_start(), va_arg(), and va_end() access macros to hide the nasty details of when arguments move from registers to a vector in memory. Since Fortran has never standardly supported routines with a variable number of arguments, this aspect should rarely be of concern in the C/C++-to-Fortran interface.
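
For readers who have not met the <stdarg.h> macros, here is a minimal sketch of their use in a C function with a variable number of arguments. The function sum_doubles is invented for this example; the caller must pass exactly count arguments of type double after the count.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum count values of type double passed as variable arguments. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int k;

    assert(count >= 0);
    va_start(ap, count);            /* begin after the last named argument */
    for (k = 0; k < count; ++k)
        total += va_arg(ap, double);    /* fetch the next argument */
    va_end(ap);
    return total;
}
```

The function body never mentions registers or stack locations: the macros conceal where each argument actually lives.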

Recursion

Like virtually all modern languages defined after the mid 1960s, C and C++ fully support recursion. Fortran 77 does not. Fortran 90 and 95 permit it only if the functions and subroutines involved are declared with an initial recursive option. This is appallingly-bad language design, and you are advised to avoid recursive use of Fortran code, unless you know that you will always have compilers for Fortran 90 or later available to compile your code, and you make careful use of the recursive option.

