Previous: grinfo Up: ../plot79.html Next: info
PROPOSAL FOR A STANDARD SET OF PRIMITIVES FOR MACHINE-INDEPENDENT BIT MANIPULATION IN FORTRAN by Nelson H.F. Beebe Departments of Physics and Chemistry University of Utah Salt Lake City, UT 84112 Tel: (801) 581-5254 INTRODUCTION ============ The 1979 NRCC Conference on Software Standards in Chemistry held at the University of Utah resulted in considerable discussion about standard ways of implementing bit manipulation in FORTRAN programs. A previous proposal by the author described in the "Programmer's Guide to Portable Software" [BEEB79a] met with the criticism that the routine names began with the letters BIT, even though some were INTEGER functions. Many programmers apparently prefer to use FORTRAN's default typing conventions based on the initial letter of variable names, even if this reduces mnemonic significance and does not provide for other than REAL and INTEGER data types, unless an IMPLICIT statement is supported by the host compiler. General agreement could apparently be reached if the routine name prefix was changed to IBT, but it should be noted that this still requires variable name typing for those routines which are LOGICAL functions. Several participants recalled that the Instrument Society of America (ISA), which establishes industrial standards of various sorts, and acts as an advisory body to the American National Standards Institute (ANSI), perhaps had established standard bit primitives for FORTRAN programming. This is indeed the case, and has been published as ISA Standard S61.1 "Industrial Computer System FORTRAN Procedures for Executive Functions, Process Input/Output, and Bit Manipulation". I feel strongly that ISA S61.1 is not an acceptable standard for the needs of chemistry-related programming. There are several objections which may be noted. First, names chosen for the bit primitives (IOR, IAND, NOT, IEOR, ISHFT, BTEST, IBSET, and IBCLR) and for obtaining the date and time of day (DATE and TIME) are very likely to conflict with names already implemented by various manufacturers in their extensions to the FORTRAN language. This is not a trivial problem, because some compilers expand intrinsic functions in-line, and overriding them by user-provided routines may not be possible, or may require the use of a special compiler option, or additional declaration statements in the routine which refers to them. The order and interpretation of arguments may differ from machine to machine, and could cause unexpected results which might be difficult to trace. Second, there is no consistent naming convention for the routines. The use of a mnemonic prefix such as BIT or IBT has proved invaluable in large program systems in that it allows groups of related routines to be readily identified, and also distinguishes the names from local names. Third, the bits are numbered from right to left starting with zero. There is no agreement among computer manufacturers about whether numbering should be 0123..., 123..., ...321, or ...3210, and the precedence of FORTRAN array subscripts following the order 123... strongly suggests that a left-to-right 123... numbering convention should be adhered to. This is particularly important for the routines which test and set bits, for which an extension to bit arrays contained in more than one word of storage is rather useful. Fourth, ISA S61.1 does not prescribe a complete interpretation of the arguments. Nothing is said about what the primitive will do when presented with an out-of-range bit number. This is an unsatisfactory situation in a standard, and is bound to cause portability problems. An alternative set of standard bit manipulation routines is presented in this proposal, and, I believe, remedies all of the above criticisms. Because the internal format of FORTRAN data types, and also of characters, differs from machine to machine, and indeed, even from compiler to compiler at a single installation, it should be clear that the use of bit primitives to access data immediately involves assumptions which will defeat the goal of portability of chemical software. It is nevertheless possible to implement certain types of bit manipulation in a machine-independent way, as is illustrated, for example, in the report "Integral File Data Compression" [BEEB79b]. In particular, those applications which require only bit strings can be implemented completely portably, once the number of bits in an integer storage unit is available. I propose for this purpose that the PORT Library Framework [FOX78a, FOX78b] be adopted as a standard facility for accessing machine- and operating-system parameters. This software is in the public domain, and at present, supports various operating systems for computers manufactured by: * Burroughs (1700/5700/6700/7700 series) * CDC(6000/7000/Cyber series) * Cray 1 * Data General (Eclipse S/200) * DEC (10/20/PDP-11/VAX) * Harris (Slash 6 and Slash 7) * Honeywell (600/6000 series) * IBM (360/370 series) * Interdata (8/32) * SEL (System 85/86) * UNIVAC (1100 series) * Xerox (Sigma 5/7/9 series) Parameters for others are very straightforward to add. The PORT functions I1MACH(5) and I1MACH(9), returning respectively the number of bits in an integer storage unit, and the largest positive integer, are sufficient to allow the data compression routines in [BEEB79b] to convert quantum chemistry integral and matrix element files containing REAL, DOUBLE PRECISION, and INTEGER data into compressed bit strings in which insignificant leading and trailing bits have been removed, offering a data reduction by up to a factor of five or six. Other applications which can be supported by these two PORT functions together with the bit primitives include: * Manipulation of bit representations of Slater determinants and excitation operators in configuration interaction, perturbation theory, coupled cluster, and propagator calculations, including conversion to and from human-readable integer representations. * Bit mapping used to represent patterns of non-zero elements in sparse matrices. * Patterns of dots in a plot to be output on a dot-matrix plotter. * Compressed LOGICAL arrays using only one bit per element, instead of an entire word. * Implementation of the standard set of character primitives on a given machine. * Masking operations. * Construction of byte strings for control of devices such as plotters. Other functions in the PORT Library Framework which provide such parameters as the machine precision, the base of integer and floating-point numbers, and floating-point exponents may largely obviate the need for bit manipulations which require knowledge of the internal format of such data.
THE BIT PRIMITIVES ================== A description of the complete set of bit primitives which are proposed here follows. An experienced assembly language programmer should be able to implement them in an afternoon's work on any existing computer presently used for chemical computations. An agency such as the Quantum Chemistry Program Exchange or the NRCC could act as a source of implementations of these routines for a variety of host computers and operating systems, thereby increasing their availability and hopefully encouraging their widespread adoption. It has proven useful at our computing installation to install these in the system FORTRAN library, so that they are automatically available to all FORTRAN programmers. If they are not made convenient to use, many less-motivated programmers will not take the trouble to use them, and the present undesirable situation of machine-dependencies permeating chemical software will only continue. In all of the bit primitive routines, K, L, and NBIT represent INTEGER variables; STRING(*) is an INTEGER array. Many compilers will generate correct code if STRING is an INTEGER scalar variable, but this use does not conform to Portable FORTRAN. Portability dictates that the arguments to these functions should be restricted to INTEGER data types; use with arguments of REAL, LOGICAL, COMPLEX, DOUBLE PRECISION, or Hollerith (character) data types may introduce machine-dependence, and should be strongly discouraged. It is considered good programming practice to treat FUNCTION arguments as read-only values, in order to avoid side effects. This convention is adhered to in the definition of all of the FUNCTION primitives. In those higher-level primitives which deal with variable-length bit strings, rather than single-word bit patterns, the strings are defined in terms of three variables. These are the name of the INTEGER array containing the string, a starting position (numbering 1,2,3,... from the left), and the number of bits to be considered. Thus, an argument sequence STRING,LOC,LEN represents bits LOC, LOC+1, LOC+2, ..., LOC+LEN-1 stored in the array STRING(*). It is an error condition if LOC or LEN is less than 1, and the action to be taken in such a case will be expressly defined for each primitive. In some cases, two strings of the same length are present in the argument list, and the length parameter of the first will then be omitted.
INTEGER FUNCTION IBTAND (K,L) INTEGER FUNCTION IBTCMP (STRNGA,LOCA,STRNGB,LOCB,LENGTH) INTEGER FUNCTION IBTCOM (K) LOGICAL FUNCTION IBTEST (STRING,NBIT) SUBROUTINE IBTGET (BYTE,STRING,LOC,LENGTH) SUBROUTINE IBTMOV (TARGET,LOCTAR,SOURCE,LOCSRC,LENGTH) SUBROUTINE IBTOFF (STRING,NBIT) SUBROUTINE IBTON (STRING,NBIT) INTEGER FUNCTION IBTOR (K,L) SUBROUTINE IBTPUT (BYTE,STRING,LOC,LENGTH) INTEGER FUNCTION IBTROL (K,NBIT) INTEGER FUNCTION IBTROR (K,NBIT) INTEGER FUNCTION IBTROT (K,NBIT) INTEGER FUNCTION IBTSHF (K,NBIT) INTEGER FUNCTION IBTSHL (K,NBIT) INTEGER FUNCTION IBTSHR (K,NBIT) INTEGER FUNCTION IBTSUM (K) SUBROUTINE IBTSWP (STRNGA,LOCA,STRNGB,LOCB,LENGTH) INTEGER FUNCTION IBTXOR (K,L)
INTEGER FUNCTION IBTAND (K,L) Return the logical AND of K and L. This has 1-bits where both K and L have 1-bits, and 0-bits elsewhere.
INTEGER FUNCTION IBTCMP (STRNGA,LOCA,STRNGB,LOCB,LENGTH) Given two bit strings define by STRNGA,LOCA,LENGTH and STRNGB,LOCB,LENGTH, compare LENGTH bits of the two substrings, treating the substrings as UNSIGNED binary integers. Return -1, 0, or +1 for the conditions A < B, A = B, or A > B, respectively. If any of LOCA, LOCB, or LENGTH is less than 1, return 0.
INTEGER FUNCTION IBTCOM (K) Return the logical complement of the bit pattern stored in K. The complement is formed by inverting all bits.
LOGICAL FUNCTION IBTEST (STRING,NBIT) Test bit number NBIT in STRING(*), and return .TRUE. if it is a 1-bit, and .FALSE. if it is a 0-bit. If NBIT is less than 1, return .FALSE..
SUBROUTINE IBTGET (BYTE,STRING,LOC,LENGTH) Return in the INTEGER variable BYTE a right-adjusted bit string containing LENGTH bits extracted starting at position LOC in the STRING(*). Any leading bits in BYTE are set to zero. If LENGTH is larger than the word size, only the first I1MACH(5) bits of the selected substring will be returned in BYTE. If either LENGTH or LOC is less than 1, 0 will be returned in BYTE.
SUBROUTINE IBTMOV (TARGET,LOCTAR,SOURCE,LOCSRC,LENGTH) Given two bit strings defined by TARGET,LOCTAR,LENGTH and SOURCE,LOCSRC,LENGTH, move LENGTH bits from SOURCE into TARGET. If any of LOCTAR, LOCSRC, or LENGTH is less than 1, no bits are moved. Bits must be moved in order from left to right, equivalent to one bit at a time, in order to provide consistent behavior in case TARGET and SOURCE overlap.
SUBROUTINE IBTOFF (STRING,NBIT) Set bit number NBIT in STRING(*) to 0. If NBIT < 1, no bit will be set.
SUBROUTINE IBTON (STRING,NBIT) Set bit number NBIT in STRING(*) to 1. If NBIT < 1, no bit will be set.
INTEGER FUNCTION IBTOR (K,L) Return the logical OR of the bit patterns present in K and L. 1-bits are returned in positions in which either K or L, or both, have a 1-bit.
SUBROUTINE IBTPUT (BYTE,STRING,LOC,LENGTH) Store the right-most LENGTH bits of BYTE in the STRING(*), starting at bit position LOC. If either LENGTH or LOC is less than 1, no bits will be stored. If LENGTH is larger than the wordsize, only the first I1MACH(5) bits of BYTE will be stored at the designated position.
INTEGER FUNCTION IBTROL (K,NBIT) Return the bit pattern represented by K after rotating it left by |NBIT| bits. If NBIT is negative, ignore its sign.
INTEGER FUNCTION IBTROR (K,NBIT) Return the bit pattern represented by K after rotating it right by |NBIT| bits. If NBIT is negative, ignore its sign.
INTEGER FUNCTION IBTROT (K,NBIT) Return the bit pattern represented by K after rotating left (NBIT > 0) or right (NBIT < 0) by NBIT bits.
INTEGER FUNCTION IBTSHF (K,NBIT) Return the bit pattern represented by K after performing a logical shift left (NBIT > 0) or right (NBIT < 0) by NBIT bits. Bit positions vacated are filled by 0-bits. 1-bits shifted out of the word are lost.
INTEGER FUNCTION IBTSHL (K,NBIT) Return the bit pattern represented by K after performing a logical shift by |NBIT| bits to the left.
INTEGER FUNCTION IBTSHR (K,NBIT) Return the bit pattern represented by K after performing a logical shift by |NBIT| bits to the right.
INTEGER FUNCTION IBTSUM (K) Return the number of 1-bits in the bit pattern represented by K.
SUBROUTINE IBTSWP (STRNGA,LOCA,STRNGB,LOCB,LENGTH) Given two bit strings defined by STRNGA,LOCA,LENGTH and STRNGB,LOCB,LENGTH, swap the bit strings in memory. If any of LOCA, LOCB, or LENGTH is less than 1, no swap is performed.
INTEGER FUNCTION IBTXOR (K,L) Return the exclusive OR of the bit patterns represented by K and L. The result has 1-bits in positions where K and L have different bits, and 0-bits where K and L have identical bits.
CONCLUDING REMARKS ================== It may be of some cause for concern that the routines containing STRING(*) as an argument do not also contain its dimension, in order that an out-of-bounds storage reference can be avoided in the event of NBIT being too large. We have elected NOT to include this extra argument, because the same criticism may be raised for any FORTRAN array reference for which the compiler does not perform subscript range checking. Even when this feature is available, it is usually suppressed in production programs for reasons of run-time efficiency. I believe that the above routines provide a satisfactory set of bit primitives whose performance, apart from word-length variations, is the same on all machines, for both valid and invalid arguments. Arithmetic shift routines have been intentionally excluded, because their behaviour is inherently connected with the host number representation, and their use would then unnecessarily compromise portability. Careful programming utilizing I1MACH(5) to provide the number of bits in an INTEGER storage unit can eliminate all word-length dependencies from the executable code, and even from most of the dimensional limitations in the case of the array STRING(*) if its storage is suitably allocated at the top level of a program system. Just as it is becoming more widely understood that fixed dimensions for potential variable-sized arrays of standard data types should be restricted to the MAIN program, so should the dimension of a bit string disguised as an INTEGER array, since the required amount of storage is certain to vary from one computer to another. Applications will no doubt arise in which it will be convenient to permit Boolean operations to be extended to variable-length bit strings beginning at arbitrary offsets in data words. This presents sufficient extra complications, that a set of higher-level bit routines based on the basic primitives described above would be desirable. Development of such a collection will be left for later work.
REFERENCES ========== BEEB79a Beebe, N.H.F., "Programmer's Guide to Portable Software", Proceedings of the NRCC Conference of Software Standards in Chemistry (1979). BEEB79b Beebe, N.H.F., "Integral File Data Compression", Proceedings of the NRCC Conference of Software Standards in Chemistry (1979). FOX78a Fox, P.A., Hall, A.D., and Schryer, N.L., "The PORT Mathematical Subroutine Library", ACM Transactions on Mathematical Software 4, 109-126 (1978). FOX78b Fox, P.A., Hall, A.D., and Schryer, N.L., "Algorithm 528. Framework for a Portable Library", ACM Transactions on Mathematical Software 4, 177-188 (1978). ISA76 "Standard S61.1. Industrial Computer System FORTRAN Procedures for Executive Functions, Process Input/Output, and Bit Manipulation", Instrument Society of America, 400 Stanwix Street, Pittsburgh, PA 15222, February (1976).