Previous: grinfo Up: ../plot79.html Next: info
PROPOSAL FOR A STANDARD SET OF PRIMITIVES
FOR MACHINE-INDEPENDENT
BIT MANIPULATION IN FORTRAN
by
Nelson H.F. Beebe
Departments of Physics and Chemistry
University of Utah
Salt Lake City, UT 84112
Tel: (801) 581-5254
INTRODUCTION
============
The 1979 NRCC Conference on Software Standards in
Chemistry held at the University of Utah resulted in
considerable discussion about standard ways of implementing
bit manipulation in FORTRAN programs. A previous proposal
by the author described in the "Programmer's Guide to
Portable Software" [BEEB79a] met with the criticism that the
routine names began with the letters BIT, even though some
were INTEGER functions. Many programmers apparently prefer
to use FORTRAN's default typing conventions based on the
initial letter of variable names, even if this reduces
mnemonic significance and does not provide for other than
REAL and INTEGER data types, unless an IMPLICIT statement is
supported by the host compiler. General agreement could
apparently be reached if the routine name prefix was changed
to IBT, but it should be noted that this still requires
variable name typing for those routines which are LOGICAL
functions.
Several participants recalled that the Instrument
Society of America (ISA), which establishes industrial
standards of various sorts, and acts as an advisory body to
the American National Standards Institute (ANSI), perhaps
had established standard bit primitives for FORTRAN
programming. This is indeed the case, and has been
published as ISA Standard S61.1 "Industrial Computer System
FORTRAN Procedures for Executive Functions, Process
Input/Output, and Bit Manipulation".
I feel strongly that ISA S61.1 is not an acceptable
standard for the needs of chemistry-related programming.
There are several objections which may be noted.
First, names chosen for the bit primitives (IOR,
IAND, NOT, IEOR, ISHFT, BTEST, IBSET, and IBCLR) and for
obtaining the date and time of day (DATE and TIME) are very
likely to conflict with names already implemented by various
manufacturers in their extensions to the FORTRAN language.
This is not a trivial problem, because some compilers expand
intrinsic functions in-line, and overriding them by
user-provided routines may not be possible, or may require
the use of a special compiler option, or additional
declaration statements in the routine which refers to them.
The order and interpretation of arguments may differ from
machine to machine, and could cause unexpected results which
might be difficult to trace.
Second, there is no consistent naming convention
for the routines. The use of a mnemonic prefix such as BIT
or IBT has proved invaluable in large program systems in
that it allows groups of related routines to be readily
identified, and also distinguishes the names from local
names.
Third, the bits are numbered from right to left
starting with zero. There is no agreement among computer
manufacturers about whether numbering should be 0123...,
123..., ...321, or ...3210, and the precedence of FORTRAN
array subscripts following the order 123... strongly
suggests that a left-to-right 123... numbering convention
should be adhered to. This is particularly important for
the routines which test and set bits, for which an extension
to bit arrays contained in more than one word of storage is
rather useful.
Fourth, ISA S61.1 does not prescribe a complete
interpretation of the arguments. Nothing is said about what
the primitive will do when presented with an out-of-range
bit number. This is an unsatisfactory situation in a
standard, and is bound to cause portability problems.
An alternative set of standard bit manipulation
routines is presented in this proposal, and, I believe,
remedies all of the above criticisms.
Because the internal format of FORTRAN data types,
and also of characters, differs from machine to machine, and
indeed, even from compiler to compiler at a single
installation, it should be clear that the use of bit
primitives to access data immediately involves assumptions
which will defeat the goal of portability of chemical
software.
It is nevertheless possible to implement certain
types of bit manipulation in a machine-independent way, as
is illustrated, for example, in the report "Integral File
Data Compression" [BEEB79b]. In particular, those
applications which require only bit strings can be
implemented completely portably, once the number of bits in
an integer storage unit is available. I propose for this
purpose that the PORT Library Framework [FOX78a, FOX78b] be
adopted as a standard facility for accessing machine- and
operating-system parameters. This software is in the public
domain, and at present, supports various operating systems
for computers manufactured by:
* Burroughs (1700/5700/6700/7700 series)
* CDC(6000/7000/Cyber series)
* Cray 1
* Data General (Eclipse S/200)
* DEC (10/20/PDP-11/VAX)
* Harris (Slash 6 and Slash 7)
* Honeywell (600/6000 series)
* IBM (360/370 series)
* Interdata (8/32)
* SEL (System 85/86)
* UNIVAC (1100 series)
* Xerox (Sigma 5/7/9 series)
Parameters for others are very straightforward to add.
The PORT functions I1MACH(5) and I1MACH(9),
returning respectively the number of bits in an integer
storage unit, and the largest positive integer, are
sufficient to allow the data compression routines in
[BEEB79b] to convert quantum chemistry integral and matrix
element files containing REAL, DOUBLE PRECISION, and INTEGER
data into compressed bit strings in which insignificant
leading and trailing bits have been removed, offering a data
reduction by up to a factor of five or six. Other
applications which can be supported by these two PORT
functions together with the bit primitives include:
* Manipulation of bit representations of Slater
determinants and excitation operators in
configuration interaction, perturbation theory,
coupled cluster, and propagator calculations,
including conversion to and from human-readable
integer representations.
* Bit mapping used to represent patterns of non-zero
elements in sparse matrices.
* Patterns of dots in a plot to be output on a
dot-matrix plotter.
* Compressed LOGICAL arrays using only one bit per
element, instead of an entire word.
* Implementation of the standard set of character
primitives on a given machine.
* Masking operations.
* Construction of byte strings for control of devices
such as plotters.
Other functions in the PORT Library Framework which
provide such parameters as the machine precision, the base
of integer and floating-point numbers, and floating-point
exponents may largely obviate the need for bit manipulations
which require knowledge of the internal format of such data.
THE BIT PRIMITIVES
==================
A description of the complete set of bit primitives
which are proposed here follows. An experienced assembly
language programmer should be able to implement them in an
afternoon's work on any existing computer presently used for
chemical computations. An agency such as the Quantum
Chemistry Program Exchange or the NRCC could act as a source
of implementations of these routines for a variety of host
computers and operating systems, thereby increasing their
availability and hopefully encouraging their widespread
adoption. It has proven useful at our computing
installation to install these in the system FORTRAN library,
so that they are automatically available to all FORTRAN
programmers. If they are not made convenient to use, many
less-motivated programmers will not take the trouble to use
them, and the present undesirable situation of
machine-dependencies permeating chemical software will only
continue.
In all of the bit primitive routines, K, L, and
NBIT represent INTEGER variables; STRING(*) is an INTEGER
array. Many compilers will generate correct code if STRING
is an INTEGER scalar variable, but this use does not conform
to Portable FORTRAN. Portability dictates that the
arguments to these functions should be restricted to INTEGER
data types; use with arguments of REAL, LOGICAL, COMPLEX,
DOUBLE PRECISION, or Hollerith (character) data types may
introduce machine-dependence, and should be strongly
discouraged.
It is considered good programming practice to treat
FUNCTION arguments as read-only values, in order to avoid
side effects. This convention is adhered to in the
definition of all of the FUNCTION primitives.
In those higher-level primitives which deal with
variable-length bit strings, rather than single-word bit
patterns, the strings are defined in terms of three
variables. These are the name of the INTEGER array
containing the string, a starting position (numbering
1,2,3,... from the left), and the number of bits to be
considered. Thus, an argument sequence STRING,LOC,LEN
represents bits LOC, LOC+1, LOC+2, ..., LOC+LEN-1 stored in
the array STRING(*). It is an error condition if LOC or LEN
is less than 1, and the action to be taken in such a case
will be expressly defined for each primitive. In some
cases, two strings of the same length are present in the
argument list, and the length parameter of the first will
then be omitted.
INTEGER FUNCTION IBTAND (K,L)
INTEGER FUNCTION IBTCMP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
INTEGER FUNCTION IBTCOM (K)
LOGICAL FUNCTION IBTEST (STRING,NBIT)
SUBROUTINE IBTGET (BYTE,STRING,LOC,LENGTH)
SUBROUTINE IBTMOV (TARGET,LOCTAR,SOURCE,LOCSRC,LENGTH)
SUBROUTINE IBTOFF (STRING,NBIT)
SUBROUTINE IBTON (STRING,NBIT)
INTEGER FUNCTION IBTOR (K,L)
SUBROUTINE IBTPUT (BYTE,STRING,LOC,LENGTH)
INTEGER FUNCTION IBTROL (K,NBIT)
INTEGER FUNCTION IBTROR (K,NBIT)
INTEGER FUNCTION IBTROT (K,NBIT)
INTEGER FUNCTION IBTSHF (K,NBIT)
INTEGER FUNCTION IBTSHL (K,NBIT)
INTEGER FUNCTION IBTSHR (K,NBIT)
INTEGER FUNCTION IBTSUM (K)
SUBROUTINE IBTSWP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
INTEGER FUNCTION IBTXOR (K,L)
INTEGER FUNCTION IBTAND (K,L)
Return the logical AND of K and L. This has 1-bits
where both K and L have 1-bits, and 0-bits elsewhere.
INTEGER FUNCTION IBTCMP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
Given two bit strings define by STRNGA,LOCA,LENGTH
and STRNGB,LOCB,LENGTH, compare LENGTH bits of the two
substrings, treating the substrings as UNSIGNED binary
integers. Return -1, 0, or +1 for the conditions A < B, A =
B, or A > B, respectively. If any of LOCA, LOCB, or LENGTH
is less than 1, return 0.
INTEGER FUNCTION IBTCOM (K)
Return the logical complement of the bit pattern
stored in K. The complement is formed by inverting all
bits.
LOGICAL FUNCTION IBTEST (STRING,NBIT)
Test bit number NBIT in STRING(*), and return
.TRUE. if it is a 1-bit, and .FALSE. if it is a 0-bit. If
NBIT is less than 1, return .FALSE..
SUBROUTINE IBTGET (BYTE,STRING,LOC,LENGTH)
Return in the INTEGER variable BYTE a
right-adjusted bit string containing LENGTH bits extracted
starting at position LOC in the STRING(*). Any leading bits
in BYTE are set to zero. If LENGTH is larger than the word
size, only the first I1MACH(5) bits of the selected
substring will be returned in BYTE. If either LENGTH or LOC
is less than 1, 0 will be returned in BYTE.
SUBROUTINE IBTMOV (TARGET,LOCTAR,SOURCE,LOCSRC,LENGTH)
Given two bit strings defined by
TARGET,LOCTAR,LENGTH and SOURCE,LOCSRC,LENGTH, move LENGTH
bits from SOURCE into TARGET. If any of LOCTAR, LOCSRC, or
LENGTH is less than 1, no bits are moved. Bits must be moved
in order from left to right, equivalent to one bit at a
time, in order to provide consistent behavior in case TARGET
and SOURCE overlap.
SUBROUTINE IBTOFF (STRING,NBIT)
Set bit number NBIT in STRING(*) to 0. If NBIT < 1,
no bit will be set.
SUBROUTINE IBTON (STRING,NBIT)
Set bit number NBIT in STRING(*) to 1. If NBIT < 1,
no bit will be set.
INTEGER FUNCTION IBTOR (K,L)
Return the logical OR of the bit patterns present
in K and L. 1-bits are returned in positions in which
either K or L, or both, have a 1-bit.
SUBROUTINE IBTPUT (BYTE,STRING,LOC,LENGTH)
Store the right-most LENGTH bits of BYTE in the
STRING(*), starting at bit position LOC. If either LENGTH
or LOC is less than 1, no bits will be stored. If LENGTH is
larger than the wordsize, only the first I1MACH(5) bits of
BYTE will be stored at the designated position.
INTEGER FUNCTION IBTROL (K,NBIT)
Return the bit pattern represented by K after
rotating it left by |NBIT| bits. If NBIT is negative,
ignore its sign.
INTEGER FUNCTION IBTROR (K,NBIT)
Return the bit pattern represented by K after
rotating it right by |NBIT| bits. If NBIT is negative,
ignore its sign.
INTEGER FUNCTION IBTROT (K,NBIT)
Return the bit pattern represented by K after
rotating left (NBIT > 0) or right (NBIT < 0) by NBIT bits.
INTEGER FUNCTION IBTSHF (K,NBIT)
Return the bit pattern represented by K after
performing a logical shift left (NBIT > 0) or right (NBIT <
0) by NBIT bits. Bit positions vacated are filled by
0-bits. 1-bits shifted out of the word are lost.
INTEGER FUNCTION IBTSHL (K,NBIT)
Return the bit pattern represented by K after
performing a logical shift by |NBIT| bits to the left.
INTEGER FUNCTION IBTSHR (K,NBIT)
Return the bit pattern represented by K after
performing a logical shift by |NBIT| bits to the right.
INTEGER FUNCTION IBTSUM (K)
Return the number of 1-bits in the bit pattern
represented by K.
SUBROUTINE IBTSWP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
Given two bit strings defined by STRNGA,LOCA,LENGTH
and STRNGB,LOCB,LENGTH, swap the bit strings in memory. If
any of LOCA, LOCB, or LENGTH is less than 1, no swap is
performed.
INTEGER FUNCTION IBTXOR (K,L)
Return the exclusive OR of the bit patterns
represented by K and L. The result has 1-bits in positions
where K and L have different bits, and 0-bits where K and L
have identical bits.
CONCLUDING REMARKS
==================
It may be of some cause for concern that the
routines containing STRING(*) as an argument do not also
contain its dimension, in order that an out-of-bounds
storage reference can be avoided in the event of NBIT being
too large. We have elected NOT to include this extra
argument, because the same criticism may be raised for any
FORTRAN array reference for which the compiler does not
perform subscript range checking. Even when this feature is
available, it is usually suppressed in production programs
for reasons of run-time efficiency.
I believe that the above routines provide a
satisfactory set of bit primitives whose performance, apart
from word-length variations, is the same on all machines,
for both valid and invalid arguments. Arithmetic shift
routines have been intentionally excluded, because their
behaviour is inherently connected with the host number
representation, and their use would then unnecessarily
compromise portability.
Careful programming utilizing I1MACH(5) to provide
the number of bits in an INTEGER storage unit can eliminate
all word-length dependencies from the executable code, and
even from most of the dimensional limitations in the case of
the array STRING(*) if its storage is suitably allocated at
the top level of a program system. Just as it is becoming
more widely understood that fixed dimensions for potential
variable-sized arrays of standard data types should be
restricted to the MAIN program, so should the dimension of a
bit string disguised as an INTEGER array, since the required
amount of storage is certain to vary from one computer to
another.
Applications will no doubt arise in which it will
be convenient to permit Boolean operations to be extended to
variable-length bit strings beginning at arbitrary offsets
in data words. This presents sufficient extra
complications, that a set of higher-level bit routines based
on the basic primitives described above would be desirable.
Development of such a collection will be left for later
work.
REFERENCES
==========
BEEB79a Beebe, N.H.F., "Programmer's Guide to Portable
Software", Proceedings of the NRCC Conference of
Software Standards in Chemistry (1979).
BEEB79b Beebe, N.H.F., "Integral File Data Compression",
Proceedings of the NRCC Conference of Software
Standards in Chemistry (1979).
FOX78a Fox, P.A., Hall, A.D., and Schryer, N.L., "The PORT
Mathematical Subroutine Library", ACM Transactions
on Mathematical Software 4, 109-126 (1978).
FOX78b Fox, P.A., Hall, A.D., and Schryer, N.L.,
"Algorithm 528. Framework for a Portable Library",
ACM Transactions on Mathematical Software 4,
177-188 (1978).
ISA76 "Standard S61.1. Industrial Computer System FORTRAN
Procedures for Executive Functions, Process
Input/Output, and Bit Manipulation", Instrument
Society of America, 400 Stanwix Street, Pittsburgh,
PA 15222, February (1976).