INTRODUCTION

          PROPOSAL FOR A STANDARD SET OF PRIMITIVES
                   FOR MACHINE-INDEPENDENT
                 BIT MANIPULATION IN FORTRAN
 
                              by
 
                      Nelson H.F. Beebe
             Departments of Physics and Chemistry
                      University of Utah
                   Salt Lake City, UT 84112
 
                     Tel: (801) 581-5254
 
 
                         INTRODUCTION
                         ============
 
          The  1979 NRCC Conference on  Software Standards in
 Chemistry  held  at  the  University  of  Utah  resulted  in
 considerable discussion  about standard ways of implementing
 bit  manipulation in FORTRAN programs.   A previous proposal
 by  the  author  described in  the  "Programmer's  Guide  to
 Portable Software" [BEEB79a] met with the criticism that the
 routine names  began with the letters  BIT, even though some
 were INTEGER  functions.  Many programmers apparently prefer
 to use  FORTRAN's default  typing conventions  based on  the
 initial  letter of  variable  names,  even if  this  reduces
 mnemonic  significance and  does not provide  for other than
 REAL and INTEGER data types, unless an IMPLICIT statement is
 supported by  the  host  compiler. General  agreement  could
 apparently be reached if the routine name prefix was changed
 to IBT,  but it  should be  noted that  this still  requires
 variable  name typing  for those routines  which are LOGICAL
 functions.
 
          Several  participants recalled  that the Instrument
 Society  of  America  (ISA),  which  establishes  industrial
 standards of various sorts,  and acts as an advisory body to
 the American  National Standards  Institute (ANSI),  perhaps
 had  established   standard  bit   primitives  for   FORTRAN
 programming.   This  is   indeed  the  case,  and  has  been
 published as ISA  Standard S61.1 "Industrial Computer System
 FORTRAN   Procedures  for   Executive   Functions,   Process
 Input/Output, and Bit Manipulation".
 
          I feel strongly that ISA S61.1 is not an acceptable
 standard for  the  needs of  chemistry-related  programming.
 There are several objections which may be noted.
 
          First,  names chosen  for the  bit primitives (IOR,
 IAND,  NOT, IEOR,  ISHFT, BTEST,  IBSET, and  IBCLR) and for
 obtaining the date and time  of day (DATE and TIME) are very
 likely to conflict with names already implemented by various
 manufacturers  in their extensions to  the FORTRAN language.
 This is not a trivial problem, because some compilers expand
 intrinsic  functions   in-line,  and   overriding  them   by
 user-provided  routines may not be  possible, or may require
 the  use  of   a  special  compiler  option,  or  additional
 declaration statements in  the routine which refers to them.
 The  order and  interpretation of arguments  may differ from
 machine to machine, and could cause unexpected results which
 might be difficult to trace.
 
          Second, there  is no  consistent naming  convention
 for the routines.  The use  of a mnemonic prefix such as BIT
 or  IBT has  proved invaluable  in large  program systems in
 that it  allows groups  of related  routines to  be  readily
 identified, and  also  distinguishes  the names  from  local
 names.
 
          Third, the  bits are  numbered from  right to  left
 starting  with zero.   There is no  agreement among computer
 manufacturers about  whether  numbering should  be  0123...,
 123...,  ...321, or  ...3210, and the  precedence of FORTRAN
 array   subscripts  following  the   order  123...  strongly
 suggests that  a left-to-right  123... numbering  convention
 should  be adhered  to.  This is  particularly important for
 the routines which test and set bits, for which an extension
 to bit arrays contained in  more than one word of storage is
 rather useful.
 
          Fourth, ISA  S61.1 does  not prescribe  a  complete
 interpretation of the arguments.  Nothing is said about what
 the  primitive will  do when presented  with an out-of-range
 bit  number.   This  is an  unsatisfactory  situation  in  a
 standard, and is bound to cause portability problems.
 
          An alternative  set  of standard  bit  manipulation
 routines is  presented  in this  proposal, and,  I  believe,
 remedies all of the above criticisms.
 
          Because the internal  format of FORTRAN data types,
 and also of characters, differs from machine to machine, and
 indeed,   even  from  compiler  to   compiler  at  a  single
 installation,  it  should  be clear  that  the  use  of  bit
 primitives to  access data  immediately involves assumptions
 which  will  defeat  the goal  of  portability  of  chemical
 software.
 
          It is  nevertheless possible  to implement  certain
 types  of bit manipulation in  a machine-independent way, as
 is  illustrated, for  example, in the  report "Integral File
 Data   Compression"   [BEEB79b].    In   particular,   those
 applications   which  require   only  bit   strings  can  be
 implemented completely portably,  once the number of bits in
 an  integer storage unit  is available.  I  propose for this
 purpose that the  PORT Library Framework [FOX78a, FOX78b] be
 adopted  as a  standard facility for  accessing machine- and
 operating-system parameters.  This software is in the public
 domain,  and at present, supports  various operating systems
 for computers manufactured by:
 
 *  Burroughs (1700/5700/6700/7700 series)
 *  CDC(6000/7000/Cyber series)
 *  Cray 1
 *  Data General (Eclipse S/200)
 *  DEC (10/20/PDP-11/VAX)
 *  Harris (Slash 6 and Slash 7)
 *  Honeywell (600/6000 series)
 *  IBM (360/370 series)
 *  Interdata (8/32)
 *  SEL (System 85/86)
 *  UNIVAC (1100 series)
 *  Xerox (Sigma 5/7/9 series)
 
 Parameters for others are very straightforward to add.
 
          The  PORT   functions  I1MACH(5)   and   I1MACH(9),
 returning respectively  the  number of  bits in  an  integer
 storage   unit,  and  the  largest   positive  integer,  are
 sufficient  to  allow   the  data  compression  routines  in
 [BEEB79b]  to convert quantum chemistry  integral and matrix
 element files containing REAL, DOUBLE PRECISION, and INTEGER
 data into  compressed  bit  strings in  which  insignificant
 leading and trailing bits have been removed, offering a data
 reduction  by  up  to  a  factor  of  five  or  six.   Other
 applications  which  can  be supported  by  these  two  PORT
 functions together with the bit primitives include:
 
 *   Manipulation   of  bit  representations  of  Slater
     determinants    and   excitation    operators    in
     configuration   interaction,  perturbation  theory,
     coupled  cluster,   and  propagator   calculations,
     including  conversion to  and  from  human-readable
     integer representations.
 
 *   Bit mapping used to represent  patterns of non-zero
     elements in sparse matrices.
 
 *   Patterns  of  dots  in a  plot to  be output  on  a
     dot-matrix plotter.
 
 *   Compressed  LOGICAL arrays  using only  one bit per
     element, instead of an entire word.
 
 *   Implementation  of  the  standard set  of character
     primitives on a given machine.
 
 *   Masking operations.
 
 *   Construction of byte strings for control of devices
     such as plotters.
 
          Other functions in the PORT Library Framework which
 provide  such parameters as the  machine precision, the base
 of integer  and floating-point  numbers, and  floating-point
 exponents may largely obviate the need for bit manipulations
 which require knowledge of the internal format of such data.

BIT-PRIMITIVES

                      THE BIT PRIMITIVES
                      ==================
 
          A description of the complete set of bit primitives
 which  are proposed  here follows.   An experienced assembly
 language programmer  should be able to  implement them in an
 afternoon's work on any existing computer presently used for
 chemical  computations.   An  agency  such  as  the  Quantum
 Chemistry Program Exchange or the NRCC could act as a source
 of implementations  of these routines for  a variety of host
 computers and  operating systems,  thereby increasing  their
 availability  and  hopefully  encouraging  their  widespread
 adoption.   It   has   proven   useful  at   our   computing
 installation to install these in the system FORTRAN library,
 so that  they  are automatically  available to  all  FORTRAN
 programmers.  If  they are not made  convenient to use, many
 less-motivated programmers will  not take the trouble to use
 them,    and   the   present    undesirable   situation   of
 machine-dependencies permeating  chemical software will only
 continue.
 
          In  all of  the bit  primitive routines,  K, L, and
 NBIT  represent INTEGER  variables; STRING(*)  is an INTEGER
 array.  Many compilers  will generate correct code if STRING
 is an INTEGER scalar variable, but this use does not conform
 to  Portable   FORTRAN.   Portability   dictates  that   the
 arguments to these functions should be restricted to INTEGER
 data  types; use  with arguments of  REAL, LOGICAL, COMPLEX,
 DOUBLE  PRECISION, or  Hollerith (character)  data types may
 introduce  machine-dependence,   and  should   be   strongly
 discouraged.
 
          It is considered good programming practice to treat
 FUNCTION  arguments as  read-only values, in  order to avoid
 side  effects.   This   convention  is  adhered  to  in  the
 definition of all of the FUNCTION primitives.
 
          In those  higher-level primitives  which deal  with
 variable-length bit  strings,  rather than  single-word  bit
 patterns,  the  strings   are  defined  in  terms  of  three
 variables.   These  are   the  name  of  the  INTEGER  array
 containing  the   string,  a  starting  position  (numbering
 1,2,3,... from  the left),  and  the number  of bits  to  be
 considered.   Thus,   an  argument  sequence  STRING,LOC,LEN
 represents bits LOC,  LOC+1, LOC+2, ..., LOC+LEN-1 stored in
 the array STRING(*).  It is an error condition if LOC or LEN
 is less  than 1, and the  action to be taken  in such a case
 will  be expressly  defined  for  each primitive.   In  some
 cases,  two strings  of the same  length are  present in the
 argument  list, and the  length parameter of  the first will
 then be omitted.

ARGUMENT-SUMMARY

 
       INTEGER FUNCTION IBTAND (K,L)
       INTEGER FUNCTION IBTCMP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
       INTEGER FUNCTION IBTCOM (K)
       LOGICAL FUNCTION IBTEST (STRING,NBIT)
       SUBROUTINE IBTGET (BYTE,STRING,LOC,LENGTH)
       SUBROUTINE IBTMOV (TARGET,LOCTAR,SOURCE,LOCSRC,LENGTH)
       SUBROUTINE IBTOFF (STRING,NBIT)
       SUBROUTINE IBTON  (STRING,NBIT)
       INTEGER FUNCTION IBTOR  (K,L)
       SUBROUTINE IBTPUT (BYTE,STRING,LOC,LENGTH)
       INTEGER FUNCTION IBTROL (K,NBIT)
       INTEGER FUNCTION IBTROR (K,NBIT)
       INTEGER FUNCTION IBTROT (K,NBIT)
       INTEGER FUNCTION IBTSHF (K,NBIT)
       INTEGER FUNCTION IBTSHL (K,NBIT)
       INTEGER FUNCTION IBTSHR (K,NBIT)
       INTEGER FUNCTION IBTSUM (K)
       SUBROUTINE IBTSWP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
       INTEGER FUNCTION IBTXOR (K,L)

IBTAND

       INTEGER FUNCTION IBTAND (K,L)
 
          Return the logical AND of K and L.  This has 1-bits
 where both K and L have 1-bits, and 0-bits elsewhere.

IBTCMP

       INTEGER FUNCTION IBTCMP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
 
          Given two  bit strings define by STRNGA,LOCA,LENGTH
 and  STRNGB,LOCB,LENGTH,  compare LENGTH  bits  of  the  two
 substrings,  treating  the  substrings  as  UNSIGNED  binary
 integers.  Return -1, 0, or +1 for the conditions A < B, A =
 B, or A > B,  respectively.  If any of LOCA, LOCB, or LENGTH
 is less than 1, return 0.

IBTCOM

       INTEGER FUNCTION IBTCOM (K)
 
          Return  the logical  complement of  the bit pattern
 stored in  K.  The  complement is  formed by  inverting  all
 bits.

IBTEST

       LOGICAL FUNCTION IBTEST (STRING,NBIT)
 
          Test  bit  number NBIT  in  STRING(*),  and  return
 .TRUE. if  it is a 1-bit,  and .FALSE. if it  is a 0-bit. If
 NBIT is less than 1, return .FALSE..

IBTGET

       SUBROUTINE IBTGET (BYTE,STRING,LOC,LENGTH)
 
          Return    in   the   INTEGER    variable   BYTE   a
 right-adjusted  bit string containing  LENGTH bits extracted
 starting at position LOC in the STRING(*).  Any leading bits
 in BYTE are set to  zero.  If LENGTH is larger than the word
 size,  only  the   first  I1MACH(5)  bits  of  the  selected
 substring will be returned in BYTE.  If either LENGTH or LOC
 is less than 1, 0 will be returned in BYTE.

IBTMOV

       SUBROUTINE IBTMOV (TARGET,LOCTAR,SOURCE,LOCSRC,LENGTH)
 
          Given     two    bit     strings     defined     by
 TARGET,LOCTAR,LENGTH and  SOURCE,LOCSRC,LENGTH, move  LENGTH
 bits from SOURCE into  TARGET.  If any of LOCTAR, LOCSRC, or
 LENGTH is less than 1, no bits are moved. Bits must be moved
 in  order from  left to  right, equivalent  to one  bit at a
 time, in order to provide consistent behavior in case TARGET
 and SOURCE overlap.

IBTOFF

       SUBROUTINE IBTOFF (STRING,NBIT)
 
          Set bit number NBIT in STRING(*) to 0. If NBIT < 1,
 no bit will be set.

IBTON

       SUBROUTINE IBTON  (STRING,NBIT)
 
          Set bit number NBIT in STRING(*) to 1. If NBIT < 1,
 no bit will be set.

IBTOR

       INTEGER FUNCTION IBTOR  (K,L)
 
          Return  the logical OR of  the bit patterns present
 in K  and L.   1-bits  are returned  in positions  in  which
 either K or L, or both, have a 1-bit.

IBTPUT

       SUBROUTINE IBTPUT (BYTE,STRING,LOC,LENGTH)
 
          Store the  right-most LENGTH  bits of  BYTE in  the
 STRING(*),  starting at bit position  LOC.  If either LENGTH
 or LOC is less than 1, no bits will be stored.  If LENGTH is
 larger than  the wordsize, only the  first I1MACH(5) bits of
 BYTE will be stored at the designated position.

IBTROL

       INTEGER FUNCTION IBTROL (K,NBIT)
 
          Return  the  bit pattern  represented  by  K  after
 rotating it  left  by |NBIT|  bits.   If NBIT  is  negative,
 ignore its sign.

IBTROR

       INTEGER FUNCTION IBTROR (K,NBIT)
 
          Return  the  bit pattern  represented  by  K  after
 rotating it  right by  |NBIT| bits.   If NBIT  is  negative,
 ignore its sign.

IBTROT

       INTEGER FUNCTION IBTROT (K,NBIT)
 
 
          Return  the  bit pattern  represented  by  K  after
 rotating left (NBIT > 0) or right (NBIT < 0) by NBIT bits.

IBTSHF

       INTEGER FUNCTION IBTSHF (K,NBIT)
 
          Return  the  bit pattern  represented  by  K  after
 performing a logical shift left  (NBIT > 0) or right (NBIT <
 0)  by NBIT  bits.   Bit  positions vacated  are  filled  by
 0-bits. 1-bits shifted out of the word are lost.

IBTSHL

       INTEGER FUNCTION IBTSHL (K,NBIT)
 
          Return  the  bit pattern  represented  by  K  after
 performing a  logical shift by |NBIT| bits  to the left.

IBTSHR

       INTEGER FUNCTION IBTSHR (K,NBIT)
 
          Return  the  bit pattern  represented  by  K  after
 performing a logical shift  by |NBIT| bits to the right.

IBTSUM

       INTEGER FUNCTION IBTSUM (K)
 
          Return the  number  of 1-bits  in the  bit  pattern
 represented by K.

IBTSWP

       SUBROUTINE IBTSWP (STRNGA,LOCA,STRNGB,LOCB,LENGTH)
 
          Given two bit strings defined by STRNGA,LOCA,LENGTH
 and  STRNGB,LOCB,LENGTH, swap the bit  strings in memory. If
 any  of LOCA,  LOCB, or  LENGTH is less  than 1,  no swap is
 performed.

IBTXOR

       INTEGER FUNCTION IBTXOR (K,L)
 
          Return  the  exclusive   OR  of  the  bit  patterns
 represented by K and  L.  The result has 1-bits in positions
 where K and L have  different bits, and 0-bits where K and L
 have identical bits.

CONCLUDING-REMARKS

                      CONCLUDING REMARKS
                      ==================
 
 
          It  may be  of  some  cause for  concern  that  the
 routines containing  STRING(*) as  an argument  do not  also
 contain  its  dimension,  in  order  that  an  out-of-bounds
 storage reference can be  avoided in the event of NBIT being
 too  large.  We  have  elected  NOT to  include  this  extra
 argument, because  the same criticism may  be raised for any
 FORTRAN array  reference  for which  the compiler  does  not
 perform subscript range checking.  Even when this feature is
 available,  it is usually suppressed  in production programs
 for reasons of run-time efficiency.
 
          I  believe  that   the  above  routines  provide  a
 satisfactory set of  bit primitives whose performance, apart
 from  word-length variations,  is the same  on all machines,
 for both  valid  and  invalid arguments.   Arithmetic  shift
 routines have  been  intentionally excluded,  because  their
 behaviour  is  inherently connected  with  the  host  number
 representation,  and  their  use  would  then  unnecessarily
 compromise portability.
 
          Careful programming  utilizing I1MACH(5) to provide
 the number of bits  in an INTEGER storage unit can eliminate
 all  word-length dependencies from the  executable code, and
 even from most of the dimensional limitations in the case of
 the array STRING(*) if  its storage is suitably allocated at
 the top  level of a program system.  Just  as it is becoming
 more  widely understood that fixed  dimensions for potential
 variable-sized  arrays  of standard  data  types  should  be
 restricted to the MAIN program, so should the dimension of a
 bit string disguised as an INTEGER array, since the required
 amount  of storage is  certain to vary from  one computer to
 another.
 
          Applications  will no doubt arise  in which it will
 be convenient to permit Boolean operations to be extended to
 variable-length  bit strings beginning  at arbitrary offsets
 in    data   words.    This    presents   sufficient   extra
 complications, that a set of higher-level bit routines based
 on the basic  primitives described above would be desirable.
 Development of  such a  collection will  be left  for  later
 work.

BIBLIOGRAPHY

                          REFERENCES
                          ==========
 
 BEEB79a  Beebe,  N.H.F.,  "Programmer's  Guide  to  Portable
          Software", Proceedings  of the  NRCC Conference  of
          Software Standards  in Chemistry  (1979).
 
 BEEB79b  Beebe, N.H.F.,  "Integral File  Data  Compression",
          Proceedings  of  the NRCC  Conference  of  Software
          Standards in Chemistry (1979).
 
 FOX78a   Fox, P.A., Hall, A.D., and Schryer, N.L., "The PORT
          Mathematical  Subroutine Library", ACM Transactions
          on Mathematical Software 4, 109-126 (1978).
 
 FOX78b   Fox,  P.A.,   Hall,   A.D.,  and   Schryer,   N.L.,
          "Algorithm 528.  Framework for a Portable Library",
          ACM  Transactions   on  Mathematical   Software  4,
          177-188 (1978).
 
 ISA76    "Standard S61.1. Industrial Computer System FORTRAN
          Procedures   for   Executive   Functions,   Process
          Input/Output,  and  Bit  Manipulation",  Instrument
          Society of America, 400 Stanwix Street, Pittsburgh,
          PA 15222, February (1976).

Table of contents