


                   THE CHARACTER PRIMITIVES
                   ========================
 
          The  character primitives defined  in the remainder
 of this proposal can  all be implemented entirely in FORTRAN
 if a standard set  of bit primitives is available.  However,
 because of the differing storage order on some machines such
 as the  PDP-11 and the DEC VAX  11/780, where characters are
 stored  in reverse  order, a FORTRAN  implementation will in
 general not  be portable,  even if  such parameters  as  the
 number of bits in  a character, and the number of characters
 in    an     INTEGER    storage    unit,    are    available
 machine-independently  through the  PORT  Library  Framework
 [FOX78a, FOX78b].  An initial FORTRAN implementation in terms
 of bit primitives may nevertheless be useful as a bootstrap
 when software is to be installed on a new machine.  All of the
 routines will be straightforward to implement in assembly
 language, and particularly on machines which support
 character addressing in hardware, doing so may be an order of
 magnitude more efficient.
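
          The effect of storage order can be sketched as follows.
 The sketch assumes four 8-bit ASCII characters in a 32-bit
 default INTEGER and uses modern free-form Fortran; the names
 kth_char and reversed are hypothetical and are not part of
 this proposal.  Ordinary integer arithmetic stands in for the
 bit primitives.

```fortran
! Sketch only: extract the K-th character (K = 1..4) from a word
! holding four 8-bit characters.  All names are illustrative.
module storage_order_demo
  implicit none
contains
  ! reversed = .true. models a machine on which the first character
  ! occupies the least significant byte of the word.
  integer function kth_char(word, k, reversed)
    integer, intent(in) :: word, k
    logical, intent(in) :: reversed
    integer :: shift
    if (reversed) then
      shift = k - 1          ! first character is the low-order byte
    else
      shift = 4 - k          ! first character is the high-order byte
    end if
    kth_char = mod(word / 256**shift, 256)
  end function kth_char
end module storage_order_demo

program demo
  use storage_order_demo
  implicit none
  integer :: fwd, rev
  ! The text 'ABCD' has ASCII codes 65, 66, 67, 68.
  fwd = ((65*256 + 66)*256 + 67)*256 + 68   ! 'A' in the high-order byte
  rev = ((68*256 + 67)*256 + 66)*256 + 65   ! 'A' in the low-order byte
  ! With the matching assumption, both words yield 65 ('A').
  print *, kth_char(fwd, 1, .false.), kth_char(rev, 1, .true.)
  ! With the wrong assumption, the first character appears to be 68 ('D').
  print *, kth_char(fwd, 1, .true.)
end program demo
```

          The same source text therefore gives different results
 unless the storage order is known, which is why knowledge of
 the character size and the number of characters per word is
 not by itself sufficient for portability.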
 
          Just as in the case of the proposed bit primitives,
 it is anticipated that  bodies such as the Quantum Chemistry
 Program Exchange  or  the  NRCC could  act  as a  source  of
 implementations  of these  primitives for a  wide variety of
 host  computers.    Installations   will  also   find   that
 programmers  are more easily encouraged  to use the standard
 character primitives  if  they are  conveniently  available,
 preferably as part of the local system FORTRAN library.
 
          In the  following descriptions,  all arguments  are
 scalar  INTEGER variables, except  TEXT(*), which represents
 either  a Hollerith constant, or  character data packed with
 the maximum  number of  characters per  word. Exceptions  to
 this will be noted when necessary.  Readers familiar with the
 programming languages PASCAL and PL/I will note their
 influences on the design of these routines.
 
          The  character primitives will be  divided into two
 classes  -- basic routines, and  higher-level routines.  The
 latter can be implemented in FORTRAN in terms of the former,
 although on some  systems with advanced hardware facilities,
 it may  be desirable  to define  them directly  in  assembly
 language.
 
          In developing any  software system, a decision must
 always be made about how error conditions are to be handled.
 In  a set of routines  which are proposed for  adoption as a
 Standard, it is clearly unacceptable to ignore errors, and it
 is equally unsatisfactory to declare behaviour under error
 conditions "undefined", for this simply means that the action
 to be taken is decided by each implementor.
 
          Only two  acceptable alternatives exist.  Either an
 error flag can  be returned, or predefined reasonable action
 can  be taken when  errors arise. The first  of these places
 the  burden of error  handling on the user  of the software,
 and frequently  results  in  error conditions  simply  being
 ignored,   or  perhaps  handled   incorrectly.   The  second
 alternative  simplifies programming on the  part of the user
 by  moving the error  processing to a lower  level, and also
 guarantees consistent error handling in all implementations.
 For these reasons, the second alternative has been adopted
 for the character primitives.
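
          The difference between the two alternatives can be
 sketched as follows.  The routines blank_flag and blank_safe
 are hypothetical and are not primitives of this proposal; for
 simplicity they operate on an array holding one character code
 per element, and the predefined action chosen here (leave the
 data unchanged) is an assumption made purely for illustration.

```fortran
! Sketch only: two ways of handling an invalid LOC or LEN.
module error_style_demo
  implicit none
  integer, parameter :: BLANK = 32     ! ASCII blank (assumed code)
contains
  ! Alternative 1: return an error flag which the caller must test.
  subroutine blank_flag(codes, LOC, LEN, ierr)
    integer, intent(inout) :: codes(:)
    integer, intent(in)    :: LOC, LEN
    integer, intent(out)   :: ierr
    ierr = 0
    if (LOC < 1 .or. LEN < 1) then
      ierr = 1
      return
    end if
    codes(LOC:min(LOC + LEN - 1, size(codes))) = BLANK
  end subroutine blank_flag

  ! Alternative 2: take a predefined action (assumed: do nothing).
  subroutine blank_safe(codes, LOC, LEN)
    integer, intent(inout) :: codes(:)
    integer, intent(in)    :: LOC, LEN
    if (LOC < 1 .or. LEN < 1) return
    codes(LOC:min(LOC + LEN - 1, size(codes))) = BLANK
  end subroutine blank_safe
end module error_style_demo

program demo
  use error_style_demo
  implicit none
  integer :: codes(8), ierr
  codes = 65                             ! eight copies of 'A'
  call blank_flag(codes, 0, 3, ierr)     ! invalid LOC: flag must be tested
  if (ierr /= 0) print *, 'blank_flag reported an error'
  call blank_safe(codes, 0, 3)           ! invalid LOC: array left unchanged
  call blank_safe(codes, 2, 3)           ! valid: characters 2 to 4 blanked
  print '(8I4)', codes
end program demo
```

          With the first style, omitting the test of ierr goes
 unnoticed; with the second, the calling program needs no
 error-handling code at all.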
 
          An axiom  of  good  programming is  that  functions
 should not  have  side effects.   In practical  terms,  this
 usually  means that they should  not modify their arguments,
 or  variables globally accessible through  COMMON storage or
 its equivalent.  This convention  has been adhered to in the
 definition of the FUNCTION character primitives.
 
          In  those  primitives  which  deal  with  character
 strings, rather  than  single  characters, the  strings  are
 defined in terms of  three variables.  These are the name of
 the INTEGER array containing the string, a starting position
 (numbering  1,2,3,... from  the  left),  and the  number  of
 characters to  be  considered,  counting from  the  starting
 position.    Thus,   an   argument   sequence   TEXT,LOC,LEN
 represents  characters  LOC, LOC+1,  LOC+2,  ...,  LOC+LEN-1
 stored  in the array  TEXT(*).  It is an  error condition if
 either LOC or LEN is less than 1, and the action to be taken
 will be expressly defined for each primitive. In some cases,
 two strings  of the same length are  present in the argument
 list,  and the length  parameter for the first  will then be
 omitted. In most  applications, the LOC parameter will point
 to  the first  character  in  the array;  its  presence  is,
 however, necessary  to allow access to  strings which do not
 begin at a word boundary.
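
          The correspondence between a character position and a
 word of TEXT(*) can be sketched as follows, in modern free-form
 Fortran.  The parameter name NCPW and its value of four
 characters per word are assumptions for illustration; the
 names iword and ipos are likewise hypothetical.

```fortran
! Sketch only: locate each character of the string TEXT,LOC,LEN
! within the packed INTEGER array TEXT(*).
program string_position_demo
  implicit none
  integer, parameter :: NCPW = 4     ! assumed characters per word
  integer :: LOC, LEN, k, iword, ipos
  LOC = 3                            ! string starts in mid-word
  LEN = 6                            ! characters 3, 4, ..., 8
  do k = LOC, LOC + LEN - 1
    iword = (k - 1) / NCPW + 1       ! element of TEXT(*) holding character k
    ipos  = mod(k - 1, NCPW) + 1     ! position of character k in that word
    print '(3I6)', k, iword, ipos
  end do
end program string_position_demo
```

          For LOC = 3 and LEN = 6 the string occupies positions 3
 and 4 of the first word and all four positions of the second,
 an instance of a string which does not begin at a word
 boundary.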