THE CHARACTER PRIMITIVES
========================

The character primitives defined in the remainder of this proposal can all be implemented entirely in FORTRAN if a standard set of bit primitives is available. However, because of the differing storage order on some machines such as the PDP-11 and the DEC VAX 11/780, where characters are stored in reverse order, a FORTRAN implementation will in general not be portable, even if such parameters as the number of bits in a character, and the number of characters in an INTEGER storage unit, are available machine-independently through the PORT Library Framework [FOX78a, FOX78b]. However, an initial FORTRAN implementation in terms of bit primitives may nevertheless be useful as a bootstrapping process when software is to be installed on a new machine.

All of the routines will be straightforward to implement in assembly language, and particularly for those machines which support character addressing in hardware, it may be an order of magnitude more efficient to do so.

Just as in the case of the proposed bit primitives, it is anticipated that bodies such as the Quantum Chemistry Program Exchange or the NRCC could act as a source of implementations of these primitives for a wide variety of host computers. Installations will also find that programmers are more easily encouraged to use the standard character primitives if they are conveniently available, preferably as part of the local system FORTRAN library.

In the following descriptions, all arguments are scalar INTEGER variables, except TEXT(*), which represents either a Hollerith constant, or character data packed with the maximum number of characters per word. Exceptions to this will be noted when necessary. Readers familiar with the programming languages PASCAL and PL/1 will note their influences on the design of these routines.

The character primitives will be divided into two classes -- basic routines, and higher-level routines.
The latter can be implemented in FORTRAN in terms of the former, although on some systems with advanced hardware facilities, it may be desirable to define them directly in assembly language.

In developing any software system, a decision must always be made about how error conditions are to be handled. In a set of routines which are proposed for adoption as a Standard, it is clearly unacceptable to ignore errors, and it is equally unsatisfactory to define behaviour under error conditions to be "undefined", for this simply means that the action to be taken is decided by the implementor. Only two acceptable alternatives exist. Either an error flag can be returned, or predefined reasonable action can be taken when errors arise. The first of these places the burden of error handling on the user of the software, and frequently results in error conditions simply being ignored, or perhaps handled incorrectly. The second alternative simplifies programming on the part of the user by moving the error processing to a lower level, and also guarantees consistent error handling in all implementations. For this reason, the second of these has been adopted for the character primitives.

An axiom of good programming is that functions should not have side effects. In practical terms, this usually means that they should not modify their arguments, or variables globally accessible through COMMON storage or its equivalent. This convention has been adhered to in the definition of the FUNCTION character primitives.

In those primitives which deal with character strings, rather than single characters, the strings are defined in terms of three variables. These are the name of the INTEGER array containing the string, a starting position (numbering 1,2,3,... from the left), and the number of characters to be considered, counting from the starting position. Thus, an argument sequence TEXT,LOC,LEN represents characters LOC, LOC+1, LOC+2, ..., LOC+LEN-1 stored in the array TEXT(*).
It is an error condition if either LOC or LEN is less than 1, and the action to be taken will be expressly defined for each primitive. In some cases, two strings of the same length are present in the argument list, and the length parameter for the first will then be omitted. In most applications, the LOC parameter will point to the first character in the array; its presence is, however, necessary to allow access to strings which do not begin at a word boundary.