For indexing purposes, a word is any contiguous sequence of letters and numbers that remains after the following steps are applied:
- 0. Single characters and a few very common words ("an", "and", "for", "in", "of", "on", "the", "to", and "with") are never indexed.
- 1. Letters are folded to lower case. Thus, "Voronoi" is indexed as "voronoi".
- 2. All TeX commands, except those in math expressions, are removed, but their arguments are left behind. Thus, "Erd{\H o}s" is indexed as "erdos".
- 3. All other non-alphanumeric characters are removed. Non-word characters inside {{possibly} nested} braces or dollar signs do not delimit words, so they may cause unexpected results. Within braces, spaces and tabs delimit components of compound words, which are indexed both as a unit and as individual components. Thus, "{this example}" is indexed as "this example thisexample".
- 4. TeX commands in math expressions are considered normal text. However, each contiguous string of letters and numbers is considered a component of a compound word. Thus, "$O(n\log^2 n)$" is indexed as "log onlog2n" instead of "on2".
- 5. Apostrophes and brackets are ignored. Thus, "{\'O}'D{\'u}nlaing" is indexed as "odunlaing", and "J[ohn]" is indexed as "john".
- 6. Single hyphens separate components of compound words. Thus, "semi-on-line" is indexed as "semi line semionline". (Recall that "on" is ignored.)
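The plain-text subset of these rules (stop words, case folding, apostrophes and brackets, and single hyphens) can be sketched in Python as follows. This is an illustration only, not bibindex's actual code: the names STOP_WORDS and index_words are hypothetical, TeX commands, braces, and math expressions are deliberately not handled, and only ASCII letters and digits are treated as word characters.

```python
import re

# Common words that are never indexed (taken from the list above).
STOP_WORDS = {"an", "and", "for", "in", "of", "on", "the", "to", "with"}


def index_words(text):
    """Return the index words for plain text (no TeX commands, braces, or math).

    Hypothetical helper: covers only case folding, apostrophes and
    brackets, single hyphens, and the stop-word rule.
    """
    text = text.lower()                  # fold letters to lower case
    text = re.sub(r"['\[\]]", "", text)  # apostrophes and brackets are ignored
    words = []
    # A token is a run of alphanumeric components joined by single hyphens.
    for token in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text):
        parts = token.split("-")
        # A compound word is indexed as its components and as one unit.
        candidates = parts + (["".join(parts)] if len(parts) > 1 else [])
        for word in candidates:
            # Single characters and stop words are never indexed.
            if len(word) > 1 and word not in STOP_WORDS and word not in words:
                words.append(word)
    return words


print(index_words("semi-on-line"))  # ['semi', 'line', 'semionline']
print(index_words("J[ohn]"))        # ['john']
```

The stop-word test is applied after splitting so that "on" disappears as a component while still contributing to the compound "semionline", matching the example in the last step.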
bibindex examines the contents of all value strings, and expects them to be well-formed TeX input. In particular, braces, quotation marks, and dollar signs should be balanced.
When bibindex detects an error, it prints a message giving two line numbers: that of the BibTeX entry in which the error was detected, and that of the point at which the error was detected. Unbalanced braces or dollar signs can make these two numbers differ widely; in that case, the error lies somewhere in the entry indicated by the first line number.
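To see why the two line numbers can drift apart, consider a minimal balance checker, sketched here in Python. It is an assumption-laden illustration, not bibindex's implementation: the function name find_imbalance and its arguments are hypothetical, quotation marks are not checked, and a backslash is simply taken to escape the next character.

```python
def find_imbalance(entry_text, entry_line):
    """Return (entry_line, error_line) for the first imbalance, or None.

    Hypothetical helper: entry_line is the line on which the entry starts;
    error_line is the line on which the problem is finally noticed.
    """
    depth = 0        # current brace nesting depth
    in_math = False  # True while inside an unclosed $...$
    escaped = False  # True if the previous character was a backslash
    line = entry_line
    for ch in entry_text:
        if ch == "\n":
            line += 1
        if escaped:
            escaped = False  # skip characters such as \{ or \$
            continue
        if ch == "\\":
            escaped = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return (entry_line, line)  # stray closing brace
        elif ch == "$":
            in_math = not in_math
    if depth != 0 or in_math:
        # An unclosed brace or dollar sign is only noticed at the very end.
        return (entry_line, line)
    return None


entry = "title = {An {unclosed brace},\nyear = 1990,\nnote = {x}"
print(find_imbalance(entry, 10))  # (10, 12)
```

In this sketch the missing brace is on line 10, but the checker cannot know that until it runs out of input on line 12, which is why the first line number is the reliable one.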
Jeff Erickson, Computer Science Division, University of California, Berkeley, CA 94720, USA. Email: <jeffe@cs.berkeley.edu>
This program is in the public domain. You may use it or modify it to your heart's content, at your own risk.