Beginning of example document
List of informative files to view:
<A NAME=1 HREF="http://nsidc1.colorado.edu:1729/u/CIMS/Demo_Description.html"> Description of this demo
<A NAME=2 HREF="http://nsidc1.colorado.edu:1729/u/CIMS/More_info.html"> More Info Frame
<A NAME=3 HREF="http://nsidc1.colorado.edu:1729/u/CIMS/More_info.html"> Free Text Frame
<A NAME=4 HREF="http://nsidc1.colorado.edu:1729/u/CIMS/Even_more_info.html"> Free Text Frame
The first link contains a description of this demo. The second link points to a frame that provides more information. The third link has a different tag than the second link, but points to the same place. The fourth tag is the same as the third tag, but points to a different file.
End of example document
NOTE: for clarity, the following terms refer to the above constructs.
ANCHOR:
First, the program calls extract_links( ) recursively on all files that end with "html" to load the skiplist abstract data type (ADT) with information about the hyperlinks. Concurrently, a non-html version of the file is created and placed in the temporary repository (default: /var/tmp/html_analyzer; to change the path of the repository, state the path as the last command line argument).
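The extraction step can be illustrated with a minimal Python sketch. This is not the program's own code: the class name LinkExtractor, the returned (href, content) pairs, and the use of Python's html.parser module are assumptions made for illustration. Only the function name extract_links comes from the text above, and the sketch handles a single document rather than recursing through a directory.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, content) pairs and the markup-free text of one document."""

    def __init__(self):
        super().__init__()
        self.links = []       # (href, anchor content) pairs, in document order
        self.text_parts = []  # every run of text, with the markup removed
        self._href = None     # href of the anchor currently open, if any
        self._content = []    # text seen so far inside that anchor

    def close_anchor(self):
        """Record the open anchor, if there is one, and reset the state."""
        if self._href is not None:
            content = " ".join("".join(self._content).split())
            self.links.append((self._href, content))
        self._href = None
        self._content = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.close_anchor()  # tolerate anchors that lack a closing </A>
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.close_anchor()

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._href is not None:
            self._content.append(data)


def extract_links(html):
    """Return (links, plain_text) for one HTML document given as a string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    parser.close_anchor()  # an anchor may still be open at end of input
    return parser.links, "".join(parser.text_parts)
```

The second return value plays the role of the non-html version of the file that is written to the temporary repository.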
Second, the program calls validate( ) if the -val command line argument is not present. validate( ) attempts to confirm the hyperlinks extracted from all the files in the directory hierarchy.
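A rough Python sketch of the validation idea follows. How validate( ) actually contacts servers is not described here, so the HEAD request, the ten-second timeout, the helper name url_is_reachable, and the (href, content, filename) record layout are all assumptions.

```python
import urllib.error
import urllib.request


def url_is_reachable(url, timeout=10):
    """Return True if a HEAD request for url succeeds (assumed check)."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors, network failures, and malformed URLs all count as failed.
        return False


def validate(links, check=url_is_reachable):
    """Return the (href, content, filename) records whose href fails check.

    Each distinct href is checked only once, however often it is used.
    """
    results = {}
    failed = []
    for href, content, filename in links:
        if href not in results:
            results[href] = check(href)
        if not results[href]:
            failed.append((href, content, filename))
    return failed
```

Passing the check as a parameter keeps the network access replaceable, so the bookkeeping can be exercised without contacting any server.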
Third, the program calls completeness( ), which looks for occurrences of each anchor's contents in the database that are not used as a hyperlink. This is implemented by performing a grep on the non-html files and listing the files that matched the hyperlink's contents.
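The completeness check might be sketched in Python as below. The program itself uses grep on the non-html files; this sketch instead counts matches in memory. The two dictionary arguments, the case-insensitive comparison, and the rule "more mentions than links in the same file" are assumptions chosen for illustration.

```python
def completeness(links_by_file, text_by_file):
    """Map each (href, content) pair to the files that mention content unlinked.

    links_by_file: filename -> list of (href, content) pairs found in that file
    text_by_file:  filename -> markup-free text of that file
    """
    wanted = {link for links in links_by_file.values() for link in links}
    report = {}
    for href, content in sorted(wanted):
        needle = content.lower()
        if not needle:
            continue  # an empty anchor would match everywhere
        hits = []
        for name in sorted(text_by_file):
            # Occurrences of the content anywhere in the file's text.
            mentions = text_by_file[name].lower().count(needle)
            # Occurrences already accounted for by anchors in the same file.
            linked = sum(
                1 for _, c in links_by_file.get(name, []) if c.lower() == needle
            )
            if mentions > linked:
                hits.append(name)
        if hits:
            report[(href, content)] = hits
    return report
```

For the example document, the anchor "Description of this demo" appears once as a link and once more in the closing paragraph, so example.html would be listed for it.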
Finally, the functions consistent_link( ) and consistent_content( ) are called to find hyperlinks that are pointed to by two different contents and hyperlink contents that point to two different documents. Essentially, this looks for hyperlinks that violate a one-to-one correspondence between the contents of the hyperlink and the anchor itself.
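The one-to-one test can be sketched in Python with two tallies, one keyed by href and one keyed by content. The function name consistency_report and the returned dictionaries are hypothetical; the program stores its data in a skiplist and reports through consistent_link( ) and consistent_content( ), whose internals are not shown here.

```python
from collections import Counter, defaultdict


def consistency_report(links):
    """Find hrefs with several contents and contents with several hrefs.

    links: iterable of (href, content) pairs from the whole database.
    Returns (bad_links, bad_contents); each maps the offending key to a
    dictionary of how many times every alternative occurs.
    """
    contents_by_href = defaultdict(Counter)
    hrefs_by_content = defaultdict(Counter)
    for href, content in links:
        contents_by_href[href][content] += 1
        hrefs_by_content[content][href] += 1
    # A key violates the one-to-one rule when it has more than one partner.
    bad_links = {h: dict(c) for h, c in contents_by_href.items() if len(c) > 1}
    bad_contents = {c: dict(h) for c, h in hrefs_by_content.items() if len(h) > 1}
    return bad_links, bad_contents
```

Keeping the counts lets a maintainer see which content or href is used least often and is therefore the cheaper one to change.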
VERIFYING LINKS...
WWW Alert: HTTP server at nsidc1.colorado.edu:1729 replies: HTTP/1.0 500 Unable to access document.
WWW Alert: Unable to access document.
WARNING: Failed in checking: http://nsidc1.colorado.edu:1729/u/CIMS/Demo_Description.html
With content of: Description of this demo
In local file: ./temp/example.html
Next, html_analyzer finds that the contents used to describe this link occur elsewhere without a link. This could have been in another file, but in this case the string occurs in the text portion of the same document. The user is given a list of the file(s) in which that text needs to be made into a hyperlink. The output will look something like this:
VERIFYING COMPLETENESS...
WARNING: These filenames contain the content: Description of this demo
Without a link to: http://nsidc1.colorado.edu:1729/u/CIMS/Demo_Description.html
example.html
Next, the user will be informed that more than one hyperlink content is used to describe the link to /u/CIMS/More_info.html on nsidc1. In this case, both "More Info Frame" and "Free Text Frame" point to the same file. One of them needs to go. To help the HTML db maintainers decide which content to remove, the software informs the user of the number of occurrences of each in the database. The output would look something like this:
VERIFYING CONSISTENCY OF LINKS...
WARNING: Link used inconsistently.
HREF: http://nsidc1.colorado.edu:1729/u/CIMS/More_info.html
occurs 1 time with content: Free Text Frame as in file: ./temp/example.html,
but also occurs 1 time with content: More Info Frame as in file: ./temp/example.html
Next, the user will be informed that the hyperlink that contains "Free Text Frame" points to more than one file. This is easily corrected by changing the contents of the hyperlink that is used least often to another name. The output would look like this:
VERIFYING CONSISTENCY OF CONTENTS...
WARNING: Content used inconsistently.
CONTENT: Free Text Frame
occurs 1 time with href: http://nsidc1.colorado.edu:1729/u/CIMS/Even_more_info.html as in file: ./temp/example.html,
but also occurs 1 time with href: http://nsidc1.colorado.edu:1729/u/CIMS/More_info.html as in file: ./temp/example.html
James E. Pitkow
Graphics, Visualization and Usability Laboratory
Georgia Institute of Technology
pitkow@cc.gatech.edu