Working ISINDEX Tag Element Examples

The <ISINDEX> tag can be both easy and difficult to understand. References often speak of it as marking a file as an "indexed" file, whatever that means. Working examples are scarce, even on HTML documentation pages, because a simple HTML file containing this tag is not enough: it shows what sort of widget the <ISINDEX> tag causes a browser to display, but it does not illustrate the dynamic working of the tag. On the other hand, what makes the tag easy to understand is that most guides recommend HTML forms as its replacement, and forms are indeed its successor.

Forms seem to have made their first appearance sometime in 1993, with HTML+ describing a crude version of forms (with some variations from how forms ended up in later versions of HTML), and they were finally added as an option in the second release of HTML 1 ("HTML 1.m") and in HTML 2 "Level 2." <ISINDEX>, however, shows every sign of being much older. In an email from Tim Berners-Lee dated October 29, 1991, <ISINDEX> was mentioned in passing as something already in existence and apparently in active use. In those early days, the idea was that the server would insert this tag if it somehow recognized the file as being "searchable." The exact mechanisms used back then are not clear, but in effect that is how the tag can work today, and in a very few instances still does.

Though <ISINDEX> is almost as old as <NEXTID>, <PLAINTEXT>, <XMP>, and <LISTING>, unlike them it is still recognized by the most advanced version of HTML and by the initial version of XHTML, at least in their Transitional forms. With XHTML 1.1, however, this tag is no longer recognized. The versions of HTML that concern the <ISINDEX> tag are:

HTML "0.a" - from the beginning through January 10, 1991
This tag had not yet been invented, so no examples are found from this period.
HTML "0.c" - from January 23, 1991 through November 23, 1992
This early version of HTML introduced <ISINDEX> as a binary switch, to create a widget window if present and do nothing if absent.
HTML "0.d" - from November 26, 1992 through May 24, 1993
During this span, <ISINDEX> is quite specifically meant to be confined to the <HEADER> or <HEAD> portion of the document.
HTML "1.k" - Version 1 (first release)
In this first published draft of HTML, <ISINDEX> is the same as it had been and would be in HTML 2 Level 1, either absent or present in the <HEAD>.
HTML "1.m" - Version 1 (second release)
In the next published draft of HTML, <ISINDEX> can appear any number of times in the <BODY> as well as once in the <HEAD>, unless forms are disabled, in which case it is handled just as in HTML 2 Level 1.
HTML Version 2 Level 1
This is like the Level 2 default, but it excludes all the forms elements, i.e. <FORM>, <INPUT>, <TEXTAREA>, <SELECT>, and <OPTION>, and <ISINDEX> is permitted only in the <HEAD> portion of the document.
HTML Version 2 Strict Level 1
This is like regular Level 1, but it also excludes certain other deprecated elements, along with such constructs as nesting a heading (<H*> element) within a link (<A> element). <ISINDEX> is handled the same as in regular Level 1.
HTML Version 2 Level 2
This is the default, and it includes and permits all HTML Level 2 functions, elements, and attributes. <ISINDEX> can occur here in either the <HEAD> or the <BODY>.
HTML Version 2 Strict Level 2
This excludes certain deprecated elements and also forbids such constructs as nesting a heading (<H*> element) within a link (<A> element), or having a forms <INPUT> element that is not within a block-level element such as <P>. <ISINDEX> is handled the same as in regular Level 2.
HTML Version 3.2
This adds such constructs as <FONT>, <MAP>, <APPLET>, and <TABLE>. <ISINDEX> continues to be acceptable in both <HEAD> and <BODY>, as it is in HTML 2 Level 2.
HTML Version 4.0 and 4.01 Transitional
This version of HTML adds new constructs for frames, stylesheets, and scripting languages, and many details to forms and tables, but many tags, including <ISINDEX>, are retained only as "deprecated" tags.
HTML Version 4.0 and 4.01 Strict
This version of HTML adds new constructs for stylesheets and scripting languages, and many details to forms and tables, but many tags, including <ISINDEX>, are excluded.
XHTML Version 1.0 Transitional
The initial version of XHTML has virtually all the same features as HTML 4.01 Transitional, the difference being that it is XML-based instead of SGML-based. This version includes the deprecated HTML tags, including <isindex />.
XHTML Version 1.0 Strict
The initial version of XHTML has virtually all the same features as HTML 4.01 Strict, the difference being that it is XML-based instead of SGML-based. This version excludes all deprecated HTML tags, including <isindex />.
XHTML Version 1.1
<ISINDEX> has vanished altogether, never to be heard from again.

Among the really old tags, what is most unusual about <ISINDEX> is that it needs some sort of program at the server end. The tag in the file displayed by the browser is only part of the picture: for these "Working examples" to truly work, a program must be running in support of them on the server end of the connection. Here, not only do I have such a program running, but I also present the full text of that program (a Perl script), which is ready to run as shown. It is, however, beyond the scope of this demonstration to implement a full search engine. Since I am concerned only with the <ISINDEX> tag itself, it is enough to demonstrate that the user can enter something and see it echoed back as the "search response." Although I present in full the Perl script that is executing, I make no attempt here to explain the Perl programming language; there are plenty of standard references for that. As a piece of quality control, however, the executable files used for this can be copied directly off the screen as presented here, and that is exactly how I made them, so what you see is literally and exactly what is running on the server, or what is included in this file by a Server Side Include command.

An <ISINDEX> example is one place where Server Side Includes can be of interest. While there must be a program running on the server side, one can still make an ordinary HTML file with <ISINDEX> in it and hand the processing off to a supplementary program file. Such a program, when called from a Server Side Include file, inherits the environment variables of the calling file (including QUERY_STRING, as the example in this file shows). This file is marked as a Server Side Include file to the server by its .shtml extension, and the executable Perl script it uses, named isinsup.pl, is in a subdirectory named cgi beneath the current directory. The executable code contained in isinsup.pl reads as follows:

#!/usr/local/bin/perl
print "Content-type: text/html\r\n\r\n";
$sourc = $ENV{'QUERY_STRING'};
$sourc =~ tr/+/ /;
$sourc =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
if ($sourc eq '')
{
  print "Nothing has been entered yet.";
}
else
{
  $sourc =~ s/&/&amp;/g;
  $sourc =~ s/</&lt;/g;
  $sourc =~ s/>/&gt;/g;
  $sourc =~ s/"/&#34;/g;
  $sourc =~ s/ /&nbsp;/g;
  print "The user has entered \&#34\;$sourc.\&#34\;";
}

If you want to use this, the first line is the only one you might have to modify, depending on where the Perl interpreter is located on your server. Other than that, it is a straight cut and paste: make the script file executable, make sure your server is enabled for CGI script execution, Server Side Includes, and specifically the exec command, and you can try this at home. Notice how the $sourc = $ENV{'QUERY_STRING'}; line extracts the query string fed to the calling .shtml file. The next two lines undo the URL encoding (see more on that below) so as to reconstruct in the program the exact phrase the user typed in. Normally that would be enough for a search engine, but since this example simply echoes the phrase back, a few additional steps ensure that the HTML "trigger characters" (e.g. & and <) are rendered harmlessly as HTML entities and are therefore readable in the browser. The & replacement comes first, so that the ampersands in the entities produced by the later steps are not themselves escaped again. The program is invoked from this .shtml file with the following command:

<P><!--#exec cgi="cgi/isinsup.pl" --></P>

and the result is the following (try entering something in the <ISINDEX> widget window, usually located at the top of this file):

Nothing has been entered yet.
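The decoding and escaping steps of isinsup.pl can be factored into small subroutines so that each step can be checked on its own. This is a minimal sketch, not the script actually running on the server: the names url_decode, html_escape, and echo_message are hypothetical, and the guard against an undefined QUERY_STRING is an addition for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: undo the browser's URL encoding.
# '+' must become a space BEFORE %XX is decoded, otherwise a
# user-typed plus sign (sent as %2B) would be turned into a space.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;
    return $s;
}

# Hypothetical helper: neutralize the HTML "trigger characters".
# '&' is replaced first so the entities added below are not re-escaped.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&#34;/g;
    $s =~ s/ /&nbsp;/g;
    return $s;
}

# Same decision as isinsup.pl: an empty query means nothing was entered.
sub echo_message {
    my ($query) = @_;
    my $text = url_decode(defined $query ? $query : '');
    return "Nothing has been entered yet." if $text eq '';
    return "The user has entered &#34;" . html_escape($text) . ".&#34;";
}

print echo_message($ENV{'QUERY_STRING'}), "\n";
```

Splitting the steps this way leaves the logic of isinsup.pl unchanged; it only makes each step testable in isolation. For example, url_decode('a%2Bb+c') returns 'a+b c'.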

Most of the example files called from this file use the same routine through the same Server Side Include command as above. The more conventional way to handle an <ISINDEX> file, however, is to have it generated entirely by a script. The following simple script generates such an HTML file:

#!/usr/local/bin/perl
print "Content-type: text/html\r\n\r\n";
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HTML VERSION=\"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HEAD>\r\n";
print "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html\; charset=utf-8\">\r\n";
print "<TITLE>ISINDEX Example</TITLE>\r\n";
$sourc = $ENV{'QUERY_STRING'};
$sourc =~ tr/+/ /;
$sourc =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
if ($sourc eq '')
{
  print "<ISINDEX>\r\n";
  print "</HEAD>\r\n";
  print "<BODY>\r\n";
  print "<P>ISINDEX Example</P>\r\n";
  print "</BODY>\r\n";
  print "</HTML>\r\n";
}
else
{
  $sourc =~ s/&/&amp;/g;
  $sourc =~ s/</&lt;/g;
  $sourc =~ s/>/&gt;/g;
  $sourc =~ s/"/&#34;/g;
  $sourc =~ s/ /&nbsp;/g;
  print "</HEAD>\r\n";
  print "<BODY>\r\n";
  print "<P>Searched for is \&#34\;$sourc\&#34\;</P>\r\n";
  print "<P>Click on Back button to return to menu.</P>\r\n";
  print "</BODY>\r\n";
  print "</HTML>\r\n";
}

This script can be executed from here. It may be easier to try at home, since it does not require Server Side Includes (or, specifically, their exec command) to be enabled. When nothing has been entered, the program generates an HTML file that looks like this:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN"> 
<HTML VERSION="-//IETF//DTD HTML 2.0 Strict Level 1//EN"> 
<HEAD> 
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"> 
<TITLE>ISINDEX Example</TITLE> 
<ISINDEX> 
</HEAD> 
<BODY> 
<P>ISINDEX Example</P> 
</BODY> 
</HTML>

Alternatively, if something has been entered, it looks like this:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN"> 
<HTML VERSION="-//IETF//DTD HTML 2.0 Strict Level 1//EN"> 
<HEAD> 
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"> 
<TITLE>ISINDEX Example</TITLE> 
</HEAD> 
<BODY> 
<P>Searched for is &#34;abc&#34;</P> 
<P>Click on Back button to return to menu.</P> 
</BODY> 
</HTML>

The above shows what the generated file looks like if the user entered, for example, "abc."

One thing <ISINDEX> is especially useful for is demonstrating how URL encoding works. URL encoding means taking what the user entered and converting it into a different string that can be used as part of a URL. That string cannot contain any spaces, and many special characters must be converted because they have special meanings in the context of a URL. One thing to notice is how spaces are converted into plus (+) signs, which makes the plus sign itself one of the special characters that must be converted. All punctuation except @, *, -, _, and . must be converted, along with any non-ASCII characters. Each of these is changed into a % followed by two hexadecimal digits (0-9A-F), so, for example, a plus sign entered by the user shows up in the URL as %2B. That is why the Perl script (or whatever program one uses) must first convert the plus signs back into spaces and only then convert the %nn sequences back into single-byte characters; in the opposite order, a decoded %2B would wrongly turn into a space. The best part is that setting the charset of the file to "utf-8" (which is acceptable in HTML 2, since it is done as a mere value given to an attribute of the <META> tag, and there are no restrictions on its attribute values) lets the page accept such horrible things as "أَبْجَدْ" (Arabic for how one begins to recite the Arabic ABC's) and display them where they are echoed (if your browser is able to display them), since the multiple bytes of each UTF-8 character simply become multiple %nn sequences in the URL. You can copy and paste from the screen here into the <ISINDEX> widget window and see for yourself what it does. If your browser cannot handle Arabic, try this Vietnamese name instead: "Ngô Đình Thục."
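The encoding side, which the browser performs before the query ever reaches the script, can be sketched in Perl as well. This is a minimal sketch that assumes exactly the rule stated above (letters, digits, and @ * - _ . left alone, space to +, every other byte of the UTF-8 text to %XX); individual browsers may differ in small details, and the name isindex_encode is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper following the rule described in the text:
# work on the UTF-8 bytes (the page declares charset=utf-8),
# keep A-Z a-z 0-9 @ * - _ . as they are, percent-encode every
# other byte except space, then turn spaces into '+'.
sub isindex_encode {
    my ($text) = @_;                     # a Perl character string
    my $bytes = encode('UTF-8', $text);  # one or more bytes per character
    $bytes =~ s/([^A-Za-z0-9\@*\-_. ])/sprintf("%%%02X", ord($1))/eg;
    $bytes =~ tr/ /+/;
    return $bytes;
}

print isindex_encode("a+b c"), "\n";                  # a%2Bb+c
print isindex_encode("Ng\x{F4} \x{110}\x{EC}nh"), "\n"; # Ng%C3%B4+%C4%90%C3%ACnh
```

The decoding lines in isinsup.pl reverse this: "a%2Bb+c" first becomes "a%2Bb c" and then "a+b c", and each two- or three-byte %nn run turns back into one UTF-8 character.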

Recommended Implementation

Though more than one occurrence of <ISINDEX> in the head of the document is, strictly speaking, an error, its presence there, whether once or more, should result in a single widget in the browser display frame, similar to how Mosaic shows it, though it may be placed at the top or the bottom, reversed in direction for right-to-left text, or even placed along one side for vertical languages such as Chinese. Putting <ISINDEX> in the head shows that one does not want it in the main flow of the scrolling text, but always present (not moving) on the screen and independent of the scrolling text. Relatively few styles would apply to it, though one could implement color, typeface, and window size style commands, using the styles specified on the last instance in the head of the document (if, erroneously, more than one occurs there).

If occurring in the body of the document, <ISINDEX> should insert a widget window at the point in the flow of the text where it occurs, as many times as it is found in the body, with style commands implemented as applied to each particular instance. By default, it should be rendered the way most modern browsers render it, with an <HR> line above and below the prompt and widget window. If the text direction (DIR) is right to left, the prompt string and text window should start from the right-hand side, with the prompt string itself to the right of the widget window. The <ISINDEX> widget window should respond to the Enter or Return key on the keyboard, so that no "SUBMIT" button is needed. Finally, any <FORM> text field (an <INPUT> of TYPE=TEXT) whose NAME is "ISINDEX" (case-insensitive) and that is not connected to a SUBMIT button should not have "ISINDEX=" put in front of the entered text in the query string, much as Microsoft Internet Explorer and older (pre-Version 5) Netscape handle it.
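On the server side, the difference described in the last sentence amounts to an optional prefix on the query string. A minimal sketch, assuming a script that should accept a form with a single text field named ISINDEX as well as a real <ISINDEX> query: strip a leading "isindex=" (in any case) before decoding. Under the encoding rule described earlier, an "=" typed by the user is sent as %3D, so a raw query can only begin with "isindex=" when the browser added the field name itself. The subroutine name isindex_raw_query is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: accept both "abc" and "ISINDEX=abc".
# The prefix is removed case-insensitively; the rest is left
# URL-encoded, to be decoded afterwards as in isinsup.pl.
sub isindex_raw_query {
    my ($query) = @_;
    $query = '' unless defined $query;
    $query =~ s/^isindex=//i;
    return $query;
}

for my $q ('abc', 'ISINDEX=abc', 'IsIndex=a+b', 'isindex%3Dabc') {
    printf "%-14s -> %s\n", $q, isindex_raw_query($q);
}
```

The last sample shows that a user who literally types "isindex=abc" is unaffected, because the encoded %3D does not match the prefix.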

Stylesheet commands should apply to the whole widget, wherever they naturally fit. Though my examples here have been tailored to the Microsoft model of applying all style commands to the widget window alone (and I am personally more comfortable with that), there is a good case to be made for applying them to everything else as well: for example, replacing the surrounding <HR> lines with whatever border commands are given, and applying text styles both to the prompt string and to the characters entered in the widget window itself. When used with vertical languages (such as Chinese), the whole <ISINDEX> widget should be aligned vertically as well.

Upgrades and Downgrades

I have created a small cluster of demonstration files to show the various upgrades and possible replacements for the <ISINDEX> element contained in this file.

This file, "isin.shtml," is HTML 2.0 Strict Level 1 compliant, even when something is entered.
The <ISINDEX> small test demonstration file "cgi/test6.pl" is HTML 2.0 Strict Level 1 compliant, even when something is entered.
The HTML 2 Level 2 multiple <ISINDEX> tag demonstration file "isin1.shtml" is HTML 2.0 Strict (Level 2) compliant, even when something is entered.
The HTML 2 Level 2 <FORM> and <INPUT> demonstration file "isin2.shtml" is HTML 2.0 (Level 2) compliant, even when something is entered.
The <ISINDEX> "PROMPT" attribute demonstration file "isin3.shtml" is HTML 3.2 compliant, even when something is entered.
The <ISINDEX> Stylesheet attributes demonstration file "isin4.shtml" is HTML 4.01 Transitional compliant, even when something is entered.
The <isindex /> XHTML 1.0 demonstration file "isin5.shtml" is XHTML 1.0 Transitional compliant, even when something is entered.
The Forms and Frames "<ISINDEX>" Emulation demonstration file "isin6.html" is HTML 4.01 Frameset compliant.
The Forms and Frames Upper portion demonstration file "isinf1.html" is HTML 4.01 Transitional compliant.
The Forms and Frames Lower portion demonstration file "isinf2.html" is HTML 4.01 Transitional compliant.
The Proprietary attribute "ACTION" demonstration file "isin7.html" is not compliant with any version of HTML.

