Search Websites
| Version | ht://Dig 3.1.5 |
| Hardware Specifics | 384M RAM -- Disk Consideration |
| Production Directory | /usr/www/ss/ |
| Test Directory | /usr/www/ssd/ |
/usr/www/ss/ is actually the home directory for a WWW account set up for the special purpose of providing a UGA-wide site search (hence the name ss for the account). This home directory contains the following sub-directories:
home.html in this directory is nothing more than a pointer to the initial search page http://www.uga.edu/search.
The local directory contains UGA images used on the results pages.
ScriptAlias /ss-bin/ /usr/www/ss/bin/
A simple example of the HTML which invokes htsearch:
<form method="post"
action="/ss-bin/htsearch">
<font size=-1
face="helvetica,arial,universal">
Search UGA Websites:<br>
</font>
<font class="FORM"
size=-1
face="helvetica,arial,universal">
<input
type="text"
size=15
name="words"
value="">
</font>
<input
type="submit"
value="Go">
<input
type=hidden
name=method
value=and>
<input
type=hidden
name=config
value=htdig>
<input
type=hidden
name=restrict
value="">
<input
type=hidden
name=exclude value="">
</form>
rundig is the single command which actually performs the indexing (rundigt is a test program). It is slightly modified from the distribution:
max_hop_count: 5
The hop count sets the number of "hops" followed from the UGA homepage for the pages to be included in the UGA primary database.
# Run ht://Dig
0 9 * * 5 /usr/www/ss/htdig/bin/rundig -a > /dev/null 2>&1
#
The -a creates the "alternate" .work files. Default configuration file htdig.conf is used (more on the configuration files in htdig/conf).
htdig.conf is the default configuration file, used to create the UGA primary database.
htdig-u.conf is based on the default configuration file. It sets a different common_dir for general use (more on common_dir in htdig/common) in conjunction with the restrict variable in the HTML form which invokes htsearch. The UGA primary database is searched, but only results matching the value of restrict are returned. Example (the changed lines are the config and restrict inputs):
<form method="post"
action="/ss-bin/htsearch">
<font
size=-1
face="helvetica,arial,universal">
Search UGA Websites:<br>
</font>
<font
class="FORM"
size=-1
face="helvetica,arial,universal">
<input
type="text"
size=15
name="words"
value="">
</font>
<input
type="submit"
value="Go">
<input
type=hidden
name=method
value=and>
<input type=hidden
name=config value=htdig-u>
<input type=hidden
name=restrict
value="http://www.coe.uga.edu/coenews/">
<input
type=hidden
name=exclude
value="">
</form>
Notice the use of an absolute URL for the value of restrict. This is required: it restricts the search to a particular site and also lets the results pages link back to that site by using the value of restrict (more on the results pages in htdig/common).
Additional documentation provides more complete instructions on the use of restrict.
The remaining configuration files not owned by root are maintained by departmental webmasters (with the assistance of the UWC). These webmasters are encouraged to use restrict if at all possible. The *t.conf configuration files are provided to the departmental webmasters for testing purposes.
footer.html header.html nomatch.html syntax.html
The other .html files are output templates which could be used in addition to, or in place of, the ones above. The ht://Dig website has more information on these other templates.
common is used to customize output for the search results pages used on the main UGA pages (e.g., a search performed from the UGA homepage). common-u is used in conjunction with the restrict variable, as described in htdig/conf (as in this example: WWW Pages for Departments, Organizations, and Units). The graphic used for the results pages is different and the .html files in common-u include a URL to the restrict value (absolute URL required):
<a href="$(RESTRICT)">$(RESTRICT)</a>
The non-HTML files (most of which end in .db) are endings and synonym databases. These files are semi-static and are not rebuilt each time a new index is created. It is important to note, however, that new .db files will only be created in common and are linked to common-u.
Subsequent to each rundig, a set of alternate work files with .work appended to the name of the file is created. The UWC is responsible for moving the .work files to the production equivalent (removing the .work extension).
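The move described above can be sketched as a small shell function. This is only an illustration, not the UWC's actual procedure: the function name promote_work_files is hypothetical, and it assumes every file ending in .work in the given directory should replace its production equivalent.

```shell
# Hypothetical helper (not part of ht://Dig): promote every *.work file in
# a database directory to its production name by stripping the .work suffix.
promote_work_files() {
    db_dir=$1
    for work_file in "$db_dir"/*.work; do
        # If nothing matched, the glob pattern is left unexpanded; skip it.
        [ -e "$work_file" ] || continue
        # ${work_file%.work} is the same path with the trailing .work removed.
        mv -- "$work_file" "${work_file%.work}"
    done
}
```

It could be called as promote_work_files /usr/www/ssd/htdig/db (the test database directory named in this document) once indexing has finished and the current production files have been backed up.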
It can take as long as 36 to 48 hours for rundigt to complete. The UWC should use this time estimate, the relative size of the .work files, and the system top command (to ensure that neither htdig nor htmerge is running) to determine when the run has finished.
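As a scripted alternative to watching top, the process check can be sketched as a filter over a list of process names. The function name indexer_in_list is hypothetical, and the sketch assumes the indexing programs appear in the process list as htdig and htmerge, with or without a leading path.

```shell
# Hypothetical helper: reads process names (one per line) on standard input
# and succeeds if htdig or htmerge is among them, with or without a path.
indexer_in_list() {
    grep -E -q '(^|/)(htdig|htmerge)$'
}

# Typical use (ps -e -o comm= prints one command name per line):
#   if ps -e -o comm= | indexer_in_list; then
#       echo "indexing still in progress"
#   fi
```

Reading the list from standard input keeps the check independent of how the process list is produced on a given system.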
Total disk space required for the indexes is approximately 4 times the amount of the set of production database files -- the set itself, the .work files, the temporary files used for sorting and merging (as defined in TMPDIR in rundig), and a backup set of the production database files.
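The four-times rule above is simple arithmetic, sketched here as a shell function. The function name required_kb is hypothetical, and the figures in the comments are illustrative only, not measurements from the UGA system.

```shell
# Hypothetical helper: given the size of the production database set in KB,
# print the approximate total space needed (production set + .work files +
# temporary sort/merge files + one backup set = 4 x production size).
required_kb() {
    production_kb=$1
    echo $((production_kb * 4))
}

# Illustrative use: a 500000 KB production set needs about 2000000 KB.
#   required_kb 500000
# The current size of a directory can be fed in with du (path is an example):
#   required_kb "$(du -sk /usr/www/ssd/htdig/db | cut -f1)"
```

Comparing the result with the free space reported by df before starting a run gives an early warning that the index may not fit.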
A mirror account and CGI directory are available for testing new versions of ht://Dig. The account is ssd with a home directory of:
/usr/www/ssd
The CGI directory is bin within this directory and the Apache server configuration file /usr/local/apache//conf/httpd.conf includes the following ScriptAlias directive to enable this directory as a CGI directory:
ScriptAlias /ssd-bin/ /usr/www/ssd/bin/
Be sure to remove the public_html, htdig, and bin directories before installing a new version of ht://Dig in /usr/www/ssd. This will ensure a new initial set of binaries and related files.
/usr/local/src/htdig
Ungzip and untar the file here; cd to the newly created directory; read README for general installation instructions. (Also see these local instructions for any special considerations, should they exist.) Follow the pointer in README to the installation document.
Run:
configure
as described in the installation document.
When configure has completed, edit CONFIG. There are several values that need to be changed in CONFIG and should be changed only after reviewing the current production version's CONFIG file. Change the new version's CONFIG file with respect to the mirror account -- replacing ss with ssd throughout the file.
After editing CONFIG, run
make
It may take a little while, but all the binaries should build without incident.
After make, run
make install
to install the ht://Dig components as specified in CONFIG.
HTML forms located in:
/usr/local/apache//htdocs/search/test
and accessed via the URL:
http://www.uga.edu/search/test
can be used to test:
ssd
and NOT:
ss
This is particularly important with respect to where the database files are to be written, since it would be possible to overwrite the production database files. The test database files should be written to:
/usr/www/ssd/htdig/db
and rundigt should be run from:
/usr/www/ssd/htdig/bin
initially as:
./rundigt -c ../conf/test.conf &
and subsequently as:
./rundigt -a -c ../conf/test.conf &
Keep in mind that the -a creates an "alternate" set of work files named *.work. These files must be moved to names with .work removed.
If testing indicates that the HTML which invokes htsearch requires modification, the UWC should be aware that this HTML may be incorporated into other main UGA pages for which the UWC is responsible (in addition to the homepage). These pages can be located by using the file system search tool glimpse and searching for the word htsearch.
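Where a glimpse index is not available, the same search can be approximated with standard tools. The sketch below is a stand-in for glimpse, not a description of it. The function name find_htsearch_pages is hypothetical, and it simply lists every file under a directory tree that contains the word htsearch.

```shell
# Hypothetical helper: list files under a directory tree that mention
# htsearch. A plain find/grep stand-in for the glimpse search.
find_htsearch_pages() {
    find "$1" -type f -exec grep -l 'htsearch' {} +
}

# Illustrative use against the document root named in this document:
#   find_htsearch_pages /usr/local/apache//htdocs
```

Unlike glimpse, this scans every file on each run, so it is slower on a large tree but needs no index.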
After a successful test, and if it is determined that significant changes will result with an upgrade to the new version, inform the departmental webmasters with their own configuration files of new version testing. Offer assistance if anyone wishes to participate. Also mail the UGA Webmasters discussion list (ugawww@listserv.uga.edu) to announce testing and solicit comments and participation from this group. Subsequent mailings may also be required during installation if significant changes result from the upgrade. The level of required communication is at the discretion of the UWC and should be proportional to the assessed impact of upgrade to the new version.
Build a complete UGA primary database and test it with the test forms. If successful, be sure to keep the database files. The database files will be used as the production UGA primary database when the new version of ht://Dig is installed as the production version.
Begin the transition by informing the UGA Webmasters discussion list (ugawww@listserv.uga.edu) of when the transition will take place. Done properly, there should be no interruption of service to searches performed from the UGA homepage and all subsidiary searches, including those using restrict.
Service disruptions are possible for departmental webmasters with their own configuration files, because existing databases are often not compatible with new versions of ht://Dig. However, since these are typically relatively small databases, they can be rebuilt in a matter of minutes. The impact of the interruption can be mitigated by short-term use of restrict.
Proceed as follows:
ScriptAlias /ss-bin/ /usr/www/ss/bin/
ScriptAlias /ssd-bin/ /usr/www/ssd/bin/
to
ScriptAlias /ss-bin/ /usr/www/ssd/bin/
ScriptAlias /ssd-bin/ /usr/www/ss/bin/
and restart httpd. Swapping the two actual locations (second argument) enables ssd as the actively used version and creates a temporary ScriptAlias for the new version's permanent location (ss). Perform a few searches from the UGA homepage, which should be successful. It is quite possible that a few images will appear broken (due to hardcoded paths to images in ss/images). When the new version is installed in ss, the images will re-appear.
tar -cvf /tmp/ss.tar ss
Move ss.tar to a permanent location at the completion of the new ht://Dig install.
userdel -r ss
and recreate the account as any new account would be created (with the UWC as the website administrator). This ensures a clean install location.
After editing CONFIG, run
make
It may take a little while, but all the binaries should build without incident.
After make, run:
make install
to install the ht://Dig components as specified in CONFIG.
Check all files to ensure that changes necessitated by the new version are applied. Pay particular attention to rundig and htdig.conf. Make sure that rundig and htdig.conf reference:
ss
and NOT:
ssd
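One way to check for leftover references to the mirror account is a simple text search, sketched below. The function name find_ssd_refs is hypothetical, and the sketch assumes that leftover references take the form of the path /usr/www/ssd, so a bare ssd that appears without that path would not be caught.

```shell
# Hypothetical helper: print, with file names and line numbers, every line
# in the given files that still refers to the mirror account's directory.
# Succeeds (exit 0) only when at least one such line is found.
find_ssd_refs() {
    grep -n '/usr/www/ssd' "$@"
}

# Illustrative use (file locations are assumptions based on this document):
#   find_ssd_refs /usr/www/ss/htdig/bin/rundig /usr/www/ss/htdig/conf/htdig.conf
```

No output means neither file mentions /usr/www/ssd. Paths under /usr/www/ss/ do not match, because the pattern requires the trailing d.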
There is no need to create backup files in ss because ssd serves as a backup area to ss after the above files are copied. System backups are also available for all files located in ssd and ss.
Do not proceed with the next step unless tests are successful.
ScriptAlias /ss-bin/ /usr/www/ss/bin/
ScriptAlias /ssd-bin/ /usr/www/ssd/bin/
and restart httpd.
As mentioned earlier, rundig creates temporary work files. Before these files are moved to their production equivalents, back up the current production files and remove old backups. Always keep one generation backed up.
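The backup step can be sketched as follows. This is an illustration under stated assumptions rather than the UWC's actual procedure: the function name backup_production_files and both directory arguments are hypothetical, and a single backup directory is assumed to hold the one generation that is kept.

```shell
# Hypothetical helper: replace the previous backup generation with a copy
# of the current production database files, leaving *.work files untouched.
backup_production_files() {
    db_dir=$1
    backup_dir=$2
    # Refuse to run with a missing argument, since the old backup is removed.
    [ -n "$db_dir" ] && [ -n "$backup_dir" ] || return 1
    rm -rf -- "$backup_dir"        # remove the old backup generation
    mkdir -p -- "$backup_dir"
    for f in "$db_dir"/*; do
        [ -f "$f" ] || continue    # regular files only
        case $f in
            *.work) continue ;;    # pending work files are not production files
        esac
        cp -p -- "$f" "$backup_dir"/
    done
}
```

After this completes, the .work files can be moved to their production names, and the backup directory holds the generation that was just replaced.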
As a last resort in the event that the newest iteration of database files as well as the backups fail, modify the hop count in rundig from:
-h 6
to
-h 3
and then run the program "by hand":
rundig
This will create a small but functional database in a relatively short period of time. When the small database is created, reset the hop count to 6 and run rundig -a. As documented earlier, the -a option creates an alternate set of .work files which must be moved to the production equivalent (removing the .work extension).