Search Catalog

Constructing a query
Understanding the Search Progress screen
Viewing the Output of Search Results
Controlling who can search through resources
Search engine performance measurements
Notes on software architecture
Limitations

1. Constructing a query

Queries are constructed with the three logical operators: AND, OR, and NOT. These logical operators, combined with search terms, create a logical expression (Boolean query). Search terms can be either alphanumeric words or phrases delimited by quotation marks. Logical operations can be grouped together with parentheses.

Examples include:

(Harrison and not Albertelli) or (Kortemeyer and "Alexander Sakharuk")
not "invariant set" and topology
prokaryot or bacteria
not einstein and not bohr

The search is case-insensitive. (Logical operators are also evaluated in a case-insensitive manner, e.g. and=AND.)
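The query rules above can be illustrated with a small evaluator. This is only a sketch, not the LON-CAPA implementation: the function names `tokenize` and `matches` are hypothetical, terms are matched as case-insensitive substrings (an assumption), and NOT is assumed to bind tighter than AND, which in turn binds tighter than OR (the text above does not state a precedence).

```python
import re

# A quoted phrase, a parenthesis, or an alphanumeric word.
TOKEN_RE = re.compile(r'"([^"]*)"|(\(|\))|([A-Za-z0-9]+)')

def tokenize(query):
    """Split a query into ('OP', name), ('PAREN', ch), or ('TERM', text) tokens."""
    tokens = []
    for phrase, paren, word in TOKEN_RE.findall(query):
        if paren:
            tokens.append(("PAREN", paren))
        elif word and word.lower() in ("and", "or", "not"):
            tokens.append(("OP", word.lower()))  # operators are case-insensitive
        else:
            tokens.append(("TERM", (phrase or word).lower()))
    return tokens

def matches(query, text):
    """Return True if text satisfies the Boolean query (assumed precedence: NOT > AND > OR)."""
    tokens = tokenize(query)
    haystack = text.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() == ("OP", "or"):
            advance()
            right = parse_and()  # always parse the right side, then combine
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == ("OP", "and"):
            advance()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == ("OP", "not"):
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        advance()
        if tok == ("PAREN", "("):
            value = parse_or()
            if peek() != ("PAREN", ")"):
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        if tok[0] == "TERM":
            return tok[1] in haystack  # substring match is an assumption
        raise ValueError("unexpected token: %r" % (tok,))

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected trailing token: %r" % (peek(),))
    return result
```

For instance, `matches('not "invariant set" and topology', "An introduction to topology")` evaluates the NOT first and then the AND under the assumed precedence.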

2. Understanding the Search Progress screen

The Search Progress screen provides five pieces of information. This information is dynamically updated every second as the search progresses across the LON-CAPA network.

The number of library servers being scanned.
The number of database hits found.
The time elapsed (in seconds).
A grid showing the response status of every LON-CAPA library server.
A window for displaying response details of individual LON-CAPA library servers.

The response status grid uses one symbol for each of the following states:

unknown; the server has yet to be contacted
a network connection cannot be established with the server
a network connection was established to the database, but search results have yet to be completely transmitted from the database
a network connection was established and all search results are transmitted; however, there are no matching records for this server for this search
a network connection was established and all search results are transmitted; there is at least one matching record on this server for this search
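The five states can be thought of as a small enumeration. The sketch below is illustrative only: the names `ServerStatus` and `classify`, and the four input flags, are hypothetical and do not come from the LON-CAPA code.

```python
from enum import Enum

class ServerStatus(Enum):
    """One value per state in the response status grid (names are hypothetical)."""
    UNKNOWN = "server has yet to be contacted"
    UNREACHABLE = "network connection cannot be established"
    PENDING = "connected, results not yet completely transmitted"
    NO_MATCHES = "all results transmitted, no matching records"
    HAS_MATCHES = "all results transmitted, at least one matching record"

def classify(contacted, connected, complete, hits):
    """Map what is known about one library server onto a grid state."""
    if not contacted:
        return ServerStatus.UNKNOWN
    if not connected:
        return ServerStatus.UNREACHABLE
    if not complete:
        return ServerStatus.PENDING
    return ServerStatus.HAS_MATCHES if hits > 0 else ServerStatus.NO_MATCHES
```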

3. Viewing the Output of Search Results

The interface provides four different ways to format the output of metadata information.

Detailed Citation View

Description: Per database record, this view shows the following fields: Owner, Last Revision Date, Title, Author, Subject, Keyword(s), Notes, MIME Type, Language, Copyright/Distribution, Extra custom metadata fields, and Short Abstract. This view is meant to show a nicely formatted, detailed listing of data describing a LON-CAPA resource.
Example:

Summary View

Description:
Example:

Fielded Format

Description:

field_name: field_value

Example:

XML/SGML

Description:

<field_name>field_value</field_name>

Example:
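As a rough illustration of the Fielded Format and XML/SGML patterns above, the following sketch renders one record both ways. The helper names `to_fielded` and `to_xml`, the lowercase field names, and the sample values are all hypothetical; this is not output captured from LON-CAPA.

```python
from xml.sax.saxutils import escape

def to_fielded(record):
    """Render a record as 'field_name: field_value' lines."""
    return "\n".join("%s: %s" % (name, value) for name, value in record.items())

def to_xml(record):
    """Render a record as <field_name>field_value</field_name> lines."""
    return "\n".join(
        # escape() protects &, < and > inside field values
        "<%s>%s</%s>" % (name, escape(str(value)), name)
        for name, value in record.items()
    )

# Hypothetical sample record; field names are assumptions.
sample = {"title": "Vectors & Forces", "author": "Example Author", "language": "en"}
print(to_fielded(sample))
print(to_xml(sample))
```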

4. Controlling who can search through resources

Currently, any user can see metadata for any published resource. We are working to change this and are considering two possibilities:

Browsing and searching should only be permitted if

* either the search is user specific (georgio can only browse and search
   /res/DOMAIN/georgio)
* or the user has advanced status, as indicated by $ENV{'user.adv'}

If a user can access a resource through the current role (student in a
class, etc.), then the resource should show up when searching and
browsing, even if resource conditionals prevent the user from actually
viewing the specific resource.  Advanced users can search and browse
"everywhere".
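One possible reading of the two proposals is sketched below. Neither function reflects implemented LON-CAPA behavior: the function names, the `is_advanced` flag (standing in for $ENV{'user.adv'}), and the `role_resources` set are assumptions made for illustration.

```python
def visible_if_own_or_advanced(username, domain, resource_path, is_advanced):
    """First proposal: users see only /res/DOMAIN/username unless they are advanced."""
    if is_advanced:
        return True
    own_prefix = "/res/%s/%s/" % (domain, username)
    return resource_path.startswith(own_prefix)

def visible_if_in_role(resource_path, role_resources, is_advanced):
    """Second proposal: resources reachable through the current role show up,
    regardless of resource conditionals; advanced users see everything."""
    if is_advanced:
        return True
    return resource_path in role_resources
```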

5. Search engine performance measurements

6. Notes on software architecture

LON-CAPA is meant to distribute A LOT of educational content to A LOT of people. Scanning files in the ext2 filesystem directly is too slow for on-the-fly searches of content descriptions. (Simply put, it takes a cumbersome amount of time to open, read, analyze, and close thousands of files.)

The solution is to hash-index various data fields that are descriptive of the educational resources on a LON-CAPA server machine. Descriptive data fields are referred to as "metadata". The question then arises of how this metadata is handled across the rest of the LON-CAPA network without burdening client and daemon processes. I now answer this question in the format of Problem and Solution below.

PROBLEM SITUATION:

  If Server A wants data from Server B, Server A uses a lonc process to
  send a database command to a Server B lond process.
    lonc= loncapa client process    A-lonc= a lonc process on Server A
    lond= loncapa daemon process

                 database command
    A-lonc  --------TCP/IP----------------> B-lond

  The problem emerges that A-lonc and B-lond are kept waiting for the
  MySQL server to "do its stuff", or in other words, perform the conceivably
  sophisticated, data-intensive, time-consuming database transaction.  Tying
  up a lonc and a lond process in this way significantly cripples the
  capabilities of LON-CAPA servers.

  While commercial databases have a variety of features that ATTEMPT to
  deal with this, freeware databases are still experimenting with
  different schemes, with varying degrees of performance stability.

THE SOLUTION:

  A separate daemon process was created that B-lond works with to
  handle database requests.  This daemon process is called "lonsql".

  So,
                database command
  A-lonc  ---------TCP/IP-----------------> B-lond =====> B-lonsql
         <---------------------------------/                |
           "ok, I'll get back to you..."                    |
                                                            |
                                                            /
  A-lond  <-------------------------------  B-lonc   <======
           "Guess what? I have the result!"

  Of course, depending on success or failure, the messages may vary,
  but the principle remains the same: a separate pool of child
  processes (lonsql's) handles the MySQL database manipulations.
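The handoff in the diagram can be imitated in a few lines. This is a toy model under stated assumptions, not the LON-CAPA code: a thread stands in for a lonsql child process, an in-memory queue stands in for the link between lond and lonsql, and the names `lonsql_worker`, `Lond`, `deliver`, and `demo` are hypothetical.

```python
import queue
import threading

def lonsql_worker(jobs, deliver):
    """Stand-in for a lonsql child: runs queries so that lond never has to wait."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            break
        request_id, run_query = job
        # Corresponds to "Guess what? I have the result!" in the diagram.
        deliver(request_id, run_query())

class Lond:
    """Stand-in for B-lond: queues the database command and replies at once."""
    def __init__(self, jobs):
        self.jobs = jobs
        self.next_id = 0

    def handle(self, run_query):
        self.next_id += 1
        self.jobs.put((self.next_id, run_query))
        # Corresponds to "ok, I'll get back to you..." in the diagram.
        return "ok", self.next_id

def demo():
    jobs = queue.Queue()
    results = {}  # plays the role of A-lond receiving results later
    worker = threading.Thread(target=lonsql_worker, args=(jobs, results.__setitem__))
    worker.start()
    ack, request_id = Lond(jobs).handle(lambda: ["record 1", "record 2"])
    jobs.put(None)
    worker.join()  # wait only so the demo can inspect the delivered result
    return ack, request_id, results
```

The point of the design is visible in `handle`: it returns as soon as the job is queued, so the process that accepted the request is free again while the query runs elsewhere.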

7. Limitations

A metadata search can consist only of spaces and alphanumeric characters. Other characters are illegal and are filtered out when the search request is sent to the search engine.

LON-CAPA library servers are given 9 seconds to inform another server that they are in the process of generating a reply to a search request. Note that this is DIFFERENT from actually conducting the search. Upon initial communication, the individual library servers just send a response key to indicate the name of the results file that is going to be generated.

LON-CAPA library servers will only send up to 100 records in response to a search.

The output of matching records is limited to 200 records.

The caps of 100 and 200 records should eventually be user-modifiable. These limitations exist to avoid processing overly expansive search requests.
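The character filter and the two caps can be sketched as follows. This is a minimal illustration, not the LON-CAPA implementation: the names `sanitize`, `cap_results`, `PER_SERVER_LIMIT`, and `DISPLAY_LIMIT` are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the 200-record limit is assumed to apply to the merged output from all servers.

```python
import re

PER_SERVER_LIMIT = 100  # records a library server sends per search
DISPLAY_LIMIT = 200     # records shown in the output (assumed to be the merged total)

def sanitize(query):
    """Drop every character that is not an ASCII letter, digit, or space."""
    return re.sub(r"[^A-Za-z0-9 ]", "", query)

def cap_results(per_server_results):
    """Apply the per-server cap to each list, then the overall display cap."""
    merged = []
    for records in per_server_results:
        merged.extend(records[:PER_SERVER_LIMIT])
    return merged[:DISPLAY_LIMIT]
```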