Search

One of the most important pieces of Reptile is its Search framework. This provides Reptile with a plugin infrastructure for tying in third-party search infrastructures such as JXTA and Lucene, and even the internal Torque DB index that Reptile uses.

Here is a UML sequence diagram which demonstrates how everything is put together.


SearchProviders

Essentially, everything is based around a SearchProvider.

All searches are abstracted with a SearchRequest.

After running a request, each SearchProvider is held by a SearchProviderManager.

SearchProviderManagers are also used for obtaining references to SearchProviders and for garbage collection activities.
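
The relationship between these three types can be sketched roughly as follows. This is only an illustration: every method name here (getCriteria, search, register, get, release) and the simplified result type are assumptions, not the actual Reptile API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shapes for illustration; the real Reptile types may differ.

/** A search request reduced to its list of criteria strings. */
interface SearchRequest {
    List<String> getCriteria();
}

/** Anything that can execute a SearchRequest (simplified to return titles). */
interface SearchProvider {
    List<String> search(SearchRequest request);
}

/** Holds providers by handle so later requests can find or release them. */
class SearchProviderManager {
    private final Map<String, SearchProvider> providers = new HashMap<>();
    private long nextHandle = 1;

    /** Stores the provider and returns the handle used to look it up later. */
    String register(SearchProvider provider) {
        String handle = Long.toString(nextHandle++);
        providers.put(handle, provider);
        return handle;
    }

    /** Returns the provider for a handle, or null if unknown or released. */
    SearchProvider get(String handle) {
        return providers.get(handle);
    }

    /** Drops the reference so the provider can be garbage collected. */
    void release(String handle) {
        providers.remove(handle);
    }
}

class SearchProviderSketch {
    /** A toy provider that matches criteria against in-memory titles, ignoring case. */
    static SearchProvider inMemory(List<String> titles) {
        return request -> {
            List<String> hits = new ArrayList<>();
            for (String title : titles) {
                for (String term : request.getCriteria()) {
                    if (title.toLowerCase().contains(term.toLowerCase())) {
                        hits.add(title);
                        break;
                    }
                }
            }
            return hits;
        };
    }

    public static void main(String[] args) {
        SearchProviderManager manager = new SearchProviderManager();
        SearchProvider provider = inMemory(Arrays.asList("Linux on the desktop", "BSD news"));
        String handle = manager.register(provider);
        SearchRequest request = () -> Arrays.asList("linux");
        System.out.println(manager.get(handle).search(request));
        manager.release(handle);
    }
}
```

The handle returned by register plays the role of the provider-handle attribute described in the Results section.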


Results

All Reptile search results are then serialized into an XML result set. This content is used by the Reptile sequence system to provide a UI for the user so that they can navigate the search results.

Since this is just XML, Reptile can provide an additional XSL stylesheet for each type of provider. For example, a ChannelSearchProvider can provide a stylesheet for navigating and subscribing to channels.

Example:


<!--

     Search document declaration. Includes all namespaces and:
     
     - provider-handle: used for future requests from this provider.

     - provider-state: ( search-started | search-in-progress | search-complete )
                       Basically the state the provider is in.

     - provider-name: The short classname of this SearchProvider.

     - request-name: The short classname of this SearchRequest (or
                     AdvancedSearchRequest)

     - search-started: the (UNIX) time this search started

     - search-completed: the (UNIX) time this search completed (optional)

     - search-time: the total number of milliseconds this request took to execute. (optional)

     -->
            
<search:search xmlns:search="http://schemas.openprivacy.org/reptile/search"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns="http://www.w3.org/1999/xhtml"
               provider-handle="1012887691"
               provider-state="search-complete"
               provider-name="ArticleSearchProvider"
               request-name="SearchRequest"
               search-started="1013209070341"
               search-completed="1013209070441"
               search-time="100">

    <!-- SearchProvider/SearchRequest-specific information can be added here.
         We can also add other Reptile-specific information here, including OCS
         feeds, subscriptions, monitors, etc. -->

    <!-- A serialization of the SearchRequest for 'Linux' -->
    <search:request>

        <search:criteria>

            <!-- List of strings given as criteria -->
            <xsd:string>Linux</xsd:string>
            
        </search:criteria>

        <search:search-fields>

            <xsd:string>TITLE</xsd:string>
            <xsd:string>DESCRIPTION</xsd:string>

        </search:search-fields>

        <search:sort-order>

            <xsd:string>DATE_FOUND</xsd:string>

        </search:sort-order>

    </search:request>

    <!--
         
         The results element provides a cross-SearchProvider mechanism for
         navigating through results.

         Attributes:

         - start: The index number for the first entry in this result set.

         - end: The index number for the last entry in this result set.

         - found: The total number of results this search has found.

         - total: The total number of entries this SearchProvider is exposed to.
         For in-memory databases this is provided.  For distributed
         systems such as JXTA, this can't be determined (because P2P
         systems by definition are non-deterministic) and could
         potentially be the same as 'found'.

         - page: the page number that this request falls on (uses a 0 based
         index). 
         
         - total-pages: The total number of pages this SearchProvider contains.
         This can be used by a stylesheet for providing a UI so
         that the user can navigate to the next page or an
         arbitrary page in the index.
         
     -->
    <search:results start="0" end="9" found="25" total="398" page="0" total-pages="40">

        <search:entry>

            <!-- title for this entry -->
            
            <dc:title>Linux on the desktop is alive and kicking!</dc:title>

            <!-- A description, this is optional as in RSS -->
            
            <dc:description>
                This article over at LinuxPlanet tries to convince
                the reader that because of recent events, Linux on the desktop is
                never going to happen.  The author couldn't be more wrong.  His
                logic doesn't hold up when compared to historical evidence.  He
                cites the death of Eazel as one example.  This is at the very
                minimum irrelevant.  The Linux movement has nothing to do with
                companies (even though it is nice to have their involvement).  The
                entire KDE project was created with little involvement from
                companies.
            </dc:description>

            <!--

                 link information including the date information.

                 In distributed systems that don't use the DB index, date-found is
                 the current time in milliseconds that we found the entry.

                 When using the index, this is the last time the metaupdate system
                 updated this entry.

                 - date-found: contains the date this link was found (in
                               milliseconds since Jan 1 1970) (required)

                 - last-updated: same as date-found but the time this URL was
                                 last updated. (required)

                 - channel: the channel (RSS, etc) that this URL is held
                            in. (optional)
                 
             -->
            
            <search:link date-found="1009940886"
                         last-updated="1009940889"
                         channel="http://www.slashdot.org/slashdot.rdf"
                         location="http://www.linuxplanet.com/linuxplanet/opinions/3387/1/"/>
        </search:entry>

    </search:results>

</search:search>
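
The paging attributes on search:results can be derived with a little integer arithmetic. The sketch below is an assumption-laden illustration, not Reptile's SearchSerializer: the class and method names are hypothetical and a page size of 10 is assumed. In the example above, total-pages="40" matches the rounded-up value of 398 / 10, so it appears to be derived from 'total'; the helper therefore takes the count as a parameter and leaves that choice to the caller.

```java
/** Paging values for one page of results, using 0-based indexes as in the example. */
final class PageWindow {
    final int start; // index of the first entry on the page
    final int end;   // index of the last entry on the page

    private PageWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Computes the start and end indexes for a page. An empty page reports
     * end == start, which mirrors start="0" end="0" for a search with no results.
     */
    static PageWindow of(int page, int pageSize, int found) {
        if (page < 0 || pageSize <= 0 || found < 0) {
            throw new IllegalArgumentException("page and found must be >= 0, pageSize > 0");
        }
        int start = page * pageSize;
        int lastIndex = Math.min(start + pageSize, found) - 1;
        return new PageWindow(start, Math.max(start, lastIndex));
    }

    /** Number of pages needed to show count entries, rounding up. */
    static int totalPages(int count, int pageSize) {
        if (count < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("count must be >= 0, pageSize > 0");
        }
        return (count + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        PageWindow first = PageWindow.of(0, 10, 25);
        System.out.println("start=" + first.start + " end=" + first.end
                + " total-pages=" + totalPages(398, 10));
    }
}
```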
            
            


Search runtime issues

Problems and runtime exceptions can happen during a search. These issues must be reported to the user in a reliable and convenient manner. To that end, every search result can carry an error element.


<search:search xmlns:search="http://schemas.openprivacy.org/reptile/search"
               xmlns:error="http://schemas.openprivacy.org/reptile/error"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns="http://www.w3.org/1999/xhtml"
               provider-handle="1012887948"
               provider-state="search-complete"
               provider-name="ArticleSearchProvider"
               request-name="SearchRequest">

    <!--
         A serialization of the SearchRequest for 'Linux'
         -->
    
    <search:request>

        <search:criteria>

            <!-- List of strings given as criteria -->
            <xsd:string>Linux</xsd:string>
            
        </search:criteria>

    </search:request>
    
    <search:results start="0" end="0" found="0" total="398" page="0" total-pages="0"/>
    
    <error:error type="error">

        No results found.

    </error:error>

</search:search>
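
A consumer of the result document can check for an error element before rendering results. The following is a minimal sketch using the standard JAXP DOM API; the class and method names are hypothetical, and only the error namespace URI is taken from the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper that pulls the first error message out of a search result. */
final class SearchErrorReader {
    static final String ERROR_NS = "http://schemas.openprivacy.org/reptile/error";

    /** Returns the trimmed text of the first error:error element, or null if none exists. */
    static String firstError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match elements by namespace URI
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList errors = doc.getElementsByTagNameNS(ERROR_NS, "error");
            if (errors.getLength() == 0) {
                return null;
            }
            return errors.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("search result is not parseable XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<search:search"
                + " xmlns:search=\"http://schemas.openprivacy.org/reptile/search\""
                + " xmlns:error=\"http://schemas.openprivacy.org/reptile/error\">"
                + "<search:results start=\"0\" end=\"0\" found=\"0\" total=\"398\""
                + " page=\"0\" total-pages=\"0\"/>"
                + "<error:error type=\"error\"> No results found. </error:error>"
                + "</search:search>";
        System.out.println(firstError(xml));
    }
}
```

Matching on the namespace URI rather than the error: prefix keeps the check working even if a document binds the namespace to a different prefix.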


Extensible search parameters

Certain types of SearchProviders support different types of search parameters that can't really be abstracted into a generic SearchRequest. A decent example would be Mojo Nation:

            "File retrieval on Mojo Nation begins with a content search. At the
            search page, the user can select from a growing number of content
            types, and each of the content types presents its own array of type
            fields to delineate the user's search (that is, the user could
            search for a certain 'bitrate' among the 'audio' content types, but
            not others). After the user provides his search criteria and clicks
            'search,' the Broker goes back to work"
        

If we wanted to map a Reptile search provider on top of Mojo Nation, and at the same time provide the type of query framework a Mojo Nation user might expect, we would need a way to provide Mojo Nation style content types within a SearchRequest.

Fortunately, such an extension mechanism exists. The SearchRequest object supports an ExtendedProperties object which accepts typed and multivalued properties. These are essentially name|value|type triples that a developer can set in order to tweak search parameters.
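
To make the name|value|type idea concrete, here is a small sketch of such a container. It is a hypothetical stand-in, not the real ExtendedProperties class: the method names, the Property type and the sample property names and types are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for ExtendedProperties; the real Reptile API may differ. */
final class ExtendedPropertiesSketch {

    /** One value together with its type, e.g. value "128" of type "integer". */
    static final class Property {
        final String value;
        final String type;

        Property(String value, String type) {
            this.value = value;
            this.type = type;
        }
    }

    // Each name maps to a list so that a property can carry several values.
    private final Map<String, List<Property>> properties = new LinkedHashMap<>();

    /** Adds one name|value|type triple; repeated names accumulate values. */
    void add(String name, String value, String type) {
        properties.computeIfAbsent(name, k -> new ArrayList<>())
                .add(new Property(value, type));
    }

    /** Returns all values set for a name, or an empty list if there are none. */
    List<Property> get(String name) {
        return Collections.unmodifiableList(
                properties.getOrDefault(name, Collections.<Property>emptyList()));
    }

    public static void main(String[] args) {
        ExtendedPropertiesSketch props = new ExtendedPropertiesSketch();
        // Sample names and types are made up for illustration.
        props.add("content-type", "audio", "string");
        props.add("bitrate", "128", "integer");
        for (Property p : props.get("bitrate")) {
            System.out.println(p.value + " (" + p.type + ")");
        }
    }
}
```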


Invoking searches through Actions

All searches through Reptile (when done through a browser) are executed by Actions. The default SearchAction accepts the following parameters:

reptile.search.order
channel | title | date_found
reptile.search.fields
Comma-separated list of fields to match, for example: title, description, location.
reptile.search.maxcount
Maximum number of items to return, for example 100, 200 or 300.
reptile.search.provider
The short classname of the provider to use to execute the query.
reptile.search.request
The short classname of the request to use to execute the query.
reptile.search.request-name
The name of a named request, e.g. 'NewestArticlesSearchRequest'.
reptile.search.provider-name
Execute a request on a specific provider.

All pages that need to invoke searches should use the Search action. This handles executing the search with the correct provider and search request, and redirects to the urn:search sequence with the correct parameters.
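
As a rough illustration of how a page might assemble these parameters, the sketch below builds only the URL-encoded query-string portion of a request. How the action itself is addressed is not covered here, and the helper class is hypothetical; only the parameter names come from the list above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that turns SearchAction parameters into a query string. */
final class SearchActionParams {

    /** Joins the parameters as name=value pairs separated by '&', URL-encoding both parts. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joiner.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("reptile.search.provider", "ArticleSearchProvider");
        params.put("reptile.search.request", "SearchRequest");
        params.put("reptile.search.fields", "title,description");
        params.put("reptile.search.order", "date_found");
        params.put("reptile.search.maxcount", "100");
        System.out.println(toQueryString(params));
    }
}
```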


Serializing requests and page navigation

Every search provider's results are broken down into individual atomic pages and presented to the user. This is similar to the page navigation used in any popular search engine. Each page displays around 10 results, and the user can navigate forward and backward through the result set.

The SearchSerializer handles breaking down a SearchProvider's results into XML documents, each of which represents a page.

All required XML is given to Reptile by the SearchExtension.

This extension is invoked as an Xalan extension element in the following manner:

<serializer:serialize page="2" provider-handle="1012870437"/>

When invoked within a sequence, we use the reptile.search.page parameter to determine which page number to serialize. The provider-handle is used to pop this provider from the ProviderManager.


Search sequences

The following stylesheet sequences are used by Reptile search.

urn:search-serialize
Serializes individual pages and provides XML output.
urn:search-serialize/control
Serializes individual pages and provides XHTML output within a Reptile control.
urn:search-serialize/page
Serializes individual pages and provides XHTML output within a Reptile page.
urn:search-request
Executes a search request and then serializes the results to XML.
urn:search-request/control
Executes a search request and then serializes the results within a Reptile control.
urn:search-request/page
Executes a search request and then serializes the results within a Reptile page.


Advanced search requests

Reptile also supports the concept of an AdvancedSearchRequest. This is a compiled search request that contains complex queries that may be specific to a provider.

For example, the UnreadArticlesSearchRequest will find all articles that have not been marked read.


Remote search providers

Reptile supports the concept of a RemoteSearchProvider. This allows us to invoke requests on behalf of a specific network binding. The search() request takes any action necessary to execute



Copyright © 2001-2003, OpenPrivacy.org