merged 1.1 branch into head

[mir.git] / doc / developers-guide / search.xml
diff --git a/doc/developers-guide/search.xml b/doc/developers-guide/search.xml

new file mode 100755 (executable)

index 0000000..da5a628
--- /dev/null
+++ b/doc/developers-guide/search.xml
@@ -0,0 +1,89 @@
+<chapter id="search_framework">
+<title>Search Framework</title>
+<para>For an introduction, read the short presentation of the <glossterm linkend="search">search framework</glossterm>. The Javadoc is also worth consulting.</para>
+<section><title>The SearchTerm class</title>
+<para>
+The <classname>SearchTerm</classname> class encapsulates the
+relationships between:
+</para>
+<itemizedlist>
+<listitem>A field or property of a Content Entity</listitem>
+<listitem>A field of a Lucene Document</listitem>
+<listitem>An HTTP query parameter</listitem>
+<listitem>A bit of HTML on a search results page</listitem>
+</itemizedlist>
+<para>
+The basic idea is that how you index, query, and display a
+particular field in a resource are all intimately related, possibly
+more so than how you index two different fields of the same
+resource.
+</para>
+<para>
+Instances of classes implementing <classname>SearchTerm</classname>
+are created when a Mir content entity is indexed by the
+<classname>IndexingProducerNode</classname> class. The index method of
+each instance is called in turn, and each call adds a bit of
+information to the Lucene document. The document is added to the
+index once all of its fields have been specified.
+<classname>ServletModuleOpenIndy</classname> creates instances of the
+same classes to construct a query to match against the Lucene index.
+Here the makeTerm methods are called in turn: each one picks the
+parameter it wants out of the request and constructs the
+corresponding fragment of the Lucene query, and the fragments are
+ultimately concatenated. The same classes also return the template
+models that represent any hits to be displayed as a result of
+processing the query.
+</para>
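+<para>
+The two call sequences described above can be sketched roughly as
+follows. This is a simplified illustration, not Mir's actual code: the
+classes SketchSearchTerm and SearchTermSketch, the constructor
+arguments, the use of java.util.Properties to stand in for the entity,
+the Lucene document and the HTTP request, and the " AND " used to join
+the fragments are all assumptions made for the example. Only the
+method names index and makeTerm come from the description above.
+</para>

```java
import java.util.Properties;

// Hypothetical stand-in for a class implementing SearchTerm. It ties together
// an entity field, an index field and an HTTP query parameter.
class SketchSearchTerm {
    private final String entityField; // field of the content entity (assumed name)
    private final String indexField;  // field of the index document (assumed name)
    private final String paramName;   // HTTP query parameter (assumed name)

    SketchSearchTerm(String entityField, String indexField, String paramName) {
        this.entityField = entityField;
        this.indexField = indexField;
        this.paramName = paramName;
    }

    // Indexing side: copy this term's value from the entity into the document.
    void index(Properties entity, Properties document) {
        String value = entity.getProperty(entityField);
        if (value != null) {
            document.setProperty(indexField, value);
        }
    }

    // Query side: pick this term's parameter out of the request and build a
    // query fragment from it. An empty string means nothing to contribute.
    // Escaping of special characters in the value is omitted for brevity.
    String makeTerm(Properties request) {
        String value = request.getProperty(paramName);
        if (value == null || value.trim().isEmpty()) {
            return "";
        }
        return indexField + ":\"" + value.trim() + "\"";
    }
}

class SearchTermSketch {
    // Call index on every term in turn to fill in one document.
    static Properties buildDocument(SketchSearchTerm[] terms, Properties entity) {
        Properties document = new Properties();
        for (SketchSearchTerm term : terms) {
            term.index(entity, document);
        }
        return document;
    }

    // Call makeTerm on every term in turn and join the non-empty fragments.
    static String buildQuery(SketchSearchTerm[] terms, Properties request) {
        StringBuilder query = new StringBuilder();
        for (SketchSearchTerm term : terms) {
            String fragment = term.makeTerm(request);
            if (fragment.isEmpty()) {
                continue;
            }
            if (query.length() > 0) {
                query.append(" AND "); // the joiner is an assumption
            }
            query.append(fragment);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        SketchSearchTerm[] terms = {
            new SketchSearchTerm("title", "title", "search_title"),
            new SketchSearchTerm("content_data", "content", "search_content")
        };

        Properties entity = new Properties();
        entity.setProperty("title", "Example title");
        entity.setProperty("content_data", "Example body text");
        System.out.println(buildDocument(terms, entity));

        Properties request = new Properties();
        request.setProperty("search_content", "body");
        System.out.println(buildQuery(terms, request));
    }
}
```

+<para>
+Keeping both methods on one object means that the name of the index
+field is written down in a single place, which is the coupling between
+indexing and querying that the SearchTerm idea is built around.
+</para>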
+</section>
+<section><title>Available search classes</title>
+<variablelist>
+<varlistentry>
+<term><classname>ContentSearchTerm</classname></term><listitem>Tokenizes and indexes a string field of an Entity,
+but does not store it for retrieval (used for content_data).
+</listitem></varlistentry>
+
+<varlistentry>
+<term><classname>ImagesSearchTerm</classname></term><listitem>Indexes whether or not an Entity has associated
+images, and also stores the URLs of those images for retrieval in the search results.
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>KeywordSearchTerm</classname></term><listitem>Indexes a field and stores it for retrieval, but
+does not tokenize it. Useful for values such as strings representing
+dates.
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>MediaSearchTerm</classname></term><listitem>Not used.
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>AudioSearchTerm</classname></term><listitem>Indexes whether an Entity has audio.
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>VideoSearchTerm</classname></term><listitem>Indexes whether an Entity has video.
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>TextSearchTerm</classname></term><listitem>Tokenizes and indexes a string field of an Entity,
+and stores it for retrieval (used for description).
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>TopicSearchTerm</classname></term><listitem>Used for indexing and querying documents by Topic.
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>UnIndexedSearchTerm</classname></term><listitem>Stores some metadata for retrieval with a hit (for example, a URL).
+</listitem></varlistentry>
+<varlistentry>
+<term><classname>UnStoredSearchTerm</classname></term><listitem>Not currently used.
+</listitem></varlistentry>
+</variablelist>
+
+</section>
+<section><title>Lucene field types</title>
+<para>
+The following brief guide to Lucene field types helps in figuring
+out what a particular SearchTerm does:
+</para>
+
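+<variablelist>
+<varlistentry>
+<term>Keyword</term><listitem>stored and indexed, but not tokenized
+</listitem></varlistentry>
+<varlistentry>
+<term>Text</term><listitem>tokenized, stored, and indexed
+</listitem></varlistentry>
+<varlistentry>
+<term>Unindexed</term><listitem>stored only; neither tokenized nor indexed
+</listitem></varlistentry>
+<varlistentry>
+<term>Unstored</term><listitem>tokenized and indexed, but not stored
+</listitem></varlistentry>
+</variablelist>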
+
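+<para>
+The four field types listed above differ only in three yes/no
+properties, which the following sketch writes out as a table. The enum
+FieldKind and its helper methods are made up for this example and are
+not part of Lucene's API; the flag values are taken directly from the
+guide above. Read this way, the description of ContentSearchTerm
+(tokenized and indexed, but not stored) corresponds to the UNSTORED
+row.
+</para>

```java
// Illustrative table of the four field types as three independent flags.
// FieldKind is a hypothetical name, not a Lucene class.
enum FieldKind {
    //        stored  indexed tokenized
    KEYWORD  (true,   true,   false),
    TEXT     (true,   true,   true),
    UNINDEXED(true,   false,  false),
    UNSTORED (false,  true,   true);

    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldKind(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    // A query can only match on a field that was indexed.
    boolean searchable() {
        return indexed;
    }

    // A value can only be shown with a hit if it was stored.
    boolean retrievable() {
        return stored;
    }
}

class FieldKindDemo {
    public static void main(String[] args) {
        for (FieldKind kind : FieldKind.values()) {
            System.out.println(kind
                + ": stored=" + kind.stored
                + ", indexed=" + kind.indexed
                + ", tokenized=" + kind.tokenized);
        }
    }
}
```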
+</section>
+
+</chapter>
+\ No newline at end of file