watson

Homepage: http://emarsden.chez.com/downloads

Author: Eric Marsden

Updated:

Summary

Query web search engines and aggregate results

Commentary

Overview ==========================================================

watson.el is an Emacs interface to web search engines such
as Altavista. Given a number of keywords to search for, it will
send the query to several search engines. The results are then
aggregated and displayed in a *watson* buffer. Currently backends
exist for the search engines Altavista, Google, Yahoo!, Excite,
Snap, ftpsearch, Dejanews and dmoz.org.

Entry points:

 * `M-x watson' which prompts for keywords in the minibuffer,
   dispatches the requests, then pops up the results according to
   the variable `watson-notify-method'

 * `M-x watson-referers' which prompts for a URL, then queries
   certain search engines to provide a list of web pages which link
   to that URL

 * `M-x watson-form' which provides a full-page form which allows
   you to customize different aspects of the search: limit the
   query to a subset of the available backends; select synchronous
   or asynchronous search.

watson.el tries to rank hits intelligently: if a URL is returned
by more than one search engine, its rank will be increased. Hits in
the same site are coalesced, with an increased rank. The ranking
also takes into account the order in which hits were presented by
the search engines.
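
The three ranking rules above can be illustrated with a small sketch.
This is not the code used by watson.el: it is written in Python for
brevity, the function name `rank_hits' is hypothetical, and the
weighting (1/position, summed per URL and then per site) is an
assumption chosen only to show how the rules might combine.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rank_hits(results):
    """Rank hits from several engines (illustrative weights only).

    results maps an engine name to its list of URLs, best hit first.
    Returns a list of (score, host, urls) tuples, best site first.
    """
    url_score = defaultdict(float)
    for urls in results.values():
        for position, url in enumerate(urls):
            # Earlier positions are worth more. Summing over engines
            # raises the score of a URL returned by several engines.
            url_score[url] += 1.0 / (position + 1)

    # Coalesce hits by site (host name) and add up their scores, so
    # that several hits on one site increase that site's rank.
    sites = defaultdict(list)
    for url, score in url_score.items():
        sites[urlparse(url).netloc].append((score, url))

    ranked = []
    for host, hits in sites.items():
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        total = sum(score for score, _ in hits)
        ranked.append((total, host, [url for _, url in hits]))
    ranked.sort(key=lambda entry: (-entry[0], entry[1]))
    return ranked
```

With one engine returning a.example/x then b.example/y, and another
returning b.example/y then b.example/z, the site b.example comes
first: one of its URLs was returned by both engines and it has two
distinct hits.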

The *watson* buffer is set up so that URLs are clickable (using the
`browse-url' package to dispatch to your favorite browser). `n' and
`p' move to the next and previous match respectively; `?' issues a
`HEAD' request to the server to obtain information such as the date
of last modification of the file. In XEmacs, button-3 pops up a
contextual menu.

The watson module will issue multiple HTTP requests in parallel if
the `watson-async' variable is non-nil (which is the default). In
this mode of operation it will use an external program such as lynx
for downloads (though Emacs/w3 is still required, for encoding
parameters). Otherwise requests are issued sequentially, using
Emacs/w3. On some braindead platforms without subprocess support
this is the only mode of operation which will work.
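
The difference between the two modes can be sketched as follows. This
is an illustration in Python, not watson.el code: the function
`dispatch', its `run_async' flag and the `backends' table are
hypothetical, and threads stand in for the external download
processes (such as lynx) that watson.el actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch(keywords, backends, run_async=True):
    """Send one query to every backend and collect the results.

    backends maps an engine name to a function that takes the
    keywords and returns that engine's list of hits (hypothetical).
    """
    if not run_async or not backends:
        # Sequential mode: one request at a time, in order.
        return {name: fetch(keywords) for name, fetch in backends.items()}
    # Parallel mode: start every request, then wait for all of them.
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fetch, keywords)
                   for name, fetch in backends.items()}
        return {name: future.result() for name, future in futures.items()}
```

Both modes return the same table of results; only the elapsed time
differs, since the parallel mode waits roughly as long as the slowest
engine rather than the sum of all of them.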

watson.el was inspired by the Sherlock web search program
shipped with recent releases of MacOS.

Tested with Emacs 20.4/Solaris, Emacs 20.2/NT, XEmacs 20.4/Solaris.

Please note that watson.el depends on intimate knowledge of the HTML
generated by the different search engines which it queries. This
HTML may change occasionally as search engines undergo redesigns. In
this event Watson will no longer work correctly for that search
engine, since it will no longer be able to extract the useful
information from the HTML markup. If this occurs, Watson should signal
an error telling you to upgrade to a newer release (which you
should be able to obtain from the URL above).

References:

   @InProceedings{dsl97*1,
     author =       "Luca Cardelli and Rowan Davies",
     title =        "Service Combinators for Web Computing",
     pages =        "1--10",
     ISBN =         "1-880446-89-8",
     booktitle =    "Proceedings of the Conference on Domain-Specific
                    Languages ({DSL}-97)",
     month =        oct # "~15--17",
     publisher =    "USENIX Association",
     address =      "Berkeley",
     year =         "1997",
   }


Thanks ============================================================

Thanks to Robert J. Chassell, Boris Goldowsky, Christoph Conrad
and Marko Schütz for many excellent suggestions.


TODO ==============================================================

How to handle complex boolean queries? Could implement our own
syntax, like (and "one" (or "two" "three")), which we then convert
to each engine's format. But some of the best search engines (like
Google) don't support boolean queries.
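
One possible shape for such a conversion is sketched below. It is an
illustration in Python rather than a proposed implementation: nested
tuples stand in for the Lisp syntax, the function `render' is
hypothetical, and the two style tables are invented for the example
and do not describe the syntax of any real search engine.

```python
def render(query, style):
    """Render a nested query in one engine's syntax.

    query is a keyword string or a tuple (operator, operand, ...),
    where operator is "and" or "or". style maps each operator the
    engine supports to the separator it uses (hypothetical tables).
    """
    if isinstance(query, str):
        # Quote multi-word keywords so they stay together.
        return '"%s"' % query if " " in query else query
    operator, *operands = query
    if operator not in style:
        raise ValueError("engine has no syntax for %r" % operator)
    parts = [render(operand, style) for operand in operands]
    return "(" + style[operator].join(parts) + ")"


# Hypothetical engine with explicit boolean keywords.
KEYWORD_STYLE = {"and": " AND ", "or": " OR "}
# Hypothetical engine that only supports implicit conjunction.
IMPLICIT_AND_STYLE = {"and": " "}

query = ("and", "one", ("or", "two", "three"))
rendered = render(query, KEYWORD_STYLE)
```

Here `rendered' is the string (one AND (two OR three)). Rendering the
same query with IMPLICIT_AND_STYLE raises an error, which is the
problem described above: an engine without boolean support cannot
express the `or' branch, so such a backend would have to be skipped or
the query simplified.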

Customization.

Dependencies