Kriterion

  Search Engine and Information Analysis

What is Kriterion?

Kriterion is a fully functional implementation of a document retrieval method.
Work began on 29 November 1999 as part of a suite of tools intended to form a SIGINT (signals intelligence) package.
The search engine first became operational in 2000, once all major bugs had been removed.

What are the system requirements?

Before running Kriterion, the following requirements must be met:
  • an Apache server must be installed for the user interface
  • gcc must be available to compile the two C source files
  • all documents to be searched must first be converted to ASCII text

What is part of the package?

  • a server-side search engine that listens on a user-defined port
  • a web client for communicating with the server
  • a small tool to convert Word documents to ASCII
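The bundled conversion tool is not described here in detail. As a rough stand-in, the following C sketch does what the Unix `strings` utility does: it copies every run of at least `MIN_RUN` printable ASCII characters from a binary file, such as a legacy Word document, to a plain-text output, one run per line. The function name `extract_ascii` and the threshold of 4 are assumptions for illustration. This is not Kriterion's actual tool, and it does not parse the Word format, so printable formatting debris will slip through.

```c
#include <stdio.h>
#include <ctype.h>

/* Minimum length of a printable run worth keeping (hypothetical value,
   similar in spirit to the default of the Unix `strings` utility). */
#define MIN_RUN 4

/* Copy runs of at least MIN_RUN printable ASCII characters (plus tabs)
   from `in` to `out`, one run per line. Returns the number of runs written. */
size_t extract_ascii(FILE *in, FILE *out)
{
    char run[MIN_RUN];   /* holds the start of a run until it is long enough */
    size_t len = 0;      /* length of the current run so far */
    size_t runs = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c < 128 && (isprint(c) || c == '\t')) {
            if (len < MIN_RUN) {
                run[len] = (char)c;
                if (++len == MIN_RUN) {
                    /* the run has reached the threshold: flush the buffer */
                    fwrite(run, 1, MIN_RUN, out);
                    runs++;
                }
            } else {
                fputc(c, out);   /* run already emitted: copy directly */
                len++;
            }
        } else {
            if (len >= MIN_RUN)
                fputc('\n', out);   /* terminate the emitted run */
            len = 0;                /* short runs are silently dropped */
        }
    }
    if (len >= MIN_RUN)
        fputc('\n', out);
    return runs;
}
```

A caller would open the Word file with `fopen(path, "rb")` and pass an output file opened for writing.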

How does it work exactly?

The search engine works on documents in ASCII text format.
Each document is parsed in a special manner, and a score is calculated for it.
Once all documents have been processed, the main program listens on a port that must be passed as a command-line argument when the service is started.
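The command-line handling and the listening socket could look roughly like the following sketch, which uses the POSIX sockets API. The function names `parse_port` and `open_listener`, the backlog of 16 and the choice to listen on all interfaces are assumptions, not taken from the Kriterion sources.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Parse a TCP port from a command-line argument; returns -1 if the
   argument is not a whole number in the range 1..65535. */
int parse_port(const char *arg)
{
    char *end;
    long port;

    errno = 0;
    port = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || port < 1 || port > 65535)
        return -1;
    return (int)port;
}

/* Create a TCP socket listening on all interfaces at `port`.
   Returns the socket descriptor, or -1 on failure. */
int open_listener(int port)
{
    struct sockaddr_in addr;
    int opt = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* allow a quick restart of the service on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof opt);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A `main` function would call `parse_port(argv[1])`, exit with an error message if it returns -1, and otherwise pass the port to `open_listener` and loop over `accept()` on the returned descriptor.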
A user then opens the web interface, and a Java servlet provided with the UI connects to the main program's port.
A search query typed into an input field is sent to the main program.
The client and server communicate over a simple, proprietary protocol, and the client receives the result synchronously.
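The protocol itself is not documented on this page. Purely as an illustration, assume a line-based format in which the servlet sends one request line per connection, either `QUERY <words>` or `SIMILAR <document-id>`. The server could then dispatch requests with a parser like the one below. The command names, `enum request_type` and `parse_request` are all hypothetical.

```c
#include <string.h>

/* Hypothetical request types; the real Kriterion protocol is not documented here. */
enum request_type { REQ_INVALID, REQ_QUERY, REQ_SIMILAR };

/* Parse one request line such as "QUERY some words" or "SIMILAR 42".
   The line terminator is stripped in place. On success, *arg points into
   `line` at the request's argument; otherwise *arg is set to NULL. */
enum request_type parse_request(char *line, char **arg)
{
    size_t n = strcspn(line, "\r\n");
    line[n] = '\0';   /* cut off "\n" or "\r\n" */

    if (strncmp(line, "QUERY ", 6) == 0) {
        *arg = line + 6;
        return REQ_QUERY;
    }
    if (strncmp(line, "SIMILAR ", 8) == 0) {
        *arg = line + 8;
        return REQ_SIMILAR;
    }
    *arg = NULL;
    return REQ_INVALID;
}
```

Reading one such line, writing the result list back and then closing the connection would give the synchronous request and reply described above.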
The documents whose scores best match the score of the query are then returned.
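Taking the description literally, each document carries one precomputed score, and answering a query means ordering the documents by how far their score lies from the query's score. The sketch below assumes a single `double` score per document stored in a hypothetical `struct doc`; how Kriterion actually computes and compares its scores is not specified here.

```c
#include <stdlib.h>

/* One indexed document: an identifier and its precomputed score (hypothetical layout). */
struct doc {
    int id;
    double score;
};

/* Score the documents are currently being compared against. */
static double ref_score;

/* Absolute distance between a document's score and ref_score. */
static double distance(const struct doc *d)
{
    double diff = d->score - ref_score;
    return diff < 0 ? -diff : diff;
}

/* qsort comparator: smaller distance to ref_score sorts first. */
static int by_distance(const void *a, const void *b)
{
    double da = distance(a);
    double db = distance(b);
    return (da > db) - (da < db);
}

/* Sort `docs` in place so that the documents whose scores are closest to
   `query_score` come first; returns how many results to show (at most k). */
size_t rank_by_score(struct doc *docs, size_t n, double query_score, size_t k)
{
    ref_score = query_score;
    qsort(docs, n, sizeof *docs, by_distance);
    return n < k ? n : k;
}
```

The static `ref_score` keeps the `qsort` comparator simple but makes `rank_by_score` unsafe to call from several threads at once; a server answering concurrent queries would need to store the distances alongside the documents before sorting.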
Documents in the result list can be displayed and passages containing words of the search query are highlighted.
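Highlighting could be done while a document is sent to the web interface. The sketch below wraps every whole word that matches a query word (ignoring case) in HTML `<b>` tags; the function names and the choice of `<b>` are assumptions. It relies on the POSIX function `strncasecmp` and does not escape HTML special characters in the document text, which a real web interface would also need to do.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <ctype.h>

/* Return 1 if the word text[0..len) equals one of the query words, ignoring case. */
static int is_query_word(const char *text, size_t len, const char **words, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (strlen(words[i]) == len && strncasecmp(text, words[i], len) == 0)
            return 1;
    return 0;
}

/* Write `text` to `out`, wrapping every whole word that matches a query
   word in <b>...</b>. Non-alphanumeric characters are copied unchanged. */
void highlight(FILE *out, const char *text, const char **words, size_t nwords)
{
    const char *p = text;

    while (*p) {
        if (isalnum((unsigned char)*p)) {
            const char *start = p;
            while (isalnum((unsigned char)*p))
                p++;   /* advance to the end of the word */
            size_t len = (size_t)(p - start);
            if (is_query_word(start, len, words, nwords))
                fprintf(out, "<b>%.*s</b>", (int)len, start);
            else
                fwrite(start, 1, len, out);
        } else {
            fputc(*p, out);
            p++;
        }
    }
}
```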

In addition, after marking a document in the result list, a user can press the "find similar documents" button.
The documents most similar to it are then returned.
Results are always returned in real time because the server only compares the precomputed scores.
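Comparing scores alone is what keeps this fast. If the scores are kept in an array sorted once at indexing time, the documents most similar to the marked one are its nearest neighbors in that array, so they can be collected by walking outward from the marked position, at a cost proportional to the number of results rather than the size of the collection. The sketch below assumes a single `double` score per document; `find_similar` is a hypothetical name, and it returns positions in the sorted array, which a parallel array of document identifiers would map back to documents.

```c
#include <stddef.h>

/* Find up to k documents whose scores are closest to that of the marked
   document. `scores` must be sorted in ascending order (e.g. once, at
   indexing time), and `marked` is the marked document's position in that
   array (marked < n). Positions are written to `out` in order of increasing
   distance; the marked document itself is skipped. Returns the count. */
size_t find_similar(const double *scores, size_t n, size_t marked, size_t k, size_t *out)
{
    size_t lo = marked;       /* next candidate below is lo - 1 */
    size_t hi = marked + 1;   /* next candidate above is hi */
    size_t count = 0;
    double ref = scores[marked];

    while (count < k && (lo > 0 || hi < n)) {
        if (lo == 0) {
            out[count++] = hi++;              /* only higher scores remain */
        } else if (hi >= n) {
            out[count++] = --lo;              /* only lower scores remain */
        } else if (ref - scores[lo - 1] <= scores[hi] - ref) {
            out[count++] = --lo;              /* lower neighbor is closer */
        } else {
            out[count++] = hi++;              /* higher neighbor is closer */
        }
    }
    return count;
}
```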

More than six years ago, calculating the scores took an enormous amount of time: on 15 December 1999, a 500 MHz Pentium III took 23 hours to index 1582 normal-sized documents.
Fortunately, indexing has become much faster over the years: today a Pentium 4 takes 28 minutes to index 2127 documents.
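The two measurements are easier to compare per document. Assuming indexing time grows roughly linearly with the number of documents (this page does not say whether it does), the time per document dropped as follows:

```latex
\frac{23 \times 3600\ \text{s}}{1582} \approx 52.3\ \text{s per document},
\qquad
\frac{28 \times 60\ \text{s}}{2127} \approx 0.79\ \text{s per document},
\qquad
\frac{52.3}{0.79} \approx 66
```

That is roughly a 66-fold reduction in indexing time per document.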



Last updated: August 3, 2006