
History

mnoGoSearch for Windows version history can be found here.

24 Oct 2001: 3.2.2

  • Added meta "Content-Language" processing, added "lang" attribute processing for <html> and <body> tags.
  • Added IBM DB2 support. Tested with DB2 EE V7.1.
  • Stored and storedoc.cgi have been added. It is now possible to store and display compressed copies of indexed documents with search word highlighting.
  • Tag values are now passed using the "tag" form variable so that the variable's meaning is clearer. The old "g" form variable no longer works.
  • Major documentation improvements and reorganization.
  • Fixed that category and language limits were not working.
  • Fixed that StopwordFile command didn't work in search.htm
  • Fixed that full/substring/beginning/ending word match didn't work.
  • Fixed crash in ServerTable code.
  • Fixed crash in synonyms code on some platforms.
  • qtrack table field types have been changed.
  • Fixed a bug in the MySQL single mode code. It could kill the mysqld server when a document was big enough.
  • Fixed that iso-8859-1 entities like &eacute; were not properly converted to unicode.
  • Fixed that HTML parser considered scripts body as a text in some cases.
  • install.pl installation script has been added.
  • Some minor configure script and code clean-ups.

27 Sep 2001: 3.2.1

  • New "Listen" searchd.conf command. It allows binding searchd to a specified host and/or port.
  • searchd can now reload searchd.conf when it receives a HUP signal.
  • Added some signal safeness in searchd.
  • Fixed that searchd.conf-dist was not included in the distribution.
  • Fixed that national letters in the code range 128-255 were considered word separators when searchd was used. They were also not displayed in search results (body, title, etc. fields).
  • Fixed some bugs in HTML tag parsers that caused indexer to stall or crash in some cases.
  • Fixed that "Proxy" command was ignored.
  • Fixed that robots.txt related code could stall or crash in threaded version.
  • Fixed a compilation problem with Oracle.
  • Fixed compilation problem with errno.h on Solaris.

24 Sep 2001: 3.2.0

  • Now one can compile with support for several SQL databases at the same time.
  • Now one can make a binary distribution using "make bin-dist".
  • Added new program searchd. Among other features, it allows building a search cluster distributed across several machines.
  • Support for synonyms fuzzy search has been added.
  • Common words endings fuzzy search using ispell now works in 3.2 branch.
  • New "ReverseAlias" indexer.conf command. This command has the same format as the "Alias" command. However, URL mapping is executed immediately after a new link has been found, and the URL is stored in the database after the ReverseAlias rules have been applied. Among other things, this makes it possible to index PHP-driven sites which add a unique session ID in the form "PHPSESSION=344646342345df". ReverseAlias is able to remove such substrings from URLs.
  • New "Subnet xxx.xxx.xxx.xxx" indexer.conf command. It works like Realm but checks an IP address matching instead of URL. For example, "Subnet 195.239.38.*" or "Subnet NoMatch 192.*.*.*".
  • Search results highlighting (HlBeg and HlEnd search.htm commands) now works in 3.2 branch.
  • CT-Lib support has been added. Now one can use mnoGoSearch together with SyBase and MS SQL natively, without ODBC drivers. Both the original SyBase CT-Lib and FreeTDS CT-Lib are supported. However, the CT-Lib driver is still in beta.
  • indexer now works approximately twice as fast with InterBase.
  • Added support for deflate and compress Content-Encoding.
  • New VarDir command in search.htm. It works like the indexer.conf command of the same name, but at search time.
  • New "Section" indexer.conf command. It is to be used instead of old ***Weight commands, which have been removed. Take a look into indexer.conf-dist and search.txt for an explanation.
  • Now it is possible to index user-defined META tags as well as HTTP response headers.
  • New "Alias" command in search.htm. It works like "Alias" in indexer.conf but at search time.
  • Added support for external includes in search template. Format differs from 3.1.x version. Take a look into "templates.txt" for usage information.
  • The "Alias" command has been extended. It can now optionally use powerful URL mapping with regular expressions, as in the "Realm" command.
  • POSIX threads should now work on more than just Linux and FreeBSD. Thread detection for a number of platforms has been added.
  • Fixed libudmsearch compilation with pthreads. This fixes crashes of Apache with the PHP mnoGoSearch extension module when mnoGoSearch was compiled with pthreads support.
  • Tag parser has been rewritten. It now properly processes tag attributes with '>' signs, for example
    <META NAME=email Contents="<general@mnogosearch.org>">.
    Earlier, '>' signs inside quotes were considered tag endings.
  • Apple Darwin fixes for configure scripts
  • Extended number of query parameters stored in qtrack table
  • Added url.charset field. Charset is now stored separately from content_type field. Please recreate or ALTER "url" table structure.
  • "Clones yes/no" has been renamed to "DetectClones yes/no" to avoid confusion.
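
The session-ID removal described for ReverseAlias above can be illustrated with a small Python sketch. This is not mnoGoSearch code and does not use indexer.conf syntax; the function name, the regular expression and the example.com URLs are assumptions made only for this illustration.

```python
import re

# Hypothetical pattern (an assumption for this sketch): a query parameter
# of the form "PHPSESSION=<alphanumeric id>", preceded by "?" or "&".
SESSION_RE = re.compile(r"([?&])PHPSESSION=[0-9A-Za-z]+(&|$)")


def strip_session_id(url: str) -> str:
    """Remove a PHPSESSION=... parameter from the query string of a URL."""
    def repl(match: re.Match) -> str:
        # Keep the leading separator only when more parameters follow,
        # so that no dangling "?" or "&" is left at the end of the URL.
        return match.group(1) if match.group(2) == "&" else ""
    return SESSION_RE.sub(repl, url)


if __name__ == "__main__":
    print(strip_session_id("http://example.com/page.php?PHPSESSION=344646342345df&id=5"))
```

With this kind of normalization, two links that differ only in their session ID map to the same stored URL, so the same page is not indexed repeatedly.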

8 Aug 2001: 3.2.0.b1

  • Content encoding support added (currently gzip only). Requires libz to compile. Use --with-libz to activate.
  • Fixed that $(DE) template variable was not working
  • Fixed that the correct charset was forgotten after robots.txt processing.

3 Jul 2001: 3.2.0.b0

  • Charsets processing has been rewritten. Now mnoGoSearch supports almost all widely used charsets: various single-byte charsets as well as multi-byte charsets including UTF-8, Chinese (BIG5, GB2312), Korean (EUC-KR), Japanese (S-JIS). All internal processing works using UNICODE representation. Using UTF8 as a LocalCharset one can build a multi-lingual search engine with languages which could not be indexed at the same time in 3.1.x branch, for example German+Greek+Russian+Chinese.
  • Character sets module has a new automatic language and charset detection. Currently more than 70 various charsets and languages can be detected automatically when they are not specified in "Content-type" and "Content-Language" server's response headers or html META tags.
  • News extensions are now compiled without --enable-news-extensions. Use "NewsExtensions yes" indexer.conf command to activate them.
  • search.cgi has been rewritten.
  • Cache-mode has been rewritten.

22 Aug 2001: 3.1.19

  • Minor fixes for libudmsearch compilation flags when pthreads support is enabled.
  • install.pl and OpenSSL fix
  • Fixed that "Alias" indexer.conf commands for https:// URLs didn't work sometimes. Thanks Jens Thiel <Jens.Thiel@stochastix.de>

17 Aug 2001: 3.1.18

  • Indexer now doesn't parse documents with Content-Encoding HTTP header.
  • <META HTTP-EQUIV="Content-language" ... > now affects language selection for html pages. DefaultLang now has lower priority than stopwords and normalization results for language detection.
  • Several bugs in ispell related code have been fixed.
  • Crash at links like <A HREF="robots.txtxxx"> has been fixed. Thanks Malakhov Alexander <shur@asu.info.kuzbass.net>.
  • indexer now sends "mode reader" string when connecting to a news server. Thanks Matthew Sullivan <matthew@netscape.com>

2 Jul 2001: 3.1.17

  • Ispell suffixes checking bug fixed.
  • Fixed a bug in cache mode. Thanks Andrew Aksyonoff <shodan@chat.ru>
  • Fixed some potential exploits.

13 Jun 2001: 3.1.16

  • Fixed that lower case flags were not processed properly in ispell affix files. Thanks Panayotis Vryonis <vrypan@yassou.net>.
  • Apple Darwin fixes for configure script.
  • Clean-up of potential exploits in search.cgi.

7 Jun 2001: 3.1.15

  • Fixed that libssl linking failed in some cases.
  • Fixed minor memory leak in threaded version.
  • Added checking for successful mysql_init() result.
  • Added DMALLOC memory debugger support. Use --enable-dmalloc to enable it.

28 May 2001: 3.1.14

  • FTP functions have been renamed to avoid PHP extension module compilation problems.
  • socketlen_t type checking has been added into configure. That should fix compilation warnings on Mac OSX.
  • A bug that composed an incorrect pid file name in cachelogd has been fixed.
  • Fixed that URLs were inserted with incorrect tag and category in some cases.
  • Fixed that links like "./../a.html" were not processed properly.
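
As an illustration of what correct handling of a link such as "./../a.html" means, the following Python sketch resolves it against a base URL with the standard library. It only shows the expected result of relative-link resolution; it is not the indexer's own URL code, and the example.com addresses are assumptions.

```python
from urllib.parse import urljoin


def resolve_link(base_url: str, href: str) -> str:
    """Resolve a relative link against the URL of the page it was found on."""
    # urljoin merges the paths and removes "." and ".." segments.
    return urljoin(base_url, href)


if __name__ == "__main__":
    print(resolve_link("http://example.com/dir/sub/page.html", "./../a.html"))
```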

17 May 2001: 3.1.13

  • Added installation script install.pl to simplify installation process
  • HTTPS support has been added. Thanks Dubun Guillaume <gdubun@gecko.fr>
  • search.cgi now accepts tmplt parameter. It can be used to specify an alternative search template to be opened.
  • Content-Language: HTTP header support for detecting document language.
  • Using language of normalized words for document language detecting.
  • Now all programs can accept an alternative /var working directory. This allows putting built-in and cache-mode databases in non-default directories without having to recompile the package. indexer, spelld and search.cgi take the path from the VarDir command in indexer.conf, spelld.conf and search.htm respectively. splitter and cachelogd take the working directory value from the -w command line argument.
  • A problem with quotes in AliasProg has been fixed. Thanks "Justin" <antiliquidj@mail.com> for reporting.
  • Fixed that sgml entities (like &amp; &quot; &auml;) were not unescaped in META KEYWORDS and DESCRIPTION. Thanks Danil Lavrentyuk <arilou@ice.ru> for reporting.
  • A bug that basic authorization did not work when ServerTable is used has been fixed.
  • A bug that <META NAME="Refresh" Content="..."> was not processed properly in some cases has been fixed. Thanks Ivan Mikhnevich <vano@red.by>
  • A bug that text highlighting did not work properly in some cases has been fixed. Thanks Thomas Olsson <mnogo@armware.dk>.
  • A bug in spelld hanging has been fixed.
  • Some bugs and possible exploits in search.cgi have been fixed.
  • Fixed a bug that socket was not closed when connect() failed. Thanks Ivan Mikhnevich <vano@red.by>.
  • A trap while fetching overly large newsgroup lists has been fixed.
  • Fixed a bug that the Host: HTTP header was composed incorrectly when the port is not 80.
  • Minor bug in built-in database has been fixed.
  • A bug that a line in indexer.conf, which contains only spaces, caused an "Error in config file" has been fixed.
  • A bug that indexer crashed when URL command argument has no correspondent Server/Realm command has been fixed.
  • ISO 10646 character entity reference skipping bug fixed.

12 March 2001: 3.1.12

  • cachelogd now accepts the port to listen on via the -pXXX command line argument.
  • Crash in UdmAddURL() function has been fixed.
  • CPU usage overhead bug in cachelogd has been fixed.
  • A bug in search.cgi in IspellMode db has been fixed.
  • Ispell data loading speed-up.
  • Some built-in database bugs have been fixed. Thanks Darko Koruga <darkok@operamail.com>
  • Added search template variable $L - language limit selection.
  • A minor bug in navigator bar has been fixed.
  • Some memory leaks have been eliminated.
  • Stopwords are not checked anymore in the case of "substring match" searches.
  • A bug in Include indexer.conf command has been fixed. Indexer didn't stop when included file had wrong syntax.
  • A minor bug that URLs were not unescaped before parsing (when non-zero URLWeight) has been fixed.

20 February 2001: 3.1.11

  • A new feature called "crosswords" has been added. It allows words between <a href="xxx"> and </a> to be assigned also to the document this link leads to. It works in SQL database mode and is not supported in built-in database and "cache-mode". To enable crosswords, please use "CrossWord yes" command in indexer.conf and search.htm.
  • Phrase support in "cache mode" has been added. Use --enable-phrase to activate it.
  • $ndocs template variable has been added. It displays the total number of documents in the database.
  • A bug that absolute paths did not work in the $if() search.htm include directive has been fixed.
  • Minor bug that fields category and tag were not filled in "INSERT INTO url ..." queries has been fixed. Now tag and category limits in "indexer -tXXX -gXXX" should work fine for documents not yet indexed.
  • Some locking in cache mode code has been added. Now threaded version of indexer should work fine in cache mode.
  • Some bugs in ServerTable have been fixed.
  • A bug in robots.txt locking in threaded version has been fixed.
  • Documentation updates

9 February 2001: 3.1.10

  • Phrase indexing has been implemented in SQL and built-in modes. Note that cachemode does not support it yet. Now documents with full phrases or their parts are displayed before others. Use "Phrase yes" in both search.htm and indexer.conf to activate this. It requires rebuilding MySQL and Oracle tables and does not require changes in SQL table structures for other databases. An index with phrases requires approximately four times more disk space than the "Phrase no" version.
  • Indexer now stores a number of each word appearance in "Phrase no" version. Documents with greater word numbers will be displayed first.
  • You may select search for "full phrase" now among "all", "any" and "boolean" search types.
  • MinWordLength and MaxWordLength commands have been added to the search template. Words that do not match are treated as stopwords.
  • MinWordLength and MaxWordLength indexer.conf commands have been moved to the global section of indexer.conf. They now have a global effect for the whole indexer.conf. These fields have also been removed from ServerTable.
  • Minor bug in file: URL scheme has been fixed. Indexer didn't work properly with directories which contain space character. Thanks "Kaspar Brand" <kasparb@freesurf.ch>.
  • A bug in incorrect "DeleteNoServer no" command behaviour has been fixed.
  • A bug that URL limits didn't work in cachemode has been fixed.
  • Documentation updates. New file doc/relevancy.txt has been added.

24 January 2001: 3.1.9

  • Substring search has been added. It works in "single" and "multi" modes in both SQL and built-in versions. Because other modes use CRC32 for word storage, they do not support substring search.
  • Allow, Disallow, CheckOnly, HrefOnly, Realm, AddType commands have been extended to support fast "string match" comparison instead of "regexp match". This improves indexer speed by up to several times. Note that "string match" comparison type is the default, which makes old indexer.conf files incompatible with version 3.1.9. Check your indexer.conf and change it if required. Take a look into indexer.conf-dist for an explanation.
  • Allow, Disallow, CheckOnly, HrefOnly now can take three optional parameters and understand this syntax: "Allow match/nomatch case/nocase string/regex arg [arg...]". AllowNoMatch, AllowCS, AllowCSNoMatch, DisallowNoMatch, DisallowCS, DisallowCSNoMatch, HrefOnlyNoMatch, HrefOnlyCS, HrefOnlyCSNoMatch, CheckOnlyNoMatch, CheckOnlyCS, CheckOnlyCSNoMatch indexer.conf commands have been removed.
  • New AliasProg indexer.conf command has been added. It allows processing a URL through an external aliasing command.
  • Server and Realm commands have been extended to support powerful aliasing mechanisms. Take a look into doc/alias.txt
  • "StopwordTable table_name" indexer.conf and search.htm command has been added. The "stopword" table isn't read automatically anymore.
  • "StopwordFile" indexer.conf and search.htm command has been added. It loads stop-words from a text file instead of loading them from the SQL database. This command works in both built-in and SQL versions.
  • Ispell code has been extended to support prefixes.
  • IspellUsePrefixes indexer.conf and search template commands have been added.
  • A bug in ispell code with incorrect regular expression check has been fixed.
  • Affix table data format has been changed. All users who use "IspellMode db" have to reimport affix table.
  • indexer.conf variable 'IspellMode' has been added. Now indexer can use 'IspellMode db' the same as search.cgi.
  • "lang" tag attribute support has been added. Tags like <P lang="de">...</P> now take part in language guessing.
  • New "fw" search form variable has been added. It can be used to change the weights of different document sections (body, title, keywords, description) at search time or to choose sections to search through. Check search.htm-dist as an example and refer to doc/search.txt for further information.
  • indexer has been fixed to properly process some URLs. For example those with special characters: http://www.something.com/script.cgi?coord=12&amp;14 -> http://www.something.com/script.cgi?coord=12&14
  • "cache mode" now supports quick search with tag, category and site limits. "cache mode" storage format has been changed. It is now incompatible with previous versions. Take a look into doc/cachemode.txt for more information.
  • "cache mode" log server "cachelogd" has been added. Now indexer makes a TCP connection to cachelogd at startup. cachelogd allows running several simultaneous indexers, even on different machines.
  • New "LogdAddr" indexer.conf command has been added.
  • "HLBeg" and "HLEnd" search.htm variables have been added. They specify how to highlight the words found. HLBeg is inserted before the word and HLEnd is appended after it. Defaults are "<b>" and "</b>" for compatibility with previous versions.
  • UseRemoteContentType indexer.conf variable has been added. This command specifies whether the indexer should get the content type from HTTP server headers (yes) or from its AddType settings (no). If set to 'no' and the indexer cannot determine the content type from its AddType settings, it will use the HTTP header. Default is yes.
  • Turkish iso-8859-9 and windows-1254 character sets support has been added. Thanks Oyku Gencay <oyku.gencay@veezy.com>.
  • Modified version of connect() with timeout has been added to be more sensitive with network problems.
  • New "DocTimeout" indexer.conf command has been added. It limits total amount of time for each document fetching.
  • Title, keywords and description inside <!--UdmComment--> are not indexed anymore.
  • Major memory leak in InterBase driver has been fixed. Thanks mordicus <mordicus@free.fr>.
  • A bug that some SQL error messages were not displayed in search.cgi has been fixed.
  • A bug that paths from robots.txt were not escaped before adding into database has been fixed.
  • ftp code improvements and minor bug fixes.
  • Fixed that boolean search did not work in built-in database.
  • Some code clean-ups which fixes compilation warnings.
  • Call for possibly broken strptime() removed from search_tl.c
  • A bug that DBPort command did not work with MySQL has been fixed. MySQL driver has been changed to use mysql_real_connect() instead of mysql_connect().
  • New -g command line argument has been added to indexer. Now one is able to tell indexer which category to index.
  • External parsers code has been rewritten. It now supports four parser types: STDIN->parser->STDOUT, FILE->parser->STDOUT, STDIN->parser->FILE, FILE->parser->FILE. When executing an external parser, indexer now creates a UDM_URL environment variable whose value is the URL being processed. Several bugs have been fixed and some more features have been added. Take a look in doc/parsers.txt for more information.
  • New exec: and cgi: virtual schemes. cgi: allows indexing CGI scripts without having to involve an HTTP server in the indexing process. exec: scheme allows using external retrieval programs to index protocols which are not natively supported by mnoGoSearch, for example HTTPS.
  • Documentation updates.
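
The HLBeg/HLEnd behaviour described in this release (a marker inserted before each found word and another appended after it, with "<b>" and "</b>" as defaults) can be sketched in Python as follows. The function name and the whole-word, case-insensitive matching rule are assumptions for illustration only; search.cgi may match words differently.

```python
import re


def highlight(text, words, hl_beg="<b>", hl_end="</b>"):
    """Wrap every occurrence of the given words in hl_beg/hl_end markers."""
    if not words:
        return text
    # Assumption for this sketch: whole-word, case-insensitive matching.
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: hl_beg + m.group(1) + hl_end, text)


if __name__ == "__main__":
    print(highlight("Fast search engine software", ["search", "engine"]))
```

Passing other markers, for example highlight(text, words, "<em>", "</em>"), corresponds to overriding the defaults in the template.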

30 October 2000: 3.1.8

  • Project name has been changed to mnoGoSearch.
  • Support for SAPDB has been added.
  • Case sensitive Allow/Disallow/CheckOnly/HrefOnly commands have been added.
  • New "Realm <regex>" indexer.conf command has been added. It works like "Server" but takes a regular expression as its parameter. Servers are not sorted by URL length after loading anymore. They are found in the order of appearance in indexer.conf. It means that if you want different parameters for server subsections, use "Server" command for subsection first, then command "Server" for the whole server.
  • Default indexer.conf has been reorganized. All commands are now divided into five logical sections.
  • Crash in UdmFreeParsers() has been fixed.
  • A bug in page navigator has been fixed. Bug since 3.1.7.
  • A bug in new Period format has been fixed.
  • A minor bug which caused slightly non-standard "Accept-Charset" line in request header has been fixed.
  • Subtle bug in indexer has been fixed. Indexer followed a redirect link given in "Location" HTTP header even if "Follow page" or "Follow no" is specified for current "Server".
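
The ordering rule for "Server" entries introduced in this release (entries are searched in the order they appear, so the more specific subsection must come first) can be sketched as a first-match lookup. The prefix comparison, the data layout and the parameter names below are assumptions for illustration; they are not taken from the indexer source.

```python
def find_server(url, servers):
    """Return the parameters of the first entry whose URL prefix matches."""
    # Entries are checked in configuration order; the first match wins.
    for prefix, params in servers:
        if url.startswith(prefix):
            return params
    return None


# Hypothetical configuration: the subsection is listed before the whole server.
SERVERS = [
    ("http://example.com/docs/", {"period": "1d"}),
    ("http://example.com/", {"period": "7d"}),
]

if __name__ == "__main__":
    print(find_server("http://example.com/docs/a.html", SERVERS))
    print(find_server("http://example.com/index.html", SERVERS))
```

If the two entries were listed in the opposite order, the entry for the whole server would match first and the subsection settings would never be used.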

16 October 2000: 3.1.7

  • indexer.conf parameters Period, MirrorPeriod, NetErrorDelayTime, ReadTimeout can now be specified in a more convenient way.
  • New different Follow indexer.conf command values have been added for more flexible spidering configuration.
  • New follow_type optional Server indexer.conf command argument has been added to specify site realm.
  • FollowOutside indexer.conf command has been removed. Use Follow world instead.
  • New URL indexer.conf command has been added. It allows to specify alternative server's entry points.
  • New ServerTable table_name indexer.conf command has been added. It loads server entries with all their parameters from the database and makes it possible to configure servers remotely through a web application.
  • New CREATE TABLE script for servers table has been added.
  • splitter -h now displays short help page.
  • <NOINDEX> and </NOINDEX> have been added as synonyms for <!--UdmComment--> and <!--/UdmComment-->
  • Subtle bug which produced core dump in cache mode search code has been fixed.
  • A bug in search.c which caused threaded version compilation problems has been fixed.
  • Thread stack size has been increased to avoid threaded indexer crashes.

11 October 2000: 3.1.6

  • search.cgi template name detection has been fixed to support content negotiation. This makes it possible to install multi-language search pages.
  • DefaultLang indexer.conf parameter has been added to set default language for server. Suitable if you are using language restriction while doing search.
  • InterBase related configure stuff has been fixed.
  • Some memory leaks in InterBase driver have been fixed.
  • A bug which possibly was a reason of crashes in document CRC32 calculation has been fixed.
  • Some ftp code improvements in symlinks processing.
  • A bug in charset handling which affected the Hebrew and Baltic charsets introduced in 3.1.4 has been fixed.
  • A bug in reindexing code for "multi" mode has been fixed. This fixes "table was not locked" error message in MySQL.
  • A bug that $if() did not work in search.cgi has been fixed.
  • A bug in splitter for cache mode has been fixed.

27 September 2000: 3.1.5

  • New "cache" storage mode which is able to index and quickly search through the millions of documents.
  • New $SearchTime template variable.
  • New $DE template variable. It displays description when not empty and text otherwise.
  • Boolean search has been implemented in search.cgi. It has the same syntax as the PHP front-end.
  • Greek iso-8859-7 and cp-1253 character sets support has been added. Thanks Dimitri Bougoulias.
  • Hebrew iso-8859-8 and cp-1255 charsets support has been added.
  • Baltic iso-8859-4, iso-8859-13 and cp-1257 support has been added.
  • New $DY template variable to display document category.
  • Memory leak in PostgreSQL driver has been fixed.
  • "indexer -i -f urllist.txt" core dump bug since 3.1.4 has been fixed.
  • MP3 code cleanups.
  • Binary search in host names cache has been added.

18 September 2000: 3.1.4

  • Search results cache support has been added into search.cgi. This allows very fast output when a query is repeated, for example while navigating through search result pages. New command Cache yes/no has been added into search template.
  • Support for Last-Modified and If-Modified-Since has been implemented for file: URL scheme. This makes reindexing of unmodified local files significantly faster.
  • Server news://servername/groupname syntax support for NEWS groups has been added into indexer.conf
  • Support for nntp:// URL scheme has been added.
  • "Subject:" and "From:" headers are currently decoded according to RFC 1522.
  • MP3 headers processing has been added.
  • HTTP Proxy Basic Authorization support has been added.
  • <META NAME="Language" Content="xx"> processing has been added.
  • Host names cache has been added into indexer.
  • Some configure.in fixes and improvements have been done.
  • libudmsearch interface functions have been slightly changed.
  • Shared library creation has been implemented.
  • A bug which caused external parsers hangup sometimes has been fixed.
  • A bug which caused heavy CPU load in threaded indexer version has been fixed. Thanks Peter Hanecak <hanecak@megaloman.com> for this.
  • A bug in robots.txt and gethostbyname() mutexes locking has been fixed.
  • A minor bug in URL parser has been fixed.
  • New indexer command line argument to limit maximum indexing time.
  • A bug which caused wrong document charset detection in some cases has been fixed.
  • A bug that indexer did not understand spaces and special characters in file: URL scheme has been fixed.
  • A bug that clones were not displayed in search.cgi has been fixed.
  • New huge English stop-list has been added.
  • Danish stop-list has been added.
  • Slovak stop-list has been added.
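
For readers unfamiliar with the RFC 1522 encoding mentioned above for "Subject:" and "From:" headers, the following Python sketch decodes such an encoded word with the standard library. It illustrates the encoding only; the function name is an assumption and this is not the indexer's own decoder.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw: str) -> str:
    """Decode a header value that may contain RFC 1522 encoded words."""
    # decode_header splits the value into (bytes, charset) parts;
    # make_header joins them back into a single Unicode string.
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    print(decode_mime_header("=?iso-8859-1?q?p=F6stal?="))
```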

3 Aug 2000: 3.1.3

  • $g template variable has been improved to support parameter $g(X). This displays a tag without X trailing characters.
  • Polish stop-list has been added. Thanks Maciek Uhlig <muhlig@us.edu.pl> for contribution.
  • Some Oracle improvements have been done.
  • Some crc-multi mode improvements have been done.
  • A bug in udm_strnlen which could cause a buffer overflow in "URL too long" messages has been fixed.
  • A bug which caused wrong keywords fetching from sql backend has been fixed.
  • New categories and tags related documentation.
  • A bug in URL cache which caused lost links has been fixed. Thanks Willem Brown <willem@brwn.org> for discovering this problem.
  • crc32() has been renamed into UdmCRC32() to avoid compilation problems on some platforms.
  • Fixed that the --disable-file configure argument did not work.
  • Oracle tables structure has been fixed.
  • Minor search.cgi speed improvements.
  • Compilation problem on Alpha has been fixed.
  • A bug that content type was modified after external parser call has been fixed.
  • Some InterBase related bugs have been fixed.

13 Jul 2000: 3.1.2

  • HTTP and FTP date parsing functions have been replaced with ones taken from Apache to avoid using strptime (which is broken on Solaris 2.6).
  • Code added to search.cgi to 'remember' the state of HTML checkboxes and radiobuttons between pages.
  • Query tracking facility has been implemented. Use "TrackQuery yes" template command to enable it.
  • New template sections navleft_nop, navright_nop, noquery have been added.
  • New "udm_recursion" search.cgi parameter to skip $iurl() directives has been added. This allows to avoid recursion when search.cgi includes itself.
  • A patch by The Hermit Hacker to use WHERE rec_id IN(...) when displaying search results has been applied. This should make search a bit faster.
  • Text between <STYLE> and </STYLE> is not indexed anymore.
  • Perl and PHP front-ends have been removed from the package and are now distributed as separate packages.
  • A minor Oracle related stuff in configure.in has been fixed.
  • Some Oracle improvements have been made.
  • A minor bug in ispell support has been fixed.
  • A bug in external parsers code has been fixed.
  • A bug which appeared when HREF has leading or trailing spaces has been fixed.
  • Content-type was not escaped. This caused SQL failure in some cases.

30 June 2000: 3.1.1

  • Nested categories support has been added.
  • Tag type has been changed to CHAR instead of INT. The above changes require updating database structure.
  • "Category" indexer.conf command has been added
  • "$CP" template variable to display category path has been added
  • "$CS" template variable to display current category subtree has been added
  • "cat" search.cgi parameter has been added to pass the category to search through
  • "$cat" template variable to display current category ID.
  • A bug in cp1250 and iso-8859-2 support has been fixed
  • Some functions moved from log.c to udmutils.c
  • udm_snprintf for Solaris has been added
  • Some bugs have been fixed

15 June 2000: 3.1.0

  • Native FTP support added so you can index ftp sites without using proxy.
  • Low-level network functions layer added (work in progress).
  • New advanced search options (time limits). You can search for recent pages and so on.
  • md5 hash replaced with crc32 in database.
  • Some (incompatible) database structure changes needed for the above.

3.0.x ChangeLog

24 September 2000: 3.0.23

  • Memory leak in PostgreSQL driver has been fixed.

4 September 2000: 3.0.22

  • A bug that indexer did not understand spaces and special characters in file: URL scheme has been fixed.
  • A bug which caused wrong document charset detection in some cases has been fixed.
  • A bug that "last_index_time" field was not modified has been fixed.
  • Danish stop-list has been added.

25 August 2000: 3.0.21

  • A bug which caused heavy CPU load in threaded version has been fixed.
  • A bug which caused external parsers hangup sometimes has been fixed.
  • Some configure.in improvements.
  • Fixed that --disable-file did not work.
  • Bugs in robots.txt and gethostbyname() mutexes locking have been fixed.

25 July 2000: 3.0.20

  • A bug in URL cache which caused lost links has been fixed.
  • Polish stop-list has been added.
  • A patch by Matthew Sullivan <matthew@netscape.com> for NEWS extensions has been applied. This fixes the problem of the indexer not storing the text of a news message if it is submitted as a multipart mime message.
  • A minor bug in ispell support has been fixed.
  • Content-Type field was not escaped. This caused SQL failure in some cases.
  • A bug in external parsers code has been fixed.
  • A bug which appeared when HREF has leading or trailing spaces has been fixed.
  • -lz has been added to fix compilation problem with latest MySQL versions.
  • crc32() function has been renamed to avoid compilation problems on some platforms.

28 Jun 2000: 3.0.19

  • A bug in "No title" output has been fixed
  • A bug in Central Europe cp1250 and iso-8859-2 charsets has been fixed.
  • A bug in indexer -S which appeared with Oracle native support has been fixed.

14 Jun 2000: 3.0.18

  • Several improvements in PHP front-end.
  • $q template variable to display URL escaped search query has been added. You can use it for links like for example "find the same in AltaVista".
  • A minor bug in <TITLE>..</TITLE> processing has been fixed.
  • A minor bug which caused compilation failure on AIX has been fixed.
  • A minor improvement for <OPTION> processing has been done.
  • A bug in <BASE HREF=".."> and "DeleteNoServer no" combination which caused indexer to follow outside desired area has been fixed.
  • search.cgi now outputs a "No title" line instead of an empty string when a document has no title.

31 May 2000: 3.0.17

  • A minor bug in charset stuff which caused crashes in some cases has been fixed.
  • Minor bugs in the URL parser have been fixed.
  • A minor bug in the URL list importer has been fixed.

25 May 2000: 3.0.16

  • Several minor bugs have been fixed in PHP front-end.
  • configure script has been fixed to support Oracle8i. Use "configure --with-oracle8i[=/path/to/oracle]".
  • Search results when "ul=" is given were not sorted by score with built-in database support.
  • New template "$g" variable has been added. It is replaced by "t=" (tag) value. Check search.htm-dist for usage examples.

16 May 2000: 3.0.15

  • -L -D -A and -d indexer command line arguments have been added to load ispell data into the database.
  • "IspellMode db" search.htm template variable has been added to load ispell data from database by search.cgi
  • <!--UdmComment--> and <!--/UdmComment--> special tags processing has been added in HTML parser. It is possible to hide unnecessary text from indexing now.
  • &#220; character notation processing has been added.
  • Checking for snprintf has been added into configure script. This should fix compile problems on Digital UNIX V4.0F.
  • Some bugs in the ispell code of search.c have been fixed.
  • Some locale-related improvements and speed improvements have been made in PHP front-end.
  • Patch by Andrej Filipcic <Andrej.Filipcic@ijs.si> to avoid "duplicate key" error messages with MySQL-3.23.15 has been applied.
  • A bug that the t (tag) parameter was not given in links to next/previous pages in search.c has been fixed.
  • A bug in "indexer -k" has been fixed.
  • A bug in If-Modified-Since header composing has been fixed.
  • Minor bugs in HTML parsers have been fixed. This makes indexer more stable for incorrect HTML documents, for example, when <title> is not closed, etc.

7 May 2000: 3.0.14a

  • Several bugs have been fixed

4 May 2000: 3.0.14

  • "crc" and "crc-multi" modes for the built-in database have been added. These modes use fast binary files. UdmSearch now works very fast with the built-in database for big enough sites.
  • Bugs in PHP front-end error messages reporting and dropping temporary tables in "multi" modes for MySQL have been fixed.
  • Some PHP front-end improvements.
  • Nicer error reporting when a URL is too long.
  • Patch by The Hermit Hacker for faster search under PostgreSQL when "ul=" is specified has been applied.

30 April 2000: 3.0.13

  • "crc-multi" storage mode has been implemented.
  • "multi" and "crc-multi" modes have been implemented in PHP search front-end.
  • "udm-config" script is now installed into the /bin directory of UdmSearch. Script detects dependencies required by libudmsearch to allow easier use of libudmsearch in third party applications. Thanks Bob Smith <bob@cs.csoft.net>.
  • "DBAddr" indexer.conf and search.htm command has been added. One may now configure all database parameters with one command:
    "DBAddr mysql://user:passwd@hostname:port/database_name/"
    Old-style DBType, DBUser, DBPass, DBHost, DBPort, DBName commands will be removed in one of the next releases.
  • Minor URL parser improvements.
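
The "DBAddr" entry above packs the old-style settings into one URL. As a rough illustration only (this is not how UdmSearch itself parses the command, which the changelog does not show), the following Python sketch splits such a URL back into the old-style names. The function name parse_dbaddr and the port number 3306 are assumptions made for the example.

```python
from urllib.parse import urlsplit


def parse_dbaddr(dbaddr):
    """Split a DBAddr-style URL into the old-style setting names.

    Hypothetical helper for illustration; the real parser may differ.
    """
    parts = urlsplit(dbaddr)
    return {
        "DBType": parts.scheme,           # e.g. "mysql"
        "DBUser": parts.username,
        "DBPass": parts.password,
        "DBHost": parts.hostname,
        "DBPort": parts.port,             # int, or None when omitted
        "DBName": parts.path.strip("/"),  # drop leading/trailing slashes
    }


# The port 3306 is an assumed example value.
settings = parse_dbaddr("mysql://user:passwd@hostname:3306/database_name/")
for name, value in settings.items():
    print(name, value)
```

A single URL keeps all connection parameters in one place, which is presumably why the separate commands were scheduled for removal.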

19 April 2000: 3.0.11

  • "Clones yes/no" indexer.conf and template command has been added.
  • "crc" mode support in PHP frontend has been added.
  • $if(filename) template command to include other template files from the current one has been added. It has been implemented in both PHP and CGI frontends.
  • A bug that search.c did not work in some cases under Netscape Enterprise HTTP server has been fixed.
  • search.cgi no longer allows very big page sizes (>100 by default) to be passed. Previously this could crash or slow down the server.
  • indexing speed improvements. indexer now works about 20% faster.
  • libudmsearch improvements
  • Several PHP frontend improvements
  • It is possible now to describe several (up to 100) search result output types in template. One may choose for example "Long", "Short" etc. search results types. This feature is supported in both PHP and C CGI versions.

5 April 2000: 3.0.10

  • UrlHostWeight, UrlPathWeight, UrlFileWeight indexer.conf commands have been added.
  • EasySoft ODBC-ODBC bridge support has been added.
  • Native Oracle7 support has been added.
  • Oracle7 support in PHP frontend has been added.
  • "DBType MSSQL" and "DBType Oracle7" have been added.
  • Some bugs in Oracle8 support have been fixed.
  • Patch by Marcin Marszałek for URL filter in search.cgi in built-in text mode has been applied.
  • Patches by Heiko Stoermer for extended NEWS mode have been applied
  • Some multi-threaded version improvements
  • Little bug in random numbers generation (which might be required for including banners into template) has been fixed
  • Bug in indexer.conf parser has been fixed (bug since 3.0.9)
  • Some HTDB improvements and bug fixes
  • <BASE HREF="xxx"> processing has been added.
  • Wrong --with-syslog=<facility> configure behaviour has been fixed
  • Several bugs have been fixed

15 March 2000: 3.0.9

  • Ispell support for PHP frontend has been added

9 March 2000: 3.0.8

  • New "crc" storage mode has been added. When running in this mode UdmSearch stores 32-bit numeric word IDs calculated by the CRC32 algorithm. This mode reduces the required disk space and speeds up search.
  • MinWordLength, MaxWordLength, NumberFactor, AlnumFactor indexer.conf commands have been added.
  • '-k' indexer command line argument has been added. This option prevents indexer from using locking system with MySQL (LOCK TABLE xxx ... UNLOCK TABLES) and PostgreSQL (BEGIN WORK...END WORK) backends.
  • Some Oracle related improvements and bug fixes
  • Minor bugs in HTML parser have been fixed
  • search.cgi now highlights query words in search results
  • Checking of the UDMSEARCH_TEMPLATE environment variable has been added to search.cgi. You may set this variable to make search.cgi open a different template instead of the default one.
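
The idea behind the "crc" storage mode above, replacing each word with a 32-bit numeric ID, can be sketched in Python as follows. This is only an illustration: it uses the standard CRC-32 from zlib and UTF-8 encoding, both of which are assumptions, since the changelog does not say which CRC variant, charset, or word normalization UdmSearch applies. The name word_id is hypothetical.

```python
import zlib


def word_id(word):
    """Map a word to an unsigned 32-bit ID using standard CRC-32.

    Assumption: UTF-8 bytes and zlib's CRC-32; the real indexer may
    normalize or encode words differently.
    """
    return zlib.crc32(word.encode("utf-8")) & 0xFFFFFFFF


# Fixed-size integers take less space than variable-length strings
# and compare faster, at the cost of possible hash collisions.
ids = {w: word_id(w) for w in ["search", "engine"]}
for w, i in ids.items():
    print(f"{w}: {i:#010x}")
```

Because two different words can share a CRC32 value, a scheme like this trades a small chance of false matches for compactness and speed.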

29 February 2000: 3.0.7

  • Native Oracle8 support has been added. Some Oracle related documentation is available in doc/UdmSearch-Oracle8.html.
  • libudmsearch.a creation has been added. One may now easily create custom search scripts using the udmsearch library.
  • indexer can now parse the https:// URL type. This allows using https:// URLs in combination with the Alias command. Note that the HTTPS protocol itself is not implemented, though.
  • URL file list can be read from STDIN now by using
    indexer -f -

23 February 2000: 3.0.6

  • Resolve related bug on DEC has been fixed
  • A bug which caused the same URLs to be visited repeatedly with built-in database support has been fixed.
  • Added IspellCorrectFactor, IspellIncorrectFactor indexer.conf commands. It is now easy to configure indexer to work in "spell checking" mode. Check /doc/ispell.txt of the UdmSearch distribution for details.
  • Added "ue" search.php3 parameter to exclude matched URLs from search. It has the opposite effect of the "ul" parameter.

21 February 2000: 3.0.5

  • Added --enable-linux-pthreads and --enable-freebsd-pthreads configure parameters to compile multi-threaded indexer on Linux and FreeBSD machines

19 February 2000: 3.0.4

  • Added built-in text files support. UdmSearch may now be compiled without any SQL database. UdmSearch with text files does not support all the functionality of the SQL version, but it works very well and fast with small sites of up to 1000 documents.
  • Added Arabic cp1256 character set support.
  • Added DBMode indexer.conf and search.htm command. It allows choosing the single/multi-table storage model in the configuration instead of selecting it at installation time.
  • Added PostgreSQL support in new PHP frontend in both single and multi table storage model.
  • Some "ftpsearch" mode and syslog related bugs have been fixed.

15 February 2000: 3.0.3

  • New "-f <file>" argument for indexer to pass a filename with URLs. This file is expected to be a list of URLs to be inserted, re-indexed or cleared. The SQL '%' wildcard is supported in the file.
  • New -i option tells indexer to insert at startup any URLs passed via -f or -u
  • New -w option suppresses warnings before clearing URLs.
  • Added DeleteNoServer indexer.conf command
  • New PHP front-end for MySQL in single "dict" mode and for Oracle in both single/multi dictionary mode.
  • Added Russian automatic charset guesser for all popular character sets: koi8-r, windows-1251, iso-8859-5 and x-mac-cyrillic
  • Several bug fixes.

02 February 2000: 3.0.2

  • Some indexing speed improvements
  • search.cgi speed improvements when many (>10000) results are found
  • indexer now follows links like this: <LINK .... href="http://server/inhalt2.htm">
  • Several bug fixes in sources and configure.in

25 January 2000: v3.0.1

  • Added native InterBase support. Tested with IB 4.0 for FreeBSD.
  • indexer can now create a mirror on the local disk.
  • One may now generate a site from a database using mirroring combined with HTDB features.
  • Added language guesser. Added new field 'lang' into the structure of stopword and url table for language guesser.
  • indexer can follow these URLs now: <META HTTP-EQUIV="Refresh" Content="3, URL=http://www.name.com/page.html"> <IMG SRC="...">

13 January 2000: v3.0.0

  • Added DBType indexer.conf command. It allows UdmSearch compiled with ODBC to take database-specific behaviour into account. MySQL, Oracle and Solid types are recognized now. Other backends have default behaviour.
  • Added "multidict" mode. It allows storing words in different tables depending on word length. Pass the --enable-multidict parameter to configure to enable this feature.
  • Some improvements in stopword searching
  • DeleteBad is now "no" by default
  • MySQLHost, MySQLUser, MySQLPass commands are no longer supported. Use DBHost, DBUser, DBPass, DBName instead
  • Added a possibility to remember selected <OPTION> in search.cgi
  • Added htdb: virtual URL schema to support database fields indexing
  • One may now run search.cgi from the command line. It takes argv[1] as the search query.
  • One may put an HTML template directly in the Apache directory structure and use it through AddType/AddHandler. This makes it easy to use search.cgi with different templates.
  • Main code is rewritten to support threads
  • UdmSearch is installed in /usr/local/udmsearch/ by default now

2.x.x ChangeLog

13 March 2000: v2.2.1d stable

  • Minor bug in miniSQL code has been fixed

05 March 2000: v2.2.1c stable

  • Minor bug in PostgreSQL PHP frontend has been fixed
  • Workaround for platforms which lack strnlen() has been added
  • Some configure.in deficiencies have been fixed

05 January 1999: v2.2.1b

  • Fixed some bugs from previous version

25 November 1999: v2.2.1

  • Added AllowNoMatch, DisallowNoMatch, CheckOnlyNoMatch new URL limitation control commands

24 November 1999: v2.1.9

  • Added Solid support
  • Added unixODBC support
  • Added PostgreSQL PHP frontend with query language. Ported from MySQL by ZioBudda <michel@michel.enter.it>
  • MySQL PHP frontends are not installed by default anymore

16 November 1999: v2.1.8

  • Added iODBC support
  • Added Czech stoplist
  • Some bug fixes

4 November 1999: v2.1.7

  • Minor bug fixes for 2.1.6

3 November 1999: v2.1.6

  • Added NNTP (news:) support. It is possible to organize search through news groups now. Take a look at sample indexer configuration file doc/samples/news.conf in UdmSearch distribution
  • Added AddType indexer.conf command. This command associates filename extensions (for services that don't automatically include them like file:) with a mime type
  • Some other file: improvements

18 October 1999: v2.1.5

  • robots.txt support did not work with "FollowOutside yes", fixed
  • Added file: URL support. One can now index local files
  • Added Alias indexer.conf command
  • Added miniSQL support
  • Added Perl frontend by Rohan Baxter. search.pl matches the functionality of search.php3 from udmsearch-2.1.3 except for the use of Rand variables
  • Added sorting by rate/modification date in search.cgi
  • Added HTTPHeader indexer.conf command to add user defined headers in HTTP requests
  • Fixed bug in reaping zombies of external parser processes. Thanks Manfred Bathelt
  • Fixed some bugs in new database interface (bugs since 2.1.4.b1)

30 September 1999: v2.1.4 b1

  • Added PostgreSQL support
  • Added Italian stoplist
  • Added page navigator template
  • Fixed a bug in HTML template when title, description or text contains special HTML characters like &<>
  • Fixed a bug in cp1250 -> iso-8859-2 conversion

18 September 1999: v2.1.3

  • Added <META HTTP-EQUIV="Content-type" CONTENT="text/html; charset=xxx"> support to determine charset

17 September 1999: v2.1.2

  • Added ISO-8859-1, ISO-8859-2, cp1250 charsets support
  • Added ISO-8859-2 - cp1250 conversion (both direct and reverse)
  • Added LocalCharset command in html template
  • Added built-in vsnprintf for platforms which don't have one
  • Added fr,ua,es stoplists
  • Added ForceIISCharset1251 option to deal with MS IIS servers
  • search.php3 and morph.php3 speed improvements
  • Fixed stopword-related bugs in search.php3/morph.php3
  • Fixed core dump bug in logging
  • Fixed MySQL parse error when TITLE contains special characters like ' and \
  • RPM cleanups

29 June 1999: v2.1

  • Some minor fixes, including several ones for Solaris
  • Added MaxDocSize configuration option

16 Jun 1999: v2.1.pre2

  • documentation updates and fixes
  • a couple of compilation fixes for Solaris
  • fixed problems with some SGML letters like &Auml;
  • some minor bug fixes and cleanups

8 Jun 1999: v2.1.pre1

  • Added support for templates to easily customize search results appearance.
  • Added support for different word forms. UdmSearch now uses Ispell affixes and dictionaries. For example, if you search for the word 'test', then tested, tester, testers, testing, testings, tests will be found as well. Several languages are supported at the same time.
  • Added support for external parser programs (e.g. programs which can convert different formats to plain text or html). Tested with .sgml, MS WORD .doc, man files. Adding your own parsers is extremely easy.
  • indexer now uses syslog for logging its messages by default.
  • An MD5 hash of every document is stored in the database. This is used by search to show document clones, i.e. documents with the same contents but different locations. It is also used by indexer itself to decide whether to parse a document again or not, which further improves indexing speed.
  • Various improvements and bug fixes.
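
The clone detection described in the MD5 entry above can be sketched in Python as follows. This is only an illustration of the general idea (group URLs by content hash), not the actual UdmSearch code; the function name find_clones and the example URLs are made up for the sketch.

```python
import hashlib
from collections import defaultdict


def find_clones(docs):
    """Group URLs whose content has the same MD5 hash.

    docs maps URL -> raw content bytes. Returns a list of URL groups
    (each sorted) that contain more than one URL.
    """
    by_hash = defaultdict(list)
    for url, content in docs.items():
        by_hash[hashlib.md5(content).hexdigest()].append(url)
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]


# Hypothetical documents: two locations serve identical content.
docs = {
    "http://a.example/x.html": b"same text",
    "http://b.example/y.html": b"same text",
    "http://a.example/z.html": b"other text",
}
clones = find_clones(docs)
print(clones)
```

The same stored hash also lets a crawler skip re-parsing: if a freshly downloaded document hashes to the stored value, its content has not changed.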

21 Apr 1999: v2.0

    Incompatible changes!!!!! Changed structure of table "url". Check 'create/alter19-20.txt' script to fix structure.
  • Added configure script to ease installation on different platforms.
  • Added robots.txt and <META Name="robots"...> support.
  • Added 'MaxHops' indexer.conf command. One can now define the maximum distance, in 'mouse clicks', from the start URLs.
  • Added META keywords and description processing. By default UdmSearch now ranks a document first if the given words are found in its keywords or description. UdmSearch stores keywords and description in the database, so one can easily add them to search results.
  • Added TitleWeight, BodyWeight, DescWeight, KeywordWeight and UrlWeight indexer.conf commands to define the 'weight' of each word in the different parts of the document. Weight is a number in the range 0..127. All weights are bit OR'ed if a word appears in different parts at the same time.
  • One can now easily configure indexer to run in 'ftpsearch' mode (search through URLs rather than the content of the document) and in 'checker' mode (search for incorrect references on the site). Configuration samples are included.
  • Added 'FollowOutside' indexer.conf command to allow indexer to follow links outside the servers given in indexer.conf. Should be used carefully, with a MaxHops limit.
  • Added 'Index' indexer.conf command to allow or disallow indexer to store found words in the database
  • Added 'Follow' indexer.conf command to allow or disallow indexer to store newly found URLs in the database
  • Added 'CheckOnly regexp' indexer.conf command. Indexer will use the HEAD instead of the GET HTTP method for URLs that match regexp. It means that the file will only be checked rather than downloaded. Useful for zip, exe, arj and other binary files in 'ftpsearch' and 'checker' modes.
  • Added 'MaxNetErrors' indexer.conf command. If there are too many network errors on some server (server is down, host unreachable, etc.), indexer will make no more than 'number' attempts to connect to this server.
  • Added some nice command line arguments to indexer
    - One can now easily reindex only a subsection of the database with given filter(s): tag, URLs that match a given pattern (SQL LIKE wildcards) or URLs with a given HTTP status code.
    - One can easily reindex URLs even if they have not expired yet.
    - indexer can now show some statistics for the whole database or a part of it.
    - One can easily delete URLs with given filter(s) or clear the whole database.
  • Some fixes and nice changes in search.cgi and search.php3
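
The weight commands in the list above state that weights are numbers in the range 0..127 and are bit OR'ed when a word appears in several parts of a document. A minimal Python sketch of that combination follows. The specific weight values are hypothetical (chosen as distinct bits so the OR result shows which parts matched); they are not UdmSearch defaults.

```python
# Hypothetical weights, each within 0..127. Using distinct bits means
# the OR'ed result records which parts of the document matched.
WEIGHTS = {
    "body": 1,
    "desc": 2,
    "keyword": 4,
    "title": 8,
    "url": 16,
}


def combined_weight(parts):
    """Bitwise-OR the weights of every document part a word appears in."""
    weight = 0
    for part in parts:
        weight |= WEIGHTS[part]
    return weight


print(combined_weight(["title", "body"]))  # 8 | 1 = 9
print(combined_weight(["body", "body"]))   # OR is idempotent: 1
```

Note that OR differs from addition: a word appearing twice in the same part does not raise the weight, and the result can never exceed 127 as long as each input weight fits in 7 bits.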

29 Dec 1998: v1.9

  • Added basic http authorization (base64-encodes login:password)
  • Fixed some problems with FTP via proxy
  • Added automatic reconnect to mysql server
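
The basic HTTP authorization entry above mentions base64-encoding login:password. A short Python sketch of building such a header is shown below. It follows the general HTTP Basic scheme rather than the indexer's own code, and the helper name basic_auth_header is an assumption for this example.

```python
import base64


def basic_auth_header(login, password):
    """Build an HTTP Basic Authorization header line.

    The credentials are joined as "login:password" and base64-encoded.
    """
    raw = f"{login}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


# Well-known example credentials from the HTTP Basic scheme documentation.
print(basic_auth_header("Aladdin", "open sesame"))
# prints "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
```

Base64 is an encoding, not encryption, so credentials sent this way are readable by anyone who can observe the request.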

17 Dec 1998: v1.8

  • Added proxy support.
  • Added FTP support when using proxy
  • Added User-Agent in HTTP header

3 Dec 1998: v1.7

  • Added PHP3 client. Thanks to Mustapha MOUGHIT (emoughit@cie.fr) for porting it from php2.
  • Added If-Modified-Since in HTTP header
  • Best documents were not found first in search.cgi. Fixed.
  • search.cgi now reports count of found documents

19 Nov 1998: v1.6

  • Added C CGI program to use instead of PHP/FI embedded HTML. It does not support advanced search with boolean query language yet.

18 Nov 1998: v1.5

  • Initial public release.


Copyright © 2000-2001 Lavtech.Com Corp.
Project of Lavtech.Com Corp.