Sphider and Sphider-Plus Search Engine

Jan 11 2009: link for Sphider-Plus version 1.6 download
Dec 2 2008: link for Sphider-Plus version 1.6 download mirror, no Sphider-Plus 1.6 ?
Revised Nov 25 2008: new Sphider-Plus project site !

There's some good news about Sphider-Plus - a new Sphider-Plus project site, with documentation, FAQs, etc. There's even a small forum powered by Phorum for support and suggestions, nicely integrated with Sphider-Plus styles. It's a good, useful site, as one might expect from the developers of one of the best open source GPL search engines around.

There's also some potentially bad news. Here's a link to the new download page for Sphider-Plus version 1.7. See what you make of it. For reference purposes, here's also a link to the section of the GPL licensing FAQs about download fees.

In fact, I am highly sympathetic to the cause of making a living from open source GPL software. Recently, several leading open source developers have had to call it quits and load trucks or work at McDonald's for a living ( thereby producing a big jump in their incomes ). How many hours have the developers devoted to creating Sphider-Plus ? A thousand hours, at least. That's not including the many hours Ando Saabas spent developing the original Sphider ( which obviously raises some questions of its own ).

Overall, I hope that a download fee or some form of it helps the developers to get some income for all their hard work, but it may have a chilling effect, maybe too chilling. We'll see.



Previous to Nov 25 2008:


Oct 19 2008: added note about backup/restore

Sphider-Plus is the big brother of Sphider. I use the term "brother" in a loose sense. At this point, the code bases are so different that it's not clear what their relationship is. However, Sphider-Plus is definitely related and bigger. The overall size is over 5 times that of Sphider, although some of the difference is documentation.

Just to get an idea of the difference in terms of raw file size:


                     Sphider-Plus 1.6   Sphider 1.3.4
search.php           26K                4K
/include directory   183K               55K
/admin directory     314K               139K
/converters          2.66M              N/A


As you can see, the big difference is the converters for different types of content. Most of the simple search engines only parse HTML content. The converter binaries in Sphider-Plus will also parse PDF, MS Word and MS PowerPoint to extract keywords ...

... but the bundled binaries only run on MS Windows. The converter programs need only return text for a given file of a certain type, so any program on any OS that returns converted text will work. I've heard of PDF-to-text conversion done successfully on Linux ( see the PDF text extractor for XPDF ).
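To make the "any program that returns text" idea concrete, here is a minimal sketch in Python of a converter dispatcher. It is not how Sphider-Plus itself calls its converters: the CONVERTERS table and the function names are hypothetical, and it assumes the XPDF command-line tool pdftotext is installed and on the path.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to an external converter command.
# "pdftotext <file> -" asks the XPDF tool to write the extracted text to stdout.
CONVERTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],
}


def build_command(path):
    """Return the converter command for this file, or None if no converter is known."""
    template = CONVERTERS.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [part.format(path=path) for part in template]


def extract_text(path):
    """Run the matching converter and return its text output ("" if unsupported)."""
    cmd = build_command(path)
    if cmd is None:
        return ""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Adding another file type would then just mean adding another entry to the table, pointing at whatever program produces plain text for that type on the OS in question.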

In fact, the apparent disadvantage of having to run indexing on a home machine is a blessing for a shared hosting environment - it pushes the whole resource-intensive activity of text parsing off the shared host. Simply build the index on the home machine, export the MySQL database, import the ( possibly large, 30 MB or so ) MySQL file into the database on the server and there you are. Done !
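The export/import step can be scripted. The following is only a sketch under assumptions: the function names, database name, user and host are placeholders of mine, and it relies on the standard mysqldump and mysql command-line clients being installed on the machine where it runs.

```python
import subprocess


def dump_command(db_name, user, out_file):
    """Command that exports db_name to out_file (the client prompts for the password)."""
    return ["mysqldump", f"--user={user}", "--password",
            f"--result-file={out_file}", db_name]


def load_command(db_name, user, host):
    """Command that reads SQL statements from stdin and loads them into db_name on host."""
    return ["mysql", f"--host={host}", f"--user={user}", "--password", db_name]


def export_index(db_name, user, out_file):
    # Run on the home machine after indexing has finished.
    subprocess.run(dump_command(db_name, user, out_file), check=True)


def import_index(db_name, user, host, dump_file):
    # Feed the dump file to the mysql client on its standard input.
    with open(dump_file, "rb") as fh:
        subprocess.run(load_command(db_name, user, host), stdin=fh, check=True)
```

Many shared hosts do not allow remote MySQL connections, in which case the dump file would have to be uploaded and imported through whatever tool the host provides instead.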

There may be some tricky moments during the MySQL load when users might see "page not found" errors or similar side effects, but that should be fairly minimal on low volume sites, as the vast majority of personal sites are. There is also the possibility of loading the index into a new database and then switching DB names in the Sphider-Plus configuration. [ Note: it is often easiest to use the Backup/Restore function to update the database on the server, that is, back up on the home machine and restore on the server. The restore function is not tolerant of errors. ]

It seems to me that this is more than just a search engine application that works: it appears to be a complete solution for building a big multi-site index and search application in a low-resource shared hosting environment.

I predict that we will be hearing more from Sphider-Plus in the coming year. I could even see it as an enhanced search engine for Semantic Web formats ( RDF, OWL, etc. ).

Dream on ?