Ichido is a set of experimental search engines and software projects created by Anthony Mancini. The flagship project is the Ichido general-purpose search engine, a classic search engine with its own independent index.
Now, indexing the web is hard, and this is in beta, so the search results aren’t exactly what you’d call competitive. But I have to say, the user interface for this search engine is downright fantastic. It emulates that late ’90s look, and does a very interesting thing where it adds buttons for things like RSS feeds and social accounts for the pages it links to in the results. On top of that, it will list less desirable features of websites (trackers, ads, etc.) as red warnings.
No, this can’t replace DDG or Google – but I love the thought put into the UI.
And, it unfortunately fails at the first query for “books”:
https://ichi.do/search?q=books
It starts with (relatively) random links (though some are authors), and only at the fifth link do we reach something like a book repository (the Internet Archive):
https://archive.org/details/texts?query=%22foot%22&sin=TXT&sort=-date
But instead of landing on their main books page, we reach a random query in there.
What is the moral of the story?
Information retrieval is hard.
Web retrieval is even harder, due to the unstructured nature of the data.
Though kudos to them for coming this far. Building these is no small feat. However, getting a proper ranking system will definitely need all those things we usually prefer to avoid: click-through data from users, for example, or location information, with machine learning on top of that.
So you search for “books” and are surprised that it doesn’t return a webpage that doesn’t mention books anywhere… You have to remember that these days “Google” isn’t a ’90s search engine anymore; it’s a curated keyword list à la AOL that falls back to search engine results after its top listings are exhausted.
I am (not) surprised it did not return useful results. Not surprised, because I did not have very high expectations, but at least they could have implemented some of the basics.
Anyway, a Search Engine has three basic components:
1. A crawler
2. An index (retrieval)
3. A ranker
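To make component 2 concrete, here is a toy in-memory inverted index with boolean AND retrieval. The corpus and function names are made up for illustration; this is a minimal sketch, not how Ichido (or any production engine) actually stores its index:

```python
from collections import defaultdict

# Hypothetical toy corpus: doc id -> text.
DOCS = {
    1: "books about python programming",
    2: "best fantasy books of the year",
    3: "cooking recipes for beginners",
}

def build_index(docs):
    """Build an inverted index: term -> set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Return doc ids containing every query term (boolean AND retrieval)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

index = build_index(DOCS)
print(sorted(retrieve(index, "books")))  # [1, 2]
```

Real indexes add positions, term weights, and compression on top of this, but the term-to-postings mapping is the core idea.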
I don’t expect them to compete with the larger crawlers. Even using open data is acceptable for a new project.
For the index, I assumed it should be able to select documents related to “books” while excluding ones that are not related to the query. (They don’t necessarily have to contain the keyword; look up how PageRank works: https://en.wikipedia.org/wiki/PageRank. Relevancy is an entire academic area in itself.)
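Since PageRank came up: it can be sketched as a simple power iteration over the link graph. The tiny graph and the 0.85 damping factor below are just textbook placeholders, not anything from Ichido:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration for PageRank. `links` maps page -> list of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:
                for target in outgoing:
                    new[target] += damping * rank[page] / len(outgoing)
        rank = new
    return rank

# Hypothetical graph: both A and C link to B, so B should rank highest.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
ranks = pagerank(graph)
```

Note that the score comes purely from link structure, which is why a page can rank for a query without containing the keyword (anchor text does the rest).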
And the final part, the ranker, is the one with the problem here.
Having “100,000,000 results” returned about books means nothing if I can’t sift through them in a meaningful way. Otherwise we would still be using AltaVista, and humans can’t be expected to do this manually either (see Yahoo Categories).
That leaves us with a ranking function, usually made of features like:
1. Relevancy (topic match)
2. Freshness
3. Popularity (could be visits and/or incoming web anchors)
4. Quality (some metric of landing page evaluation)
Those are the minimum you’d need to have a useful Search Engine in this day and age.
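The four features above can be sketched as a weighted linear combination. Every value and weight below is a made-up placeholder (production rankers learn these via learning-to-rank rather than hand-setting them):

```python
from dataclasses import dataclass

@dataclass
class Features:
    relevancy: float   # topic match, e.g. a BM25 score normalized to [0, 1]
    freshness: float   # recency, e.g. decayed by document age
    popularity: float  # visits and/or incoming anchors, normalized
    quality: float     # some landing-page evaluation metric

# Hypothetical hand-set weights; a real ranker would learn these from data.
WEIGHTS = {"relevancy": 0.5, "freshness": 0.1, "popularity": 0.25, "quality": 0.15}

def score(f: Features) -> float:
    """Weighted linear combination of the four ranking features."""
    return (WEIGHTS["relevancy"] * f.relevancy
            + WEIGHTS["freshness"] * f.freshness
            + WEIGHTS["popularity"] * f.popularity
            + WEIGHTS["quality"] * f.quality)

# Two hypothetical candidates for the query "books":
results = {
    "archive-books-page": Features(0.9, 0.4, 0.8, 0.7),
    "random-blog-post":   Features(0.3, 0.9, 0.1, 0.4),
}
ranked = sorted(results, key=lambda name: score(results[name]), reverse=True)
```

Even this crude version would push a topical, popular page above a fresh-but-irrelevant one, which is exactly what the “books” query failed to do.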
Still, I wish them the best of luck; they have done a significant amount of work just to reach this point.
This is a bit “Erm, Ackshually”, but this isn’t the ’90s. This is mid-2000s Google. Google didn’t look like that in 1998, and Yahoo was the dominant engine in the ’90s, so I wish they had an option for a Yahoo theme 😛