Category Archives: Search Engines

Begone Searchwiki

Google’s Searchwiki sounded like a good idea at the time. Sign in with your Google account, activate Searchwiki, carry out a search and you can promote or delete entries in your results list and add comments that can be made public. So I had a go. Unfortunately, should you later want to look at the un-searchwiki’ed results list you cannot just switch it off. You have to log out of your Google account and re-run the search. What annoys me even more, though, is that it overrides my Firefox Customise Google add-on.

I like to drop in and out of my Gmail, look at Google Analytics about once a day and work on my Google Custom Search Engines. So most of the time I am signed in to one of my Google accounts. I do not want to have to keep logging out just to carry out an ordinary search and to be able to look at the results in the way I want. Thankfully, Google seems to have seen sense over this. Techcrunch reports that Google will be adding a toggle button to allow users to turn the feature off, possibly in Q1 2009.

In the meantime, if you are curious about Searchwiki, curb your curiosity. Don’t go there!

Ten science search engines

Ten science search engines is actually a list of nine – you are invited to submit suggestions for the tenth via the comments section. The nine are: Scirus, Scitopia.org, Science.gov, ScienceResearch.com, Scitation, WorldWideScience.org, Science Accelerator, TechXtra, and search.optics.org. They all have different coverage and emphasis and none is comprehensive. Which one will work for you depends very much on the subject area. The three I regularly use in this list are Elsevier’s Scirus, TechXtra for engineering (ICBL and Heriot-Watt University) and WorldWideScience.org. Conspicuous by its absence is Google Scholar!

RefSeek for “academic information”

RefSeek is a new search engine that “aims to make academic information easily accessible to everyone”. There is very little information on how it works other than that it searches more than one billion documents, including web pages, books, encyclopaedias, journals, and newspapers. A few test searches suggest that it searches just .edu, .ac.uk, .org and .gov web sites but not .org.uk or .gov.uk. Straightaway, those of us in the UK are missing out on a large chunk of scientific information and data, as are other countries whose academic web domains do not include an organisation type such as .edu or .ac. My searches on zeolites, for example, failed to pick up papers on Zurich University’s web site (http://www.uzh.ch/). Also, .org and .org.uk domains can be bought by anyone and are not guaranteed to carry quality, peer reviewed articles. A search on my husband (Rhodes) and zeolites came up with some of his papers on the Royal Society of Chemistry web site (http://www.rsc.org/) and his own home page, which is normally advertised as www.fresh-lands.com, although RefSeek picked up on the alternative freshlands.org domain.
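The effect of this kind of domain restriction can be illustrated with a short sketch. It assumes an allow-list of suffixes inferred purely from my test searches. The list, the function name `is_covered` and the example.gov.uk and example.org.uk addresses are all hypothetical, and nothing here is known to reflect how RefSeek actually works.

```python
from urllib.parse import urlparse

# Hypothetical allow-list inferred from a few test searches;
# this is an assumption, not RefSeek's documented behaviour.
ALLOWED_SUFFIXES = (".edu", ".ac.uk", ".org", ".gov")


def is_covered(url: str) -> bool:
    """Return True if the URL's host name ends with one of the allowed suffixes."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(ALLOWED_SUFFIXES)


for url in (
    "http://www.rsc.org/",         # .org: would be included
    "http://www.uzh.ch/",          # no organisation type in the domain: excluded
    "http://www.example.gov.uk/",  # placeholder .gov.uk site: excluded
    "http://www.example.org.uk/",  # placeholder .org.uk site: excluded
):
    print(url, is_covered(url))
```

A filter of this sort would explain why the Royal Society of Chemistry pages turn up while Zurich University and UK government sites do not.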

Search options are the standard double quotes around phrases, minus sign to exclude documents containing a term, plus sign to include stop words and the Boolean OR. Next to each entry in the results list is an option to “Search this Site” which does work well. Although searching is free, you may find that you have to pay for articles on some sites.

Overall, RefSeek does a reasonable job of limiting your search to more serious scientific and academic information but there are far too many omissions for it to be reliably used on its own. There are several other science search engines that I would recommend you investigate and use alongside RefSeek: see Ten Science Search Engines at
http://hwlibrary.wordpress.com/2008/09/22/science-search-engines/

Update on MSE360

Soon after I had published yesterday’s posting on MSE360,  I sent feedback to them on a couple of issues  including the lack of a ‘NOT’ command. They replied within a couple of hours and overnight have corrected a bug that had caused the problem.  So the Boolean NOT does work in MSE360.

I raised another question about a green icon that appeared to the left of some entries in the results list. This was most obvious in my ego-surfing and I had hoped that the icons next to my web pages and blog postings meant that they were highly recommended. Unfortunately it is not so. The little green icon represents a site that you have visited before, but it is a feature that only Firefox supports. It is a really neat feature, though, especially for those of us who do a lot of desk research. It is useful to be able to ignore those sites we already know about and have visited and, perhaps more importantly, to quickly track down the perfect site that we found yesterday but cannot easily spot in the results list.

MSE360 Search

One search engine bites the dust (Accoona) and another one is launched. I picked up details of Search (MSE360) via Phil Bradley’s blog posting and so far am very impressed with it. The home page is minimalist as is the norm these days and apart from the search box the only other obvious search option is a pull down list of countries. Hidden at the bottom of the page is an Options link that allows you to set safe search for images, change the default country, enable/disable WOT, and choose a different style sheet for your results. WOT is short for “Web of Trust” and is a community whose members exchange knowledge of websites.  If a site has a bad reputation, WOT will warn you by inserting an icon next to the results. The colour of the icon ranges through shades of green, amber and red, red indicating sites about which you might want to exercise some caution. Hover over the icon and you can view the WOT ‘scorecard’.

The results page is ‘three tiered’. The centre panel contains the usual web listings, and the default style has images on the left, and Wikipedia and blog postings on the right. The layout can be changed by selecting a different stylesheet. I eventually decided to have both sidebars on the right hand side of the screen. There are the inevitable ads (Google) but these are in the sidebar and clearly labelled as Paid Results.

The quality of the results for my standard, basic test searches was excellent and compared favourably with Google. What did concern me initially was that there is no advanced search screen: I include site/domain and filetype commands in many of my searches and, for me, a search engine without them is a non-starter. After some experimentation, though, I discovered that you can use the commands as part of your search strategy, for example

“car ownership” UK site:gov.uk filetype:pdf

I also found that you can use Boolean AND, OR and parentheses but not NOT (reminiscent of Yahoo!). The minus sign can be used in a simple search if you want to exclude pages containing a term but it does not seem to work when combined with Boolean operators.
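For anyone who runs the same kind of search repeatedly, the site and filetype commands can simply be appended to the search terms. The following is a minimal sketch of that idea; the helper name `build_query` and its parameters are my own invention for illustration and are not part of MSE360.

```python
def build_query(terms, site=None, filetype=None):
    """Join search terms and append optional site: and filetype: commands."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# Rebuilds the example search used in this posting (with straight quotes).
print(build_query(['"car ownership"', "UK"], site="gov.uk", filetype="pdf"))
```

The resulting string can be pasted into the search box as it stands.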

Moving on to general issues, MSE360 stores no private data. The only stored data are customization cookies on your own computer. Your IP is not kept, nor is any other identifiable information. To help combat spyware, they use an internal spyware alert program to provide warnings next to sites that may host spyware.

MSE360 say that they are still a test site so there are bound to be bugs, and no doubt some changes will be made to the interface. They say “We’re not at a stage in which we can say we’re ready, but you still love hearing your feedback, good or bad”. My first impression is that they are very close to being ready and light years ahead of some of the appalling, over-hyped search engines that have been launched recently. I definitely recommend that you pay them a visit.

Accoona is no more

Search engine Accoona has at last closed down. There have been rumours for many months that it was on the way out and now the home page merely says

“Due to an overwhelmingly competitive search market, Accoona.com and Accoona.cn will no longer be active.

We thank you for your previous support.”

Accoona came up with some interesting and innovative features such as the talking toolbar and SuperTarget (see my previous postings on Accoona). I was never impressed with the web search, and the business option was just a search on the D&B marketing directory. I did visit  the News section quite a lot, though, and in particular SuperTarget,  which  could be used to identify people and companies mentioned in your search results. More often than not they were the ‘usual suspects’ but occasionally it would flag up an unexpected association.

The service never realised its full potential and when it started dropping search options and cancelled its IPO, the rumours started flying and I stopped recommending it in my workshops. The search engine business is tough so no doubt many more will bite the dust before too long.

TouchGraph Google Browser

Thanks to Phil Bradley for the alert on this search visualisation tool.

TouchGraph is a  Java application that loads into your browser window and displays the networks and connections in your data or a search results list.  There are three free demos that tie into Google,  Amazon and Facebook.  I found the last two very cumbersome to use and limited in their usefulness but the Google one is worth a try.  You type in your keywords or a URL and Google’s results are presented as groups of interconnected ‘blobs’. You can zoom in and out, hide or expand individual groupings and filter results, although I must admit to not having yet mastered the last feature.

[Image: TouchGraph]

An interesting way to explore search results but I still prefer Cluuz’s visualisation of web-based data. I may find TouchGraph more useful for looking at connections and relationships within my locally held information so I have signed up for a TouchGraph Navigator trial.

Yahoo! drops Boolean NOT operator

I go away on holiday for three weeks and on my return disaster strikes. “Yahoo! drops Boolean support” was the headline that jumped out at me as I fired up my RSS reader. The story, carried by Pandia Search Engine News, refers to a report by Greg Notess in Online Magazine. I was not able to access the original in Online so had to make do with Pandia’s summary and test out the claims myself.

It turns out that it is only the NOT operator that has vanished. Not such a catastrophe after all as you can still place a minus sign in front of the term that you want excluded from the search results. The posting goes on to say that nesting with parentheses has also been dumped. That was definitely not the case when I carried out test searches on both the UK and US versions of Yahoo. All of my nested Boolean searches worked, once I had replaced the NOTs with minus signs.
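Replacing the NOTs with minus signs is a mechanical job, so it can be sketched in a few lines. This is only an illustration under my own assumptions: the function name `not_to_minus` is hypothetical, it handles a NOT followed by a single word or a quoted phrase, and it leaves a NOT in front of a parenthesised group untouched for manual editing.

```python
import re

# Upper-case NOT followed by a quoted phrase or a single word (no parentheses).
_NOT_PATTERN = re.compile(r'\bNOT\s+("[^"]+"|[^\s()]+)')


def not_to_minus(query: str) -> str:
    """Replace each 'NOT term' with '-term', leaving the rest of the query as it is."""
    return _NOT_PATTERN.sub(r"-\1", query)


print(not_to_minus('(gin OR vodka) AND sales NOT "soft drinks"'))
```

Whether a given search engine accepts the rewritten query still has to be checked by running it.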

I shall wait a week or so before assuming that the change is permanent. Earlier this year, the link and linkdomain commands went AWOL. They were no longer available on the main Yahoo sites but still worked on AltaVista. After a couple of days they reappeared on Yahoo as suddenly as they had vanished. Hopefully NOT will be back – it is easier to remember to use only Boolean operators rather than a mixture of Boolean and math signs – and fingers crossed that none of the other commands disappear.

Cuil not so cool

Cuil, pronounced ‘cool’, was launched today claiming to have the largest database of all the search engines and more relevant results. A detailed article on the company’s background, together with a critique of its PR, can be found on Search Engine Land – Cuil Launches — Can This Search Start-Up Really Best Google?

Cuil’s index of 120 billion web pages might indeed make it the most comprehensive search engine on the web. Google no longer discloses the size of its index, although it claims to “know about a trillion unique web pages”. Techcrunch has put Cuil through its paces (Google Beats Cuil Hands Down In Size And Relevance, But That Isn’t The Whole Story) and says:

“It seems pretty clear that Google’s index of web pages is significantly larger than Cuil’s unless we’re randomly choosing the wrong queries. Based on the queries above, Google is averaging nearly 10x the number of results of Cuil.”

Many of Techcrunch’s searches are single word searches and Google’s hit count is never accurate for those and is usually grossly overestimated. Add in more terms and the hit count starts to approach reality. My test searches found approximately the same number of pages in both Cuil and Google, apart from those queries that had more than three terms. For example, my standard ‘gin vodka sales UK’ search came up with:

[Image: Cuil]

This is a simple search for which all of the other major search engines find a highly relevant answer from the Gin and Vodka Association (yes, there really is one). Reduce the number of keywords to a maximum of three and Cuil recovers but returns irrelevant results. It looks as though Cuil cannot cope with long search queries and this could be related to the way it ranks pages. It claims not to just catalog keywords on a site and then rank the site based on its importance; it also tries to understand how words are related. This seems to work best with one or two word queries. Results are shown in three columns and on the right hand side of the screen it offers options to refine your results by category. Apparently, Cuil also recommends related searches in tabs across the top of search results but I never saw these in any of my test searches.

[Image: Cuil Results]

So far I was not impressed and for me the final straw was the total absence of any advanced search features. No file format searching, no limiting to media types, no domain or site searching. I could be charitable and attribute the absence of results for multi-term strategies to the service being new and not being able to cope with the traffic generated by the launch announcement. The relevance of the results from my test searches, though, was not sufficiently high for me to be able to dispense with advanced search. Cuil is going to have to try a lot harder if it is to persuade me to return. At present this is not a Google beater and neither does it out-perform the other top search engines.

P.S. Just as I finished this review the Cuil web site collapsed under the pressure. The home page now re-directs to:

“We’ll be back soon…

Due to overwhelming interest, our Cuil servers are running a bit hot right now. The search engine is momentarily unavailable as we add more capacity.

Thanks for your patience.”