All posts by Karen Blakeman

I have worked in the information profession for over twenty years and have been a freelance consultant since 1989. My company (RBA Information Services) provides training and consultancy on the use of the Internet, and on accessing and managing information resources. Prior to setting up RBA I worked at the Colindale Central Public Health Laboratory, and then spent ten years in the Pharmaceutical and Health Care industry before moving to the international management consultancy group Strategic Planning Associates. I edit and publish an electronic newsletter called Tales from the Terminal Room. Other publications include Search Strategies for the Internet. I am a Fellow of CILIP: The Chartered Institute of Library and Information Professionals, an active member of the UK eInformation Group (UKeiG) and a member of the Association of Independent Information Professionals (AIIP).

Yahoo! drops Boolean NOT operator

I go away on holiday for three weeks and on my return disaster strikes. “Yahoo! drops Boolean support” was the headline that jumped out at me as I fired up my RSS reader. The story, carried by Pandia Search Engine News, refers to a report by Greg Notess in Online Magazine. I was not able to access the original in Online so had to make do with Pandia’s summary and test out the claims myself.

It turns out that it is only the NOT operator that has vanished. Not such a catastrophe after all as you can still place a minus sign in front of the term that you want excluded from the search results. The posting goes on to say that nesting with parentheses has also been dumped. That was definitely not the case when I carried out test searches on both the UK and US versions of Yahoo. All of my nested Boolean searches worked, once I had replaced the NOTs with minus signs.
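Swapping each NOT for a minus sign is mechanical enough to script if you have a stock of saved search strategies. The snippet below is only a sketch: not_to_minus is a made-up name, and it assumes that NOT is typed in capitals and is followed by a single word or a quoted phrase. It does not attempt to handle NOT in front of a bracketed group.

```python
import re

# Hypothetical helper, not part of any Yahoo! tool.
# Assumes NOT is upper case and is followed by one word or a "quoted phrase".
_NOT_PATTERN = re.compile(r'\bNOT\s+(?=["\w])')

def not_to_minus(query: str) -> str:
    """Replace each Boolean NOT operator with a leading minus sign."""
    return _NOT_PATTERN.sub("-", query)

print(not_to_minus('(gin OR vodka) sales NOT whisky'))
# (gin OR vodka) sales -whisky
print(not_to_minus('cuil NOT "press release"'))
# cuil -"press release"
```

Because the pattern insists on a space after NOT, words that merely begin with those letters, such as Nottingham, are left alone.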

I shall wait a week or so before assuming that the change is permanent. Earlier this year, the link and linkdomain commands went AWOL. They were no longer available on the main Yahoo sites but still worked on AltaVista. After a couple of days they reappeared on Yahoo as suddenly as they had vanished. Hopefully NOT will be back – it is easier to remember to use only Boolean operators rather than a mixture of Boolean and math signs – and fingers crossed that none of the other commands disappear.

Cuil not so cool

Cuil, pronounced ‘cool’, was launched today claiming to have the largest database of all the search engines and more relevant results. A detailed article on the company’s background, together with a critique of its PR, can be found on Search Engine Land – Cuil Launches — Can This Search Start-Up Really Best Google?

Cuil’s index of 120 billion web pages might indeed make it the most comprehensive search engine on the web. Google no longer discloses the size of its index, although it claims to “know about a trillion unique web pages”. Techcrunch has put Cuil through its paces (Google Beats Cuil Hands Down In Size And Relevance, But That Isn’t The Whole Story) and says:

“It seems pretty clear that Google’s index of web pages is significantly larger than Cuil’s unless we’re randomly choosing the wrong queries. Based on the queries above, Google is averaging nearly 10x the number of results of Cuil.”

Many of Techcrunch’s searches are single word searches and Google’s hit count is never accurate for those and is usually grossly overestimated. Add in more terms and the hit count starts to approach reality. My test searches found approximately the same number of pages in both Cuil and Google, apart from those queries that had more than three terms. For example, my standard ‘gin vodka sales UK’ search came up with:

[Image: Cuil]

This is a simple search for which all of the other major search engines find a highly relevant answer from the Gin and Vodka Association (yes, there really is one). Reduce the number of keywords to a maximum of three and Cuil recovers but returns irrelevant results. It looks as though Cuil cannot cope with long search queries and this could be related to the way it ranks pages. It claims not to just catalog keywords on a site and then rank the site based on its importance; it also tries to understand how words are related. This seems to work best with one or two word queries. Results are shown in three columns and on the right hand side of the screen it offers options to refine your results by category. Apparently, Cuil also recommends related searches in tabs across the top of search results but I never saw these in any of my test searches.

[Image: Cuil Results]

So far I was not impressed and for me the final straw was the total absence of any advanced search features. No file format searching, no limiting to media types, no domain or site searching. I could be charitable and attribute the absence of results for multi term strategies to the service being new and not being able to cope with the traffic generated by the launch announcement. The relevance of the results from my test searches, though, was not sufficiently high for me to be able to dispense with advanced search. Cuil is going to have to try a lot harder if it is to persuade me to return. At present this is not a Google beater and neither does it out-perform the other top search engines.

P.S. Just as I finished this review the Cuil web site collapsed under the pressure. The home page now re-directs to:

“We’ll be back soon…

Due to overwhelming interest, our Cuil servers are running a bit hot right now. The search engine is momentarily unavailable as we add more capacity.

Thanks for your patience.”

Guernsey Registry

The new Guernsey Registry was launched on 1 July 2008 following a review of the Companies (Guernsey) Law and the existing company processes. The Registry also includes the Bailiwick of Guernsey Intellectual Property Office which administers all Trade Mark and Design registrations and activities, and provides information on the Intellectual Property Laws enacted in the Bailiwick. As well as providing e-filing services for companies, there is a searchable database. You can search by registered number or company name (current and previous). Company number, name, type, status, registered address, activity, and date registered are available free of charge. Company profiles, certificates of good standing and of incorporation, and annual reports are priced. There is a link to the Companies (Guernsey) Law 2008, which can be downloaded free of charge. Be warned – it is 569 pages long!

BookMooch – book swap

BookMooch lets you give away books you no longer need in exchange for books you really want. Every time you give someone a book, you earn a point and can get any book you want from anyone else at BookMooch. You receive a tenth of a point for every book you make available on BookMooch, and one point each time you give a book away. In order to keep receiving books, you need to give away at least one book for every three you receive. You receive three points when you send a book out of your country, to help compensate you for the greater mailing cost, but it only costs the moocher two points to get the book. Sounds a little complicated but in practice the process is very straightforward. The only minor downside is that it taps into the Amazon database and if your book is not listed on Amazon – and Amazon does sometimes manage to mess up bibliographic data – you can’t add it to your BookMooch ‘inventory’.
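The points arithmetic is easier to follow with a worked example. The sketch below simply encodes the rules as described in this post and is not BookMooch’s own code: the function and constant names are invented, the one-point cost of a domestic mooch is inferred from the “earn a point and can get any book” sentence, and the ratio check is one reading of “one book for every three you receive”. Points are counted in tenths so that the listing bonus stays exact.

```python
# Rules as described in the post, counted in tenths of a point.
# All names here are illustrative, not BookMooch's own.
LISTED = 1          # 0.1 point for each book added to your inventory
SENT_HOME = 10      # 1 point for giving a book away within your own country
SENT_ABROAD = 30    # 3 points for sending a book to another country
MOOCH_HOME = 10     # assumed: 1 point to receive a book from your own country
MOOCH_ABROAD = 20   # 2 points to receive a book from another country

def balance(listed=0, sent_home=0, sent_abroad=0, mooched_home=0, mooched_abroad=0):
    """Return the points balance for the given activity."""
    tenths = (listed * LISTED
              + sent_home * SENT_HOME
              + sent_abroad * SENT_ABROAD
              - mooched_home * MOOCH_HOME
              - mooched_abroad * MOOCH_ABROAD)
    return tenths / 10

def may_keep_mooching(given, received):
    """At least one book given away for every three received."""
    return given * 3 >= received

# List five books, send one at home and one abroad, mooch one from abroad.
print(balance(listed=5, sent_home=1, sent_abroad=1, mooched_abroad=1))  # 2.5
print(may_keep_mooching(given=1, received=3))  # True
print(may_keep_mooching(given=1, received=4))  # False
```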

Say no to 0870 – alternative telephone numbers

Say No to 0870 has been around for a while but it is only recently that I have had to make serious use of its database. I have been researching hotel business meeting facilities and was amazed at how many still use 0870 numbers. My middle name ought to be ‘Scrooge’ because I resent having to pay for a phone call when I could be connecting using my mobile’s very generous free minutes (geographic numbers only) allocation. When that is used up, I have various VOIP services that I can use for a few pence.

This web site gives alternative geographic numbers for 0800, 0808, 0844, 0845, 0870 and 0871 numbers. Search on the company name or its non-geographic number and you are offered alternative geographic numbers. Sources include web sites and contributions from users. You may wonder why freephone numbers are included: these often take you to a central call centre or redirect you to a branch location based on your telephone area code (if available). You then spend what seems to be forever trying to get connected to the right branch or location. “No I don’t want to book a room …. and, yes, I know I am phoning from Newcastle but I want to contact the Events Sales Manager for Whizzo Hotels in Oxford!”

Allplus search tool

Allplus is yet another search tool that searches Ask, Google, Live and Yahoo in one go. As well as web pages, you can also search the News, Image, Video and Blog databases of the search engines. Moreover is added to the mix for News. In addition to displaying the results for your selected type of information, it gives you items from the other resources on the right hand side of the page. On the left hand side of the screen, it organises your results into a ‘cluster tree’ and you can also view a ‘cluster graph’. Clicking on the topic in the tree or graph brings the pages in that cluster to the top of the results list and is a useful way of focussing your search.

[Image: Allplus]

Interlend 2008

The presentation and workshop that I gave at the Interlend 2008 conference in Peebles are now available at http://www.rba.co.uk/presentations/index.htm . As well as the PowerPoints there are links to versions available on Slideshare and authorSTREAM.

The tag for the conference was interlend08. Two of us were regularly ‘twittering’ at the conference and most of our tweets are on Hashtags and Twemes (search on interlend08). A record of my own tweets is at http://karenblakeman.livejournal.com/2008/07/01/ and http://karenblakeman.livejournal.com/2008/07/02/.

Tag clouds for analysing documents

CV not getting you those all important interviews? Nobody answering your job advert? Or perhaps your corporate publicity is not doing the biz? Processing your document through a tag cloud generator might give you a clue as to where you are going wrong. Sue Hill gave a presentation at the recent City Information Group open day on CPD and skills. In passing she mentioned that they sometimes run a CV or job description through a tag cloud generator to show people why their lovingly created prose is way off the mark. The tag cloud brings to the fore your most used terms and it can be a shock to discover that you have placed the emphasis in totally the wrong area. It then struck me that you could do this with any form of literature – a web page, training publicity, membership recruitment forms.

There are dozens, if not hundreds, of tag cloud generators on the Web and most of them are free. For starters try Wordle, Tagcrowd, or Tag Cloud Generator. The example below is a tag cloud of the UKeiG home page generated by Wordle.

[Image: UKeiG tag cloud generated by Wordle]
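Underneath the pretty display, a tag cloud is little more than a count of how often each word appears. If you would rather see the raw numbers for a CV or job advert, a few lines of Python will do a rough version of the same job. This is only a sketch: top_terms is an invented name, the stop list is deliberately tiny, and it makes no claim to match how Wordle or any other generator actually works.

```python
import re
from collections import Counter

# A tiny, illustrative stop list; real generators use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "i", "my", "on"}

def top_terms(text, n=10):
    """Return the n most frequent words in text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

cv = "Managed budgets. Managed staff. Managed projects and budgets for the library."
print(top_terms(cv, 2))  # [('managed', 3), ('budgets', 2)]
```

If ‘managed’ tops the list for a post that is really about research skills, the emphasis is probably in the wrong place.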

Top Search Tips

I ran another advanced search workshop (Google and Beyond) for UKeiG on June 11th, this time in London. Twenty people attended the event and came up with the following list of top search tips at the end of the day.

1. Use the Advanced Search screen. There are lots of goodies to be found on the advanced search screens: options for focussing your search by file format (e.g. xls for data and statistics, ppt for expert presentations, pdf for industry or government reports); site and domain search to limit your search to just one web site or a type of organisation (e.g. UK government, US academic); and in Google there is a numeric range search.

2. Google Custom Search Engines (Google CSE) at http://www.google.com/coop/cse/. This made its first appearance in the Top Tips from the Liverpool workshop earlier this year. Ideal for building collections of sites that you regularly search, to create a searchable subject list, or to offer your users a more focused search option.

3. See what Google does with your search string.

a) If you use the default search box and Google comes back with odd results, click on Advanced Search to see what it has done with your search terms.

b) If you use the Advanced Search screen and fill in the boxes, see how Google formats the search strategy by looking at the search box at the top of the results page. By learning the commands and prefixes you can build more specific searches more quickly on the default search page.

4. Cached copies. Look at the search engine’s cached copy of a web page if you can’t find your search terms in the document or if the page is nothing like the description in the results list. You will see the version of the page that has been used by the search engine for indexing and with your terms highlighted.

5. Use tools such as Intelways and Zuula for quick and easy access to a wide range of search tools covering different types of information. Enter your search once, click on the tab for the type of resource for which you are searching (video, images, reference, news etc.), and then work your way through the list of search engines.

6. Alacrawiki. The Alacra Spotlights section is a good starting point for evaluated sites and information on industry sectors. It is also a good example of what to look for when assessing the quality of a wiki and how easy it is for anyone to edit the pages. In the Spotlights sections there is no edit option, not even if you register for an account and log in. Only the Alacra editors can edit the pages.

7. Open access journals. Google Scholar sometimes leads you to copies of journal articles in institutional repositories and open access journals, but there are also directories of open access journals. For example: http://www.doaj.org/ , http://www.wsis-si.org/oa-journals.html, http://www.abc.chemistry.bsu.by/current/fulltext.htm . This is not my area of expertise so comments on other directories are welcome.

8. Social bookmarking sites. Try social bookmarking sites, not only for creating your evaluated lists of sites but for searching other people’s. For example FURL, Del.icio.us, Connotea, 2Collab. Connotea (owned by the Nature Publishing Group) and 2Collab (owned by Elsevier) are aimed at researchers and scientists.

9. Search results visualisation. Try out some of the newer search tools that present results and search options in a different way. For example Cluuz, Kartoo, Kvisu, Quintura. [Some of the participants specifically mentioned Cluuz and Kvisu].

10. The Internet Archive (Wayback Machine) at http://www.archive.org/ for pages, sites and documents that have disappeared. Ideal for tracking down lost documents, seeing how organisations presented themselves on the Web in the past, and for collecting evidence for a legal case (e.g. ‘passing off’, copyright infringement).
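Tips 1 and 3 go together: once you have seen what the Advanced Search screen puts in the search box, you can type the same prefixes yourself. The sketch below strings together the file format, site and numeric range options from tip 1 using Google’s filetype: and site: prefixes and its two-dot number range. build_query is a made-up helper for illustration, not part of any Google API, and the example terms and domains are my own.

```python
def build_query(terms, filetype=None, site=None, number_range=None):
    """Assemble a search string using Google-style prefixes.

    build_query is a hypothetical helper, not part of any Google API.
    """
    parts = [terms]
    if filetype:
        parts.append(f"filetype:{filetype}")   # e.g. xls, ppt, pdf
    if site:
        parts.append(f"site:{site}")           # a single site or a domain
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")         # numeric range search
    return " ".join(parts)

print(build_query("gin vodka sales", filetype="xls", site="gov.uk"))
# gin vodka sales filetype:xls site:gov.uk
print(build_query("oil consumption", number_range=(1990, 2000)))
# oil consumption 1990..2000
```

Other search engines use different prefixes, so check the search box after an advanced search, as in tip 3b, before relying on any of these elsewhere.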

Energy Export Databrowser

The Energy Export Databrowser, set up by Jonathan Callahan, is based on BP’s 2007 Statistical Review and provides a quick and easy way to view country data on consumption, import and export of crude oil and natural gas. It covers over 80 countries and data goes back to the 1960s. There is feedback on the browser itself and an interesting discussion on the accuracy and validity of the underlying data on The Oildrum.