Category Archives: Search Engines

Top Search Tips

I ran another advanced search workshop (Google and Beyond) for UKeiG on June 11th, this time in London. Twenty people attended the event and came up with the following list of top search tips at the end of the day.

1. Use the Advanced Search screen. There are lots of goodies to be found on the advanced search screens: options for focussing your search by file format (e.g. xls for data and statistics, ppt for expert presentations, pdf for industry or government reports); site and domain search to limit your search to just one web site or a type of organisation (e.g. UK government, US academic); and in Google there is a numeric range search.

2. Google Custom Search Engines (Google CSE) at http://www.google.com/coop/cse/. This made its first appearance in the Top Tips from the Liverpool workshop earlier this year. Ideal for building collections of sites that you regularly search, to create a searchable subject list, or to offer your users a more focused search option.

3. See what Google does with your search string.

a) If you use the default search box and Google comes back with odd results, click on Advanced Search to see what it has done with your search terms.

b) If you use the Advanced Search screen and fill in the boxes, see how Google formats the search strategy by looking at the search box at the top of the results page. By learning the commands and prefixes you can build more specific searches more quickly on the default search page.

4. Cached copies. Look at the search engine’s cached copy of a web page if you can’t find your search terms in the document or if the page is nothing like the description in the results list. You will see the version of the page that the search engine used for indexing, with your terms highlighted.

5. Use tools such as Intelways and Zuula for quick and easy access to a wide range of search tools covering different types of information. Enter your search once, click on the tab for the type of resource for which you are searching (video, images, reference, news etc.), and then work your way through the list of search engines.

6. Alacrawiki. The Alacra Spotlights section is a good starting point for evaluated sites and information on industry sectors. It is also a good example of what to look for when assessing the quality of a wiki and how easy it is for anyone to edit the pages. In the Spotlights sections there is no edit option, not even if you register for an account and log in. Only the Alacra editors can edit the pages.

7. Open access journals. Google Scholar sometimes leads you to copies of journal articles in institutional repositories and open access journals, but there are also directories of open access journals. For example: http://www.doaj.org/, http://www.wsis-si.org/oa-journals.html, http://www.abc.chemistry.bsu.by/current/fulltext.htm. This is not my area of expertise so comments on other directories are welcome.

8. Social bookmarking sites. Try social bookmarking sites, not only for creating your evaluated lists of sites but for searching other people’s. For example FURL, Del.icio.us, Connotea, 2Collab. Connotea (owned by the Nature Publishing Group) and 2Collab (owned by Elsevier) are aimed at researchers and scientists.

9. Search results visualisation. Try out some of the newer search tools that present results and search options in a different way. For example Cluuz, Kartoo, Kvisu, Quintura. [Some of the participants specifically mentioned Cluuz and Kvisu].

10. The Internet Archive (Wayback Machine) at http://www.archive.org/ for pages, sites and documents that have disappeared. Ideal for tracking down lost documents, seeing how organisations presented themselves on the Web in the past, and for collecting evidence for a legal case (e.g. ‘passing off’, copyright infringement).
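
For tip 10, a small sketch of how an archived copy can be addressed directly. It assumes the Wayback Machine's address pattern of web.archive.org/web/ followed by a timestamp and the original URL, with an asterisk in place of the timestamp to list every capture. The function name wayback_url and the example.com address are purely illustrative, and whether a capture exists for a given page and date is something you can only find out by visiting the link.

```python
def wayback_url(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for page_url.

    timestamp is a string of digits such as "2008" or "20080611".
    Leave it empty to use "*", which asks for the list of all captures.
    (Assumed URL pattern; the archive picks the nearest capture it holds.)
    """
    if timestamp and not timestamp.isdigit():
        raise ValueError("timestamp must contain digits only")
    stamp = timestamp if timestamp else "*"
    return f"https://web.archive.org/web/{stamp}/{page_url}"


# Hypothetical page: ask for the capture nearest to 11 June 2008
print(wayback_url("http://www.example.com/", "20080611"))
```

Pasting the printed address into a browser should take you to the nearest archived version, if one exists.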

Top Search Tips – May 2008, Liverpool

UKeiG’s recent Liverpool Internet search workshop was filled to capacity. It was a packed day with a significant amount of new content and plenty of time for participants to try out the tools and techniques for themselves. At the end of the day they were asked to compile a list of their top tips. There were the usual suspects but the Google Custom Search Engine was new. It is the first time that we have covered Google CSE in the workshop and it generated so much interest that UKeiG will be producing a fact sheet on it. The full list of top tips is as follows:

1. Use the ‘site:’ command to search individual web sites that have appalling navigation and useless site search engines.

2. Search for file formats to narrow down and focus your search. For example search for Word documents or PDFs if you are looking for government or industry reports; xls for data and statistics; ppt or pdf for presentations.

3. Try something else other than Google. Have one Google free day or hour a week. Change the home page in your browser if it is set to Google.

4. Use the OR command in combination with the site: command to search more than one site or type of site. For example,

"carbon emissions trading" filetype:ppt site:ac.uk OR site:gov.uk

5. Don’t believe all you see, especially when it comes to people searches and mashups. [Mashups combine information from several different sources to produce a single new resource.]

6. If the information is critical, always cross and double check the accuracy of the information with independent sources.

7. Books are still relevant. For example, if you are new to a subject or industry sector try and find an introductory text that can help you with the terminology. They are also excellent for historical information. As well as Amazon, try Google Books (http://www.google.com/books/) for older texts, and Live Books (http://search.live.com/books/).

8. Use services such as Zuula or Intelways to remind you of the different types of information that are available and their appropriate search engines. Type in your search once and click on the search tools one by one.

9. Build your own Google Custom Search Engine for collections of sites that you regularly search, to create a searchable subject list, or to offer your users a customised, more focused search option.

10. Try good old fashioned Boolean. Yahoo, Exalead and Live support AND, OR, NOT and ‘nested’ searches, but don’t go overboard. Remember to type in the operators as capital letters, otherwise the search engines will ignore them as stop words.

11. Make use of proximity searching.

a) Putting double quote marks around your search terms to force a phrase search works in all of the search engines. For example

"carbon emissions trading"

b) In Google, use the asterisk (*) to find your terms separated by one or more terms but close to one another. There is no information in the help files on the maximum separation. Increasing the number of asterisks is not supposed to make a difference but it does and it appears that one asterisk stands in for one word.

c) The Exalead NEAR command finds words within a maximum of 16 terms of each other. You can control the degree of separation by using NEAR/n where ‘n’ is a number specified by you. For example

climate NEAR/3 change

12. Try social bookmarking services to track down other people’s research lists on a subject. For example del.icio.us, Furl, Connotea, CiteULike.

13. If you are looking for formatted files search Yahoo as well as Google. One participant tested several searches on both and found that Yahoo consistently came up with more. This could be due to different coverage of the two services but is more likely to be down to the fact that Google indexes the first 100K of a document but Yahoo indexes 500K. [Karen Blakeman comments: also search in Live.com. I recently found two unique documents via Live.com that contained vital information on a company that I was researching].

14. The Internet Archive (Wayback Machine) at http://www.archive.org/ for pages, sites and documents that have disappeared. Ideal for tracking down lost documents, seeing how organisations presented themselves on the Web in the past, and for collecting evidence for a legal case.

15. Partially answer your question in your search strategy. For example

"A hippopotamus can run at"

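The commands in tips 1, 2, 4 and 11 can be combined mechanically, and a short sketch may help anyone who wants to generate such search strings in bulk. The helper names build_query and near are my own invention for illustration. They only assemble text in the syntax shown in the tips above and do not call any search engine.

```python
def build_query(phrase="", filetype="", sites=()):
    """Assemble a search string from a phrase, a file format and site limits.

    Uses only the syntax quoted in the tips: double quotes for a phrase,
    filetype: for the format, and site: terms joined with a capitalised OR.
    """
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')
    if filetype:
        parts.append(f"filetype:{filetype}")
    if sites:
        parts.append(" OR ".join(f"site:{site}" for site in sites))
    return " ".join(parts)


def near(first, second, distance):
    """Format an Exalead-style proximity search such as: climate NEAR/3 change."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return f"{first} NEAR/{distance} {second}"


# Rebuilds the example from tip 4
print(build_query("carbon emissions trading", "ppt", ["ac.uk", "gov.uk"]))
# Rebuilds the example from tip 11c
print(near("climate", "change", 3))
```

The first call reproduces the search string from tip 4 and the second the Exalead example from tip 11c. How each search engine interprets the result is, of course, up to the search engine.
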
New look for Live.com: two steps forward, three steps back

Microsoft have launched their new look for Live.com. It has the now obligatory minimalist look, which was already evident in the previous interface, but has at long last added a link to the Advanced Search option on the home page. Also new to this version is the option to receive an RSS alert for news searches. “Hurrah!” I shouted, “At long last they are listening to users and in danger of threatening Google’s crown”.

Unfortunately, my joy was short lived.

1. The advanced search screen is still pathetic compared to Google’s, Yahoo’s and Exalead’s, and there is no filetype search option. You have to use the ‘filetype:’ command in the default search box.

2. A major issue I have had with Live is that it offers different search options and results displays depending on which ‘country version’ you are using (see the slide below from one of my recent presentations comparing the UK and US versions and number 3 on the News search).

Live.com: UK vs US versions – March 2008

In the previous version of Live.com you could force it to switch from, for example, the UK to the US version by going into the Language option and choosing English (US) instead of English (UK). Now, there is no differentiation between US and UK. I thought I might be able to solve this problem by going into Options and changing the location at which Live thinks I am based. It assumes London but even when I tell it that I am in New York, United States it still insists that I am in the UK! A minor issue you might think but if your ISP gives Live an IP address in Frankfurt, Australia or wherever and Live is telling you that it is going to give you customised results according to your location – well, what is the point? Google and Yahoo give you the option to switch between different country versions whenever you want.

3. At long last they have implemented RSS feeds for news search alerts, but then I realised that I was looking at the boring old news results for the UK and not the super-duper display that the US now sees (see my earlier posting on this issue). And there is no way that I can find, other than going through an anonymous proxy server based in the US, to gain access to the US version.

The verdict? I have to partially agree with Phil Bradley’s comment “If I get a delegate on a course asking me why Live Search should be their preferred search engine I simply couldn’t give them a good reason.” They have so much going for them and then they totally mess it up. Their database is the most up to date for many of the sites that I search on; the coverage seems to be better; they have a worthy competitor to Google Scholar in Academic Live; Maps, Books and Live Earth are pretty good too. So why do they keep shooting themselves in the foot with c**p interfaces?

PharmaLive Search

PharmaLive Search (http://www.pharmalivesearch.com/) is maintained by publisher PharmaLive and powered by Convera. It searches PharmaLive’s collection of publications, selected industry and therapeutic web sites, and relevant blogs. You can search Publisher Recommended Sites, PharmaLive.com, or the Web. Although the web search option is not limited to specially selected sites, they are ranked for relevance with respect to the subject and have been filtered for spam.

‘Publisher recommended sites’ covers over 25 000 000 documents from more than 2600 selected domains. ‘Search PharmaLive’ provides results from PharmaLive itself and its print publications. Much of the information is free but articles from PharmaLive’s own publications are priced.

Variations in spellings, for example hypoglycaemia and hypoglycemia, are automatically included in your search as are synonyms. My search on mushroom poisoning picked up terms such as toxicity, and ‘adverse reactions’ picked up ‘adverse effects’. As one would expect from a vertical search engine, the synonym identification is vastly superior to Google’s efforts. If you make a complete mess of the spelling it automatically prompts you with a “did you mean..” option.

At the top of your results page there are links to broader, narrower and related terms, and to the left of the screen are displayed most popular searches and related concepts. You can focus your search further by using the category tabs. For example, if you have chosen to search publisher recommended sites there are tabs for All, Associations, Media, Government (as far as I could see this covers relevant bodies world-wide), Companies, R&D, and Blogs. When searching PharmaLive.com you are offered options for Med Ad News, R&D Directions, and Pharma Live News. Those articles that are only available on subscription are marked with a blue padlock.

If you regularly search for pharmaceutical and health care information, add this one to your list of tools and also consider downloading the PharmaLive Search toolbar.

Tinfinger people search

Phil Bradley was not very complimentary in his blog about Tinfinger, a people search tool. His search on Gordon Brown, the current UK prime minister, came up with an entry that had clearly been computer generated and which was full of inaccuracies. Tinfinger’s Paul Montgomery confirmed in a comment to Phil’s posting that a lot of the content is pre-generated, but that the intention is for people to use that raw data to write full profiles.

It seems that Phil’s posting encouraged someone to update and correct Gordon Brown’s profile as the information is now correct – or was when I looked at it. Tony Blair’s is a different matter, though. There is no profile but there is a picture and several “tags”:

“Blair has the name of The Rt Hon. Tony Blair. Blair has the type of Alumni of the Inns of Court School of Law. Blair is on the Sedgefield (UK Parliament constituency) list. Blair is listed with the external id of 0086363. Blair came before “(constituency created)“. Blair is an devotee of the Anglican religion. Blair started in 1983. Blair was preceded by John Major. Blair’s spouse is Cherie Blair. Blair has the order of Prime Minister of the United Kingdom. Blair’s term started on 1997052. Blair represents the constituency of Sedgefield (UK Parliament constituency). Blair’s deputy was John Prescott. Blair’s job is politician.”

I would love to know from where that information was collected! It is early days for Tinfinger and hopefully people will get on board and start editing and creating more accurate profiles. At present it is an example of how wrong information can be when it is automatically generated, and the importance of being aware of how content is compiled.

Tinfinger Profile of Tony Blair

FuzzFind Web Search

Yet another gem discovered via Phil Bradley’s blog. FuzzFind Web Search is a meta search tool that combines Google, Yahoo and Live searches with results from social bookmarking site Del.icio.us. The results are amalgamated and deduplicated into a single list. So what? Meta search tools are 10 a penny. But I really like this one because of the way it shows me where each result has been found in each search tool by placing icons and numbers to the left of each entry in the results list. I don’t know why that particular and very simple feature impresses me but it does. Search, and the presentation of the results, is a very personal thing and what works for one person will not necessarily work for someone else. This one really works for me.
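
For the curious, here is a rough sketch of how a meta search tool might amalgamate and deduplicate results while remembering where each result ranked in each search engine. This is not FuzzFind's actual method, which I have not seen documented. The function name merge_results, the ordering rule (most engines first, then best rank) and the sample addresses are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked result lists into one deduplicated list.

    results_by_engine maps an engine name to its URLs in ranked order.
    Returns a list of (url, {engine: rank}) pairs. The ordering rule is
    hypothetical: URLs found by more engines come first, ties are broken
    by the best (lowest) rank and then alphabetically.
    """
    found = defaultdict(dict)
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            # Keep the first (best) position if an engine repeats a URL
            found[url].setdefault(engine, rank)
    return sorted(
        found.items(),
        key=lambda item: (-len(item[1]), min(item[1].values()), item[0]),
    )


# Made-up result lists for three engines
sample = {
    "Google": ["a.example", "b.example"],
    "Yahoo": ["b.example", "c.example"],
    "Live": ["b.example", "a.example"],
}
for url, ranks in merge_results(sample):
    print(url, ranks)
```

Each line of output pairs a result with its position in every engine that found it, which is the same information FuzzFind conveys with its icons and numbers.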

Zuula Search

Zuula is another search engine along the lines of Intelways and Trovando. You type in your search once and then run it through several search engines one by one. The search engines are grouped into Web, Images, Video, News, Blog and Jobs. It is not as wide ranging or as comprehensive as Intelways, for example it does not have a group for searching by file formats, but it does offer an Advanced Search screen that includes a domain search. You can also keep a list of your previous searches.

Chipwrapper – Search UK newspapers

Chipwrapper is a Google Custom Search Engine that searches across the UK’s major national newspapers: The Daily Express, Daily Mail, Daily Mirror, Financial Times, The Guardian, The Independent, The Sun, The People, News of the World, The Scotsman, Daily Star, The Telegraph and The Times. It also searches the BBC News web site, ITN and Sky. It carries out a Google web search of these sources, not a Google News search, so although you cannot sort the results by date you do pull up older, archival material that is not included in the standard 30 days of Google News. As many of the newspapers give the date as part of the text of the stories, you can often limit your search to a year, and sometimes a specific month, by simply including the month and year in your search. Also available are RSS feeds for the top headlines and for general sport, rugby and football headlines, and there is a Chipwrapper browser search plugin for IE7 and Firefox 2.

I ran one of my standard test searches on Richard Budge, the UK coal magnate, and included 2007 in the search strategy to limit the stories to the current year. It worked impressively well but the FT was conspicuous by its absence, and I knew that there had been at least two articles about him in the FT this year. I went direct to the FT site, ran a similar search and found three articles. I then did a Google Web search using my Chipwrapper strategy but added site:ft.com. Again three results. Back to Chipwrapper with some different searches, and it did pick up FT articles. Obviously there is something about my Richard Budge search that it does not like.

Comparison with Google News Archive

Of course Google News is probably the service that most people will use as the benchmark and this is where it becomes really interesting. Back to dear old Richard and a search in the Google News Archive. I went into the Advanced Search screen, entered the phrase Richard Budge, typed in 2007 for the year and selected Show Timeline. Unlike the Advanced Search in current Google News you cannot specify a country of origin for the source so I had to resign myself to the possibility of wading through a substantial number of articles. Google Archive News does, though, give you an option to home in on a specific month via the Timeline (see below).

Google Archive News Results with Timeline

It came up with 94 results, about twenty of which claimed to be from the Financial Times and dated Jan 1 2007. I clicked on a few of the links and they took me to the “Access My Library” site where I was repeatedly told that the articles had been deleted. I gave up after eight or nine, but I think we can assume that the FT has decided not to play ball. Three links with $$ signs next to them took me to Press Display but I was told that the items had been “removed from the back issues access”. A minority of the links took me direct to the news source, for example The Telegraph, BBC, Doncaster Today. The articles from the Guardian, Independent and Times that had been picked up by Chipwrapper were nowhere to be seen. I can only think that like the FT they have declined Google’s offer to be part of the Archive.

Conclusion

I have to confess that this is the first time I have analysed the results from Google News Archive in any depth. I was not surprised to find the FT absent but amazed that so many of the other UK daily papers were not there. Even worse, Google still has in its index links to stories that were carried by third party services, such as Access My Library and Press Display, but which have now been removed. The Timeline is still a good way of looking at major stories relating to a company or person but be aware that some of the key resources are not included.

Despite the glitch with the FT in my test search, and that there is no date sort option, Chipwrapper is a great tool for searching new and archival stories appearing in the leading UK papers. I recommend that you give it a go next time you need to research a UK story.

Note on the name Chipwrapper for non UK readers

A favourite take-away food in the UK is deep fried, battered fish with chips (mushy peas are optional but an essential component as far as I am concerned). Today, EU and Health and Safety regulations dictate that these have to be placed in grease proof paper (in practice not grease proof at all) and then wrapped in large plain sheets of off white paper. I recall that in my dim and distant youth newspaper was regarded as the superior wrapping material. Some connoisseurs claimed that the ink, which dissolved in the presence of the salt and vinegar, gave extra flavour.