Category Archives: Search Strategies

Google dumps ‘+’ operator

You will either have read about this on other blogs or found out yourself when searching that Google has dumped the ‘+’ operator. This was a useful way to stop Google automatically searching for variations and synonyms of your terms. The theory was that by prefixing your term with a plus sign Google would be forced to look for an exact match. Try it now and Google tells you to use double quotation marks around the term instead. To be honest, the + sign has not worked reliably for several months but I often have the same problems with the double quotation marks. If I search on St. “Laurence” or “St. Laurence” Google still includes pages on St Lawrence in my results.
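If you have old saved strategies that still use the plus sign, the switch to quotation marks can be scripted. The following is a minimal sketch in Python; the function name plus_to_quotes is my own invention for illustration, and it only handles single terms prefixed with +, not phrases that are already quoted. As noted above, the quoted form is not guaranteed to force an exact match either.

```python
import re

def plus_to_quotes(query: str) -> str:
    """Rewrite each +term in a query as "term", the form Google now suggests.

    Hypothetical helper: only a plus sign at the start of a token is treated
    as the old operator, so a term such as C++ is left untouched.
    """
    # (?<!\S) means "not preceded by a non-space character", i.e. token start
    return re.sub(r'(?<!\S)\+(\S+)', r'"\1"', query)

print(plus_to_quotes("St. +Laurence church"))
```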

Search Engine Land has covered the news and the reason why + has been dumped in “Google Removes The + Search Command” http://searchengineland.com/google-sunsets-search-operator-98189. It suggests + has been dumped because of Google+, their social network, and Google now suggests auto completing your friends’ names when you use the operator. As Danny Sullivan comments, “it seems to have been tossed out and replaced by quotes because of a problem Google created for itself, by picking stupid names for its social network.”

I’ve noticed another worrying trend – Google does not always look for all of my terms in the page. Viewing the cached copies of some of my results I see that not only are some of my terms missing from the page itself but they are not even in links to the page. So is Google now deciding when to ‘OR’ our search terms?

Google Related is not Google related:

Google recently announced a new toolbar called Google Related (http://www.google.com/related/). It is available as a Chrome extension and as part of the Google Toolbar for Internet Explorer but not for the latest versions of Firefox. Google Related displays a bar along the bottom of your screen that shows videos, maps, images and web pages relevant to the page you are currently on. Nothing new here, I thought. Google has had a related: command and links to similar pages next to entries on your results page for a long time but Google Related is in fact very different.

If I view similar results for the First Great Western home page or type in related:http://www.firstgreatwestern.co.uk/ I see a list of sites that are similar in type and content to First Great Western, for example National Rail Enquiries, Virgin Trains, Trainline, Arriva, Southwestern Trains.

Google related command

If you have found a page that contains the type of information you are looking for ‘related:’ has always been a good way to find additional sites containing similar content and of similar quality. The option is also available on the Advanced Search screen, but you have to click on the ‘+ Date, usage rights, region, and more’ link and it is at the bottom of the page under ‘Page specific tools’ (Anyone would think they were trying to hide it from us!).

Google Related’s results are very different. Once installed it monitors every page you browse and offers content about or directly related to your current page.

Google Related Toolbar

Google does not always find related information for every page and currently only supports English language pages. I found that it worked for about 60% of pages that I viewed. There also seems to be an unhealthy abundance of Wikipedia articles: one would be more than enough. It does sometimes come up with interesting additional information but there are times when it is not needed and its intrusion is infuriating. I really do not want it getting in the way when I am booking train tickets, as in the above example, or listening to BBC Radio 4. Thankfully it can be switched off for individual pages or whole sites but I cannot see a way of switching it back on if I later change my mind. You should also be aware that Google personalizes the results of the Related bar by watching how you search and navigate the web, so what I see in the bar for a particular page will eventually not be what you see.

Overall, some of the information it finds is interesting but to be honest I have not found much that is vitally important to my search or browsing activity. If you are curious have a go and see if it works for you.

Google still thinks coots are possibly cats (or cows)

I have been dining out on the ‘Google thinks coots are lions’ story for several months but decided that its inclusion in my presentation at INFORUM 2011 in Prague should be its last outing. (See my blog postings at http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/ and http://www.rba.co.uk/wordpress/2011/02/21/update-on-coots-vs-lions/ for the details on this story). Towards the end of my talk I pointed out that Google has now abandoned coots=lions and carries out what I consider to be a normal search for coots mating behaviour, or as normal as any Google search can be. I had checked in Google.co.uk, Google.com and Google.cz a couple of weeks before the INFORUM conference and coots were definitely black, medium-sized water birds and not large furry mammals with huge fangs and claws. As I concluded my presentation, though, I saw a few people in the audience staring at their laptops and shaking their heads. One of them came up to me during the break and pointed out that Google Czech Republic was offering cats instead of coots for the first two results. This prompted a quick review of the Google coots/cats/lions situation.

The search: coots mating behaviour

Google.co.uk gives a reasonable set of results but having blogged and included details of the search in so many presentations and newsletters my own pages are taking over the top positions in the results.

Google UK coots search

Google.com gives similar results.

Google.cz however has different ideas. It offers me three articles from Google Scholar and then says “Did you mean cats mating behaviour” in Czech and gives me two results on that subject. The rest of the results are all about coots, so at least Google.cz is giving me my original search as an option rather than unilaterally deciding I really meant cats.

Google Czech Republic and Coots

Looking at other country versions of Google, Google.no and Google.se came up with similar results. Google Germany, however, thinks coots are cows and even throws in a YouTube video:

Google Germany Coots

I am not going to even begin to try and work out what is going on. Three of us nearly went mad attempting to get to the bottom of the original coots=lions oddity. But it does make one wonder even more whether Google can be trusted to come up with even a handful of useful results.

Kent’s Top Tips for Google search

This week I was in Kent running a full day workshop on using Google for search. The participants came from a variety of subject backgrounds and interestingly their Top Tips have a slight technical bias.

1. Preview. Click on the magnifying glass next to an entry in your results list to see a preview of the page highlighting areas containing your search terms

2. Google Trends. http://www.google.com/trends Enter up to five topics and see how often they’ve been searched in Google over time and in different geographic regions. This is a way of identifying how people are searching on a subject. Several workshop participants also explored the Websites option: enter a web site URL and see what other sites people visited. This can be useful if you are not sure about the main sites that cover a subject and want to expand your search from the one you have already found; for example enter moveto.co.uk and a list including other UK online estate agent sites appears. It was interesting to see that typing in a UK university web site came up with job and CV pages!

3. Take control. A tip that has been proposed in other workshops – sometimes along the lines of “Don’t let the ******s grind you down!”. Switch off targeted advertising by going to http://www.google.com/ads/preferences/. Log out of your Google account so that you are not bombarded with tweets and other stuff from members of your social networks. Use advanced search commands and the side bar on the results page to focus your search.

4. Timeline and time restriction options in the results page side bar for the most recent news but also for historical information.

5. Public data explorer. http://www.google.com/publicdata/home There was a lot of interest in this but some of the participants wanted to merge data sets from different sources, which led them to….

6. Fusion tables. http://www.google.com/fusiontables/Home “Merge data from multiple tables. Visualize your data on maps, timelines and charts.” This in turn led them to ….

7. Google Labs http://www.googlelabs.com/ – lots of interesting experiments including Google Correlate http://correlate.googlelabs.com/

8. For a different perspective, search for pages and sites in other languages using the Translated foreign pages option in the results page side bar.

9. Don’t forget the advanced search commands. It is all too easy to slip into lazy searching habits when everything goes well and results are relevant. If Google starts to do its own thing and comes up with bizarre results then using an advanced search command (e.g. filetype: or site:) or one of the two tips mentioned below usually brings it to heel.

10. Two reliable standbys for getting out of a search rut or forcing Google to give you different results: repeat your main term(s) to change your results (sometimes radically), and/or change the order of your terms in your search strategy.
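Tip 10 is easy to try systematically. Here is a minimal sketch in Python that lists every ordering of a set of terms and, optionally, one extra strategy with the main term repeated at the start. The function name query_variations is hypothetical, and nothing here predicts how Google will respond to each variation; it simply saves typing them out by hand.

```python
from itertools import permutations

def query_variations(terms, main_term=None):
    """Return every ordering of the terms, plus one strategy that repeats
    the main term at the start (hypothetical helper for illustration)."""
    # permutations() yields the original order first, then the reorderings
    variations = [" ".join(order) for order in permutations(terms)]
    if main_term is not None:
        # Repeat the main term in front of the original strategy
        variations.append(" ".join([main_term, *terms]))
    return variations

for strategy in query_variations(["coots", "mating", "behaviour"], main_term="coots"):
    print(strategy)
```

Three terms give six orderings, so the list stays manageable; with five or more terms you would probably want to pick a few orderings by hand instead.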

(Some of my recent Google and search workshop presentations and notes can be found at http://www.rba.co.uk/as/)

All About Google – Top Tips

As well as the “Anything BUT Google” sessions, I have also been running “All About Google” workshops. The participants are asked to come up with a group Top 10 Tips and a combined list from the last three events is listed below. Many tips were common to all three so the final list has 16 tips. I also spotted people experimenting with the Google Art Project (http://www.googleartproject.com/), Fusion Tables (http://www.google.com/fusiontables/), Google Custom Search Engines (http://www.google.com/cse), Google Internet Statistics (http://www.google.co.uk/intl/en/landing/internetstats/), and one person found Google Labs Transliteration (http://www.google.com/transliterate/) very useful.

1. Use the filetype: command or the file format option on the Advanced Search screen to limit your research to PowerPoint for presentations, spreadsheets for data and statistics or PDF for research papers and industry/government reports. Note that filetype:ppt, for example, will not pick up the newer .pptx so you will need to incorporate both into your strategy, for example filetype:ppt OR filetype:pptx

2. Use the plus sign (+) before a term or phrase to try and force an exact match – be aware, though, that Google sometimes still does what it wants with your terms – or use the minus sign immediately before a term to exclude pages that contain it. The minus sign can also be used with commands to exclude, for example, a specific site (-site:nameofsite.com) or a file format (-filetype:ppt) from your results.

3. Include the site: command in your strategy or use the domain/site box on the advanced search screen to focus your search on particular types of site, for example site:nhs.uk

4. Try the two proximity commands. An asterisk (*) between two words will look for your words in the order specified and separated by one or more terms, for example solar * panels. The AROUND(n) command, which is undocumented, looks for your terms in either order separated by the number of words (n) specified, for example solar AROUND(2) panels. Note that AROUND did not work for everyone on the workshop.

5. Usage rights. Use the Advanced Search screens for the web and image search to limit your search to Creative Commons material. The options are in the pull down menu under Usage Rights.

6. Use Google Realtime (http://www.google.com/realtime) for searching Twitter. Other social networks are supposedly included but the results are usually dominated by Twitter. Archives go back to February 2010 and there is a useful timeline that enables you to visualise activity over time and look at specific dates.

7. Use the tilde (~) before a term to search for synonyms. For example ~energy will search for energy, power, oil, gas, electricity or electric.

8. Wonder wheel. This can be found in the side bar to the left of your web search results page. Google pulls out terms and phrases from the top results and represents them as spokes on a wheel. Click on one of them and your search is revised and another wheel created. You can view the list of results to the right of the wheel. Note: the Wonder wheel is not available if you have Instant Search switched on.

9. Change the order in which you enter your search terms. This will change the order in which your results are presented and in some cases can change the search completely.

10. Repeat important terms to change the order in which results are presented. Like changing the order of your search terms, this can sometimes significantly alter the results.

11. Google Reader (http://www.google.com/reader). As well as using it to aggregate RSS feeds that you have entered individually, the Add Subscription box also allows you to search for new feeds using keywords.

12. Google Scholar (http://scholar.google.com/). Although there are serious limitations to Google Scholar and the advanced search options are unreliable it can be very useful in tracking down the details of a half remembered reference. One member of the workshop explained that students often fail to accurately note down articles mentioned in lectures. The specialist databases do not always retrieve the references in these cases whereas Google Scholar often does.

13. Google Scholar for citations. It is far from comprehensive and sometimes inaccurate, but not everyone can afford the more reliable but expensive databases. (Note: although it does not cover all subjects it is worth looking at Microsoft Academic Search at http://academic.research.microsoft.com/ as an alternative).

14. Quality. Just because you found something through a Google search does not mean it is true or a trusted source, or that it is the most relevant document. Young students in particular often need to be reminded of this.

15. Open up the side bars to the left of your results. The options change depending on the type of search (general web search, images, news, books, recipes) and it is the key to narrowing down your search, especially by date.

16. Stand your ground! Don’t let Google take over. Clear your web history, cache and cookies. If you are responsible for access to the internet in your information centre or library, set up the browsers so that web histories and caches are cleared every time a user logs out. (You may need to enlist help from IT to set this up)
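The commands in tips 1 to 3 combine well, and assembling them in a script can help avoid slips such as forgetting the newer .pptx format. The following Python sketch is illustrative only: the helper names filetype_any, exclude and build_query are my own, and it does no more than join the pieces into the string you would type into the search box.

```python
def filetype_any(*extensions):
    """Join several file formats with OR, e.g. filetype:ppt OR filetype:pptx."""
    return " OR ".join(f"filetype:{ext}" for ext in extensions)

def exclude(term):
    """Put a minus sign immediately before a term or command to exclude it."""
    return f"-{term}"

def build_query(*parts):
    """Join the non-empty parts of a strategy with single spaces."""
    return " ".join(part for part in parts if part)

# Presentations on solar panels from NHS sites, in either PowerPoint format
print(build_query("solar panels", filetype_any("ppt", "pptx"), "site:nhs.uk"))

# The same topic, but excluding a specific site
print(build_query("solar panels", exclude("site:nameofsite.com")))
```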

 

Anything but Google – URLs

I omitted to include the URLs of some of the specialist tools mentioned in the Anything but Google presentation. You could Bing or Yahoo the names of the services (we’re not going to Google them are we?) but to save time I’ve listed them below.

ChemSpider – Database of Chemical Structures and Property Predictions
http://www.chemspider.com/
Owned by the Royal Society of Chemistry, ChemSpider links together compound information across the web and provides free text and structure search of millions of chemical structures. Search by systematic name, synonym, trade name, registry number, SMILES or InChI.

Biznar http://biznar.com/
Live federated search from Deep Web Technologies and covering 60 business collections. As well as presenting you with a standard list of results, the pages are organised into folders on the left hand side of the screen covering topics, authors, publications, publishers and dates (years).

TechXtra http://www.techxtra.ac.uk/
This is an initiative of Heriot-Watt University providing a free service for finding articles, books, industry news, job announcements, technical reports, technical data, full text eprints, theses and dissertations in engineering, mathematics and computing.

Scirus http://www.scirus.com/
Owned by Elsevier, Scirus covers scientific information. (See the About Us section for the full details). Some of the information is from free web resources but it also includes many priced articles.

PhilPapers: Online Research in Philosophy http://philpapers.org/
Directory of online philosophical articles and books by academic philosophers. Its purpose is “to facilitate the exchange and development of philosophical research through the Internet. Our service gathers and organizes philosophical research on the Internet, and provides tools for philosophers to access, organize, and discuss this research.”

Microsoft Academic Search http://academic.research.microsoft.com/
Currently concentrates on chemistry, computer science, engineering, mathematics and physics. It has advanced search options that actually work (unlike Google Scholar!), lists citations and has a wonderful Visual Explorer.

Not mentioned in the slides but discussed briefly during the session was HealthMash http://healthmash.com/, a semantic metasearch health search engine with “clustering and advanced linguistic capabilities.” I’d be interested in people’s experiences and views of this one.

Update on coots vs. lions

If you have landed on this page thinking that this is a post about your favourite football or rugby team, please note that this is an update on my earlier article ‘Google decides that coots are really lions’ (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/). It has nothing to do with sporting activities unless you count trying to work out what Google is doing with your search! The original post was about how and why Google decided that a search on coots mating behaviour should really have been lions mating behaviour.

The first response to my posting was a comment from Arthur Weiss (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/comment-page-1/#comment-14207).
He suggested that Google was treating coots and lions as synonyms (both are living creatures). I thought that was pushing synonyms too far even for Google. (Sorry, Arthur).

I then had two comments in quick succession from Susanna Winter via Twitter (@Mrs_Figaro). The first is at (http://twitter.com/Mrs_Figaro/statuses/36714410223341568):

Twitter comment on lions vs coots

Moving coots from the beginning to the end of the strategy resulted in an exact match and not a single lion in sight:

Mating behaviour coots

Changing the order of the search terms is a trick I often use to change the order of my results or bring up pages that might be buried in the hundreds or thousands, but I have never seen such a dramatic change as this.

Susanna’s search strategy ‘coots feeding behaviour’, which came up with an exact match, muddied the waters even more. Perhaps there is a search frequency algorithm coming into play? Are there more searches for lions mating behaviour than for coots, but not lions feeding behaviour? I am not convinced that this explains Google’s insistence on looking for lions rather than our animal of choice. Susanna’s next tweet suggests what is going on (http://twitter.com/Mrs_Figaro/statuses/36715389190676480):

Google spelling correction

What you see is:

Google coots search minus lions

So Arthur was on the right track. (My apologies, Arthur).  What probably happened with our search is, as Susanna said, that Google first assumed a typo and then did a synonym search on cats. What puzzles me, though, is how Google arrived at cats from coots. Surely coyotes or goats would be nearer when it comes to typographical errors?

I have two final variations on our search to confuse you even further.

The first is repeating coots at the start of the strategy. An exact match:

Repeating coots in the search

Now move one of the ‘coots’ to the end of the strategy and Google asks “Did you mean lions mating behaviour coots”:

Repeating coots in the search

I give up!

AROUND: Google proximity search operator

Several people have already blogged about Google’s AROUND proximity operator: Digital Inspiration, ResearchBuzz, SearchReSearch and Phil Bradley to name just four. According to SearchReSearch the command has been available for 5-6 years, which begs the question “Why has no-one picked up on it before now?” Could it possibly be because the operator does not do what it says on the tin? Perish the thought and wash my brain out with soap and water for even considering such a thing.

The AROUND command allows you to specify the maximum number of words that separate your search terms. The syntax is firstword AROUND(n) secondword. For example oil AROUND(2) production.
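The syntax can be captured in a few lines of Python. This is a minimal sketch rather than anything official: the function name around is hypothetical, and all it does is assemble the string in the form firstword AROUND(n) secondword, keeping the operator in capital letters.

```python
def around(first: str, second: str, n: int) -> str:
    """Build a proximity strategy of the form: first AROUND(n) second.

    Hypothetical helper: n is the maximum number of separating words and
    the operator name is always written in capitals.
    """
    if n < 0:
        raise ValueError("n must be zero or a positive whole number")
    return f"{first} AROUND({n}) {second}"

print(around("oil", "production", 2))
```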

The reason I have not commented on AROUND so far is because – how can I put this politely – I am finding it difficult to find a search in which it is of practical value. I shall illustrate with just one of my searches, macular degeneration, but my experiences with other test and “real” searches are similar. When testing search features the relevance of the documents that appear on the first few pages of the results is more important than the number of hits, especially as the latter are often guesstimates from Google and can vary enormously depending on which version of Google you use. Nevertheless, the numbers are interesting even if they only serve to confuse us further and I have included them with the screen shots. All of the following searches were run in Google.co.uk

Let’s kick off with a very basic version of my test search: macular degeneration

Number of results: 7,340,000

Macular Degeneration simple search

The results are relevant and as usual Google appears to be listing first those pages where the terms appear next to one another. If we did want to be more precise and reduce the number we could search for the phrase: "macular degeneration".

Number of results: 1,690,000

Macular degeneration phrase search

Not surprisingly the number of results has been reduced significantly to 1,690,000.

Let us now say that my enquirer has come back with an amendment to the original request. They have been told that there are several forms of macular degeneration, for example macular disciform degeneration, and they want a selection of articles covering as many of them as possible. I have a biomedical background and can easily identify the relevant phrases and run separate searches on them, but what if I didn’t have a clue where to start? I could use Google’s asterisk (*) between my two terms to stand in for one or more words.

The strategy macular * degeneration gives us a massive 21,500,000 results, far more than our first basic search if the numbers are to be believed.

Macular degeneration asterisk search

In just the first 6 results we have picked up vitelliform and disciform degeneration, and more are picked up in the subsequent 20-30 results.

Google’s search tips say “If you include * within a query, it tells Google to try to treat the star as a placeholder for any unknown term(s) and then find the best matches.” It is not clear from this whether the asterisk stands in for one or more terms. Adding more asterisks to the search does not alter the number of results, which in any case are only an estimate. We do, though, see very different content and now variations on our terms (for example macula) are appearing emboldened in the page summaries.

Comparison of asterisk searches

We could try and force an exact match search by placing a plus sign before macular in our strategy, but let’s try and keep this exercise simple.

Now for three searches using AROUND(n). Note that AROUND must be in capital letters, otherwise Google will treat it as just another search term. Specifying the number of separating words as 1, 2 and 3 gave me 1,710,000, 1,710,000 and 1,720,000 results respectively.

Google AROUND operator

The results are very different from the searches incorporating the asterisk, and the results for AROUND(2) and AROUND(3) were identical. Also, it seems that with the AROUND operator Google is giving priority to documents where the terms are a phrase and not separated by any other words. It was only when I reached around the 650th result that I started to see phrases where my two terms were separated by one other word.

Using just AROUND without any number gave me 1,610,000 results that looked very similar to those obtained with AROUND(1).

Logically, one might think that macular AROUND(0) degeneration would be the same as a search on the phrase "macular degeneration". It isn’t!

Phrase versus AROUND(0)

Not only is the number of results different (AROUND(0) comes back with 4,250,000 compared with 1,690,000 from the phrase search) but so is the content.

Finally, I decided to follow Phil Bradley’s lead and see what happens when I try and exclude the phrase from the AROUND(0) search: macular AROUND(0) degeneration -"macular degeneration". I got 43,000 results in which the terms seemed to appear anywhere within the document, in any order and separated by any number of other words.
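If you want to repeat this comparison with your own pair of terms, the full set of strategies can be generated in one go. The following Python sketch is an illustration under stated assumptions: the function names comparison_queries and search_url are my own, and the search address with its q parameter is an assumption that you may need to change for your country version of Google. It only builds the strategies and links; you still have to run them and judge the results yourself.

```python
from urllib.parse import urlencode

# Assumption: the address of the search page and its "q" parameter
SEARCH_URL = "https://www.google.co.uk/search"

def comparison_queries(first, second, max_gap=3):
    """List the strategies compared in this post for a pair of terms."""
    phrase = f'"{first} {second}"'
    queries = [
        f"{first} {second}",    # basic search
        phrase,                 # phrase search
        f"{first} * {second}",  # asterisk between the terms
    ]
    # AROUND(1) up to AROUND(max_gap)
    queries += [f"{first} AROUND({n}) {second}" for n in range(1, max_gap + 1)]
    # AROUND(0), then AROUND(0) with the phrase excluded
    queries.append(f"{first} AROUND(0) {second}")
    queries.append(f"{first} AROUND(0) {second} -{phrase}")
    return queries

def search_url(query):
    """Encode a strategy so that it can be pasted into a browser as a link."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"

for query in comparison_queries("macular", "degeneration"):
    print(query, "->", search_url(query))
```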

In conclusion, despite what I said earlier I think AROUND does work but it is difficult to test because Google always seems to give priority to pages in which your terms appear as a phrase and not separated by any other words. Its effect is probably more obvious if you are dealing with a topic that would otherwise return a very small number of results. The ranking and sorting of the results changes significantly, though, when you use AROUND so it might be worth trying if you are fed up with seeing the same documents and sites again and again. In all of the test searches I have carried out so far I still prefer the asterisk, especially if I want to be able to identify expanded phrases quickly and easily. But, as the saying goes, your mileage may vary. Feedback on your own experiences, please.