Rediscovering BananaSlug for “long tail” search

I think it must have been seeing Phil Bradley the other night that made me think of revisiting BananaSlug.com (http://bananaslug.com/). I don’t mean that Phil reminds me of a banana slug but he did introduce me to the search tool via his blog way back in 2005. I have been looking at ways of getting out of what I call “search ruts”. You keep seeing the same results again and again but suspect that there may be something more relevant if only you could get to it. Million Short, which I mentioned in a previous blog post (http://www.rba.co.uk/wordpress/2012/10/04/million-short-unearthing-stuff-hidden-in-the-dungeons-of-googles-results/), is one way to tackle the problem. BananaSlug takes a different approach to what is known as long tail search. It adds a random term to your search and pulls up pages buried way down in the results list that you would probably never see. Just type in your search and then select a category, for example Animals, Great Ideas, Random Number, Themes from Shakespeare. BananaSlug then adds a random word from that category to your terms.
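The idea is simple enough to sketch in a few lines of Python. This is only an illustration of the principle: the category word lists and the function name `add_random_term` are made up for the example and are not BananaSlug's own lists or code.

```python
import random

# Hypothetical word lists for illustration only; these are not BananaSlug's lists.
CATEGORIES = {
    "Colors": ["red", "indigo", "ochre"],
    "Animals": ["heron", "otter", "newt"],
    "Themes from Shakespeare": ["storm", "jealousy", "ambition"],
}


def add_random_term(query, category, rng=random):
    """Return the query with one random word from the chosen category appended."""
    word = rng.choice(CATEGORIES[category])
    return f"{query} {word}"


# The appended word changes from run to run.
print(add_random_term('zeolites "environmental remediation"', "Colors"))
```

The extended query is then passed to the search engine as usual, so the extra word pushes a different set of pages to the top.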

At first glance this approach to search may seem appropriate for frivolous, fun stuff only but I find that it works really well with serious research topics. Running one of my test searches (zeolites "environmental remediation") through the categories pulled up information that could have taken me hours or even days to find otherwise. Bear in mind that BananaSlug uses Google, so synonyms and variations of the random word will be included in the search. When I selected Colors as my category, red was added to my search and Google included reddish and reds.

BananaSlug Search Results

Most of the categories came up with something useful although Random Number, inevitably for this type of search, came up with page numbers of journal articles. I didn’t think Themes from Shakespeare would work but the random word it suggested was storm and there were several interesting papers on storm water management and treatment.

Banana Slug Shakespeare Storm

This may seem a bizarre way to explore search alternatives but if you are stuck for ideas give it a go.

Note: for more information on the banana slug Ariolimax see http://en.wikipedia.org/wiki/Banana_slug. The Pacific banana slug is the second-largest species of terrestrial slug in the world, growing up to 25 centimetres (9.8 in) long.

Million Short: unearthing stuff hidden in the dungeons of Google’s results

Fed up with seeing the same results from Google again and again? Wondering if that elusive document is buried somewhere at the bottom of Google’s 2,000,000 hits? Then get thee hence to Million Short (http://millionshort.com/). Million Short runs your search and then removes the most popular web sites from the results. Originally it removed the top 1 million, as its name suggests, but the default has changed to the top 10,000. The principle remains the same, though: exclude the more popular sites and you could uncover a real gem. The page that best answers your question might not be well optimised for search engines or might cover a topic that is so “niche” that it never makes it into the top results. Million Short itself does not say what it uses for search results or how it determines which web sites are the most popular. According to Webmonkey, “Sanjay Arora, founder of Exponential Labs, tells Webmonkey that Million Short is using ‘the Bing API… augmented with some of our own data’ for search results. What constitutes a ‘top site’ in Million Short is determined by Alexa and Million Short’s own crawl data.” (http://www.webmonkey.com/2012/05/million-short-a-search-engine-for-the-very-long-tail/).

Using Million Short is straightforward. Type in your search and select how many sites you want to exclude (top 10K, top million, top 100). The results page includes a list of the sites that have been removed and you can opt to add one or more back in. You can also block a site using a link next to it in the results or click on “Boost!” so that pages from the site go to the top.
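The principle of removing popular sites, with the option to add some back in, can be sketched as below. This is a minimal illustration and not Million Short's code: the function name `remove_popular`, the "popular" list and the example URLs are all placeholders, and real popularity data would have to come from a ranking source.

```python
from urllib.parse import urlparse


def remove_popular(results, popular_sites, restored=frozenset()):
    """Split result URLs into (kept_urls, removed_sites).

    popular_sites: hostnames treated as the "top N" (placeholder data here).
    restored: hostnames the user has chosen to add back in.
    """
    blocked = set(popular_sites) - set(restored)
    kept, removed = [], []
    for url in results:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example and example as the same site
        if host in blocked:
            if host not in removed:
                removed.append(host)  # list each removed site once
        else:
            kept.append(url)
    return kept, removed


results = [
    "https://www.popular.example/a",
    "https://niche.example/paper",
    "https://popular.example/b",
]
kept, removed = remove_popular(results, {"popular.example"})
print(kept)
print(removed)
```

Passing a site in `restored` mimics adding it back in from the list of removed sites.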

Million Short results

Million Short automatically tries to detect which country you are in but you can change it under “Manage Settings and Country”. I didn’t notice much difference when I changed countries but then most of the queries I pass through Million Short tend to be scientific or technical. On the same page you can manage sites that you have blocked, added or boosted.

Does it work? I would not use it instead of the existing major search engines such as Google, Bing or DuckDuckGo but as an additional tool to surface material that is not easily found in the likes of Google. As well as web search there are image and news searches, but I’m not convinced that I’d find those all that useful.

If you are interested in comparing Million Short with Google try Million Short It On at http://www.millionshortiton.com/index.html. I had several goes at this and most of the results were a draw. That is no surprise as the searches I ran were very specific and I wanted to see if Million Short would pull up additional information, which it did. Million Short won outright on a couple and Google on one. The Google win was by default because Million Short did not come up with anything for comparison (the search in question was biofuels public transport carbon emissions).

There are a number of techniques that you can use to improve Google results, for example changing the order of the words in your search, Verbatim, filetype or Reading Level, but I would also recommend trying Million Short. The results should at least be different and may reveal vital information for your research.

Company information: Luxembourg and Belgium

I am updating the official registries section of my business sources listings (http://www.rba.co.uk/sources/registers.htm) and there are changes to the entries for Luxembourg and Belgium.

The Registre de Commerce et des Sociétés – Accueil (http://www.rcsl.lu/) is the official register of companies and associations in Luxembourg. The search options are limited to company name or number and the interface is in German and French. Searching is free of charge, as are the company name, address and contact details. Documents are priced.

Legilux Sociétés et Associations has more search options at http://www.legilux.public.lu/entr/search/index.php and it provides a history of the documents filed by a company. This is a free service but for the documents themselves you have to go back to the Registre de Commerce et des Sociétés where there is a charge per document.

In Belgium the KBO Public Search (http://economie.fgov.be/nl/ondernemingen/KBO/Pubd/PuS/) enables you to search for public information on every registered active enterprise and establishment. Search by company number, name, branch number or name, address and municipality. Each record provides name, company number, activities, address, contact details and links to other sites for official documents and annual reports. The search interface is available in Dutch and French. The information is in Dutch or French and is free.

The CBSO (Central Balance Sheet Office) section of the National Bank of Belgium (http://www.bnb.be/) has the accounts of companies, associations and foundations active in Belgium. The search interface is available in Dutch, French, German and English and the service is free.

Many thanks to Inez de Bois for the information and updates.

But it’s all free on Google

This is the presentation I am giving tomorrow (27th September 2012) at the East of England Information Services Group of CILIP at the Bury St Edmunds Library. The event is “Information integrity on a decreasing budget”. Many of the slides are just images so I am not really giving that much away by releasing the slides early. And before anyone asks, no, I will not be providing notes and I will not be providing a version with a voice over – at least for free. Some will soon be made available in the resurrected Search Strategies section of my web site at http://www.rba.co.uk/search/index.shtml and in the subscription area only.

The straightforward, no notes, no voice over slides are available at

http://www.rba.co.uk/as/ – available for a few weeks only

http://www.authorstream.com/Presentation/karenblakeman-1549834-39-free-google/

http://www.slideshare.net/KarenBlakeman/but-its-all-free-on-google

Top search tips from North Wales

August is usually a quiet month for me with respect to work. Time for a holiday away and then a couple of weeks ambling along the Thames Path or pottering around the garden. This year, though, as soon as I was back from my travels I was knuckling down and updating my notes for two search workshops in North Wales. Both were for the North Wales Library Partnership (NWLP), the first taking place at Coleg Menai in Bangor and the second at Deeside College. Both venues had excellent training facilities and IT, which meant we could concentrate on getting to grips with what Google is doing with search and experiment with different approaches to making Google do what we want it to do.

At the end of the workshops both groups were asked to come up with a list of Top 10 Tips. I’ve combined the two lists and removed the duplicates to generate the list of 16 tips below.

  1. Repeat one or more of your search terms one or more times
    Fed up with seeing the same results for your search?  Repeat your main search term or terms to change the order of your results.
  2. Menus on left hand side of Google results pages
    Use the menus on the left hand side of the results page to focus your search and see extra search features. To see all of the options click on the ‘More’ and ‘More search tools’ links. The content of the menus changes with the type of search you are running, for example Image search has a colour option.
  3. Verbatim
    Google automatically looks for variations of your terms and no longer looks for all of your terms in a document. If you want Google to run your search exactly as you have typed it in, click on the ‘More search tools’ options at the bottom of the left hand menu on your results page and then on Verbatim at the bottom of the extended menu that appears.
  4. intext:
    Google’s automatic synonym search can be helpful in looking for alternative terms but if you want just one term to be included in your search exactly as you typed it in then prefix the word with intext:. For example carbon emissions buses intext:biofuels flintshire. The command sometimes has the effect of prioritizing pages where your term is the main focus of the article.
  5. Advanced search screen and search commands
    Use the options on the advanced search screen or the search commands (for example filetype: and site:) in the standard search box to narrow down your search. A link to the advanced search screen can usually be found under the cog wheel in the upper right hand area of the screen. If you can’t see a cog wheel or the link has disappeared from the menu go to http://www.google.co.uk/advanced_search. A list of the more useful Google commands is at http://www.rba.co.uk/search/SelectedGoogleCommands.shtml
  6. Try something different
    Get a fresh perspective by trying something different. The two most popular during these two workshops seemed to be DuckDuckGo (http://duckduckgo.com/) and Million Short (http://millionshort.com). Other search engines to try include Bing (http://www.bing.com/) and Blekko (http://blekko.com/).
  7. Use the country versions of Google for information that is country specific
    This will ensure that the country’s local content will be given priority, although it might be in the local language. Useful for companies and people who are based in or especially active in a particular country, or to research holiday destinations. Use Google followed by the standard ISO two letter country code, for example http://www.google.de/ for Google Germany or http://www.google.no/ for Google Norway.
  8. Filetype to search for document formats or types of information
    For example PowerPoint for experts or presentations, spreadsheets for data and statistics, or PDF for research papers and industry/government reports. Note that filetype:ppt will not pick up the newer .pptx so you will need to include both in your search, for example filetype:ppt OR filetype:pptx. You will also need to look for .xlsx if you are searching for Excel spreadsheets and .docx for Word documents. The Advanced Search screen file type box does not search for the newer Microsoft Office extensions.
  9. Clear cookies
    Even if you are logged out of your Google account when you search, information on your activity is stored in cookies on your computer. These can personalise your results according to your past search and browsing history. Many organisations have set up their IT systems so that these tracking cookies are automatically deleted at least once a day or whenever a person logs in or out of their computer account. At home, your anti-virus/firewall software may perform the same function. If you want to make sure that cookies are deleted or want to control them manually, “How to delete cookies” at http://aboutcookies.org/Default.aspx?page=2 has instructions on how to do this for most browsers.
  10. Looking for research papers? Google Scholar (http://scholar.google.com/) is one place to look but there may be additional material hidden somewhere on an academic institution’s web site. Include advanced search commands, for example filetype:pdf site:ac.uk, in your search.
  11. For the latest news, comments and analysis on what is happening in an industry or research area carry out a Google blog search and limit your search by date. Simply run your search as usual in the standard Google search box. On the results page click on Blogs in the menu on the left hand side of the screen and then select the appropriate time option.
  12. site: and -site:
    Use the site: command to search within a single site or type of site.

    For example:

    2011 carbon emissions public transport site:statistics.gov.uk

    to search just the UK official statistics web site

    asthma prevalence wales site:gov.uk OR site:nhs.uk

    to search all UK government and NHS web sites

    If you are fed up with a site dominating your results use -site: to exclude it from your search.

    For example:

    Dylan Thomas -site:bbc.co.uk

  13. Reading level – from tourism to research
    Use this option in the menus on the left hand side of your results page to change the type of information. For example run a search on copper mines north wales. Then click on Reading Level in the left hand menus. Selecting “Basic” from the options that appear at the top of the results gives you pages on tourism and holiday attractions. “Advanced” gives you research papers, journal articles and mineral databases. Google does not give much away as to how it calculates the reading level and it has nothing to do with the reading age that publishers assign to books. It could involve sentence structure, grammar, the length of sentences on a web page, the length of the document, the terminology used and doubtless many other criteria.
  14. Google.com
    Apart from presenting your search results in a different order Google.com is where Google tries out new features. As well as seeing pages that may not be highly ranked in Google.co.uk you will get an idea of how Google search may look in the UK version in the future.
  15. Numeric range search
    Use this for anything to do with numbers – years, temperatures, weights, distances, prices etc. Use the boxes on the Advanced Search screen or just type in your two numbers separated by two full stops as part of your search.

    For example:

    world oil demand forecasts 2015..2030
  16. An understanding of copyright is important if you intend to re-use information found on the web and absolutely essential if you are going to use images. Creative Commons licences clearly state what you can and can’t do with an image but they are not all the same. The list at Creative Commons http://creativecommons.org/licenses/ outlines the terms and conditions. “FAQs – Copyright – University of Reading” at http://www.reading.ac.uk/internal/imps/Copyright/imps_copyrightfaqs.aspx gives some guidance on copyright but if in doubt always ask! An example of what can happen if you get it wrong is demonstrated by “Bloggers Beware: You CAN Get Sued For Using Pics on Your Blog” http://www.roniloren.com/blog/2012/7/20/bloggers-beware-you-can-get-sued-for-using-pics-on-your-blog.html.
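Several of the tips above (filetype:, site: and numeric range) come down to typing the same command patterns again and again. If you build search strings in a script, the patterns could be put together along these lines. This is only a sketch: the helper names `any_of` and `numeric_range` are invented for the example, and the output is plain text to paste into the search box, not a call to any Google service.

```python
def any_of(operator, values):
    """Join several values for one command with OR, e.g. filetype:ppt OR filetype:pptx."""
    return " OR ".join(f"{operator}:{v}" for v in values)


def numeric_range(low, high):
    """Two numbers separated by two full stops, e.g. 2015..2030."""
    return f"{low}..{high}"


# Older and newer PowerPoint extensions have to be listed separately.
print("biofuels", any_of("filetype", ["ppt", "pptx"]))

# Search all UK government and NHS web sites.
print("asthma prevalence wales", any_of("site", ["gov.uk", "nhs.uk"]))

# Numeric range search.
print("world oil demand forecasts", numeric_range(2015, 2030))
```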

 

Doing Business in the United Kingdom and France

Compiled and published by Bryan Cave LLP, Doing Business in the UK is an excellent summary of what is involved in setting up a business in the UK and the associated legislation. As well as describing the various types of company it also covers directors’ duties, UK taxation, employment law, business immigration, intellectual property, data protection and competition law. There is a similar publication on Doing Business in France. Both are free of charge.

Tales from the Terminal Room issue 102 available

Tales from the Terminal Room, issue number 102, is now available at http://www.rba.co.uk/tfttr/archives/2012/july2012.shtml

Most of the information in this newsletter will already have appeared in this blog, the exception being the Twitter Notes.

In this month’s issue:

  • Search tools
    • Yet another irritating Google feature
    • Google maps UK canals
    • Google adds cycling routes to UK maps
  • Beating Google into submission – top tips
  • Twitter Notes

Yet another irritating Google feature

There was a time when Google would aggregate pages from the same website in your search results. There might be just a couple of entries for the site with a “More from….” link next to the result.

Google More From

Alternatively you might see a mini sitemap:

Google result for UKeiG

This has the advantage that you are not swamped with results from a single website but are given instead a variety of options that might provide you with a better answer to your question.

Not any more.

You may have noticed that multiple entries from single websites have started appearing in your results. For example, rather than just one Wikipedia entry you see 4, 5, 6 or even more. On the other hand, you might not have noticed anything at all. Some of my colleagues are seeing this and some are not. Google tests new features and algorithms on a small percentage of its users to see how they react so new or test features are not seen by everyone (see How Google makes improvements to its search algorithm – YouTube http://www.youtube.com/watch?v=J5RZOU6vK4Q). As far as I’m concerned this particular “improvement” is a disaster.

I was running a very general search on the use of biofuels by public transport in the UK. I just wanted to get an idea of some of the issues that were being discussed before refining my search and went, by default, to Google. My first screen had nothing but results from the UK government Department for Transport (DfT).

Google search DFT

I scrolled down and saw more DfT pages. I scrolled down further and yet MORE DfT pages. OK, Google, so dft.gov.uk is a good place for me to look at biofuels in public transport. I get the message. STOP! There were 27 DfT pages in total flooding the top of my results page, which I have set to display 100 entries at a time. Creeping in at number 28 came the Guardian with 5 results.

Google biofuels search Guardian

The Friends of the Earth website had 7 results, and then at last I started to see more variety in my results at around number 40, but still with a lot of repetition.

 

Google Biofuels

Google may think that the DfT is a very important source of information on the topic but I want to decide whether or not to explore more of a particular site. Spamming my results list annoys me and makes me want to go elsewhere. So I did.

DuckDuckGo (http://www.duckduckgo.com/) is my main Google alternative and it came up with a decent and varied set of results without repetition, hesitation or deviation.

Duck Duck Go biofuels search

Bing (http://www.bing.com/) and Yandex (http://www.yandex.com/) came up with similar, non-repetitive results.

Blekko (http://www.blekko.com/) came up with some interesting alternative pages for me to consider. These would not have been that useful to me in the earlier stages of my research but this test confirmed my feeling that Blekko is good at pulling up information that explores more than the mainstream issues.

Blekko results for test search

 

If you want to stay with Google how do you deal with multiple listings of sites? The most obvious approach would be to incorporate a ‘-site:’ command in your search, for example:

biofuels public transport -site:dft.gov.uk
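When the same sites need excluding from one search after another, the command can be added by a small helper rather than typed by hand. A minimal sketch, assuming you are assembling search strings in a script; the function name `exclude_sites` is made up for the example.

```python
def exclude_sites(query, sites):
    """Append a -site: term for each site that should be left out of the results."""
    exclusions = " ".join(f"-site:{s}" for s in sites)
    return f"{query} {exclusions}".strip()


print(exclude_sites("biofuels public transport", ["dft.gov.uk"]))
```

Keeping the list of sites in one place means every variation of the search gets the same exclusions.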

If you are conducting in depth research and are likely to be running many variations on a search, incorporating ‘-site:’ each time can become a chore. Google’s own browser Chrome has a Personal Blocklist extension that enables you to block selected sites from results (https://chrome.google.com/webstore/detail/nolijncfnkgaikbjbdaogikpmpbdcdef). Once installed a block link appears next to each entry in your results. Click on the link to block the site from all future results. A message appears at the bottom of searches that would normally contain pages from the blocked site warning you about exclusions.

Blocklist message

The ‘show’ link displays and highlights the previously blocked pages and offers an option to unblock them.

Unblock option

Neither the -site: option nor the Blocklist approach should be necessary. There was nothing wrong with the previous ways of offering additional pages from a site in search results. It wasn’t broke but Google did break it by trying to fix it. For me, there are now several Google alternatives that produce quality results and with less irritation. I shall be using them more in future.

Google maps UK canals

First cycle routes and now canals. Google is collaborating with the Canal and River Trust to provide a Google Map guide to the UK’s canal network called In Your Area (http://canalrivertrust.org.uk/in-your-area). It is not available as part of the standard Google Maps. The map allows you to enter your address or postcode to find the nearest canal. The map shows the locations of canals, canal locks and bridges and also volunteering opportunities, places to eat and drink and boating services and moorings.

Canal and River Trust Google Map of waterways

It is early days and not everything is marked up on the map, or at least it isn’t for the Kennet and Avon Canal in Reading. Also planned for later this year is the addition of ‘Street View’ images of the canal and river network. (Please, no lurking in the bushes by the side of the tow paths and pushing the Google cycles into the canal!)
