Update on coots vs. lions

If you have landed on this page thinking that this is a post about your favourite football or rugby team, please note that this is an update on my earlier article ‘Google decides that coots are really lions’ (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/). It has nothing to do with sporting activities unless you count trying to work out what Google is doing with your search! The original post was about how and why Google decided that a search on coots mating behaviour should really have been lions mating behaviour.

The first response to my posting was a comment from Arthur Weiss (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/comment-page-1/#comment-14207).
He suggested that Google was treating coots and lions as synonyms (both are living creatures). I thought that was pushing synonyms too far even for Google. (Sorry, Arthur).

I then had two comments in quick succession from Susanna Winter via Twitter (@Mrs_Figaro). The first is at (http://twitter.com/Mrs_Figaro/statuses/36714410223341568):

Twitter comment on lions vs coots

Moving coots from the beginning to the end of the strategy resulted in an exact match and not a single lion in sight:

Mating behaviour coots

Changing the order of the search terms is a trick I often use to change the order of my results or bring up pages that might be buried in the hundreds or thousands, but I have never seen such a dramatic change as this.
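For anyone who wants to try the reordering trick systematically, here is a minimal Python sketch that lists every ordering of a set of search terms and builds a Google.com search URL for each. The function names are my own invention for illustration, and the sketch only builds the URLs; it does not run the searches or predict what Google will do with them.

```python
from itertools import permutations
from urllib.parse import urlencode


def reordered_queries(terms):
    """Return every ordering of the search terms as a query string."""
    return [" ".join(order) for order in permutations(terms)]


def google_search_url(query):
    """Build a Google.com search URL for the given query string."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Three terms give six orderings to compare by hand in the browser.
    for query in reordered_queries(["coots", "mating", "behaviour"]):
        print(query, "->", google_search_url(query))
```

Three terms produce six orderings, including the ‘mating behaviour coots’ variation discussed above, so it is still practical to open each one and compare the results by eye.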

Susanna’s search strategy ‘coots feeding behaviour’, which came up with an exact match, muddied the waters even more. Perhaps there is a search frequency algorithm coming into play? Are there more searches for lions mating behaviour than for coots, but not lions feeding behaviour? I am not convinced that this explains Google’s insistence on looking for lions rather than our animal of choice. Susanna’s next tweet suggests what is going on (http://twitter.com/Mrs_Figaro/statuses/36715389190676480):

Google spelling correction

What you see is:

Google coots search minus lions

So Arthur was on the right track. (My apologies, Arthur.) What probably happened with our search is, as Susanna said, that Google first assumed a typo and then did a synonym search on cats. What puzzles me, though, is how Google arrived at cats from coots. Surely coyotes or goats would be nearer when it comes to typographical errors?

I have two final variations on our search to confuse you even further.

The first is repeating coots at the start of the strategy. An exact match:

Repeating coots in the search

Now move one of the ‘coots’ to the end of the strategy and Google asks “Did you mean lions mating behaviour coots”:

Repeating coots in the search

I give up!

Google includes your social circle in search

Google has been including search results from your social circle for quite a while. If you are logged in to a Google account it sometimes includes a box either at the bottom or in the middle of your search results page with a couple of “Results from people in your social circle”:

Google Social Circle

Click on the link and it will show you more from your circle.

If you are using Google.com and open up the search options in the side bar to the left of your results there is a “Social” option that will do exactly the same thing.

Google Social search option

Want to know who is in your social circle? Head straight to your Google dashboard at http://www.google.com/dashboard and scroll down to Social Circle. This not only tells you which of your social networks Google is using but also lists who.

Google lists your social circle

Any social networks that you have mentioned in, for example, your Blogger profile or your general Google profile such as Twitter will be included as will contacts in Google Reader, Google Buzz, Google Contacts and Picasa. In addition to your direct connections Google also searches the content of secondary connections that are publicly associated with your direct connections. The A-Z reveals all!

Until now the Social option has been kept separate but Google has started integrating all of these results with the rest of your search starting with Google.com (see Official Google Blog: An update to Google Social Search http://googleblog.blogspot.com/2011/02/update-to-google-social-search.html). So if you are using Google.com and are logged into your Google account you will now start seeing results automatically from your social circle.

Google's new social search

Underneath the social search entry it will say something like “Joe Bloggs shared this on Twitter”. Hover over this and it tells you how you are connected.

I frequently use social media to search for information and advice but I much prefer to choose when and how to do it. I do not always want Google unilaterally deciding to add opinions from all and sundry in my network. Remember, this is not just the direct connections that you have chosen to make but others who are connected to them and over which you have no control. The only way you can shut this off completely is to log out of your Google account. I will reserve final judgement until I have used it more but after only 2 hours of experimentation I am already finding that some searches are dominated by results from Twitter, Google Reader and Flickr. The results look even more cluttered than before and I fear it gives Google even more opportunity to completely mess up the search.

Pancakes and Mash: mashed library unconference – places still available

Pancakes and Mash http://www.flickr.com/photos/pstainthorp/5388040023/

There are 14 places remaining on Mashed Library “Pancakes and Mash”, a library technology un-conference taking place at the University of Lincoln on 8 March 2011 (Shrove Tuesday).

You can claim one of the last 14 places at: http://mashlib.blogs.lincoln.ac.uk/booking/

Booking costs only £12 and will remain open until Monday, 28 February.

There’s a growing list of delegates, drawn from a range of academic, public and commercial libraries at http://mashlib.blogs.lincoln.ac.uk/who/

The programme is taking shape and the organisers are confirming speakers/mashup ‘experts’ for the various workshop sessions. The keynote presentation will be from Gary Green (Technical Librarian, Surrey County Council Library Service) on: “Loud Library Voices: Campaigning, The Web, Journalists & The Offline World”.

Further details and programme (in perpetual beta) at http://mashlib.blogs.lincoln.ac.uk/

For photos of previous events search Flickr.com for mashlib, haggisandmash, mashliv, mashspa, chipsandmash or middlemash just for starters!

UK crime data as clear as mud

I’m a nosy neighbour. I like to know what’s going on in my area: who’s bought the house next door, local planning applications, any dodgy activity going on? My husband and I are both self-employed so there is usually at least one of us out and about in Caversham during the day. That means we have the chance to chat with our local postman, workmen digging up the road, Police Community Support Officers doing their rounds and with people in the local shops, bank and post office. Crime, not surprisingly, is a major topic on our “watch list” and just over two years ago police forces in England and Wales started to provide access to local crime statistics via online maps. The new service allowed you to drill down to ward level and view trends in burglary, robbery, theft, vehicle crime, violent crime and anti-social behaviour.

The format varied from one police force to another. For example Thames Valley Police provided a basic map and tables of data:

Thames Valley Police 2008 crime rates

Others such as the Metropolitan Police included additional graphical representation of the statistics such as bar charts:

Metropolitan Police 2008 Crime Rates

None of them pinned down incidents to individual streets or addresses but they did give you an idea of the level of crime in a particular neighbourhood, how it compared with the same period the previous year and whether it was high, above average, average, below average, low or no crime. They were short, though, on detailed definitions of what each category of crime included. I looked at these maps out of personal curiosity rather than using them for any serious business application, and I made certain assumptions such as murder being included under ‘Violence against the person’. That may not have been the case.

Some police forces placed obvious links to the information on their home pages whilst others buried the data in obscure corners of their web sites. The crime maps were then all moved to the CrimeMapper web site – the Thames Valley Police map can still be seen at http://maps.police.uk/view/thames-valley – but that has now been integrated into the Police.uk website, which “includes street-level crime data and many other enhancements”.

All you have to do is go to http://www.police.uk/, type your postcode, town, village or street into the search box and “get instant access to street-level crime maps and data, as well as details of your local policing team and beat meetings”. The first screen looks good with news of local meetings, events, recent tweets, YouTube videos and – as the home page promised – information on my local policing team.

Police UK page for RG4 5BE

When I focus on the map to look at the detail there are markers for the location of the crimes and clicking on them gives you a brief description of the crime:

Detail on Police UK crime rates for Caversham

In this example, the detail box had details of two crimes “on or near Anglefield Road” and this is where I started to become confused. Were the burglary and the violent crime part of the same incident or totally separate? Furthermore, if you look in the left-hand panel of the screen you will see “To protect privacy, individual addresses are not pinpointed on the map. Crimes are mapped to an anonymous point on or near the road where they occurred.” Fair enough, but I would like to know how near ‘near’ is. 100, 200, 400 yards? Half a mile, a mile? And does the focus shift from one street to another from one month to the next? If it stays put then a street could gain a crime rate reputation that it does not deserve but if it shifts there is no way one can compare data from one month or year to another, which brings me to my next question.

Why is there only one month’s data? Previous versions of the crime maps gave you three months’ data for the current and the previous year for comparison. There is nothing about this in the Help section of the Police UK site but the Guardian reports:

“police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.” (Crime maps are ‘worse than useless’, claim developers http://www.guardian.co.uk/technology/2011/feb/02/uk-crime-maps-developers-unhappy?CMP=twt_iph).

This means that if you want to run comparisons over time you will have to download the files and store them on your own system each month, or find someone else who is already doing it.
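As a rough illustration of what “store them on your own system each month” might involve, here is a minimal Python sketch that files each month’s download under a dated name and refuses to overwrite an earlier copy. Everything specific in it is an assumption of mine: the function names, the folder layout, the CSV extension and the YYYY-MM month label. It deliberately leaves out the download step, because I do not know how or where the files will be published.

```python
from pathlib import Path


def archive_month(data, area, month, root):
    """Store one month's download as root/area/month.csv, never overwriting.

    `data` is the raw bytes of the file you downloaded by hand or by script;
    `month` is a label such as "2011-01" (an assumed naming convention).
    """
    target = Path(root) / area / f"{month}.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # An earlier copy may be the only surviving record, so keep it.
        raise FileExistsError(f"{target} is already archived")
    target.write_bytes(data)
    return target


def archived_months(area, root):
    """List the months already stored for an area, oldest first."""
    return sorted(p.stem for p in (Path(root) / area).glob("*.csv"))
```

With YYYY-MM labels the file names sort in date order, so a later comparison only needs to walk the list that archived_months returns.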

The Guardian article also says:

the Information Commissioner’s Office (ICO) advised that tying crime reports to postcodes or streets with fewer than 12 addresses would render the individuals involved too identifiable. The police have also decided to remove data about murders or sexual assaults.

With respect to the latter the help file on the Police UK site suggests otherwise:

Crimes have been grouped into six categories following advice from the Information Commissioner’s Office. This doesn’t mean that the crimes listed under ‘other’ are not seen as important. Rather it ensures that for some of the more sensitive crimes there is even greater privacy for the victims.

So which is it: murders and sexual assaults are not included at all or aggregated under “other”? Jonathan Raper says on his blog Placr News (“Five reasons to be cautious about street level crime data” http://placr.co.uk/blog/2011/02/five-reasons-to-be-cautious-about-street-level-crime-data/):

Some data is redacted eg sexual offences, murder. The Metropolitan Police has already released this data to ward level though… and it is easy to cross-reference one murder in one ward to reports in the local press at the same time

Data visualisations and mashups are becoming increasingly popular and make it considerably easier to assess a situation and view trends. The Guardian Datablog (http://www.guardian.co.uk/news/datablog), for example, encourages people to take data sets, mash them up and create their own visualisations, and upload a screen shot to the Guardian Datastore on Flickr (http://www.flickr.com/groups/1115946@N24/). It is vital, though, that three things are clearly spelt out: the source of the data, whether the full data set or just a selection has been used, and whether or not it is going to be updated. All too often one or even all of these are missing from the accompanying notes, and in some cases there are no notes at all!

An example of good practice is “UK transport mapped: Every bus stop, train station, ferry port and taxi rank in Britain” (http://www.guardian.co.uk/news/datablog/2010/sep/27/uk-transport-national-public-data-repository). The posting clearly states the source (http://data.gov.uk/dataset/nptdr) and its coverage:

“A snapshot of every public transport journey in Great Britain for a selected week in October each year. The dataset is compiled with information from many sources, including local public transport information from each of the traveline regions, also coach services from the national coach services database and rail information from the Association of Train Operating Companies”

It then goes on to specify the time period (5-11 October, 2009) and the tools that were used to create the visualisation.

Another is the “Live map of London Underground trains” (http://traintimes.org.uk/map/tube/). This shows “all trains on the London Underground network in approximately real time”. The source is a live data feed from Transport for London (TfL) and the notes state that a “small number of stations are misplaced or missing; occasional trains behave oddly; some H&C and Circle stations are missing in the TfL feed.” It would be helpful to have a list of those missing stations, but the site has at least brought the issue of potential missing data to the users’ attention.

Returning to the Police.uk crime data, there are three major problems with the site for me as a researcher:

1. Are all crimes included in the database, or are some such as murders and sexual assaults excluded altogether or aggregated under “other”? More detailed and unambiguous scope notes please.

2. The street-level data is useless. The markers are not exact locations but only “near” to them, and there is no definition of “near”, no information on how the position of the marker is calculated and no indication of the geographic radius that it covers. It would be better to return to aggregated data at the ward level.

3. There are no options for comparing time periods and it seems that historical data will not be available on the web site. An ad hoc researcher will have to spend time and effort tracking down a developer or a web site that is downloading and keeping copies of all of the datasets as they are published.

The new crime data web site is a retrograde step. We need transparency and clarity rather than the muddle and confusion that has been generated by the lack of information on what is being provided.

Google decides that coots are really lions

First of all let us make sure we all know the difference between lions and coots. As far as I can recall, lions are huge, snarly, growly, land animals that are liable to eat you if you cross their path. This appears to be confirmed by Wikipedia (http://en.wikipedia.org/wiki/Lions) but of course Wikipedia could be wrong. Coots are medium-sized water birds (http://en.wikipedia.org/wiki/Coots) and the worst that could befall you should you antagonise one is a severe pecking.

I was walking by the Thames in Caversham today and took several photos of the birds on the river. One was of two coots who were having what appeared to be a minor domestic or an argument over territory, but a friend suggested to me that what I saw was coot mating behaviour. What do you do in a situation such as this? You Google.

My search on coots mating behaviour came up with:

Google's interpretation of search on 'coots mating behaviour'

Where the [expletive deleted] did the lions come from?? I just do not understand how Google managed to replace coots with lions. One is a water bird with wings, feathers, and a beak and the other a large, aggressive land mammal with fur, claws and big teeth. But Google, yet again, has decided to go off and run its own search. (See my posting Oi! Google – you have seriously overstepped the mark http://www.rba.co.uk/wordpress/2011/01/03/oi-google-you-have-seriously-overstepped-the-mark/).

So did I get what I wanted by clicking on “Search instead for coots mating behaviour”? Yes I did, but Google still thinks I really want to search for lions and asks “Did you mean: lions mating behaviour”. Google has totally lost the plot.

What Google should have given me in the first place

And the photo that started it all? That can be found on my Flickr account at http://www.flickr.com/photos/rbainfo/5438769506/. I think you will agree that coots are very different from lions (http://commons.wikimedia.org/wiki/File:P_l_Bleyenberghi.jpg)

Tweetdeck sneakily forcing users to use deck.ly?

This morning I updated Tweetdeck and one of the new features is the incorporation of a service called deck.ly. This allows you to tweet more than 140 characters. Type in your tweet in Tweetdeck and as soon as you go over the limit of 140 the background in your tweet box turns yellow (this may vary depending on whether or not you have customised your colours). Next to the Send button you’ll also see a new button “Long update using deck.ly”. Tweetdeck users will see the full text of the extended tweet but others will have to click on a link. A debate has already started around this new option and I am not going to repeat the arguments here. I am more concerned about what has happened to auto URL shortening.

After the update I wrote a tweet and then added a URL that took my tweet over the 140 character limit. The background turned yellow and I waited for the URL to be shortened. It wasn’t. I had to click on the link to shorten it. I mentioned this on Twitter and, at the time of writing, no-one else seems to be having the same problem. So I decided to run some test tweets and this is what I am seeing.

1. Tweet including URL within 140 character limit

The URL is automatically shortened

Tweetdeck URL shortening

2. Tweet including URL over 140 characters

The URL is not shortened and there is the new option to send via Deck.ly

Tweetdeck URL not shortening

Auto URL Shortening is turned ON but nothing happens. I sit and wait and nothing happens. I turn URL shortening off and then back on again – still nothing happens. I have to click on the link to shorten it:

Tweetdeck URL in long tweet finally shortened
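To make the two test cases easier to reason about, here is a small Python sketch of the arithmetic involved. It is not Tweetdeck’s code and makes no claim about how Tweetdeck decides when to shorten a link. The 20-character length for a shortened URL is an assumption (real shortening services vary), and the function names are mine.

```python
import re

TWEET_LIMIT = 140        # the limit described in this post
SHORT_URL_LENGTH = 20    # assumed length of a shortened link; services vary

URL_PATTERN = re.compile(r"https?://\S+")


def _shortened(match):
    """Replace a URL with a placeholder only if shortening would save space."""
    url = match.group()
    return "x" * SHORT_URL_LENGTH if len(url) > SHORT_URL_LENGTH else url


def length_after_shortening(tweet):
    """Length of the tweet if every long URL were replaced by a short link."""
    return len(URL_PATTERN.sub(_shortened, tweet))


def describe(tweet):
    """Classify a tweet against the 140-character limit."""
    if len(tweet) <= TWEET_LIMIT:
        return "fits as typed"
    if length_after_shortening(tweet) <= TWEET_LIMIT:
        return "fits only after shortening"
    return "too long even after shortening"
```

My second test tweet falls into the “fits only after shortening” group, which is exactly the case where I expected the automatic shortening to kick in.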

I hope that this is a bug and not some attempt by Tweetdeck to force us to use Deck.ly. I have experimented several times over the last couple of hours with different tweets and see the same results every time. No-one else in my network seems to be having the same problem, though, which is making me feel a little paranoid. If you have updated to the latest version of Tweetdeck I’d be interested to hear whether or not your experiences are the same as mine.

Further background information: I ran these tests using Tweetdeck version 0.37.2 under Windows 7 Professional (updated with multiple updates earlier this morning) on an HP laptop.

Company registers: Slovenia and Croatia

I have added two more company registers – Slovenia and Croatia – to my Official Company Registers page at http://www.rba.co.uk/sources/registers.htm.

Slovenia

AJPES – Agency of the Republic of Slovenia for Public Legal Records and Related Services (http://www.ajpes.si/)
AJPES provides access to the Slovenian Business Register online (ePRS), and Annual Reports of companies, cooperatives and sole proprietors (JOLP) free of charge. ePRS currently has around 211,000 entries and includes companies (partnerships and corporations), sole proprietors, legal entities governed by private law, societies, natural persons performing registered or regulated activities, subsidiaries and other divisions of business entities and main offices of foreign business entities.

You can search on a range of criteria: name, identification number, tax number, address, activity, sector and legal form. The amount of information available on each business entity varies depending on its legal form but typically includes identification number, company name, tax number, details of representatives and founders, etc. Credit rating reports (eS.BON) are also available for a fee. The interface to the site is available in English, Slovene, German and Italian. The information is available in English and Slovene. Free registration is required to view the free documents.

Croatia

Biznet (http://www.biznet.hr/) is maintained by the Croatian Chamber of Commerce and provides information and statistics on the Croatian economy, and access to the Register of Business Entities. The Biznet home page is in Croatian and the link to the Register is under Registar poslovnih subjekata. You can view data for a single company or a selected set of companies grouped according to certain criteria, for example industry or company size. Available information includes identification number, short and long name, county, municipality, address, phone, fax, e-mail, company size, main activity, legal form, number of employees and membership of professional associations. The Register is available in both Croatian and English and is free of charge.

Many thanks to Tomaz Lajovic for the information on both services.

Although much of the data on both sites is available in English some of the detailed reports and guides are in the local languages. Google’s Chrome automatically offers to translate texts and the Google toolbar for both Internet Explorer and Firefox has a translate option. If you are not allowed to install the toolbar then you can copy and paste the text or the URL of the page you wish to translate into Google’s translation tools at http://www.google.co.uk/language_tools. The quality of Google’s translations has improved significantly but you should still treat the results with caution. If the information is mission critical then find someone who is fluent in the language concerned to double check Google’s effort.

M&A Portal – free information from BvD

Bureau van Dijk’s M&A portal (http://www.mandaportal.com/) is a gateway to news, events, research and analysis on mergers and acquisitions. The information on the portal home page is free of charge and there is a free search option for tracking down deals and rumours contained in BvD’s Zephyr database. The deals can be sorted by value, date or status.

M&A Portal search results

Basic information is free of charge but you can purchase the full details from the Zephyr database. The cost of the reports varies depending on the amount and type of information available.

The portal also has M&A news feeds for 36 industry sectors. Registered users – registration is free – can select the sectors that interest them most and create a tailored news tab.