Saturday, February 13, 2010

The language wars and the TIOBE index

Recently, I had a bit of fun by writing a modestly titled blog post arguing in favor of Groovy's adoption over Scala. Among other things, I pointed out that the Indeed.com job trends graphs show much stronger adoption for Groovy than for Scala. One comment there made the interesting observation that the TIOBE Programming Community Index said precisely the opposite. I did not exactly carry out meticulous research, so I confess I did not pay attention to this index when I wrote that post. I thought I'd remedy that oversight in this post.



First of all, what is this index? It rates the popularity of programming languages by counting search-engine hits for the phrase "<language> programming". So, for example, every web page that has the phrase "Scala programming" will count in favor of Scala, whereas the phrase "Groovy programming" will count for Groovy. There is also some post-processing of the numbers for weighting and confidence, but that is essentially it.

People have expressed some concerns about this methodology's validity. For example, this blog entry will count as a hit for Scala because it contains the magic phrase "Scala programming" even though it says nothing about the subject. Apart from this one, none of my other blog posts with the Groovy tag will count as a hit for Groovy because they do not contain the magic phrase "Groovy programming". Moreover, here the act of observation will modify the subject. People know that this index can be gamed, and can choose their words to maximize their favorite language's ranking.
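To make that objection concrete, here is a minimal Python sketch of exact-phrase matching. It is only an illustration of the idea: TIOBE queries search engines rather than scanning text itself, and the function name `counts_as_hit` and the sample sentences are my own inventions.

```python
import re


def counts_as_hit(text: str, language: str) -> bool:
    """Return True if text contains the phrase '<language> programming'.

    Hypothetical helper: a local stand-in for an exact-phrase search query.
    """
    pattern = r"\b" + re.escape(language) + r"\s+programming\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up sample sentences, not real web pages.
off_topic = "This post mentions Scala programming but teaches nothing about it."
on_topic = "Groovy closures make collection handling concise on the JVM."

print(counts_as_hit(off_topic, "Scala"))   # phrase present, subject absent
print(counts_as_hit(on_topic, "Groovy"))   # subject present, phrase absent
```

The off-topic sentence scores for Scala, while the sentence that is actually about Groovy scores nothing, which is exactly the weakness described above.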

So how reliable are TIOBE's rankings for Groovy and Scala? Let's use their search criteria on Google and count the hits:
  • +"scala programming": 130,000
  • +"groovy programming": 78,900
Sure enough, Scala is way ahead, which explains its superior TIOBE index ranking. We know those are the phrases that TIOBE uses. What if we change the wording a little?
  • +"programming with scala": 63,000
  • +"programming with groovy": 124,000
Or:
  • +"developing in scala": 2,910
  • +"developing in groovy": 16,600
Or:
  • +"scala development": 6,030
  • +"groovy development": 30,900
  • +"grails development": 25,700
Now Groovy is way ahead. Is this making any sense to you? If we can so dramatically change the apparent relative popularity by minor wording variations in comparable phrases, I suggest that the search engine method is not very reliable.
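For anyone who wants to see the swing as a single number per phrasing, here is a small Python sketch that computes the Groovy-to-Scala hit ratio from the counts quoted above. The counts are the ones I got from Google; the dictionary layout and the function name are just illustrative choices, and I left out the Grails line because it has no Scala counterpart.

```python
# Hit counts copied from the searches listed above ("X" stands for the language).
hits = {
    "X programming": {"scala": 130_000, "groovy": 78_900},
    "programming with X": {"scala": 63_000, "groovy": 124_000},
    "developing in X": {"scala": 2_910, "groovy": 16_600},
    "X development": {"scala": 6_030, "groovy": 30_900},
}


def groovy_to_scala_ratio(counts: dict) -> float:
    """Ratio above 1 means Groovy has more hits, below 1 means Scala does."""
    return counts["groovy"] / counts["scala"]


ratios = {phrase: groovy_to_scala_ratio(c) for phrase, c in hits.items()}

for phrase, ratio in ratios.items():
    leader = "Groovy" if ratio > 1 else "Scala"
    print(f"{phrase!r}: ratio {ratio:.2f}, {leader} ahead")

# How far apart the most and least Groovy-friendly phrasings are.
spread = max(ratios.values()) / min(ratios.values())
print(f"spread between phrasings: {spread:.1f}x")
```

Only the TIOBE phrasing puts Scala ahead. The other three favor Groovy, and the ratio moves by roughly a factor of nine depending on which phrase you pick.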

Let me offer a guess as to what's going on. I don't think the Scala community is being sneaky. Rather, it's a difference of idiom. When we talk about Scala, we are more likely to follow the Scala homepage's example and refer to it as the "Scala programming language" or use the word "programming". So TIOBE is more likely to find the magic phrase "scala programming" that it searches for. By contrast, notice that the Groovy homepage does not have the phrase "Groovy programming" anywhere on that page. We're more likely to follow that example and refer to the language simply as "Groovy". No TIOBE hits. Since TIOBE does not compensate for such distortions, the results get skewed in Scala's favor. At least, that's my guess.

I had previously used Indeed.com's job trends charts to judge adoption, and I still think it is the better measure. This measure counts keyword hits in job postings. Job postings contain specific keywords for job requirements, so you can search with more precision. If you search job postings for "Scala" or "Groovy", you are not likely to turn up a hit unless the language is a requirement. Moreover, it's a much more valid measure of adoption. By the time a language becomes a job requirement, it's either already in use or its use is imminent at that site.

2 comments:

  1. The interesting thing about this is that most are searching for java developers with groovy scripting knowledge. So an only groovy developer will not get the job compared to the scala developers.

  2. you missed my point entirely. this 'my bigger than yours' attitude always leads to selective thinking. you would have included TIOBE without a problem if it had shown groovy is ahead.

    use groovy if it fits your needs but arguing groovy/groovy++ is 'better' or 'more popular' than any other "secondary" jvm language is just silly.
