Read the fine print: IP Statistics

Lies, damn lies and statistics.  This Kat's first appearance on this blog was in 2007 when she queried the statement, "It has been said that 80% of the information found in patents cannot be found anywhere else."  It was an oft-used number with no citation.  We never really found one.

The basic problem hasn't gone away.  IP statistics are regularly thrown about without sufficient caveats and out of context and, in some cases, are so poorly calculated that they should never have left the back of the envelope. The counterfeiting and piracy statistics used early in copyright debates are an excellent example (more here).

Things are improving, but there is a long way to go. Even the Daily Mail finds misleading stats and graphs funny.

All is not lost.  The PatStat (IP Statistics for Decision Makers) annual conference discusses these challenges. The UK IPO has produced a handy guide on the use of patent data, which merits a post by itself. (IPKat discussions on patent stats here and here.)

The folks at CIGI Waterloo have just published a paper on cybercrime stats, "Global Cyberspace is Safer than You Think: Real Trends in Cybercrime" by Eric Jardine. Eric examines recent stats in cybercrime and seeks to normalise them. In this case, normalisation means adjusting absolute figures, whether totals (e.g. 1,000 attacks per year) or growth rates (e.g. 50% more attacks in 2014 than 2013), for the growth of the internet.

To illustrate normalisation, consider the following, totally made-up example: 30% more thefts of iPhone 6 in 2015!!!  Without context, rather shocking.  However, the iPhone 6 was only introduced in September 2014. There are vastly more iPhone 6s in circulation in 2015, so the risk of theft per phone may actually have decreased. Accounting for the growth in the number of iPhone 6s would normalise these figures.
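A minimal sketch of that normalisation in Python may help. Every number below is invented purely for illustration (the theft counts and the number of phones in circulation are assumptions, not real data), and the function names are hypothetical. It does not reproduce the method used in the CIGI paper; it only divides incidents by the size of the population at risk.

```python
def rate_per_thousand(incidents: int, population: int) -> float:
    """Incidents per 1,000 units in circulation."""
    return incidents * 1000 / population


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) * 100 / old


# Hypothetical figures: thefts reported and phones in circulation per year.
thefts = {2014: 1_000, 2015: 1_300}
phones_in_use = {2014: 10_000_000, 2015: 40_000_000}

# The headline figure: change in the raw count of thefts.
absolute_change = percent_change(thefts[2014], thefts[2015])

# The normalised figure: change in thefts per 1,000 phones.
rate_2014 = rate_per_thousand(thefts[2014], phones_in_use[2014])
rate_2015 = rate_per_thousand(thefts[2015], phones_in_use[2015])
normalised_change = percent_change(rate_2014, rate_2015)

print(f"Absolute change in thefts: {absolute_change:+.1f}%")
print(f"Change in thefts per 1,000 phones: {normalised_change:+.1f}%")
```

With these invented numbers, the raw count of thefts rises by 30% while the theft rate per 1,000 phones falls by 67.5%, because the number of phones quadrupled. The choice of denominator is itself a judgment call and should be stated alongside the result.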

Eric investigated 13 absolute figures on cybercrime and found his normalised stats show a much less scary situation:
  • in 6 cases, the absolute figures showed the situation getting worse whereas his normalised figures showed the situation actually improving
  • in 6 cases, both the absolute and normalised figures show the situation getting better, but the normalised figures show improvement happening sooner and faster
  • in 1 case, both the absolute and normalised figures show the situation getting worse, but the normalised figures show this deterioration happening more slowly
So, good news!  Cyberspace is safer than you think.

Tips for reading stats:
  1. Check citations. Citations are like a PDO (protected designation of origin).  Know where your calculations come from.
  2. Check context.  Read the fine print. Most stats come with caveats.
  3. Check methods. Read the ingredients. Your stats should have good data and processes.
  4. Check with an expert.  When in doubt, ask your friendly statistician or economist. 
Examples of egregious manipulation of data this Kat has seen in her career:
  • exploiting Excel's rounding to display 9.45% as 9.5% (0.05% can make a huge difference)
  • 'massaging' data so three firms had a market majority 
  • a protest described as '100 people' by protestors, 'approximately 50' by police, and '20' by the organisation being protested
And remember, 78% of stats are made up on the spot, the other 29% are drivel. Check out Full Fact and the BBC's More or Less for good explanations and debunking of stats.
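The rounding trick in the list above can be sketched in a few lines of Python. This assumes the spreadsheet displays values rounded half up to one decimal place, and the 9.5% threshold is a hypothetical cut-off invented for the example; the original figures' context is not given in the post.

```python
from decimal import Decimal, ROUND_HALF_UP


def display_rounded(value: str, places: str = "0.1") -> Decimal:
    """Round a decimal string half up, as a display format might (assumption)."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)


exact = Decimal("9.45")          # the underlying figure, in percent
shown = display_rounded("9.45")  # the figure as displayed
gap = shown - exact              # how much the display overstates it

# Hypothetical threshold: only the displayed figure reaches it.
threshold = Decimal("9.5")
print(f"exact={exact}%, shown={shown}%, gap={gap} percentage points")
print("exact meets threshold:", exact >= threshold)
print("shown meets threshold:", shown >= threshold)
```

Decimal arithmetic is used rather than floats so that 9.45 is represented exactly and the half-up rounding behaves as described.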
Reviewed by Nicola Searle on Tuesday, July 21, 2015


  1. An excellent post.

    Here in the U.S., even fallacious data that has been shown to be of no merit continues to be "refreshed" and used over and over again - most times by an incestuous academia with an agenda unto themselves.

    And it is not just the academics here. Even our executive branch is guilty of using bogus data to further its agenda. Ron Katznelson has officially lodged an extensively detailed request with the Executive Branch to hew to our laws and restate its "anti-Tr011" manifesto, which was riddled with spurious data.

    The "official" reply is now well past its original due date, and I remain eager to see just how our Executive Branch is going to spin its answer to the meticulously documented "made up facts" that riddle its Policy Paper.

  2. Does this post have anything to do with the OHIM press release stating that "The manufacture and distribution of fake clothes, shoes and accessories (like ties, scarves, belts and gloves) takes over €26 billion every year from legitimate EU businesses"?

    From a skim of the press-release the relevant study seems riddled with ill-supported statistics, suppositions, and assertions. My personal favourite being the assertion that "(all) producers and sellers of fakes do not pay tax, social contributions and VAT" and therefore we can assume that all of the tax they would have paid is a loss to the European economy of €8bn.

  3. If you read the actual report and not just the press release, you will get a more nuanced picture. The authors qualify their numbers. For example, on p. 10 the report states that, to the extent counterfeits penetrate the legitimate sales channels, the calculated tax losses overestimate the real impact. Press releases are by necessity simplified (sometimes over-simplified). Better to read the report.

  4. Reading the report is always sage advice.

    However, in this day and age of "soundbyte journalism," it is often the lack of reading - and the critical thinking that should follow - that "carries the day."

    One only has to see the (lack of) true dialogue even (especially) on leading patent blogs (certain other ones in mind) to see that a soundbyte repeated often enough is meant to gain traction where a full read (and understanding) would not stand for that view to be reasonably put forward.

  5. Aha! This is precisely the debate I had in mind. There is a big disconnect between 'reality' (whatever that is), the measurement of the reality, the description and analysis of the measurement, and finally the reporting of said measurements.

    The OHIM report, mentioned by Anon 16:43, has a lot of caveats which are not reflected in the press release. But how do you get a sufficiently caveated stat in a press release? There is no room for footnotes as anon 10:50 notes. The same is often the case with speeches and newsbytes.

    The challenge is when stats, good or bad, gain currency and are repeated without caveats and without the "critical thinking" US anon refers to. Surely we can do better than the 2007 situation of throwing around the 80% patent figure.

    And all of this is before we get to thinking about how things are measured and analysed...

  6. Nicola,

    Sadly, that big disconnect is often so on purpose. Especially on patent blogs (of a certain US variety).

    As I have often observed, propaganda exists because it does in fact work. Repeat an outright falsehood (or even more devious, a half-truth/full lie) often enough and some people will "think" it to be true. This type of forum is especially pliant because there is no way to force people NOT to engage in such petty dissembling, and the battle for the bully pulpit becomes one of NOT listening to any other point made in the online "discussions." Instead, it becomes a battle of flooding without regard to what others say - and as noted here, without regard to reading and critically thinking about the matter.

  7. I've read the OHIM report and I think the authors' qualifications are extremely poor. To me they appear to be nothing more than a justification for creating the highest possible estimate of the damage to the European economy resulting from counterfeiting. For example, the report states

    "...some amount of direct and indirect taxes is levied on these (counterfeit) products, and so the net reduction in government revenue may be smaller than the gross effect calculated here. Unfortunately, data currently available do not allow for calculation of these net effects with any degree of accuracy."

    So because the authors can't calculate an estimate of the tax that is paid on counterfeit goods with any accuracy they have assumed it is zero. Why not 50% or any other value?

    Similarly, they have assumed that precisely zero of the jobs lost in the sectors that produce authentic goods are replaced by jobs related to counterfeit goods that are sold in their place. No justification is provided for this assumption.

    This doesn't seem to be a case of simplified headlines twisting a more nuanced report. Rather it seems to be a case of a report being created with the intention of producing such headlines.

  8. US-anon

    You're absolutely right that propaganda works. But I don't think a single one of us is immune to confirmation bias. We inherently reject statistics and information that conflict with our existing beliefs and accept those that confirm them. Which is why I think it's the responsibility of the IP community to look beyond headlines and, as you say, think critically.

    Also, I haven't figured out which blogs you're referring to!

    Anon 12:51

    That is a tricky one. In making assumptions like this, you're damned if you do, damned if you don't. Assuming the majority of criminal activity does not pay taxes is reasonable, but any precise amount will be arbitrary until there is better evidence. One way around this is to do a scenario analysis which gives you a range of figures. That allows people to pick which one suits their beliefs best.

    Having looked at the report in a bit more detail, there is some room for improvement in the econometric model. It is extremely difficult to have sufficient data to ascribe a difference between predicted sales and actual to counterfeiting. The caveats for this part of the analysis are later in the report.

    Communicating all of this is a challenge.

    Keep the questions and comments coming as I'd like to continue analysing these figures!

  9. For more made up statistics and bias see:-

    Anon 12:38

  10. Hi Anon 12:28, could you elaborate a bit more as to what you mean by 'made up' and 'bias'? (To keep the 'constructive' in constructive criticism.)


All comments must be moderated by a member of the IPKat team before they appear on the blog. Comments will not be allowed if they contravene the IPKat policy that readers' comments should not be obscene or defamatory; they should not consist of ad hominem attacks on members of the blog team or other comment-posters and they should make a constructive contribution to the discussion of the post on which they purport to comment.

It is also the IPKat policy that comments should not be made completely anonymously, and users should use a consistent name or pseudonym (which should not itself be defamatory or obscene, or that of another real person), either in the "identity" field, or at the beginning of the comment. Current practice, however, is to allow a limited number of comments that contravene this policy, provided that the comment has a high degree of relevance and the comment chain does not become too difficult to follow.
