Hannah Montana fan club members to sue the fan club. Hopefully the judge is a Bayesian
This is a re-post. For some reason pieces of the original did not upload and I was too tired to even check. This morning I added the regression stuff.
While I’m in the mood to update old stories: here’s the latest one about Hannah Montana
Thousands of “Hannah Montana” fans who couldn’t get concert tickets could potentially join a lawsuit against the teen performer’s fan club over memberships they claim were supposed to give them priority for seats.
The lawsuit was filed on behalf of a New Jersey woman and anyone else who joined the Miley Cyrus Fan Club based on its promise that joining would make it easier to get concert tickets from the teen star’s Web site.
“They deceptively lured thousands of individuals into purchasing memberships into the Miley Cyrus Fan Club,” plaintiffs’ attorney Rob Peirce said. His Pittsburgh firm and a Memphis firm filed the suit Tuesday in U.S. District Court in Nashville.
The fan club costs $29.95 a year to join, according to the lawsuit, which alleges that the defendants should have known that the site’s membership vastly exceeded the number of tickets.
What an interesting club they have. At least they like to do things together? It seems these are people who either (a) honestly did purchase membership with this club in order to get preferential access to concert tickets, or (b) are now saying they did because taking responsibility for things just not working out is so unfashionable, these days. Who knows.
I guess it’s just a lawsuit, like any other. I’ve looked around: I don’t see any mention of the actual number of members of this fan club (perhaps it’s made known once you are a member and log in?). If this number was known then, yes, I would say it should be clear to members that more people will want tickets than will get them. “Thousands” are in on this lawsuit, so I figure it ought to be a lot.
The solution is simple: compare the two sets of people, members and non-members. There must be some measure of the non-member fans of the girl – perhaps people who tried and failed to get tickets via the members’ site? If, conditional upon being a member, one was in fact more likely to have gotten tickets than the general public, there is no lawsuit. If the opposite is found (i.e. if there appears to have been no advantage), then there is a lawsuit.
Enter Bayes’ theorem: suppose we want/need the probability of getting tickets conditional upon being a Miley Cyrus Fan Club member. We don’t have that, per se. What we do have is the probability of being a Fan Club member conditional upon (a) getting tickets, and (b) not getting tickets. With this, we can work.
First, define A1 = Getting Tickets, A2 = Not Getting Tickets, B1 = Fan Club Member and, finally, B2 = Not A Fan Club Member.
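So, the probability we need is

$$\Pr(A_1 \mid B_1)$$

to compare to

$$\Pr(A_1 \mid B_2)$$

What we observe (or can observe) are $\Pr(B_1 \mid A_1)$, $\Pr(B_1 \mid A_2)$, and $\Pr(A_1)$, where (for example)

$$\Pr(B_1 \mid A_1) = \frac{\Pr(A_1 \cap B_1)}{\Pr(A_1)}$$

and so forth. Now, the probability $\Pr(B_1 \mid A_1)$, for example, gives us

$$\Pr(A_1 \cap B_1) = \Pr(B_1 \mid A_1)\,\Pr(A_1)$$

Repeating that, we see that the probability that we need is given by

$$\Pr(A_1 \mid B_1) = \frac{\Pr(B_1 \mid A_1)\,\Pr(A_1)}{\Pr(B_1 \mid A_1)\,\Pr(A_1) + \Pr(B_1 \mid A_2)\,\Pr(A_2)}$$

This is because our outcomes are clearly defined: they are mutually exclusive, and they are exhaustive – i.e.

$$\Pr(B_1) = \Pr(A_1 \cap B_1) + \Pr(A_2 \cap B_1)$$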
Same for B2. Thus we will get the two numbers that we need to answer the questions: (1) what was the probability of getting tickets conditional upon being a Miley Cyrus Fan Club member; and (2) was it greater than the probability of securing tickets conditional upon not being a fan club member? I should point out here that the tricky part of this is going to be finding A2 and Pr(A2). Less so, perhaps, for members of the Miley Cyrus Fan Club than for the general population. The value of that information will make a very big difference to our conditional probabilities: what if, for example, they are different numbers, but very similar numbers? How different do they have to be? Enter the $\chi^2$ (pronounced ky, to rhyme with sky) test for independence.
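Before moving on to the test, here is a minimal sketch of the Bayes' theorem arithmetic above. Every number in it is hypothetical, picked only to show the mechanics; the function name `posterior` is likewise just for illustration.

```python
# Hypothetical inputs: none of these figures come from the lawsuit.
def posterior(p_b_given_a1, p_b_given_a2, p_a1):
    """Pr(A1 | B) by Bayes' theorem, with A1 and A2 exclusive and exhaustive."""
    p_a2 = 1.0 - p_a1
    numerator = p_b_given_a1 * p_a1
    denominator = numerator + p_b_given_a2 * p_a2  # this is Pr(B)
    return numerator / denominator

p_a1 = 0.10           # Pr(A1): share of would-be buyers who got tickets
p_b1_given_a1 = 0.30  # Pr(B1 | A1): share of ticket-holders who are members
p_b1_given_a2 = 0.20  # Pr(B1 | A2): share of the ticketless who are members

# Pr(A1 | B1): chance of tickets given membership.
p_a1_given_b1 = posterior(p_b1_given_a1, p_b1_given_a2, p_a1)

# Pr(A1 | B2): for non-members, use Pr(B2 | Ai) = 1 - Pr(B1 | Ai).
p_a1_given_b2 = posterior(1 - p_b1_given_a1, 1 - p_b1_given_a2, p_a1)

print(round(p_a1_given_b1, 4), round(p_a1_given_b2, 4))
```

With these made-up inputs, members come out at roughly 0.143 and non-members at roughly 0.089. Whether a gap like that is large enough to matter is exactly what the test for independence is for.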
The test for independence will test for us the null hypothesis (the default hypothesis) that $\Pr(A_1 \mid B_1) = \Pr(A_1 \mid B_2)$, versus the alternative that $\Pr(A_1 \mid B_1) \neq \Pr(A_1 \mid B_2)$. For this we need all four possible joint observed cells:

$$
\begin{array}{l|cc|c}
 & A_1 \text{ (tickets)} & A_2 \text{ (no tickets)} & \text{Total} \\ \hline
B_1 \text{ (member)} & O_{11} & O_{12} & R_1 \\
B_2 \text{ (non-member)} & O_{21} & O_{22} & R_2 \\ \hline
\text{Total} & C_1 & C_2 & N
\end{array}
$$

If the two probabilities are in fact equal, then we would expect to see (for example):

$$E_{11} = \frac{R_1 \, C_1}{N}$$

Then we calculate our test statistic:

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

I.e. the sum, over the cells, of the squared (observed – expected) differences, each divided by the expected count. This could equally be set up the other way around, conditioning on the Tickets columns rather than the Membership rows. With (rows – 1) × (columns – 1) = 1 degree of freedom, we just need that statistic to be greater than 3.84, the critical value at the 5% level:

$$\chi^2 > 3.84$$

to reject our null hypothesis and conclude that the distribution of ticket-getting was in fact different for Miley Cyrus Fan Club members than for non-members. If the members had a higher conditional probability of securing tickets then, again, there is no case. If they are not statistically significantly different, they’ve been ripped off. Again, whether they should have known this beforehand is a matter for a jury: we just do the numbers.
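As a rough sketch of that test, the snippet below computes the Pearson statistic for a 2×2 table of counts in plain Python. The counts are invented for illustration (nobody has published the real ones, as far as I can find), and `chi_square_stat` is a hypothetical helper name.

```python
def chi_square_stat(observed):
    """Pearson chi-square statistic for a table of counts (a list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts. Rows: members, non-members. Columns: tickets, no tickets.
observed = [[30, 180],
            [70, 720]]

stat = chi_square_stat(observed)
CRITICAL_5PCT_1DF = 3.84  # 5% critical value with 1 degree of freedom

reject_null = stat > CRITICAL_5PCT_1DF
print(round(stat, 2), reject_null)
```

With these invented counts the statistic is about 5.42, so the null of "no difference" would be rejected at the 5% level. The direction of the difference still has to be read off the table itself: here 30 of 210 members got tickets against 70 of 790 non-members.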
Done? Not even close. What if there was more to it than that?
Regression analysis: regression analysis will offer two distinct advantages in this instance: one for the plaintiffs, particularly if the defence has demonstrated, above, that Miley Cyrus Fan Club members did in fact get a better deal on tickets than non-members, and one for the defence, for the same reason:
- Regression analysis will be able to quantify the degree to which being a member of the fan club increased the probability of securing a ticket to the show(s).
- Regression analysis will be able to identify the statistical significance of the relationship between fan club membership and ticket-securing, controlling for other factors.
Our regression model appears thus:

$$T_i = \beta_0 + \beta_1 M_i + \varepsilon_i$$

where $T_i$ equals 1 if person $i$ secured tickets (0 otherwise) and $M_i$ is the fan club dummy. Keeping it simple: Ordinary Least Squares. That is part (1): this model will identify whether being a member of the fan club (a dummy variable: 0 = not a member; 1 = member) affects the probability of securing tickets. For purposes of compensation, it will also quantify the degree to which that probability increased (if it increased at all).
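A small sketch of that simple regression, on made-up data, is below. With a 0/1 outcome, OLS is what is usually called a linear probability model, and with a single dummy regressor the slope is just the difference between the two groups' ticket rates. The data and the helper name `ols_simple` are assumptions for illustration only.

```python
def ols_simple(x, y):
    """Intercept and slope of a one-regressor OLS fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sample: 10 members (3 got tickets), 10 non-members (1 got tickets).
member = [1] * 10 + [0] * 10
ticket = [1] * 3 + [0] * 7 + [1] * 1 + [0] * 9

b0, b1 = ols_simple(member, ticket)
print(round(b0, 3), round(b1, 3))
```

Here the intercept is the non-members' ticket rate (0.1) and the slope is the extra probability associated with membership (0.2). Judging statistical significance would additionally need a standard error for the slope, which this sketch leaves out.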
However. What if there was some other difference? We know, for example, that scalpers landed on these tickets like (insert joke here – who don’t you like?). Suppose Miley Cyrus Fan Club members differed in some specific other respect? Perhaps they just didn’t log on as quickly? Do they have a slower connection? Was a child doing it with their parent’s credit card (the assumption being that they were slower to manoeuvre the system)? On to multiple regression! Controlling for these factors, our model becomes:

$$T_i = \beta_0 + \beta_1 M_i + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$$

with $T_i$ the ticket outcome, $M_i$ the membership dummy, and $X_{2i}, \dots, X_{ki}$ the additional controls.
The more statistically significant explanatory variables we introduce into our model, the less statistically significant (and, probably, economically significant) the coefficient on fan club membership will become, and the weaker will become the class action lawsuit against the Hannah Montana people.
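To see how a control can absorb the membership effect, here is a contrived sketch. The data are invented so that ticket success depends only on a hypothetical `fast` (connection speed) dummy, while members happen to have fast connections more often. The `ols` helper solves the normal equations directly and is an illustration, not a substitute for a proper statistics package.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical cells: (member, fast connection, people, tickets secured).
cells = [
    (1, 1, 6, 3),
    (1, 0, 2, 0),
    (0, 1, 2, 1),
    (0, 0, 6, 0),
]

rows, y = [], []
for member, fast, people, tickets in cells:
    for person in range(people):
        rows.append([1.0, float(member), float(fast)])  # constant, M, X2
        y.append(1.0 if person < tickets else 0.0)

short = ols([[r[0], r[1]] for r in rows], y)  # tickets on membership only
full = ols(rows, y)                           # adds the connection control

print([round(b, 3) for b in short], [round(b, 3) for b in full])
```

In this rigged example the membership coefficient is 0.25 when membership is the only regressor and falls to zero once connection speed is controlled for, which is the kind of result the defence would be hoping to find.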
Seems like a waste of perfectly good econometrics/statistics, one might think. The suit will probably contain every fan club member who did not get a ticket, though, asking for triple damages plus legal fees. I reckon it’s worth the effort for the companies being sued.
I keep telling my students that econometrics can do everything…