
July 7, 2010

Navigate the Bermuda Triangle of Mediation Analysis

Filed in Articles, Encyclopedia, Ideas, R, Research News

MYTHS AND TRUTHS ABOUT AN OFTEN-USED, LITTLE-UNDERSTOOD STATISTICAL PROCEDURE

If you go to a consumer research conference, you will hear tales of how experiments have undergone particular statistical rites: the attainment of the elusive crossover interaction, the demonstration of full mediation through Baron and Kenny’s sacred procedure, and so on. DSN has nothing against any of these ideas, but it is opposed to subjecting all ideas to the same experimental designs, the same tests, the same alternative hypotheses (typically a null of no difference), and the same rituals.

Zhao, Lynch, and Chen point out in their recent Journal of Consumer Research article that Baron and Kenny’s mediation analysis is incredibly popular (roughly 13,000 citations between 1986 and 2010) and prescribed reflexively, yet flawed in ways its users are probably unaware of. The article was invited by the journal “to serve as a tutorial on the state of the art in mediation analysis”.

ABSTRACT
Baron and Kenny’s procedure for determining if an independent variable affects a dependent variable through some mediator is so well known that it is used by authors and requested by reviewers almost reflexively. Many research projects have been terminated early in a research program or later in the review process because the data did not conform to Baron and Kenny’s criteria, impeding theoretical development. While the technical literature has disputed some of Baron and Kenny’s tests, this literature has not diffused to practicing researchers. We present a nontechnical summary of the flaws in the Baron and Kenny logic, some of which have not been previously noted. We provide a decision tree and a step-by-step procedure for testing mediation, classifying its type, and interpreting the implications of findings for theory building and future research.

REFERENCES
Baron, Reuben M. and David A. Kenny (1986). The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, But What’s the Mechanism? (Don’t Expect an Easy Answer). Journal of Personality and Social Psychology, 98(4), 550–558.

Zhao, X., Lynch, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis. Journal of Consumer Research, 37, 197–206.

R Package for Causal Mediation Analysis

SPSS Code (see the Zhao, Lynch, and Chen article)
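Assuming the linked package is the mediation package on CRAN, here is a minimal sketch of the bootstrap-style test of the indirect effect that Zhao, Lynch, and Chen recommend (the data frame df and the variables treat, med, and y are hypothetical names):

library(mediation) # install.packages("mediation") if needed
# Hypothetical data frame df with a treatment, a proposed mediator, and an outcome
med.fit <- lm(med ~ treat, data = df)     # path a: treatment -> mediator
out.fit <- lm(y ~ med + treat, data = df) # paths b and c': mediator, treatment -> outcome
med.out <- mediate(med.fit, out.fit, treat = "treat", mediator = "med",
                   boot = TRUE, sims = 1000) # bootstrap the indirect effect
summary(med.out) # ACME is the indirect (mediated) effect; ADE is the direct effect

The key departure from Baron and Kenny is that the indirect effect a x b is tested directly with a bootstrap confidence interval, rather than through a series of significance tests on the individual paths.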

July 1, 2010

Maps without map packages

Filed in Ideas, R

LATITUDE + LONGITUDE + OVERPLOTTING FIX = MAPS

Decision Science News is always learning stuff from colleague, physicist, mathlete, and all-around computer whiz Jake Hofman.

Today, it was a quick and clean way to make nice maps in R without using any map packages: just plot the latitude and longitude of your data points (e.g., web site visitors), using the “alpha” parameter to allow coincident points to layer. It’s a “duh” in hindsight.

Above we see how it looks with a little data. Below is the result with more data and a lower alpha:

In the words of James Taylor, all you have to do is call:

library(ggplot2)
# alpha=I(.1) makes each point 10% opaque, so overlapping points show as darker spots
qplot(long,lat,data=us,alpha=I(.1))

To get the Decision-Science-News-approved framing and aspect ratio for the USA:

qplot(long,lat,data=wtd,alpha=I(.1),
xlim=c(-125-10/2,-65),ylim=c(23.5,50.5)) +
opts(aspect.ratio = 3.5/5)
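A caveat for readers on newer versions of ggplot2: opts() was eventually deprecated in favor of theme(), so a roughly equivalent sketch (same data, with the limits written out) would be:

library(ggplot2)
# geom_point's alpha plays the role of alpha=I(.1); note -125 - 10/2 = -130
ggplot(wtd, aes(long, lat)) +
  geom_point(alpha = 0.1) +
  xlim(-130, -65) + ylim(23.5, 50.5) +
  theme(aspect.ratio = 3.5/5)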

As we are certain that there are readers who will want to show that there are much nicer ways to do this, we say: download the data and show us.

June 24, 2010

Oxytocin and defensiveness

Filed in Articles, Encyclopedia, Research News

HORMONE LINKED TO IN-GROUP GOODNESS, OUT-GROUP BADNESS

Who doesn’t like oxytocin? Who could dislike any substance referred to as a cuddle chemical? The answer may be you, if you are not in with the crowd feeling the effects of the hormone.

Carsten de Dreu and a super-long list of co-authors (listed below) have administered oxytocin to experimental participants and validated its bright side (cooperation among people in a group), but also uncovered its dark side (defensive aggression towards people in other groups). Read all about it.

CITATION
Carsten K. W. De Dreu, Lindred L. Greer, Michel J. J. Handgraaf, Shaul Shalvi, Gerben A. Van Kleef, Matthijs Baas, Femke S. Ten Velden, Eric Van Dijk, Sander W. W. Feith (2010). The Neuropeptide Oxytocin Regulates Parochial Altruism in Intergroup Conflict Among Humans. Science, 328(5984), 1408–1411.

ABSTRACT
Humans regulate intergroup conflict through parochial altruism; they self-sacrifice to contribute to in-group welfare and to aggress against competing out-groups. Parochial altruism has distinct survival functions, and the brain may have evolved to sustain and promote in-group cohesion and effectiveness and to ward off threatening out-groups. Here, we have linked oxytocin, a neuropeptide produced in the hypothalamus, to the regulation of intergroup conflict. In three experiments using double-blind placebo-controlled designs, male participants self-administered oxytocin or placebo and made decisions with financial consequences to themselves, their in-group, and a competing out-group. Results showed that oxytocin drives a “tend and defend” response in that it promoted in-group trust and cooperation, and defensive, but not offensive, aggression toward competing out-groups.

H/T author Michel Handgraaf
Photo credit 1: http://en.wikipedia.org/wiki/File:Oxytocin_with_labels.png
Photo credit 2: http://www.flickr.com/photos/markusschoepke/305865244/

June 18, 2010

What’s your planner score?

Filed in Articles, Encyclopedia, Ideas, Research News

QUIZ YOUR LOVED ONES ABOUT THEIR PROPENSITY TO PLAN

John Lynch, Richard Netemeyer, Stephen Spiller, and Alessandra Zammit have recently published in the Journal of Consumer Research this article on the propensity to plan and financial well-being.

ABSTRACT

Planning has pronounced effects on consumer behavior and intertemporal choice. We develop a six-item scale measuring individual differences in propensity to plan that can be adapted to different domains and used to compare planning across domains and time horizons. Adaptations tailored to planning time and money in the short run and long run each show strong evidence of reliability and validity. We find that propensity to plan is moderately domain-specific. Scale measures and actual planning measures show that for time, people plan much more for the short run than the long run; for money, short- and long-run planning differ less. Time and money adaptations of our scale exhibit sharp differences in nomological correlates; short-run and long-run adaptations differ less. Domain-specific adaptations predict frequency of actual planning in their respective domains. A “very long-run” money adaptation predicts FICO credit scores; low planners thus face materially higher cost of credit.

And while reading the article is fun, it’s also a hoot to take the propensity-to-plan test yourself and give it to your friends and family. Give it a whirl and see if the scores accord with behavior. Here are the items (a quick R scoring sketch follows them). Feel free to post your score in the comments.

For each question, answer on a scale from 1 to 6 in which 1 means “I strongly disagree” and 6 means “I strongly agree.”
Propensity to Plan for Money—Short Run:
1. I set financial goals for the next few days for what I want to achieve with my money.
2. I decide beforehand how my money will be used in the next few days.
3. I actively consider the steps I need to take to stick to my budget in the next few days.
4. I consult my budget to see how much money I have left for the next few days.
5. I like to look to my budget for the next few days in order to get a better view of my spending in the future.
6. It makes me feel better to have my finances planned out in the next few days.

Propensity to Plan for Money—Long Run:
1. I set financial goals for the next 1–2 months for what I want to achieve with my money.
2. I decide beforehand how my money will be used in the next 1–2 months.
3. I actively consider the steps I need to take to stick to my budget in the next 1–2 months.
4. I consult my budget to see how much money I have left for the next 1–2 months.
5. I like to look to my budget for the next 1–2 months in order to get a better view of my spending in the future.
6. It makes me feel better to have my finances planned out in the next 1–2 months.

Propensity to Plan for Time—Short Run:
1. I set goals for the next few days for what I want to achieve with my time.
2. I decide beforehand how my time will be used in the next few days.
3. I actively consider the steps I need to take to stick to my time schedule the next few days.
4. I consult my planner to see how much time I have left for the next few days.
5. I like to look to my planner for the next few days in order to get a better view of using my time in the future.
6. It makes me feel better to have my time planned out in the next few days.

Propensity to Plan for Time—Long Run:
1. I set goals for the next 1–2 months for what I want to achieve with my time.
2. I decide beforehand how my time will be used in the next 1–2 months.
3. I actively consider the steps I need to take to stick to my time schedule in the next 1–2 months.
4. I consult my planner to see how much time I have left for the next 1–2 months.
5. I like to look to my planner for the next 1–2 months in order to get a better view of using my time in the future.
6. It makes me feel better to have my time planned out in the next 1–2 months.
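If you would rather let R do the arithmetic, here is a minimal scoring sketch. It assumes the conventional Likert approach of averaging the six items per subscale; see the article for the authors’ exact scoring.

# Hypothetical responses (1 = strongly disagree ... 6 = strongly agree)
money_short_run <- c(5, 4, 6, 3, 4, 5)
mean(money_short_run) # subscale score; higher = greater propensity to plan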

ARTICLE TEXT [Download]

MEDIA MENTIONS
Wall Street Journal: http://jcr.wisc.edu/publicity/authors/docs/SUNJ.AA.1A020.A1.361Z2009.pdf

Yahoo Finance: http://finance.yahoo.com/retirement/article/109540/fast-track-to-financial-success

Decision Science News (meta-reference): http://www.decisionsciencenews.com/2010/06/18/the-propensity-to-plan-is-good-for-your-wallet/

June 11, 2010

I can read minds, you know

Filed in Articles, Research News

GUESSING WHAT PEOPLE ARE THINKING ABOUT BASED ON BRAIN ACTIVATION

You know how in cheesy 80s movies and TV shows there will be a romantic scene, like two young people on a date, and the guy will say something like “I can read minds, you know” and the girl will say “Ok” and scrunch up her eyes and say “What am I thinking about now?” and then the guy will say something particularly cheesy?

Well, in the future they’ll be able to do that scene and the guy will say “apple” and the girl will go “that’s amazing!” and the guy will go “well, the base rate was one in 60” and the girl will go “can I get out of this fMRI now?”

In any case, read this by Marcel Just et al.:

A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes

This article describes the discovery of a set of biologically-driven semantic dimensions underlying the neural representation of concrete nouns, and then demonstrates how a resulting theory of noun representation can be used to identify simple thoughts through their fMRI patterns. We use factor analysis of fMRI brain imaging data to reveal the biological representation of individual concrete nouns like apple, in the absence of any pictorial stimuli. From this analysis emerge three main semantic factors underpinning the neural representation of nouns naming physical objects, which we label manipulation, shelter, and eating … the fMRI-measured brain representation of an individual concrete noun like apple can be identified with good accuracy from among 60 candidate words, using only the fMRI activity in the 16 locations associated with these factors. To further demonstrate the generativity of the proposed account, a theory-based model is developed to predict the brain activation patterns for words to which the algorithm has not been previously exposed. The methods, findings, and theory constitute a new approach of using brain activity for understanding how object concepts are represented in the mind.

In other words, they can read your mind.

I like this task description:

Task: When a word was presented, the participants’ task was to actively think about the properties of the object to which the word referred.

… I wonder if the subjects were tempted to scrunch their eyes.

Find the full article here (free PDF download): http://www.plosone.org/article/info:doi/10.1371/journal.pone.0008622

REFERENCE: Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

photo credit: The movie “Can’t Buy Me Love”, which doesn’t have the aforementioned scene, but does have the kind of nerdy-guy-dates-popular-girl device that causes writers to trot out the “I can read minds” bit.

June 3, 2010

Baseball, basketball, and (not) getting better as time marches on

Filed in Gossip, Ideas, R

PROS ARE NOT GETTING BETTER AT FREE THROWS

Rick Larrick recently told Decision Science News that baseball players have been getting better over the years in a couple ways.

First, home runs and strikeouts have increased. The careless or clueless reader might find this curious, for from the batter’s perspective home runs are a good thing and strikeouts are a bad thing. What’s going on? Batters may be swinging harder, increasing the chance of both. It counts as an improvement because the benefit of a home run outweighs the cost of a strikeout: a home run yields at least one run, often more, and runs are a big deal since the typical team earns only about 5 of them per game.

DSN wondered how the players learned to swing harder from one decade to the next. Was it based on feedback from coaches? Or from fans / media attention?

According to Larrick, the number of attempted stolen bases has decreased over the years. Apparently, stealing is only worth it if one can pull it off a very high percentage of the time, higher than had been believed in previous years (anyone know the stat?). So while crowds (presumably) like the action of stolen bases, players have not responded by attempting more of them. Winning seems more important than pleasing the crowd, which is a strike against the fan-feedback hypothesis.

After our post on winning back-to-back baseball games, some folks like our friend Russ Smith made comparisons to the hot hand effect. There is something to it. However, in the baseball example one starts with a prior of .5 (since one doesn’t even know which two teams are playing), while in basketball the chance a pro will make a free throw is about .75 (since one can condition on the player being a pro). What is surprising is that in both cases, the past success tells you next to nothing.

This conversation led your Editor to find this NY Times article, which shows that, surprisingly, pro basketball players are not getting better at free throws over the years.

So, the question to the readers is: Why do some athletic abilities improve as history marches on (e.g., running speeds, batting, base-stealing) and others do not (e.g., free throws)?

P.S. For the record, Decision Science News is not becoming a sports blog. It is just a phase the Web site is going through. That said, there has been interest in seeing this kind of result in other sports, so that analysis will be coming in future posts, in glorious, glorious R and ggplot2. (Don’t know R yet? Learn by watching: R Video Tutorial 1, R Video Tutorial 2)

Photo credit: http://www.flickr.com/photos/cakecrumb/4398699952/. A cupcake was chosen because Jeff gave us empirical evidence that people like cupcakes much more than a control food.

May 28, 2010

Tuesday’s child is full of probability puzzles

Filed in Encyclopedia, Ideas, R, Tools

COUNTERINTUITIVE PROBLEM, INTUITIVE REPRESENTATION

Blog posts about counterintuitive probability problems generate lots of opinions with a high probability.

Andrew Gelman and readers have been having a lot of fun with the following probability problem:

I have two children. One is a boy born on a Tuesday. What is the probability I have two boys? The first thing you think is “What has Tuesday got to do with it?” Well, it has everything to do with it.

DSN agrees with Andrew that one virtue of the “population-distribution” method is that it forces one to be explicit about various aspects of the problem, and in so doing, causes much confusion to disappear.

As a public service this week, Decision Science News presents the population-distribution representation of the problem (what it thinks of as the Gigerenzerian / Hoffragian / Peter Sedlmeier-ian representation of the problem) in a visual form.

To follow the logic, see Andrew’s post on how he solved the problem. Voila:

Red means “outside the reference class”. Yellow means “in the reference class but not boy-boy”. Green means “inside the reference class and boy-boy”.

Boy-boy in the reference class occurs with probability Green / (Green + Yellow), or 13/27.
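For the skeptics, here is a quick R simulation sketch of the same reference-class logic (assuming sexes and birth days are independent and uniform):

# Simulate many two-child families
set.seed(1)
n <- 1e6
sex1 <- sample(c("B", "G"), n, replace = TRUE)
sex2 <- sample(c("B", "G"), n, replace = TRUE)
day1 <- sample(1:7, n, replace = TRUE) # say 2 = Tuesday
day2 <- sample(1:7, n, replace = TRUE)
# Reference class: at least one boy born on a Tuesday
inref <- (sex1 == "B" & day1 == 2) | (sex2 == "B" & day2 == 2)
# Proportion boy-boy within the reference class: ~13/27 = .481
mean(sex1[inref] == "B" & sex2[inref] == "B")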

NOTE
To see why DSN calls these Gigerenzerian / Hoffragian / Sedlmeierian representations, see:

Sedlmeier, P. (1997). BasicBayes: A tutor system for simple Bayesian inference. Behavior Research Methods, Instruments & Computers, 29(3), 328–336.

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684–704.

(Sorry for not using R; Excel is just darn fast for some things.)

May 21, 2010

Some novel ideas to assist retirement investing

Filed in Ideas, Research News

IMAGINING THE FUTURE TO HELP PREPARE FOR IT

The New York Times just ran a piece called Some Novel Ideas for Improving Retirement Income about having people read Victorian novels in order to increase their retirement savings rates.

Actually, that is not true.

But it did feature some newer ideas from psychology, behavioral finance, and economics on improving retirement decision making, presented at an Allianz-sponsored event on Monday in NYC, including:

  • Work by Hal Ersner-Hershfield, Dan Goldstein, and Bill Sharpe using age-morphed photos of people with varying emotional expressions as a way to increase how connected people feel to their future selves. It is like the scene in A Christmas Carol in which Scrooge sees the future and upon returning promises: “I will live in the Past, the Present, and the Future. The Spirits of all Three shall strive within me. I will not shut out the lessons that they teach.” Like the Distribution Builder, this technology helps people imagine what the future may be like.

Hal, sad about saving now, but psyched about spending later

  • Work by Eric Johnson on high sensitivity to loss among the elderly
  • Findings by Alessandro Previtero on how recent stock market returns affect people’s decisions to buy annuities (which of course last a long, long time)
  • Ideas by George Loewenstein on using mental accounts to help people achieve goals

These projects and more can be read about in the new report from Allianz entitled Behavioral Finance and the Post-Retirement Crisis.

photo credit: www.flickr.com/photos/nrg-photos/4199392655

May 14, 2010

JDM 2010 Conference, St. Louis, November 19-22

Filed in Conferences, SJDM, SJDM-Conferences

31st ANNUAL MEETING OF THE SOCIETY FOR JUDGMENT AND DECISION MAKING 2010

SJDM’s 31st annual conference will be held at the Drury Plaza Hotel in St. Louis, Missouri, November 19–22, 2010. Early registration and the welcome reception will take place the evening of Friday, November 19.

Hotel reservations at the $125/night Psychonomic convention rate can be made by clicking here.

JDMers can also stay at the Millennium Hotel at the conference rate of $134/night by clicking here, or $107/night for students here.

SUBMISSIONS
The deadline for submissions is June 21, 2010. Current call for abstracts is here. Submissions for symposia, oral presentations, and posters should be made through the SJDM website at http://sql.sjdm.org. Technical questions can be addressed to the webmaster, Jon Baron, at www@sjdm.org. All other questions can be addressed to the program chair, Michel Regenwetter, at regenwet@uiuc.edu.

ELIGIBILITY
At least one author of each presentation must be a member of SJDM. Joining at the time of submission will satisfy this requirement. A membership form may be downloaded from the SJDM website at http://www.sjdm.org/jdm-member.html. An individual may give only one talk (podium presentation) and present only one poster, but may be a co-author on multiple talks and/or posters.

AWARDS
The Best Student Poster Award is given for the best poster presentation whose first author is a student member of SJDM.

The Hillel Einhorn New Investigator Award is intended to encourage outstanding work by new researchers. Applications are due July 1, 2010. Further details are available at http://www.sjdm.org.

The Jane Beattie Memorial Fund subsidizes travel to North America for a foreign scholar in pursuits related to judgment and decision research, including attendance at the annual SJDM meeting. Further details will be available at http://www.sjdm.org.

PROGRAM COMMITTEE
Michel Regenwetter (Chair), Craig McKenzie, Nathan Novemsky, Bernd Figner, Gretchen Chapman, Gal Zauberman, Ulf Reips, Wandi Bruine de Bruin, Ellie Kyung

May 5, 2010

You won, but how much was luck and how much was skill?

Filed in Encyclopedia, Ideas, R, Research News, SJDM

THE ABILITY OF WINNERS TO WIN AGAIN

Even people who aren’t avid baseball fans (your DSN editor included) can get something out of this one.

When two baseball teams play each other on two consecutive days, what is the probability that the winner of the first game will be the winner of the second game?

[If you like fun, write down your prediction.]

DSN’s father-in-law told him that recently the Mets beat the Phillies 9 to 1, but the very next day, the Phillies beat the Mets 10 to 0. How could this be? If the Mets were so good as to win by 8 points, how could the exact same players be so bad as to lose by 10 points to the same opponents 24 hours later?

Let’s call this situation (in which team A beats team B on one day, but team B beats team A the very next day) a “reversal”, and we’ll say the size of the reversal is the smaller of the two margins of victory. In the above example, the size of the reversal was 8.

Using R (code provided below), DSN obtained statistics on all major league baseball games played between 1970 and 2009 and calculated how often each type of reversal occurs per 100,000 pairs of consecutive games. The result is in the graph above. Big reversals are rare. A reversal of size 8 occurs in only 174 of 100,000 games; a size 12 reversal happens but 10 times per 100k. A size 13 reversal never happened in those 40 years. One might think this is because it would be uncommon for a team that is so good to suddenly become so bad and vice versa, but note that big margins of victory are rare: only 4% of games have margins of victory of 8 points or larger.

Back to our question:

If a team wins on one day, what’s the probability they’ll win against the same opponent when they play the very next day?

We asked two colleagues knowledgeable in baseball and the mathematics of forecasting. The answers came in between 65% and 70%.

The true answer: 51.3%, a little better than a coin toss.

That’s right. When you win in baseball, there’s only a 51% chance you’ll win again in more or less identical circumstances. The careful reader might notice that the answer is visible in the already mentioned chart. Reversals of size 0 (meaning no reversal, meaning the same team won twice) occur 51,296 times per 100,000 pairs of consecutive games.

[At this point, DSN must admit that it is entirely possible that it has made a computational error. It welcomes others to reproduce the analysis with the code or pre-processed data at the end of this post.]

What of the adage “the best predictor of future performance is past performance”? It seems less true than Sting’s observation “History will teach us nothing“. Let’s continue the investigation.

Here we plot the probability of winning the second game based on obtaining various margins of victory in the first game. We simply calculated the average win rate for each margin of victory up to 11 points, which makes up 98% of the data, and binned together the remaining 2%, comprising margins of victory from 12 to 27 points. (Rest assured, the binning makes the graph look prettier but does not affect the outcome.)

The equation of the robust regression line is Probability(Win_Second_Game) = .498 + .004 * First_Game_Margin, which suggests that even if you win the first game by an obscene 20 points, your chance of winning the second game is only 57.8%.

Still in disbelief? Here we do no binning and plot the first game winner’s margin of victory (or loss) in the second game as a function of its margin of victory in the first game. The clear heteroskedasticity is dealt with by iteratively reweighted least squares in R’s rlm command. Similar results are obtained by fitting a loess line. This model is Expected_Second_Game_Margin = -.012 + .030 * First_Game_Margin.

One final note. The 51.3% chance you’ll win the second game given that you’ve won the first is smaller than the so-called “home team advantage”, which we found to be a win probability of 54.2% in first games and 53.8% in second games.

When the home team wins the first game, it wins the second game 54.7% of the time.
When the home team loses the first game, it wins the second game 52.8% of the time.
When the visitor wins the first game, it wins the second game 47.2% of the time.
When the visitor loses the first game, it wins the second game 45.3% of the time.

Surprisingly, when it comes to winning the second game, it’s better to be the home team who just lost than the visitor who just won. So much for drawing conclusions from winning. Decision Science News has always wondered why teams are so eager to fire their coaches after they lose a few big games. Don’t they realize that their desired state of having won those same few big games would have been mostly due to luck?

There you have it. Either we have made an egregious error in calculation or recent victories are surprisingly uninformative.

Do your own analysis alternative 1: The pre-processed data
If you wish, you can cheat and get the pre-processed data at http://www.dangoldstein.com/flash/bball/reversals.zip

This may be of interest for people who don’t use R or for impatient types who just want to cut to the chase.

No guarantee that our pre-processing is correct. It should contain all pairs of consecutive games between the same two teams.
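As an example, here is a minimal sketch of recomputing the headline 51.3% from the pre-processed file (assuming the zip unpacks to the reversals.csv written by the code below, in which dx and dy are the home-minus-visitor score differences in the first and second games):

rev <- read.csv("reversals.csv")
# Share of consecutive-game pairs in which the same team won both: ~.513
mean(sign(rev$dx) == sign(rev$dy))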

Do your own analysis alternative 2: The code

I’ll provide the column names file for your convenience at http://www.dangoldstein.com/flash/bball/cnames.txt. I left out a bunch of column names I didn’t care about. The complete list is at: http://www.dangoldstein.com/flash/bball/glfields.txt

R CODE
(Don’t know R yet? Learn by watching: R Video Tutorial 1, R Video Tutorial 2)

#Data obtained from http://www.retrosheet.org/
#Go for the files http://www.retrosheet.org/gamelogs/gl1970_79.zip through
#http://www.retrosheet.org/gamelogs/gl2000_09.zip and unzip each to directories
#named "gl1970_79", "gl1980_89", etc, reachable from your working directory.

library(MASS) #For robust regression, can omit if you don't want to fit lines

#Column headers, Can get from www.dangoldstein.com/flash/bball/cnames.txt
#If you want all the headers, create from www.dangoldstein.com/flash/bball/glfields.txt
LabelsForScript=read.csv("cnames.txt", header=TRUE)

#Loop to get together all data
dat=NULL
for (baseyear in seq(1970,2000,by=10))
{
endyear=baseyear+9
#build each season's game-log path, read the file,
#and append everything into one big data frame "dat"
for (i in baseyear:endyear)
{
mypath=paste("gl",baseyear,"_",substr(as.character(endyear),start=3,stop=4),"/GL",i,".TXT",sep="")
cat(mypath,"\n")
dat=rbind(dat,read.csv(mypath, col.names=LabelsForScript$Name))
}
}

rel=dat[,c("Date", "Home","Visitor","HomeGameNum","VisitorGameNum","HomeScore","VisitorScore")] #relevant set

rel$PrevVisitorGameNum=rel$VisitorGameNum-1
rel$PrevHomeGameNum=rel$HomeGameNum-1
rel$year=substr(rel$Date,start=1,stop=4)

rm(dat)

head(rel,20); summary(rel)

relmerge=merge(rel,rel,
by.x=c("Home","Visitor","year","HomeGameNum","VisitorGameNum"),
by.y=c("Home","Visitor","year","PrevHomeGameNum","PrevVisitorGameNum")
)

relmerge=relmerge[,c(
"Home", "Visitor", "Date.x", "HomeScore.x", "VisitorScore.x",
"Date.y", "HomeScore.y", "VisitorScore.y"
)]

relmerge$dx=relmerge$HomeScore.x-relmerge$VisitorScore.x #home margin, game 1
relmerge$dy=relmerge$HomeScore.y-relmerge$VisitorScore.y #home margin, game 2

#Eliminate ties
relmerge=with(relmerge,relmerge[(dx!=0) & (dy!=0),])

relmerge$reversal=-.5*(sign(relmerge$dx)*sign(relmerge$dy))+.5 #1 if different teams won the two games, else 0
relmerge$revsize=relmerge$reversal*pmin(abs(relmerge$dx),abs(relmerge$dy)) #smaller of the two margins; 0 if no reversal
relmerge$winnerMarginVicG1=with(relmerge,sign(dx)*dx) #game 1 winner's margin in game 1
relmerge$winnerMarginVicG2=with(relmerge,sign(dx)*dy) #game 1 winner's margin in game 2 (negative if it lost)

write.csv(relmerge,"reversals.csv")

mat=NULL
mat= data.frame(cbind(
ReversalSize=0:12,
Count=table(relmerge$revsize),
Prob=table(relmerge$revsize)/length(relmerge$revsize),
Per100k=table(relmerge$revsize)/length(relmerge$revsize)*100000
))
mat
cat("Probability previous winner wins again: ", mat[1,3],"\n")

##Graph Size of Reversal Frequency
png("SizeOfReversal.png",width=450)
plot(mat$ReversalSize,mat$Per100k,xlab="Size of Reversal",ylab="Frequency in 100,000 games",type="l")
dev.off()

##Graph Chance of Winning Given Previous Win of Various Margins
png("WinGivenMargin.png",width=450)
brks=cut(relmerge$winnerMarginVicG1,breaks=c(0,1,2,3,4,5,6,7,8,9,10,11,27))
winsVsMargin=tapply(relmerge$winnerMarginVicG2>0,brks,mean)
names(winsVsMargin)=1:12
plot(winsVsMargin,ylim=c(0,1),axes=FALSE,xlab="Margin of Victory in First Game",ylab="Chance of Winning Second Game")
axis(1,1:12,labels=c("1","2","3","4","5","6","7","8","9","10","11","12+"))
axis(2,seq(0,1,.1))
winModel=rlm(winsVsMargin~ as.numeric(names(winsVsMargin)))
abline(winModel)
dev.off()

##Graph Expected Margin of Victory Given Past Margin of Victory
png("MarVic.png",width=450)
mm2=rlm(relmerge$winnerMarginVicG2 ~ relmerge$winnerMarginVicG1)
plot(jitter(relmerge$winnerMarginVicG1),
jitter(relmerge$winnerMarginVicG2),xlab="Margin of Victory in Game 1",
ylab="Margin of Victory of Game 1 Winner in Game 2")
abline(mm2)
dev.off()

#Probability of team winning game two if they won game 1 by n points
winModel$coefficients[1]+winModel$coefficients[2]*20

#Expected margin of victory in game two given win in game 1
mm2$coefficients[1]+mm2$coefficients[2]*33

#Home Team Advantage: First game, second game
with(relmerge,{cat(mean(dx > 0), mean(dy > 0))})

#Home team advantage second game given home won first game
# Equals 1- Visitor p win second game given visitor lost the first game
with(relmerge[relmerge$dx > 0,],mean(dy > 0))

#Home team advantage second game given home lost first game
#Equals 1 - Visitor p win second game given visitor won first game
with(relmerge[relmerge$dx < 0,],mean(dy > 0))