Pie Charts: Are They Worth the Fight?

Like Rob, I recently got back from ICOTS. What a great conference. Kudos to everyone who worked hard to organize and pull it off. In one of the sessions I attended, Amelia McNamara (@AmeliaMN) gave a nice presentation about how they were using data and computer science in high schools as part of the Mobilize Project. At one point in the presentation she had a slide showing a screenshot of the dashboard used in one of their apps. It looked something like this.

[Screenshot of the Mobilize dashboard]

During the Q&A, one of the critiques of the project was that they had displayed the data as a donut plot. “Pie charts (or any kin thereof) = bad” was the message. I don’t really want to fight about whether they are good or bad—the reality is probably in between. (Tufte, the most-cited source for the ‘pie charts are bad’ rhetoric, never really said pie charts were bad, only that given the space they take up they are perhaps less informative than other graphical choices.) Do people have trouble reading radians? Sure. Is the message in the data obscured because of this? Most of the time, no.

Here are the bar chart (often offered as the better alternative to the pie chart) and the donut plot for the data shown in the Mobilize dashboard screenshot. The message is that most of the advertisements were posters and billboards. If people are interested in the n‘s, that can easily be remedied by including them explicitly on the plot—which neither the bar plot nor the donut plot currently does (a geom_text() sketch of this is included with the syntax at the end of the post). The dashboard displays the actual numbers when you hover over a donut slice.

It seems we are wasting our breath constantly criticizing people for choosing pie charts. Whether we like it or not, the public has adopted pie charts. (As is pointed out in this blog post, Leland Wilkinson even devotes a whole chapter to pie charts in his Grammar of Graphics book.) Maybe people are reasonably good at pulling out the often-not-so-subtle differences that are generally shown in a pie chart. After all, it isn’t hard to understand (even when using a 3-D exploding pie chart) that the message in this pie chart is that the “big 3” browsers have a strong hold on the market.

The bigger issue to me is that these types of graphs are only reasonable choices when examining simple group differences—the marginals. Isn’t life, and data, more complex than that? Is the distribution of browser type the same for Mac and PC users? For males and females? For different age groups? These are the more interesting questions.

The dashboard addresses this through interactivity between the multiple donut charts. Clicking a slice in the first plot shows the distribution of product types (the second plot) for the ads in the selected slice—the conditional distribution.
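As a rough sketch of what that conditioning computes, here is a toy example in R; the browser and os data are made up purely for illustration:

# Hypothetical data: browser choice cross-classified by operating system
browser = c("Chrome", "Firefox", "Chrome", "IE", "Firefox", "Chrome")
os = c("Mac", "Mac", "PC", "PC", "PC", "Mac")

# Marginal distribution of browser (what a single pie/donut displays)
prop.table(table(browser))

# Conditional distributions of browser given OS (one column per OS)
prop.table(table(browser, os), margin = 2)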

So my argument is that rather than referring to a graph choice as good or bad, we should instead focus on the underlying question prompting the graph in the first place. Mobilize acknowledges that complexity by addressing the need for conditional distributions, and interactivity and computing make pie charts a reasonable choice for displaying them.

*If those didn’t persuade you, perhaps you will be swayed by the food argument. Donuts and pies are two of my favorite food groups. Although bars are nice too. For a more tasty version of the donut plot, perhaps somebody should come up with a cronut plot.

**The ggplot2 syntax for the bar and donut plots is provided below. The syntax for the donut plot was adapted from this blog post.

# Input the ad data
ad = data.frame(
	type = c("Poster", "Billboard", "Bus", "Digital"),
	n = c(529, 356, 59, 81)
	)

# Bar plot
library(ggplot2)
ggplot(data = ad, aes(x = type, y = n, fill = type)) +
     geom_bar(stat = "identity", show.legend = FALSE) +
     theme_bw()
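
As noted above, the n‘s can be added with a geom_text() layer; a minimal sketch building on the plot just created (vjust = -0.5 nudges each label just above its bar):

# The same bar plot, with the n's printed above the bars
ggplot(data = ad, aes(x = type, y = n, fill = type)) +
     geom_bar(stat = "identity", show.legend = FALSE) +
     geom_text(aes(label = n), vjust = -0.5) +
     theme_bw()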

# Add additional columns to the data, needed for the donut plot.
ad$fraction = ad$n / sum(ad$n)
ad$ymax = cumsum(ad$fraction)
ad$ymin = c(0, head(ad$ymax, n = -1))

# Donut plot
ggplot(data = ad, aes(fill = type, ymax = ymax, ymin = ymin, xmax = 4, xmin = 3)) +
     geom_rect(colour = "grey30", show.legend = FALSE) +
     coord_polar(theta = "y") +
     xlim(c(0, 4)) +
     theme_bw() +
     theme(panel.grid = element_blank(),
           axis.text = element_blank(),
           axis.ticks = element_blank()) +
     geom_text(aes(x = 3.5, y = ((ymin+ymax)/2), label = type)) +
     xlab("") +
     ylab("")


Increasing the Numbers of Females in STEM

I just read a wonderful piece about how Harvey Mudd increased the proportion of females declaring a major in Computer Science from 10% to 40% since 2006. That is awesome!

One of the things they attribute this success to is changing the name of their introductory course, from Introduction to Programming in Java to Creative Approaches to Problem Solving in Science and Engineering Using Python.

Now, clearly, they changed the language they were using (literally) as well, from Java to Python, but it does raise the question, “what’s in a name?” According to Jim Croce and Harvey Mudd, a lot. If you don’t believe that, just ask anyone who has been in a class with the moniker Data Science, or any publisher who has recently published a book entitled [Insert Anything Here] Using R.

It would be interesting to study the effect of changing a course name. Are there words or phrases that attract more students to the course (e.g., creative, problem solving)?  Are there gender differences? How long does the effect last? Is it a flash-in-the-pan? Or does it continue to attract students after a short time period? (My guess is that the teacher plays a large role in the continued attraction of students to the course.)

Looking at the effects of a name is not new. Stephen Dubner and Steve Levitt of Freakonomics fame have highlighted research on whether a child’s name affects a variety of outcomes, such as educational achievement and future income [podcast], and suggest that it isn’t as predictive as some people believe. Perhaps someone could use their ideas and methods to examine the effect of course names.

Has anyone tried this with statistics (aside from Data Science)? I know Harvard put in place a course called Real Life Statistics: Your Chance for Happiness (or Misery), which drew good numbers of students (and a lot of press). My sense is that this happens much more in liberal arts schools (David Moore’s Concepts and Controversies book springs to mind). What would good course-name words or phrases for statistics be? Evidence. Uncertainty. Data. Variation. Visualization. Understanding. Although these are words statisticians use constantly, I have to admit they all sound better than An Introduction to Statistics.


An Open Letter to the TinkerPlots Community

I received the following from Cliff Konold:

We have just released the following to answer questions many have asked us about when TinkerPlots will be available for sale again. Unfortunately, we do not have a list of current users to send this to, so please distribute it to others you think would be interested.


March 21, 2014

As you may have discovered by now, you can no longer purchase TinkerPlots. Many of you who have been using TinkerPlots in courses and workshops have found your way to us asking if and when it will be available for purchase again. We expect it will be available again soon, by this June. But to allow you to make informed decisions about future instructional uses of TinkerPlots, we need to provide a little background.

On December 10, 2013, we received a letter from McGraw-Hill Education giving us notice that in 90 days they would be terminating their agreement with us to publish TinkerPlots. For those of you who remember Key Curriculum as our publisher, McGraw-Hill Education acquired Key in August 2012, and as part of that acquisition became the new publisher of The Geometer’s Sketchpad, Fathom, and TinkerPlots.

Though McGraw-Hill Education had informally told us of their plans to terminate sales of both TinkerPlots and Fathom as of December 31, 2013, we were nevertheless surprised when they actually did this. We were assuming this wouldn’t happen until mid March (i.e., 90 days). In any case, since January 1 of this year, no new licenses for TinkerPlots have been sold.

Fortunately, TinkerPlots is actually owned by our University, so we are now free to find another publisher. We are in ongoing discussions with four different organizations that have expressed interest in publishing TinkerPlots. But there are many components of TinkerPlots in addition to the application (data sets, activities, the help manual, instructional movies, tutorials, on-line course materials, artwork, the license server/installer, the list of existing users) that McGraw-Hill Education does own and that would be hard to do without; replacing them would be a significant undertaking. Fortunately, McGraw-Hill Education has indicated their willingness to transfer most of these assets to us, and we are very grateful for this because they are not legally bound to do so. However, we have not yet received any of these resources or written permission to use them. Until we do, we cannot realistically build and release another version of the application. We are in regular communication with people at McGraw-Hill Education who have assured us that they will very shortly begin to deliver these materials and official permissions for their use.

We have been telling folks that a new version of TinkerPlots will be available by June 2014, and we still think this is a reasonable timeframe. We’d give it about an 85% probability. By August, 98.2%.

In the meantime, if you have unused licenses for TinkerPlots, you will still be able to register new computers on that license number. To see how many licenses you have, go to License Information… under the Help menu. If you have one license, our memory is that you can actually register 3 computers on it — they built in a little leeway. From that same dialog box you can also deregister a computer and in this way free up a currently used license. (We just checked, and when the deregister dialog comes up, it now has the name of Sketchpad where TinkerPlots should be.  But ignore that. It’s just an indication of the publisher slowly phasing the name TinkerPlots out of its system.)

Also, the resource links under the TinkerPlots Help menu still take you to resources such as movies on the publisher’s site. They have told us, however, that after March 2015, they will discontinue hosting these materials on their web site. But by that time, all these should be available on the site of the new publisher.

We are so sorry for the inconvenience this interruption and the lack of communication have caused many of you. McGraw-Hill Education has not notified its existing users, and we don’t know who most of you are. We have heard of several instances where teachers planning to start a course or workshop in a few days suddenly learned that their students would not be able to purchase TinkerPlots, and they had to quickly redesign their course. We understand that because of this ordeal, some of you will decide to jump ship on TinkerPlots. But we certainly hope that most of you will stick with us through this bumpy transition. We have put nearly 15 years of ourselves into the creation of TinkerPlots and the development of its community, and we are committed to keeping both going.

Cliff Konold and Craig Miller
The TinkerPlots Development Team
Scientific Reasoning Research Institute
University of Massachusetts Amherst
Amherst, Massachusetts

Email: konold@srri.umass.edu
Web:   www.umass.edu/srri/serg/

Blog Guilt and a Categorical Data Course

I read a blog post entitled On Not Writing, and it hit a little close to home. The author, an academic in a non-tenure-track position, writes,

If you have the luxury to have time to write, do you write scholarship with the hope of forwarding an academic career, or do you write something you might find more fun, and hope to publish it another way?*

The footnote read, “Of course, all of this writing presupposes that the stacks of papers get graded.” Ouch. Too close to home. I sent this on to some of my non-tenure-track peers, and Rob responded that I had tapped into his blog guilt. My blog guilt was already at an all-time high, so I vowed that I would immediately post something to Citizen Statistician. Well, that was several weeks ago, but I am finally posting.

Fall semester I taught a PhD seminar on categorical data analysis that I had proposed the previous spring. As with many first-time offerings, the amount of work was staggering, but intellectually it was wonderful. The course notes, assignments, etc. are all available at the course website (which also doubled as the syllabus).

Like so many advanced seminars, the course had very few students taking it for a grade, but quite a few auditors. The course projects were a blast to read and resulted in at least two pre-dissertation papers, a written prelim paper, and, so far, two articles that have been submitted to journals!

After some reflection, there are some things I will do differently when I teach this again (likely an every-other-year offering):

  • I would like to spend more time on the classification methods. Although we talked about them a little, the beginning modeling took waaaay more time than I anticipated and I need to re-think that a bit.
  • I would like to cover mixed-effects models for binary outcomes in the future. This wasn’t possible this semester since we only had a regression course as the pre-requisite. Now, there is a new pre-requisite which includes linear mixed-effects models with continuous outcomes, so at least students will have been exposed to those types of models. This course also includes a much more in-depth introduction to likelihood, so that should also open up some time.
  • I will not teach the ordinal models in the future. Yuck. Disaster.
  • I probably won’t use the Agresti book in the future. While it is quite technical and comprehensive, it is expensive and the students did not like it for the course. I don’t know what I will use instead. Agresti will remain on a resources list.
  • The propensity score methods (PSM) were a hit with the students and those will be included again. I will also probably put together an assignment based on those.
  • I would like to add in survival analysis.

There are a ton of other topics that could be cool, but with limited time they probably aren’t feasible. In general, my plan is to spend the first half of the course introducing and using the logistic and multinomial models and the second half on advanced applications (PSM, classification, etc.).

If anyone has any great ideas or suggestions, please leave comments. Also, I am always on the lookout for some datasets that were used in journal articles or are particularly relevant.


JMM 2014

Two weeks ago I traveled to Baltimore for the Joint Mathematics Meetings. These meetings are very much like the Joint Statistical Meetings, except for mathematicians. “Now, um, usually I don’t do this but uh….Go head’ on and break em off wit a lil’ preview of the remix….” (Kelly, 2003).

The JMM are a great place to educate and work with mathematics teachers at the collegiate level who are teaching introductory statistics courses. One group that is quite active in this community is the Statistics Education Special Interest Group of the Mathematical Association of America (SIGMAA). If you are a member of the MAA, let me put in a plug to join this SIGMAA. Each year they sponsor at least one contributed paper session and often several minicourses.

This year, aside from the perennial Teaching introductory statistics minicourse (for instructors new to teaching intro stats), the SIGMAA also endorsed two minicourses aimed at using randomization/bootstrapping in the introductory course: CATALST: Introductory statistics using randomization and bootstrap methods and Using randomization methods to build conceptual understanding of statistical inference. Both minicourses were well attended and will likely be offered again next January.


Nicola during the CATALST minicourse.

The SIGMAA also sponsored a Contributed Paper Session entitled Data, Modeling, and Computing in the Introductory Statistics Course. The marathon session, running from 1:00–6:00 p.m., was very well attended and included 15 presentations.


Nick Horton presents the paper Big Data in the Intro Stats Class: Use of the Airline Delays Dataset to Expose Students to a Real-World, Complex Dataset, co-authored with Ben Baumer and Hadley Wickham.

One of my favorite things at JMM is attending the SIGMAA Stat-Ed Business Meeting. This took place immediately following the CPS, so we were able to capitalize on inviting many of the attendees to join us. After eating what might have been the best spread of food I have encountered at one of these meetings, we had our meeting.

The SIGMAA presents two awards during these meetings.

The Dex Whittinghill Award is presented to the first author of the paper that receives the highest evaluations during the CPS session from the previous JMM. This year it was presented to Kari Lock Morgan of Duke University (who was unable to attend, but sent her heartfelt thanks via her parents).

The Robert V. Hogg Award for excellence in teaching introductory statistics was presented to Johanna Hardin of Pomona College. Johanna’s colleague, Gizem Karaali, gave a heartwarming talk when presenting Johanna the award.


Scott Albers, SIGMAA chair, congratulates Johanna Hardin on winning the Robert V. Hogg Award


Gizem Karaali reads a heartwarming note from Johanna’s colleagues.


References

Kelly, R. (2003). Ignition (remix). On Chocolate factory. Chicago: Jive, Sony.

R Syntax for Ranked Choice Voting

I have gotten several requests for the R syntax I used to analyze the ranked-choice voting data and create the animated GIF. Rather than just posting the syntax, I thought I might write a detailed post describing the process.

Reading in the Data

The data is available on the Twin Cities R User Group’s GitHub page. The file we are interested in is 2013-mayor-cvr.csv. Clicking this link gets you the “Display” version of the data; we actually want the “Raw” data, which is viewable by clicking View Raw. The link uses a secure connection (https://), which R does not handle well without a workaround.

One option is to use the getURL() function from the RCurl package. The text= argument in read.csv() then reads the data via a text connection, which is necessary to avoid an error.

# Download the CSV over the secure connection, then read it from the text
library(RCurl)
url = getURL("https://raw.github.com/tcrug/ranked-choice-vote-data/master/2013-mayor-cvr.csv")
vote = read.csv(text = url)
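
(For what it’s worth, newer builds of R can often read an https URL directly; if yours can, the workaround is unnecessary:)

# May work without RCurl on newer R builds that support https connections
vote = read.csv("https://raw.github.com/tcrug/ranked-choice-vote-data/master/2013-mayor-cvr.csv")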

A quick look at the data reveals that the three ranked choices for the 80,101 voters are in columns 2, 3, and 4. The values “undervote” and “overvote” are ballot issues that also need to be converted to NA (missing). The syntax below reduces the data frame to the second, third, and fourth columns and replaces “undervote” and “overvote” with NAs.

vote = vote[ , 2:4]
vote[vote == "undervote"] = NA
vote[vote == "overvote"] = NA

The syntax below is the main idea of the vote-counting algorithm. (You will need to load the ggplot2 library.) I will try to explain each line in turn.

nonMissing = which(vote[ , 1] != "")
candidates = vote[nonMissing, 1]
#print(table(candidates))

vote[ , 1] =  factor(vote[ , 1], levels = rev(names(sort(table(vote[ , 1]), decreasing=TRUE))))
mayor = levels(vote[ , 1])
candidates = vote[nonMissing, 1]

p = ggplot(data = data.frame(candidates), aes(x = factor(candidates, levels = mayor))) +
	geom_bar() +
	theme_bw() +
	ggtitle("Round 1") +
	scale_x_discrete(name = "", drop = FALSE) +
	ylab("Votes") +
	ylim(0, 40000) +
	coord_flip()

ggsave(p, file = "~/Desktop/round1.png", width = 8, height = 6)

  • Line 1: Examine the first column of the vote data frame to determine which rows are not missing.
  • Line 2: Take the candidates from the first column and put them in an object.
  • Line 3: Count the votes for each candidate (commented out; uncomment the print() to see the table).
  • Line 5: Coerce the first column into a factor (it is currently a character vector) and create the levels of that factor so that they display in reverse order based on the number of votes. This is important so that the candidates display in the same order every time the plot is created.
  • Line 6: Store the levels we just created in Line #5 in an object.
  • Line 7: Recreate the candidates object (same as Line #2), but this time as a factor so we can plot it.
  • Lines 9–16: Create the bar plot.
  • Line 18: Save the plot to your computer as a PNG file. In my case, I saved it to the desktop.

Now, we will create an object to hold the round of counting (we just plotted the first round, so the next round is Round 2). We will also coerce the first column back to characters.

j = 2
vote[ , 1] = as.character(vote[ , 1])

The next part of the syntax is looped so that it repeats the remainder of the algorithm: determine the candidate with the fewest votes, remove him/her from all columns, promote the second and third choices of anyone who voted for the removed candidate to ‘new’ first and second choices, recount, and continue.

while( any(table(candidates) >= 0.5 * length(candidates) + 1) == FALSE ){
	leastVotes = attr(sort(table(candidates))[1], "names")
	vote[vote == leastVotes] = NA
	rowNum = which(is.na(vote[ , 1]))
	vote[rowNum, 1] = vote[rowNum, 2]
	vote[rowNum, 2] = vote[rowNum, 3]
	vote[rowNum, 3] = NA
	nonMissing = which(vote[ , 1] != "")
	candidates = vote[nonMissing, 1]
	p = ggplot(data = data.frame(candidates), aes(x = factor(candidates, levels = mayor))) +
		geom_bar() +
		theme_bw() +
		ggtitle(paste("Round", j, sep =" ")) +
		scale_x_discrete(name = "", drop = FALSE) +
		ylab("Votes") +
		ylim(0, 40000) +
		coord_flip()
	ggsave(p, file = paste("~/Desktop/round", j, ".png", sep = ""), width = 8, height = 6)
	j = j + 1
	candidates = as.character(candidates)
	print(sort(table(candidates)))
	}

The while{} loop continues to iterate until the criterion for winning the election is met. Within the loop:

  • Line 2: Determines the candidate with the fewest votes.
  • Line 3: Replaces the candidate with the fewest votes with NA (missing).
  • Line 4: Stores the row numbers with an NA in column 1.
  • Line 5: Takes the second choice for the rows identified in Line #4 and stores it in column 1 (new first choice).
  • Line 6: Takes the third choice for the rows identified in Line #4 and stores it in column 2 (new second choice).
  • Line 7: Makes the third choice for the rows identified in Line #4 an NA.
  • Lines 8–18: Are equivalent to what we did before (but this time inside the while loop). The biggest difference is in the ggsave() function, where the filename is created on the fly using the object j.
  • Line 19: Increments j by 1.
  • Line 20: Coerces candidates back to a character vector (so the factor levels are rebuilt on the next iteration).
  • Line 21: Prints the current vote counts.

Creating the Animated GIF

There should now be 35 PNG files on your desktop (or wherever you saved them in the ggsave() function), named round1.png, round2.png, etc. The first thing I did was rename the files with single-digit numbers so that they were round01.png, round02.png, …, round09.png.
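
(The renaming can also be done from within R; a sketch, assuming the files are on the desktop:)

# Rename round1.png, ..., round9.png to round01.png, ..., round09.png
old = paste0("~/Desktop/round", 1:9, ".png")
new = paste0("~/Desktop/round0", 1:9, ".png")
file.rename(old, new)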

Then I opened Terminal and used ImageMagick to create the animated GIF. Note that the first line moves into the folder where I saved the PNG files (in my case, the desktop).

cd ~/Desktop
convert -delay 50 round*.png animation.gif
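
(This step could also be done from within R; a sketch using the magick package, which may not have been available when this was written, run from the folder containing the PNGs:)

# Build the animated GIF in R with the magick package
library(magick)
frames = image_read(sprintf("round%02d.png", 1:35))
animation = image_animate(frames, fps = 2)
image_write(animation, "animation.gif")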

The actual animated GIF appears on the previous Citizen Statistician post.

Ranked Choice Voting

The city of Minneapolis recently elected a new mayor. This is not newsworthy in and of itself; what is newsworthy is the method they used—ranked choice voting. Ranked choice voting is a voting method that allows voters to rank multiple candidates in order of preference. In the Minneapolis mayoral election, voters ranked up to three candidates.

The interesting part of this whole thing was that it took over two days for the election officials to declare a winner. It turns out that the official procedure for calculating the winner of the ranked-choice vote involved cutting and pasting spreadsheets in Excel.

The technology coordinator at E-Democracy, Bill Bushey, posted the challenge of writing a program to calculate the winner of a ranked-choice election to the Twin Cities Javascript and Python meetup groups. Winston Chang also posted it to the Twin Cities R Meetup group. While not a super difficult problem, it is complicated enough that it can make for a nice project—especially for new R programmers. (In fact, our student R group is doing this.)

The algorithm, described by Bill Bushey, is

  1. Create a data structure that represents a ballot with voters’ 1st, 2nd, and 3rd choices.
  2. Count up the number of 1st choice votes for each candidate. If a candidate has 50% + 1 votes, declare that candidate the winner (see the sketch after this list).
  3. Else, select the candidate with the lowest number of 1st choice votes, remove that candidate completely from the data structure, and make the 2nd choice of any voter who voted for the removed candidate the new 1st choice (and the old 3rd choice the new 2nd choice).
  4. Go to step 2.
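
A minimal R sketch of the majority check in step 2; the first_choices vector here is hypothetical (it mirrors the sample data below):

# Hypothetical vector of current 1st choice votes
first_choices = c("James", "Frank", "James", "Laura", "David",
                  "James", "Laura", "James", "David", "David")

# Does any candidate have 50% + 1 of the votes?
counts = table(first_choices)
threshold = 0.5 * length(first_choices) + 1
any(counts >= threshold)    # FALSE here: James's 4 votes fall short of 6
names(which.max(counts))    # the current leader (James)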

As an example consider the following sample data:

Voter  Choice1  Choice2  Choice3
    1    James     Fred    Frank
    2    Frank     Fred    James
    3    James    James    James
    4    Laura 
    5    David 
    6    James              Fred
    7    Laura
    8    James
    9    David    Arnie
   10    David

In this data, James has the most 1st choice votes (4), but it is not enough to win the election (a candidate needs 6 votes = 50% of the 10 votes cast + 1). So at this point we determine the least-voted-for candidate, Frank, and delete him from the entire structure:

Voter  Choice1  Choice2  Choice3
    1    James     Fred    ~~Frank~~
    2    ~~Frank~~ Fred    James
    3    James    James    James
    4    Laura 
    5    David 
    6    James              Fred
    7    Laura
    8    James
    9    David    Arnie
   10    David

Then, the 2nd choice of any voter who voted for Frank becomes the new 1st choice. In the sample data this is only Voter #2. Thus Fred becomes Voter #2’s 1st choice and James becomes Voter #2’s 2nd choice:

Voter  Choice1  Choice2  Choice3
    1    James     Fred
    2     Fred    James
    3    James    James    James
    4    Laura 
    5    David 
    6    James              Fred
    7    Laura
    8    James
    9    David    Arnie
   10    David

James still has the most 1st choice votes, but not enough to win (he still needs 6 votes!). Fred has the fewest 1st choice votes, so he is eliminated, and his voters’ 2nd and 3rd choices are moved up:

Voter  Choice1  Choice2  Choice3
    1    James
    2    James
    3    James    James    James
    4    Laura 
    5    David 
    6    James              
    7    Laura
    8    James
    9    David    Arnie
   10    David

James now has five 1st choice votes, but still not enough to win. Laura has the fewest 1st choice votes, so she is eliminated, and her voters’ 2nd and 3rd choices are moved up:

Voter  Choice1  Choice2  Choice3
    1    James
    2    James
    3    James    James    James
    4     
    5    David 
    6    James              
    7    
    8    James
    9    David    Arnie
   10    David

James retains his lead with five 1st choice votes, but now he is declared the winner. Since Voters #4 and #7 no longer have a 1st, 2nd, or 3rd choice, they no longer count toward the number of voters. Thus, to win, a candidate now needs only 5 votes = 50% of the 8 remaining votes + 1.

The actual data from Minneapolis includes over 80,000 votes for 36 different candidates. There are also ballot issues such as undervoting and overvoting, which occur when voters give multiple candidates the same ranking (overvoting) or do not select a candidate (undervoting).

The animated GIF below shows the results after each round of elimination for the Minneapolis mayoral election.

[Animated GIF: 2013 Mayoral Race]

The Minneapolis mayoral data is available on GitHub as a CSV file (along with some other smaller sample files to hone your programming algorithm). There is also a frequently asked questions webpage available from the City of Minneapolis regarding ranked choice voting.

In addition, you can listen to the Minnesota Public Radio broadcast in which the problems with the vote counting were discussed. The folks at the R Users Group meeting were featured, and Winston brought the house down when, commenting on the R program that computed the winner within a few seconds, he said, “it took me about an hour and a half to get something usable, but I was watching TV at the time.”

See the R syntax I used here.


Should Programming Count as a “Foreign Language”?

I re-hashed this blog post title from the Edutopia article, Should Coding be the “New Foreign Language” Requirement? Texas legislators just answered this question with “Yes”. I hope Minnesota doesn’t follow suit.

Now, in all fairness, I need to disclose that when I taught high school, the Math department played a practical joke on the Languages department by faking a document claiming that mathematics would be accepted as fulfilling the foreign language requirement, and then conveniently dropping the document outside the classroom door of the Spanish teacher. The result had the faculty laughing for weeks.

But I would no more have stood up for mathematics fulfilling a foreign language requirement than for computer science fulfilling the same requirement. I think a better substitution, however, would be for computer science to count toward a mathematics requirement!

The authors of the Edutopia blog write,

In terms of cognitive advantages, learning a system of signs, symbols and rules used to communicate — that is, language study — improves thinking by challenging the brain to recognize, negotiate meaning and master different language patterns. Coding does the same thing.

Substitute the word “mathematics” for “language study” in the previous paragraph and in my mind, it is an even better sell.

While I hope coding does not replace foreign language, I am glad that it is receiving its time in the spotlight. And, I hope the statistics community can use this to its advantage. This is perhaps the perfect route for building on the success of AP statistics…statistical computing. The combined sexiness (sorry Mr. Varian!) of statistics and coding would be amazing (p < .000001) and would be beneficial to both disciplines.

Warning: Mac OS 10.9 Mavericks and R Don’t Play Nicely

For some reason I was compelled to update my Mac’s OS and R on the same day. (I know…) It didn’t go well on several counts, and I mostly blame Apple. Here are the details.

  • I updated R to version 3.0.2 “Frisbee Sailing”
  • I updated my OS to 10.9 “Mavericks”

When I went to use R, things were going fine until I mistyped a command. Rather than giving some sort of syntax error, R responded with,

*** caught segfault ***
address 0x7c0, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Unlike most of my experiences with computing, this was a problem I was able to replicate many times. After a day of panic and no luck on Google, I finally found a post on one of the Google Groups from Simon Urbanek responding to someone with a similar problem. He points out that there are a couple of solutions, one of which is to wait until Apple gets things stabilized. (This is an issue, since if you have ever tried to go back to a previous OS on a Mac, you know that it might take several days of pain and swearing.)

The second solution he suggests is to install the nightly build or rebuild the GUI. To install the nightly build, visit the R for Mac OS X Developer’s page. Or, in Terminal, issue the following commands,

svn co https://svn.r-project.org/R-packages/trunk/Mac-GUI 
cd Mac-GUI 
xcodebuild -configuration Debug 
open build/Debug/R.app

I tried both, and this worked fine…until I needed to load a package. Then I was given an error that the package couldn’t be found. Now, I realize that you can download the packages you need from source and compile them yourself, but I was trying to figure out how to deal with students in a similar situation. (This is not an option for most social science students.)
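
(For those who do want to go the source route, a minimal sketch of the basic call; packages with compiled code will also need Apple’s command line developer tools:)

# Install a package from source rather than a pre-built binary
install.packages("ggplot2", type = "source")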

The best solution, it turned out, is to use RStudio, which my students pretty much all use anyway. (My problem is that I am a Sublime Text 2 user.) This allowed the newest version of R to run on the new Mac OS. But, as pointed out on the RStudio blog,

As a result of a problem between Mavericks and the user interface toolkit underlying RStudio (Qt) the RStudio IDE is very slow in painting and user interactions when running under Mavericks.

I re-downloaded the latest stable release of the R GUI about an hour ago, and so far it seems to be working fine with Mavericks (no abort message yet), so this whole post may be moot.