Fathom Returns

The other shoe has dropped. Last week (or so) TinkerPlots returned to the market, and now Fathom Version 2.2 is available as a free download (Fathom is the foundation on which TinkerPlots is built). Details are available on Bill Finzer’s website.

Fathom is one of my favorite software packages. It was the first commercially available package to be based on learning theory, and its primary goal is to teach statistics. After a one-minute introduction, beginning students can quickly discuss ‘findings’ across several variables. So many classroom exercises involve only one or two variables, and Fathom taught me that this is unfair to students and artificially holds them back.

Welcome back, Fathom!

TinkerPlots Available Again

Very exciting news for TinkerPlots users (and for those who should be TinkerPlots users). TinkerPlots is highly visual, dynamic software that lets students design and implement simulation machines, and it includes many very cool data analysis tools.

To quote from TP developer Cliff Konold:

Today we are releasing Version 2.2 of TinkerPlots.  This is a special, free version, which will expire in a year  — August 31, 2015.

To start the downloading process

Go to the TinkerPlots home page and click on the Download TinkerPlots link in the right-hand panel. You’ll fill out a form. Shortly after submitting it, you’ll get an email with a link for downloading.

Help others find the TinkerPlots Download page

If you have a website, blog, or use a social media site, please help us get the word out so others can find the new TinkerPlots Download page. You could mention that you are using TinkerPlots 2.2 and link to www.srri.umass.edu/tinkerplots.

Why is this an expiring version?

As we explained in this correspondence, until January of 2014, TinkerPlots was published and sold by Key Curriculum, a division of McGraw-Hill Education. Their decision to cease publication caught us off guard, and we have yet to come up with an alternative publishing plan. We created this special expiring version to meet the needs of users until we can get a new publishing plan in place.

What will happen after version 2.2 expires?

By August 2015, we will either have a new publisher lined up or we will create another free version. What is holding us up right now are our negotiations with the University of Massachusetts Amherst, which currently owns TinkerPlots. Once they have decided about their future involvement with TinkerPlots, we can complete our discussions with various publishing partners.

If I have versions 2.0 or 2.1 should I delete them?

No, you should keep them. You already paid for these, and they are not substantively different from version 2.2. If and when a new version of TinkerPlots is ready for sale, you may not want to pay for it.  So keep your early version that you’ve already paid for.
Cliff and Craig

Lively R

Next week, the useR! conference comes to UCLA, and in anticipation, I thought a little foreshadowing would be nice. Amelia McNamara, UCLA Stats grad student and rising stats-ed star, shared with me a new tool that has the potential to do some wonderful things. LivelyR is a work in progress that is, in the words of its creators, a “mashup of R with packages of RStudio.” The result is highly interactive. I was particularly struck and intrigued by the ‘sweeping’ function, which visually smears graphics across several parameter values. The demonstration shows how this can help students understand the effects of bin-width and offset changes on a histogram, so that a more robust sense of the sample distribution shines through.
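LivelyR itself isn’t something you can install from CRAN, but the spirit of sweeping is easy to approximate in base R: overlay histogram outlines across a grid of bin widths and offsets and watch which features of the distribution persist. Here is a rough sketch (the sample and the parameter grid are invented for illustration):

# overlay histogram outlines for a sweep of bin widths and offsets
set.seed(7)
x <- rexp(200)  # an illustrative skewed sample

plot(NULL, xlim = c(0, max(x)), ylim = c(0, 1.2), xlab = "x",
     ylab = "Density", main = "Sweeping bin width and offset")
for (width in c(0.25, 0.5, 0.75, 1)) {
  for (offset in c(0, width / 2)) {
    breaks <- seq(-offset, max(x) + width, by = width)
    h <- hist(x, breaks = breaks, plot = FALSE)
    # semi-transparent outline for this particular binning
    lines(h$mids, h$density, type = "s", col = rgb(0, 0, 1, 0.3))
  }
}

Features that survive every choice of bin width and offset are the ones worth trusting; the rest are artifacts of the binning.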

R is becoming a formidable educational tool, and I’m looking forward to learning more at useR! next week. For those of you in L.A. who can attend, Aran Lunzer will be talking about LivelyR at 4pm on Tuesday, July 1.

Blog Guilt and a Categorical Data Course

I read a blog post entitled On Not Writing, and it felt a little close to home. The author, an academic in a non-tenure-track position, writes,

If you have the luxury to have time to write, do you write scholarship with the hope of forwarding an academic career, or do you write something you might find more fun, and hope to publish it another way?*

The footnote read, “Of course, all of this writing presupposes that the stacks of papers get graded.” Ouch. Too close to home. I sent this on to some of my non-tenure-track peers, and Rob responded that I had tapped into his blog guilt. My blog guilt had already been at an all-time high, and so I vowed that I would immediately post something to Citizen Statistician. Well, that was several weeks ago, but I am finally posting.

Fall semester I taught a PhD seminar on categorical data analysis that I had proposed the previous spring. As with many first-time offerings, the amount of work was staggering, and intellectually it was wonderful. The course notes, assignments, etc. are all available at the course website (which also doubled as the syllabus).

Like so many advanced seminars, the course had very few students actually taking it for a grade, but quite a few auditors. The course projects were a blast to read and resulted in at least two pre-dissertation papers, a written prelim paper, and, so far, two articles that have been submitted to journals!

After some reflection, there are some things I will do differently when I teach this again (likely an every-other-year offering):

  • I would like to spend more time on the classification methods. Although we talked about them a little, the beginning modeling took waaaay more time than I anticipated and I need to re-think that a bit.
  • I would like to cover mixed-effects models for binary outcomes in the future. This wasn’t possible this semester, since we only had a regression course as the prerequisite. Now there is a new prerequisite that includes linear mixed-effects models with continuous outcomes, so at least students will have been exposed to those types of models (a minimal sketch of the binary version appears after this list). That prerequisite course also includes a much more in-depth introduction to likelihood, so that should also open up some time.
  • I will not teach the ordinal models in the future. Yuck. Disaster.
  • I probably won’t use the Agresti book in the future. While it is quite technical and comprehensive, it is expensive and the students did not like it for the course. I don’t know what I will use instead. Agresti will remain on a resources list.
  • The propensity score methods (PSM) were a hit with the students and those will be included again. I will also probably put together an assignment based on those.
  • I would like to add in survival analysis.
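As promised in the list above, here is a minimal sketch of a mixed-effects logistic regression in R, using lme4 and simulated data (none of the names or numbers come from the actual course materials):

library(lme4)

# simulate students nested in schools, with a school-level random intercept
set.seed(42)
n_schools  <- 30
n_students <- 20
school <- factor(rep(1:n_schools, each = n_students))
ses    <- rnorm(n_schools * n_students)  # student-level predictor
u      <- rnorm(n_schools, sd = 0.8)     # school random intercepts
passed <- rbinom(length(ses), 1, plogis(-0.5 + 0.7 * ses + u[school]))

# random intercept for school, fixed slope for SES, logit link
fit <- glmer(passed ~ ses + (1 | school), family = binomial)
summary(fit)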

There are a ton of other topics that could be cool, but with limited time they probably aren’t feasible. In general, my thought is to spend the first half of the course introducing and using the logistic and multinomial models, and the second half on advanced applications (PSM, classification, etc.).
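For concreteness, the two first-half workhorses look like this in R, again on simulated data (the variable names are made up for illustration; the multinomial fit uses nnet, which ships with R):

set.seed(1)
n <- 500
x <- rnorm(n)

# binary outcome: logistic regression via glm()
y_bin <- rbinom(n, 1, plogis(-0.2 + 1.1 * x))
logit_fit <- glm(y_bin ~ x, family = binomial)
summary(logit_fit)

# unordered three-category outcome: multinomial logistic regression
library(nnet)
lin  <- cbind(0, 0.5 * x, -0.8 * x)  # category-specific linear predictors
prob <- exp(lin) / rowSums(exp(lin))
y_cat <- factor(apply(prob, 1, function(p) sample(3, 1, prob = p)))
multi_fit <- multinom(y_cat ~ x, trace = FALSE)
summary(multi_fit)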

If anyone has any great ideas or suggestions, please leave comments. Also, I am always on the lookout for some datasets that were used in journal articles or are particularly relevant.

Data Analysis and Statistical Inference starts tomorrow on Coursera

It has been (and still is) lots of work putting this course together, but I’m incredibly excited about the opportunity to teach (and learn from) the masses! Course starts tomorrow (Feb 17, 2014) at noon EST.

[Image: Data Analysis and Statistical Inference course banner on Coursera]

A huge thanks also goes out to my student collaborators, who helped develop, review, and revise much of the course materials (and who will be taking the role of Community TAs on the course discussion forums), and to Duke’s Center for Instructional Technology, which pretty much runs the show.

This course is also part of the Reasoning, Data Analysis and Writing Specialization, along with Think Again: How to Reason and Argue and English Composition 1: Achieving Expertise. This interdisciplinary specialization is designed to strengthen students’ ability to engage with others’ ideas and communicate productively with them by analyzing their arguments, identifying the inferences they are drawing, and understanding the reasons that inform their beliefs. After taking all three courses, students complete an in-depth capstone project where they choose a controversial topic and write an article-length essay in which they use their analysis of the data to argue for their own position about that topic.

Let’s get this party started!

Warning: Mac OS 10.9 Mavericks and R Don’t Play Nicely

For some reason I was compelled to update my Mac’s OS and R on the same day. (I know…) It didn’t go well, on several counts, and I mostly blame Apple. Here are the details.

  • I updated R to version 3.0.2 “Frisbee Sailing”
  • I updated my OS to 10.9 “Mavericks”

When I went to use R, things were going fine until I mistyped a command. Rather than giving some sort of syntax error, R responded with:

*** caught segfault ***
address 0x7c0, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Unlike most of my experiences with computing problems, this one I was able to replicate many times. After a day of panic and no luck on Google, I finally found a post on one of the Google Groups from Simon Urbanek, responding to someone with a similar problem. He points out that there are a couple of solutions, one of which is to wait until Apple gets things stabilized. (This is an issue because, if you have ever tried to go back to a previous OS on a Mac, you will know that it can take several days of pain and swearing.)

The second solution he suggests is to install the nightly build, or to rebuild the GUI yourself. To install the nightly build, visit the R for Mac OS X developer’s page. Or, in Terminal, issue the following commands:

# check out the GUI source from the R project's Subversion repository
svn co https://svn.r-project.org/R-packages/trunk/Mac-GUI
cd Mac-GUI
# build a debug configuration of the GUI, then launch the resulting R.app
xcodebuild -configuration Debug
open build/Debug/R.app

I tried both, and this worked fine…until I needed to load a package. Then I was given an error that the package couldn’t be found. Now, I realize that you can download the packages you need from source and compile them yourself, but I was trying to figure out how to deal with students who were in a similar situation. (Compiling from source is not an option for most social science students.)
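For those who can compile, the standard one-liner builds a CRAN package from source against whatever R you have installed (this assumes Xcode’s command-line tools are present; the package name below is arbitrary):

# build and install a package from source rather than the pre-built binary
install.packages("foreign", type = "source")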

The best solution, it turned out, was to use RStudio, which my students pretty much all use anyway. (My problem is that I am a Sublime Text 2 user.) This allowed the newest version of R to run on the new Mac OS. But, as is pointed out on the RStudio blog,

As a result of a problem between Mavericks and the user interface toolkit underlying RStudio (Qt) the RStudio IDE is very slow in painting and user interactions when running under Mavericks.

I re-downloaded the latest stable release of the R GUI about an hour ago, and so far it seems to be working fine with Mavericks (no abort message yet), so this whole post may be moot.

Crime data and bad graphics

I’m working on the second edition of our textbook, Gould & Ryan, and was looking for some examples of bad statistical graphics. Last time, I used FBI data and created a good and a bad graphic from the data. This time, I was pleased to see that the FBI provided its own bad graphic.

[Figure: the FBI’s graph of crime counts over the last five years]

This shows a dramatic decrease in crime over the last 5 years.  (Not sure why 2012 data aren’t yet available.) Of course, this graph is only a bad graph if the purpose is to show the rate of decrease.  If you look at it simply as a table of numbers, it is not so bad.

Here’s the graph on the appropriate scale.

[Figure: the same crime data replotted with the vertical axis starting at zero]

Still, a decrease worth bragging about.  But, alas, somewhat less dramatic.
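Recreating the contrast takes only a few lines of R; the numbers below are invented stand-ins, not the FBI’s:

years  <- 2007:2011
crimes <- c(1.41, 1.39, 1.32, 1.25, 1.20)  # hypothetical counts, in millions

op <- par(mfrow = c(1, 2))
plot(years, crimes, type = "b", ylim = range(crimes),  # truncated axis: dramatic
     xlab = "Year", ylab = "Crimes (millions)", main = "Zoomed-in scale")
plot(years, crimes, type = "b", ylim = c(0, max(crimes)),  # axis from zero: modest
     xlab = "Year", ylab = "Crimes (millions)", main = "Scale from zero")
par(op)

Same data, same decline; only the axis decides whether it looks like a plunge or a drift.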

Statistics, the government shutdown, and causality

There’s a statistical meme that is making its way into pundits’ discussions (as we might politely call them) that is of interest to statistics educators. There are several variations, but the basic theme is this: because of the government shutdown, people are unable to benefit from the new drugs they receive by participating in clinical trials. The L.A. Times went so far as to publish an editorial from a gentleman who claimed that he was cured by his participation in a clinical trial.

Now if they had said that future patients are prevented from benefiting from what is learned from a clinical trial, then they’d have nailed it. Instead, they seem to be overlooking the fact that some patients will be randomized to the control group and will probably get the same treatment as if there were no trial at all. And in many trials (a majority?), the result will be that the experimental treatment has little or no effect beyond the traditional treatment. And in a very small number of cases, the experimental treatment will be found to have serious side effects. So the pundits should really be telling us that the government shutdown denies patients a small probability of benefiting from an experimental treatment.

All snarkiness aside, I think the prevalence of this meme points to the subtleties of interpreting probabilistic experiments, in which outcomes contain much variability, and so conclusions must be stated in terms of group characteristics. This came out in the SRTL discussion in Minnesota this summer, when Maxine Pfannkuch, Pip Arnold, and Stephanie Budgett of the University of Auckland presented their work leading towards a framework for describing students’ understanding of causality. I don’t remember the example they used very well, but it was similar to this (and was a real-life study): patients were randomized to receive either fish oil or vegetable oil in their diet. The goal of the study was to determine whether fish oil lowered cholesterol. At the end of the study, the fish-oil group had slightly lower average cholesterol levels. A typical interpretation was, “If I take fish oil, my cholesterol will go down.”

One problem with this interpretation is that it ignores the within-group variation. Some of the patients in the fish-oil group saw their cholesterol go up; some saw little or no change. The study’s conclusion is about group means, not about individuals. (There were other problems, too. This interpretation ignores the existence of the control group: we don’t really know whether fish oil improves cholesterol compared to your current diet; we know only that cholesterol tends to go down in comparison to a vegetable-oil diet. Also, we know the effects only for those who participated in the study. We assume they were not special people, but possibly the results won’t hold for other groups.)
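To see how far the individual-level claim can drift from the group-level conclusion, here is a toy simulation (all numbers invented, not from the Auckland study):

# simulated change in cholesterol for each patient, in each arm
set.seed(123)
n <- 100
fish <- rnorm(n, mean = -5, sd = 20)  # fish-oil arm: real average benefit
veg  <- rnorm(n, mean =  0, sd = 20)  # vegetable-oil arm: no average change

mean(fish) - mean(veg)  # the group-level difference the study reports
mean(fish > 0)          # fraction of fish-oil patients whose cholesterol ROSE

With these made-up parameters, roughly 40% of the simulated fish-oil patients see their cholesterol go up, even though the group-level effect is real.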

Understanding causality in probabilistic settings (or any setting) is a challenge for young students and even adults. I’m very excited to see such a distinguished group of researchers begin to help us understand. Judea Pearl, at UCLA, has done much to encourage statisticians to think about the importance of teaching causal inference. Recently, he helped the American Statistical Association establish the Causality in Statistics Education prize, won this year by Felix Elwert, a sociologist at the University of Wisconsin-Madison. We still have a ways to go before we understand how best to teach this topic at the undergraduate level, and even further before we understand how to teach it at earlier levels. But, as the government shutdown has shown, understanding probabilistic causality is an important component of statistical literacy.

Paint and Patch


The other day I was painting the trim on our house, and it got me reminiscing. The year was 2005. The conference was JSM. The location was Minneapolis. I had just finished my third year of graduate school and was slated to present in a Topic Contributed session at my first JSM. The topic was Implementing the GAISE Guidelines in College Statistics Courses. My presentation was entitled Using GAISE to Create a Better Introductory Statistics Course.

We had just finished doing a complete course revision for our undergraduate course based on the work we had been doing with our NSF-funded Adapting and Implementing Innovative Material in Statistics (AIMS) project. We had rewritten the entire curriculum, including all of our assessments and course activities.

The discussant for the session was Robin Lock. In his remarks about the presentations, Lock compared the restructuring of a statistics course to the remodeling of a house. He described how some teachers restructure their courses according to a plan, doing a complete teardown and rebuild. He brought the entire room to laughter as he described most teachers’ attempts, however, as “paint and patch”: fixing a few things that didn’t work quite so well, but mostly just sprucing things up.

The metaphor works. I have been thinking about this for the last eight years. Sometimes paint-and-patch is exactly what is needed. It is pretty easy and not very time-consuming. On the other hand, if the structure underneath is rotten, no amount of paint-and-patch is going to work. There are times when it is better to tear down and rebuild.

As another academic year approaches, many of us are considering the changes to be made in courses we will soon be teaching. Is it time for a rebuild? Or will just a little touch-up do the trick?