Interpreting Cause and Effect

One big challenge we all face is understanding what’s good and what’s bad for us, and it’s even harder when published research studies conflict. Thanks to Roger Peng for posting on his Facebook page an article that led me to this piece by Emily Oster: Cellphones Do Not Give You Brain Cancer, from the good folks at the 538 blog. I think this article would make a great classroom discussion, particularly if, before showing your students the article, they brainstormed several possible experimental designs and discussed the strengths and weaknesses of those designs. It is also interesting to ask why no study similar to the Danish Cohort study was done in the US. Thinking about this might lead students to consider cultural attitudes towards widespread data collection.

PD Follow-Up

Last Saturday the Mobilize project hosted a day-long professional development meeting for about 10 high school math teachers and 10 high school science teachers. As always, it was very impressive how dedicated the teachers were, but I was particularly impressed by their creativity as, again and again, they demonstrated that they were able to take our lessons and add dimensions to them that I, at least, hadn’t initially seen.

One important component of Mobilize is to teach the teachers statistical reasoning.  This is important because (a) the Mobilize content is mostly involved with using data analysis as a pathway for teaching math and science and (b) the Common Core (math) and the Next Generation (science) standards include much more statistics than previous curricula.  And yet, at least for math teachers, data analysis is not part of their education.

And so I was looking forward to seeing how the teachers performed on the “rank the airlines” Model Eliciting Activity, which was designed by the CATALYST project, led by Joan Garfield at U of Minnesota. (Unit 2, Lesson 9 from the CATALYST web site.) Model Eliciting Activities (MEAs) are a lesson design that I’m getting really excited about and am trying to integrate into more of my own lessons. Essentially, groups of students are given realistic and complex questions to answer. The key is to provide some means for the student groups to evaluate their own work, so that they can iterate and achieve increasingly improved solutions. MEAs began in the engineering-education world and have been used increasingly in mathematics at the college, high school, and middle school levels. (A good starting point is “Model-eliciting activities (MEAs) as a bridge between engineering education research and mathematics education research”, Hamilton, Lesh, Lester, Brilleslyper, 2008, Advances in Engineering Education.) I was first introduced to MEAs when I was an evaluator for the CATALYST project, but didn’t really begin to see their potential until Joan Garfield pointed it out to me while I was trying to find ways of enhancing our Mobilize curriculum.

In the MEA we presented to the teachers on Saturday, they were shown data on arrival delays from 5 airlines: 10 randomly sampled flights per airline into Chicago O’Hare from a particular year. The primary purpose of the MEA is to help participants develop informal ways of comparing groups when variability is present. In this case, the variability is present in an obvious way (different flights have different arrival delays) as well as in a less obvious way: the data set is just one possible sample from a very large population, and there is sample-to-sample variability that is invisible. That is, you cannot see it in the data set, but you might still use the data to conjecture about it.
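To make that invisible sample-to-sample variability concrete, here is a minimal R sketch (using simulated delays, not the actual MEA data set) that draws repeated samples of 10 flights from a hypothetical population and looks at how much the mean delay bounces around from sample to sample.

```r
# A minimal sketch, not the MEA data: simulate a large "population" of arrival
# delays for one airline, then draw repeated samples of 10 flights to make
# sample-to-sample variability visible.
set.seed(2014)
population <- rgamma(10000, shape = 2, scale = 10) - 5  # hypothetical delays, in minutes

sample_means <- replicate(1000, mean(sample(population, size = 10)))

# Each value is the mean delay we *could* have observed from a different
# random sample of 10 flights into O'Hare.
hist(sample_means,
     main = "Mean delay across 1000 samples of 10 flights",
     xlab = "Sample mean delay (minutes)")
```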

Before the PD I had wondered if the math and science teachers would approach the MEA differently.  Interestingly, during our debrief, one of the math teachers wondered the same thing.  I’m not sure if we saw truly meaningful differences, but here are some things we did see.

Most of the teams immediately hit on the idea of merging both airline accuracy and airline precision into their ranking, though they struggled to do so. However, only two teams presented rules that used both. Interestingly, one used precision (variability) as the primary ranking criterion and accuracy (mean arrival delay) to break ties; another group did the opposite.

At least one team ranked only on precision, but developed a different measure of precision that was more relevant to the problem at hand: the mean absolute deviation from 0 (rather than from the mean).
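For readers who want to play with these ranking rules, here is a minimal sketch in R using made-up delay data (the actual MEA data set is not reproduced here); it computes the mean delay (“accuracy”) and that team’s alternative precision measure, the mean absolute deviation from zero, and ranks the airlines by each.

```r
# A minimal sketch with made-up data, not the MEA data set.
set.seed(1)
airline <- rep(c("A", "B", "C", "D", "E"), each = 10)
delay   <- rnorm(50,
                 mean = rep(c(5, 8, 3, 10, 6), each = 10),
                 sd   = rep(c(15, 5, 20, 8, 12), each = 10))

mean_delay    <- tapply(delay, airline, mean)                       # "accuracy"
mad_from_zero <- tapply(delay, airline, function(x) mean(abs(x)))   # alternative "precision"

# Two possible rankings; smaller is better in both cases.
sort(mean_delay)
sort(mad_from_zero)
```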

One of the more interesting things that came to my attention, as a designer of curriculum, was that almost every team wrestled with what to do with outliers. This made me realize that we do a lousy job of teaching people what to do with outliers, particularly since outliers are not very rare. (One could argue whether, in fact, any of the observations in this MEA are outliers or not, but in order to engage in that argument you need a more sophisticated understanding of outliers than we develop in our students. I, myself, would not have considered any of the observations to be outliers.) For instance, I heard teams expressing concern that it wasn’t “fair” to penalize an airline that had a fairly good mean arrival time just because of one bad outlier. Other groups wondered if the bad outliers were caused by weather delays and, if so, whether it was fair to include those data at all. I was very pleased that no one proposed an outright elimination of outliers (at least within my hearing). But my concern was that they didn’t seem to have constructive ways of thinking about outliers.

The fact that teachers don’t have a way of thinking about outliers is our fault. I think this MEA did a great job of exposing the participants to a situation in which we really had to think about the effect of outliers in a context where they were not obvious data-entry errors. But I wonder how we can develop more such experiences, so that teachers and students don’t fall into procedure-based, automated thinking. (e.g. “If it is more than 1.5 times the IQR beyond the quartiles, it is an outlier and should be deleted.” I have heard/read/seen this far too often.)
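One constructive habit worth modeling (a small sketch under my own assumptions, not a prescription from the MEA) is to compute the conventional fences but treat them as flags to investigate rather than as a rule for deletion, as in this R example with hypothetical delays:

```r
# A minimal sketch with hypothetical delays: flag, don't delete.
delay <- c(-12, -5, -3, 0, 2, 4, 6, 9, 15, 78)   # one badly delayed flight

q       <- quantile(delay, c(0.25, 0.75))
fence   <- 1.5 * IQR(delay)
flagged <- delay < q[1] - fence | delay > q[2] + fence

# Keep every observation; a flag is an invitation to investigate
# (weather? data-entry error?), not a license to drop the point.
data.frame(delay, flagged)
```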

Do you have a lesson that engages students in wrestling with outliers? If so, please share!

Model Eliciting Activity: Prologue

I’m very excited/curious about tomorrow: I’m going to lead about 40 math and science teachers in a data-analysis activity, using one of the Model Eliciting Activities from the University of Minnesota Catalysts for Change Project. (One of our bloggers, Andy, was part of this project.) Specifically, we’re giving them the arrival-delay times for five different airlines into Chicago O’Hare (a random sample of 10 flights from each airline) and asking them to come up with rules for ranking the airlines from best to worst.

I’m curious to see what they come up with, particularly whether  the math teachers differ terribly from the science teachers. The math teachers are further along in our weekend professional development program than are the science teachers, and so I’m hoping they’ll identify the key characteristics of a distribution (all together: center, spread, shape; well, shape doesn’t play much of a role here) and use these to formulate their rankings. We’ve worked hard on helping them see distributions as a unit, and not a collection of individual points, and have seen big improvements in the teachers, most of whom have not taught statistics before.

The science teachers, I suspect, will be a little bit more deterministic in their reasoning, and, if true to my naive stereotype of science teachers, will try to find explanations for individual points. Since I haven’t worked as much with the science teachers, I’m curious to see if they’ll see the distribution as a whole, or instead try to do point-by-point comparisons.

When we initially started this project, we had some informal ideas that the science teachers would take more naturally to data analysis than would the math teachers. This hasn’t turned out to be entirely true. Many of the math teachers had taught statistics before, and so had some experience. Those who hadn’t, though, tended to be rather procedurally oriented. For example, they often just automatically dropped outliers from their analysis without any thought at all, just because they thought that that was the rule. (This has been a very hard habit to break.)

The math teachers also had a very rigid view of what was and was not data. The science teachers, on the other hand, had a much more flexible view of data. In a discussion about whether photos from a smart phone were data, a majority of math teachers said no and a majority of science teachers said yes. On the other hand, the science teachers tend to use data to confirm what they already know to be true, rather than use it to discover something. This isn’t such a problem with the math teachers, in part because they don’t have preconceptions about the data and so have nothing to confirm. In fact, we’ve worked hard with both the math teachers and the science teachers to help them approach a data set with questions in mind. But it’s been a challenge teaching them to phrase questions for their students in which the answers aren’t pre-determined or obvious, and which are empirically oriented. (For example: we would like them to ask something like “What activities most often led to our throwing recycling into the trash bin?” rather than “Is it wrong to throw trash into the recycling bin?” or “Do people throw trash into the recycling bin?”)

So I’ll report back soon on what happened and how it went.

Yikes…It’s Been Awhile

Apparently our last blog post was in August. Dang. Where did five months go? Blog guilt would be killing me, but I swear it was just yesterday that Mine posted.

I will give a brief review of some of the books I read this semester related to statistics. Most recently, I finished Hands-On Matrix Algebra Using R: Active and Motivated Learning with Applications. This was a fairly readable book for those looking to understand a bit of matrix algebra. The emphasis is definitely on economics, but there are some statistics examples as well. I am not sure where the “motivated learning” part comes in, but the examples are practical and the writing is pretty coherent.

The two books that I read that I am most excited about are Model Based Inference in the Life Sciences: A Primer on Evidence and The Psychology of Computer Programming. The latter, written in the 70s, explored psychological aspects of computer programming, especially in industry, and ways of increasing productivity. Weinberg (the author) stated that his purpose in the book was to study “computer programming as a human activity.” This was compelling to me on many levels, not the least of which is to better understand how students learn statistics when using software such as R.

Reading this book, along with participating in a student-led computing club in our department, has sparked an interest in reading the literature related to these ideas this spring semester (feel free to join us…maybe we will document our conversations as we go). I am very interested in how instructors choose software to teach with (see concerns raised about using R in Harwell (2014), “Not so fast my friend: The rush to R and the need for rigorous evaluation of data analysis and software in education,” Education Research Quarterly). I have also thought long and hard about not only what influences the choice of software to use in teaching (I do use R), but also about subsequent choices related to that decision (e.g., if R is adopted, which R packages will be introduced to students). All of these choices probably have some impact on student learning and also on students’ future practice (what you learn in graduate school is what you ultimately end up doing).

The Model Based Inference book was a shorter, readable version of Burnham and Anderson’s (2003) Springer volume on multimodel inference and information theory. I was introduced to these ideas when I taught out of Jeff Long’s Longitudinal Data Analysis for the Behavioral Sciences Using R. They remained with me for several years, and after reading Anderson’s book, I am going to teach some of these ideas in our advanced methods course this spring.

Anyway…just some short thoughts to leave you with. Happy Holidays.

Notes and thoughts from JSM 2014: Student projects utilizing student-generated data

Another August, another JSM… This time we’re in Boston, in yet another huge and cold conference center. Even on the first (half) day the conference schedule was packed, and I found myself running between sessions to make the most of it all. This post is on the first session I caught, The statistical classroom: student projects utilizing student-generated data, where I listened to the first three talks before heading off to catch the tail end of another session (I’ll talk about that in another post).

Samuel Wilcock (Messiah College) talked about how, while IRBs are not required for data collected by students for class projects, the discussion of the ethics of data collection is still necessary. While IRBs are cumbersome, Wilcock suggests that as statistics teachers we ought to be aware of the process of real research and educate our students about it. Next year he plans to have all of his students go through the IRB process and training, regardless of whether they choose to collect their own data or use existing data (mostly off the web). Wilcock mentioned that, over the years, he moved from thinking that the IRB process is scary to thinking that it’s an important part of being a stats educator. I like this idea of discussing in the introductory statistics course issues surrounding data ethics and IRBs (in a little more depth than I do now), though I’m not sure about requiring all 120 students in my intro course to go through the IRB process just yet. I hope to hear an update on this experiment next year to see how it went.

Next, Shannon McClintock (Emory University) talked about a project inspired by being involved with the honor council of her university, when she realized that while the council keeps impeccable records of reported cases, they don’t have any information on cases that are not reported. So the idea of collecting student data on academic misconduct was born. A survey was designed, with input from the honor council, and Shannon’s students in her large (n > 200) introductory statistics course took the survey early in the semester. The survey contains 46 questions which are used to generate 132 variables, providing ample opportunity for data cleaning, new variable creation (for example, thinking about how to code “any” academic misconduct based on various questions that ask whether a student has committed one type of misconduct or another), as well as thinking about discrepant responses. These are all important aspects of working with real data that students who are only exposed to clean textbook data may not get a chance to practice. It’s my experience that students love working with data relevant to them (or, even better, about them), and data on personal or confidential information, so this dataset seems to hit both of those notes.
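As an illustration of that variable-creation step, here is a minimal R sketch with made-up column names (the actual survey items are not reproduced here), coding an “any misconduct” indicator from several yes/no items:

```r
# A minimal sketch with hypothetical survey items, not the actual Emory data.
survey <- data.frame(
  copied_homework = c("Yes", "No", "No", "Yes"),
  cheated_on_exam = c("No", "No", "Yes", NA),
  plagiarized     = c("No", "No", "No", "No")
)

items <- c("copied_homework", "cheated_on_exam", "plagiarized")
survey$any_misconduct <- apply(survey[items] == "Yes", 1, any, na.rm = TRUE)

# Note the judgment call hiding in na.rm = TRUE: a respondent with only missing
# answers is coded FALSE. Deciding how to handle missing or discrepant responses
# is exactly the kind of decision the project asks students to make and defend.
survey
```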

Using data from the survey, students were asked to analyze two academic outcomes: whether or not a student has committed any form of academic misconduct, and an outcome of their own choosing, and they presented their findings in an optional (some form of extra credit) research paper. One example that Shannon gave for the latter task was defining a “serious offender”: is it a student who commits a single serious offense or a student who habitually commits (maybe not so serious) misconduct? I especially like tasks like this where students first need to come up with their own question (informed by the data) and then use the same data to analyze it. As part of traditional hypothesis testing we always tell students that the hypotheses should not be driven by the data, but reminding them that research questions can indeed be driven by data is important.

As a parting comment, Shannon mentioned that the administration at her school was concerned that students finding out about high rates of academic misconduct (the survey showed that about 60% of students had committed a “major” academic offense) might make them think that it’s OK, or maybe even necessary, to commit academic misconduct to be more successful.

For those considering the feasibility of implementing a project like this, students reported spending on average 20 hours on the project over the course of a semester. This reminded me that I should really start collecting data on how much time my students spend on the two projects they work on in my course — it’s pretty useful information to share with future students as well as with colleagues.

The last talk I caught in this session was by Mary Gray and Emmanuel Addo (American University) on a project where students conducted an exit poll asking voters whether they encountered difficulty in voting, due to voter ID restrictions or for other reasons. They’re looking to expand this project to states beyond Virginia, so if you’re interested in running a similar project at your school you can contact Emmanuel at addo@american.edu. They’re especially looking for participation from states with particularly strict voter ID laws, like Ohio. While it looks like lots of work (though the presenters assured us that it’s not), projects like these can remind students that data and statistics can be powerful activism tools.

Fathom Returns

The other shoe has fallen. Last week (or so) Tinkerplots returned to the market, and now Fathom Version 2.2 (which is the foundation on which Tinkerplots is built) is available for a free download. Details are available on Bill Finzer’s website.

Fathom is one of my favorite software packages…as the first commercially available package to be based on learning theory, Fathom’s primary goal is to teach statistics. After a one-minute introduction, beginning students can quickly discuss ‘findings’ across several variables. So many classroom exercises involve only one or two variables, and Fathom taught me that this is unfair to students and artificially holds them back.

Welcome back, Fathom!

Tinkerplots Available Again

Very exciting news for Tinkerplots users (and for those who should be Tinkerplots users). Tinkerplots is highly visual, dynamic software that lets students design and implement simulation machines, and it includes many very cool data analysis tools.

To quote from TP developer Cliff Konold:

Today we are releasing Version 2.2 of TinkerPlots.  This is a special, free version, which will expire in a year  — August 31, 2015.

To start the downloading process

Go to the TinkerPlots home page and click on the Download TinkerPlots link in the right hand panel. You’ll fill out a form. Shortly after submitting it, you’ll get an email with a link for downloading.

Help others find the TinkerPlots Download page

If you have a website, blog, or use a social media site, please help us get the word out so others can find the new TinkerPlots Download page. You could mention that you are using TinkerPlots 2.2 and link to www.srri.umass.edu/tinkerplots.

Why is this an expiring version?

As we explained in this correspondence, until January of 2014, TinkerPlots was published and sold by Key Curriculum, a division of McGraw Hill Education. Their decision to cease publication caught us off guard, and we have yet to come up with an alternative publishing plan. We created this special expiring version to meet the needs of users until we can get a new publishing plan in place.

What will happen after version 2.2 expires?

By August 2015, we will either have a new publisher lined up, or we will create another free version.  What is holding us up right now is our negotiations with the University of Massachusetts Amherst, who currently owns TinkerPlots.  Once they have decided about their future involvement with TinkerPlots, we can complete our discussions with various publishing partners.

If I have versions 2.0 or 2.1 should I delete them?

No, you should keep them. You already paid for these, and they are not substantively different from version 2.2. If and when a new version of TinkerPlots is ready for sale, you may not want to pay for it.  So keep your early version that you’ve already paid for.
Cliff and Craig

Lively R

Next week, the UseR conference comes to UCLA. In anticipation, I thought a little foreshadowing would be nice. Amelia McNamara, UCLA Stats grad student and rising stats ed star, shared with me a new tool that has the potential to do some wonderful things. LivelyR is a work-in-progress that is, in the words of its creators, a “mashup of R with packages of Rstudio.” The result is highly interactive. I was particularly struck and intrigued by the ‘sweeping’ function, which visually smears graphics across several parameter values. The demonstration shows how this can help one understand the effects of bin-width and offset changes on a histogram, so that a more robust sense of the sample distribution shines through.
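To get a flavor of the idea (this is plain base R, not the LivelyR tool itself), here is a minimal sketch that redraws a histogram of a hypothetical sample across several bin widths and offsets, so you can see which features of the distribution persist across every choice:

```r
# A minimal sketch in base R, not LivelyR: vary bin width and offset and
# see which features of the sample distribution survive every choice.
set.seed(42)
x <- c(rnorm(150, mean = 0), rnorm(50, mean = 4))   # hypothetical sample

binwidths <- c(0.5, 1, 2)
offsets   <- c(0, 0.25)

op <- par(mfrow = c(length(binwidths), length(offsets)), mar = c(3, 3, 2, 1))
for (bw in binwidths) {
  for (off in offsets) {
    breaks <- seq(min(x) - bw + off, max(x) + bw + off, by = bw)
    hist(x, breaks = breaks, xlab = "", ylab = "",
         main = sprintf("binwidth = %.2g, offset = %.2g", bw, off))
  }
}
par(op)
```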

R is becoming a formidable educational tool, and I’m looking forward to learning more at UseR next week. For those of you in L.A. who can attend, Aron Lunzer will be talking about LivelyR at 4pm on Tuesday, July 1.

Blog Guilt and a Categorical Data Course

I read a blog post entitled On Not Writing and it felt a little close to home. The author, an academic who is in a non-tenure position, writes,

If you have the luxury to have time to write, do you write scholarship with the hope of forwarding an academic career, or do you write something you might find more fun, and hope to publish it another way?*

The footnote read, “Of course, all of this writing presupposes that the stacks of papers get graded.” Ouch. Too close to home. I sent this on to some of my non-tenure-track peers and Rob responded that I had tapped into his blog guilt. My blog guilt had already been at an all-time high, and so I vowed that I would immediately post something to Citizen Statistician. Well, that was several weeks ago, but I am finally posting.

Fall semester I taught a PhD seminar on categorical data analysis that I had proposed the previous spring. As with many first-time offerings, the amount of work was staggering, but intellectually it was wonderful. The course notes, assignments, etc. are all available at the course website (which also doubled as the syllabus).

The course, like so many advanced seminars, had very few students actually take the course for a grade, but had quite a few auditors. The course projects were a blast to read and resulted in at least two pre-dissertation papers, a written prelim paper, and so far, two articles that have been submitted to journals!

After some reflection, there are some things I will do differently when I teach this again (likely an every-other-year offering):

  • I would like to spend more time on the classification methods. Although we talked about them a little, the beginning modeling took waaaay more time than I anticipated and I need to re-think that a bit.
  • I would like to cover mixed-effects models for binary outcomes in the future. This wasn’t possible this semester since we only had a regression course as the pre-requisite. Now, there is a new pre-requisite which includes linear mixed-effects models with continuous outcomes, so at least students will have been exposed to those types of models. This course also includes a much more in-depth introduction to likelihood, so that should also open up some time.
  • I will not teach the ordinal models in the future. Yuck. Disaster.
  • I probably won’t use the Agresti book in the future. While it is quite technical and comprehensive, it is expensive and the students did not like it for the course. I don’t know what I will use instead. Agresti will remain on a resources list.
  • The propensity score methods (PSM) were a hit with the students and those will be included again. I will also probably put together an assignment based on those.
  • I would like to add in survival analysis.

There are a ton of other topics that could be cool, but with limited time they probably aren’t feasible. In general, my thought was to spend the first half of the course introducing and using the logistic and multinomial models and the second half on advanced applications (PSM, classification, etc.).

If anyone has any great ideas or suggestions, please leave comments. Also, I am always on the lookout for some datasets that were used in journal articles or are particularly relevant.