Are computers needed to teach Data Science?

One of the many nice things about summer is the time and space it allows for blogging. And, after a very stimulating SRTL conference (Statistical Reasoning, Thinking and Literacy) in Rotorua, New Zealand, there’s lots to blog about.

Let’s begin with a provocative posting by fellow SRTL-er Tim Erickson at his excellent blog A Best Case Scenario. I’ve known Tim for quite a while, and have enjoyed many interesting and challenging discussions. Tim is a creator of curricula par excellence, and has first-hand experience in what inspires and motivates students to think deeply about statistics.

The central question here is: Is computation (on a computer) necessary for learning data science? The learners here are beginners in K-12. Tim answers no, and I answer, tentatively, yes. Tim portrays me in his blog as being a bit more steadfast on this position than I really am. In truth the answer is, some; maybe; a little; I don’t know.

My own experience in the topic comes from the Mobilize project, in which we developed the course Introduction to Data Science for students in the Los Angeles Unified School District. (I’m pleased to say that the course is expanding. This summer, five new L.A.-area school districts will begin training teachers to teach this course.)

The course relies heavily on R via RStudio. Students begin by studying the structure of data, learning to identify cases and variables and to organize unstructured data into a “tidy” format. Next, they learn to “read” tidy data files into RStudio. The course ends with students learning some predictive modeling using Classification and Regression Trees. In between, they study some inference using randomization-based methods.

To be precise, the students don’t learn straight-up R. They work within a package developed by the Mobilize team (primarily James Molyneux, Amelia McNamara, Steve Nolen, Jeroen Ooms, and Hongsuda Tangmunarunkit) called mobilizR, which is based pretty heavily on the mosaic package developed by Randall Pruim, Danny Kaplan and Nick Horton.  The idea with these packages is to provide beginners to R with a unified syntax and a set of verbs that relate more directly to the analysts’ goals. The basic structure for (almost) all commands is

WhatIWantToDo(yvariable~xvariables, dataset)

For example, to see the average walking distance recorded by a fitbit by day of the week:

 > mean(Distance~DOW,data=fitbitdec)
    Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
  1.900000  3.690000  2.020909  2.419091  1.432727  3.378182  3.644545

The idea is to provide students with a simplified syntax that “bridges the gap” between beginners of R and more advanced users. Hopefully, this frees up some of the cognitive load required to remember and employ R commands so that students can think strategically and statistically about problems they are trying to solve.
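For readers who want to see what the formula idea looks like without any add-on package, here is a minimal sketch in base R. The data frame `fitbit_demo` and its values are made up for illustration (they are not the fitbit data behind the output above), and `aggregate()` and `tapply()` stand in for the package's own `mean()`.

```r
# Hypothetical stand-in for the fitbit data; these values are invented
# for illustration and are not the data behind the output above.
fitbit_demo <- data.frame(
  DOW      = c("Monday", "Monday", "Friday", "Friday", "Sunday"),
  Distance = c(3.5, 4.1, 1.8, 2.0, 2.4)
)

# Base R also accepts the formula: "Distance, broken down by DOW".
by_day <- aggregate(Distance ~ DOW, data = fitbit_demo, FUN = mean)
print(by_day)

# The same summary as a named vector, closer in shape to the output above.
day_means <- tapply(fitbit_demo$Distance, fitbit_demo$DOW, mean)
print(day_means)
```

The formula carries the same meaning in both styles; what the unified syntax buys beginners is that the verb they already know (`mean`) is the one they type.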

The “bridge the gap” terminology comes from Amelia McNamara, who used the term in her PhD dissertation. One of the many really useful ideas Amelia has given us is the notion that the gap needs to be bridged. Much of “traditional” statistics education holds to the idea that statistical concepts are primarily mathematical, and, for most people, it is sufficient to learn enough of the mathematical concepts so that they can react skeptically and critically to others’ analyses. What is exciting about data science in education is that students can do their own analyses. And if students are analyzing data and discovering on their own (instead of just trying to understand others’ findings), then we need to teach them to use software in such a way that they can transition to more professional practices.

And now, dear readers, we get to the heart of the matter. That gap is really hard to bridge. One reason is that we know little to nothing about the terrain. How do students learn coding when applied to data analysis? How does the technology they use mediate that experience? How can it enhance, rather than inhibit, understanding of statistical concepts and the ability to do data analysis intelligently?

In other words, what’s the learning trajectory?

Tim rightly points to CODAP, the Common Online Data Analysis Platform, as one tool that might help bridge the gap by providing students with some powerful data manipulation techniques. And I recently learned about, which seems another attempt to help bridge the gap. But Amelia’s point is that it is not enough to give students the ability to do something; you have to give it to them so that they are prepared to learn the next step. And if the end-point of a statistics education involves coding, then those intermediate steps need to be developing students’ coding skills, as well as their statistical thinking. It’s not sufficient to help students learn statistics. They must simultaneously learn computation.

So how do we get there? One important initial step, I believe, is to really examine what the term “computational thinking” means when we apply it to data analysis. And that will be the subject of an upcoming summer blog.

Theaster Gates, W.E.B. Du Bois, and Statistical Graphics

After reading this review of a Theaster Gates show at Regen Projects, in L.A., I hurried to see the show before it closed. Inspired by sociologist and civil rights activist W.E.B. Du Bois, Gates created artistic interpretations of statistical graphics that Du Bois had produced for an exhibition in Paris in 1900. Coincidentally, I had just heard about these graphics the previous week at the Data Science Education Technology conference while eavesdropping on a conversation Andy Zieffler was having with someone else. What a pleasant surprise, then, when I learned, almost as soon as I got home, about this exhibit.

I’m no art critic (but I know what I like), and I found these works to be beautiful, simple, and powerful. What startled me, when I looked for the Du Bois originals, was how little Gates had changed the graphics. Here’s one work (I apologize for not knowing the title. That’s the difference between an occasional blogger and a journalist.) It hints of Mondrian, and the geometry intrigues. Up close, the colors are rich and textured.

Here’s Du Bois’s circa-1900 mosaic-type plot (from, which provides a nice overview of the exhibit for which Du Bois created his innovative graphics)

The title is “Negro business men in the United States”. The large yellow square is “Grocers”, the blue square is “Undertakers”, and the green square below it is “Publishers”. More are available at the Library of Congress.

Here’s another pair.  The Gates version raised many questions for me.  Why were the bars irregularly sized? What was the organizing principle behind the original? Were the categories sorted in an increasing order, and Gates added some irregularities for visual interest?  What variables are on the axes?

The answer is, no, Gates did not vary the lengths of the bars, only the color.

The vertical axis displays dates, ranging from 1874 to 1899 (just 1 year before Du Bois put the graphics together from a wide variety of sources).  The horizontal axis is acres of land, with values from 334,000 to 1.1 million.

Using data to support civil rights has a long history. A colleague once remarked that there was a great unwritten book about the role that data and statistical analysis played (and continue to play) in the gay civil rights movement (and perhaps it has been written?). And the folks at We Quant LA have a nice article demonstrating some of the difficulties in using open data to ask questions about racial profiling by the LAPD. In this day and age of alternative facts and fake news, it’s wise to be careful and precise about what we can and cannot learn from data. And it is encouraging to see the role that art can play in keeping this dialogue alive.

Slack for managing course TAs

I meant to write this post last year when I was teaching a large course with lots of teaching assistants to manage, but, well, I was teaching a large course with lots of teaching assistants to manage, so I ran out of time…

There is nothing all that revolutionary here. People have been using Slack to manage teams for a while now. I’ve even come across some articles / posts on using Slack as a course discussion forum, so use of Slack in an educational setting is not all that new either. But I have not heard of people using Slack for organizing the course and managing TAs, so I figured it might be worthwhile to write about my experience.

TL;DR: A+, would do it again!

I’ll be honest, when I first found out about Slack, I wasn’t all that impressed. First, I kept thinking it’s called Slacker, and I was like, “hey, I’m no slacker!” (I totally am…). Second, I initially thought one had to use Slack in the browser, and accidentally kept closing the tab and hence missing messages. There is a Slack app that you can run on your computer or phone, but it took me a while to realize that. Because of my rocky start with it, I didn’t think to use Slack in my teaching. I must credit my co-instructor, Anthea Monod, for the idea of using Slack for communicating with our TAs.

Between the two instructors we had 12 TAs to manage. We set up a Slack team for the course with channels like #labs, #problem_sets, #office_hours, #meetings, etc.

This setup worked really well for us for a variety of reasons:

  • Keep course management related emails out of email inbox: These really add up. At this point, any email I can keep out of my inbox is a win in my book!
  • Easily keep all TAs in the loop: Need to announce a typo in a solution key? Or give TAs a heads up about questions they might expect in office hours? I used to handle these by emailing them all, and either I’d miss one or two or a TA responding to my email would forget to reply all (people never seem to reply all when they should, but they always do when they shouldn’t!)
  • Provide a space for TAs to easily communicate with each other: Our TAs used Slack to let others know they might need someone to cover for them for office hours, or teaching a section, etc. It was nice to be able to alert all of them at once, and also for everyone to see when someone responded saying they’re available to cover.
  • Keep a record of decisions made in an easily searchable space: Slack’s search is not great, but it’s better than my email’s for sure. Plus, since you’re searching only within that team’s communication, as opposed to through all your emails, it’s a lot easier to find what you’re looking for.
  • It’s fun: The #random channel was a place people shared funny tidbits or cool blog posts etc. I doubt the TAs would be emailing each other with these if this communication channel wasn’t there. It made them act more like a community than they would otherwise.
  • It’s free: At least for a reasonable amount of usage for a semester long course.

Some words of advice if you decide to use Slack for managing your own course:

  • There is a start-up cost: Not cost as in $$, but cost as in time… At the beginning of the semester you’ll need to make sure everyone gets in the team and sets up Slack on their devices. We did this during our first meeting, which was a lot more efficient than emailing reminders.
  • It takes time for people to break their emailing habits: For the first couple weeks TAs would still email me their questions instead of using Slack. It took some time and nudging, but eventually everyone shifted all course related communication to Slack.

If you’re teaching a course with TAs this semester, especially a large one with many people to manage, I strongly recommend giving Slack a try.

A timely first day of class example for Fall 2016: Trump Tweets

On the first day of an intro stats or intro data science course I enjoy giving some accessible real data examples, instead of spending the whole time going over the syllabus (which is necessary in my opinion, but somewhat boring nonetheless).

One of my favorite examples is How to Tell Someone’s Age When All You Know Is Her Name from FiveThirtyEight. As an added bonus, you can use this example to get to know some students’ names. I usually go through a few of the visualizations in this article, asking students to raise their hands if their name appears in the visualization. Sometimes I also supplement this with the Baby Name Voyager. It’s fun to have students offer up their names so we can take a look at how their popularity has changed over the years.


Another example I like is the Locals and Tourists Flickr Photos. If I remember correctly I saw this example first in Mark Hanson’s class in grad school. These maps use data from geotags on Flickr: blue pictures are taken by locals, red pictures are by tourists, and yellow pictures might be by either. The map of Manhattan is one most students will recognize, since many people know where Times Square and Central Park are, both of which have an abundance of red (tourist) pictures. And if your students watch enough Law & Order, they might also know where Rikers Island is and recognize that, unsurprisingly, no pictures are posted from that location.

However, if I were teaching a class this coming Fall, I would add the following analysis of Donald Trump’s tweets to my list of examples. If you have not yet seen this analysis by David Robinson, I recommend you stop what you’re doing now and go read it. It’s linked below:

Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half

I’m not going to re-iterate the post here, but the gist of it is that the @realDonaldTrump account tweets from two different phones, and that

the Android and iPhone tweets are clearly from different people, posting during different times of day and using hashtags, links, and retweets in distinct ways. What’s more, we can see that the Android tweets are angrier and more negative, while the iPhone tweets tend to be benign announcements and pictures.


I think this post would be a fantastic and timely first day of class example for a stats / data analysis / data science course. It shows a pretty easy to follow analysis complete with the R code to reproduce it. It uses some sentiment analysis techniques that may not be the focus of an intro course, but since the context will be familiar to students it shouldn’t be too confusing for them. It also features techniques one will likely cover in an intro course, like confidence intervals.

As a bonus, many popular media outlets have covered the analysis in the last few days (e.g. see here, here, and here), and some of those articles might be even easier on the students to begin with before delving into the analysis in the blog post. Personally, I would start by playing this clip from the CTV News Channel featuring an interview with David to provide the context first (a video always helps wake students up), and then move on to discussing some of the visualizations from the blog post.

Michael Phelps’ hickies

Ok, they’re not hickies, but NPR referred to them as such, so I’m going with it… I’m talking about the cupping marks.

The NPR story can be heard (or read) here. There were two points made in this story that I think would be useful and fun to discuss in a stats course.

The first is the placebo effect. Oftentimes in intro stats courses the placebo effect is mentioned as something undesirable that must be controlled for. This is true, but in this case the “placebo effect from cupping could work to reduce pain with or without an underlying physical benefit”. While there isn’t sufficient scientific evidence for the positive physical effect of cupping, the placebo effect might be just enough to give the edge to an individual Olympian to outperform others by a small margin.

This brings me to my second point, the individual effect on extreme cases vs. a statistically significant effect on a population parameter. I briefly did a search on Google scholar for studies on the effectiveness of cupping and most use t-tests or ANOVAs to evaluate the effect on some average pain / severity of symptom score. If we can assume no adverse effect from cupping, might it still make sense for an individual to give the treatment a try even if the treatment has not been shown to statistically significantly improve average pain? I think this would be an interesting, and timely, question to discuss in class when introducing a method like the t-test. Often in tests of significance on a mean the variance of a treatment effect is viewed as a nuisance factor that is only useful for figuring out the variability of the sampling distribution of the mean, but in this case the variance of the treatment effect on individuals might also be of interest.
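To make the distinction concrete for a class, here is a minimal sketch in R. The `change` values are invented for illustration (they do not come from any cupping study), and the cutoff of 2 points for a meaningful improvement is likewise an assumption.

```r
# Hypothetical per-person changes in a pain score (negative = improvement).
# These numbers are invented for illustration only.
change <- c(-3, -2, -1, 0, 0, 1, 2, 3)

# A one-sample t-test asks only whether the *average* change differs from 0.
avg_test <- t.test(change, mu = 0)
p_value <- avg_test$p.value

# The spread of individual responses is a separate question.
spread <- sd(change)
share_improved <- mean(change <= -2)  # share who improved by 2+ points

print(c(p_value = p_value, sd = spread, share_improved = share_improved))
```

In this toy data the average change is exactly zero, so the test finds nothing, yet a quarter of the individuals improved by two points or more. Whether that pattern reflects real individual differences or just noise is exactly the question worth putting to students.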

While my brief search didn’t result in any datasets on cupping, the following articles contain some summary statistics or citations to studies that report such statistics that one could bring into the classroom:

PS: I wanted to include a picture of these cupping marks on Michael Phelps, but I couldn’t easily find an image that was free to use or share. You can see a picture here.

PPS: Holy small sample sizes in some of the studies I came across!

How do Readers Perceive the Results of a Data Analysis?

As a statistician who often needs to explain methods and results of analyses to non-statisticians, I have been receptive to the influx of literature related to the use of storytelling or a data narrative. (I am also aware of the backlash related to use of the word “storytelling” in regards to scientific analysis, although I am less concerned about this than, say, these scholars.) As a teacher of data analysis, the use of narrative is especially poignant in that it ties the analyses performed intrinsically to the data context—or at the very least, to a logical flow of methods used.

I recently read an article posted on Brain Pickings about the psychology behind great stories. In the article, the author, Maria Popova,  cites Jerome Bruner’s (a pioneer of cognitive psychology) essay “Two Modes of Thought”:

There are two modes of cognitive functioning, two modes of thought, each providing distinctive ways of ordering experience, of constructing reality. The two (though complementary) are irreducible to one another. Efforts to reduce one mode to the other or to ignore one at the expense of the other inevitably fail to capture the rich diversity of thought.

Each of the ways of knowing, moreover, has operating principles of its own and its own criteria of well-formedness. They differ radically in their procedures for verification. A good story and a well-formed argument are different natural kinds. Both can be used as means for convincing another. Yet what they convince of is fundamentally different: arguments convince one of their truth, stories of their lifelikeness. The one verifies by eventual appeal to procedures for establishing formal and empirical proof. The other establishes not truth but verisimilitude.

The essence of his essay is that, as Popova states, “a story (allegedly true or allegedly fictional) is judged for its goodness as a story by criteria that are of a different kind from those used to judge a logical argument as adequate or correct.” [highlighting is mine]

What type of implications does this have on a data narrative, where, in principle, both criteria are being judged? Does one outweigh the other in a reader’s judgment? How does this affect reviewers when they are making decisions about publication?

My sense is that psychologically, the judgment of two differing sets of criteria will lead most humans to judge one as being more salient than the other. Presumably, most scientists would want the logical argument (or as Bruner calls it, the logico-scientific argument) to prevail in this case. However, I think it is the story that most readers, even those with scientific backgrounds, will tend to remember.

As for reviewers, again, the presumption is that they will evaluate a paper’s merit on its scientific evidence. But, as any reviewer can tell you, the writing and narrative presenting that evidence is in some ways as important for that evidence to be believable. This is why great courtroom attorneys  spend just as much time on developing the story around a case as marshaling a logical argument they will use to entice a jury.

There is little guidance for statisticians, especially nascent statisticians, about what the “right” balance of narrative and logico-scientific argument should be when writing up a data analysis. In fact, many of us (including myself) do not often consider the impact of a reader’s weighing of these different types of evidence. My training in graduate school was more focused on the latter type of writing, and it was only in undergraduate writing courses and through reading other authors’ thoughts about writing that the former became relevant to me. Ultimately there may not be much to do about how readers perceive our work. Perhaps it is as Sylvia Plath wrote about poetry: once a poem is made available to the public, the right of interpretation belongs to the reader.

“Mail merge” with RMarkdown

The term “mail merge” might not be familiar to those who have not worked in an office setting, but here is the Wikipedia definition:

Mail merge is a software operation describing the production of multiple (and potentially large numbers of) documents from a single template form and a structured data source. The letter may be sent out to many “recipients” with small changes, such as a change of address or a change in the greeting line.


The other day I was working on creating personalized handouts for a workshop. That is, each handout contained some standard text (including some R code) and some fields that were personalized for each participant (login information for our RStudio server). I wanted to do this in RMarkdown so that the R code on the handout could be formatted nicely. Googling “rmarkdown mail merge” didn’t yield much (that’s why I’m posting this), but I finally came across this tutorial which called the process “iterative reporting”.

Turns out this is a pretty straightforward task. Below is a very simple minimum working example. You can obviously make your markdown document a lot more complicated. I’m thinking holiday cards made in R…

All relevant files for this example can also be found here.

Input data: meeting_times.csv

This is a 20 x 2 csv file; an excerpt is shown below. I got the names from here.

name               meeting_time
Peggy Kallas       9:00 AM
Ezra Zanders       9:15 AM
Hope Mogan         9:30 AM
Nathanael Scully   9:45 AM
Mayra Cowley       10:00 AM
Ethelene Oglesbee  10:15 AM

R script: mail_merge_script.R

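## Packages
library(rmarkdown)

## Data
personalized_info <- read.csv(file = "meeting_times.csv")

## Loop: render one PDF handout per row of the data.
## By default render() evaluates the Rmd in the calling environment,
## so the Rmd file can see the loop index i.
for (i in 1:nrow(personalized_info)){
 rmarkdown::render(input = "mail_merge_handout.Rmd",
 output_format = "pdf_document",
 output_file = paste("handout_", i, ".pdf", sep=''),
 output_dir = "handouts/")
}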

RMarkdown: mail_merge_handout.Rmd

---
output: pdf_document
---

```{r echo=FALSE}
personalized_info <- read.csv("meeting_times.csv", stringsAsFactors = FALSE)
name <- personalized_info$name[i]
time <- personalized_info$meeting_time[i]
```

Dear `r name`,

Your meeting time is `r time`.

See you then!

Save the Rmd file and the R script in the same folder (or specify the path to the Rmd file accordingly in the R script), and then run the R script. This will call the Rmd file within the loop and output 20 PDF files to the handouts directory. The files share the same text, with the name and meeting time fields being different in each one.

If you prefer HTML or Word output, you can specify this in the output_format argument in the R script.
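If the moving parts feel opaque, the template-plus-data idea can also be seen without RMarkdown at all. The sketch below is a stripped-down illustration in base R, not the method used above: the two-row data frame is typed inline as a stand-in for meeting_times.csv, and the names `template` and `letters_out` are made up for this example.

```r
# Hypothetical two-row stand-in for meeting_times.csv, typed inline so
# the sketch runs without any files.
personalized_info <- data.frame(
  name         = c("Peggy Kallas", "Ezra Zanders"),
  meeting_time = c("9:00 AM", "9:15 AM"),
  stringsAsFactors = FALSE
)

# One shared template; each %s marks a personalized field.
template <- "Dear %s,\n\nYour meeting time is %s.\n\nSee you then!"

# sprintf() is vectorized, so the template is filled once per row.
letters_out <- sprintf(template,
                       personalized_info$name,
                       personalized_info$meeting_time)

# Show the first personalized letter.
cat(letters_out[1], "\n")
```

The RMarkdown version does the same thing, with the Rmd file playing the role of the template and `render()` producing a formatted document instead of a character string.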

Quantitatively Thinking

John Oliver said it best: April 15 combines Americans’ two most-hated things: taxes and math. I’ve been thinking about the latter recently after hearing a fascinating talk last weekend about quantitative literacy.

QL is meant to describe our ability to think with, and about, numbers. QL doesn’t include high-level math skills, but usually is meant to describe our ability to understand percentages and proportions and basic mathematical operations. This is a really important type of literacy, of course, but I fear that the QL movement could benefit from merging QL with SL–Statistical Literacy.

No surprise, that, coming from this blog.  But let me tell you why.  The speaker began by saying that many Americans can’t figure out, given the amount of gas in their tank, how many miles they have to drive before they run out of gas.

This dumbfounded me. If it were literally true, you’d see stalled cars every few blocks in Los Angeles. (Now we see them only every 3 or 4 miles.) But I also thought, wait, do I know how far I can drive before I run out of gas? My gas gauge says I have half a tank left, and I think (but am not certain) that my tank holds 16 gallons. That means I probably have 8 gallons left. I can see I’ve driven about 200 miles since I last filled up because I remembered to hit that little mileage reset button that keeps track of such things. And so I’m averaging 25 mpg. But I’m also planning a trip to San Diego in the next couple of days, and then I’ll be driving on the highway, and so my mileage will improve. And that 25 mpg is just an average, and averages have variability, but I don’t really have a sense of the variability of that mean. And this problem requires that I know my mpg in the future, and, well, of all the things you can predict, the future is the hardest. And so, I’m left to conclude that I don’t really know when my car will run out of gas.

Now while I don’t know the exact number of miles I can drive, I can estimate the value.  With a little more data I can measure the uncertainty in this estimate, too, and use that to decide, when the tank gets low, if I should push my luck (or push my car).

And that example, I think, illustrates a problem with the QL movement.  The issue is not that Americans don’t know how to calculate how far they can drive before their car runs out of gas, but that they don’t know how to estimate how far they can drive. This is not just mincing words. The actual problem from which the initial startling claim was made was something like this: “Your car gets 25 mpg and you have 8 gallons left in your tank.  How far can you drive before you run out of gas?”  In real life, the answer is “It depends.”  This is a situation that every first-year stats student should recognize contains variability.   (For those of you whose car tries to tell you how many miles you have left in your tank, you’ve probably experienced that pleasing event when you begin your trip with, say, 87 miles left in your tank and end your trip 10 miles later with 88 miles left in your tank.  And so you know first hand the variability in this system.) The correct response to this question is to try to estimate the miles you can drive, and to recognize assumptions you must make to do this estimation.  Instead, we are meant to go into “math mode” and recognize this not as a life-skills problem but  a Dreaded Word Problem.  One sign that you are dealing with a DWP is that there are implicit assumptions that you’re just supposed to know, and you’re supposed to ignore your own experience and plow ahead so that you can get the “right” answer, as opposed to the true answer. (Which is: “it depends”).

A better problem would provide us with data.  Perhaps we would see the distances travelled on 8 gallons the last 10 trips.  Or perhaps on just 5 gallons and then would have to estimate how far we could go, on average, with 8 gallons.  And we should be asked to state our assumptions and to consider the consequences if those assumptions are wrong.  In short, we should be performing a modeling activity, and not a DWP.  Here’s an example:  On my last 5 trips, on 10 gallons of gas I drove 252, 184, 300, 355, 205 miles.  I have 10 gallons left, and I must drive 200 miles.  Do I need to fill up? Explain.**
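As one possible starting point for that problem (certainly not the only defensible one), here is a short R sketch. It uses only the five distances given above; treating the last five trips as representative of the next one is an assumption that students should be asked to defend.

```r
# Miles driven on 10 gallons over the last 5 trips (from the problem above).
miles <- c(252, 184, 300, 355, 205)
needed <- 200  # miles I must drive on the 10 gallons I have left

avg_miles <- mean(miles)             # typical distance on 10 gallons
sd_miles <- sd(miles)                # trip-to-trip variability
share_short <- mean(miles < needed)  # share of past trips that fell short

print(c(mean = avg_miles, sd = sd_miles, share_short = share_short))
```

On average the 10 gallons were enough, but one of the five past trips would have left me stranded. So the answer still depends: on how similar the coming trip is to the earlier ones, and on how much I mind walking.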

The point is that one reason QL seems to be such a problem is not because we can’t think about numbers, but that the questions that have been used to conclude that we can’t think about numbers are not reflective of real-life problems.  Instead, these questions are reflective of the DWP culture.  I should emphasize that this is just one reason.  I’ve seen first hand that many students wrestle with proportions and basic number-sense.  This sort of question that comes up often in intro stats — “I am 5 inches taller than average.  One standard deviation is 3 inches.  How many standard deviations above average am I?”  –is a real stumper for many students, and this is sad because by the time they get to college this sort of thing should be answerable through habit, and not require thinking through for the very first time. (Interestingly, if you change the 5 to a 6 it becomes much easier for some, but not for all.)
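For completeness, the arithmetic behind that stumper is a single division, sketched here in R. The helper name `z_score` is made up for this example.

```r
# Standardized score: distance from the average, measured in standard deviations.
z_score <- function(diff_from_mean, sd) diff_from_mean / sd

z_five <- z_score(5, 3)  # 5 inches above average, SD of 3 inches
z_six <- z_score(6, 3)   # the "easier" version, which divides evenly

print(c(z_five = z_five, z_six = z_six))
```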

And so, while trying to ponder the perplexities of finding your tax bracket, be consoled that a great number of others —who really knows how many others? — are feeling the same QL anxiety as you.  But for a good reason:  tax problems are perhaps the rare examples of  DWPs that actually matter.

**suggestions for improving this problem are welcome!

PD follow-up

Last Saturday the Mobilize project hosted a day-long professional development meeting for about 10 high school math teachers and 10 high school science teachers.  As always, it was very impressive how dedicated the teachers were, but I was particularly impressed by their creativity as, again and again, they demonstrated that they were able to take our lessons and add dimension to them that I, at least, didn’t initially see.

One important component of Mobilize is to teach the teachers statistical reasoning.  This is important because (a) the Mobilize content is mostly involved with using data analysis as a pathway for teaching math and science and (b) the Common Core (math) and the Next Generation (science) standards include much more statistics than previous curricula.  And yet, at least for math teachers, data analysis is not part of their education.

And so I was looking forward to seeing how the teachers performed on the “rank the airlines” Model Eliciting Activity, which was designed by the CATALYST project, led by Joan Garfield at U of Minnesota. (Unit 2, Lesson 9 from the CATALYST web site.) Model Eliciting Activities (MEAs) are a lesson design which I’m getting really excited about, and trying to integrate into more of my own lessons. Essentially, groups of students are given realistic and complex questions to answer. The key is to provide some means for the student groups to evaluate their own work, so that they can iterate and achieve increasingly improved solutions. MEAs began in the engineering-education world, and have been used increasingly in mathematics both at college and high school and middle school levels. (A good starting point is “Model-eliciting activities (MEAs) as a bridge between engineering education research and mathematics education research”, Hamilton, Lesh, Lester, Brilleslyper, 2008. Advances in Engineering Education.) I was first introduced to MEAs when I was an evaluator for the CATALYST project, but didn’t really begin to see their potential until Joan Garfield pointed it out to me while I was trying to find ways of enhancing our Mobilize curriculum.

In the MEA we presented to the teachers on Saturday, they were shown data on arrival time delays from 5 airlines. Each airline had 10 randomly sampled flights into Chicago O’Hare from a particular year.  The primary purpose of the MEA is to help participants develop informal ways for comparing groups when variability is present.  In this case, the variability is present in an obvious way (different flights have different arrival delays) as well as less obvious ways (the data set is just one possible sample from a very large population, and there is sample-to-sample variability which is invisible. That is, you cannot see it in the data set, but might still use the data to conjecture about it.)

Before the PD I had wondered if the math and science teachers would approach the MEA differently.  Interestingly, during our debrief, one of the math teachers wondered the same thing.  I’m not sure if we saw truly meaningful differences, but here are some things we did see.

Most of the teams immediately hit on the idea of merging both airline accuracy and airline precision into their ranking, and struggled with how to do it. In the end, only two teams presented rules that used both. Interestingly, one used precision (variability) as the primary ranking and used accuracy (mean arrival delay) to break ties; another group did the opposite.

At least one team ranked only on precision, but developed a different measure of precision that was more relevant to the problem at hand:  the mean absolute deviations from 0 (rather than deviations from the mean).
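Here is a rough Python sketch of how these ranking rules can disagree. The delays are invented for illustration (they are not the MEA data), the function names are my own, and I use the mean absolute deviation from the mean as the measure of precision, which is an assumption; the teams did not necessarily measure variability that way.

```python
from statistics import mean

# Hypothetical arrival delays in minutes (negative = early); NOT the MEA data.
delays = {
    "Airline A": [-5, 0, 5, -5, 5],     # on time on average, a little scatter
    "Airline B": [10, 10, 10, 10, 10],  # perfectly consistent, always late
    "Airline C": [0, 0, 0, 0, 40],      # usually on time, one very late flight
}

def mad_from_mean(xs):
    """Mean absolute deviation from the sample mean (a measure of precision)."""
    m = mean(xs)
    return mean(abs(x - m) for x in xs)

def mad_from_zero(xs):
    """Mean absolute deviation from 0, i.e. from a perfectly on-time arrival."""
    return mean(abs(x) for x in xs)

# Rule 1: precision first, mean delay breaks ties.
precision_first = sorted(
    delays, key=lambda a: (mad_from_mean(delays[a]), mean(delays[a])))

# Rule 2: mean delay first, precision breaks ties.
accuracy_first = sorted(
    delays, key=lambda a: (mean(delays[a]), mad_from_mean(delays[a])))

# Rule 3: rank only on deviation from 0.
rank_by_zero = sorted(delays, key=lambda a: mad_from_zero(delays[a]))

for label, ranking in [("precision first", precision_first),
                       ("accuracy first", accuracy_first),
                       ("deviation from 0", rank_by_zero)]:
    print(f"{label}: {ranking}")
```

With these made-up numbers, the always-late Airline B comes out best under the precision-first rule and worst under the other two, which is the kind of tension the teams were wrestling with.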

One of the more interesting things that came to my attention, as a designer of curriculum, was that almost every team wrestled with what to do with outliers. This made me realize that we do a lousy job of teaching people what to do with outliers, particularly since outliers are not very rare. (One could argue whether, in fact, any of the observations in this MEA are outliers or not, but in order to engage in that argument you need a more sophisticated understanding of outliers than we develop in our students. I, myself, would not have considered any of the observations to be outliers.) For instance, I heard teams expressing concern that it wasn’t “fair” to penalize an airline that had a fairly good mean arrival time just because of one bad outlier. Other groups wondered if the bad outliers were caused by weather delays and, if so, whether it was fair to include those data at all. I was very pleased that no one proposed an outright elimination of outliers. (At least within my hearing.) But my concern was that they didn’t seem to have constructive ways of thinking about outliers.

The fact that teachers don’t have a way of thinking about outliers is our fault.  I think this MEA did a great job of exposing the participants to a situation in which we really had to think about the effect of outliers in a context where they were not obvious data-entry errors.  But I wonder how we can develop more such experiences, so that teachers and students don’t fall into procedural-based, automated thinking.  (e.g. “If it is more than 1.5 times the IQR away from the median, it is an outlier and should be deleted.”  I have heard/read/seen this far too often.)
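For readers who want to see the mechanics of that rule, here is a small Python sketch that flags observations without deleting them. The data are invented, and the function name is my own. Note that the usual textbook convention (Tukey's) puts the fences 1.5 times the IQR beyond the quartiles rather than the median, and that is what the sketch assumes.

```python
from statistics import mean, median, quantiles

# Hypothetical arrival delays in minutes for one airline; NOT the MEA data.
delays = [-3, 0, 2, 4, 5, 7, 8, 10, 12, 95]

def flag_by_iqr(xs, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles.

    The values are only flagged for a closer look; nothing is removed.
    """
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]

flagged = flag_by_iqr(delays)
kept = [x for x in delays if x not in flagged]

print("flagged for a closer look:", flagged)
print("mean with every flight:   ", mean(delays))
print("median with every flight: ", median(delays))
print("mean if flagged dropped:  ", mean(kept))
```

Comparing the summaries with and without the flagged flight shows how much one observation moves the mean. It does not answer the question the teams were asking, which is whether a very late flight is a legitimate part of an airline's record.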

Do you have a lesson that engages students in wrestling with outliers? If so, please share!

Model Eliciting Activity: Prologue

I’m very excited/curious about tomorrow: I’m going to lead about 40 math and science teachers in a data-analysis activity, using one of the Model Eliciting Activities from the University of Minnesota Catalysts for Change Project. (One of our bloggers, Andy, was part of this project.) Specifically, we’re giving them the arrival-delay times for five different airlines into Chicago O’Hare, with a random sample of 10 flights from each airline, and asking them to come up with rules for ranking the airlines from best to worst.

I’m curious to see what they come up with, particularly whether  the math teachers differ terribly from the science teachers. The math teachers are further along in our weekend professional development program than are the science teachers, and so I’m hoping they’ll identify the key characteristics of a distribution (all together: center, spread, shape; well, shape doesn’t play much of a role here) and use these to formulate their rankings. We’ve worked hard on helping them see distributions as a unit, and not a collection of individual points, and have seen big improvements in the teachers, most of whom have not taught statistics before.

The science teachers, I suspect, will be a little bit more deterministic in their reasoning, and, if true to my naive stereotype of science teachers, will try to find explanations for individual points. Since I haven’t worked as much with the science teachers, I’m curious to see if they’ll see the distribution as a whole, or instead try to do point-by-point comparisons.

When we initially started this project, we had some informal ideas that the science teachers would take more naturally to data analysis than would the math teachers. This hasn’t turned out to be entirely true. Many of the math teachers had taught statistics before, and so had some experience. Those who hadn’t, though, tended to be rather procedurally oriented. For example, they often just automatically dropped outliers from their analysis without any thought at all, just because they thought that that was the rule. (This has been a very hard habit to break.)

The math teachers also had a very rigid view of what was and was not data. The science teachers, on the other hand, had a much more flexible view of data. In a discussion about whether photos from a smart phone were data, a majority of math teachers said no and a majority of science teachers said yes. On the other hand, the science teachers tend to use data to confirm what they already know to be true, rather than use it to discover something. This isn’t such a problem with the math teachers, in part because they don’t have preconceptions of the data and so have nothing to confirm. In fact, we’ve worked hard with the math teachers, and with the science teachers, to help them approach a data set with questions in mind. But it’s been a challenge teaching them to phrase questions for their students in which the answers aren’t pre-determined or obvious, and which are empirically oriented. (For example: We would like them to ask something like “what activities most often led to our throwing away recycling into the trash bin?” rather than “Is it wrong to throw trash into the recycling bin?” or “Do people throw trash into the recycling bin?”)

So I’ll report back soon on what happened and how it went.