Data Science Webinar Announcement

I’m pleased to announce that on Monday, September 11, 9-11 am Pacific, I’ll be leading a Concord Consortium Data Science Education Webinar. Oddly, I forgot to give it a title, but it will be something like “Towards a Learning Trajectory for K-12 Data Science”. This webinar, like all Concord webinars, is intended to be highly interactive. Participants should have their favorite statistical software at the ready. A detailed abstract, as well as registration information, is available here:
https://www.eventbrite.com/e/data-science-education-webinar-rob-gould-tickets-35216886656

At the same site you can view wonderful recent webinars by Cliff Konold, Hollylynne Lee, and Tim Erickson.

Structuring Data in Middle School

Of the many provocative and exciting discussions at this year’s Statistics Reasoning, Teaching and Learning (SRTL) conference in Rotorua, NZ, one that has stuck in my mind came from Lucia Zapata-Cardona, from the Universidad de Antioquia in Colombia. Lucia discussed data from her classroom observations of a teacher at a middle school (ages 12-13) in a “Northwest Colombian city”. The class was exciting for many reasons, but I want to write about it here because the teacher had the students structure and store their own data.

The classroom was remarkable – to my American eyes – for the large number of students (45) and for the noise (walls were thin, the playground was immediately outside, and windows were kept open because of the heat). Despite this, the teacher led an inquiry-based discussion, skillfully prompting the students with questions from the back of the classroom. The discussion extended over several days.

The students had collected data about the nutritional content of the foods they eat. Challenging students with meaningful, real-world problems is a central part of Prof. Zapata-Cardona’s research, since one goal of education is to tie the classroom to the world outside it. Lucia was interested in examining how (and whether) the students constructed and employed statistical models to reason with the data. (Modeling was the theme of this SRTL.) What fascinated me wasn’t the modeling, but the role that the structure of the data played in the students’ reasoning.

Students were asked to collect data on the food contained in their lunchboxes so that they could answer the statistical question “How nutritious is the food we bring to school in our lunchbox?” It’s important to note that in Colombia, as Lucia explained to us, the “lunch box” doesn’t contain actual lunch (which the students eat at home), but instead holds snacks for during the day. What interested me was that the teacher let the class, after discussion, decide how they would enter and organize the data. I’m not sure exactly what parameters or options the students were given. I do know that the classroom had one computer, and students took turns entering the data into it. And I know that the students discussed which variables they wanted to store, and how they wanted to store them.

The pivotal decision was that each row would represent a food, for example, Chicle. The students decided to record information about serving size, calories, fats, carbs, protein, sodium, sugars, and whether the food was “processed” (5 g, 18, 0, 5, 0, 0, 0, and si, in case you were curious). They decided not to store information about how many students brought this food, or how many servings any individual student brought.

At this point, you may have realized that their statistical question is rather difficult, if not impossible, to answer given the format in which they stored the data. Had each case of the data been an individual lunchbox or an individual person, then the students might have made headway. Instead, they stumbled over issues about how to compare the total calories of the dataset with the total calories eaten by individuals. (After much discussion, most of the class “discovered” that the average amount was a good way of summarizing the data, but some of the more perceptive students pointed out that it wasn’t clear what the average really meant.)
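To make the issue concrete, here is a small sketch in R of a table like the one the students built. The Chicle row uses the numbers above; the other foods and their values are invented for illustration.

    # One row per *food*, as the students chose (only Chicle's values are real)
    foods <- data.frame(
      food      = c("Chicle", "Jugo", "Galletas"),
      calories  = c(18, 110, 140),
      sugars    = c(0, 24, 8),
      processed = c("si", "si", "si")
    )
    mean(foods$calories)  # 89.33: an average over foods, not over lunchboxes
    # With no record of how many students brought each food, or how many
    # servings, there is no way to recover what any individual student ate.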

Lucia’s forthcoming paper will go into the details about the good and the bad in the students’ statistical reasoning, and the ways in which they used (or failed to use) statistical models. But what was fascinating to me was the opportunity this provided for helping students understand how the structure of data affects the questions that we can ask, and how the questions we ask should first consider the structure of the data.

Too often, particularly in textbooks, there is no opportunity to reason about the structure of data. When a question is asked, the students are given appropriate data, and rarely allowed even to decide which variables to consider (since the provided data usually includes only the necessary variables), much less whether or not the data should be restructured or re-collected.

Another reason classrooms have avoided letting students structure their own data is that many real-life datasets have complicated structures. The data these students collected are really (or should have been) hierarchical. If the case is the lunchbox, a lunchbox is associated with a student and possibly with more than one item. If data are collected on multiple days, then there is nesting within days as well as the potential for missing variables or unequal record lengths.
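Here is one hedged sketch of what the hierarchical version might look like in R, with one row per item and identifier columns encoding the nesting (all values invented):

    # One row per *item*; the student and day columns encode the nesting
    items <- data.frame(
      student  = c(1, 1, 2, 2, 2),
      day      = c("Monday", "Monday", "Monday", "Tuesday", "Tuesday"),
      food     = c("Chicle", "Jugo", "Galletas", "Chicle", "Jugo"),
      servings = c(1, 1, 2, 1, 1),
      calories = c(18, 110, 140, 18, 110)  # per serving
    )
    items$total <- items$servings * items$calories
    # Aggregating recovers the lunchbox level: total calories per student per day
    aggregate(total ~ student + day, data = items, FUN = sum)

Flattening in the other direction, to the students’ one-row-per-food format, throws away the student and day information for good, which is exactly why their statistical question became so hard to answer.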

Data with such a complicated structure are simply not taught in middle schools, even though, as Lucia’s case study demonstrates, they arise easily from familiar contexts. These data are messy and complicated. Should we even open this Pandora’s box for middle school students, or should it wait until they are older? Is it enough to work with a simplified “flat” format such as the one these students came up with, and just modify the statistical question? Should students be taught how to manipulate such data into different formats to answer the questions they are interested in?

You might think hierarchical formats are beyond the middle school level, but work done by Cliff Konold and Bill Finzer, in the context of using the CODAP tool, suggests that it is possible. [I can’t find an online paper to link to for this result, but there are some leads here, and I’m told it has been approved for publication so should appear soon.]

So the question is: when do we teach students to reason with hierarchical data? When do we teach students to recognize that data can be stored in different formats? When do we teach students to convert data from one format to another?

We are back to the question I asked in my last blog: what’s the learning trajectory that takes statistical beginners and teaches them the computational and statistical tools to address fundamental questions that rely on data that are, on the one hand, complex, and on the other, found in our day-to-day lives?

Are computers needed to teach Data Science?

One of the many nice things about summer is the time and space it allows for blogging. And, after a very stimulating SRTL conference (Statistics Reasoning, Teaching and Learning) in Rotorua, New Zealand, there’s lots to blog about.

Let’s begin with a provocative posting by fellow SRTL-er Tim Erickson at his excellent blog A Best Case Scenario. I’ve known Tim for quite a while, and have enjoyed many interesting and challenging discussions with him. Tim is a creator of curricula par excellence, and has first-hand experience in what inspires and motivates students to think deeply about statistics.

The central question here is: Is computation (on a computer) necessary for learning data science? The learners here are beginners in K-12. Tim answers no, and I answer, tentatively, yes. Tim portrays me in his blog as being a bit more steadfast in this position than I really am. In truth, the answer is: some; maybe; a little; I don’t know.

My own experience with the topic comes from the Mobilize project, in which we developed the course Introduction to Data Science for students in the Los Angeles Unified School District. (I’m pleased to say that the course is expanding. This summer, five new L.A.-area school districts will begin training teachers to teach it.)

The course relies heavily on R via RStudio. Students begin by studying the structure of data, learning to identify cases and variables and to organize unstructured data into a “tidy” format. Next, they learn to “read” tidy data files into RStudio. The course ends with students learning some predictive modeling using Classification and Regression Trees. In between, they study some inference using randomization-based methods.

To be precise, the students don’t learn straight-up R. They work within a package developed by the Mobilize team (primarily James Molyneux, Amelia McNamara, Steve Nolen, Jeroen Ooms, and Hongsuda Tangmunarunkit) called mobilizR, which is based pretty heavily on the mosaic package developed by Randall Pruim, Danny Kaplan, and Nick Horton. The idea with these packages is to provide beginners with a unified syntax and a set of verbs that relate more directly to the analyst’s goals. The basic structure for (almost) all commands is

WhatIWantToDo(yvariable~xvariables, dataset)

For example, to see the average walking distance recorded by a FitBit by day of the week:

 > mean(Distance~DOW,data=fitbitdec)
    Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
  1.900000  3.690000  2.020909  2.419091  1.432727  3.378182  3.644545

The idea is to provide students with a simplified syntax that “bridges the gap” between beginners of R and more advanced users. Hopefully, this frees up some of the cognitive load required to remember and employ R commands so that students can think strategically and statistically about problems they are trying to solve.
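And if the other verbs follow the same template (as they do in the mosaic package on which mobilizR is based), then a student who has learned one command has, in effect, learned many:

 > sd(Distance~DOW,data=fitbitdec)      # same template: spread by day
 > median(Distance~DOW,data=fitbitdec)  # same template: median by day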

The “bridge the gap” terminology comes from Amelia McNamara, who used the term in her PhD dissertation. One of the many really useful ideas Amelia has given us is the notion that the gap needs to be bridged. Much of “traditional” statistics education holds to the idea that statistical concepts are primarily mathematical, and, for most people, it is sufficient to learn enough of the mathematical concepts so that they can react skeptically and critically to others’ analyses. What is exciting about data science in education is that students can do their own analyses. And if students are analyzing data and discovering on their own (instead of just trying to understand others’ findings), then we need to teach them to use software in such a way that they can transition to more professional practices.

And now, dear readers, we get to the heart of the matter. That gap is really hard to bridge. One reason is that we know little to nothing about the terrain. How do students learn coding when applied to data analysis? How does the technology they use mediate that experience? How can it enhance, rather than inhibit, understanding of statistical concepts and the ability to do data analysis intelligently?

In other words, what’s the learning trajectory?

Tim rightly points to CODAP, the Common Online Data Analysis Platform, as one tool that might help bridge the gap by providing students with some powerful data manipulation techniques. And I recently learned about data.world, which seems to be another attempt to help bridge the gap. But Amelia’s point is that it is not enough to give students the ability to do something; you have to give it to them so that they are prepared to learn the next step. And if the end-point of a statistics education involves coding, then those intermediate steps need to develop students’ coding skills as well as their statistical thinking. It’s not sufficient to help students learn statistics. They must simultaneously learn computation.

So how do we get there? One important initial step, I believe, is to really examine what the term “computational thinking” means when we apply it to data analysis. And that will be the subject of an upcoming summer blog.

Theaster Gates, W.E.B. Du Bois, and Statistical Graphics

After reading this review of a Theaster Gates show at Regen Projects, in L.A., I hurried to see the show before it closed. Inspired by sociologist and civil rights activist W.E.B. Du Bois, Gates created artistic interpretations of statistical graphics that Du Bois had produced for an exhibition in Paris in 1900. Coincidentally, I had just heard about these graphics the previous week at the Data Science Education Technology conference while eavesdropping on a conversation Andy Zieffler was having with someone else. What a pleasant surprise, then, when I learned, almost as soon as I got home, about this exhibit.

I’m no art critic (but I know what I like), and I found these works to be beautiful, simple, and powerful. What startled me, when I looked for the Du Bois originals, was how little Gates had changed the graphics. Here’s one work. (I apologize for not knowing the title; that’s the difference between an occasional blogger and a journalist.) It hints at Mondrian, and the geometry intrigues. Up close, the colors are rich and textured.

Here’s Du Bois’s circa-1900 mosaic-type plot (from http://www.openculture.com/2016/09/w-e-b-du-bois-creates-revolutionary-artistic-data-visualizations-showing-the-economic-plight-of-african-americans-1900.html, which provides a nice overview of the exhibit for which Du Bois created his innovative graphics).

The title is “Negro business men in the United States”. The large yellow square is “Grocers,” the blue square is “Undertakers,” and the green square below it is “Publishers.” More are available at the Library of Congress.

Here’s another pair. The Gates version raised many questions for me. Why were the bars irregularly sized? What was the organizing principle behind the original? Were the categories sorted in increasing order, with Gates adding some irregularities for visual interest? What variables are on the axes?

The answer is no: Gates did not vary the lengths of the bars, only the colors.

The vertical axis displays dates, ranging from 1874 to 1899 (just one year before Du Bois put the graphics together from a wide variety of sources). The horizontal axis is acres of land, with values from 334,000 to 1.1 million.

Using data to support civil rights has a long history. A colleague once remarked that there is a great unwritten book about the role that data and statistical analysis played (and continue to play) in the gay civil rights movement (and perhaps it has been written?). And the folks at We Quant LA have a nice article demonstrating some of the difficulties in using open data to ask questions about racial profiling by the LAPD. In this day and age of alternative facts and fake news, it’s wise to be careful and precise about what we can and cannot learn from data. And it is encouraging to see the role that art can play in keeping this dialogue alive.

The African Data Initiative

Are you looking for a way to celebrate World Statistics Day? I know you are. And I can’t think of a better way than supporting the African Data Initiative (ADI).

I’m proud to have met some of the statisticians, statistics educators, and researchers who are leading this initiative at an International Association for Statistical Education Roundtable workshop in Cebu, the Philippines, in 2012. You can read about Roger and David Stern’s projects in Kenya here in the journal Technology Innovations in Statistics Education. This group — represented at the workshop by father-and-son Roger and David, and by then-grad students Zacharaiah Mbasu and James Musyoka — impressed me with their determination to improve international statistical literacy and with their successful, creative, and pragmatic implementations adjusted to the needs of the local situations in Kenya.

The ADI is seeking funds within the next 18 days to adapt two existing software packages, R and Instat+, so that there is a free, open-source, easy-to-learn statistical software package available and accessible throughout the world. While R is free and open source, it is not easy to learn (particularly in areas where English literacy is low). Instat+ is, they claim, easy to learn but not open source (and it also does not run on Linux or Mac).

One of the exciting things about this project is that these solutions to statistical literacy are being developed by Africans working and researching in Africa, and are not ‘imported’ by groups or corporations with little experience implementing in the local schools. One lesson I’ve learned from my experience working with the Los Angeles Unified School District is that you must work closely with the schools for which you are developing curricula; outsider efforts have a lower chance of success. I hope you’ll take a moment, in the next 18 days, to become acquainted with this worthy project!

World Statistics Day is October 20. The theme is “Better Data. Better Lives.”

Data News: Fitbit + iHealth, and Open Justice data

The LA Times, along with several other sources, reported today that the California Department of Justice has initiated a new “open justice” data initiative. On their portal, the “Justice Dashboard”, you can view Arrest Rates, Deaths in Custody, or Law Enforcement Officers Killed or Assaulted.

I chose, for my first visit, to look at Deaths in Custody. At first, I was disappointed with what was provided: instead of raw data, you see some nice graphical displays, mostly univariate but a few with two variables, addressing issues and questions that are probably on many people’s minds. (Alarmingly, the second most common cause of death for people in custody is homicide by a law enforcement officer.)

However, if you scroll to the bottom, you’ll see that you can, in fact, download relatively raw data, in the form of a spreadsheet in which each row is a person in custody who died. Variables include date of birth and death, gender, race, custody status, offense, and reporting agency, among many others. Altogether, there are 38 variables and over 15,000 observations. The data set comes with a nice codebook, too.
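A first look in R might go something like this. (A sketch only: the file name and column names below are my guesses, so check the codebook for the real ones.)

    custody <- read.csv("deaths_in_custody.csv", stringsAsFactors = FALSE)
    dim(custody)                   # should be roughly 15,000 rows by 38 columns
    names(custody)                 # match these against the codebook
    table(custody$custody_status)  # hypothetical column name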

FitBit vs. the iPhone

On to a cheerier topic. This quarter I will be teaching regression, and once again my FitBit provided inspiration. If you teach regression, you know one of the awful secrets of statistics: there are no linear associations. Well, they are few and far between. And so I was pleased when a potentially linear association sprang to mind: how well do FitBit step counts predict the Health app’s counts?

The Health app is an iOS 8 app. It was automatically installed on your iPhone, whether you wanted it or not. (I speak from the perspective of an iPhone 6 user with iOS 8 installed.) Apparently, whether you know it or not, your steps are being counted. If you have an Apple Watch, you know about this. But if you don’t, it happens invisibly, until you open the app. Or buy the watch.

How can you access these data? I did so by downloading the free app QS (for “Quantified Self”). The Quantified Self people have a website directing you to hundreds of apps you can use to learn more about yourself than you probably should. Once QS is installed, you simply open the app, choose which variables you wish to download, click ‘submit’, and a CSV file is emailed to you (or to whomever you wish).

The FitBit data can only be downloaded if you have a premium account.  The FitBit premium website has a ‘custom option’ that allows you to download data for any time period you choose, but currently, due to an acknowledged bug, no matter which dates you select, only one month of data will be downloaded. Thus, you must download month by month.  I downloaded only two months, July and August, and at some point in August my FitBit went through the wash cycle, and then I misplaced it.  It’s around here, somewhere, I know. I just don’t know where.  For these reasons, the data are somewhat sparse.

I won’t bore you with details, but by applying functions from the lubridate package in R and using the gsub function to remove commas (because FitBit inexplicably inserts commas into its numbers and, I almost forgot, adds a superfluous title to the document, which requires the skip = 1 option in read.table), it was easy to merge a couple of months of FitBit data with the Health data.
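For the curious, here is a sketch of those cleaning steps. The file names, column names, and date formats are guesses, not the actual ones.

    library(lubridate)
    # skip = 1 jumps over the superfluous title row FitBit adds
    fb <- read.table("fitbit_export.csv", sep = ",", skip = 1, header = TRUE,
                     stringsAsFactors = FALSE)
    fb$Steps <- as.numeric(gsub(",", "", fb$Steps))  # strip FitBit's commas
    fb$Date  <- mdy(fb$Date)                         # lubridate parses the dates
    health <- read.csv("health_export.csv", stringsAsFactors = FALSE)
    health$Date <- ymd(health$Date)
    steps <- merge(fb, health, by = "Date")  # one row per day, both step counts

And so here’s how the two compare: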

[Figure: scatterplot of Health app step counts against FitBit step counts, with the fitted regression line.]

The regression line is Predicted.iOS.Steps = 1192 + 0.9553 × FitBit.Steps, and the r-squared is 0.9223. (A residual plot shows that the relationship is not quite as linear as it looks. Damn.)
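In R, the fit and the offending residual plot look something like this (assuming the merged data frame from the sketch above, with column names of my own invention):

    fit <- lm(iOS.Steps ~ FitBit.Steps, data = steps)
    coef(fit)                      # intercept about 1192, slope about 0.9553
    summary(fit)$r.squared         # about 0.9223
    plot(fitted(fit), resid(fit))  # the plot that spoiled the linearity
    abline(h = 0, lty = 2)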

Questions I’m thinking of posing on the first day of my regression class this quarter:

  1. Which do you think is a more reliable counter of steps?
  2. How closely in agreement are these two step-counting tools? How would you measure this?
  3. What do the slope and intercept tell us?
  4. Why is there more variability for low FitBit step counts than for high?
  5. I often lose my FitBit. Currently, for instance, I have no idea where it is.  On those days, FitBit reports “0 steps”. (I removed the 0’s from this analysis.)  Can I use this regression line to predict the values on days I lose my FitBit?  With how much precision?

I think it will be helpful to talk about these questions informally, on the first day, before the students have learned more formal methods for tackling them. And maybe I’ll add a few more months of data.

Quantitatively Thinking

John Oliver said it best: April 15 combines Americans’ two most-hated things, taxes and math. I’ve been thinking about the latter recently, after hearing a fascinating talk last weekend about quantitative literacy.

QL is meant to describe our ability to think with, and about, numbers. It doesn’t include high-level math skills; rather, it usually refers to our ability to understand percentages, proportions, and basic mathematical operations. This is a really important type of literacy, of course, but I fear that the QL movement could benefit from merging QL with SL: Statistical Literacy.

No surprise, that, coming from this blog.  But let me tell you why.  The speaker began by saying that many Americans can’t figure out, given the amount of gas in their tank, how many miles they have to drive before they run out of gas.

This dumbfounded me. If it were literally true, you’d see stalled cars every few blocks in Los Angeles. (Now we see them only every 3 or 4 miles.) But I also thought, wait, do I know how far I can drive before I run out of gas? My gas gauge says I have half a tank left, and I think (but am not certain) that my tank holds 16 gallons. That means I probably have 8 gallons left. I can see I’ve driven about 200 miles since I last filled up, because I remembered to hit that little mileage reset button that keeps track of such things. And so I’m averaging 25 mpg. But I’m also planning a trip to San Diego in the next couple of days, and then I’ll be driving on the highway, and so my mileage will improve. And that 25 mpg is just an average, and averages have variability, but I don’t really have a sense of the variability of that mean. And this problem requires that I know my mpg in the future, and, well, of all the things you can predict, the future is the hardest. And so, I’m left to conclude that I don’t really know when my car will run out of gas.

Now while I don’t know the exact number of miles I can drive, I can estimate the value.  With a little more data I can measure the uncertainty in this estimate, too, and use that to decide, when the tank gets low, if I should push my luck (or push my car).

And that example, I think, illustrates a problem with the QL movement. The issue is not that Americans don’t know how to calculate how far they can drive before their car runs out of gas, but that they don’t know how to estimate how far they can drive. This is not just mincing words. The actual problem from which the initial startling claim was made was something like this: “Your car gets 25 mpg and you have 8 gallons left in your tank. How far can you drive before you run out of gas?” In real life, the answer is “It depends.” This is a situation that every first-year stats student should recognize contains variability. (For those of you whose car tries to tell you how many miles you have left in your tank, you’ve probably experienced that pleasing event when you begin your trip with, say, 87 miles left in the tank and end it 10 miles later with 88 miles left. And so you know first-hand the variability in this system.) The correct response to this question is to try to estimate the miles you can drive, and to recognize the assumptions you must make to do this estimation. Instead, we are meant to go into “math mode” and recognize this not as a life-skills problem but as a Dreaded Word Problem. One sign that you are dealing with a DWP is that there are implicit assumptions you’re just supposed to know, and you’re supposed to ignore your own experience and plow ahead so that you can get the “right” answer, as opposed to the true answer (which is: “it depends”).

A better problem would provide us with data. Perhaps we would see the distances travelled on 8 gallons over the last 10 trips. Or perhaps distances on just 5 gallons, and we would then have to estimate how far we could go, on average, on 8 gallons. And we should be asked to state our assumptions and to consider the consequences if those assumptions are wrong. In short, we should be performing a modeling activity, not a DWP. Here’s an example: On my last 5 trips, on 10 gallons of gas each, I drove 252, 184, 300, 355, and 205 miles. I have 10 gallons left, and I must drive 200 miles. Do I need to fill up? Explain.**
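Here is a sketch of how a first-year stats student might attack that problem in R, under the (checkable!) assumption that 10-gallon trip distances vary roughly normally:

    miles <- c(252, 184, 300, 355, 205)  # the five 10-gallon trips above
    mean(miles)  # 259.2 miles: the point estimate
    sd(miles)    # about 70 miles: a lot of trip-to-trip variability
    # a rough 95% prediction interval for the next 10-gallon trip
    mean(miles) + c(-1, 1) * qt(0.975, df = 4) * sd(miles) * sqrt(1 + 1/5)
    # roughly (47, 471) miles: 200 is comfortably inside, but so are distances
    # well short of 200, so the honest answer is still "it depends"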

The point is that one reason QL seems to be such a problem is not that we can’t think about numbers, but that the questions used to conclude we can’t think about numbers are not reflective of real-life problems. Instead, these questions are reflective of the DWP culture. I should emphasize that this is just one reason. I’ve seen first-hand that many students wrestle with proportions and basic number sense. A sort of question that comes up often in intro stats (“I am 5 inches taller than average. One standard deviation is 3 inches. How many standard deviations above average am I?”) is a real stumper for many students, and this is sad because by the time they get to college this sort of thing should be answerable through habit, not require thinking through for the very first time. The answer is 5/3, or about 1.7 standard deviations. (Interestingly, if you change the 5 to a 6, so that the answer is exactly 2, it becomes much easier for some, but not for all.)

And so, while trying to ponder the perplexities of finding your tax bracket, be consoled that a great number of others (who really knows how many?) are feeling the same QL anxiety as you. But for a good reason: tax problems are perhaps the rare examples of DWPs that actually matter.

**suggestions for improving this problem are welcome!

Interpreting Cause and Effect

One big challenge we all face is understanding what’s good and what’s bad for us. And it’s harder when published research studies conflict. And so thanks to Roger Peng for posting on his Facebook page an article that led me to this piece by Emily Oster: Cellphones Do Not Give You Brain Cancer, from the good folks at the 538 blog. I think this article would make a great classroom discussion, particularly if, before seeing the article, students themselves brainstorm several possible experimental designs and discuss the strengths and weaknesses of each. I think it is also interesting to ask why no study similar to the Danish cohort study was done in the US. Thinking about this might lead students to consider cultural attitudes towards widespread data collection.

PD follow-up

Last Saturday the Mobilize project hosted a day-long professional development meeting for about 10 high school math teachers and 10 high school science teachers. As always, it was impressive how dedicated the teachers were, but I was particularly struck by their creativity: again and again, they took our lessons and added dimensions to them that I, at least, hadn’t initially seen.

One important component of Mobilize is to teach the teachers statistical reasoning.  This is important because (a) the Mobilize content is mostly involved with using data analysis as a pathway for teaching math and science and (b) the Common Core (math) and the Next Generation (science) standards include much more statistics than previous curricula.  And yet, at least for math teachers, data analysis is not part of their education.

And so I was looking forward to seeing how the teachers performed on the “rank the airlines” Model Eliciting Activity, which was designed by the CATALYST project, led by Joan Garfield at the University of Minnesota. (Unit 2, Lesson 9 from the CATALYST web site.) Model Eliciting Activities (MEAs) are a lesson design that I’m getting really excited about, and trying to integrate into more of my own lessons. Essentially, groups of students are given realistic and complex questions to answer. The key is to provide some means for the student groups to evaluate their own work, so that they can iterate and achieve increasingly improved solutions. MEAs began in the engineering-education world, and have been used increasingly in mathematics at the college, high school, and middle school levels. (A good starting point is “Model-eliciting activities (MEAs) as a bridge between engineering education research and mathematics education research”, Hamilton, Lesh, Lester, & Brilleslyper, 2008, Advances in Engineering Education.) I was first introduced to MEAs when I was an evaluator for the CATALYST project, but didn’t really begin to see their potential until Joan Garfield pointed it out to me while I was trying to find ways of enhancing our Mobilize curriculum.

In the MEA we presented to the teachers on Saturday, they were shown data on arrival time delays from 5 airlines. Each airline had 10 randomly sampled flights into Chicago O’Hare from a particular year.  The primary purpose of the MEA is to help participants develop informal ways for comparing groups when variability is present.  In this case, the variability is present in an obvious way (different flights have different arrival delays) as well as less obvious ways (the data set is just one possible sample from a very large population, and there is sample-to-sample variability which is invisible. That is, you cannot see it in the data set, but might still use the data to conjecture about it.)

Before the PD I had wondered if the math and science teachers would approach the MEA differently.  Interestingly, during our debrief, one of the math teachers wondered the same thing.  I’m not sure if we saw truly meaningful differences, but here are some things we did see.

Most of the teams immediately hit on the need to merge both an airline’s accuracy and its precision into their ranking, though they struggled to do so. In the end, only two teams presented rules that used both. Interestingly, one used precision (variability) as the primary ranking and used accuracy (mean arrival delay) to break ties; another group did the opposite.

At least one team ranked only on precision, but developed a different measure of precision, one more relevant to the problem at hand: the mean absolute deviation from 0 (rather than from the mean).
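That team’s measure is easy to sketch in R with made-up delays (the actual MEA data come from the CATALYST materials):

    delays <- data.frame(
      airline = rep(c("A", "B"), each = 5),
      delay   = c(-5, 0, 3, 12, 40,  # airline A: good mean, one bad flight
                  -2, 1, 4, 6, 9)    # airline B: steadier
    )
    # precision measured from the target (0 = on time), not from the group mean
    tapply(abs(delays$delay), delays$airline, mean)  # A: 12.0, B: 4.4
    # the usual accuracy and precision summaries, for comparison
    tapply(delays$delay, delays$airline, mean)       # A: 10.0, B: 3.6
    tapply(delays$delay, delays$airline, sd)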

One of the more interesting things that came to my attention, as a designer of curriculum, was that almost every team wrestled with what to do with outliers. This made me realize that we do a lousy job of teaching people what to do with outliers, particularly since outliers are not very rare. (One could argue about whether any of the observations in this MEA are in fact outliers, but to engage in that argument you need a more sophisticated understanding of outliers than we develop in our students. I, myself, would not have considered any of the observations to be outliers.) For instance, I heard teams expressing concern that it wasn’t “fair” to penalize an airline that had a fairly good mean arrival time just because of one bad outlier. Other groups wondered if the bad outliers were caused by weather delays and, if so, whether it was fair to include those data at all. I was very pleased that no one proposed an outright elimination of outliers. (At least within my hearing.) But my concern was that they didn’t seem to have constructive ways of thinking about outliers.

The fact that teachers don’t have a way of thinking about outliers is our fault.  I think this MEA did a great job of exposing the participants to a situation in which we really had to think about the effect of outliers in a context where they were not obvious data-entry errors.  But I wonder how we can develop more such experiences, so that teachers and students don’t fall into procedural-based, automated thinking.  (e.g. “If it is more than 1.5 times the IQR away from the median, it is an outlier and should be deleted.”  I have heard/read/seen this far too often.)

Do you have a lesson that engages students in wrestling with outliers? If so, please share!