The Mobilize project, which I recently joined, centers a high school data-science curriculum around participatory sensing data. What is participatory sensing, you ask?
I’ve recently been trying to answer this question, with mixed success. As the name suggests, PS data has to do with data collected from sensors, and so it has a streaming aspect to it. I like to think of it as observations on a living object. Like all living objects, whatever this thing is that’s being observed, it changes, sometimes slowly, sometimes rapidly. The ‘participatory’ means that it takes more than one person to measure it. (But I’m wondering if you would allow ‘participatory’ to mean that the student participates in her own measurements/collection?) Initially, in Mobilize, PS meant using specially equipped smart-phones as sensors. Students could snap pictures of snack-wrappers, or record their mood at a given moment, or record the mood of their snack food. A problem with relying on phones is that, as it turns out, teenagers aren’t always that good with expensive equipment. And there’s an equity issue, because what some people consider a common household item, others consider rare and precious. And smart-phones, although growing in prevalence, are still not universally adopted by high school students, or even college students.
If we ditch the gadgetry, any human being can serve as a sensor. Asking a student to pause at a certain time of day to record, say, the noise level, or the temperature, or their frame of mind, or their level of hunger, is asking that student to be a sensor. If we can teach the student how to find something in the accumulated data about her life that she didn’t know, and something that she finds useful, then she’s more likely to develop what I heard Bill Finzer call a “data habit of mind”. She’ll turn to data next time she has a question or problem, too.
Nothing in this process is trivial. Recording data on paper is one thing, but recording it in a data file requires teaching students about flat-files (which, as I’ve also learned from Bill, are not necessarily intuitive), about delimiters between variables, and, basically, about how to share their data so that someone else can upload and use it. Many of my intro-stats college students don’t know how to upload a data file into the computer, so I now teach it explicitly, with high, but not perfect, rates of success. And that’s the easy part. How do we help them learn something of value about themselves or their world?
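To make the flat-file idea concrete, here is a minimal Python sketch of what "one row per observation, one column per variable, a delimiter between variables" might look like. The variable names (`student`, `date`, `hours_slept`), the values, and the helper functions are all made up for illustration; they are not part of the Mobilize tools.

```python
import csv
import io

# Hypothetical records: one row per observation, one column per variable.
rows = [
    {"student": "A", "date": "2024-03-01", "hours_slept": 7.5},
    {"student": "A", "date": "2024-03-02", "hours_slept": 6.0},
    {"student": "B", "date": "2024-03-01", "hours_slept": 8.0},
]
FIELDS = ["student", "date", "hours_slept"]


def write_flat_file(records, delimiter=","):
    """Return the records as delimited text with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def read_flat_file(text, delimiter=","):
    """Parse delimited text back into records, converting hours to numbers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{**row, "hours_slept": float(row["hours_slept"])} for row in reader]


text = write_flat_file(rows)
round_trip = read_flat_file(text)
print(text)
```

The point of the round trip is the sharing step: if someone else can read the file back and recover the same table, the student has produced something another person can actually use.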
I’m open to suggestions here. Please. One step seems to be to point them towards a larger context in which to make sense of their data. This larger context could be a social network, or a community, or larger datasets collected on large populations. And so students might need to learn how to compare their (rather paltry by comparison) data stream to a large national database (which will be more of a snapshot/panel approach, rather than a data-stream). Or they will need to learn to merge their data with their classmates’ data, to look for signals among variation, and to compare groups.
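As a rough illustration of the merging step, the sketch below stacks two students' personal data streams into one long table and then summarizes it two ways: by student (comparing groups) and by day (looking for a shared signal). The names, days, and numbers are invented, and `merge_streams` and `mean_by` are hypothetical helpers, not an established method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical personal data streams: (day, hours slept).
alice = [("Mon", 7.0), ("Tue", 6.5), ("Wed", 8.0)]
ben = [("Mon", 5.5), ("Tue", 6.0), ("Wed", 6.5)]


def merge_streams(streams):
    """Stack per-student streams into one long table of (student, day, hours)."""
    return [
        (name, day, hours)
        for name, stream in streams.items()
        for day, hours in stream
    ]


def mean_by(table, key_index):
    """Average the hours column within each group defined by column key_index."""
    groups = defaultdict(list)
    for record in table:
        groups[record[key_index]].append(record[2])
    return {key: mean(values) for key, values in groups.items()}


merged = merge_streams({"alice": alice, "ben": ben})
per_student = mean_by(merged, 0)  # compare students with each other
per_day = mean_by(merged, 1)      # look for a day-to-day pattern across students
print(per_student)
print(per_day)
```

Even this toy version raises the questions students would face: the same merged table answers different questions depending on which variable you group by.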
This is scary stuff. Traditionally, we teach students how to make sense of our data. And this is less scary because we’ve already made sense of the data and we know how to point the student towards making the “right” conclusions. But these PS data have not been analyzed before. Even if we, the teachers, have seen similar data, we have not seen these data. The student is really and truly functioning as a researcher, and the teacher doesn’t know the conclusion. What’s more disorienting, the teacher doesn’t have control of the method. Traditionally, when we talk about the ‘shape’ of a distribution, we trot out data sets that show the shapes we want the students to see. But if the students are gathering their own data, is the shape of a distribution necessarily useful? (It gets scarier at a meta-level: many teachers are novice statisticians, and so how do we teach the teachers to be prepared to react to novel data?)
So I’ll sign off with some questions. Suppose my classroom collects data on how many hours they sleep a night for, say, one month. We create a data file to include each student’s data. Students do not know any statistics; this is their first data experience. What is the first thing we should show them? A distribution? Of what? What concepts do students bring to the table that will help them make sense of longitudinal data? If we don’t start with distributions, should we start with an average curve? With an overlay of multiple time-series plots (“spaghetti plots”)? And what’s the lesson, or should be the lesson, in examining such plots?
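For anyone who wants to tinker with these questions, here is one possible sketch of the "average curve" for a classroom sleep file, written in plain Python. Everything in it is an assumption made for illustration: the student labels, the four nights of made-up numbers, the use of `None` for a night a student forgot to record, and the `average_curve` helper itself. Each student's list is one strand of the spaghetti plot; the average curve is the night-by-night mean across the strands.

```python
from statistics import mean

# Hypothetical classroom file: one list of nightly hours per student.
# None marks a night the student did not record.
sleep = {
    "student_1": [7.0, 6.5, None, 8.0],
    "student_2": [6.0, 6.5, 7.0, 9.0],
    "student_3": [8.0, None, 7.5, 7.0],
}


def average_curve(series_by_student):
    """Night-by-night mean across students, skipping missing values.

    Assumes every list has the same length and every night has at least
    one recorded value; otherwise statistics.mean raises StatisticsError.
    """
    nights = zip(*series_by_student.values())
    return [
        mean([hours for hours in night if hours is not None])
        for night in nights
    ]


curve = average_curve(sleep)
for night_number, hours in enumerate(curve, start=1):
    print(f"night {night_number}: class average {hours:.2f} hours")
```

Note that the sketch already forces a choice the questions above leave open: by averaging within each night, it treats the night, not the student, as the unit of comparison, and it quietly drops the missing nights.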