Revisiting that first day of class example

About a year ago I wrote this post: 

I wasn’t teaching that semester, so I couldn’t take my own advice then, but thankfully (or the opposite of thankfully) Trump’s tweets still make for timely discussion.

I had two goals for presenting this example on the first day of my data science course (to an audience of all first-year undergraduates, with little to no background in computing and statistics):

  1. Give a data analysis example with a familiar context
  2. Show that if they take the time to read the code, they can probably understand what it’s doing, at least at a high level

First, I gave them some context: “The author wanted to analyze Trump’s tweets: both the text, and some other information on the tweets like when and from what device they were posted.” Then I asked the students, “If you wanted to do this analysis, how would you go about collecting the data?” Some suggested manual data collection, which we all agreed would be too tedious. A few suggested there should be a way to get the data from Twitter. So then we went back to the blog post, and worked our way through some of the code. (My narrative is roughly outlined in handwriting below.)

The moral of the story: You don’t need to figure out how to write a program that gets tweets from Twitter. Someone else has already done it, and packaged it up (in a package called twitteR), and made it available for you to use. Here, the important message I tried to convey was that “No, I don’t expect you to know that this package exists, or to figure out how to use it. But I hope you agree that once you know the package exists, it’s worth the effort to figure out how to use its functionality to get the tweets, instead of collecting the data manually.”

Then, we discussed the following plot in detail:

First, I asked the students to come up with a list of variables we need in our dataset in order to make this plot: we need to know what time each tweet was posted and what device it came from, and we need to know what percentage of tweets were posted in a given hour.

Here is the breakdown of the code (again, my narrative is in the handwritten comments):

Once again, I wanted to show the students that if they take some time, they can probably figure out roughly what each line (ok, maybe not each, but most lines) of code is doing. We didn’t get into discussing what’s a geom, what’s the difference between %>% and +, what’s an aesthetic, etc. We’ll get into those, but the semester is young…
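
For a concrete version of the kind of pipeline we were reading, here is a minimal sketch. It is not the code from the original blog post: the `tweets` data frame is made-up toy data, the column names `source` and `hour` are my assumptions, and I assume the percentages are computed within each device. It needs the dplyr and ggplot2 packages.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per tweet, holding the two variables
# the plot needs (device and hour of day). Not real tweets.
tweets <- data.frame(
  source = c("Android", "Android", "Android", "Android", "iPhone", "iPhone"),
  hour   = c(6, 6, 7, 22, 10, 15)
)

# %>% passes the result of one data step on to the next.
hourly <- tweets %>%
  count(source, hour) %>%            # tweets per device per hour, stored in n
  group_by(source) %>%
  mutate(percent = n / sum(n)) %>%   # share of that device's tweets in each hour
  ungroup()

# + adds layers and labels to a plot, rather than passing data along.
p <- ggplot(hourly, aes(x = hour, y = percent, color = source)) +
  geom_line() +
  labs(x = "Hour of day", y = "% of tweets", color = "")
```

Printing `p` draws the plot. Reading it “by hand”: count the tweets for each device and hour, divide by that device’s total, then draw one line per device.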

My hope is that next time I present how to do something new in R, they’ll remember this experience of being able to mostly figure out what’s happening by taking some time to stare at the code and think, “If I had to do this by hand, how would I go about it?”

Modernizing the Undergraduate Statistics Curriculum at #JSM2017

I’m a bit late in posting this, but travel delays post-JSM left me weary, so I’m just getting around to it. Better late than never?

Wednesday at JSM featured an invited statistics education session on Modernizing the Undergraduate Statistics Curriculum. This session featured two types of speakers: those who are currently involved in undergraduate education and those who are on the receiving end of graduating majors. The speakers involved in undergraduate education presented their recent efforts to modernize the undergraduate statistics curriculum so that it provides the essential computational and problem-solving skills expected of today’s statisticians, along with a firm grounding in theory and methods. The speakers representing industry discussed their expectations (or hopes and dreams) for new graduates and where they find gaps in the knowledge of new hires.

The speakers were Nick Horton (Amherst College), Hilary Parker (Stitch Fix), Jo Hardin (Pomona College), and Colin Rundel (Duke University). The discussant was Rob Gould (UCLA). Here are the slides for each of the speakers. If you have any comments or questions, let us know in the comments.

Modernizing the undergraduate statistics curriculum: what are the theoretical underpinnings? – Nick Horton

Hopes and dreams for statistics graduates – Hilary Parker

Expectations and Skills for Undergraduate Students Doing Research in Statistics and Data Science – Jo Hardin

Moving Away from Ad Hoc Statistical Computing Education – Colin Rundel

Discussion – Rob Gould

Novel Approaches to First Statistics / Data Science Course at #JSM2017

Tuesday morning, bright and early at 8:30am, was our session titled “Novel Approaches to First Statistics / Data Science Course”. For some students, the first course in statistics may be the only quantitative reasoning course they take in college. For others, it is the first of many in a statistics major curriculum. The content of this course depends on which audience the course is aimed at as well as its place in the curriculum. However, a data-centric approach with an emphasis on computation and algorithmic thinking is essential for all modern first statistics courses. The speakers in our session presented their approaches for the various first courses in statistics and data science that they have developed and taught. The discussion also highlighted pedagogical and curricular choices they have made in deciding what to keep, what to eliminate, and what to modify from the traditional introductory statistics curriculum. The speakers in the session were Ben Baumer from Smith College, Rebecca Nugent from CMU, myself, and Daniel Kaplan from Macalester College. Our esteemed discussant was Dick DeVeaux, and our chair, the person who managed to keep this rambunctious bunch on time, was Andrew Bray from Reed College. Here are the slides for each of the speakers. If you have any comments or questions, let us know in the comments, or find us on social media!

Ben Baumer – Three Methods Approach to Statistical Inference

Rebecca Nugent – Lessons Learned in Transitioning from “Intro to Statistics” to “Reasoning with Data”

Mine Cetinkaya-Rundel – A First-Year Undergraduate Data Science Course

Daniel Kaplan – Teaching Stats for Data Science

Dick DeVeaux – Discussion