Data Science Webinar Announcement

I’m pleased to announce that on Monday, September 11, 9-11 am Pacific, I’ll be leading a Concord Consortium Data Science Education Webinar. Oddly, I forgot to give it a title, but it would be something like “Towards a Learning Trajectory for K-12 Data Science”. This webinar, like all Concord webinars, is intended to be highly interactive. Participants should have their favorite statistical software at the ready. A detailed abstract and registration information are here.

At the same site you can view recent wonderful webinars by Cliff Konold, Hollylynne Lee and Tim Erickson.

Envisioning Data Science Webinar Series and Call for Input

Webinar Series: Data Science Undergraduate Education

Join the National Academies of Sciences, Engineering, and Medicine for a webinar series on undergraduate data science education. Webinars will take place on Tuesdays from 3-4pm ET, starting on September 12 and ending on November 14. See below for the list of dates and themes for each webinar.

This webinar series is part of an input-gathering initiative for a National Academies study on Envisioning the Data Science Discipline: The Undergraduate Perspective. Learn more about the study, read the interim report, and share your thoughts with the committee on the study webpage.

Webinar speakers will be posted as they are confirmed on the webinar series website.

Webinar Dates and Topics

  • 9/12/17 – Building Data Acumen
  • 9/19/17 – Incorporating Real-World Applications
  • 9/26/17 – Faculty Training and Curriculum Development
  • 10/3/17 – Communication Skills and Teamwork
  • 10/10/17 – Inter-Departmental Collaboration and Institutional Organization
  • 10/17/17 – Ethics
  • 10/24/17 – Assessment and Evaluation for Data Science Programs
  • 11/7/17 – Diversity, Inclusion, and Increasing Participation
  • 11/14/17 – Two-Year Colleges and Institutional Partnerships

All webinars take place from 3-4pm ET. If you plan to join us online, please register to attend. You will have the option to register for the entire webinar series or for individual webinars.

Share Your Input

The study committee is seeking public input for consideration in its upcoming report, which will set forth a vision for the emerging discipline of data science at the undergraduate level. To share your input with the committee, please fill out this form.

Modernizing the Undergraduate Statistics Curriculum at #JSM2017

I’m a bit late in posting this, but travel delays post-JSM left me weary, so I’m just getting around to it. Better late than never?

Wednesday at JSM featured an invited statistics education session on Modernizing the Undergraduate Statistics Curriculum. This session featured two types of speakers: those who are currently involved in undergraduate education and those who are on the receiving end of graduating majors. The speakers involved in undergraduate education presented their recent efforts to modernize the undergraduate statistics curriculum so that it provides the computational and problem-solving skills expected of today’s statistician while also offering a firm grounding in theory and methods. The speakers representing industry discussed their expectations (or hopes and dreams) for new graduates and where they find gaps in the knowledge of new hires.

The speakers were Nick Horton (Amherst College), Hilary Parker (Stitch Fix), Jo Hardin (Pomona College), and Colin Rundel (Duke University). The discussant was Rob Gould (UCLA). Here are the slides for each of the speakers. If you have any comments or questions, let us know in the comments.

Modernizing the undergraduate statistics curriculum: what are the theoretical underpinnings? – Nick Horton

Hopes and dreams for statistics graduates – Hilary Parker

Expectations and Skills for Undergraduate Students Doing Research in Statistics and Data Science – Jo Hardin

Moving Away from Ad Hoc Statistical Computing Education – Colin Rundel

Discussion – Rob Gould

Novel Approaches to First Statistics / Data Science Course at #JSM2017

Tuesday morning, bright and early at 8:30am, was our session titled “Novel Approaches to First Statistics / Data Science Course”. For some students the first course in statistics may be the only quantitative reasoning course they take in college. For others, it is the first of many in a statistics major curriculum. The content of this course depends on the audience it is aimed at as well as its place in the curriculum. However, a data-centric approach with an emphasis on computation and algorithmic thinking is essential for all modern first statistics courses. The speakers in our session presented their approaches to the various first courses in statistics and data science that they have developed and taught. The discussion also highlighted the pedagogical and curricular choices they made in deciding what to keep, what to eliminate, and what to modify from the traditional introductory statistics curriculum. The speakers in the session were Ben Baumer from Smith College, Rebecca Nugent from CMU, myself, and Daniel Kaplan from Macalester College. Our esteemed discussant was Dick DeVeaux, and our chair, the person who managed to keep this rambunctious bunch on time, was Andrew Bray from Reed College. Here are the slides for each of the speakers. If you have any comments or questions, let us know in the comments, or find us on social media!

Ben Baumer – Three Methods Approach to Statistical Inference

Rebecca Nugent – Lessons Learned in Transitioning from “Intro to Statistics” to “Reasoning with Data”

Mine Çetinkaya-Rundel – A First-Year Undergraduate Data Science Course

Daniel Kaplan – Teaching Stats for Data Science

Dick DeVeaux – Discussion


My JSM 2017 itinerary

JSM 2017 is almost here. I just landed in Maryland, and I finally managed to finish combing through the entire program. What a packed schedule! I like writing an itinerary post each year, mainly so I can come back to it during and after the event. I obviously won’t make it to all the sessions listed for each time slot below; my choice of which one(s) to attend in any time period will likely depend on proximity to the previous session, and potentially also on proximity to the childcare area.

The sessions I selected focus on education, data science, computing, visualization, and social responsibility. In addition to talks on topics I actively work in, I also enjoy listening to talks in application areas I’m interested in, hence the last topic on this list.

If you have suggestions for other sessions (in these topics or others) that you think would be interesting, let me know in the comments!

Sun, 7/30/2017

Sunday will be mostly meetings for me, and I’m skipping any evening stuff to see Andrew Bird & Belle and Sebastian!

Mon, 7/31/2017

  • DataFest meeting: 10am – 12pm at H-Key Ballroom 9. Stop by if you’re already an ASA DataFest organizer, or if you’d like to be one in the future!
    • First hour will be discussing what worked and what didn’t, any concerns, kudos, advice for new sites, etc.
    • Second hour will be drop-in for addressing any questions regarding organizing an ASA DataFest at your institution.
  • Computing and Graphics mixer: 6 – 8pm at H-Key Ballroom 1.
  • Caucus for Women in Statistics Reception and Business Meeting: 6:30 – 8:30pm at H-Holiday Ballroom 1&2.

8:30 AM – 10:20 AM

10:30 AM – 12:20 PM

2:00 PM – 3:50 PM

4:00 PM – 5:50 PM

ASA President’s Invited Speaker: It’s Not What You Said. It’s What They Heard – Jo Craven McGinty, The Wall Street Journal

Tue, 8/1/2017

8:30 AM – 10:20 AM

10:30 AM – 12:20 PM

2:00 PM – 3:50 PM

4:00 PM – 5:50 PM

Deming Lecture: A Rake’s Progress Revisited – Fritz Scheuren, NORC-University of Chicago

Wed, 8/2/2017

  • Statistical Education Business Meeting – 6-7:30pm

8:30 AM – 10:20 AM

10:30 AM – 12:20 PM

2:00 PM – 3:50 PM

4:00 PM – 5:50 PM

COPSS Awards and Fisher Lecture: The Importance of Statistics: Lessons from the Brain Sciences – Robert E. Kass, Carnegie Mellon University

Thu, 8/3/2017

8:30 AM – 10:20 AM

10:30 AM – 12:20 PM

StatPREP Workshops

This last weekend I helped Danny Kaplan and Kathryn Kozak (Coconino Community College) put on a StatPREP workshop. We were also joined by Amelia McNamara (Smith College) and Joe Roith (St. Catherine University). The idea behind StatPREP is to work directly with college-level instructors, both online and in community-based workshops, to develop the understanding and skills needed to work and teach with modern data.

Danny Kaplan ponders at #StatPREP

One of the most interesting aspects of these workshops was the tutorials and exercises that the participants worked on. These used the R package learnr, which allows people to create interactive tutorials via R Markdown. These tutorials can incorporate code chunks that run directly in the browser (when the tutorial is hosted on an appropriate server) as well as Shiny apps. They can also include exercises and quiz questions.

An example of a code chunk from the learnr package.

Within these tutorials, participants were introduced to data wrangling (via dplyr), data visualization (via ggformula), and data summarization and simulation-based inference (via functions from Project Mosaic). You can see and try some of the tutorials from the workshop here. Participants, in breakout groups, also envisioned a tutorial and, with the help of the workshop presenters, turned it into the skeleton of a tutorial (we got some things working, while others are just outlines; we only had a couple of hours).

You can read more about the StatPREP workshops and opportunities here.



Read elsewhere: Organizing DataFest the tidy way

Part of the reason why we have been somewhat silent at Citizen Statistician is that it’s DataFest season, and that means a few weeks (months?) of all-consuming organization followed by a weekend of super fun data immersion and exhaustion… Each year that I organize DataFest, I tell myself, “next year, I’ll do [blah] to make my life easier.” This year I finally did it! Read about how I’ve been streamlining registrations, registration confirmations, and the dissemination of information prior to the event in my post titled “Organizing DataFest the tidy way” on the R Views blog.

Stay tuned for an update on ASA DataFest 2017 once all 31 DataFests around the globe have concluded!

Theaster Gates, W.E.B. Du Bois, and Statistical Graphics

After reading this review of a Theaster Gates show at Regen Projects, in L.A., I hurried to see the show before it closed. Inspired by sociologist and civil rights activist W.E.B. Du Bois, Gates created artistic interpretations of statistical graphics that Du Bois had produced for an exhibition in Paris in 1900. Coincidentally, I had just heard about these graphics the previous week at the Data Science Education Technology conference while eavesdropping on a conversation Andy Zieffler was having with someone else. What a pleasant surprise, then, when I learned about this exhibit almost as soon as I got home.

I’m no art critic (but I know what I like), and I found these works to be beautiful, simple, and powerful. What startled me, when I looked for the Du Bois originals, was how little Gates had changed the graphics. Here’s one work (I apologize for not knowing the title. That’s the difference between an occasional blogger and a journalist.) It hints at Mondrian, and the geometry intrigues. Up close, the colors are rich and textured.

Here’s Du Bois’s circa-1900 mosaic-type plot (from a source that provides a nice overview of the exhibit for which Du Bois created his innovative graphics).

The title is “Negro business men in the United States”. The large yellow square is “Grocers,” the blue square is “Undertakers,” and the green square below it is “Publishers.” More are available at the Library of Congress.

Here’s another pair. The Gates version raised many questions for me. Why were the bars irregularly sized? What was the organizing principle behind the original? Were the categories sorted in increasing order, with Gates adding some irregularities for visual interest? What variables are on the axes?

The answer is no: Gates did not vary the lengths of the bars, only the colors.

The vertical axis displays dates ranging from 1874 to 1899 (just one year before Du Bois assembled the graphics from a wide variety of sources). The horizontal axis shows acres of land, with values from 334,000 to 1.1 million.

Using data to support civil rights has a long history. A colleague once remarked that there is a great unwritten book about the role that data and statistical analysis played (and continue to play) in the gay civil rights movement (and perhaps it has been written?). And the folks at We Quant LA have a nice article demonstrating some of the difficulties in using open data to ask questions about racial profiling by the LAPD. In this age of alternative facts and fake news, it’s wise to be careful and precise about what we can and cannot learn from data. And it is encouraging to see the role that art can play in keeping this dialogue alive.

JSM 2016 session on “Doing more with data”

The ASA’s most recent curriculum guidelines emphasize the increasing importance of data science, real applications, model diversity, and communication / teamwork in undergraduate education. In an effort to highlight recent efforts inspired by these guidelines, I organized a JSM session titled Doing more with data in and outside the undergraduate classroom. This session featured talks on recent curricular and extra-curricular efforts in this vein, with a particular emphasis on challenging students with real and complex data and data analysis. The speakers discussed how these pedagogical innovations aim to educate and engage the next generation, and help them acquire the statistical and data science skills necessary to succeed in a future of ever-increasing data. I’m posting the slides from this session for those who missed it as well as for those who want to review the resources linked in the slides.

Computational Thinking and Statistical Thinking: Foundations of Data Science

by Ani Adhikari and Michael I. Jordan, University of California at Berkeley


Learning Communities: An Emerging Platform for Research in Statistics

by Mark Daniel Ward, Purdue University


The ASA DataFest: Learning by Doing

by Robert Gould, University of California at Los Angeles

(See if you’re interested in organizing an ASA DataFest at your institution.)


Statistical Computing as an Introduction to Data Science

by Colin Rundel, Duke University [GitHub]

JSM 2016 session on Reproducibility in Statistics and Data Science

Will reproducibility always be this hard?

Ten years after Ioannidis alleged that most scientific findings are false, reproducibility (or the lack thereof) has become a full-blown crisis in science. Flagship journals like Nature and Science have published hand-wringing editorials and revised their policies in the hopes of heightening standards of reproducibility. In the statistical and data sciences, the barriers to reproducibility are far lower, given that our analyses can usually be digitally encoded (e.g., scripts, algorithms, data files, etc.). Failure to ensure the credibility of our contributions will erode “the extraordinary power of statistics,” both among our colleagues and in our collaborations with scientists of all fields. This morning’s JSM session on Reproducibility in Statistics and Data Science featured talks on recent efforts in pursuit of reproducibility. The slides of talks by the speakers and the discussant are posted below.

Note that some links point to a GitHub repo that includes the slides as well as other useful resources for the talk and for adopting reproducible frameworks in your research and teaching. I’m also including the speakers’ Twitter handles, which are likely the most efficient way to get in touch with them if you have any questions.

This session was organized by Ben Baumer and myself as part of our Project TIER fellowship. Many thanks to Amelia McNamara, who is also a Project TIER fellow, for chairing the session (and correctly pronouncing my name)!

  • Reproducibility for All and Our Love/Hate Relationship with Spreadsheets – Jenny Bryan – repo, including slides – @JennyBryan
  • Steps Toward Reproducible Research – Karl Broman – slides – @kwbroman
  • Enough with Trickle-Down Reproducibility: Scientists, Open This Gate! Scientists, Tear Down This Wall! – Karthik Ram – slides – @_inundata
  • Integrating Reproducibility into the Undergraduate Statistics Curriculum – Mine Çetinkaya-Rundel – repo, including slides – @minebocek
  • Discussant: Yihui Xie – slides – @xieyihui

PS: Don’t miss this gem of a repo for links to many, many more JSM 2016 slides. Thanks, Karl, for putting it together!