Every couple of years, the International Association for Statistical Education hosts a Roundtable discussion, at which researchers, statisticians, and curriculum developers from around the world gather to share ideas. The 2012 Roundtable, held in Cebu City, the Philippines, focused on the role of technology in statistics education, and so, after a very long stretch of editing (for me and Jennifer Kaplan) and re-writing (for our authors), we are now ready to present the Roundtable Special Edition. The articles cover the spectrum: K-12, introductory statistics, and beyond. Versions of these articles appeared in the Proceedings, but the versions published here are peer-reviewed, re-written, and re-written again. Topics include: designing computer games to teach data science, measuring teachers' attitudes towards technology in their classrooms, deciding which features make a successful online course, how best to teach students to use statistical packages, some exciting innovations for teaching inference and experimental design, as well as descriptions of exciting developments in statistics education in Kenya, Malaysia, and more!
The website Lifehacker recently had an article about some common statistical misconceptions. I thought they did a great job explaining things like the base-rate fallacy and Simpson’s Paradox for a lay audience. I also really liked the extrapolation cartoon they picked. [Read the whole article here.]
Two years ago, my department created a new two-course, doctoral-level sequence aimed primarily at our quantitative methods students. Aside from our own students, the sequence also attracts students from other departments (primarily in the social sciences) who plan to pursue more advanced methodological coursework (e.g., Hierarchical Linear Modeling).
One of the primary characteristics that differentiates this new sequence of courses from the other doctoral sequence of methodology courses that we teach is that it is “more rigorous”. This adjective, rigorous, bothers me. It bothers me because I don’t know what it means.
How do I know if a class is rigorous? When I ask my colleagues, the response is more often than not akin to Supreme Court Justice Potter Stewart’s “definition” of pornography (see Jacobellis v. Ohio)…I may not be able to define what a ‘rigorous course’ is, but you’ll know it when you take one.
In my experience, students associate rigor with the amount (and perhaps the complexity) of mathematics that appears in the course. Rigor also seems to be directly associated with the amount of homework and the difficulty of the assessments.
I think that I relate rigor to the degree to which a student is pushed intellectually. Because of this, I have a hard time associating rigor with a particular course. In my mind, rigorousness is an interaction between the content, the assessment and the student. The exact same course taught in different semesters (or different sections within a semester) has, in my mind, had differing levels of rigor, not because the content (nor assessment) has changed, but because the student make-up has been different.
The experience in the classroom, as much as we try to standardize it in the curriculum, is very different from one class to the next. A single question or curiosity might change the tenor of a class (to the good or the bad). And, try as I might to recreate the thoughtful questions or digressions of learning in future iterations of the course, the academic result often never matches that of the original.
So maybe having students who are all interested in statistics in a single course leads to a more nuanced curiosity and thereby rigor. On the other hand, there is much to be said for courses in which the students have a variety of backgrounds and academic interests. I think rigor can exist in both types of courses. Or maybe I am completely wrong and rigor is something more tangible. Is there such a thing as a rigorous course?
What do we fear more? Losing data privacy to our government, or to corporate entities? On the one hand, we (still) have oversight over our government. On the other hand, the government is (still) more powerful than most corporate entities, and so perhaps better situated to frighten.
In these times of Snowden and the NSA, the L.A. Times ran an interesting story about just how much tracking various internet companies perform, and it's alarming ("They're watching your every move," July 10, 2013; interestingly, the story does not seem to appear on their website as of this posting). Like the government, most of these companies claim that (a) their 'snooping' is algorithmic, so no human sees the data, and (b) their data are anonymized. And yet…
To my knowledge, businesses aren’t required to adhere to, or even acknowledge, any standards or practices for dealing with private data. Thus, a human could snoop on particular data. We are left to ponder what that human will do with the information. In the best case scenario, the human would be fired, as, according to the L.A. Times, Google did when it fired an engineer for snooping on emails of some teenage girls.
But the data are anonymous, you say? Well, there's anonymous and then there's anonymous. As Latanya Sweeney taught us in the 1990s, knowing a person's zip code, gender, and date of birth is sufficient to uniquely identify 85% of Americans. And the L.A. Times reports a similar study in which just four hours of anonymized tracking data were sufficient to identify 95% of the individuals examined. So while your name might not be recorded, by merging enough data files, they will know it is you.
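For students (or curious readers), the re-identification logic is easy to demonstrate. Below is a minimal sketch in Python, using a tiny made-up table rather than any real data, that counts what fraction of records in a supposedly anonymized file are pinned down uniquely by the quasi-identifier triple of zip code, gender, and date of birth; all values are hypothetical.

```python
# Toy illustration (not the Sweeney study itself): how many rows in a
# hypothetical "anonymized" table are uniquely determined by the
# quasi-identifiers zip code, gender, and date of birth?
import pandas as pd

# Hypothetical anonymized records: no names, but quasi-identifiers remain.
records = pd.DataFrame({
    "zip":    ["55455", "55455", "90024", "90024", "10027"],
    "gender": ["F", "M", "F", "F", "M"],
    "dob":    ["1990-03-14", "1990-03-14", "1985-07-02", "1985-07-02", "1972-11-30"],
})

# Group by the quasi-identifier triple; groups of size 1 are uniquely identifiable.
group_sizes = records.groupby(["zip", "gender", "dob"]).size()
unique_share = (group_sizes == 1).sum() / len(records)

print(f"{unique_share:.0%} of these records are unique on zip + gender + dob")
```

Merge such a table with any file that carries names alongside the same three fields (a voter roll, say) and the "anonymous" records acquire names.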
This article fits in nicely with a fascinating, revelatory book I'm currently midway through: Jaron Lanier's Who Owns The Future? A basic theme of the book is that internet technology devalues products and goods (files) and values services (software). One process through which this happens is that we humans accept the marvelous free stuff that the internet provides (free Google searches, free Amazon shipping, easily pirated music files) in exchange for allowing companies to snoop. The companies turn our aggregated data into dollars by selling it to advertisers.
A side effect of this, Lanier explains, is a loss of social freedom. At some point, a service such as Facebook gets to be so large that failing to join means missing out on possibly rich social interactions. (Yes, I know there are those who walk among us who refuse to join Facebook. But these people are probably not reading this blog, particularly since our tracking 'bots tell us that most of our readers come from Facebook referrals. Oops. Was I allowed to reveal that?) So perhaps you shouldn't complain about being snooped on, since you signed away your privacy rights. (You did read the entire user agreement, right? Raise your hand if you did. Thought so.) On the other hand, if you don't sign, you become a social pariah. (Well, that's an exaggeration. For now.)
Recently, I installed Ghostery, which tracks the automated snoopers that follow me during my browsing. Not only “tracks”, but also blocks. Go ahead and try it. It’s surprising how many different sources are following your every on-line move.
I have mixed feelings about blocking this data flow. The data-snooping industry is big business, and is responsible, in part, for the boom in stats majors and, more importantly, the boom in stats employment. So indirectly, data-snooping helps pay my salary. Lanier has an interesting solution: individuals should be paid for their data, particularly when it leads to value. This means the era of 'free' is over; we might end up paying for searches and for reading Wikipedia. But he makes a persuasive case that the benefits exceed the costs. (Well, I'm only halfway through the book. But so far, the case is persuasive.)
Technology Innovations in Statistics Education has published a paper by Noleine Fitzallen that I think many readers of this blog will find interesting. She examines a group of young students to see the ways that they use Tinkerplots to analyze data. Here’s the abstract and link:
Exploration of the way in which students interacted with the software package, TinkerPlots Dynamic Data Exploration, to answer questions about a data set using different forms of graphical representations, revealed that the students used three dominant strategies – Snatch and Grab, Proceed and Falter, and Explore and Complete. The participants in the study were 12 year 5-and-6 students (11-12 years old) who completed data analysis activities and answered questions about the data analysis process undertaken. The data for the inquiry were collected by on-screen capture video as the students worked at the computer with TinkerPlots. Thematic analysis was used to explore the data to determine the students’ strategies when conducting data analysis within the software environment.
It just seems to me that this is what data science was meant to do: give us fun toys. This particular “toy”, called Every Noise at Once, lets you explore the musical universe. Ours may be the last blog to comment on it–I think I stumbled upon this too late. But it provides a great example for our students about the power of data analysis.
The data come from the company EchoNest, and the visualization (although that's a weak word for this; it's visual and aural, so maybe "visauralization"?) from their chief engineer Glenn McDonald. According to McDonald's blog (via www.furia.com, May 31, 2013), songs are depicted in a 10-dimensional space, reduced here to two dimensions. The vertical dimension runs from "organic" (at the bottom) to "mechanical" (at the top). I love the designation "organic", which says so much more than "acoustic". The horizontal axis is what McDonald calls "bounciness", with songs on the right being bouncier than songs on the left.
The joy of this visauralization is that it is interactive. Click on a genre and hear a representative sample. Click on the ">>" symbol next to the genre label, and it expands to show you practitioners of the genre.
I suppose part of me feels that music has too many labels, and this graph gives that point of view some support. And yet, I confess, it was quite satisfying to learn that there is a difference between "indie pop" and "indie rock". (Both are roughly equally bouncy, but pop is more mechanical.) "String Quartet" is its own genre, and if you double-click, you see the names of actual string quartets. The Takacs Quartet is apparently more mechanical than the Borodin Quartet. The only recording of the Takacs I have is of the Bartok quartets, and so I guess this makes sense. Still, string quartets consist of four stringed instruments, so I suppose the scale of the variation here must be quite small. A mechanical string quartet is, I suppose, one that amps its strings: I couldn't find the Kronos Quartet, which I looked for somewhere in the upper-right quadrant. Nor could I find my L.A.-based favorites, the Calder Quartet, whom I would expect to fall somewhere in the center-right of the graph.
Dimension reduction in all its many forms is an important part of the visualization world. Which raises the question: when do we teach this to our students? Can it be taught, in some form, in introductory statistics? These questions seem related to one of my pet peeves, namely that we don’t teach statistics students how to interpret maps. Maps are, today, summaries of data. Most are quite crude, but students should learn to be critical (in the constructive sense) of data maps. Is there a data-mapping framework that would allow us to teach how to be critical of heat maps, google-type maps, traffic maps, and maps of musical genres?
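To make the dimension-reduction idea concrete for students, here is a minimal sketch using principal components analysis. This is purely illustrative and is not McDonald's actual method; the "songs" and their ten features are randomly generated stand-ins.

```python
# Illustrative only: project 10-dimensional "song" feature vectors down to a
# 2-D map, the same flavor of reduction the genre map performs.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
songs = rng.normal(size=(200, 10))   # 200 hypothetical songs, 10 audio features each

pca = PCA(n_components=2)
coords = pca.fit_transform(songs)    # each song now has an (x, y) position to plot

print(coords.shape)                            # (200, 2)
print(pca.explained_variance_ratio_.round(2))  # variance captured by the two axes
```

Even a toy example like this lets students see what is gained (a plottable map) and what is lost (the variance the two retained axes fail to capture), which is exactly the kind of critical reading I would like them to apply to maps of all sorts.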
You can win $1000 for turning your Ph.D. thesis into an interpretive dance. More importantly, you will also receive a call-out from Science and get to perform your dance at TEDx in Belgium. The contest is open not only to recent Ph.D.s but to anyone who holds a Ph.D. (in the sciences), as well as to students currently working on one.
Gonzolabs has tips and examples over on their website. So put on your dancing shoes, grab your Ph.D. advisor, and do-si-do. Now, if I can only recruit the Jabbawockeez and figure out what a mixed-effects model looks like as a dance…
Ever since we wrote an article analyzing the articles published in the Statistics Education Research Journal (Zieffler et al., 2011), I have been thinking about the relationships within the network of literature published on statistics education. What are the pivotal articles? Which are foundational? How interconnected are the articles?
This spring I started documenting those relationships by putting together a social network of articles published in Technology Innovations in Statistics Education and the articles they referenced. I just finished that work and used Gephi to produce a couple network plots.
The first network graph (shown above) examines the community structure of the network by decomposing it into sub-networks, or communities. I have made the nodes for the actual TISE articles larger for ease of interpretation. The node labels are the first author's last name and year of publication. Currently (and not surprisingly), the sub-networks generally consist of an article published in TISE together with the literature it referenced. There are some commonalities between articles as well. For example, the two articles by McDaniel were identified as a single community. It will be interesting to see how these communities change as I add more literature to the network.
The second network graph has the size of the node and node label sized by in-degree. In this case, in-degree is a measure of how often a particular article was referenced. The most cited literature in TISE is:
- Chance, B., Ben-Zvi, D., Garfield, J., & Medina, E. (2007). The role of technology in improving student learning of statistics. Technology Innovations in Statistics Education, 1(1).
- Konold, C., & Miller, C. (2004). TinkerPlots Dynamic Data Exploration (Version 1–2). Emeryville, CA: Key Curriculum Press.
- American Statistical Association. (2010). Guidelines for assessment and instruction in statistics education (GAISE): College report. Alexandria, VA: Author.
- National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.
- R Core Team. (2011). R: A language and environment for statistical computing. ISBN 3-900051-07-0. Vienna, Austria.
At some point, it would be nice to do this by author as well.
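For readers who want to try something similar without Gephi, here is a minimal sketch of the in-degree calculation using Python's networkx package; the citation edges below are made-up placeholders, not the actual TISE data.

```python
# Edges point from a citing article to the work it references, so a node's
# in-degree counts how often that work is cited within the network.
import networkx as nx

citations = [
    ("ArticleA_2012", "Chance_2007"),
    ("ArticleB_2013", "Chance_2007"),
    ("ArticleB_2013", "Konold_2004"),
    ("ArticleC_2013", "GAISE_2010"),
]

G = nx.DiGraph(citations)

# Most-cited works, i.e., nodes with the highest in-degree.
for node, indeg in sorted(G.in_degree(), key=lambda x: x[1], reverse=True)[:3]:
    print(node, indeg)
```

The same graph object could then be handed to a community-detection routine, or exported for prettier plotting in Gephi.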
I just finished reading An Accidental Statistician: The Life and Memories of George E. P. Box. The book reads as though he is recounting his memories (it is aptly named) rather than as a formal biography. I enjoyed the stories and vignettes of his work and his intersections with other statisticians. The book also includes pictures of many famous statisticians (George's friends and family; Fisher was his father-in-law for a time) in social situations. My favorite was the picture of Dr. Frank Wilcoxon on his motorcycle (see below).
There were some very interesting and funny anecdotes. For example, recounting a trip to Israel, George says he was told to get to the airport very early because of the intense security measures. After standing in a non-moving line for several hours, he apparently quipped that he had never before physically seen a stationary process.
My favorite sections of the book were the stories he told of writing Statistics for Experimenters, his book—along with William (Bill) Hunter and Stu Hunter—on experimental design. He wrote about how the book evolved from mimeographed notes for a course he had taught to the published version. It took several years for them to finish the writing of the book, only to be met with horrible reviews. (Note: This makes me feel slightly better about the year it took to write our book.)
In a chapter written about Bill Hunter (who was one of George's graduate students at the University of Wisconsin), George relates that Bill started his PhD in 1960. After he finished (in 1963!), he was hired almost immediately by Wisconsin as an assistant professor. Three years later he was made associate professor, and in 1969 (eight years after he started his PhD) he was made full professor. Unbelievable!
Research Hacks is a series of blog posts about some of the tools, applications, and computer programs that I use in my workflow. Some of these I began using when I was a graduate student, and others I have picked up more recently. This is the second post in the series (see the first post, Feedreaders and Aggregators).
Electronically managing the absurdly large volume of articles, reports, book chapters, and other writings that academics procure is a huge way to save time and increase productivity. My initial way of managing these files (often PDFs) was to put them in a folder that corresponded to a particular project or paper. Because I could never find an article again (Spotlight was a long way from working well at this point), I often had multiple copies of the same paper residing on my computer. This also meant that my annotations were scattered across those copies.
When, one summer, I realized that I had 11 copies of a paper on covariational reasoning (the topic of my dissertation) on my computer, I laughed at the absurdity of this system and vowed to fix it. That is when I found Papers.
Papers (now in its second version, Papers2) is a management system for a person's "research library" (as they refer to it). It is sort of like iTunes for PDF files. You have a "library" of files (stored in only one place on your computer), and these are displayed in the Papers application (just like iTunes). You can then have "playlists" into which you put these files, without creating multiple copies. For example, you could have a "playlist" containing the references for each paper you are currently writing.
The search feature is great. If you are an organization nut like me, you can also input all sorts of metadata (publication type, tag words, photos, links to supplementary material, etc.). Papers can also output references for BibTeX or EndNote and has integration with Scrivener and Word. There are limited annotation tools within Papers at this point (although more in v2.1), but rumor has it that this is a big part of the future. There are also several workarounds using Dropbox, Skim, etc. Lastly, there are iPhone and iPad apps for Papers that I think are beautiful. Reading articles on the iPad is one of the coolest things ever.
Unfortunately, Papers is not free (although there is a substantial discount for students). Also, as far as I know, it is available only for the Mac. There are several other reference-management systems available as well; two of those are Zotero and Mendeley.
Each system has features that are really cool and some that aren't as well developed. Why did I choose Papers? At the time, it was pretty much the only choice that existed in a state that actually worked. (I seem to recall Mendeley had just been released as a beta version.) Would I make the same choice now? I am not sure, but I think so. My second choice would be Zotero. (I am a little concerned about what will happen to Mendeley now that it has been purchased by Elsevier.)
No matter what choice you make, let me make several suggestions.
- Begin using it immediately.
- Begin entering metadata for every paper you have right away. Don't be chintzy here. Yes, I know it is time-consuming, but that only gets worse as you accumulate more and more articles. Some of this can be automated, depending on the recency of the paper, etc.
- Learn how to use it to input references into a paper.
- Figure out a workflow for paper annotation (taking notes, highlighting, etc.)
Summer is a wonderful time to learn a new software program or computing language. Happy computing!