A few weeks ago I gave a two-hour Introduction to R workshop for the Master of Engineering Management students at Duke. The session was organized by the student-led Career Development and Alumni Relations committee within this program. The slides for the workshop can be found here and the source code is available on GitHub. Why might this be of interest to you? The materials can give you a sense of what’s feasible to teach in two hours to an audience that is not scared of programming but is new to R.

Continue reading

In one of our previous posts (Halloween: An Excuse for Plotting with Icons), we gave a quick tutorial on plotting with icons in ggplot. A reader, Dr. D. K. Samuel, asked in a comment how to use multiple icons. His comment read:

...can you make a blog post on using multiple icons for such data

year, crop,yield
1995,Tomato,250
1995,Apple,300
1995,Orange,500
2000, Tomato,600
2000,Apple, 800
2000,Orange,900

it will be nice to use icons for each data point.

Continue reading

In my course on the GLM, we are discussing residual plots this week. Given that it is also Halloween this Saturday, it seems like a perfect time to code up a residual plot made of ghosts. The process I used to create this plot is as follows: Find an icon that you want to use in place of the points on your scatterplot (or dot plot). I used a ghost icon (created by Andrea Mazzini) obtained from The Noun Project.

Continue reading

Are you looking for a way to celebrate World Statistics Day? I know you are. And I can’t think of a better way than supporting the African Data Initiative (ADI). I’m proud to have met some of the statisticians, statistics educators and researchers who are leading this initiative at an International Association for Statistical Education Roundtable workshop in Cebu, the Philippines, in 2012. You can read about Roger and David Stern’s projects in Kenya here in the journal Technology Innovations in Statistics Education.

Continue reading

In one of our older posts, I wrote about using feedreaders and aggregators to keep up to date on blogs, journals, etc. These work great when the site you want to read has an RSS feed. RSS (Rich Site Summary) is simply a format that a website uses to publish its updated content. A feedreader (or aggregator) grabs the “feed” and displays the updated content for you to read. No more having to visit the website daily to see if something changed or was updated.
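To make the idea concrete, here is a minimal sketch of what a feedreader does with a feed, written in R with the xml2 package. The feed contents, titles, and URLs below are hypothetical and are inlined so the example runs offline; a real reader would fetch the XML from the site's feed address instead.

```r
library(xml2)

# Hypothetical RSS 2.0 feed, inlined for illustration (not a real site)
feed_xml <- '<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>'

# Parse the feed and pull out one node per post
feed  <- read_xml(feed_xml)
items <- xml_find_all(feed, "//item")

# Collect the title and link of each post into a data frame
posts <- data.frame(
  title = xml_text(xml_find_first(items, "./title")),
  link  = xml_text(xml_find_first(items, "./link")),
  stringsAsFactors = FALSE
)

print(posts)
```

Each `<item>` is one piece of updated content; the reader simply re-parses the feed on a schedule and shows the items it has not seen before.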

Continue reading

This post is about the ggplot2 and dplyr packages, so let’s start with loading them:

library(ggplot2)
library(dplyr)

I can’t be the first person to make the following mistake:

ggplot(mtcars, aes(x = wt, y = mpg)) %>%
  geom_point()

Can you spot the mistake in the code above? Look closely at the end of the first line. The operator should be the + used in ggplot2 for layering, not the %>% operator used in dplyr for piping, like this:
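A minimal sketch of the corrected call follows. The second plot, including its `filter()` step and the `p_four_cyl` name, is an illustrative assumption rather than part of the original post; it shows the two operators working side by side, with `%>%` passing data between dplyr steps and `+` adding ggplot2 layers.

```r
library(ggplot2)
library(dplyr)

# Corrected version: layers are added with +, not %>%
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Illustrative assumption: pipe the data through dplyr first,
# then switch to + once ggplot() has been called
p_four_cyl <- mtcars %>%
  filter(cyl == 4) %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point()

print(p)
print(p_four_cyl)
```

A handy rule of thumb: use `%>%` up to and including the `ggplot()` call, and `+` for everything after it.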

Continue reading

The LA Times reported today, along with several other sources, that the California Department of Justice has initiated a new “open justice” data initiative. On their portal, the “Justice Dashboard”, you can view Arrest Rates, Deaths in Custody, or Law Enforcement Officers Killed or Assaulted. I chose, for my first visit, to look at Deaths in Custody. At first, I was disappointed with the quality of the data provided. Instead of data, you see some nice graphical displays, mostly univariate but a few with two variables, addressing issues and questions that are probably on many people’s minds.

Continue reading

Citizen Statistician

Learning to swim in the data deluge