Fruit Plot: Plotting Using Multiple PNGs

In one of our previous posts (Halloween: An Excuse for Plotting with Icons), we gave a quick tutorial on how to plot using icons using ggplot. A reader, Dr. D. K. Samuel asked in a comment how to use multiple icons. His comment read,

…can you make a blog post on using multiple icons for such data
year, crop,yield
1995,Tomato,250
1995,Apple,300
1995,Orange,500
2000, Tomato,600
2000,Apple, 800
2000,Orange,900
it will be nice to use icons for each data point. It will also be nice if the (icon) data could be colored by year.

This blog post will address this request. First, the result…

fruit-plot

The process I used to create this plot is as follows:

  1. Find the icons that you want to use in place of the points on your scatterplot (or dot plot).

I used an apple icon (created by Creative Stall), an orange icon (created by Gui Zamarioli), and a tomato icon (created by Andrey Vasiliev); all obtained from The Noun Project.

  1. Color the icons.

After downloading the icons, I used Gimp, a free image manipulation program, to color each of the icons. I created a green version, and a blue version of each icon. (The request asked for the two different years to have different colors.) I also cropped the icons.

Given that there were only three icons, doing this manually was not much of a time burden (10 minutes after I selected the color palette—using colorbrewer.org). Could this be done programatically? I am not sure. A person, who is not me, might be able to write some commands to do this with ImageMagick or some other program. You might also be able to do this in R, but I sure don’t know how…I imagine it involves re-writing the values for the pixels you want to change the color of, but how you determine which of those you want is beyond me.

If you are interested in only changing the color of the icon outline, an alternative would be to download the SVGs rather than the PNGs. Opening the SVG file in a text editor gives the underlying syntax for the SVG. For example, the apple icon looks like this:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" x="0px" y="0px" viewBox="0 0 48 60" enable-background="new 0 0 48 48" xml:space="preserve">
  <g>
    <path d="M19.749,48c-1.662... />
    <path d="M24.001,14.866c-0.048, ... />
    <path d="M29.512, ... />
  </g>
<text x="0" y="63" fill="#000000" font-size="5px" font-weight="bold" font-family="'Helvetica Neue', Helvetica, Arial-Unicode, Arial, Sans-serif">Created by Creative Stall</text><text x="0" y="68" fill="#000000" font-size="5px" font-weight="bold" font-family="'Helvetica Neue', Helvetica, Arial-Unicode, Arial, Sans-serif">from the Noun Project</text>
</svg>

The three path commands draw the actual apple. The first draws the apple, the second path command draws the leaf on top of the apple, and the third draws the stem. Adding the text, fill=”blue” to the end of each path command will change the color of the path from black to blue (see below).

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" x="0px" y="0px" viewBox="0 0 48 60" enable-background="new 0 0 48 48" xml:space="preserve">
  <g>
    <path d="M19.749,48c-1.662 ... fill="blue" />
    <path d="M24.001,14.866c-0.048, ... fill="blue" />
    <path d="M29.512, ... fill="blue" />
  </g>
<text x="0" y="63" fill="#000000" font-size="5px" font-weight="bold" font-family="'Helvetica Neue', Helvetica, Arial-Unicode, Arial, Sans-serif">Created by Creative Stall</text><text x="0" y="68" fill="#000000" font-size="5px" font-weight="bold" font-family="'Helvetica Neue', Helvetica, Arial-Unicode, Arial, Sans-serif">from the Noun Project</text>
</svg>

This could easily be programmatically changed. Then the SVG images could also programmatically be exported to PNGs.

  1. Read in the icons (which are PNG files).

Here we use the readPNG() function from the png library to bring the icon into R.

library(png)
blue_apple = readPNG("~/Desktop/fruit-plot/blue_apple.png", TRUE)
green_apple = readPNG("~/Desktop/fruit-plot/green_apple.png", TRUE)
blue_orange = readPNG("~/Desktop/fruit-plot/blue_orange.png", TRUE)
green_orange = readPNG("~/Desktop/fruit-plot/green_orange.png", TRUE)
blue_tomato = readPNG("~/Desktop/fruit-plot/blue_tomato.png", TRUE)
green_tomato = readPNG("~/Desktop/fruit-plot/green_tomato.png", TRUE)
  1. Create the data.

Use the data.frame() function to create the data.

plotData = data.frame(
&nbsp; year = c(1995, 1995, 1995, 2000, 2000, 2000),
&nbsp; crop = c("tomato", "apple", "orange", "tomato", "apple", "orange"),
&nbsp; yield = c(250, 300, 500, 600, 800, 900)
)

plotData
  year   crop yield
1 1995 tomato   250
2 1995  apple   300
3 1995 orange   500
4 2000 tomato   600
5 2000  apple   800
6 2000 orange   900

Next we will add a column to our data frame that maps the year to color. This uses the ifelse() function. In this example, if the logical statement plotData$year == 1995 evaluates as TRUE, then the value will be “blue”. If it evaluates as FALSE, then the value will be “green”.

plotData$color = ifelse(plotData$year == 1995, "blue", "green")

plotData
  year   crop yield color
1 1995 tomato   250  blue
2 1995  apple   300  blue
3 1995 orange   500  blue
4 2000 tomato   600 green
5 2000  apple   800 green
6 2000 orange   900 green

Now we will use this new “color” column in conjunction with the “crop” column to identify the icon that will be plotted for each row. the paste0() function concatenates each argument together with no spaces between them. Here we are concatenating the color value, an underscore, and the crop value.

plotData$icon = paste0(plotData$color, "_", plotData$crop)

plotData
  year   crop yield color         icon
1 1995 tomato   250  blue  blue_tomato
2 1995  apple   300  blue   blue_apple
3 1995 orange   500  blue  blue_orange
4 2000 tomato   600 green green_tomato
5 2000  apple   800 green  green_apple
6 2000 orange   900 green green_orange
  1. Use ggplot to create a scatterplot of the data, making the size of the points 0.
library(ggplot2)

p = ggplot(data = plotData, aes(x = year, y = yield)) +
  geom_point(size = 0) +
  theme_bw() +
  xlab("Year") +
  ylab("Yield")
  1. Use a for() loop to add annotation_custom() layers (one for each point) that contain the image.

Similar to the previous post, we add new layers (in our case each layer will be an additional point) by recursively adding the layer and then writing this into p. The key is that the image name is now in the “icon” column of the data frame. The values in the “icon” column are character data. To make R treat these as objects we first parse the character data using the parse() function, and then we use eval() to have R evaluate the parsed expression. A description of this appears in this Stack Overflow question.

library(grid)

for(i in 1:nrow(plotData)){
  p = p + annotation_custom(
    rasterGrob(eval(parse(text = plotData$icon[i]))),
    xmin = plotData$year[i] - 20, xmax = plotData$year[i] + 20, 
    ymin = plotData$yield[i] - 20, ymax = plotData$yield[i] + 20
  )
} 

# Show plot
print(p)
  1. Some issues to consider and my alternative plot.

I think that plot is what was requested, but since I cannot help myself, I would propose a few changes that I think would make this plot better. First, I would add lines to connect each fruit (apple in 1995 to apple in 2000). This would help the reader to better track the change in yield over time.

Secondly, I would actually leave the fruit color constant across years and vary the color between fruits (probably coloring them according to their real-world colors). This again helps the reader in that they can more easily identify the fruits and also helps them track the change in yield. (It also avoids a Stroop-like effect of coloring an orange some other color than orange!)

Here is the code:

# Read in PNG files
apple = readPNG("~/Desktop/fruit-plot/red_apple.png", TRUE)
orange = readPNG("~/Desktop/fruit-plot/orange_orange.png", TRUE)
tomato = readPNG("~/Desktop/fruit-plot/red_tomato.png", TRUE)

# Plot
p2 = ggplot(data = plotData, aes(x = year, y = yield)) +
  geom_point(size = 0) +
  geom_line(aes(group = crop), lty = "dashed") +
  theme_bw()  +
  xlab("Year") +
  ylab("Yield") +
  annotate("text", x = 1997, y = 350, label = "Tomato created by Andrey Vasiliev from the Noun Project", size = 2, hjust = 0) +
  annotate("text", x = 1997, y = 330, label = "Apple created by Creative Stall from the Noun Project", size = 2, hjust = 0) +
  annotate("text", x = 1997, y = 310, label = "Orange created by Gui Zamarioli from the Noun Project", size = 2, hjust = 0)

for(i in 1:nrow(plotData)){
  p2 = p2 + annotation_custom(
    rasterGrob(eval(parse(text = as.character(plotData$crop[i])))),
    xmin = plotData$year[i] - 20, xmax = plotData$year[i] + 20, 
    ymin = plotData$yield[i] -20, ymax = plotData$yield[i]+20
  )
}

# Show plot
print(p2)

And the result…

fruit-plot2

Halloween: An Excuse for Plotting with Icons

In my course on the GLM, we are discussing residual plots this week. Given that it is also Halloween this Saturday, it seems like a perfect time to code up a residual plot made of ghosts.

Ghost plotThe process I used to create this plot is as follows:

  1. Find an icon that you want to use in place of the points on your scatterplot (or dot plot).

I used a ghost icon (created by Andrea Mazzini) obtained from The Noun Project. After downloading the icon, I used Preview to create a new PNG file that had cut out the citation text in the downloaded image. I will add the citation text at a later stage in the plot itself. This new icon was 450×450 pixels.

  1. Use ggplot to create a scatterplot of a set of data, making the size of the points 0.

Here is the code that will create the data and make the plot that I used.

plotData = data.frame(
  .fitted = c(76.5, 81.3, 75.5, 79.5, 80.1, 78.5, 79.5, 77.5, 81.2, 80.4, 78.1, 79.5, 76.6, 79.4, 75.9, 86.6, 84.2, 83.1, 82.4, 78.4, 81.6, 79.6, 80.4, 82.3, 78.6, 82.1, 76.6, 82.1, 87, 82.2, 82.1, 87.2, 80.5, 84.9, 78.5, 79, 78.5, 81.5, 77.4, 76.8, 79.4, 75.5, 80.2, 80.4, 81.5, 81.5, 80.5, 79.2, 82.2, 83, 78.5, 79.2, 80.6, 78.6, 85.9, 76.5, 77.5, 84.1, 77.6, 81.2, 74.8, 83.4, 80.4, 77.6, 78.6, 83.3, 80.4, 80.5, 80.4, 83.8, 85.1, 82.2, 84.1, 80.2, 75.7, 83, 81.5, 83.1, 78.3, 76.9, 82, 82.3, 85.8, 78.5, 75.9, 80.4, 82.3, 75.7, 73.9, 80.4, 83.2, 85.2, 84.9, 80.4, 85.9, 76.8, 83.3, 80.2, 83.1, 77.6),
  .stdresid = c(0.2, -0.3, 0.5, 1.4, 0.3, -0.2, 1.2, -1.1, 0.7, -0.1, -0.3, -1.1, -1.5, -0.1, 0, -1, 1, 0.3, -0.5, 0.5, 1.8, 1.6, -0.1, -1.3, -0.2, -0.9, 1.1, -0.2, 1.5, -0.3, -1.2, -0.6, -0.4, -3, 0.5, 0.3, -0.8, 0.8, 0.5, 1.3, 1.8, 0.5, -1.6, -2, -2.1, -0.8, 0.4, -0.9, 0.4, -0.4, 0.6, 0.4, 1.4, -1.4, 1.3, 0.4, -0.8, -0.2, 0.5, 0.7, 0.5, 0.1, 0.1, -0.8, -2.1, 0, 1.9, -0.5, -0.1, -1.4, 0.6, 0.7, -0.3, 1, -0.7, 0.7, -0.2, 0.8, 1.3, -0.7, -0.4, 1.5, 2.1, 1.6, -1, 0.7, -1, 0.9, -0.3, 0.9, -0.3, -0.7, -0.9, -0.2, 1.2, -0.8, -0.9, -1.7, 0.6, -0.5)
  )

library(ggplot2)

p = ggplot(data = plotData, aes(x = .fitted, y = .stdresid)) +
    theme_bw() + 
    geom_hline(yintercept = 0) +
    geom_point(size = 0) +
    theme_bw() +
    xlab("Fitted values") +
    ylab("Standarized Residuals") +
    annotate("text", x = 76, y = -3, label = "Ghost created by Andrea Mazzini from Noun Project")
  1. Read in the icon (which is a PNG file).

Here we use the readPNG() function from the png library to bring the icon into R.

library(png)
ghost = readPNG("/Users/andrewz/Desktop/ghost.png", TRUE)
  1. Use a for() loop to add annotation_custom() layers (one for each point) that contain the image.

The idea is that since we have saved our plot in the object p, we can add new layers (in our case each layer will be an additional point) by recursively adding the layer and then writing this into p. The pseudo-like code for this is:

for(i in 1:nrow(plotData)){
    p = p + 
      annotation_custom(
        our_image,
        xmin = minimum_x_value_for_the_image, 
        xmax = maximum_x_value_for_the_image, 
        ymin = minimum_y_value_for_the_image, 
        ymax = maximum_y_value_for_the_image
        ) 
    }

In order for the image to be plotted, we first have to make it plot-able by making it a graphical object, or GROB.

The rasterGrob() function (found in the grid,/b> package) renders a bitmap image (raster image) into a graphical object or GROB which can then be displayed at a specified location, orientation, etc. Read more about using Raster images in R here.

The arguments xmin, xmax, ymin, and ymax give the horizontal and vertical locations (in data coordinates) of the raster image. In our residual plot, we want the center of the image to be located at the coordinates (.fitted, .stdresid). In the syntax below, we add a small bit to the maximum values and subtract a small bit from the minimum values to force the icon into a box that will plot the icons a bit smaller than their actual size. (#protip: play around with this value until you get a plot that looks good.)

library(grid)

for(i in 1:nrow(plotData)){
    p = p + annotation_custom(
      rasterGrob(ghost),
      xmin = plotData$.fitted[i]-0.2, xmax = plotData$.fitted[i]+0.2, 
      ymin = plotData$.stdresid[i]-0.2, ymax = plotData$.stdresid[i]+0.2
      ) 
    }

Finally we print the plot to our graphics device using

print(p)

And the result is eerily pleasant!

The African Data Initiative

Are you looking for a way to celebrate World Statistics Day? I know you are. And I can’t think of a better way than supporting the African Data Initiative (ADI).

I’m proud to have met some of the statisticians, statisticis educators and researchers who are leading this initative at an International Association of Statistics Educators Roundtable workshop in Cebu, The Phillipines, in 2012. You can read about Roger and David’s Stern’s projects in Kenya here in the journal Technology Innovations in Statistics Education. This group — represented at the workshop by father-and-son Roger and David, and at-the-time grad students Zacharaiah Mbasu and James Musyoka — impressed me with their determination to improve international statistical literacy and  with their successful and creative pragmatic implementations to adjust to the needs of the local situations in Kenya.

The ADI is seeking funds within the next 18 days to adapt two existing software packages, R and Instat+ so that there is a free, open-source, easy-to-learn statistical software package available and accessible throughout the world. While R is free and open-sourced, it is not easy to learn (particularly in areas where English literacy is low). Instat+ is, they claim, easy to learn but not open-source (and also does not run on Linux or Mac).

One of the exciting things about this project is that these solutions to statistical literacy are being developed by Africans working and researching in Africa, and are not ‘imported’ by groups or corporations with little experience implementing in the local schools. One lesson I’ve learned from my experience working with the Los Angeles Unified School District is that you must work closely with the schools for which you are developing curricula; outsider efforts have a lower chance of success. I hope you’ll take a moment –in the next 18 days–to become acquainted with this worthy project!

World Statistics Day is October 20.  The theme is Better Data. Better Lives.

Research Hack: Obtaining an RSS Feed for Websites without RSS Feeds

In one of our older posts, I wrote about using feedreaders and aggregators to keep up-to-date on blogs, journals, etc. These work great when the site you want to read have an RSS feed. RSS (Rich Site Summary) is simply a format that retrieves updated content from a webpage. A feedreader (or aggregator) grabs the “feed” and displays the updated content for you to read. No more having to visit the website daily to see if something changed or was updated.

Many sites have an RSS feed and make the process of adding the content to your feedreader fairly painless. In general the process works something like this…look for the RSS icon, click it, and choose the reader you want to send the content to, and click subscribe.

Subscribing to an RSS Feed

Subscribing to an RSS Feed

But what about sites that do not have an RSS feed explicitly in place? It turns out that there are third-party applications that can piece together an RSS feed for these sites as well. There are several of these applications available online. I have used Page2RSS and had pretty good luck.

Page2RSSJust enter the URL of the site you want to keep tabs on into the “Page URL” box and click the “to RSS” button. This will produce another URL that can be copied and pasted into your feedreader or subscribed to directly.

I have successfully used this to add RSS feeds from many sites and journals to my feedreader. Below is a shot from Feedly (my current reader) on my iPad Mini of updated content (as of this morning) on the Journal of Statistics Education website.

JSE on Feedly