Book Review: R Graphs Cookbook

RickJWagner writes "Once upon a time, I thought communication was one of my strong suits. Alas, a few years into my programming career I realized I'm more of the head-down codeslinging type, not one of the schmoozing managerial types. So when I have a point to make, I really like to have my data ready to do the talking for me. In that capacity, this book is a very good weapon to have in my arsenal." Read on for the rest of Rick's review.
R Graphs Cookbook
author Hrishi Mittal
pages 272
publisher Packt Publishing
rating 8/10
reviewer RickJWagner
ISBN 1849513066
summary An invaluable reference book for expert R users
Right away, you should realize this is not a book that teaches R. R (an excellent open source statistical language) is a great tool for any technician. I've used it to analyze logs, find performance bottlenecks, and make sense of mountains of nearly unrecognizable data. But this book doesn't teach R; it teaches R graphing.

It turns out R has excellent graphing capabilities. You can draw scatter plots, line plots, pie graphs, bar charts, histograms, box and whisker plots, heat maps, contour maps and 'regular' maps. These are all good for demonstrating data in different ways, and the book lightly explains which graph will help you illustrate which point.

If you're getting a little interested, you'll also want to know that all this graphing can be scripted and scheduled. Once you know how to write the graphing scripts, you can schedule them with cron or a similar facility and get data-driven reports on a regular basis. One small caveat: To prepare your data for presentation, I think it's usually necessary to partner R with another language that's better for text extraction and manipulation. I prefer Python for this task, but you might like another language.

The book is exceptionally easy to read and work with. This doesn't mean it's simplistic, though. Anyone who's tangled with R's graphing without a good example will testify that figuring out the various functions and arguments necessary to wrangle a descriptive graph can be really difficult. This book gives you the kind of graphs you need, with the bells and whistles you're going to want, in a series of snippets you can run immediately.

The book is written in Packt's "Recipe" format. In a nutshell, this means that it's a series of how-to sections worded in a templated form. There are headings for sections that inform you what you're going to accomplish, how it's done, and why it worked. You quickly realize it's a repetitive format, but it serves to make the book an excellent resource for quick reference.

Another really nice feature of the book is the downloadable source code and matching data. Knowing the data is half the battle, really. The specific formulas given are certainly useful, but without knowing how the underlying data is formatted you really wouldn't get nearly the practical value. For that reason, I urge anyone using this book to be sure they examine the underlying data for at least the first few formulas. After that, it'll be automatic: you'll know you want to look at that data when you're trying to master some graph type. Then when you go to make your own data ready for graphing, you reach for that secondary language like Python, extract the fields you want in a way similar to your example data set, and presto, you've got the graph you want.
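
As a rough illustration of that extraction step, here is a minimal Python sketch. The log format, the column names, and the helper functions extract_rows and write_csv are all assumptions made up for this example; none of them come from the book or its data sets.

```python
import csv
import io
import re

# Hypothetical log format, assumed for illustration only:
#   2011-04-18 13:31:02 GET /index.html 200 123ms
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms$"
)


def extract_rows(lines):
    """Yield (date, path, status, ms) for lines that match; skip the rest."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield (
                match["date"],
                match["path"],
                int(match["status"]),
                int(match["ms"]),
            )


def write_csv(rows, out):
    """Write a header row plus one line per record to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "path", "status", "ms"])
    writer.writerows(rows)


sample = [
    "2011-04-18 13:31:02 GET /index.html 200 123ms",
    "garbage line that does not match",
    "2011-04-18 13:31:05 GET /about.html 404 45ms",
]

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
write_csv(extract_rows(sample), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Saved to a file, output shaped like this (a header row followed by comma-separated records) is the kind of input R's read.csv() can load, after which a numeric column such as ms could be passed to a plotting function.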

The book starts out with a first chapter that introduces the kinds of graphs you'll be able to produce and situations where each type is most useful. The next chapters, up until the final one, are in-depth sections on each of the graph types. Maps get a different chapter than pie graphs, for instance. The final chapter covers putting final touches on your graphs, including saving them in different formats (PDF, PNG, JPEG, etc.) and niceties like adding scientific notation, mathematical symbols, etc.

The book states that the target audience is experienced R programmers. I really don't think that's necessary, though. There is an obligatory R installation section, and I think that a reasonably competent programmer with Google at his disposal could get off the ground (for graphing purposes) with this book and a little bumbling. If you already know R, then you needn't worry at all; there is nothing here that will look foreign to you.

If I could change one thing about the book, I'd want a comprehensive index of all the functions and arguments that augment the basic core functions that produce the example graphs. These functions and arguments tweak the basic function in ways that make the graphs much more appealing than what the basic function alone can provide. But the book isn't able to show each and every combination with each graphing function, so it's up to the reader to figure out how to pick some of the options from one recipe and apply them to another. It's not difficult to do, but having an index to help you find the options you want would make this process easier.

You can purchase R Graphs Cookbook from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    How much are they paying you guys to keep putting these Packt reviews up?

    • by vlm ( 69642 )

      How much are they paying you guys to keep putting these Packt reviews up?

      I donno, he advocates for python when everyone else would use perl, and I'm sure the python guys are not paying for that...

  • by PCM2 ( 4486 ) on Monday April 18, 2011 @01:31PM (#35858474) Homepage

    ...is brought to you once again by the letter Packt and the number RickJWagner.

    • Are there no more O'Reilly books being published anymore? How about some reviews of new instant classics (like the Camel book [oreilly.com])?

      It's been all Packt, all the time for how long now?

      As a side note, PacktLib (all you can eat package) is more expensive at $220/yr than O'Reilly's Safari Online Books. Safari is $110/yr for the base package--5 books at a time. That's strange, since O'Reilly books have usually been considered the best tech books. Also they have books from a whole lot of other publishers, while Packt

      • I pay $0 for an all-I-can Safari ... of course, it is bought by my county library system, and I pay taxes and donate to the library system specifically, so I guess I do pay something...

        • by PCM2 ( 4486 )

          My library offers Safari access also (San Francisco Public Library). From the related page:

          You are signed in to Safari Books Online, paid for and licensed by your academic or public library. You are accessing a Custom Safari Books Online Library that contains a specially-tailored subset of 3,827 titles from Safari Books Online's overall content.

          Mind you, they point out that this is a subset of the full 12,000 books available on paid Safari. But 3,827 books is nothing to sneeze at.

      • by RDW ( 41497 )

        'Safari is $110/yr for the base package--5 books at a time.'

        For that money, you could also buy around 20 O'Reilly iPhone apps on iTunes. Each contains the unencrypted text of the book, which is easy to extract and re-package as a conventional ePub for use on any device:

        http://oreilly.com/ebooks/oreilly_iphone_tips.csp [oreilly.com]
        http://zef.me/3246/convert-cheap-oreilly-iphone-app-books-to-epub [zef.me]

    • What's the problem? I thought it was a decent review of the type of book that should be of interest to the /. readership - much preferred over the "paranoid spin on political events" type of stories that are taking over.
  • Or you can use Excel (Score:4, Informative)

    by AdamInParadise ( 257888 ) on Monday April 18, 2011 @01:46PM (#35858644) Homepage

    Or any other spreadsheet program.

    Now of course I admit that Excel is probably not as flexible as R. However, unless your job is to produce stunning, tailor-made graphs, a spreadsheet application will deliver results a lot faster.

    • Re: (Score:1, Insightful)

      by pclminion ( 145572 )
      If your data set is so small that a spreadsheet can open it, then your data set is a toy data set.
      • Re: (Score:2, Insightful)

        by Anonymous Coward

        If your data set is so small that a spreadsheet can open it, then your data set is a toy data set.

        Where's the +1,Smugly Superior mod option when you need it...

        Seriously, any data set that you encounter "in the wild" is by definition not a toy data set. There are many instances where using a spreadsheet to quickly visualize some figures is fine, just like there are many instances where using a word processor to write a letter instead of firing up TeX is fine.

        (Granted, you can easily write letters with TeX, too, and better-looking ones than what LibreOffice etc. will come up with, but that's because TeX has p

        • Where's the +1,Smugly Superior mod option when you need it...

          I was trying to be funny, not an asshole. Apparently I failed. I apologize.

          • Where's the +1,Smugly Superior mod option when you need it...

            I was trying to be funny, not an asshole. Apparently I failed. I apologize.

            Next time try adding a joke as a hint towards your intentions.

            • Next time try adding a joke as a hint towards your intentions.

              I was trying to poke fun at Excel's well known limitation on the maximum number of rows. Sometimes a joke's not funny if you need to spell it out.

      • by sseaman ( 931799 )

        If you're talking about the ridiculous row limit, that went away in Excel 2007. [wikipedia.org]

        However, like many researchers I have used several versions of Excel to produce publishable graphs from summary data--means, SEMs, etc. I love R, but it was only recently that I decided to spend enough time learning the ins and outs of its graphing capabilities that I felt comfortable producing even a bar chart in R for publication. Since I had been producing my tables in Excel anyway--and I'm still not entirely in love with us

        • Re: (Score:2, Informative)

          by Anonymous Coward

          Wrong row limit. Sure you can _have_ 1M+ rows but you can still only graph 32K of them at a time.

        • For lattice graphics, get Lattice: Multivariate Data Visualization with R, by the author of the lattice package in R. However, I would recommend instead the ggplot2 package, and the book ggplot2: Elegant Graphics for Data Analysis by its author. ggplot has all the functionality that lattice does, it produces prettier plots by default, and it's easier to specify graphs and edit them with a minimal change in code.
          • Hadley Wickham, author of ggplot2, is a prolific contributor of R modules. His documentation is fairly good, yet of the somewhat harried variety. You can get yourself quite lost by the amount of argument inheritance, which in R is entirely unlike tea. The book needs about 50% more material added, by someone who understands generic programming, stating precisely what operators are required for each argument passed into the ggplot hairball.

            Hadley also indulged in some proscriptive urges. One was not to pr

      • by lwsimon ( 724555 )

        Or you've offloaded all the heavy processing work to the database server - where it belongs - and are doing mere presentational work on your desktop...

      • If your data set is so small that a spreadsheet can open it, then your data set is a toy data set.

        Says someone who has obviously never worked in, say, the financial services industry.

    • what pclminion said. also: "this graphing can be scripted and scheduled."

      • by vlm ( 69642 )

        what pclminion said. also: "this graphing can be scripted and scheduled."

        The best part is not just tail ending "a graph" at the end of a script, but automating thousands of graphs at various resolutions, high, medium, and thumbnail, and then creating the index page that clicks thru. And send emails of "noteworthy" graphs to certain personnel. Add and remove graphs as they appear in the data set, all automatically. I would imagine my little couple minute script would take months to do manually in Excel, one graph at a time.. But my script runs daily...

        Excel is, unfortunately,

    • by Beryllium Sphere(tm) ( 193358 ) on Monday April 18, 2011 @02:14PM (#35858986) Journal

      People who know more about statistics than I do severely criticize Excel, e.g. http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf [uiowa.edu]

      • That paper appears to have been written by a bright but semi-literate 13 year old anti-MS geek, so I'm guessing it's someone on slashdot.
    • by plopez ( 54068 ) on Monday April 18, 2011 @02:34PM (#35859250) Journal

      I like R because:
      1) It can handle the large (million or more) data sets I need to crunch and compare

      2) Seriously, the latest versions of Excel seem to choke on larger datasets. The "Oh no! Excel is bogging down and getting ready to crash!" sensation is far too frequent. R is much more stable

      3) You can do nice graphics in R you can't do in Excel. See http://addictedtor.free.fr/graphiques/ [addictedtor.free.fr]

      4) There is a huge number of pre-rolled *serious* statistical libraries already written, and open sourced (including GPL'd) for it. FFT, geospatial stats, multivariate linear and non-linear statistical modeling, time series analysis, linear algebra, and more. Including OOP. I am just exploring how R does OOP now.

      5) The scripting language is in the Lisp family. It works the way I think.

      6) You can compile and link in your own packages in Fortran (pick your flavor 77, 90, 95, '03, or '08), C, C++, etc. If it links, you can link it.

      Sweet. Also more stable than Matlab (and cheaper), and more user friendly than SAS.

      • by garcia ( 6573 )

        And R is free and SAS and/or Excel are not. For most here that would be the big deal breaker.

        While I use SAS myself, it's because it's available to me. However, I would not use Excel to build charts simply because if you have to change something it's very likely you will have to recreate the chart too. Personally I like running a block of code and having the output get e-mailed to the report's recipient each day/week/month/quarter/foo w/o me having to do anything manually.

        Excel = manual and that scares the

    • And just how do you write a UNIX script that can automatically aggregate the desired data and run it through R using Excel (without having to ship the data off your UNIX system via Samba or some other roundabout way)?

      I'll bet most of the users of R are working on some sort of UNIX/Linux system as is common in the scientific community.
    • I'm sorry, but if you think Excel's graphs are good for much of anything, or you think they are easy to edit and reformat, you are grossly mistaken. I'm no novice: I've written spreadsheets with named variables so I can change the content of Excel graphs by changing names or data in cells.

      Before you get snarky about R, at least take the time to find one of the web sites dedicated to displaying charts, maps, and graphs generated with R. Most of them are far beyond anything Excel can do.
      If all you want ar

      • I should add: I've even written a set of macros in Excel that let Excel play Pong against itself. I bring it out whenever someone says "but I can do that in Excel..." to which I say, "I can do this... but just because you can does not mean you should." Sic semper Excel graphics.

    • by dtdmrr ( 1136777 )

      a spreadsheet application will deliver results a lot faster.

      Not really, particularly if you have the data already entered. Running:
      R
      data = read.csv("data.csv")
      hist(data[[1]])  # hist() needs a numeric vector, so pass the first column

      takes far less time than selecting your columns, dragging the mouse over to the graph button, selecting the region for your plot, and then trudging through a multi-stage wizard. Even if you actually want to type in some data in a spreadsheet, it's frequently faster to save the table and load it up in R or gnuplot to graph it. And if you do want something like a histogram or a boxplot, excel doesn

    • Now of course I admit that Excel is probably not as flexible as R. However, unless your job is to produce stunning, tailor-made graphs, a spreadsheet application will deliver results a lot faster.

      R is not a graphing language. It's a statistics language. If you just want to plot your sales growth by quarter, sure, a spreadsheet is much more convenient. But professional-quality graphs aren't the only (or even the primary) reason for R.

      R has an enormous library of very well refined statistics functions. Spreadsheets are not designed to handle hundreds of thousands of data points, cross-correlations, advanced data transforms, and all kinds of analysis that spreadsheets don't (and shouldn't) have.

    • by 1E06 ( 971987 )
      Please, please, please have a look at http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf [uiowa.edu] and at http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html [burns-stat.com] "The hard way looks easy, the easy way looks hard."
    • by Paltin ( 983254 )
      No.

      Graphing things in R is much faster.



      plot(foo$bar,foo$blarg)

      Done.

      As opposed to highlighting columns, switching to insert chart, inserting.... making sure everything is in the right place...
  • by proxima ( 165692 ) on Monday April 18, 2011 @02:12PM (#35858956)

    R makes great graphs functionally speaking, but without mucking about with the options and some post-processing they are not the most attractive. Open up your favorite financial/data intensive news source and look at the visuals and you'll find that generating that style with just code is fairly difficult.

    Until about Office 2007, the defaults in Excel charts were also atrocious. Openoffice.org is still pretty bad, and Matlab is not much better than R. The good news is that you can generate PDFs from each of these and easily open them in Inkscape/Illustrator, where making vector-based edits is easy.

    Anyone who regularly visualizes data needs to pick up resources on how to clearly organize and display your data, like "The Visual Display of Quantitative Information" by Edward Tufte (though some of his examples are a little dated). Books like that are full of examples that would be very tricky to replicate without any post processing, because it usually involves eliminating excessive lines and cluttering detail.

    • R makes great graphs functionally speaking, but without mucking about with the options and some post-processing they are not the most attractive.

      Base graphics aren't that nice looking, but that's why ggplot and lattice exist. You can fairly easily produce publication quality graphs with them without spending much time dealing with additional options. There are also packages which produce many of the plots which Tufte promulgates.

    • by plopez ( 54068 )

      per my other post see: http://addictedtor.free.fr/graphiques/ [addictedtor.free.fr]

    • MATLAB is BETTER than R? Holy shit, R must look fucking terrible, because even in MATLAB 2010b, after a bit of editing, the result is still fucking hideous.

    • by Lorens ( 597774 )

      Anyone who regularly visualizes data needs to pick up resources on how to clearly organize and display your data, like "The Visual Display of Quantitative Information" by Edward Tufte (though some of his examples are a little dated).

      For a modern example please see Hans Rosling :

      http://singularityhub.com/2010/12/09/hans-rosling-shows-you-200-years-of-global-growth-in-4-minutes-video/ [singularityhub.com]

      Really. I've shown it to my parents, wife, and my two kids (sub-teen), they were all totally enthralled.

  • Far, far too basic. (Score:4, Informative)

    by dondelelcaro ( 81997 ) <don@donarmstrong.com> on Monday April 18, 2011 @02:14PM (#35858982) Homepage Journal

    Just from examining the few preview pages on amazon.com, this book appears to be far too basic for anyone who has actually done any serious work with R. I personally would forgo this entire book, and spend the time wandering through the R Graph Gallery [addictedtor.free.fr] which has far more examples with source code and underlying data. It's also rather odd that this book doesn't cover ggplot, grid graphics, lattice, or any of the more commonly used tools in advanced R graphics.

    Perhaps this book could be useful as your first foray into graphing with R... but I'm unconvinced it even covers that well.

    • I think that is generally true of books from Packt Publishing. They present an introduction to the topic only and just when you are getting to the point where some real-world depth is needed to solve your problem they give out. As such I avoid books from them.

  • nuff said! - try it!

  • Let's just be careful we are not overly reliant on pure data in the first place. Or you become susceptible to these (http://pastebin.com/p2HfGx1L) techniques. P.S. Sorry for the pastebin link, but it looks like Venkat took down his online email archives...
