
R In a Nutshell

Posted by samzenpus
from the read-all-about-it dept.
joel.neely writes "R is a statistical computing environment that is fully compliant with state-of-the-art buzzwords: free, open-source, cross-platform, interactive, graphics, objects, closures, higher-order functions, and more. It is supported by an impressive collection of user-supplied modules through CRAN, the 'Comprehensive R Archive Network.' And now it has its own O'Reilly Nutshell book, R in a Nutshell, written by Joseph Adler. I am pleased to report that Adler has risen to the challenge of the highly regarded 'Nutshell' franchise. As is traditional for the series, this title mixes introduction, tutorial, and reference material in a style that is well suited to a reader who already has a background in programming but is a new or occasional user of R." Read on for the rest of Joel's review.
Title: R in a Nutshell
Author: Joseph Adler
Pages: 672
Publisher: O'Reilly
Rating: 9/10
Reviewer: Joel Neely
ISBN: 978-0-596-80170-0
Summary: A practical and engaging introduction to the R statistical system and its usage
As a curious newcomer to R who wanted to get going quickly, I was well served by Part 1, which provided an R kickstart. Chapter 1 covers the process of getting and installing R. It is short, to the point, and just works, addressing Windows, Mac OS X, and Linux/Unix with equal attention. Chapter 2, on the R user interface, introduces the range of options for interacting with R: the GUI (both the standard version and some enhanced alternatives), the interactive console, batch mode, and the RExcel package (which supports R inside a certain well-known spreadsheet). Chapter 3 uses a set of interactive examples to provide a quick tour of the R language and environment, establishing a task-oriented theme that carries through the rest of the book. The last chapter of Part 1 covers R packages. It summarizes the standard pre-loaded packages, introduces the tools to explore repositories and install additional packages, and concludes by explaining how to create new packages.

As a polyglot programmer who is always interested in seeing how a new language approaches programs and their construction, I enjoyed Part 2, which described the R language. This section begins with an overview in chapter 5, and then devotes a chapter each to R syntax, R objects, symbols and environments (central to understanding the dynamic nature of R), functions (including higher-order functions), and R's own approach to object-oriented programming. This section closes in chapter 11, with a discussion of techniques and tips for improving performance.

As a busy professional with data sitting on my hard drive that I'd like to understand better, I appreciated Part 3, with its practical emphasis on using R to load, transform, and visualize data. Chapter 12 presents alternatives for loading, editing, and saving data, from the built-in data editor, through file I/O in a variety of formats, to a mature set of database access options. Chapter 13 illustrates a range of techniques for manipulating, organizing, cleaning, and sorting data, in preparation for presentation or more detailed analysis. Chapter 14 introduces the reader to the wealth of graphical presentation options built into the R environment. There are so many charting types and details that this chapter could have been overwhelming, but Adler keeps the interest high and the mood light by drawing on an engaging variety of data: toxic chemical levels, baseball statistics, the topography of Yosemite Valley, demographic data, and even turkey prices. Chapter 15 is devoted to lattice graphics, the R implementation of the "trellis graphics" technique for data visualization developed at Bell Labs. This chapter illustrates the power of lattice graphics by exploring the question of why more babies are born on weekdays than weekends.

As a non-statistician who still occasionally needs to do some number-crunching, I'm sure I'll be returning to Part 4, with its detailed explanations and illustrations of analysis tools and techniques: almost two hundred pages' worth. In chapters 16 through 20, Adler surveys topics in data analysis, probability, statistics, power tests, and regression modeling. As someone who has been offered too many medications and lost fortunes, I found much to enjoy in chapter 21, which uses a variety of spam-detection techniques to illustrate the concepts of classification. Chapter 22, on machine learning, discusses several of the data mining techniques that R supports. Chapter 23 covers time series analysis, which may be used to identify trends or periodic patterns in data. Finally, chapter 24 offers an overview of Bioconductor, an open-source project focused on genomic data.

The book closes with a detailed reference to the standard R packages.

This is an impressive piece of work. In a volume of this size (about 650 pages), navigation is crucial, and I found both the organization of the chapters and the index up to the task. I was able to follow the instructions and examples through the first several chapters of the book essentially without a hitch, and in the later chapters the variety of illustrations and data sources added interest to what could have been very dull going.

I won't claim perfection for this book. There were a couple of explanations that could have been clearer, and one or two odd turns of phrase or rough edits. Out of all the code examples that I tried, I found exactly one that didn't seem to work without a minor correction. For a work of this size, that's actually pretty amazing!

As a long-time O'Reilly reader, I see Joseph Adler's R in a Nutshell as a welcome addition to the menagerie.

You can purchase R in a Nutshell: A Desktop Quick Reference from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

  • ebook version (Score:4, Informative)

    by proxima (165692) on Monday July 19, 2010 @01:52PM (#32953406)

    As an (occasional) R user, I am excited to see a well-reviewed O'Reilly book on the language. I went and checked the major ebook stores (Amazon, BN, and Stanza), and none had the title.

    It turns out that in addition to the Safari books service, O'Reilly also sells DRM-free copies in epub, mobi, and PDF formats. This book is available here [oreilly.com]. It's not a huge discount over the printed version on Amazon ($6.50 less), though. I'm surprised, then, that it isn't available via the major stores.

  • by Yold (473518) on Monday July 19, 2010 @02:33PM (#32953878)

    It handles data nicely. You can do things similar to list comprehensions in Python. Implementing it in another language would break its semi-compatibility with S-plus. It also has data types aimed at the sorts of processing it is designed for, like formula objects and data frames. Finally, the interactive mode is invaluable for exploratory analysis.

    You could build a ton of syntactic sugar into another language to get something close to R. In fact, that's basically what all of the operations in R are (syntactic sugar, as described in R In A Nutshell).

    So to answer your question, it makes more sense to design a language for statistics rather than hack it onto an existing language.

  • by dr_canak (593415) on Monday July 19, 2010 @02:44PM (#32954044)

    Not having read the O'Reilly book, I can't draw a comparison between the two, but I have been extremely pleased with "R In Action" by Robert Kabacoff, and it can be found here: http://www.manning.com/kabacoff/ [manning.com]

    It's a work in progress, in that some 90% of the book is written. Pre-ordering the electronic version gives you the ability to download chapters as they are written, plus a final e-copy (or hard copy if you pay more) when it's completed.

    I have a high degree of familiarity with SPSS and SAS, and am learning R to get around the crazy licensing issues of the aforementioned programs. I have been very pleased with Kabacoff's book, as I had *no* familiarity with R before grabbing "R in Action." The publisher/author support a forum where purchasers can identify errors and/or make suggestions for improvements before the book goes to final press.

    Not sure if it is competition for "R in a Nutshell" or simply an additional reference, but worth checking out if you want to learn R. It's been very helpful for me.

    jeff

  • by chthonicdaemon (670385) on Monday July 19, 2010 @03:04PM (#32954386) Homepage Journal
    This is really the old domain-specific language argument: why go for a DSL when you have a good general-purpose language and can add functionality with libraries? In the end, it's all about notation. You can add a matrix library to Java and write A = B.times(C).plus(D).invert().transpose(), or you can have a language that allows you to write A = inv(B*C+D)'. In R, data frames are a really rich way of handling data, and the things you can do with them form a great working environment. For what it's worth, there are R wrappers for many languages (like Perl and Python), but once you have gotten used to the full R environment, using the engine from other languages grates.
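The notation point can be tried out in a small sketch. What follows is plain Python, not R or Java, and the `Matrix2` class and `inv` helper are hypothetical names made up for this illustration: a 2x2 matrix whose arithmetic is written once as methods and then exposed again through operators, so the same computation can be spelled both ways.

```python
from fractions import Fraction


class Matrix2:
    """Hypothetical 2x2 matrix, just enough to compare the two notations."""

    def __init__(self, a, b, c, d):
        # Entries are stored row-major as exact fractions: [[a, b], [c, d]].
        self.e = tuple(Fraction(x) for x in (a, b, c, d))

    # Method-call notation, in the style of the Java library example.
    def times(self, o):
        a, b, c, d = self.e
        p, q, r, s = o.e
        return Matrix2(a*p + b*r, a*q + b*s, c*p + d*r, c*q + d*s)

    def plus(self, o):
        return Matrix2(*(x + y for x, y in zip(self.e, o.e)))

    def invert(self):
        a, b, c, d = self.e
        det = a*d - b*c
        if det == 0:
            raise ZeroDivisionError("matrix is singular")
        return Matrix2(d/det, -b/det, -c/det, a/det)

    def transpose(self):
        a, b, c, d = self.e
        return Matrix2(a, c, b, d)

    # Operator notation, defined in terms of the methods above.
    __mul__ = times
    __add__ = plus

    @property
    def T(self):
        return self.transpose()

    def __eq__(self, o):
        return isinstance(o, Matrix2) and self.e == o.e

    def __repr__(self):
        return "Matrix2({}, {}, {}, {})".format(*self.e)


def inv(m):
    """Free-function spelling of invert(), to mirror inv(...) in the comment."""
    return m.invert()


B = Matrix2(1, 2, 3, 4)
C = Matrix2(0, 1, 1, 0)
D = Matrix2(1, 0, 0, 1)

chained = B.times(C).plus(D).invert().transpose()  # library-style chain
infix = inv(B * C + D).T                           # operator-style expression
print(chained == infix)
```

Both spellings run exactly the same arithmetic; the only difference is how closely the source text resembles the formula on paper, which is the argument being made for a domain-specific language.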
  • by Stradenko (160417) on Monday July 19, 2010 @03:13PM (#32954488) Homepage

    JRI sounds like what you want, but rJava is there when you want the reverse.
    http://rosuda.org/JRI/ [rosuda.org]
    http://rosuda.org/rJava/ [rosuda.org]

    Similar things exist for Python, Perl and probably others.

  • by khb (266593) on Monday July 19, 2010 @03:27PM (#32954730)

    Often the reason people get involved in statistical analysis is that there is a body of data and no clue where to start ... as inhabitants of the information age, with cheap storage ... there's lots of material and often little clue or thought as to what the stored data might mean.

    http://rattle.togaware.com/ [togaware.com] is a website dedicated to "rattle", an R package that provides a GUI-based data mining tool (Togaware also has a PDF book that's a great introduction).

    Very handy, and the book is very lucid.

  • by DaVince21 (1342819) on Monday July 19, 2010 @03:52PM (#32955186) Homepage

    Google "R language", or "R code", or something similar. It's search engine searching 101.

  • by ichthyoboy (1167379) on Monday July 19, 2010 @04:39PM (#32955938)
    Or even better yet, use Rseek [rseek.org]: basically a modified Google search that looks specifically through pages on R.
  • by js_sebastian (946118) on Monday July 19, 2010 @05:03PM (#32956300)
    I don't know about Java, but when I have to use a statistics library available in R, I use rpy. It's a Python module that lets you automagically call R functions very easily, and directly get back Python objects or R objects for further processing with R methods. Python's introspection capabilities make this sleek and transparent; I doubt a Java binding could be as cool (though if you need Java, there probably are solutions).

    And honestly, I'm so glad I don't have to use R directly... TFS says it is object oriented, but as far as I can recall all the library methods I tried just returned heterogeneous matrices, with no real user-defined types. And the function calling semantics are mind-boggling, with mixing of keyword and positional arguments leading to all sorts of weirdness...
  • by mbakunin (258573) <wcw@bignose.org> on Tuesday July 20, 2010 @11:55AM (#32965486) Homepage

    Sadly, no. As the other guys said, R does absolutely everything you claim it doesn't. Every positional function argument can also be passed explicitly by name, in any order. Don't put any stock into this recommendation.
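For readers coming from Python, the calling convention described here has a close analogue in Python's keyword arguments. The sketch below is Python, not R. The function is modelled on R's `dnorm(x, mean, sd)` only in its parameter names, and it leaves out R-specific behaviour such as partial matching of argument names.

```python
import math


def dnorm(x, mean=0.0, sd=1.0):
    """Normal density; parameter names mirror R's dnorm, but this is plain Python."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


positional = dnorm(1.0, 0.0, 2.0)       # arguments matched by position
named = dnorm(sd=2.0, x=1.0, mean=0.0)  # the same arguments by name, reordered
mixed = dnorm(1.0, sd=2.0)              # positional first, then a named override
print(positional == named == mixed)
```

All three calls bind the same values to the same parameters, so naming an argument only changes how the call reads, not what it computes.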

    If you are working in Python, have discovered that SciPy's stats functions are not ready for prime time (they aren't), and need drop-in replacements, use rpy. Otherwise, you will find it does not play very well with R. It feeds and returns objects in what I found to be unintuitive and unhelpful ways. Yes, you can make it work, so if you're in Python already, you should use it. Otherwise, learn and use R when it makes sense, which is roughly 90% of the time when doing data analysis.
