XML and Perl

Posted by timothy on Thu Jan 30, 2003 11:30 AM
from the texty-bits dept.
davorg writes "One of Perl's great strengths is in processing text files. That is, after all, why it became so popular for generating dynamic web pages -- web pages are just text (albeit text that is supposed to follow particular rules). As XML is just another text format, it follows that Perl will be just as good at processing XML documents. It's therefore surprising that using Perl for XML processing hasn't received much attention until recently. That's not saying that there hasn't been work going on in that area -- many of the Perl XML processing modules have long and honourable histories -- it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." Read on to see how well Davorg thinks this book introduces XML text processing with Perl to the wider world.
Title: XML and Perl
Author: Mark Riehl, Ilya Sterin
Pages: 378
Publisher: New Riders
Rating: 8
Reviewer: Davorg
ISBN: 0735712891
Summary: Good introduction to processing XML with Perl

XML and Perl is written by two well-known members of the Perl XML community. Both are frequent contributors to the "perl-xml" mailing list, so there's certainly no doubt that they know what they are talking about, which is always a good thing in a technical book.

The book is made up of five sections. The first section has a couple of chapters which introduce you to the concepts covered in the book. Chapter one introduces you separately to XML and Perl and then chapter two takes a first look at how you can use Perl to process XML. This chapter finishes with two example programs for parsing simple XML documents.

Section two goes into a lot more detail about parsing XML documents with Perl. Chapter three looks at event-driven parsing, using XML::Parser and XML::Parser::PerlSAX to build example programs before going on to talk in some detail about XML::SAX, which is currently the state of the art in event-driven XML parsing in Perl. It also looks at XML::Xerces, which is a Perl interface to the Apache Software Foundation's Xerces parser. Chapter four covers tree-based XML parsing and presents examples using XML::Simple, XML::Twig, XML::DOM and XML::LibXML. In both of these chapters the pros and cons of each of the modules are discussed in detail so that you can easily decide which solution to use in any given situation.
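
For readers who have not met the two parsing styles, here is a minimal sketch of the difference. It is not taken from the book and does not use the Perl modules named above; it uses Python's standard library as a stand-in so that it runs without installing anything. The sample document, the `TitleHandler` class, and both function names are assumptions made up for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Made-up sample document, used only for this illustration.
DOC = (
    "<library>"
    "<book><title>XML and Perl</title></book>"
    "<book><title>Perl and XML</title></book>"
    "</library>"
)


class TitleHandler(xml.sax.ContentHandler):
    """Event-driven style: the parser calls us back as it meets each piece."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_event_driven(doc):
    """Collect titles while streaming through the document."""
    handler = TitleHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.titles


def titles_tree(doc):
    """Tree style: build the whole document in memory, then query it."""
    root = ET.fromstring(doc)
    return [title.text for title in root.iter("title")]
```

The event-driven version never holds the whole document in memory but has to track its own state; the tree version is shorter to write but needs the full tree first. That is roughly the trade-off the two chapters weigh for each module.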

Section three covers generating XML documents. In chapter five we look at generating XML from text sources using simple print statements and also the modules XML::Writer and XML::Handler::YAWriter. Chapter six looks at taking data from a database and turning that into XML using modules like XML::Generator::DBI and XML::DBMS. Chapter seven looks at miscellaneous other input formats and contains examples using XML::SAXDriver::CSV and XML::SAXDriver::Excel.

Section four covers more advanced topics. Chapter eight is about XML transformations and filtering. This chapter covers using XSLT to transform XML documents. It covers the modules XML::LibXSLT, XML::Sablotron and XML::XPath.

Chapter nine goes into detail about Matt Sergeant's AxKit, the Apache XML Kit which allows you to create a website in XML and automatically deliver it to your visitors in the correct format.

Chapter ten rounds off the book with a look at using Perl to create web services. It looks at the two most common modules for creating web services in Perl - XML::RPC and SOAP::Lite.

Finally, section five contains the appendices which provide more background on the introductions to XML and Perl from chapter one.

There was one small point that I found a little annoying when reading the book: each example was accompanied by a sample of the XML documents to be processed, together with both a DTD and an XML Schema definition for the document. This seemed to me to be overkill. Did we really need both DTDs and XML Schemas for every example? I would have found it less distracting if one (or even both) of these had been moved to an appendix.

That small complaint aside, I found it a useful and interesting book. It will be very useful to Perl programmers (like myself) who will increasingly be expected to process (and provide) data in XML formats.


You can purchase XML and Perl from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Nice (Score:2)

    by Gortbusters.org (637314) on Thursday January 30 2003, @11:46AM (#5189662) Homepage Journal
    Though the reviewer didn't think so, I like it when DTD and XML Schema examples are side by side. Having looked at DTDs for quite some time now, I have to change gears to the new standard of using XML schemas.

    Would be nice to have a book with more than just one chapter on web services. There are a plethora of Java/C# web services books out there, but it's hard to find one out there just for Perl, PHP, etc.
  • I'd buy it ... (Score:1, Funny)

    by B3ryllium (571199) on Thursday January 30 2003, @11:46AM (#5189663) Homepage
    ... but I thought Perl was a write-only language? How can I be expected to read the book, if it's just gibberish like Perl? Geez. :) (Okay, fine - I admit it - I kinda like Perl. But that's another story.)
  • XML is NOT just text! (Score:5, Insightful)

    by Anonymous Coward on Thursday January 30 2003, @11:51AM (#5189685)
    The whole point of XML is that it is NOT just a string of text. That's why Perl isn't particularly any better than Java or C++ or VB or whatever for processing XML - you're going to be using a library that gives you SAX or DOM access to your XML, and you'll never need to know that there's a text representation being serialized onto some wires somewhere.
  • Natural? (Score:1, Redundant)

    by CaseyB (1105) on Thursday January 30 2003, @11:55AM (#5189708)
    As XML is just another text format, it follows that Perl will be just as good at processing XML documents.

    Not really. If you're using XML as "just another text format", then you're making a fundamental mistake. Within your software, you should always be treating XML as a hierarchical data structure, not as a text stream. Apart from manipulating CDATA or attribute value text, Perl has no particular strength with XML.

    • Re:Natural? by mortonda (Score:3) Thursday January 30 2003, @12:05PM
  • Petal (Score:4, Informative)

    by Chris Croome (24340) on Thursday January 30 2003, @12:00PM (#5189727) Journal

    One new, and cool, Perl XML module that people might not know about is Petal [cpan.org] (PErl Template Attribute Language).

    It is an implementation of the Zope TAL (Template Attribute Language) specification [zope.org] and it basically allows you to create XML templates where all the templating commands are just attributes of existing tags.

    This allows things like XHTML templates which are very WYSIWYG friendly since the editors don't do anything with attributes that they don't know about.

  • This was a review? (Score:4, Insightful)

    by Syris (129850) on Thursday January 30 2003, @12:03PM (#5189749)
    I'm sorry, but this just wasn't a terribly deep review and well below par for /. Listing contents of a book and then nitpicking a detail don't a book review make.


    How effective were the examples? How easy to read and understand were the general concepts? Were the descriptions of libraries and APIs clear? Was the writing generally readable?


    Would this book even make a good reference?


    Jeez, anyone want to follow up the post with a real review?

  • XML frees us from Perl (Score:5, Interesting)

    by Euphonious Coward (189818) on Thursday January 30 2003, @12:07PM (#5189774)
    The whole point of XML is to free us from having to do the kinds of things Perl is meant for. Absent free-form text munging, Perl really has no advantage over other languages. At the same time, it has real deficits for people who need to know they have solved a problem correctly and completely.

    (For reference, see this rant [underlevel.net] by the brilliant net.kook Erik Naggum. The most quotable bit, for the lazy among you, is

    ...[Perl] rewards idiotic behavior in a way that no other language or tool has ever done, and on top of it, it punishes conscientiousness and quality craftsmanship -- put simply: you can commit any dirty hack in a few minutes in perl, but you can't write an elegant, maintainable program that becomes an asset to both you and your employer; you can make something work, but you can't really figure out its complete set of failure modes and conditions of failure. (how do you tell when a regexp has a false positive match?)
    )

    • Re:XML frees us from Perl by Slugbait (Score:1) Thursday January 30 2003, @12:22PM
    • Perl is a reflection of your soul (Score:4, Interesting)

      by Nexus7 (2919) on Thursday January 30 2003, @12:27PM (#5189864)
      Well, perhaps not your soul, but your Perl code just reflects the way you think to a greater extent than other languages. This isn't something that's done underhandedly; it is well advertised in every posting in c.l.perl and the Camel book, and every other book about Perl: Perl is not at all orthogonal, TMTOWTDI (there's more than one way to do it). If you want to be rigorous and declare everything and not have your typos become references automatically, you "use strict" and your magic line is "#!/usr/bin/perl -w". If not, well, Perl allows you to do that too. If you want objects, you can do that; if not, not.

      It is possible to write quality code in Perl. Just because the language allows you not to do so isn't its fault. It doesn't stop you from doing it, because that'd stop you from doing brilliant things.

      To address some specific things you mentioned, you can do full-fledged exception handling in Perl if you want to (with eval and specific modules), or, you know, not. And I'm not familiar with the false positive matches in regexps (perhaps you're referring to some famous problem). But if a regexp doesn't do what you want it to, isn't it wrong? Between // and tr and split I get along just fine.
    • Re:XML frees us from Perl (Score:5, Insightful)

      by glwtta (532858) on Thursday January 30 2003, @12:45PM (#5189992) Homepage
      how do you tell when a regexp has a false positive match?

      A what? You (or rather the brilliant person being quoted) either mean that it matches a string that the expression isn't supposed to, which would be a serious bug in the language (and I am not aware of any such bugs); or you mean that it matches correctly, but matches things you didn't expect it to, in which case you tell, by (gasp!) testing your code. In any case, how do you tell a "false positive" regexp match in Java?

      but you can't write an elegant, maintainable program that becomes an asset to both you and your employer

      Perhaps you can't. I have, and I do.

    • Re:XML frees us from Perl by Anonymous Coward (Score:1) Thursday January 30 2003, @12:52PM
    • Re:XML frees us from Perl by scrytch (Score:3) Thursday January 30 2003, @02:57PM
    • Re:XML frees us from Perl by Internet Dog (Score:1) Thursday January 30 2003, @03:29PM
    • Re:XML frees us from Perl by Mark_Uplanguage (Score:1) Thursday January 30 2003, @04:40PM
  • by Chocolate Teapot (639869) on Thursday January 30 2003, @12:11PM (#5189789) Journal
    Although I agree that Perl/XML sounds like a powerful and flexible way to serve dynamic content, I can't help thinking that it is ultimately better to adapt existing frameworks (Slashcode, PHP-Nuke & friends etc..). Maybe a friendly group of Perl/XML gods will read the book and produce a framework/toolkit that the rest of us mere mortals can use. I suspect that I will buy this book anyway, read it, and after frying my brain for a few days I will stuff it on my bookshelf and walk away with a huge inferiority complex. My bookshelf makes me look like a guru, but secretly, my encyclopaedic knowledge comes from here [bathroomreader.com].
  • i hate perl... (Score:2)

    by cygnus (17101) on Thursday January 30 2003, @12:12PM (#5189792) Homepage
    and i know there are going to be a lot of posts saying "XML obviates Perl!"...

    but i disagree. Perl absolutely RIPS through this stuff, unlike the Java stuff i've written. sometimes, there's nothing like some good, old-fashioned procedural code to munge one document into another.

    the only problem i had was with UTF-8 stuff. perl really wasn't quite there until perl 5.8, and i'm having trouble finding installs of it on the machines i need to use it on at the university i work for.
  • by nathanz (555048) on Thursday January 30 2003, @12:23PM (#5189844)
    I think one of the main reasons Perl and XML aren't generally used together is that Perl isn't object oriented in the same way that Java and C# are. I know that OO concepts have been bolted on to Perl in the same way that OO was bolted on to C++, and in my opinion with similar results (i.e., kludge-fest). It's very natural in Java to parse an XML doc and get an object, while it's more natural to parse a log file or CSV file with Perl.
  • by Cy Guy (56083) on Thursday January 30 2003, @12:30PM (#5189879) Homepage Journal
    Then maybe you should get it from Amazon [amazon.com], where it is $12 cheaper.

    Please Rob, explain to us how whatever deal you have with bn.com is worth your user base overpaying by so much? Users can buy the book through the link above, and I will put a third of my affiliate commission (about $1.40 per copy) towards Perl development projects [affero.net]. This way everybody wins. Using your link, I assume you win, and that bn wins, but your loyal user base is out an additional $12 and I can't imagine your deal with bn.com nets you that much for providing the link.

  • So, where's the review? (Score:4, Insightful)

    by mattdm (1931) on Thursday January 30 2003, @12:45PM (#5189994) Homepage
    I see the table of contents explained in paragraph form. And then one complaint about the organization of the book. And then I expect to read the review, but it's already on to "you can buy this book here", and user comments.

    I know complaining about slashdot stories is like shooting those proverbial barreled fish, but sheesh.
  • XML::Simple (Score:2, Interesting)

    by Anonymous Coward on Thursday January 30 2003, @01:37PM (#5190271)
    I'm seeing a lot of comments that perl doesn't have any particular strengths when dealing with XML. A good module people should check out is XML::Simple. Basically, it automagically turns XML into a nested data structure, and automagically turns a nested data structure into XML. The great thing about it is that you just make a single API call, and then directly access the data from there without having to learn anything more complicated. Definitely not an end-all solution, but it handles the common case wonderfully, and has quite a few handy options to allow more fine-tuned control.
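
    The "XML in, nested data structure out" idea can be sketched in a few lines. The block below is not XML::Simple and does not reproduce its API or its folding rules; it is a rough Python illustration using only the standard library, and `xml_to_dict` is a hypothetical helper invented for this sketch. It also silently drops text that is mixed in between child elements.

```python
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Hypothetical helper: fold an element into dicts, lists, and strings."""
    children = list(element)
    # A leaf with no attributes collapses to its text content.
    if not children and not element.attrib:
        return (element.text or "").strip()

    # Attributes become plain keys on the resulting dict.
    result = dict(element.attrib)
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # A repeated tag is promoted to a list of values.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# One call to parse, then ordinary dict and list access.
config = xml_to_dict(ET.fromstring(
    '<config debug="1">'
    "<server><host>a</host></server>"
    "<server><host>b</host></server>"
    "<name>demo</name>"
    "</config>"
))
```

    After the single call, the data is reached with ordinary indexing such as `config["server"][1]["host"]`, which is the convenience being described; the price is that document order and mixed content are lost.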
  • by davids-world.com (551216) on Thursday January 30 2003, @02:06PM (#5190418) Homepage
    XML is NOT just a text file (just because we can read it with a simple "more hello.xml"). Perl is good at processing text, because it knows regular expressions and some extensions to them. However, an XML DTD (or a Schema) defines a context-free grammar, which makes for a language class above the regular languages. That's why we can't fully parse XML files with Perl's RE. A good example would be nested tags that result from recursive grammar rules in the DTD. These cannot be parsed without some serious geekism in Perl RE. However, I love to write those little tools that operate on XML data in Perl. Very often, you can work with regular expressions on context-free/sensitive language data!
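
    The nesting problem is easy to reproduce. The sketch below is in Python rather than Perl, and the sample document and both function names are made up for illustration. A plain non-greedy pattern stops at the first closing tag it sees, while a version that keeps a depth counter (the extra memory a regular language lacks) finds the matching one.

```python
import re

# Made-up document in which <list> is nested inside itself.
DOC = "<list><item>a<list><item>b</item></list></item></list>"


def outer_list_by_regex(text):
    """Naive approach: a single non-greedy pattern, with no notion of depth."""
    match = re.search(r"<list>(.*?)</list>", text, re.DOTALL)
    return match.group(1) if match else None


def outer_list_by_depth(text):
    """Count opening and closing tags to find the matching close."""
    depth = 0
    start = None
    for token in re.finditer(r"<(/?)list>", text):
        if token.group(1) == "":
            # Opening tag: remember where the outermost content begins.
            if depth == 0:
                start = token.end()
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                return text[start:token.start()]
    return None


print(outer_list_by_regex(DOC))  # cut short at the inner </list>
print(outer_list_by_depth(DOC))  # full content of the outer <list>
```

    The regular expression is still useful as a tokenizer here; it is only the job of pairing each opening tag with its own closing tag that needs the counter. A real parser also has to deal with attributes, comments, CDATA and entities, which this sketch ignores.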
  • by HealYourChurchWebSit (615198) on Thursday January 30 2003, @02:20PM (#5190505) Homepage


    The reviewer is correct, Perl is a good tool for slamming and jammin' text, including XML. What I'm not so sure of is the quote "It's therefore surprising that using Perl for XML processing hasn't received much attention until recently."

    I mean one need only scroll down the extensive list of CPAN Modules [cpan.org] to see well over 50, as well as many sites/authors devoting [cpan.org] time, energy and resource.

    Similarly, I would point out some press modules supporting web services via XML, such as SOAP::Lite as far back as 02/26/01 [netscape.com] and XML-RPC also in '01 [sourceforge.net] -- or O'Reilly's own XML.com with articles such as "Processing XML with Perl [xml.com]" written shortly after the turn of the millennium.

    Point is, though I personally love Perl, blatant plugs such as "... it's just that the world outside of the Perl community doesn't seem to have taken much notice of this work. This is all set to change with the publication of this book and O'Reilly's Perl and XML." don't inspire confidence in the reviewer's objectivity.

  • Axkit, perl & XML so happy together (Score:2, Informative)

    by porter235 (413926) on Thursday January 30 2003, @02:33PM (#5190572)
    check it out. http://axkit.org/ [axkit.org]

    "Apache AxKit is an XML Application Server for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code. AxKit also uses a built-in Perl interpreter to provide some amazingly powerful techniques for XML transformation."

    picture Cocoon for perl. using perl for xsp pages and doing pipeline transformations on xml. great stuff.
  • use AxKit! (Score:1)

    by sbwoodside (134679) <sbwoodside@yahoo.com> on Thursday January 30 2003, @03:18PM (#5190744) Homepage
    Use AxKit! You're selling yourself short if you start to develop a site without it. It's just the ideal way to get the whole separation of content and presentation thing that XML is supposed to be all about. It makes it dirt easy to store your content in XML, use XSLT for transformations and XSP for dynamic back-end processing. Check it out [axkit.org]!

    Also read this [monasticxml.org]

    simon
  • by mackman (19286) on Thursday January 30 2003, @03:24PM (#5190796)
    Perl's strength in text processing was its ability to work with (read and generate) poorly structured data. XML makes it easy to create well-structured data, so writing document processing code in languages like C++ is easier. People who don't know Perl, or people who learned other XML toolkits first, have less reason to learn XML with Perl.
  • by boatboy (549643) on Thursday January 30 2003, @05:22PM (#5191875) Homepage
    That Perl was geared toward text processing has been an obstacle to XML support in my admittedly limited experience. We're trying to interface with a 3rd party system that claims to use XML for data interchange. But because their programmers are used to traditional text processing, their XML support is _very_ kludgy. Stupid things like requiring line feeds after each element, etc.
  • by Animats (122034) on Friday January 31 2003, @03:04AM (#5194816) Homepage
    Actually, Perl is mediocre at processing XML/HTML/SGML. Ever write a lex-type state machine parser in Perl? You can do it, but it's not as easy as it should be. "Get next character from string" is slow and/or clunky in Perl. (If strings are long, removing the first character is expensive. And you can't just subscript your way through a string. So you need to manage a small working buffer explicitly, something you shouldn't have to do in a language like Perl.) Perl does tree structures of objects, but Perl 5 objects aren't all that fast. Parsers in Perl tend to either have C components (creating a portability problem) or are slow. This is a lack.

    You can write such parsers as regular expressions, but that makes them even slower.

    Despite this, I parse millions of lines of SGML/HTML/XML into trees of HTML::Element, using only Perl. But it's clunkier than it should be.

  • by zapfie (560589) on Thursday January 30 2003, @11:42AM (#5189644)
    Perl is a markup language?
  • by GombuMstr (532073) on Thursday January 30 2003, @12:26PM (#5189857)
    Uniquely enough, our data processing that has nothing to do with the web is heavily constructed with perl. We love the flexibility of it. It doesn't take too long for a new person to figure out how our daily processing works.

    In fact I have been looking into perl-xml for processing of scalc spreadsheets that our stores send to us every day. It has been a valuable tool and we would be up a creek with Windows tools trying to do the exact same thing.

    --Travis
  • by Mr. Droopy Drawers (215436) on Thursday January 30 2003, @12:38PM (#5189947)
    As you're also aware, most Comp Sci courses fawn over Pascal, a VERY formalized language. However, it's not mentioned much past education circles (and Apple aficionados).

    In practice, reference counting doesn't seem to lead to memory leaks as you describe. And, I would argue it is much more efficient than Java's method.

    PERL is an excellent SCRIPTING language. Larry Wall describes it as a "glue" language. XML is a good thing to glue together. It's perfect for that. Every tool has its purpose; push any too far, and you start abusing it.

    Trying to find the quote from Larry Wall. I think it goes something like this: "Perl did easy things easily and made impossible things doable."
  • by sheriff_p (138609) on Thursday January 30 2003, @12:45PM (#5189984)
    Ah no, see, you forgot to read the first line:

    "One of Perl's great strengths is in processing text files."

    Perl is good at handling text files. XML is a text file. Therefore, Perl is good at handling XML.

    As opposed to:

    My pasta maker is good at making pasta. Pasta is a type of food. Ice-cream is also food. Therefore, my pasta maker is good at making ice-cream.

    Does that help?
  • by IpalindromeI (515070) on Thursday January 30 2003, @01:53PM (#5190351) Journal
    Except that your syllogism is faulty, whereas his is not.

    His:
    1. (from earlier in his post) Perl is well suited for processing all text formats.
    2. XML is a text format.
    3. Therefore, Perl is well suited for processing XML.

    Yours:
    1. Your pasta maker is good at making pasta.
    2. Pasta is a type of food.
    3. Therefore, your pasta maker is good at making all types of food (for example, ice cream).

    You can see that he went from general to specific, whereas you went from specific to general. He argues that being able to do all things in a given set (process all text formats) gives the ability to do one of the things in that set (process a particular text format). You argue that being able to do one thing in a set (make a particular food) gives the ability to do all things in the set (make all foods).

    You could save your argument by changing your middle point to be "All foods are a type of pasta," and then your conclusion becomes trivially true. But you'd also have to get everyone to agree that ice cream is pasta.
  • by Golias (176380) on Thursday January 30 2003, @05:08PM (#5191749)
    As XML is just another text format, it follows that Perl will be just as good at processing XML documents.

    Since my pasta maker is good at making pasta, and ice cream and pasta are both foods, it follows my pasta maker will be just as good at making ice cream.

    That only correlates if ice cream is a type of pasta, because XML is a text format.

    This is a lot more like saying "since my pasta maker is good at making Ziti, Rigate, Macaroni, etc., all pastas really, and Spaghetti is a type of pasta, my pasta maker should be good at making Spaghetti."

  • by owlstead (636356) on Thursday January 30 2003, @08:35PM (#5193136)
    XML itself is indeed a simple vehicle for storing data (the data itself can be quite complex, since you can put in anything you like). Obviously XML will not replace an RDBMS for storing and looking up data, and it does not need to.

    Though XML itself may look easy, I can assure you that the technically incompetent won't like the standards written around XML a bit. Schemas and XSLT take a while to get used to.

    Furthermore, you do not have to write an application to parse XML at all. It has been done already. You will be presented with the DOM or with SAX. With the DOM you get a pre-parsed tree structure and with SAX you will be called back if it has found your data. 95% of the people in these discussions will know this.

    The only conclusion I can draw from your writing is that you are as deep in XML as the writer of the original article: not at all. You see XML as just a text-file with some data in it. Other /. articles have already explained why this isn't so.

    Warper

    can anybody rewrite _all_ the linux configuration files to xml please? before lunch?