Improving error messages of Dpkg dependency parser

June 12, 2013

Hello

The Config::Model::Dpkg project (a Debian source package model based on Config::Model) is partly based on a Parse::RecDescent grammar. This grammar is used to parse the dependencies of a Debian source package.

This article shows how such a grammar is written, its limitations regarding error handling, and how to improve the situation.

The main data of a Debian package is described in the debian/control file. This file can feature a list of dependencies, i.e. a list of packages that must be installed for the package to work. These dependencies are declared in fields like “Build-Depends” or “Depends” as a list of packages. For the purposes of the Dpkg model, I only needed to parse one item of a dependency list at a time.

This dependency item can be a simple package name:

foo

or a package name with a version requirement:

foo ( >= 1.24 )

or a package name with architecture restrictions:

foo [alpha amd64 hurd-arm linux-armeb]

or both:

foo ( >= 1.24 ) [alpha amd64 hurd-arm linux-armeb]

or a list of alternate choices combining the possibilities above:

foo ( >= 1.24 ) | bar [ linux-any] | baz ( << 3.14 ) [ ! hurd-armel !hurd-armeb ]

or a variable that is replaced during package build:

${perl-depends}

Writing a Parse::RecDescent grammar to parse this is relatively straightforward.

The first production handles alternate dependencies separated by ‘|’ and raises an error if some text was not “consumed” by the dependencies:

dependency_item: depend(s /\|/) eofile | <error>

A dependency as explained above is expressed as:

depend: pkg_dep | variable

A variable like ${foo} or ${bar}-1.24~ is parsed with:

variable: /\${[\w:\-]+}[\w\.\-~+]*/

This rule handles a package name with optional version or arch restriction:

pkg_dep: pkg_name dep_version(?) arch_restriction(?) 
pkg_name: /[a-z0-9][a-z0-9\+\-\.]+/

The remaining rules are quite simple:

dep_version: '(' oper version ')' 
oper: '<<' | '<=' | '=' | '>=' | '>>'
version: variable | /[\w\.\-~:+]+/

arch_restriction: '[' arch(s) ']'
arch:  /!?[\w-]+/

eofile: /^\Z/

The grammar above works well to parse the dependency. You can test it with this small Perl script:

#!/usr/bin/perl
use strict;
use warnings;
use 5.010 ;
use Parse::RecDescent ;

# the grammar is read from the __DATA__ section below
my $parser = Parse::RecDescent->new(join('',<DATA>));
# the dependency to parse is the first command line argument
my $dep = shift ;
say "parsing '$dep'";
my $ret = $parser->dependency_item($dep) ;

say "result is ", $ret if $ret ;

__DATA__
# insert grammar here !!!

Unfortunately, any error in the optional parts (i.e. version requirements and arch restrictions) leads to an error message which is not very helpful. The error message only mentions that some text could not be parsed:

parsing 'foo ( != 1.24 ) | bar'

       ERROR (line 1): Invalid dependency item: Was expecting /\|/ but found
                       "( != 1.24 ) | bar" instead

or

parsing 'foo [ arm & armel] | bar'

       ERROR (line 1): Invalid dependency item: Was expecting /\|/ but found
                       "[ arm & armel] | bar" instead

The problem comes from the fact that version requirements and arch restrictions are optional. For instance, if a version requirement has a syntax error, the version rule fails quietly and Parse::RecDescent then tries to parse the same text as an arch restriction. That rule fails too, so the package rule succeeds with only the package name. The parser then expects either a ‘|’ or the end of input (“eofile”), finds neither, and can only report the leftover text. So the error message does not hint at the actual syntax problem.

To generate better error messages, I built on the suggestion made in the Parse::RecDescent FAQ.
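The effect can be reproduced without Parse::RecDescent. Here is a minimal sketch in plain Perl, assuming a single dependency item with no ‘|’ alternatives. The sub naive_parse is a hypothetical name made up for this illustration: it tries each optional part, silently skips it on failure, and can only complain about whatever text is left at the end.

```perl
use strict;
use warnings;

# Hypothetical hand-written matcher (not part of Config::Model) that mimics
# how optional sub-rules behave: each optional part is tried, skipped on
# failure, and only the leftover text is reported.
sub naive_parse {
    my ($text) = @_;
    # mandatory package name; /gc keeps pos() even when a later match fails
    $text =~ /\G\s*([a-z0-9][a-z0-9+\-.]+)/gc
        or return "bad package name";
    # optional version requirement: a failure here is not an error
    $text =~ /\G\s*\(\s*(?:<<|<=|=|>=|>>)\s*[\w.\-~:+]+\s*\)/gc;
    # optional arch restriction: same silent behaviour
    $text =~ /\G\s*\[\s*(?:!?[\w-]+\s*)+\]/gc;
    # nothing but white space may remain
    return "ok" if $text =~ /\G\s*\z/gc;
    return "unexpected text: '" . substr($text, pos($text) // 0) . "'";
}

print naive_parse('foo ( >= 1.24 )'), "\n";   # ok
print naive_parse('foo ( != 1.24 )'), "\n";   # unexpected text: ' ( != 1.24 )'
```

The second call shows the same symptom as the grammar: the message points at the whole leftover text, not at the bad operator.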

Instead of calling a plain subroutine, I use a sub reference that will store the error messages in a closure. This sub ref is declared in a start-up action. Note that the sub ref explicitly returns undef. I’ll explain why later.

{
    my @dep_errors ;
    my $add_error = sub {
        my ($err, $txt) = @_ ;
        push @dep_errors, "$err: '$txt'" ;
        return ;
    } ;
}
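Outside of a grammar, the same closure idea can be tried in plain Perl. This is only a sketch: make_collector and $get_errors are hypothetical names introduced for the illustration, they are not part of Parse::RecDescent or of the actual grammar.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the start-up action above.
sub make_collector {
    my @errors;                        # private to the two closures below
    my $add = sub {
        my ($err, $txt) = @_;
        push @errors, "$err: '$txt'";
        return;                        # undef in scalar context
    };
    my $get = sub { return @errors };
    return ($add, $get);
}

my ($add_error, $get_errors) = make_collector();
my $result = $add_error->("bad package name", "Foo!");
print "the sub returned undef\n" unless defined $result;
print "$_\n" for $get_errors->();
```

The error text is kept in @errors while the caller only sees undef, which is what lets a grammar action record a message and fail at the same time.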

The following production always fails while ensuring that the error list is reset. This production is always run at the beginning of the dependency parsing:

dependency: { @dep_errors = (); } <reject>

Here’s the actual “dependency” production that is run when the “dependency” method is called on the parser. It returns an array ref containing (1, data) if the dependency is valid or (0, errors) otherwise:

dependency: depend(s /\|/) eofile
  {
    $return = [ 1 , @{$item[1]} ] ;
  }
  |
  {
    push( @dep_errors, "Cannot parse: '$text'" ) unless @dep_errors ;
    $return =  [ 0, @dep_errors ];
  }
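On the calling side, the first element of the returned array ref tells which case applies. Here is a small sketch of how such a value could be unpacked. The sub report is a hypothetical helper, and the sample array refs are made up for the illustration rather than produced by a real parser run.

```perl
use strict;
use warnings;

# Hypothetical helper: turns the array ref returned by the "dependency"
# rule into a one-line report.
sub report {
    my ($ret) = @_;
    my ($ok, @rest) = @$ret;           # flag first, then data or errors
    return $ok ? "valid: @rest" : "invalid: " . join('; ', @rest);
}

# made-up sample values with the (1, data) and (0, errors) layout
print report([ 1, 'foo', 'bar' ]), "\n";
print report([ 0, "bad dependency version operator: '!='" ]), "\n";
```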

The following productions don’t change much:

depend: pkg_dep | variable
variable: /\${[\w:\-]+}[\w\.\-~+]*/
pkg_dep: pkg_name dep_version(?) arch_restriction(?) 
dep_version: '(' oper version ')'

The first rule of this production parses the package name, which must be followed by a space, the end of the string, ‘(’ or ‘[’. A positive look-ahead assertion is used so only the package name is consumed. If the first rule fails, the second rule provides a meaningful error message. The second rule matches anything which is not a space and creates an error message. Since $add_error returns undef, the second rule returns undef and the production fails. So the text stored in the error message is not consumed:

pkg_name: /[a-z0-9][a-z0-9\+\-\.]+(?=\s|\Z|\(|\[)/
    | /\S+/ { $add_error->("bad package name", $item[1]) ;}
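The look-ahead can be checked on its own with a plain Perl regex. In this sketch, the pattern is the one from the first alternative, anchored with \G so that pos() tells how much text was consumed. The sub consumed is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Same pattern as the first pkg_name alternative, anchored with \G.
my $pkg_name = qr/\G[a-z0-9][a-z0-9\+\-\.]+(?=\s|\Z|\(|\[)/;

# Hypothetical helper: returns the consumed text, or undef when the
# package name is not followed by a space, end of string, '(' or '['.
sub consumed {
    my ($text) = @_;
    return $text =~ /$pkg_name/g ? substr($text, 0, pos($text)) : undef;
}

print consumed('foo(>= 1.24)'), "\n";          # foo
print consumed('libfoo-perl [amd64]'), "\n";   # libfoo-perl
print defined consumed('foo_bar') ? "match\n" : "no match\n";   # no match
```

The ‘(’ and the space are checked but stay available for the next rule, and a name like foo_bar is rejected as a whole instead of being cut at the first bad character.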

The same trick is used with these productions:

oper: '<<' | '<=' | '=' | '>=' | '>>'
    | /\S+/ { $add_error->("bad dependency version operator", $item[1]) ;}

version: variable | /[\w\.\-~:+]+(?=\s|\)|\Z)/
    | /\S+/ { $add_error->("bad dependency version", $item[1]) ;}

The action of this production is a little bit more tricky. The action ensures that ‘!’ is either added before every arch or not at all. Otherwise an error message is generated and added to the list of errors:

arch_restriction: '[' osarch(s) ']'
    {
        my $mismatch = 0;
        # $ref is a list of ['!',os,arch] or ['',os,arch]
        my $ref = $item[2] ;
        # compare each entry with the next one, up to the last pair
        for (my $i = 0; $i < $#$ref ; $i++ ) {
            $mismatch ||= ($ref->[$i][0] xor $ref->[$i+1][0]) ;
        }
        my @a = map { ($_->[0] || '') . ($_->[1] || '') . $_->[2] } @$ref ;
        if ($mismatch) {
            $add_error->("some names are prepended with '!' while others aren't.", "@a") ;
        }
        else {
            $return = 1 ;
        }
    }
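The consistency check can also be tried outside of the grammar. Here is a sketch in plain Perl, where mixed_negation is a hypothetical helper and the sample entries imitate the [ not, os, arch ] triplets that the action works on.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when '!' is used on some entries only.
# Each entry imitates an "osarch" result: [ not, os, arch ].
sub mixed_negation {
    my @entries = @_;
    my $mismatch = 0;
    # compare every entry with its right-hand neighbour
    for my $i (0 .. $#entries - 1) {
        $mismatch ||= ($entries[$i][0] xor $entries[$i + 1][0]);
    }
    return $mismatch ? 1 : 0;
}

# [ ! hurd-armel hurd-armeb ] mixes both styles
print mixed_negation(['!', 'hurd-', 'armel'], [undef, 'hurd-', 'armeb']), "\n";  # 1
# [ !hurd-armel !hurd-armeb ] is consistent
print mixed_negation(['!', 'hurd-', 'armel'], ['!', 'hurd-', 'armeb']), "\n";    # 0
```

Comparing neighbours is enough: if every pair of adjacent entries agrees, all entries agree.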

The check above is possible only if the “osarch” production returns an array ref containing something like ('!','linux','any') for “!linux-any” or ('','linux','any') for “linux-any”:

osarch: not(?) os(?) arch
    {
        $return =  [ $item[1][0], $item[2][0], $item[3] ];
    }
    | /.?(?=\s|\]|\Z)/ { $add_error->("bad arch specification: ", $item[1]) ; }

not: '!'

Here’s the rest of the grammar:

os: /(any|uclibc-linux|linux|kfreebsd|knetbsd|etc...)-/
   | /\w+/ '-' { $add_error->("bad os in architecture specification", $item[1]) ;}

arch: / (any |alpha|amd64 |arm\b |arm64 |etc... )
        (?=(\]| ))
      /x
      | /\w+/ { $add_error->("bad arch in architecture specification", $item[1]) ;}

eofile: /^\Z/

That’s all for grammar 2.0

Before someone yells: “Show me the message!”, here are some examples of bad dependencies and the error messages generated by the parser:

parsing 'foo ( != 1.24 ) | bar'
result is: 0 bad dependency version operator: '!='

parsing 'foo [ arm & armel] | bar'
result is: 0 bad arch specification: : '&'

parsing 'foo [ arm armel ] | bar [!moo]'
result is: 0 bad arch specification: : ']' bad arch in architecture specification: 'moo'

The first 2 error messages are spot on the actual error. The third one has a false positive (‘]’ is correct) but correctly highlights the wrong arch name (‘moo’).

Mission accomplished.

In order to keep this post (relatively) simple, I’ve removed the parts that actually store parsed data. They don’t really matter for error handling. Nevertheless, you can see the whole grammar in the Config::Model::Dpkg::Dependency module.

All the best
