
Creating a DEP-5 parser with Config::Model


Following a discussion on the #debian-perl IRC channel, I offered to provide a script to parse DEP-5 files. The goal is to be able to parse *and* validate them. DEP-5 is a proposal, driven by Lars Wirzenius, to make debian/copyright machine-interpretable.

With Config::Model, any change to the DEP-5 specification will be easy to carry over into the DEP-5 model read by Config::Model.

What DEP-5 model?

To keep a long story short, the DEP-5 model is a description of the DEP-5 syntax and semantics that Config::Model can use to perform validation. In other words, the DEP-5 model is the DEP-5 document translated into a special format. For more detail on how to create a model, please read this doc.

The first step was to edit the DEP-5 document directly and munge it into a YAML file describing the structure of DEP-5. Here’s a small extract of this YAML file (slightly edited to remove most long descriptions):

    class_description: >
       Machine-readable debian/copyright
      name_match: X-.*
      type: leaf
      value_type: string
        mandatory: 1
        type: leaf
        value_type: uniline
        description: >-
          URI of the format specification, such as ...
        type: leaf
        value_type: uniline
        type: hash
        index_type: string
        ordered: 1
          type: node
          config_class_name: Debian::Dep5::Content

While creating this YAML file, the License keyword turned out to be a problem because of its properties:

  • Only a limited number of License values are valid (no problem, let’s use an enum)
  • License names are case-insensitive (my optimism drops somewhat)
  • License names can carry a version number and an optional ‘+’ suffix (OK, let’s use a regular expression with Config::Model’s brand new ‘match’ specification)
  • Licenses can be combined with ‘and’ or ‘or’ (uh oh, the ‘match’ regexp will not be enough; a grammar would be better)
  • A License field can give either an abbreviation or the full text of the license.

Long story short, I had to add to Config::Model the ability to validate a value with a Parse::RecDescent grammar. More on this later.

Of course, the first draft of the model in YAML was far from perfect.

So the second step was to load it with config-edit-model. I had to fix a number of YAML errors and then some errors in the model description.

Then I had to write a parser to load the DEP-5 data into the Config::Model tree. I first used Raphaël Hertzog’s Dpkg::Control::Hash module, but it cannot cope with repeated fields without clobbering them, so I had to write my own parser.

The parser is divided into 2 parts:

  • the parse function, which loads DEP-5 data into a simple data structure
  • the read function, which loads this simple data structure into Config::Model’s configuration tree
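For readers who don’t speak Perl, here is a rough Python sketch of what the parse step has to do. It is not the Config::Model code: the function name `parse_dep5` and the list-of-pairs layout are assumptions chosen for illustration. The point is that each paragraph is kept as an ordered list of (field, value) pairs rather than a dictionary, so a field that appears twice is stored twice instead of clobbering the first value.

```python
def parse_dep5(text: str) -> list[list[tuple[str, str]]]:
    """Hypothetical parse step: split DEP-5 text into paragraphs.

    Each paragraph is a list of (field, value) pairs, not a dict, so a
    repeated field is stored twice instead of overwriting the first one.
    """
    paragraphs: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
                current = []
        elif line[0] in " \t":
            # Continuation line: append it (without leading whitespace)
            # to the value of the last field seen.
            if not current:
                raise ValueError(f"continuation line without a field: {line!r}")
            name, value = current[-1]
            body = line.strip()
            if body == ".":
                body = ""  # a lone "." marks an empty line in the value
            current[-1] = (name, value + "\n" + body)
        else:
            # "Field: value" line; split on the first colon only.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"missing ':' in line: {line!r}")
            current.append((name.strip(), value.strip()))
    if current:
        paragraphs.append(current)
    return paragraphs
```

Fed a paragraph containing, say, two Copyright lines, this returns two separate Copyright pairs. Deciding how repeated fields map onto the model is then left to the read step.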

You can view the code on the Config::Model repository.

The model itself is divided into 3 configuration classes:

  • Debian::Dep5: the root class
  • Debian::Dep5::Content: to represent a Files specification
  • Debian::Dep5::License: to represent the license(s) bound to a Files specification
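To make the split between these classes concrete, here is a hypothetical Python sketch of the read step that mirrors them with dataclasses. Only `abbrev`, `full_license` and the ‘other’ default come from the model; the attribute names `header`, `files`, `licenses` and `copyright`, and the rule that a paragraph holding only a License field declares a license text, are assumptions for illustration, not the Config::Model API.

```python
from dataclasses import dataclass, field


@dataclass
class License:
    """Like Debian::Dep5::License: short name plus optional full text."""
    abbrev: str = "other"  # 'other' is the default declared in the model
    full_license: str = ""


@dataclass
class Content:
    """Like Debian::Dep5::Content: one Files specification."""
    copyright: list[str] = field(default_factory=list)
    license: License = field(default_factory=License)


@dataclass
class Dep5:
    """Like the Debian::Dep5 root class."""
    header: dict[str, str] = field(default_factory=dict)
    # Files pattern -> Content; dicts keep insertion order, like an ordered hash.
    files: dict[str, Content] = field(default_factory=dict)
    # Root License hash: license name -> full text.
    licenses: dict[str, str] = field(default_factory=dict)


def split_license(value: str) -> License:
    # First line is the abbreviation, any following lines the full text.
    abbrev, _, text = value.partition("\n")
    return License(abbrev=abbrev.strip(), full_license=text)


def read_dep5(paragraphs: list[list[tuple[str, str]]]) -> Dep5:
    """Hypothetical read step: load (field, value) paragraphs into the tree."""
    root = Dep5()
    for index, para in enumerate(paragraphs):
        names = {name for name, _ in para}
        if "Files" in names:
            content = Content()
            pattern = ""
            for name, value in para:
                if name == "Files":
                    pattern = value
                elif name == "Copyright":
                    content.copyright.append(value)  # repeats are kept
                elif name == "License":
                    content.license = split_license(value)
            root.files[pattern] = content
        elif names == {"License"}:
            # Stand-alone license paragraph: fill the root License hash.
            for _, value in para:
                lic = split_license(value)
                root.licenses[lic.abbrev] = lic.full_license
        elif index == 0:
            # First paragraph without Files: treat it as the header.
            root.header.update(para)
        # Any other paragraph is ignored in this sketch.
    return root
```

Keeping the root License hash separate from the per-Files License node is what later lets a validator check that every abbreviation used under Files is declared at the root.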

The full DEP-5 model can also be read on the Config::Model repository.

(The models are biggish because they include help text taken from the DEP-5 documentation.)

Now some explanation is required on how the License validation is performed.

The trick is that each License used must be listed in the License parameter of Debian::Dep5, which is specified this way (in YAML syntax):

 allow_keys_matching: '^(?i:Apache|Artistic|BSD|FreeBSD|etc...|other)[\d\.\-]*\+?$'
 type: leaf
 value_type: string
 index_type: string
 type: hash

So this License element contains the list of licenses with their full text, and only a limited number of license names are accepted as keys.
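A quick way to see what this key constraint accepts is to try the same pattern in Python, whose `re` module also supports the scoped `(?i:...)` case-insensitive group (Python 3.6 and later). The sketch leaves out the elided `etc...` alternatives, so its list of names is deliberately incomplete, and `check_license_keys` is a hypothetical helper, not part of Config::Model.

```python
import re

# Key pattern from the model above, minus the elided "etc..." alternatives,
# so this list of license names is intentionally incomplete.
ALLOW_KEYS = re.compile(r"^(?i:Apache|Artistic|BSD|FreeBSD|other)[\d\.\-]*\+?$")


def check_license_keys(licenses: dict[str, str]) -> list[str]:
    """Return the License hash keys that the pattern would reject."""
    return [name for name in licenses if not ALLOW_KEYS.match(name)]


declared = {"Apache-2.0": "full text...", "artistic": "full text...", "MIT": "full text..."}
print(check_license_keys(declared))
```

With this exact pattern only digits, dots and dashes may follow the license name, so a key such as `bsd-3` is accepted while `BSD-3-clause` is rejected.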

Now, let’s explain how the files are tied to the declared licenses. The Debian::Dep5::Content class has a License element that represents the relation between the files and the license(s). This bond is represented by the Debian::Dep5::License class, whose ‘abbrev’ and ‘full_license’ parameters are fairly self-explanatory.

Validating the ‘abbrev’ parameter is another matter. Here’s its declaration (minus the description and help):

 type: leaf
 value_type: uniline
 default: other

So far, so good. Now the meaty part: the validation based on a Parse::RecDescent grammar:

 grammar: "license (oper license)(s?)
          oper: 'and' | 'or'
          license: /[\\w\\-\\.\\+]+/i\n
             { # PRD action to check if the license text is provided
               $return = $arg[0]->grab('! License')->defined($item[1]);
             } "

This grammar specifies:

  • The syntax of the License line itself (for instance, something like “Perl or GPL”)
  • An action performed when the license rule matches. Using the Config::Model API, this Perl snippet checks that each license abbreviation has a corresponding license declared in the Debian::Dep5 License hash

This way, an abbreviation cannot be used without a proper License statement.
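The same check can be sketched without Parse::RecDescent. The Python function below is a hand-written stand-in for the grammar and its action, assuming the declared License names are available as a plain set; `check_license_line` is a hypothetical name. It reuses the grammar’s license token pattern, requires license names and `and`/`or` operators to alternate, and rejects any abbreviation missing from the set.

```python
import re

# Same character class as the grammar's /[\w\-\.\+]+/i license rule.
TOKEN = re.compile(r"[\w\-\.\+]+")


def check_license_line(line: str, declared: set[str]) -> bool:
    """Accept `license (oper license)*` where every license is declared."""
    tokens = TOKEN.findall(line)
    # Reject characters the tokenizer skipped (e.g. commas); only
    # whitespace may separate tokens.
    if "".join(tokens) != re.sub(r"\s+", "", line):
        return False
    # A valid line has an odd number of tokens: license (oper license)*.
    if not tokens or len(tokens) % 2 == 0:
        return False
    for pos, tok in enumerate(tokens):
        if pos % 2 == 1:
            # Odd positions must hold an operator.
            if tok not in ("and", "or"):
                return False
        elif tok in ("and", "or") or tok not in declared:
            # Even positions must hold a declared license name, mirroring
            # the action's lookup in the root License hash.
            return False
    return True
```

For a grammar this small, checking that tokens alternate is enough; a real grammar becomes more attractive as soon as the syntax grows beyond a flat list.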

This code is a first stab. Some more work remains:

  • Implement exception parsing
  • Provide a DEP-5 writer
  • Provide a config-edit-dep5 CLI

As always, feedback is more than welcome.

All the best
