drupal code builder

Refactoring with Rector

Rector is a tool for making changes to PHP code, which powers tools that assist with upgrading deprecated code in Drupal. When I recently made some refactoring changes in Drupal Code Builder, which were too complex to do with search and replace regexes, it seemed like a good opportunity to experiment with Rector, and learn a bit more about it.

Besides, I'm an inveterate condiment-passer: I tend to prefer spending longer on a generalisation of a problem than on the problem itself, and the more dull the problem and the more interesting the generalisation, the more the probability increases.

So faced with a refactoring from this return from the getFileInfo() method:

    return [
      'path' => '',
      'filename' => $this->component_data['filename'],
      'body' => $body,
      'merged' =>$merged,
    ];

to this:

    return new CodeFile(
      body_pieces: $body,
      merged: $merged,
    );

which was going to be tedious as hell to do in a ton of files, obviously, I elected to spend time fiddling with Rector.

The first thing I'll say is that the same sort of approach as I use with migrations works well: work with a small indicative sample, and iterate small changes. With a migration, I will find a small number of source rows which represent different variations (or if there is too much variation, I'll iterate the iteration multiple times). I'll run the migration with just those sources, examine the result, make refinements to the migration, then roll back and repeat.

With Rector, you can specify just a single class in the code that registers the rule to RectorConfig in the rector.php file, so I picked a class which had very little code, as the dump() output of an entire PHP file's PhpParser analysis is enormous.

You then use the rule class's getNodeTypes() method to declare which node types you are interested in. Here, I made a mis-step at first. I wanted to replace Array_ nodes, but only in the getFileInfo() method. So in my first attempt, I specified ClassMethod nodes, and then in refactor() I wrote code to drill down into them to get the array Array_ nodes. This went well until I tried returning a new replacement node, and then Rector complained, and I realised the obvious thing I'd skipped over: the refactor() method expects you to return a node to replace the found node. So my approach was completely wrong.

I rewrote getNodeTypes() to search for Array_ nodes: those represent the creation of an array value. This felt more dangerous: arrays are defined in lots of places in my code! And I haven't been able to see a way to determine the parentage of a node: there do not appear to be pointers that go back up the PhpParser syntax tree (it would be handy, but would make the dump() output even worse to read!). Fortunately, the combination of array keys was unique in DrupalCodeBuilder, or at least I hoped it was fairly unique. So I wrote code to get a list of the array's keys, and then compare it to what was expected:

        foreach ($node->items as $item) {
            $seen_array_keys[] = $item->key->value;
        }
        if (array_intersect(static::EXPECTED_MINIMUM_ARRAY_KEYS, $seen_array_keys) != static::EXPECTED_MINIMUM_ARRAY_KEYS) {
            return NULL;
        }

Returning NULL from refactor() means we aren't interested in this node and don't want to change it.

With the arrays that made it through the filter, I needed to make a new node that's a class instantiation, to replace the array, passing the same values to the new statement as the array keys (mostly).

Rector's list of commonly used PhpParser nodes was really useful here.

A new statement node is made thus:

use PhpParser\Node\Name;
use PhpParser\Node\Expr\New_;

        $class = new Name('\DrupalCodeBuilder\File\CodeFile');
        return new New_($class);

This doesn't have any parameters yet, but running Rector on this with my sample set showed me it was working properly. Rector has a dry run option for development, which shows you what would change but doesn't write anything to files, so you can run it over and over again. What's confusing is that it also has a cache; until I worked this out I was repeatedly confused by some runs having no effect and no output. I have honestly no idea what the point is of caching something that's designed to make changes, but there is an option to disable it. So the command to run is: $ vendor/bin/rector --dry-run --clear-cache. Over and over again.

Once that worked, I needed to convert array items to constructor parameters. Fortunately, the value from the array items work for parameters too:

use PhpParser\Node\Arg;

        foreach ($node->items as $array_item) {
                $construct_argument = new Arg(
                   $array_item->value,
                );

That gave me the values. But I wanted named parameters for my constructor, partly because they're cool and mostly because the CodeFile class's __construct() has optional parameters, and using names makes that simpler.

Inspecting the Arg class's own constructor showed how to do this:

use PhpParser\Node\Arg;
use PhpParser\Node\Identifier;

                $construct_argument = new Arg(
                    value: $array_item->value,
                    name: new Identifier($key),
                );

Using named parameters here too to make the code clearer to read!

It's also possible to copy over any inline comments that are above one node to a new node:

            // Preserve comments.
            $construct_argument->setAttribute('comments', $array_item->getComments());

The constructor parameters are passed as a parameter to the New_ class:

        return new New_($class, $new_call_args);

Once this was all working, I decided to do some more refactoring in the CodeFile class in DrupalCodeBuilder. The changes I was making with Rector made it more apparent that in a lot of cases, I was passing empty values. Also, the $body parameter wasn't well-named, as it's an array of pieces, so could do with a more descriptive name such as $body_pieces.

Changes like this are really easy to do (though by this point, I had made a git repository for my Rector rule, so I could make further enhancements without breaking what I'd got working already).

        foreach ($node->items as $array_item) {
            $key = $array_item->key->value;

            // Rename the 'body' key.
            if ($key == 'body') {
                $key = 'body_pieces';
            }

And that's my Rector rule done.

Although it's taken me far more time than changing each file by hand, it's been far more interesting, and I've learned a lot about how Rector works, which will be useful to me in the future. I can definitely see how it's a very useful tool even for refactoring a small codebase such as DrupalCodeBuilder, where a rule is only going to be used once. It might even prompt me to undertake some minor refactoring tasks I've been putting off because of how tedious they'll be.

What I've not figured out is how to extract namespaces from full class names to an import statement, or how to put line breaks in the new statement. I'm hoping that a pass through with PHP_CodeSniffer and Drupal Coder's rules will fix those. If not, there's always good old regexes!

Regenerating plugin dependency injection with Module Builder

Dependency injection is a pattern that adds a lot of boilerplate code, but Drupal Code Builder makes it easy to add injected services to plugins, forms, and service classes.

Now that the Drupal 8 version of Module Builder (the Drupal front-end to the Drupal Code Builder library) uses an autocomplete for service names in the edit form, adding injected services is even easier, and any of the hundreds of services in your site’s codebase (443 on my local sandbox Drupal 8 site!) can be injected.

I often used this when I want to add a service to an existing plugin: re-generate the code, and copy-paste the new code I need.

This is an area in which Module Builder now outshines its Drush counterpart, because unlike the Drush front end for Drupal Code Builder, which generates code with input parameters every time, Module Builder lets you save your settings for the generated module (as a config entity).

So you can return to the plugin you generated to start with, add an extra service to it, and generate the code again. You can copy and paste, or have Module Builder write the file and then use git to revert custom code it’s removed. (The ability to insert generated code into existing files is on my list of desirable features, but is realistically a long way off, as it would be rather complex, a require the addition of a code parsing library.)

But why stop at generating code for your own modules? I recently filed an issue on Search API, suggesting that its plugins could do with tweaking to follow the standard core pattern of a static factory method and constructor, rather than rely on setters for injection. It’s not a complex change, but a lot of code churn. Then it occurred to me: Drupal Code Builder can generate that boilerplate code: simply create a module in Module Builder called ‘search_api’, and then add a plugin with the name of one that is already in Search API, and then set its injected services to the services the real plugin needs.

Drupal Code Builder already knows how to build a Search API plugin: its code analysis detects the right plugin base class and annotation to use, and also any parameters that the constructor method should pass up to the base class.

So it’s pretty quick to copy the plugin name and service names from Search API’s plugin class to the form in Module Builder, and then save and generate the code, and then copy the generated factory methods back to Search API to make a patch.

I’m now rather glad I decided to use config entities for generated entities. Originally, I did that just because it was a quick and convenient way to get storage for serialized data (and since then I discovered in other work that map fields are broken in D8 so I’m very glad I didn’t try to make then content entities!). But the ability to save the generating settings for a module, and then return to it to add to them has proved very useful.

Drupal Code Builder unit testing: bringing in the heavy stuff

I started adding unit tests to Drupal Code Builder about 3 years ago, and I’ve been gradually expanding on them ever since. It’s made refactoring the code a pleasant rather than stressful experience.

However, while all generator types are covered, the level of detail the tests go into isn’t that deep. Back when I wrote the tests, they mostly needed to check for hook implementations that were generated, and so quick and dirty regexes on the generated code did the job: match 'mymodule_form_alter' in the generated code, basically. I gradually extended those to cover things like class declarations and methods, but that approach is very much cracking at the seams.

So it’s time to switch to something more powerful, and more suited to the task.

I’ve already removed my frankly hideous attempt at verifying that generated code is correctly-formed PHP, replacing it with a call to PHP’s own code linter. My own code was running the generated PHP code files through eval() (yes, I know!) to check nothing crashed, which was quick and worked but only up to a point: tests couldn’t create the same function twice, as eval()ing code that contains a function declaration brings it into the global namespace, and it didn’t work at all for classes where while tests were being run, as the parent classes in Drupal core or contrib aren't present.

It's already proved worthwhile, as once I'd converted the tests, I found an error in the generated code: a stray quote mark in base field definitions for a content entity, which my approach wasn't picking up, and never would have.

The second phase is going to be to use PHPCS and Drupal Coder to check that generated code follows Drupal Coding Standards. I'm currently getting that to work in my testing base class, although it might be a while before I push it, as I suspect it's going to complain about quite a few nipicks in the generated code that I'll then have to spend some time fixing.

The third phase (this is a 3-phrase programme, all the best ones are) is going to be to look into using PHP-Parser to check for the presence of functions and methods in the code, rather than my regex-based approach. This is going to allow for much more thorough checking of the generated code, with things such as method order in the code, service injection, and more.

After that, it'll be back to some more refactoring and code clean-up, and then some more code generators! I have a few ideas of what else Drupal Code Builder could generate, but more ideas are welcome in the issue queue on github.

Subscribe to RSS - drupal code builder