Refactoring with Rector

Rector is a tool for making changes to PHP code, which powers tools that assist with upgrading deprecated code in Drupal. When I recently made some refactoring changes in Drupal Code Builder, which were too complex to do with search and replace regexes, it seemed like a good opportunity to experiment with Rector, and learn a bit more about it.

Besides, I'm an inveterate condiment-passer: I tend to prefer spending longer on a generalisation of a problem than on the problem itself, and the more dull the problem and the more interesting the generalisation, the more the probability increases.

So faced with a refactoring from this return from the getFileInfo() method:

    return [
      'path' => '',
      'filename' => $this->component_data['filename'],
      'body' => $body,
      'merged' =>$merged,
    ];

to this:

    return new CodeFile(
      body_pieces: $body,
      merged: $merged,
    );

which was going to be tedious as hell to do in a ton of files, obviously, I elected to spend time fiddling with Rector.

The first thing I'll say is that the same sort of approach as I use with migrations works well: work with a small indicative sample, and iterate small changes. With a migration, I will find a small number of source rows which represent different variations (or if there is too much variation, I'll iterate the iteration multiple times). I'll run the migration with just those sources, examine the result, make refinements to the migration, then roll back and repeat.

With Rector, you can specify just a single class in the code that registers the rule to RectorConfig in the rector.php file, so I picked a class which had very little code, as the dump() output of an entire PHP file's PhpParser analysis is enormous.

You then use the rule class's getNodeTypes() method to declare which node types you are interested in. Here, I made a mis-step at first. I wanted to replace Array_ nodes, but only in the getFileInfo() method. So in my first attempt, I specified ClassMethod nodes, and then in refactor() I wrote code to drill down into them to get the array Array_ nodes. This went well until I tried returning a new replacement node, and then Rector complained, and I realised the obvious thing I'd skipped over: the refactor() method expects you to return a node to replace the found node. So my approach was completely wrong.

I rewrote getNodeTypes() to search for Array_ nodes: those represent the creation of an array value. This felt more dangerous: arrays are defined in lots of places in my code! And I haven't been able to see a way to determine the parentage of a node: there do not appear to be pointers that go back up the PhpParser syntax tree (it would be handy, but would make the dump() output even worse to read!). Fortunately, the combination of array keys was unique in DrupalCodeBuilder, or at least I hoped it was fairly unique. So I wrote code to get a list of the array's keys, and then compare it to what was expected:

        foreach ($node->items as $item) {
            $seen_array_keys[] = $item->key->value;
        }
        if (array_intersect(static::EXPECTED_MINIMUM_ARRAY_KEYS, $seen_array_keys) != static::EXPECTED_MINIMUM_ARRAY_KEYS) {
            return NULL;
        }

Returning NULL from refactor() means we aren't interested in this node and don't want to change it.

With the arrays that made it through the filter, I needed to make a new node that's a class instantiation, to replace the array, passing the same values to the new statement as the array keys (mostly).

Rector's list of commonly used PhpParser nodes was really useful here.

A new statement node is made thus:

use PhpParser\Node\Name;
use PhpParser\Node\Expr\New_;

        $class = new Name('\DrupalCodeBuilder\File\CodeFile');
        return new New_($class);

This doesn't have any parameters yet, but running Rector on this with my sample set showed me it was working properly. Rector has a dry run option for development, which shows you what would change but doesn't write anything to files, so you can run it over and over again. What's confusing is that it also has a cache; until I worked this out I was repeatedly confused by some runs having no effect and no output. I have honestly no idea what the point is of caching something that's designed to make changes, but there is an option to disable it. So the command to run is: $ vendor/bin/rector --dry-run --clear-cache. Over and over again.

Once that worked, I needed to convert array items to constructor parameters. Fortunately, the value from the array items work for parameters too:

use PhpParser\Node\Arg;

        foreach ($node->items as $array_item) {
                $construct_argument = new Arg(
                   $array_item->value,
                );

That gave me the values. But I wanted named parameters for my constructor, partly because they're cool and mostly because the CodeFile class's __construct() has optional parameters, and using names makes that simpler.

Inspecting the Arg class's own constructor showed how to do this:

use PhpParser\Node\Arg;
use PhpParser\Node\Identifier;

                $construct_argument = new Arg(
                    value: $array_item->value,
                    name: new Identifier($key),
                );

Using named parameters here too to make the code clearer to read!

It's also possible to copy over any inline comments that are above one node to a new node:

            // Preserve comments.
            $construct_argument->setAttribute('comments', $array_item->getComments());

The constructor parameters are passed as a parameter to the New_ class:

        return new New_($class, $new_call_args);

Once this was all working, I decided to do some more refactoring in the CodeFile class in DrupalCodeBuilder. The changes I was making with Rector made it more apparent that in a lot of cases, I was passing empty values. Also, the $body parameter wasn't well-named, as it's an array of pieces, so could do with a more descriptive name such as $body_pieces.

Changes like this are really easy to do (though by this point, I had made a git repository for my Rector rule, so I could make further enhancements without breaking what I'd got working already).

        foreach ($node->items as $array_item) {
            $key = $array_item->key->value;

            // Rename the 'body' key.
            if ($key == 'body') {
                $key = 'body_pieces';
            }

And that's my Rector rule done.

Although it's taken me far more time than changing each file by hand, it's been far more interesting, and I've learned a lot about how Rector works, which will be useful to me in the future. I can definitely see how it's a very useful tool even for refactoring a small codebase such as DrupalCodeBuilder, where a rule is only going to be used once. It might even prompt me to undertake some minor refactoring tasks I've been putting off because of how tedious they'll be.

What I've not figured out is how to extract namespaces from full class names to an import statement, or how to put line breaks in the new statement. I'm hoping that a pass through with PHP_CodeSniffer and Drupal Coder's rules will fix those. If not, there's always good old regexes!