After I upgraded my workstation from Fedora 25 to Fedora 27, and thus from DNF 1.1 to 2.7, I immediately noticed that repomanage, the command I use to clean old packages from my repository, was much slower.

This week, I tried to investigate this and to improve it.

1. Use case

I run dnf repomanage to find the 15 old RPMs in a tree with nearly 4000 source packages, keeping 5 versions of each package, so

dnf repomanage --old --keep 5 .
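To make the intent of --old --keep 5 concrete, here is a minimal Python sketch of the selection logic: group the packages by name, sort each group newest first, and report everything past the first `keep` entries. This is an illustration, not the repomanage code; the function name `old_packages` and the file names are hypothetical, and versions are simplified to tuples of integers instead of real RPM version comparison.

```python
from collections import defaultdict


def old_packages(packages, keep):
    """Return the file names that fall outside the `keep` newest per name.

    `packages` is an iterable of (name, version_key, filename) tuples.
    `version_key` is assumed to be directly sortable (here: a tuple of
    ints), which is a simplification of real RPM version ordering.
    """
    by_name = defaultdict(list)
    for name, version_key, filename in packages:
        by_name[name].append((version_key, filename))

    old = []
    for name in sorted(by_name):
        newest_first = sorted(by_name[name], reverse=True)
        # Everything after the first `keep` entries is considered old.
        old.extend(filename for _, filename in newest_first[keep:])
    return old


# Hypothetical tree: seven builds of "foo", one build of "bar".
tree = [("foo", (1, 0, n), f"foo-1.0.{n}-1.src.rpm") for n in range(1, 8)]
tree.append(("bar", (2, 0), "bar-2.0-1.src.rpm"))

for filename in old_packages(tree, keep=5):
    print(filename)
```

With 5 versions kept, only the two oldest builds of the hypothetical "foo" package are reported, and "bar" is left alone.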

2. Fedora 27, dnf 2.7, python 3.6

The task takes about 16 seconds, which is obviously too slow.

3. Fedora 27, php 7.1

My first idea was to write a quick and dirty PHP script to run the same task and confirm the slowness.

The first draft of this script, about 10 minutes of work, mostly parses the output of

rpm -qp --qf "%{NAME} %{EPOCHNUM} %{VERSION} %{RELEASE} %{ARCH} %{NEVR}\n" *rpm

and then sorts the result using the rpmdev-vercmp command.
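The parsing half of that approach can be sketched in Python as follows. This is not the PHP script itself, only an illustration of how one line per package, in the query format shown above, can be split into fields and grouped before sorting. The names `Pkg` and `parse_rpm_query` are hypothetical, and grouping by (name, arch) is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Pkg(NamedTuple):
    name: str
    epoch: int
    version: str
    release: str
    arch: str
    nevr: str


def parse_rpm_query(output):
    """Parse 'NAME EPOCHNUM VERSION RELEASE ARCH NEVR' lines.

    Returns a dict mapping (name, arch) to a list of Pkg entries.
    Lines that do not have exactly six fields are skipped.
    """
    groups = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        name, epoch, version, release, arch, nevr = fields
        groups[(name, arch)].append(
            Pkg(name, int(epoch), version, release, arch, nevr)
        )
    return dict(groups)
```

Each group can then be sorted with a comparison callback, which is exactly where calling an external command for every comparison becomes expensive.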

This is obviously a terrible and inefficient solution, especially as Python has libraries to handle RPM information in a sane way.

The task takes about 6 seconds.

Conclusion: there is a huge issue with the repomanage command, so I reported it as bug #1537981 - performance issue: dnf repomanage is really slow

4. rpmvercmp binding for php

Just for fun, I quickly wrote a PHP extension that wraps librpm and exposes a single rpmvercmp function, to avoid the very bad practice of an exec call inside a sort callback.

$ php  -r 'var_dump(rpmvercmp("1.2~RC1-1", "1.2-1"));'
int(-1)
$ php  -r 'var_dump(rpmvercmp("1:1.2-1", "0:1.2-2"));'
int(1)
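For readers who wonder why these two calls give -1 and 1, here is a pure Python sketch of the comparison rules as I understand them: the epoch is compared first as an integer, then version and release are compared segment by segment (numeric segments as numbers, alphabetic segments as strings), and a tilde sorts before everything else. This is not the librpm code and not the extension's code; the names `vercmp`, `split_evr` and `evr_cmp` are hypothetical, and the caret (^) handling found in newer rpm releases is deliberately left out.

```python
import string

_DIGITS = string.digits
_ALPHA = string.ascii_letters


def _is_sep(ch):
    # Anything that is not an ASCII letter, digit or tilde separates segments.
    return ch not in _DIGITS and ch not in _ALPHA and ch != "~"


def vercmp(a, b):
    """Compare two version (or release) strings: returns -1, 0 or 1."""
    if a == b:
        return 0
    i = j = 0
    while i < len(a) or j < len(b):
        while i < len(a) and _is_sep(a[i]):
            i += 1
        while j < len(b) and _is_sep(b[j]):
            j += 1
        a_tilde = i < len(a) and a[i] == "~"
        b_tilde = j < len(b) and b[j] == "~"
        if a_tilde or b_tilde:
            # A tilde sorts before anything, even the end of the string.
            if not a_tilde:
                return 1
            if not b_tilde:
                return -1
            i, j = i + 1, j + 1
            continue
        if i >= len(a) or j >= len(b):
            break
        # Take the next segment of the same kind from both strings.
        kind = _DIGITS if a[i] in _DIGITS else _ALPHA
        si, sj = i, j
        while si < len(a) and a[si] in kind:
            si += 1
        while sj < len(b) and b[sj] in kind:
            sj += 1
        seg_a, seg_b = a[i:si], b[j:sj]
        if not seg_b:
            # Different segment types: numeric is considered newer.
            return 1 if kind is _DIGITS else -1
        if kind is _DIGITS:
            seg_a, seg_b = seg_a.lstrip("0"), seg_b.lstrip("0")
            if len(seg_a) != len(seg_b):
                return 1 if len(seg_a) > len(seg_b) else -1
        if seg_a != seg_b:
            return 1 if seg_a > seg_b else -1
        i, j = si, sj
    if i >= len(a) and j >= len(b):
        return 0
    # Whichever string still has characters left is newer.
    return 1 if i < len(a) else -1


def split_evr(evr):
    """Split '[epoch:]version[-release]'; a missing epoch counts as 0."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        epoch, rest = "0", evr
    version, _, release = rest.partition("-")
    return int(epoch), version, release


def evr_cmp(a, b):
    ea, va, ra = split_evr(a)
    eb, vb, rb = split_evr(b)
    if ea != eb:
        return 1 if ea > eb else -1
    return vercmp(va, vb) or vercmp(ra, rb)


print(evr_cmp("1.2~RC1-1", "1.2-1"))   # -1, the tilde makes RC1 older
print(evr_cmp("1:1.2-1", "0:1.2-2"))   # 1, the higher epoch wins
```

In the first call the tilde makes 1.2~RC1 sort before 1.2, and in the second the higher epoch wins before the release is even looked at.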

The speed improvement is really good: the task now takes less than a second (about 0.6 seconds).

5. Fedora 25, dnf 1.1, python 3.5

To get a clear idea of the change, I ran the same task, on the same set of packages, in a Fedora 25 virtual machine.

There, the task takes only 0.27 seconds, which confirms the huge performance regression in the recent version (about 60x slower).

6. more rpmlib bindings for php

As the previous result with the PHP script was still more than 2x slower than the official repomanage command (0.6 versus 0.27 seconds), I tried, for fun, to improve it by adding an rpminfo function, which avoids parsing the rpm command output.

$ php -n -d extension=modules/rpminfo.so -r 'print_r(rpminfo("remi-release-27.rpm"));'
Array
(
    [Name] => remi-release
    [Version] => 27
    [Release] => 2.fc27.remi
    [Arch] => noarch
)

Again, a nice speed improvement: the task now takes only around 0.13 seconds, which is twice as fast as the old Python repomanage result.

7. Fedora 27, yum-utils 1.1, python 2.7

A quick try with the old, deprecated repomanage command from the yum stack:

Only 0.26 seconds, so a bit faster than dnf 1.1, which also confirms the huge regression in dnf 2.7.
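Putting the measurements above side by side makes the ratios easier to see. The small Python sketch below only does arithmetic on the approximate timings quoted in this post, expressed relative to dnf 1.1 on Fedora 25; the labels and the helper name `relative_to` are mine.

```python
# Approximate timings in seconds, copied from the sections above.
timings = {
    "dnf 2.7 (Fedora 27)": 16.0,
    "php + rpmdev-vercmp": 6.0,
    "php + rpmvercmp binding": 0.6,
    "dnf 1.1 (Fedora 25)": 0.27,
    "yum-utils 1.1 (Fedora 27)": 0.26,
    "php + rpminfo binding": 0.13,
}


def relative_to(baseline, table):
    """Express every timing as a multiple of the baseline timing."""
    base = table[baseline]
    return {name: seconds / base for name, seconds in table.items()}


ratios = relative_to("dnf 1.1 (Fedora 25)", timings)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:28s} {ratio:6.1f}x")
```

The dnf 2.7 run comes out at roughly 59 times the dnf 1.1 time, which is where the "60x slower" figure comes from.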

8. Conclusion

Repomanage has a huge performance regression, which needs to be fixed by the upstream developers.

Running these various tests was really a fun game, and I'm happy with the result: PHP is faster than Python, even against such an old, "production quality" piece of code that has existed in RPM distributions for years. And this was only a few hours of work.

If you are interested in this game, you can check the code of the rpminfo extension in my git repository: php-rpminfo (see the repomanage.php script in the examples directory).

Perhaps I will add more bindings to this small new extension and submit it to become an official PECL project, especially as the old rpmreader extension is dead.

As today is trollday, feel free to feed the troll.