About "dnf repomanage" performance regression
By Remi on Friday 26 January 2018, 07:29 - General - Permalink
Warning: this article may be a troll, but PHP is faster than Python.
After I upgraded my workstation from Fedora 25 to Fedora 27, and thus from DNF 1.1 to 2.7, I immediately noticed that repomanage, the command used to clean old packages from my repository, was much slower.
This week, I tried to investigate this and to improve it.
1. Use case
I run dnf repomanage to find the 15 old RPMs in a tree with nearly 4000 source packages, keeping 5 versions of each package:
dnf repomanage --old --keep 5 .
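The selection that `--old --keep 5` performs can be sketched as follows. This is a minimal illustration in Python, not the code of dnf: the function name `old_packages` and the tuple layout are hypothetical, and versions are represented by any sortable key (a simplifying assumption, since real RPM ordering needs rpmvercmp).

```python
from collections import defaultdict


def old_packages(packages, keep=5):
    """Return the file names that fall outside the `keep` newest per name.

    `packages` is an iterable of (name, version_key, filename) tuples.
    `version_key` is any value that sorts from oldest to newest; this is
    an assumption of the sketch, real RPM versions need rpmvercmp.
    """
    by_name = defaultdict(list)
    for name, key, filename in packages:
        by_name[name].append((key, filename))

    old = []
    for entries in by_name.values():
        # Newest first, then everything after the first `keep` is "old".
        entries.sort(key=lambda entry: entry[0], reverse=True)
        old.extend(filename for _, filename in entries[keep:])
    return sorted(old)
```

With 7 builds of one package and `keep=5`, the two oldest builds are reported; a package with 5 builds or fewer reports nothing.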
2. Fedora 27, dnf 2.7, python 3.6
The task takes about 16 seconds, which is obviously too slow.
3. Fedora 27, php 7.1
My first idea was to quickly write a dirty PHP script to run the same task and confirm the slowness.
The first draft of this script, about 10 minutes of work, mostly parses the output of
rpm -qp --qf "%{NAME} %{EPOCHNUM} %{VERSION} %{RELEASE} %{ARCH} %{NEVR}\n" *rpm
and then sorts the result using the rpmdev-vercmp command.
This is obviously a terrible and inefficient solution, especially as Python has libraries to handle RPM information in a sane way.
The task takes about 6 seconds.
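The parsing step of that first draft can be sketched like this. It is written in Python for illustration rather than in the PHP of the actual script, and the names `PackageInfo` and `parse_query_output` are hypothetical. It assumes each line holds exactly the six whitespace-separated fields of the query format shown above.

```python
from typing import NamedTuple


class PackageInfo(NamedTuple):
    name: str
    epoch: int
    version: str
    release: str
    arch: str
    nevr: str


def parse_query_output(text):
    """Parse lines of 'NAME EPOCHNUM VERSION RELEASE ARCH NEVR'."""
    packages = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        name, epoch, version, release, arch, nevr = fields
        packages.append(
            PackageInfo(name, int(epoch), version, release, arch, nevr)
        )
    return packages
```

The costly part is not this parsing but the sort: calling an external comparison command from a sort callback spawns one process per comparison.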
Conclusion: there is a huge issue with the repomanage command, so I reported it as bug #1537981 - performance issue: dnf repomanage is really slow
4. rpmvercmp binding for php
Just for fun, I quickly wrote a PHP extension wrapping librpm and exposing the single rpmvercmp function, to avoid the very bad use of an exec call in a sort callback.
$ php -r 'var_dump(rpmvercmp("1.2~RC1-1", "1.2-1"));'
int(-1)
$ php -r 'var_dump(rpmvercmp("1:1.2-1", "0:1.2-2"));'
int(1)
The speed improvement was really good: the task now takes less than a second (0.6 seconds).
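For readers curious about what such a comparison does, here is a rough pure-Python sketch of RPM-style version ordering, including the tilde rule that makes "1.2~RC1" sort before "1.2". It is an approximation written for illustration, not the librpm code and not the extension's code: the helper names `split_evr`, `evr_cmp` and `sort_evrs` are hypothetical, and the way the string is split into epoch, version and release is an assumption.

```python
import string
from functools import cmp_to_key

DIGITS = set(string.digits)
ALPHA = set(string.ascii_letters)
ALNUM = DIGITS | ALPHA


def rpmvercmp(a, b):
    """Compare two version strings segment by segment, return -1, 0 or 1."""
    if a == b:
        return 0
    i = j = 0
    while i < len(a) or j < len(b):
        # Skip separators: anything that is not alphanumeric or a tilde.
        while i < len(a) and a[i] not in ALNUM and a[i] != "~":
            i += 1
        while j < len(b) and b[j] not in ALNUM and b[j] != "~":
            j += 1
        # A tilde sorts before everything, even the end of the string.
        a_tilde = i < len(a) and a[i] == "~"
        b_tilde = j < len(b) and b[j] == "~"
        if a_tilde or b_tilde:
            if not a_tilde:
                return 1
            if not b_tilde:
                return -1
            i, j = i + 1, j + 1
            continue
        if i >= len(a) or j >= len(b):
            break
        # Grab the next all-digit or all-letter segment from both sides.
        kind = DIGITS if a[i] in DIGITS else ALPHA
        start_i, start_j = i, j
        while i < len(a) and a[i] in kind:
            i += 1
        while j < len(b) and b[j] in kind:
            j += 1
        seg_a, seg_b = a[start_i:i], b[start_j:j]
        if not seg_b:
            # Different segment types: numeric is newer than alphabetic.
            return 1 if kind is DIGITS else -1
        if kind is DIGITS:
            seg_a, seg_b = seg_a.lstrip("0"), seg_b.lstrip("0")
            if len(seg_a) != len(seg_b):
                return 1 if len(seg_a) > len(seg_b) else -1
        if seg_a != seg_b:
            return 1 if seg_a > seg_b else -1
    if i >= len(a) and j >= len(b):
        return 0
    # Whichever side still has segments left is the newer one.
    return 1 if i < len(a) else -1


def split_evr(evr):
    """Split 'epoch:version-release'; a missing epoch counts as 0."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        epoch, rest = "0", evr
    version, sep, release = rest.rpartition("-")
    if not sep:
        version, release = rest, ""
    return int(epoch), version, release


def evr_cmp(a, b):
    """Compare epoch first, then version, then release."""
    ea, va, ra = split_evr(a)
    eb, vb, rb = split_evr(b)
    if ea != eb:
        return 1 if ea > eb else -1
    return rpmvercmp(va, vb) or rpmvercmp(ra, rb)


def sort_evrs(evrs):
    """Sort EVR strings from oldest to newest."""
    return sorted(evrs, key=cmp_to_key(evr_cmp))
```

Under these assumptions, `evr_cmp("1.2~RC1-1", "1.2-1")` gives -1 and `evr_cmp("1:1.2-1", "0:1.2-2")` gives 1, matching the two results shown above. An in-process comparison like this is what replaces the exec call in the sort callback.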
5. Fedora 25, dnf 1.1, python 3.5
To get a clear idea of the change, I ran the same task, on the same set of packages, in a Fedora 25 virtual machine.
There, the task takes only 0.27 seconds, which confirms the huge performance regression in the recent version (about 60x slower).
6. more rpmlib bindings for php
As the previous result with the PHP script (0.6 seconds) was still more than twice as slow as the old repomanage command from dnf 1.1 (0.27 seconds), I tried, for fun, to improve it by adding an rpminfo function to avoid parsing the rpm command output.
$ php -n -d extension=modules/rpminfo.so -r 'print_r(rpminfo("remi-release-27.rpm"));'
Array
(
    [Name] => remi-release
    [Version] => 27
    [Release] => 2.fc27.remi
    [Arch] => noarch
)
Again, a nice speed improvement: the task now takes around 0.13 seconds, which is twice as fast as the old Python repomanage.
7. Fedora 27, yum-utils 1.1, python 2.7
A quick try with the old, deprecated repomanage command from the yum stack:
Only 0.26 seconds, so slightly faster than dnf 1.1, which also confirms the huge regression.
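To put all the measurements side by side, the ratios can be computed with a few lines of Python. The timings are the ones reported in this article; the labels and the `slowdown` helper are only there for illustration.

```python
# Timings in seconds, copied from the measurements in this article.
timings = {
    "dnf 2.7 repomanage (Fedora 27)": 16.0,
    "PHP script, rpm output + rpmdev-vercmp": 6.0,
    "PHP script, rpmvercmp binding": 0.6,
    "dnf 1.1 repomanage (Fedora 25)": 0.27,
    "yum-utils 1.1 repomanage": 0.26,
    "PHP script, rpminfo binding": 0.13,
}

baseline = "dnf 1.1 repomanage (Fedora 25)"


def slowdown(name, reference):
    """How many times slower `name` is compared to `reference`."""
    return timings[name] / timings[reference]


# Print each timing with its ratio to the old dnf 1.1 command.
for name, seconds in timings.items():
    ratio = slowdown(name, baseline)
    print(f"{name:40s} {seconds:6.2f}s {ratio:6.1f}x")
```

The ratio for dnf 2.7 comes out at roughly 59, which is where the "60x slower" figure comes from.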
8. Conclusion
Repomanage has a huge performance regression, which needs to be fixed by the upstream developers.
Running these various tests was a really fun game, and I'm happy with the result: PHP is faster than Python, even against such an old, "production quality" piece of code that has existed in RPM distributions for years. And this was only a few hours of work.
If you are interested in this game, you can check the code of the rpminfo extension in my git repository: php-rpminfo (see the repomanage.php script in the examples directory).
Perhaps I will add more bindings in this small new extension, and submit it to become an official pecl project, especially as the old rpmreader extension is dead.
As today is trollday, feel free to feed the troll.
Comments
And if you want to test this small extension, the php-pecl-rpminfo package is now available in my repository (version 0.1.1)