openSUSE MicroOS as your Desktop? Why not!

Foreword

So, let me just say that this is going to be a very quick and probably a rather terse post. Fact is, I’m giving this talk, next Saturday, within openSUSE + LibreOffice Virtual Conference 2020 and I figured that, in case anyone wanted to try replicating my setup, something that can be read (or from where to cut-&-paste) could maybe be more useful than a YouTube video and slides in PDF.

Here’s the talk, by the way. And here are the slides (link updated to the most recent version of the slides on Oct 17 2020, 11:30):

Many topics that I’m only touching on here would deserve being discussed at length, in a dedicated post or talk… and maybe it will happen. Just not here. ;-P

What’s MicroOS

See, it did not take much before we got to something about which we could talk for hours, and instead we’re only glossing over it! So, citing from our Wiki:

“[…] openSUSE MicroOS is an operating system you don’t have to worry about. It’s designed for but not limited to container hosts and edge devices. […]. openSUSE MicroOS inherits the openSUSE Tumbleweed and SUSE Linux Enterprise knowledge while redefining the operating system into a small, efficient and reliable distribution.”

And that’s all I’m going to say about it! There are talks that you can watch, though, to get a better idea.

The MicroOS Portal in the openSUSE Wiki is in good shape, IMO, and hence can be used as a fine source of information: Portal:MicroOS.

Of course, the deal is that you help us keep it fresh and accurate, as usual with Wikis! Anyway, perhaps the Features and the Design pages are good starting points.

Oh, and we are on Twitter. 🙂

MicroOS as a Desktop

Talks again!

And this one as well, still from Richard, right at the same conference! 🙂

UPDATE: Recording of Richard’s talk at openSUSE + LibreOffice Conference 2020 is now available:

Maybe you heard, or are familiar with, Fedora Silverblue (as it’s around for a little while now, and is already usable). The end goal of MicroOS as a Desktop would be to have something like that. But green.

Well, of course, the color is not the only difference. E.g., MicroOS uses BTRFS snapshots for “playing the immutable OS” game and for giving you transactional updates that do not touch the running system. And, if something goes wrong at the next reboot, it rolls back automatically. But again… Not this time.

So, long story short, my colleague Richard Brown had this idea of trying to use MicroOS to build an openSUSE immutable desktop system. He put up a HackWeek project [*] (HackWeek is that week of the year during which SUSE encourages its employees to do whatever they find cool). I happened to find it really cool and joined. I mostly did testing and started working on toolbox. A few months later, I decided to go all-in. And here we are.

[*] Actually, that’s the HackWeek project from the latest HackWeek, i.e., the one I joined. But he was working on that in previous HackWeeks as well.

Download and Install

This is easy. Get it from here:

And install it! Pick MicroOS Desktop [GNOME] [ALPHA] as a flavor, when you get to this point:

Oh, if you want KDE, that is also fine, of course. I just happen to not really know the status of it, as I’m neither using nor testing it. My SUSE colleague Fabian is, though.

Post Install Configurations

First of all, do this. In fact, flatpak is already installed, but there is no remote configured. Also, we want GNOME Software to only deal with Flatpaks. We’re working on making the installer do this, but we’re not there yet. So, just after the installation has finished, log in and do the following:

$ gsettings set org.gnome.software install-bundles-system-wide false
$ gsettings set org.gnome.software allow-updates false
$ gsettings set org.gnome.software download-updates false
$ gsettings set org.gnome.software enable-repos-dialog false
$ gsettings set org.gnome.software first-run true

And the following:

$ flatpak remote-add --user --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo

And the following:

$ sudo rm -Rf /var/cache/app-info
$ sudo transactional-update shell
  # rpm -e --nodeps libzypp-plugin-appdata
  # zypper al libzypp-plugin-appdata
  # exit
$ sudo reboot

And do actually reboot the system. It’s best if you do it immediately (or as soon as possible).

Basically, we are going to use Flatpak for pretty much all the apps, except for the packages that are part of the core of the OS and of the desktop environment. We will pick the flatpaks from the Flathub “repository” and we will install them “as user”, i.e., they’ll go into your user’s home directory (the apps themselves under ~/.local/share/flatpak, their data inside a hidden directory called ~/.var).
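
To double check that the remote really ended up in the per-user installation, something like the following may help. It is only a sketch: has_remote is a helper name made up here, and it assumes that the first column printed by flatpak remotes is the remote name.

```shell
# has_remote NAME: succeed if NAME appears in the first column of the
# remote list read from stdin (hypothetical helper, not part of flatpak)
has_remote() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Only query flatpak when it is actually installed
if command -v flatpak >/dev/null 2>&1; then
    if flatpak remotes --user | has_remote flathub; then
        echo "flathub is configured for this user"
    else
        echo "flathub is missing, re-run the remote-add command"
    fi
fi
```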

Do this as well:

# echo "<yourusername>:100000:65536" > /etc/subuid
# echo "<yourusername>:100000:65536" > /etc/subgid

It will turn out useful when we start using podman rootless containers as toolbox-es.
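
Note that the two echo commands above overwrite whatever is already in those files. If you would rather append an entry only when it is missing, here is a minimal sketch. The helper name add_subid_range and the user alice are made up for illustration, and the demo works on a temporary file rather than on /etc/subuid:

```shell
# add_subid_range USER FILE: append the 100000:65536 range for USER to FILE,
# unless FILE already has an entry for that user (hypothetical helper)
add_subid_range() {
    user=$1
    file=$2
    if grep -q "^${user}:" "$file" 2>/dev/null; then
        echo "$file: entry for $user already present, leaving it alone"
    else
        echo "${user}:100000:65536" >> "$file"
    fi
}

# Demo on a throw-away file
demo=$(mktemp)
add_subid_range alice "$demo"
add_subid_range alice "$demo"   # second call does not add a duplicate
cat "$demo"
rm -f "$demo"
```

On the real system you would call it as root, once for /etc/subuid and once for /etc/subgid, with your own user name.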

DO NOT add any additional repositories. Not even Packman for the multimedia codecs. The point is that, as stated already, we want to install as few packages as possible. Ideally, you’d (have to) install none, but we’re not there yet.

I recommend disabling automatic updating and rebooting as well. It’s a cool feature, and we’re thinking about how to have it better integrated into a desktop distribution. For now, I’ve disabled it and I recommend doing the same, at least until you have learned a little bit about it, are more familiar with the idea and with the basics of how it works, and have come up with the best automatic updating & rebooting strategy for your workflow and use case. So:

$ sudo systemctl disable --now transactional-update.timer
$ sudo systemctl disable --now rebootmgr.service

And then let’s check:

$ sudo rebootmgrctl is-active
RebootMgr is dead
$ sudo rebootmgrctl status
Error: The name org.opensuse.RebootMgr was not provided by any .service files

For more insights about all the super nice and fancy things that rebootmgr and health-checker can do for you, check this Wiki entry: Kubic:Update_and_Reboot.

System Personalization

If (like me on my workstation) you have an NVIDIA graphics card and you want the proprietary driver, then yes, the proper repos and packages need to be installed. I did that like this. Don’t worry about that transactional-update command that may look weird. Just do this, for now… I’ll talk about it a bit later:

$ sudo transactional-update pkg install gnome-remote-desktop
$ sudo reboot

I personally want sudo without a password for my user, via the wheel group, so I do this (but it’s of course optional):

# usermod -a -G wheel <myuser>
# echo "%wheel  ALL = (root) NOPASSWD:ALL" > /etc/sudoers.d/wheel

I have added some packages to the base OS, to make it a little friendlier to interact with. But not too many. Basically, these ones:

$ sudo transactional-update pkg install bash-completion wget unzip nfs-client autofs gnome-shell-search-provider-nautilus
$ sudo reboot

I will pick the Firefox flatpak as a browser, but I also want to use Chrome, from time to time. There’s no flatpak for it (although, something is in the works), so I added it like this:

$ sudo transactional-update shell
  #> zypper ar http://dl.google.com/linux/chrome/rpm/stable/x86_64 Google-Chrome
  #> wget https://dl.google.com/linux/linux_signing_key.pub
  #> rpm --import linux_signing_key.pub
  #> zypper ref
  #> zypper in google-chrome-stable
  #> exit
$ sudo reboot

Having a browser installed on the base OS is also useful because then you can use it for installing GNOME Shell extensions. In fact, I tried to install them from within Firefox’s flatpak but I could not make it work (any ideas if it’s doable, anyone?).

If you want, let’s say, the firewall and some NetworkManager plugins for your VPN, add these, or whatever else you think you’ll need. But only if they are core components or things that are absolutely only available in RPM form. The general rule of thumb is, try as hard as you can not to add too much stuff!

$ sudo transactional-update pkg install firewalld firewall-config NetworkManager-openvpn-gnome
$ sudo reboot

As stated earlier, it’s quite likely that some of these may be included within the desktop install pattern in the future. But, for now, just add them manually.

And that is it. As far as installation and post-install configuration, there is nothing more to do. So you’re all set and you can start enjoying your life in an immutable Desktop system!

I will just add a super quick summary on a couple more topics and then we leave all the rest to other blog posts.

transactional-update

zypper is present on a MicroOS system, but it does not work (feel free to try! :-P). Instead, we use transactional-update. It’s not correct to call it a wrapper around zypper because it does a lot more. But, essentially, you use it kind of like that, i.e., for installing/removing packages or for updating the system.

If you want to read more about it, check out these blog/news entries:

These talks / presentations:

And these documentation entries:

And these man entries:

IAC, just very quickly, installing something is done like this:

$ sudo transactional-update pkg install <package> <package> ...
$ sudo reboot

Or:

$ sudo transactional-update shell
  # zypper in <package> <package> ...
  # exit
$ sudo reboot

Removing, as can be easily imagined:

$ sudo transactional-update pkg remove <package> <package> ...

Or same as shown already, with transactional-update shell. Let’s mention 2 things:

  • As transactional-update itself warns you, do not use it two or more times without rebooting in between. If you do that, you’ll lose all the changes made in every invocation except the latest one. Also, the changes are only applied when you reboot. That’s just how immutable OSes work, after all, and there is not really a way around it (and it’s a good thing, IMO! You just need to get used to it)
  • Try to use it as little as possible, at least for installing and removing packages. That means, basically, try to install or remove packages as rarely as possible. If at all. See it like this: if you’re about to transactional-update install or remove something, think twice about it and consider whether there are alternatives. Most likely, there are. Then go for them, and save yourself a reboot.
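
One way to respect the first point is to collect what you want to install and hand it all to a single transactional-update invocation, followed by a single reboot. A minimal sketch, assuming a plain text wish list with one package per line (the list and the helper name install_cmd_from_list are made up for illustration). It only prints the command, so you can review it before running it with sudo:

```shell
# install_cmd_from_list FILE: print one transactional-update command that
# installs every package listed in FILE (blank lines and # comments skipped)
install_cmd_from_list() {
    pkgs=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | tr '\n' ' ')
    printf 'transactional-update pkg install %s\n' "${pkgs% }"
}

# Demo with a throw-away wish list
list=$(mktemp)
printf '%s\n' 'nmap' '# things I keep forgetting' '' 'wget' > "$list"
install_cmd_from_list "$list"   # prints: transactional-update pkg install nmap wget
rm -f "$list"
```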

This may all sound weird and you may think that you’ll end up straight in some kind of “rebooting hell” if ever trying to use a system like this, but do believe me, it’s not necessarily like that. As a sort of proof, I can offer you my current uptime, on this workstation (that runs with MicroOS Desktop, of course!):

See? I have not rebooted for 3 days and 16 hours (and counting)! 😛

On the other hand, using transactional-update for updating the base OS is fine. I won’t say <<don’t do it too frequently>> because you should just do it as frequently as you want. 🙂

So, basically, from time to time, or in general whenever you like, do a:

$ sudo transactional-update dup
$ sudo reboot

Or:

$ sudo transactional-update shell
  # zypper ref
  # zypper dup
  # exit
$ sudo reboot

It’s not that different from being on Tumbleweed, from this point of view. At least not for me. In fact, even when I was using Tumbleweed I was running zypper dup whenever I found it convenient and when I was sure that I was able to reboot the system, in case some critical package got updated and zypper recommended doing so.

Here on MicroOS, rebooting or not is less of a choice. Meaning that, after having updated as shown above, you’d better reboot. But you can do these updates whenever you want, no problem with that.

Flatpaks

Don’t want to say much here. Just, Flatpak / Flathub is where we take all the apps from. Everything that is not part of the very core of either the OS or the desktop environment (let’s say, anything above GNOME Shell) should come from Flatpak and not from the distribution RPMs.

Exceptions are allowed, of course, if there’s something you really need which is available as RPM but not as Flatpak (e.g., for me, it’s GThumb). But the fewer exceptions you allow, the more true you are to the “immutable OS philosophy”.

So, you can search for a flatpak like this:

$ flatpak search Steam

Install like this:

$ flatpak install Steam

And update like this:

$ flatpak update

And perhaps from time to time do this:

$ flatpak remove --unused
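
If you end up doing the last two steps together on a regular basis, they can be wrapped in a small function. This is just a sketch: run and flatpak_maintenance are names made up here, and DRY_RUN is a convention of this snippet, not a flatpak feature. With DRY_RUN set, the commands are only printed:

```shell
# run CMD...: execute the command, or just print it when DRY_RUN is set
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Update all per-user flatpaks, then drop runtimes that nothing uses any more
flatpak_maintenance() {
    run flatpak update --user -y &&
    run flatpak uninstall --user --unused -y
}

# Preview what would be executed, without touching the system
(DRY_RUN=1; flatpak_maintenance)
```

Calling flatpak_maintenance without DRY_RUN runs the two commands for real. Also, when a search term like Steam matches more than one thing, installing by application ID is less ambiguous, e.g., flatpak install --user flathub com.valvesoftware.Steam.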

There’s room for improvement (isn’t there always!), like integrating this better in GNOME Software, etc. We will work on enabling that.

Ah… when you install or remove or re-install a flatpak, there’s no need to reboot!! 😀

In the talk I have a list of apps that I am using. So, if you’re curious, check that out there.

Toolbox

Toolbox is just great! It’s basically a script that builds and launches for you a privileged podman (or rootless podman) container inside of which you are free again to do zypper in and zypper rm to your heart’s content.

For starters, perhaps you can have a look at the toolbox post on Kubic’s blog.

There’s so much to say about it and about how I am using it that I won’t even start, as I currently don’t have the time. For instance, I extensively use it for development. Or, I should say that I spend pretty much my entire working day inside one or more toolbox-es. But yeah, again, there are some examples in the slides and in the talk. Check them if interested, until the next post.

Let’s just say this. If you need to scan the ports of a host in your LAN, but nmap is not installed in your base OS (and it should not be!), toolbox saves you from installing it with transactional-update and rebooting before being able to use it. In fact, you do it like this:

$ toolbox -r
  #> zypper ref
  #> zypper in nmap
  #> nmap -sS 192.168.0.18

Conclusions

Conclusion is that I want to talk and write more about how great I’m finding it to use a system like this, and show more details of how I got here and how I’m doing things.

The talk has some of these details, so you can check that out over there. But I’ll also try to get to write more about this soon!

Come and Try… and Help!

And after all this, what’s probably more important to say than anything else is that, if you think you may like the idea, not only should you try using MicroOS as your desktop system for real (as I’m doing!), but you are also free and welcome to come and help us work on it.

For me, this is not part of my daily job, but I’m very passionate about it, so I’m happy to try to help anyone, and answer any question. Fire an email to me or to the opensuse-kubic mailing list (opensuse-kubic@opensuse.org).

And I’m sure that Richard and Fabian (who also work on this as a side project, AFAIK) will be equally happy to help and welcome any contribution!

Posted in Containers, Events, Linux, openSUSE, SUSE, Technology

What a Pleasure… What a Pleasure…

Yesterday I saw the ad for this thing.

There’s this little girl who, super happy, cleans up the fake dirt included in the box (<<che bel piacere… che bel piacere…>>, i.e., “what a pleasure… what a pleasure…”, is the little song she sings!). Then, all of a sudden, she exclaims, even happier: <<It’s just like mom’s!!>>.

Then the mom arrives too, and compliments her. It ends with the girl who keeps vacuuming while the mom, sitting, watches her. The picture is (deliberately) blurred, but you can sense her languid gaze (maybe almost even the tears) as she already imagines the day when the girl will grow up, and will finally be able to clean real dirt, with a real vacuum cleaner! After all, what more could one want for one’s own daughter?!?!

We must admit, though, that things have improved a lot compared to a few years ago. I mean… In my opinion, a few years ago they would have also included the dad coming home with muddy shoes. He too would surely have complimented the “little housewife” and then, I don’t know, crashed into the armchair and asked for slippers and beer. Instead, at least that is not there… And that’s progress!!

PS. When the ad started, I was cleaning the kitchen. I stopped and went to the living room to watch it. Once it was over I went back to the kitchen, started cleaning again and, in the meantime, called Lara and asked her what she thought of that ad. She replied that she did not like it because it made it look like only women do the cleaning.

As in… one tries. And let’s hope for the best :-O

PPS. Credit must be given, if not to GrandiGiochi, at least to VORWERK (the makers of the “real Folletto”) who, for example, on this page, where they explain how to have fun doing chores with kids, show a man and a boy sweeping. 🙂

Posted in General

Speeding-up colored builds… on openSUSE Tumbleweed

More than a blog entry, this is a “let’s store what I did somewhere, so I don’t forget”.

Even better would be to put some automation together, I know that very well. 🙂
I plan to do that, actually, but I’m not there yet.

So, basically, I want compile-time warnings and errors to be clearly visible and easy to spot –while logs are flowing in a terminal– and I want to speed-up builds themselves. The tools for the job are, apparently:

I had this setup already a couple of development boxes ago, but had not put it together on the current one (running openSUSE Tumbleweed), yet.

colorgcc

colorgcc is here:

https://software.opensuse.org/package/colorgcc

Just download it, and:

$ sudo zypper install colorgcc-1.4.4-1.12.noarch.rpm

Then:

$ cp /etc/colorgccrc $HOME/.colorgccrc

And, in that config file, do what the comment that says to uncomment says, i.e., uncomment the lines following the comment: 🙂

# Uncomment this if you want set up default path to gcc
#g++: /usr/bin/g++
#gcc: /usr/bin/gcc
#c++: /usr/bin/c++
#cc: /usr/bin/cc

And, finally, symlinks. Basically, we want the colorgcc wrapper script to be invoked, instead of one (any) of the GCC compilers. I’ve done it by creating these links in $HOME/bin/, and making sure $HOME/bin is in $PATH (and comes early enough):

$ ls -l /home/dario/bin/
c++ -> /usr/bin/colorgcc
c89-gcc -> /usr/bin/colorgcc
c99-gcc -> /usr/bin/colorgcc
cc -> /usr/bin/colorgcc
g++ -> /usr/bin/colorgcc
g++-7 -> /usr/bin/colorgcc
gcc -> /usr/bin/colorgcc
gcc-7 -> /usr/bin/colorgcc
gcc-8 -> /usr/bin/colorgcc
gccgo -> /usr/bin/colorgcc
gccgo-8 -> /usr/bin/colorgcc
$ echo $PATH
/home/dario/bin:/usr/local/bin:/usr/bin:/bin
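
For the record, creating the links and checking the PATH ordering could be scripted roughly as follows. This is a sketch under assumptions: link_compilers and path_precedes are helper names made up here, and the demo creates the links in a temporary directory instead of $HOME/bin:

```shell
# link_compilers WRAPPER BINDIR NAME...: point each NAME in BINDIR at WRAPPER
link_compilers() {
    wrapper=$1
    bindir=$2
    shift 2
    mkdir -p "$bindir"
    for name in "$@"; do
        ln -sfn "$wrapper" "$bindir/$name"
    done
}

# path_precedes FIRST SECOND PATHSTRING: succeed if FIRST is listed before
# SECOND in PATHSTRING (or if SECOND is not listed at all)
path_precedes() {
    padded=":$3:"
    before=${padded%%":$2:"*}
    case "$before:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo in a throw-away directory, so nothing in $HOME is touched
demo=$(mktemp -d)
link_compilers /usr/bin/colorgcc "$demo" gcc g++ cc c++
ls -l "$demo"
path_precedes "$HOME/bin" /usr/bin "$PATH" ||
    echo "warning: $HOME/bin is not ahead of /usr/bin in PATH"
rm -rf "$demo"
```

For the real thing, pass $HOME/bin as the second argument and list whichever compiler names you actually have installed.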

As a test, let’s just give gcc something to complain about. For me, it worked (do you see the colors?!?):

$ cat << EOF > hello.c ; gcc hello.c -o hello
#include <stdio.h>

main()
{
printf("Hello World!\n");
}
EOF
hello.c:3:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
main()
^~~~

(If you’re serious with “Hello World”, do the same, but with GNU hello!)

ccache

ccache is in openSUSE’s main repos, so installing is just a matter of:

$ sudo zypper install ccache

At this point, ccache config file (and ccache’s cache) is in ~/.ccache/:

$ ccache -s
cache directory /home/dario/.ccache
primary config /home/dario/.ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
cache hit (direct)          0
cache hit (preprocessed)    0
cache miss                  0
cache hit rate           0.00 %
cleanups performed          0
files in cache              0
cache size                0.0 kB
max cache size            5.0 GB

All that’s remaining to be done is:

    • to go back in $HOME/.colorgccrc, and change that same block of lines again:

# Uncomment this if you want set up default path to gcc
g++: /usr/lib64/ccache/g++
gcc: /usr/lib64/ccache/gcc
c++: /usr/lib64/ccache/c++
cc: /usr/lib64/ccache/cc
clang: /usr/lib64/ccache/clang
clang++: /usr/lib64/ccache/clang++
gfortran: /usr/lib64/ccache/gfortran

    • to define CCACHE_PATH=/usr/bin:

export CCACHE_PATH="/usr/bin"

(The latter, I added it to .bashrc.)

Done. Now you’ll see that building stuff has an impact on ccache statistics (and it’s also faster, hopefully!! 🙂 )

$ ccache -s
cache hit (direct)                82
cache hit (preprocessed)           1
cache miss                       392
cache hit rate                 17.47 %
called for link                   27
called for preprocessing          27
compiler produced empty output    49
preprocessor error                 2
unsupported compiler option        2
no input file                  79579
cleanups performed                 0
files in cache                  1172
cache size                      59.9 MB
max cache size                   5.0 GB
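
In case you wonder where that hit rate comes from: it looks like it is simply the hits (direct plus preprocessed) over hits plus misses, i.e., 83 / (83 + 392). A small sketch that recomputes it from the statistics, assuming the output format shown above (other ccache versions may print the statistics differently; hit_rate is a helper name made up here):

```shell
# hit_rate: read "ccache -s" style statistics from stdin and print the
# hit rate as a percentage (hypothetical helper)
hit_rate() {
    awk '
        /^cache hit \(direct\)/       { hits += $NF }
        /^cache hit \(preprocessed\)/ { hits += $NF }
        /^cache miss/                 { miss = $NF }
        END { if (hits + miss > 0) printf "%.2f\n", 100 * hits / (hits + miss) }
    '
}

# Demo with the three relevant lines from the statistics above
hit_rate << 'EOF'
cache hit (direct)                82
cache hit (preprocessed)           1
cache miss                       392
EOF
```

On a live system, ccache -s | hit_rate does the same on the current numbers.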

Just FTR, some links, which were useful when setting this up in the past, on other distros:

Posted in Linux, openSUSE, SUSE, Technology, Work

LinuxLab 2018, in Florence

This Monday and Tuesday (3rd and 4th of December) were the days of LinuxLab2018. It’s only the second year they (Develer) are doing this conference, but it’s quite a good one already, at least in my opinion.

SUSE @ LinuxLab2018, in Florence

For sure, it stands out in the Italian landscape of Linux technical conferences… assuming there even are others worth mentioning. I was there last year, and have been there again this year, and this year it has been even more fun!

In particular, it was really cool to meet again and catch up with some of my friends and mates, mostly from the Ph.D time. Interestingly enough, most of us managed to continue “doing Linux stuff”, in one way or another, over these years. 😀
I’m glad they’ve noticed this conference as well, and recognized it as a good opportunity to see what’s around –in Linux, here in Italy– as well as sharing what they’re currently doing, as I did myself.

Having skipped more than one LinuxCon (now Open Source Summit), it had been a while since I last saw one of Jonathan Corbet’s famous “Kernel Weather Report” talks live. And, yes, it’s still as cool as I remembered it.

My talk, this time, was about “VIRTUALIZATION IN THE AGES OF SPECULATIVE EXECUTION HARDWARE BUGS”. I think it went well… I had too much material (and I knew it!) so I had to rush a little bit. I did a so-so job at covering everything that I wanted to (and I sort of have the impression I’ve been given a couple of minutes less than promised! :-P), but I’m happy about the overall outcome. And in fact, I received good feedback. 🙂

The slides, for the interested, are below (or at this link). Please notice that, although I did my best, this stuff drives me crazy every time I try to (re-)figure out how it works exactly! So, if anyone spots mistakes, do not hesitate to point them out to me.

LinuxLab18_ Virtualization in the age of Speculative Execution HW Bugs

Thanks to the organizers for letting me attend and present, and to SUSE for letting me go. Looking forward to next year…

Posted in General

nautilus 3.6: where the hell is ‘Connect to Server’ ?

Ever since it was introduced in GNOME / Nautilus, I’ve always been an assiduous, diligent, consistent and zealous user of the ‘Connect to Server’ functionality. It’s just really really really convenient, that’s all there is to it.

So, as soon as I finished installing something which came with Nautilus 3.6 (we’re now at 3.8, and it’s pretty much the same), which I think at the time was Fedora 18, I started looking for it, and was quite upset when I did not actually find it!

I looked for it by clicking on the ‘gear’ icon… and failed! Then I looked for it under the ‘arrow down’ icon… and failed!

Nautilus

At this point I was really mad, especially as I started to think the GNOME guys could have removed it, which would have been very, VEry, VERY bad for me! Then the light. Just by chance, I clicked on the window name, on the left, in the top panel (usually called ‘the App menu’ in GNOME3 jargon, see below) and found it… Phew!

Nautilus 'Connect to Server'

Oh, BTW, the same applies to ‘Enter Location’, which I also find very useful.

Of course, this post comes after a while that things are like this, so I guess everyone is used to the new interface right now… But, I mean, one never knows! 😛

Posted in Fedora, Linux, Technology

Xen NUMA Scheduling on The Register

Hey! Seems like the stuff I’m doing for Xen 4.3 made it on The Register !!! 😛

The Register on Xen 4.3 being released (July 9th, 2013)

The NUMA-aware scheduler is an important component in Xen on machines with multiple processor sockets […]

As the number of VMs climbs in the machine, the effect of NUMA-aware scheduling increases, as you can see in these preliminary benchmark test results, and presumably this will also be the case as the number of sockets increases. It gets a bit dicey when a machine becomes overloaded with work, but even then the tweaks to make Xen appreciate the eccentricities of NUMA systems seems to help some.

Also, ‘the tweaks to make Xen appreciate the eccentricities of NUMA systems’ is, I think, the best definition for what I’m doing during most of my working hours… Thanks for that too, Register!

Posted in Technology, Work, Xen

Schrödinger’s Cat in a Virtual ‘Box’

Yes, apparently Schrödinger’s cat is alive, as the latest release of Fedora (Fedora 19, codename Schrödinger’s cat) was released on July 2nd, and that even happened pretty much on time.

Fedora Logo

So, apparently, putting the cat “in a box” and all the stuff was way too easy, and that’s why we are bringing the challenge to the next level: do you dare putting Schrödinger’s cat “in a virtual box”?

In other words, do you dare install Fedora 19 within a Xen virtual machine? And if yes, how about doing that using Fedora 19 itself as Dom0?

Continue reading

Posted in Fedora, Linux, Technology, Work, Xen

NUMA Aware Scheduling in Xen

Citrix Xen Sweets (by osde8info)

So, hacking the Xen Open Source hypervisor is what I do for a living (and these are the guys providing me with my monthly paycheck for that: http://www.citrix.com). During the last months, I’ve been concentrating on improving NUMA awareness of the Xen scheduler, and this is an attempt to describe what that is all about…

Background and Motivation

The official Xen blog already hosted a couple of stories about what is going on, in the Xen development community, regarding improving Xen NUMA support. Therefore, if you really are interested in some background and motivation, feel free to check them out:

Long story short, they say how NUMA is becoming more and more common and that, therefore, it is very important to: (1) achieve a good initial placement, when creating a new VM; (2) have a solution that is both flexible and effective enough to take advantage of that placement during the whole VM lifetime. The former, basically, means: <<When starting a new Virtual Machine, which NUMA node should I “associate” it with?>>. The latter is more about: <<How hard should the VM be associated to that NUMA node? Could it, perhaps temporarily, run elsewhere?>>.

NUMA Placement and Scheduling

So, here’s the situation: automatic initial placement has been included in Xen 4.2, inside libxl. This means that, when a VM is created (of course, if that happens through libxl), a set of heuristics decides on which NUMA node its memory has to be allocated, and the vCPUs of the VM are statically pinned to the pCPUs of that node.
On the other hand, NUMA aware scheduling has been under development during the last months, and is going to be included in Xen 4.3. This means that, instead of being statically pinned, the vCPUs of the VM will strongly prefer to run on the pCPUs of the NUMA node, but they can run somewhere else as well… And this is what this status report is all about.

NUMA Aware Scheduling Development

The development of this new feature started pretty early in the Xen 4.3 development cycle, and has undergone a couple of major reworks along the way. The very first RFC for it dates back to the Xen 4.2 development cycle, and it showed interesting performance already. However, what was decided at the time was to concentrate only on placement, and leave scheduling for the future. After that, v1, v2 and v3 of a patch series entirely focused on NUMA aware scheduling followed. It has been discussed during XenSummit NA 2012, in a talk about NUMA future development in Xen in general (slides here). While at it, a couple of existing scheduling anomalies of the stock credit scheduler were found and fixed (for instance, the one described here).

Right now, we can say we are almost done. In fact, v3 received positive feedback and is basically what is going to be merged, and so what Xen 4.3 will ship. Actually, there is going to be a v4 (being released on xen-devel right at the same time as this blog post), but it only accommodates very minor changes, and it is 100% functionally equal to v3.

Any Performance Numbers?

Sure thing! Benchmarks similar to the ones already described in the previous blog posts have been performed. More specifically, directly from the cover letter of the v3 of the patch series, here’s what has been done:

I ran the following benchmarks (again):
* SpecJBB is all about throughput, so pinning
  is likely the ideal solution.
* Sysbench-memory is the time it takes for
  writing a fixed amount of memory (and then
  it is the throughput that is measured). What
  we expect is locality to be important, but
  at the same time the potential imbalances
  due to pinning could have a say in it.
* LMBench-proc is the time it takes for a
  process to fork a fixed number of children.
  This is much more about latency than
  throughput, with locality of memory
  accesses playing a smaller role and, again,
  imbalances due to pinning being a potential
  issue.

This all happened on a 2 node host, where 2 to 10 VMs (2 vCPUs and 960 MB of RAM each) were executing the various benchmarks concurrently. Here are the results:

 ----------------------------------------------------
 | SpecJBB2005, throughput (the higher the better)  |
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 ----------------------------------------------------
 |    2 |  43318.613  | 49715.158 |    49822.545    |
 |    6 |  29587.838  | 33560.944 |    33739.412    |
 |   10 |  19223.962  | 21860.794 |    20089.602    |
 ----------------------------------------------------
 | Sysbench memory, throughput (the higher the better)
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 ----------------------------------------------------
 |    2 |  469.37667  | 534.03167 |    555.09500    |
 |    6 |  411.45056  | 437.02333 |    463.53389    |
 |   10 |  292.79400  | 309.63800 |    305.55167    |
 ----------------------------------------------------
 | LMBench proc, latency (the lower the better)     |
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 ----------------------------------------------------
 |    2 |  788.06613  | 753.78508 |    750.07010    |
 |    6 |  986.44955  | 1076.7447 |    900.21504    |
 |   10 |  1211.2434  | 1371.6014 |    1285.5947    |
 ----------------------------------------------------

Which, reasoning in terms of %-performance increase/decrease, means NUMA aware
scheduling does as follows, as compared to no-affinity at all and to static pinning:

     ----------------------------------
     | SpecJBB2005 (throughput)       |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     ----------------------------------
     |    2 |   +13.05%   |  +0.21%   |
     |    6 |   +12.30%   |  +0.53%   |
     |   10 |    +4.31%   |  -8.82%   |
     ----------------------------------
     | Sysbench memory (throughput)   |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     ----------------------------------
     |    2 |   +15.44%   |  +3.79%   |
     |    6 |   +11.24%   |  +5.72%   |
     |   10 |    +4.18%   |  -1.34%   |
     ----------------------------------
     | LMBench proc (latency)         |
     | NOTICE: -x.xx% = GOOD here     |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     ----------------------------------
     |    2 |    -5.66%   |  -0.50%   |
     |    6 |    -9.58%   | -19.61%   |
     |   10 |    +5.78%   |  -6.69%   |
     ----------------------------------
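
A note on how to read these percentages: checking them against the raw numbers, they appear to be computed relative to the NUMA scheduling figure, i.e., 100 * (NUMA - other) / NUMA. That is an inference from the numbers, not something stated in the cover letter. A tiny sketch to reproduce a couple of entries (rel_change is a helper name made up here):

```shell
# rel_change NUMA OTHER: percentage difference of the NUMA scheduling result
# versus OTHER, expressed relative to the NUMA scheduling result
rel_change() {
    awk -v numa="$1" -v other="$2" \
        'BEGIN { printf "%+.2f%%\n", 100 * (numa - other) / numa }'
}

rel_change 49822.545 43318.613   # SpecJBB2005, 2 VMs, vs no affinity: +13.05%
rel_change 900.21504 1076.7447   # LMBench proc, 6 VMs, vs pinning: -19.61%
```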

The tables show how, when not in overload (where overload=’more vCPUs than pCPUs’), NUMA scheduling is the absolute best. In fact, not only does it do a lot better than no-pinning on throughput biased benchmarks, as well as a lot better than pinning on latency biased benchmarks (especially with 6 VMs), it also equals or beats both under adverse circumstances (adverse to NUMA scheduling, i.e., beats/equals pinning in throughput benchmarks, and beats/equals no-affinity on the latency benchmark).

When the system is overloaded, NUMA scheduling scores in the middle, as could have been expected. It must also be noticed that, when it brings benefits, they are not as huge as in the non-overloaded case. However, this only means that there is still room for more optimization, right? In some more detail, the current way a pCPU is selected for a vCPU that is waking up couples particularly badly with the new concept of NUMA node affinity. Changing this is not trivial, because it involves rearranging some locks inside the scheduler code, but it is already being worked on.
Anyway, even with what we have right now, we are overloading the test box by 20% here (without counting Dom0 vCPUs!) and still seeing improvements, which is definitely not bad!

What Else Is Going On?

Well, a lot… To the point that it is probably pointless to try to make a list here! I maintain a NUMA roadmap on our Wiki, which I’m trying to keep updated and, more importantly, to honor and fulfill so, if interested in knowing what will come next, go check it out!

Posted in General, Technology, Work, Xen

“An Oxford University study (http://garyhaq.wordpress.com/2011/04/29/the-impact-of-the-meat-on-our-plate/www.foe.co.uk/resource/reports/healthy_planet_eating.pdf) funded by Friends of the Earth showed that more than 45,000 lives a year could be saved if everyone ate meat no more than two or three times a week.”

Everyone should really think about this!

A Human Ecologist's View

FROM Paul McCartney to Lord Stern, more people are promoting the benefits of a meatless society.

Meat production not only contributes to climate change and land degradation but is also a cause of air and water pollution and biodiversity loss. The farming industry accounts for nine per cent of UK total greenhouse gases, half of which come from sheep, cows and goats. Is the meat on our plate really worth the impact on the planet?

Deforestation, manure and livestock flatulence all contribute to global warming and are associated with excessive meat consumption.

As nations become richer, they tend to eat more meat and more livestock has to be raised to keep up with the demand.

In turn, more grazing land is required and more forests are cut down to expand farmland. As trees get the chop the carbon dioxide that they have absorbed over their lifetime is eventually released back…


Posted in General

At least it looks like things are improving, perhaps, in a not too far future, neither all this “external” work nor these hacks will be necessary any longer…

Posted in Linux, Technology