Tracking Service Function Chaining with Skydive

Skydive is “an open source real-time network topology and protocols analyzer”. It is a tool (with CLI and web interface) to help analyze and debug your network (OpenStack, OpenShift, containers, …). Dropped packets somewhere? MTU issues? Routing problems? These are some issues where running skydive will help.

So as an update on my previous demo post (this time based on the Newton release), let’s see how we can trace SFC with this analyzer!

devstack installation

Not a lot of changes here: check out devstack on the stable/newton branch, grab the local.conf file I prepared (configured to use the skydive 0.9 release), and run “./”!

For the curious, the SFC/Skydive specific parts are:
enable_plugin networking-sfc stable/newton

# Skydive
enable_plugin skydive refs/tags/v0.9.0
enable_service skydive-agent skydive-analyzer

Skydive web interface and demo instances

Before running the script to configure the SFC demo instances, open the skydive web interface (it listens on port 8082, check your instance firewall if you cannot connect):


The login was configured with devstack, so if you did not change it, use admin/pass123456.
Then add the demo instances as in the previous demo:
$ git clone -b sfc_newton_demo
$ ./openstack-scripts/

And watch as your cloud goes from “empty” to “more crowded”:

Skydive CLI, start traffic capture

Now let’s enable traffic capture on the integration bridge (br-int), and all tap interfaces (more details on the skydive CLI available in the documentation):
$ export SKYDIVE_USERNAME=admin
$ export SKYDIVE_PASSWORD=pass123456
$ /opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client capture create --gremlin "G.V().Has('Name', 'br-int', 'Type', 'ovsbridge')"
$ /opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client capture create --gremlin "G.V().Has('Name', Regex('^tap.*'))"
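To double-check that both captures were registered, the CLI can also list them (same environment variables and configuration file as above):

```shell
$ /opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client capture list
```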

Note this can be done in the web interface too, but I wanted to show both interfaces.

Track a HTTP request diverted by SFC

Make an HTTP request from the source VM to the destination VM (see the previous post for details). We will highlight the nodes where this request has been captured: in the GUI, click on the capture create button, select “Gremlin expression”, and use the query:

This expression reads as “on all captured flows matching the IP address and port 80, show the nodes”. With the CLI you would get a nice JSON output of these nodes; here in the GUI these nodes will turn yellow:

If you look at our tap interface nodes, you will see that two are not highlighted. If you check their IDs, you will find that they belong to the same service VM, the one in group 1 that did not get the traffic.

If you want to single out a request, in the skydive GUI, select one node where capture is active (for example br-int). In the flows table, select the request, scroll down to get its layer 3 tracking ID “L3TrackingID”, and use it as a Gremlin expression:

Going further

Now it’s your turn to experiment! Modify the port chain, send a new HTTP request, get its L3TrackingID, and see its new path. I quickly find the latest ID with this CLI command (we will see how the skydive experts react to this):
$ /opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client topology query --gremlin "G.Flows().Has('Network','','Transport','80').Limit(1)" | jq ".[0].L3TrackingID"
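Once you have the ID, you can feed it back into a flow query to list every captured flow belonging to that request. The exact Gremlin step below (`Has('L3TrackingID', …)`) is my assumption based on the field name shown in the GUI; adapt it if your skydive version differs:

```shell
# Hypothetical sketch: grab the latest L3TrackingID, then query all flows carrying it
TRACKING_ID=$(/opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client topology query \
  --gremlin "G.Flows().Has('Network','','Transport','80').Limit(1)" | jq -r ".[0].L3TrackingID")
/opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client topology query \
  --gremlin "G.Flows().Has('L3TrackingID', '${TRACKING_ID}')"
```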

You can also check each flow in turn, following the paths from one VM to another, go further with SFC, or learn more about skydive:

app-text/tesseract 4.0 alpha ebuild available for testing

Tesseract is one of the best open-source OCR programs available, and I recently took over ebuild maintainership for it. Current development is still quite active, and since the last stable release a new OCR engine based on LSTM neural networks has been added. This engine is available in an alpha release, and initial numbers show a much faster OCR pass, with fewer errors.

Sounds interesting? If you want to try it, this alpha release is now in tree (along with a live ebuild). I insist on the alpha tag: this is for testing, not for production. So the ebuild is masked by default, and you will have to add it to your package.unmask file:
The ebuild also includes some additional changes, like current documentation generated with USE=doc (available in stable release too), and updated linguas.

Testing with paperwork

The initial reason I took over tesseract is that I also maintain the ebuilds for paperwork, a personal document manager that handles scanned documents and PDFs (and is a heavy tesseract user). It recently got a new 1.1 release, if you want to give it a try!

Local testing of OpenStack Grafana dashboard changes

OpenStack has a Grafana dashboard with infrastructure metrics, including CI jobs history (failure rate, …). These dashboards are configured via YAML files, hosted in the project-config repo, with the help of grafyaml.

As a part of the Neutron stadium, projects like networking-sfc are expected to have a working grafana dashboard for failure rates in gates. I updated the configuration file for networking-sfc recently, but wanted to locally test these changes before sending them for review.

Documentation mentions the steps with the help of puppet, but I wanted to try and configure a local test server. Here are my notes on the process!

Installing the Grafana server

I run this on a Centos 7 VM, with some of the usual development packages already installed (git, gcc, python, pip, …). Some steps will be distribution-specific, like grafana install here.

Grafana has some nice documentation, but for my test server, I just installed it with packagecloud repository:
[root@grafana ~]# wget
[root@grafana ~]# vi # Never blindly run a downloaded script ;)
[root@grafana ~]# bash

Then start the server:
[root@grafana ~]# systemctl start grafana-server
(optionally, run “systemctl enable grafana-server” if you want it to start at boot)
And check that you can connect to http://${SERVER_IP}:3000, the default login password is admin / admin

Install and configure grafyaml

Seeing the main dashboard? Good, now open the API keys menu, and generate a key with Admin role (required as we will change the data source).

Now install grafyaml via pip (some distributions have a package for it, but not Centos):
[root@grafana ~]# pip install grafyaml

Create the configuration file /etc/grafyaml/grafyaml.conf with the following content (use the API key you just generated):

[grafana]
url = http://localhost:3000
apikey = generated_admin_key

Configure a dashboard

Now get the current configuration for OpenStack dashboards, and add one of them:
[root@grafana ~]# git clone # or sync from your local copy
[root@grafana ~]# grafana-dashboard update project-config/grafana/datasource.yaml
[root@grafana ~]# grafana-dashboard update project-config/grafana/networking-sfc.yaml

The first update command will add the OpenStack graphite datasource, the second one adds the current networking-sfc dashboard (the one I wanted to update in this case).
If everything went fine, refresh the grafana page, you should be able to select the Networking SFC Failure rates dashboard and see the same graphs as on the main site.

Modifying the dashboard

But we did not set up this system just to mimic the existing dashboards, right? Now it’s time to add your modifications to the dashboard YAML file, and test them.

A small tip on metrics names: if you want to be sure “stats_counts.zuul.pipeline.check.job.gate-networking-sfc-python27-db-ubuntu-xenial.FAILURE” is a correct metric, is your friend!
This is a web interface to the datasource, and allows you to look for metrics by exact name (Search), with some auto-completion help (Auto-completer), or browsing a full tree (Tree).

Now that you have your metrics, update the YAML file with new entries, then you can validate (the YAML structure only, for metrics names see previous paragraph) and update your grafana dashboard with:
[root@grafana ~]# grafana-dashboard validate project-config/grafana/networking-sfc.yaml
[root@grafana ~]# grafana-dashboard update project-config/grafana/networking-sfc.yaml

Refresh your browser and you can see how your modifications worked out!

Next steps

Remember that this is a simple local test setup (default account, api key with admin privileges, manual configuration, …). This can be used as a base guide for a real grafana/grafyaml server, but the next steps are left as an exercise for the reader!

In the meantime, I found it useful to be able to try and visualize my changes before sending the patch for review.

Service Function Chaining demo with devstack

After a first high-level post, it is time to actually show networking-sfc in action! Based on a documentation example, we will create a simple demo, where we route some HTTP traffic through some VMs, and check the packets on them with tcpdump:

SFC demo diagram

This will be hosted on a single node devstack installation, and all VMs will use the small footprint CirrOS image, so this should run on “small” setups.

Installing the devstack environment

On your demo system (I used Centos 7), check out devstack on the Mitaka branch (remember to run devstack as a sudo-capable user, not root):

[stack@demo ~]$ git clone -b stable/mitaka

Grab my local configuration file that enables the networking-sfc plugin, rename it to local.conf in your devstack/ directory.
If you prefer to adapt your current configuration file, just make sure your devstack checkout is on the mitaka branch, and add the SFC parts:
enable_plugin networking-sfc

Then run the usual “./” command, and go grab a coffee.

Deploy the demo instances

To speed this step up, I regrouped all the following items in a script. You can check it out (at a tested revision for this demo):
[stack@demo ~]$ git clone -b sfc_mitaka_demo

The script will:

  • Configure security (disable port security, set a few things in security groups, create a SSH key pair)
  • Create source, destination systems (with a basic web server)
  • Create service VMs, configuring the network interfaces and static IP routing to forward the packets
  • Create the SFC items (port pair, port pair group, flow classifier, port chain)

I highly recommend reading it: it is mostly straightforward and commented, and it is where most of the interesting commands are hidden. So have a look before running it:
[stack@demo ~]$ ./openstack-scripts/
WARNING: setting legacy OS_TENANT_NAME to support cli tools.
Updated network: private
Created a new port:

route: SIOCADDRT: File exists
WARN: failed: route add -net "" gw ""
You can safely ignore the route errors at the end of the script (they are caused by a duplicate default route on the service VMs).

Remember, from now on, to source the credentials file in your current shell before running CLI commands:
[stack@demo ~]$ source ~/devstack/openrc demo demo

We first get the IP addresses for our source and destination demo VMs:
[vagrant@defiant-devstack ~]$ openstack server show source_vm -f value -c addresses; openstack server show dest_vm -f value -c addresses

private=, fd73:381c:4fa2:0:f816:3eff:fe65:12fd

Now, we look for the tap devices associated to our service VMs:
[stack@demo ~]$ neutron port-list -f table -c id -c name

| name           | id                                   |
| p1in           | 897df85a-26c3-4491-888e-8cc58f19cea1 |
| p1out          | fa838294-317d-46df-b10e-b1734dd62faf |
| p2in           | c86dafc7-bda6-4537-b806-be2282f7e11e |
| p2out          | 12e58ea8-a9ab-4d0b-9fd7-707dc6e99f20 |
| p3in           | ee14f406-e9d6-4047-812b-aa04514f50dd |
| p3out          | 2d86403b-4639-40a0-897e-68fa0c759f01 |

These device names follow the tap<first 11 characters of the port ID> pattern, so for example tap897df85a-26 is the tap device associated with the p1in port here.
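The mapping is easy to reproduce in the shell, using the p1in port ID from the table above:

```shell
# Build the tap device name: "tap" + first 11 characters of the Neutron port ID
port_id="897df85a-26c3-4491-888e-8cc58f19cea1"
tap_dev="tap$(echo "${port_id}" | cut -c1-11)"
echo "${tap_dev}"   # prints tap897df85a-26
```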

See SFC in action

In this example we run a request loop from source_vm to dest_vm (remember to use the IP addresses found in the previous section):
[stack@demo ~]$ ssh cirros@
$ while true; do curl; sleep 1; done
Welcome to dest-vm
Welcome to dest-vm
Welcome to dest-vm

So we do have access to the web server! But do the packets really go through the service VMs? To confirm that, in another shell, run tcpdump on the tap interfaces:

# On the outgoing interface of VM 3
$ sudo tcpdump port 80 -i tap2d86403b-46
tcpdump: WARNING: tap2d86403b-46: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap2d86403b-46, link-type EN10MB (Ethernet), capture size 65535 bytes
11:43:20.806571 IP > Flags [S], seq 2951844356, win 14100, options [mss 1410,sackOK,TS val 5010056 ecr 0,nop,wscale 2], length 0
11:43:20.809472 IP > Flags [.], ack 3583226889, win 3525, options [nop,nop,TS val 5010057 ecr 5008744], length 0
11:43:20.809788 IP > Flags [P.], seq 0:136, ack 1, win 3525, options [nop,nop,TS val 5010057 ecr 5008744], length 136
11:43:20.812226 IP > Flags [.], ack 39, win 3525, options [nop,nop,TS val 5010057 ecr 5008744], length 0
11:43:20.817599 IP > Flags [F.], seq 136, ack 40, win 3525, options [nop,nop,TS val 5010059 ecr 5008746], length 0

Here are some other examples (skipping the tcpdump output for clarity):
# You can check other tap devices, confirming both VM 1 and VM2 get traffic
$ sudo tcpdump port 80 -i tapfa838294-31
$ sudo tcpdump port 80 -i tap12e58ea8-a9

# Now we remove the flow classifier, and check the tcpdump output
$ neutron port-chain-update --no-flow-classifier PC1
$ sudo tcpdump port 80 -i tap2d86403b-46 # Quiet time

# We restore the classifier, but remove the group for VM3, so tcpdump will only show traffic on other VMs
$ neutron port-chain-update --flow-classifier FC_demo --port-pair-group PG1 PC1
$ sudo tcpdump port 80 -i tap2d86403b-46 # No traffic
$ sudo tcpdump port 80 -i tapfa838294-31 # Packets!

# Now we remove VM1 from the first group
$ neutron port-pair-group-update PG1 --port-pair PP2
$ sudo tcpdump port 80 -i tapfa838294-31 # No more traffic
$ sudo tcpdump port 80 -i tap12e58ea8-a9 # Here it is

# Restore the chain to its initial demo status
$ neutron port-pair-group-update PG1 --port-pair PP1 --port-pair PP2
$ neutron port-chain-update --flow-classifier FC_demo --port-pair-group PG1 --port-pair-group PG2 PC1

Where to go from here

Between these examples, the commands used in the demo script, and the documentation, you should have enough material to try your own commands! So have fun experimenting with these VMs.

Note that in the meantime we released the Newton version (3.0.0), which also includes the initial OpenStackClient (OSC) interface, so I will probably update this to run on Newton and with some shiny “openstack sfc xxx” commands. I also hope to make a nicer-than-tcpdumping-around demo later on, when time permits.

What is “Service Function Chaining”?

This is the first article in a series about Service Function Chaining (SFC for short), and its OpenStack implementation, networking-sfc, that I have been working on.

The SFC acronym can easily appear in Software-defined networking (SDN), in a paper about Network function virtualization (NFV), in some IETF documents, … Some of these broader subjects use other names for SFC elements, but this is probably a good topic for another post/blog.
If you already know SFC elements, you can probably skip to the next blog post.


So what is this “Service Function Chaining”? Let me quote the architecture RFC:

The delivery of end-to-end services often requires various service functions. These include traditional network service functions such
as firewalls and traditional IP Network Address Translators (NATs),
as well as application-specific functions. The definition and instantiation of an ordered set of service functions and subsequent
“steering” of traffic through them is termed Service Function
Chaining (SFC).

I see SFC as routing at a higher level of abstraction: in a typical network, you route all the traffic coming from the Internet through a firewall box. So you set up the firewall system, with its network interfaces (Internet and intranet sides), and add some IP routes to steer the traffic through.
SFC uses the same concept, but with logical blocks: if a packet matches some conditions (it is Internet traffic), force it through a series of “functions” (in that case, only one function: a firewall system). And voilà, you have your Service function chain!

I like this simple comparison as it introduces most of the SFC elements:

  • service function: a.k.a. “bump in the wire”. This is a transparent system that you want some flows to go through (typical use cases: firewall, load balancer, analyzer).
  • flow classifier: the “entry point”, it determines if a flow should go through the chain. This can be based on IP attributes (source/dest address/port, …), layer 7 attributes, or even on metadata in the flow, set by a previous chain.
  • port pair: as the name implies, this is a pair of ports (network interfaces) for a service function (the firewall in our example). The traffic is routed to the “in” port, and is expected to exit the VM through the “out” port. This can be the same port.
  • port chain: the SFC object itself, a set of flow classifiers and a set of port pairs (that define the chain sequence).

An additional type not mentioned before is the port pair group: if you have multiple service functions of an identical type, you can regroup them to distribute the flows among them.
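To make these definitions concrete, here is roughly how they map to the networking-sfc CLI of that era. The names (p1in, PP1, …) are placeholders; the full, tested sequence lives in the demo script of the devstack post:

```shell
# Sketch: one service function (ports p1in/p1out), classifying HTTP traffic
neutron port-pair-create --ingress p1in --egress p1out PP1          # port pair
neutron port-pair-group-create --port-pair PP1 PG1                  # port pair group
neutron flow-classifier-create --protocol tcp \
    --destination-port 80:80 FC_demo                                # flow classifier
neutron port-chain-create --port-pair-group PG1 \
    --flow-classifier FC_demo PC1                                   # port chain
```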

Use cases and advantages

OK, after seeing all these definitions, you may wonder “what’s the point?” What I have seen so far is that it allows:

  • complex routing made easier. Define a sequence of logical steps, and the flow will go through it.
  • HA deployments: add multiple VMs to the same group, and the load will be distributed between them.
  • dynamic inventory. Add or remove functions dynamically, either to scale a group (add a load balancer, remove an analyzer), change functions order, add a new function in the middle of some chain, …
  • complex classification. Flows can be classified based on L7 criteria, or on the output from a previous chain (for example a Deep Packet Inspection system).

Going beyond these technical advantages, you can read an RFC that is actually a direct answer to this question: RFC 7498

Going further

To keep a reasonable post length, I did not talk about:

  • How does networking-sfc tag traffic? Hint: MPLS labels
  • Service functions may or may not be SFC-aware: proxies can handle the SFC tagging
  • Upcoming feature: support for Network Service Header (NSH)
  • Upcoming feature: SFC graphs (allowing complex chains and chains of chains)
  • networking-sfc modularity: the reference implementation uses OVS, but this is just one of the possible drivers
  • Also, networking-sfc architecture in general
  • SFC use in VNF Forwarding Graphs (VNFFG)


SFC has abundant documentation, both in the OpenStack project and outside. Here is some additional reading if you are interested (mostly networking-sfc focused):

New job and new blog category

Sorry blog, this announcement comes late for you (I updated sites like Linkedin some time ago), but better late than never!

I got myself a new job in May, joining the Red Hat software developers working on OpenStack. More specifically, I will work mostly on the network parts: Neutron itself (the “networking as a service” main project), but also other related projects like Octavia (load balancer), image building, and more recently Service Function Chaining.

Working upstream on these projects, I plan to write some posts about them, which will be regrouped in a new OpenStack category. I am not sure yet about the format (short popularisation items and tutorials, long advanced technical topics, a mix of both, …), we will see. In all cases, I hope it will be of interest to some people 🙂

PS for Gentoo Universe readers: don’t worry, that does not mean I will switch all my Linux boxes to RHEL/CentOS/Fedora! I still have enough free time to work on Gentoo

Nextcloud available in portage, migration thoughts

I think there has been more than enough news on nextcloud origins, the fork from owncloud, … so I will keep this post to the technical bits.

Installing nextcloud on Gentoo

For now, the differences between owncloud 9 and nextcloud 9 are mostly cosmetic, so a few quick edits to owncloud ebuild ended in a working nextcloud one. And thanks to the different package names, you can install both in parallel (as long as they do not use the same database, of course).

So if you want to test nextcloud, it’s just a command away:

# emerge -a nextcloud

With the default webapp parameters, it will install alongside owncloud.

Migrating owncloud data

Nothing official again here, but as I run a small instance (not that much data) with simple sqlite backend, I could copy the data and configuration to nextcloud and test it while keeping owncloud.
Adapt the paths and web user to your setup: I have these webapps in the default /var/www/localhost/htdocs/ path, with www-server as the web server user.

First, clean the test data and configuration (if you logged in nextcloud):

# rm /var/www/localhost/htdocs/nextcloud/config/config.php
# rm -r /var/www/localhost/htdocs/nextcloud/data/*

Then clone owncloud’s data and config. If you feel adventurous (or short on available disk space), you can move these files instead of copying them:

# cp -a /var/www/localhost/htdocs/owncloud/data/* /var/www/localhost/htdocs/nextcloud/data/
# cp -a /var/www/localhost/htdocs/owncloud/config/config.php /var/www/localhost/htdocs/nextcloud/config/
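For the substitution coming next, a sed one-liner does the job; here it is demonstrated on a scratch file so you can verify the pattern before pointing it at the real nextcloud config.php:

```shell
# Try the owncloud -> nextcloud substitution on a scratch copy first;
# for the real run, target /var/www/localhost/htdocs/nextcloud/config/config.php
cfg=$(mktemp)
echo "'datadirectory' => '/var/www/localhost/htdocs/owncloud/data'," > "${cfg}"
sed -i 's/owncloud/nextcloud/g' "${cfg}"
cat "${cfg}"   # the datadirectory path now reads .../nextcloud/data
```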

Change all owncloud occurrences in config.php to nextcloud (there should be only one, for ‘datadirectory’). And then run the (nextcloud) updater. You can do it via the web interface, or (safer) with the CLI occ tool:

# sudo -u www-server php /var/www/localhost/htdocs/nextcloud/occ upgrade

As with “standard” owncloud upgrades, you will have to reactivate additional plugins after logging in. Also check the nextcloud log for potential warnings and errors.

In my test, the only non-official plugin that I use, files_reader (for ebooks), installed fine in nextcloud, and the rest worked as well as in owncloud, with a lighter default theme 🙂
For now, owncloud-client works if you point it to the new /nextcloud URL on your server, but this can (and probably will) change in the future.

More migration tips and news can be found in this nextcloud forum post, including some nice detailed backup steps for mysql-backed systems migration.

Setting USE_EXPAND flags in package.use

This has apparently been supported in Portage for some time, but I only learned it recently from a gentoo-dev mail: you do not have to write down the expanded USE-flags in package.use anymore (or set them in make.conf)!

For example, if I wanted to set some APACHE2_MODULES and a custom APACHE2_MPM, the standard package.use entry would be something like:

www-servers/apache apache2_modules_proxy apache2_modules_proxy_http apache2_mpms_event ssl

Not as pretty/convenient as a ‘APACHE2_MODULES=”proxy proxy_http”‘ line in make.conf. Here is the best-of-both-worlds syntax (also supported in Paludis apparently):

www-servers/apache ssl APACHE2_MODULES: proxy proxy_http APACHE2_MPMS: event

Or if you use python 2.7 as your main python interpreter, set 3.4 for libreoffice-5.1 😉

app-office/libreoffice PYTHON_SINGLE_TARGET: python3_4

Have fun cleaning your package.use file

Testing clang 3.7.0 OpenMP support

Soon after llvm 3.7.0 reached the Gentoo tree, Jeremi Piotrowski opened two bugs to fix OpenMP support in our ebuilds. sys-devel/llvm-3.7.0-r1 (a revision bump that also now installs LLVM utilities) now has a post-install message with a short recap, but here is the blog version.

As detailed in upstream release notes, OpenMP in clang is disabled by default (and I kept this default in the Gentoo package), and the runtime is a separate package. So:

  • emerge sys-libs/libomp
  • add “-fopenmp=libomp” in your CFLAGS
  • check with a sample program like:
#include <stdio.h>
#include <omp.h>

int main() {
#pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
  • You should get multiple lines on STDOUT (default thread count is one per CPU):
% clang -fopenmp=libomp test_omp.c -o test_omp
% OMP_NUM_THREADS=6 ./test_omp
Hello from thread 4, nthreads 6
Hello from thread 5, nthreads 6
Hello from thread 1, nthreads 6
Hello from thread 3, nthreads 6
Hello from thread 0, nthreads 6
Hello from thread 2, nthreads 6

Some additional reading:

LLVM 3.7.0 released, Gentoo package changes

As announced here, the new LLVM release is now available. If you are interested after reading the release notes (or the clang ones), I just added the corresponding packages in Gentoo, although still masked for some additional testing. mesa-11.0 release candidates compile fine with it though.
2015/09/04 edit: llvm 3.7.0 is now unmasked!

On the Gentoo side, this version bump has some nice changes:

  • We now use the cmake/ninja build system: huge thanks to fellow developer mgorny for his hard work on this rewrite. This is the recommended build system upstream now, and some features will only be available with it
  • LLDB, the debugger, is now available via USE=lldb. This is not a separate package, as the build system for llvm and subprojects is quite monolithic (clang is in the same situation: the sys-devel/clang package currently does almost nothing, the magic is done in llvm[clang])
  • These new ebuilds include the usual series of fixes, including some recent ones by gienah

On the TODO list, some tests are still failing on 32bit multilib (bug #558604), ocaml doc compilation/installation has a strange bug (see the additional call on docs/ocaml_doc target in the ebuild). And let’s not forget adding other tools, like polly. Another recent suggestion: add llvm’s OpenMP library needed for OpenMP with clang.

Also, if people were still using the dragonegg GCC plugin, note that it is not released anymore upstream. As 3.6.x versions had compilation problems, this package received its last rites.