Why XORP sucks?

First, I’d like to make sure that I will not be misunderstood. The following text isn’t a rant. I have been contributing to XORP for some time and continue to do so. I love hacking on network code, I like learning, and the best way to learn is to write code. XORP is a very promising and unique project, but ... it could be much more ...

And to make it even clearer - I wrote this text because I don’t want to explain the same stuff over and over again. I’m an old man, I already realise that my time here on earth is limited, and I’d rather do other useful and/or interesting things than repeat myself ;). This also applies to feedback. Constructive feedback is welcome, but I will not even bother to answer mails with questions that are already answered here.

This text is addressed to everyone with some interest in the topic - users and developers (although especially the XORP and Vyatta developers).

Routing suites

At the moment there are three routing protocol suites actively developed in the open source world, and all three have their place under the sun. I’m using all three and have written code for all of them as well.

OpenBGPD/OpenOSPFD/etc

Developed by the OpenBSD project and bound closely to OpenBSD features. Portability isn’t a goal at all. Porting to the other BSDs should be rather easy; Linux would be much more work. My quick hack “port” of a really old version can be found here.

The goal is performance and integration with other OpenBSD router/firewall features - pf, altq, carp etc. No other routing software offers this. It’s an excellent choice if OpenBSD is your platform, if you want to build a router/firewall for a dual-homed Internet connection, or if you just want the fastest protocol implementation with the smallest memory usage (at least regarding BGP).

The UNIX way - one program should do one thing, but do it well. That’s the biggest problem as well. It’s not actually a suite.

Zebra/Quagga

Quagga was forked from Zebra some time ago by the community, with some help from me. I was also a Quagga maintainer for some time, but I don’t actively work on it any more because of lack of time/motivation/etc.

The oldest one, so it’s no wonder that it has the biggest feature set. It really is a suite of routing protocols: a process per protocol plus the zebra process for the RIB. It provides most of the stuff needed to implement a new protocol. You can call it a routing suite if you have to, but here comes the problem - it’s only a routing suite. Any attempt to implement something more than routing protocols on top of it is a real pain - it’s not designed for that. Of course this has advantages and disadvantages. It doesn’t have the bloat of an abstracted design, but it isn’t suitable for building whole control plane solutions (I know about ZebOS, but ... cough).

It’s an excellent choice if you want to add routing protocols to a solution on top of any Unix, need good integration between routing protocols etc., but don’t expect it to be IOS (that’s a mistake many users make). It’s a ROUTING SUITE.

XORP

And here comes XORP - full control plane software. Forwarding plane abstraction, a really powerful and universal IPC etc. It runs on top of Unixes, Windows and Click (and theoretically on any platform you write forwarding engine abstraction support for).

What you have to keep in mind is that although it’s a really powerful platform, it’s also the biggest (and therefore the slowest) and most complicated one. You can build fully integrated solutions on top of it (see what Vyatta does), but don’t ever expect the speed or memory usage of OpenBGPD from it - it’s not designed for that.

As you can see, we have three different platforms - from the small and fast but not really portable Unix daemon OpenBGPD to the big and really portable full control plane solution, XORP. They are all different, and each has a place where, by design, it has at least the potential to outshine the alternatives.

What’s wrong?

Now to the point ... Although XORP is even older than OpenBGPD, it really isn’t ready for production yet. It lacks most of the features you’d expect from a modern BGP implementation (I really don’t understand how I could work at all with no peer groups, no per-peer policies and no way to reorder terms in policies). OSPF works somewhat, but still has critical bugs which render it unusable for production. The CLI has only some very primitive commands available for verification, etc. etc. The only area XORP shines in is multicast routing. And that’s not because it’s a lot more mature, but because it’s the only one in the open source world.

Before someone kicks me about this - yes, I know about Vyatta, respect to these guys. In many areas they do what needs to be done to give open source alternatives to commercial routing platforms. But their work doesn’t actually change XORP’s position in the competition with the alternatives. They are doing what I wouldn’t do at the moment - building a full solution on top of XORP (that I wouldn’t do it is my tragedy of course, not theirs, although it might become theirs as well ;).

Anyway, back to the point - why isn’t XORP mature yet? Lack of manpower, of course. Every project needs developers, and more is better. XORP has only one problem - it sucks too much to attract new developers. I say this with the deepest respect for the people doing the hard job of writing XORP, no jokes here. Let me explain ...

The kind of developers that routing suites/platforms or any network software have to attract are developers like me. Developers like me have the chance to be the most productive. Who am I? I’m a network administrator with some coding skills, quite deep knowledge about routers in general, and a need to build some custom solutions from time to time. Developers like me can do a lot for open source projects - building solutions on top of these projects gives us very good motivation to do testing, hunt and fix bugs, implement new features we need and even write whole new modules/protocols etc. Having been one of the maintainers of Quagga, I can say that most contributions (from bug reports to patches and features) came from such people. There is another class of developers who can contribute a lot to open source projects - developers at (often) small companies who also build solutions on top of open source. But their motivations and methods of making choices are no different from mine.

It is very important to realise that there is one thing we don’t do though - we don’t rewrite core features. If we use some platform as the basis for a solution, we expect the core to be already in place, usable, and to have most of the features we need etc.

So, what’s wrong with XORP? There are two big problems at the moment which keep me away from building any solution on top of XORP: performance and the CLI.

These are actually the two things where XORP should shine - a powerful and universal IPC and the CLI - but both of them have problems that render them close to unusable.

Performance

If you don’t know what I mean, look at this mail:

http://mailman.icsi.berkeley.edu/pipermail/xorp-hackers/2006-February/000680.html

Things have improved since then, but not a lot. I have a feeling that some Bugzilla entries are related as well. Anyway, I think everyone agrees that this is very far from acceptable. A full BGP table isn’t the only problem, but it’s the problem that probably every user trying to use XORP as an AS border router encounters. It shows a lot. It’s the application with which many potential users would start deploying the platform. It’s the first question most BGP users will ask. Also, because communication between the shell and the daemons is done via IPC as well, how many seconds would you find it acceptable to wait after executing “show bgp routes | match xxx.xxx.xxx.xxx”? At the moment it takes 25 seconds on my recent laptop.

I would be an even worse case - I’d like to use a platform that scales to millions of routes. No, I haven’t done XORP tests with millions of routes ;).

It’s also important to realise that XORP can never be as fast in this test as Quagga and OpenBGPD can be (in theory). In XORP, routing info has to flow from the protocol process through two processes to the FIB (kernel routing table):

bgp -> rib -> fea -> fib

In Quagga, routing info has to flow from the protocol daemon through one process to the FIB:

bgpd -> zebra -> fib

In OpenBGPD, routing info is put directly into the FIB:

bgpd -> fib

It’s obvious that IPC and a modular architecture have their cost.
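To make the comparison concrete, here is a small Python sketch that just counts stages in the three chains listed above. It measures nothing; the function names are mine, and the real per-hop cost depends on the IPC mechanism, which this sketch does not model.

```python
# Process chains exactly as listed above; the last element is the kernel FIB.
PIPELINES = {
    "XORP": ["bgp", "rib", "fea", "fib"],
    "Quagga": ["bgpd", "zebra", "fib"],
    "OpenBGPD": ["bgpd", "fib"],
}


def intermediate_processes(chain):
    """Processes a route passes through between the protocol process and the FIB."""
    return len(chain) - 2


def handoffs(chain):
    """Number of times a route update is handed from one stage to the next."""
    return len(chain) - 1


for name, chain in PIPELINES.items():
    print(f"{name}: {' -> '.join(chain)} "
          f"({intermediate_processes(chain)} intermediate, {handoffs(chain)} handoffs)")
```

Every extra handoff is one more place where a full table of routes has to be serialised, queued and parsed again.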

The good news is that Marko Zec is working on the issue and there has been some progress, but as I said already, it’s not there yet. It’s already obvious that there is no single big problem, so it takes time.

CLI

I won’t talk about tiny things like which command should have which output, which commands the CLI should have at all etc. That’s not the point. These are exactly the issues developers like me can work on. I’ll talk about the CLI (mostly about the configuration part) in general.

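First, what’s the purpose of the CLI? A good CLI has to (IMHO) have a well-defined structure that scales while keeping required space minimal, allow you to describe everything you encounter in real life without any limitations real life doesn’t set, and be fully deterministic.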

Let’s look at every issue more closely.

"... have well defined structure that scales while keeping required space minimal ..."

The biggest issue in the XORP CLI. For a long time IOS was a good example of a well-defined structure (industry standard CLI ;P). With time it became obvious that the IOS CLI has scaling problems - have you seen a router with ten BGP peer groups and hundreds of BGP peers? I have such a router ...

This is the place where Junos in fact redefined the scaling CLI. You can have more deeply nested configuration, which allows much finer control and gives a much better overview. And there is real support for working with parts of the configuration! The XORP CLI mimics the Junos CLI quite well here, but really fails to keep vertical space to a minimum.

The cause of most issues is very simple - purely technical limitations. The XORP CLI has a very limited set of constructs to use in configuration, and the behaviour of these constructs isn’t very flexible either. In fact that’s the problem all first-time users see first, and the obvious reaction is “what the crap?!”. I see it all the time.

Let’s look at the source of the problem first - the limited set of constructs. The whole configuration is actually built from three types of nodes:

node_name value {
}
node_name {
}
node_name: value

And that’s all. All nodes may appear only in the form I wrote them here. You will of course notice that one obvious node type is missing: a leaf node without a value. Although it causes less trouble, I wish there were nodes (both leaf and multi) without a name as well.

Anyway, let’s look at what trouble it causes:

policy-statement testing {
    term testing {
        from {
            protocol: "rip"
        }
        then {
            accept {
            }
        }
    }
}

Look at “accept” - a pure waste of space, purely because a leaf node without a value doesn’t exist at all.

fea {
    unicast-forwarding4 {
    }
    unicast-forwarding6 {
    }
}

Two useless lines which don’t give any information. Now, it’s a real multinode, it can contain a “disable: <bool>” leaf node (although that should be the topic of another discussion), but it’s empty at the moment. I, as a user, don’t really care whether it’s a multinode or whatever (at least while viewing the configuration).

Let’s return to the policy example and look at how it looks in Junos:

policy-statement testing {
    from protocol rip;
    then accept;
}

The term might not exist in Junos if there is only one in the policy statement, but that’s not the point. Let’s look at how it will look if we add some more constructs:

policy-statement testing {
    from {
        protocol rip;
        metric 5;
    }
    then accept;
}

The point is that Juniper engineers introduced a great feature - collapsing multinodes. If a multinode contains only one leaf node, it’s collapsed into one line. No info is lost for the user (as I already said, it’s NOT important for the user whether it’s a multinode or not) and a lot of space is saved. As we saw, a typical redistribution policy takes 11 lines in XORP and 4 lines in Junos.
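The collapsing rule is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how Junos or XORP actually render configuration: the nested-dict representation and the `render` function are my own assumptions (a dict is a multinode, a string is a leaf with a value, `None` is a leaf without a value).

```python
def render(tree, indent=0, collapse=True):
    """Render a nested dict as Junos-style configuration lines."""
    pad = " " * indent
    lines = []
    for name, value in tree.items():
        if isinstance(value, dict):
            if collapse and len(value) == 1:
                # A multinode holding exactly one leaf is folded onto one line.
                (child, child_value), = value.items()
                if not isinstance(child_value, dict):
                    leaf = child if child_value is None else f"{child} {child_value}"
                    lines.append(f"{pad}{name} {leaf};")
                    continue
            lines.append(f"{pad}{name} {{")
            lines.extend(render(value, indent + 4, collapse))
            lines.append(f"{pad}}}")
        elif value is None:
            lines.append(f"{pad}{name};")
        else:
            lines.append(f"{pad}{name} {value};")
    return lines


policy = {
    "policy-statement testing": {
        "from": {"protocol": "rip"},
        "then": {"accept": None},
    }
}

print("\n".join(render(policy)))
print(len(render(policy)), "lines collapsed,",
      len(render(policy, collapse=False)), "lines expanded")
```

The same tree costs four lines with collapsing and eight without, and nothing the user cares about is lost.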

I promised not to touch stuff not directly related to CLI problems, but I will eat my words now. I want to illustrate the point that design also matters for saving vertical space. Let’s look at how interfaces are put into a RIP domain:

rip {
    interface eth0 {
        vif eth0 {
            address 1.1.1.1 {
            }
        }
    }
}

Eight lines for info which can be given in three:

rip {
    interface 1.1.1.1;
}

or

rip {
    interface eth0/eth0;
}

The point being: an address must be unique on the interface anyway, so the interface can be identified by a single entry - by address, or by interface/vif pair if the interface has a single address (or a primary address concept is implemented). And now imagine what it would be like to describe a router in a RIP/OSPF/etc domain with 50 interfaces (it will be 302 vs. 52 lines of conf, fyi).
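The line counts follow directly from the two layouts shown above; a quick Python check, assuming one vif and one address per interface as in the example (the function names are just for illustration):

```python
def xorp_rip_lines(n_interfaces):
    # "rip {" and its closing brace, plus six lines per interface:
    # "interface {", "vif {", "address {" and their three closing braces.
    return 2 + 6 * n_interfaces


def compact_rip_lines(n_interfaces):
    # "rip {" and its closing brace, plus one "interface ...;" line each.
    return 2 + n_interfaces


print(xorp_rip_lines(1), compact_rip_lines(1))    # 8 3
print(xorp_rip_lines(50), compact_rip_lines(50))  # 302 52
```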

Why is saving space so important? Real-life configurations tend to grow. They grow in all senses - take more space, get more complicated, have more variables etc. It’s really important to do our best to help users keep control over configurations, especially routing policies. The terminal emulators users work with have a very limited number of lines, you know ;). Every line counts. If space can be saved without loss of information and sanity, it must be done. At the moment a terminal shows 2-3 times less info with XORP than with Junos. This is far too much.

"... allow to describe everything you encounter in real life without any limitations real life doesn't set ..."

This includes a lot of issues and not-yet-implemented features in the current CLI. And note that I’m not interested in just implementations of the examples or in workarounds. I really need the features themselves. Some highlights:

Missing conflicting nodes

A good example is “accept” and “reject” in policies - having both doesn’t make sense. If the user enters “set then accept” and then “set then reject”, the accept command should be removed from the configuration. Another example - an OSPF area can’t be normal, nssa and stub at the same time.
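A rough Python sketch of the behaviour I mean, assuming the template language could declare groups of mutually exclusive siblings. `EXCLUSIVE_GROUPS` and `set_node` are hypothetical names, not anything that exists in XORP.

```python
# Hypothetical declaration: sets of sibling node names that exclude each other.
EXCLUSIVE_GROUPS = [
    {"accept", "reject"},
    {"normal", "stub", "nssa"},
]


def set_node(container, name, value=None):
    """Set `name` in `container`, removing any sibling that conflicts with it."""
    for group in EXCLUSIVE_GROUPS:
        if name in group:
            for other in group - {name}:
                container.pop(other, None)
    container[name] = value
    return container


then = {}
set_node(then, "accept")   # "set then accept"
set_node(then, "reject")   # "set then reject" silently replaces accept
print(then)                # {'reject': None}
```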

Missing support for overloading leaf nodes

A very popular wish. It seems that nobody understands why the user has to write “network4 x.x.x.x/y” and “network6 xxxx::/y”. Isn’t it obvious that if the network is an IPv4 address, it’s an IPv4 network? It isn’t only about addresses of course. For example, an interface can be identified by name or by IPv[46] address, so I want to have a leaf node “interface [ipv4 address|ipv6 address|name]”.
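Working out the type from the value is cheap. A sketch using Python’s standard `ipaddress` module; the function names and the returned labels are made up for the illustration, and a real CLI would do this in its template or type layer.

```python
import ipaddress


def classify_network(text):
    """Pick the XORP-style node name from the address family of a prefix."""
    net = ipaddress.ip_network(text, strict=False)
    return "network4" if net.version == 4 else "network6"


def classify_interface_ref(text):
    """Decide whether an interface is referenced by address or by name."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        # Not parseable as an IP address, so treat it as an interface name.
        return ("name", text)
    return (f"ipv{addr.version}", str(addr))


print(classify_network("10.0.0.0/8"))
print(classify_network("2001:db8::/32"))
print(classify_interface_ref("eth0"))
print(classify_interface_ref("1.1.1.1"))
```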

Missing support for arrays

This has to be illustrated. The example is taken from real life - a flow route used to block a random DDoS attack:

route cc-dos {
    match {
        destination x.x.x.x/32;
        protocol [ tcp udp ];
        destination-port [ 53 80 6666-6667 ];
        packet length [ <=100 140 ];
    }
    then discard;
}

Basically it’s a match for packets with:

(destination x.x.x.x/32) AND (protocol tcp OR udp) AND (destination port 53 OR 80 OR in range 6666-6667) AND (packet length is at most 100 OR exactly 140)

Now try to do it with current XORP (or better with Vyatta, it has usable firewall support ;).
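The semantics of the arrays in that flow route (OR inside a bracket, AND between the lines) can be written out as a Python predicate. This only illustrates the logic; the packet dict layout is an assumption, and `192.0.2.1` is a documentation address standing in for the elided x.x.x.x.

```python
DESTINATION = "192.0.2.1"  # placeholder for the elided x.x.x.x/32


def in_any(value, alternatives):
    """True if value equals an item or lies in an inclusive (low, high) range."""
    for alt in alternatives:
        if isinstance(alt, tuple):
            low, high = alt
            if low <= value <= high:
                return True
        elif value == alt:
            return True
    return False


def matches(packet):
    """AND across match lines, OR across the entries of each array."""
    return (
        packet["destination"] == DESTINATION
        and packet["protocol"] in ("tcp", "udp")
        and in_any(packet["destination_port"], [53, 80, (6666, 6667)])
        and (packet["length"] <= 100 or packet["length"] == 140)
    )


print(matches({"destination": "192.0.2.1", "protocol": "udp",
               "destination_port": 53, "length": 60}))
```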

More data types, even custom ones

Do you write IP addresses as “x.x.x.x prefix-length y” in real life? No, you write “x.x.x.x/y”, so I want a data type “ipv4 address with prefix”. I’d also like to have data types “ISO address”, “BGP community”, “interface name” etc. But note that I don’t want to write much code for that; it should be in the template language - just masks, ideally described via regexes.
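Something like the following table is what I have in mind, sketched here in Python rather than in the XORP template language. The type names and the regexes are my own illustration; the community mask, for instance, only checks the shape, not the numeric range.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"

# Hypothetical type table: type name -> mask written as a regular expression.
TYPES = {
    "ipv4-prefix": rf"(?:{OCTET}\.){{3}}{OCTET}/(?:3[0-2]|[12]?\d)",
    "bgp-community": r"\d{1,5}:\d{1,5}",
    "interface-name": r"[A-Za-z][A-Za-z0-9_.-]*(?:/[A-Za-z0-9_.-]+)*",
}


def is_valid(type_name, text):
    """Check a user-supplied value against the mask of its declared type."""
    return re.fullmatch(TYPES[type_name], text) is not None


print(is_valid("ipv4-prefix", "10.0.0.0/8"))
print(is_valid("ipv4-prefix", "10.0.0.0 prefix-length 8"))
print(is_valid("bgp-community", "65000:100"))
```

Adding a new type is then one more line in the table, with no code in the daemons.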

"... be fully deterministic ..."

There are two issues:

The first issue has been at least slightly improved lately, although I haven’t checked closely how far it goes. But the second is still fully there. Imagine network lists with thousands of entries, thousands of interfaces, thousands of BGP peers etc. This includes both configuration and “show” commands. Sorting should be applied everywhere it makes sense.
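By sorting I mean an order a human expects, not plain string order. A small Python sketch of two such orderings, assuming interface names with embedded numbers and IPv4 prefixes; `natural_key` is a helper written for this example.

```python
import ipaddress
import re


def natural_key(name):
    """Split 'eth10' into ['eth', 10, ''] so digit runs compare numerically."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


interfaces = ["eth10", "eth2", "eth1"]
prefixes = ["10.0.0.0/8", "9.0.0.0/8", "192.168.1.0/24"]

print(sorted(interfaces))                           # plain string order
print(sorted(interfaces, key=natural_key))          # numeric-aware order
print(sorted(prefixes, key=ipaddress.ip_network))   # numeric address order
```

With the same keys used for both configuration display and “show” output, the result is the same on every run, whatever order the entries were entered in.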

Just wishlist

There are some more items on my personal wishlist for the CLI, but these are not critical issues.

Comments

Imagine a situation where some feature gets deprecated. It would be really nice to warn users after an upgrade to newer software with a comment in the configuration:

## some_feature is deprecated, use other_feature instead
some_feature {
    ...
}

Or look at “show | display detail” on a Junos router. Nice, isn’t it?

"deactivate" keyword

You can deactivate any node in the CLI and the router acts as if it didn’t exist, but it stays in the config with a “deactivated:” note. It fully replaces these ugly “disable: true” keywords in the current CLI, doesn’t need any support from the daemons etc.
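The reason no daemon support is needed is that the filtering can happen before anything is sent to the modules. A Python sketch under an invented representation, where a deactivated node simply carries a “deactivated: ” prefix on its name in a nested dict:

```python
PREFIX = "deactivated: "


def active_view(tree):
    """Return the configuration as the daemons would see it."""
    result = {}
    for name, value in tree.items():
        if name.startswith(PREFIX):
            # The node stays in the stored config but is never pushed out.
            continue
        result[name] = active_view(value) if isinstance(value, dict) else value
    return result


config = {
    "protocols": {
        "deactivated: rip": {"interface": "eth0"},
        "ospf": {"area": "0.0.0.0"},
    }
}

print(active_view(config))  # {'protocols': {'ospf': {'area': '0.0.0.0'}}}
```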

Whatever limitations I can think about

There is a lot of sanity checking which can be done right in the shell or the router manager. For example, when trying to commit a configuration with a nonexistent policy, it’s the CLI’s job to reject it right away; there’s no need to send messages to modules, get rejects, roll back etc.
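A check like that needs nothing but the candidate configuration itself. A Python sketch, assuming a made-up layout where policies live under `policy` and BGP peers reference them through `import` and `export` keys:

```python
def undefined_policies(config):
    """List references to policies that are not defined in the same config."""
    defined = set(config.get("policy", {}))
    errors = []
    for peer, settings in config.get("bgp", {}).items():
        for direction in ("import", "export"):
            name = settings.get(direction)
            if name is not None and name not in defined:
                errors.append(
                    f"bgp peer {peer}: {direction} policy '{name}' is not defined"
                )
    return errors


candidate = {
    "policy": {"export-connected": {}},
    "bgp": {
        "10.0.0.1": {"export": "export-connected", "import": "missing-policy"},
    },
}

for error in undefined_policies(candidate):
    print(error)
```

If the list is not empty, the commit can be refused on the spot and no module ever hears about it.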

And the status?

There is good news and there is bad news. The good news is that the developers are aware that the CLI sucks - see this bugzilla entry. The bad news is that we disagree about priority. I think that it’s most important for the project to fix the core (i.e. including the CLI) in order to attract more developers and really take off, but the developers don’t think so - see this mail.

Although there are other areas in the XORP core which need a lot of improvement as well - logging, RIB (recursive nexthop resolving), etc - none of these is as important (there are people who disagree with me on this, I know). It’s important to show potential at least in the basic core areas, the areas where XORP should shine by design.

Anyway, at the moment XORP will remain in “won’t buy” status for me and probably for others like me as well (of course, there are probably exceptions). Sorry, especially Vyatta guys, but you have a chance to improve the situation as well ;).

See you in XORP/Vyatta bugzillas and lists ...