Tuesday, October 4, 2016

OVN and Containers


Overview

Following up on my previous post, the subject of this discussion is OVN integration with containers. By the end of this lab we will have created a container host “VM” which houses a pair of containers. These containers will be tied directly into an OVN logical switch and will be reachable directly from all VMs within the logical network.

The OVN Container Networking Model

According to the man page for ovn-architecture, OVN’s container networking strategy of choice is to use a VLAN trunk connection to the container host VM and to require that traffic from each container be isolated within a unique VLAN. This, of course, means that some level of coordination must take place between OVN and the container host to ensure that they stay in sync regarding which VLAN tag is being used for a given container. It also places a certain level of responsibility on the container host to ensure that containers are properly isolated from one another internally.

Going into a bit more detail, the basic idea is that within OVN you will create a logical port representing the connection to the host VM. You will then define logical ports for your containers, mapping them to the “parent” VM logical port, and defining a VLAN tag to use. OVN will then configure OVS flows which will map VLAN tagged traffic from the parent VM’s logical port to the appropriate container logical port. The diagram below illustrates this design.

ovn containers

The Existing Setup

Take a moment to review the current setup prior to continuing.

The lab network:

ovn lab

The OVN logical network:

ovn logical components

Defining the Logical Network

For this lab we are going to create a new fake “VM”, vm5, which will host our fake “containers”. The new VM will plug into the existing DMZ switch alongside vm1 and vm2. We’re going to use DHCP for both the new VM and its containers.

Before creating the logical port for vm5 we need to locate the DHCP options that we created for the DMZ network during the previous lab. We’ll query the OVN northbound DB directly for this information. Here is the output on my system.

root@ubuntu1:~# ovn-nbctl list DHCP_Options
_uuid               : 7e32cec4-957e-46fa-a4cc-34218e1e17c8
cidr                : "172.16.255.192/26"
external_ids        : {}
options             : {lease_time="3600", router="172.16.255.193", server_id="172.16.255.193", server_mac="02:ac:10:ff:01:93"}

_uuid               : c0c29381-c945-4507-922a-cb87f76c4581
cidr                : "172.16.255.128/26"
external_ids        : {}
options             : {lease_time="3600", router="172.16.255.129", server_id="172.16.255.129", server_mac="02:ac:10:ff:01:29"}

We want the UUID for the “172.16.255.128/26” network (c0c29381-c945-4507-922a-cb87f76c4581 in my case). Capture this UUID for use in later commands.
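If you’d rather not copy it by hand, the UUID can also be captured into a shell variable. A small convenience sketch (note the extra layer of quoting, which is there so that ovsdb treats the cidr value, slash and all, as a string):

uuid=$(ovn-nbctl --bare --columns=_uuid find DHCP_Options cidr='"172.16.255.128/26"')
echo $uuid

If you go this route, $uuid can stand in for {uuid} in the commands that follow.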

Let’s create the logical port for vm5. This should look pretty familiar. Be sure to replace {uuid} with the UUID from the DHCP options entry above (or use the $uuid variable if you captured it). From ubuntu1:

ovn-nbctl lsp-add dmz dmz-vm5
ovn-nbctl lsp-set-addresses dmz-vm5 "02:ac:10:ff:01:32 172.16.255.132"
ovn-nbctl lsp-set-port-security dmz-vm5 "02:ac:10:ff:01:32 172.16.255.132"
ovn-nbctl lsp-set-dhcpv4-options dmz-vm5 {uuid}

Now we will create the logical ports for the containers which live on vm5. This process is nearly identical to creating a normal logical port but with a couple of additional settings. From ubuntu1:

# create the logical port for c51
ovn-nbctl lsp-add dmz dmz-c51
ovn-nbctl lsp-set-addresses dmz-c51 "02:ac:10:ff:01:33 172.16.255.133"
ovn-nbctl lsp-set-port-security dmz-c51 "02:ac:10:ff:01:33 172.16.255.133"
ovn-nbctl lsp-set-dhcpv4-options dmz-c51 {uuid}

# set the parent logical port and vlan tag for c51
ovn-nbctl set Logical_Switch_Port dmz-c51 parent_name=dmz-vm5
ovn-nbctl set Logical_Switch_Port dmz-c51 tag=51

# create the logical port for c52
ovn-nbctl lsp-add dmz dmz-c52
ovn-nbctl lsp-set-addresses dmz-c52 "02:ac:10:ff:01:34 172.16.255.134"
ovn-nbctl lsp-set-port-security dmz-c52 "02:ac:10:ff:01:34 172.16.255.134"
ovn-nbctl lsp-set-dhcpv4-options dmz-c52 {uuid}

# set the parent logical port and vlan tag for c52
ovn-nbctl set Logical_Switch_Port dmz-c52 parent_name=dmz-vm5
ovn-nbctl set Logical_Switch_Port dmz-c52 tag=52
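As an aside, parent_name and tag are just two columns on the same Logical_Switch_Port row, so each pair of set commands above could be collapsed into a single call if you prefer:

# equivalent one-liner for c52
ovn-nbctl set Logical_Switch_Port dmz-c52 parent_name=dmz-vm5 tag=52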

So the only real difference here is that we’ve set a parent_name and tag for the container logical ports. You can validate these by looking at the database entries. For example, the output on my system:

root@ubuntu1:~# ovn-nbctl find Logical_Switch_Port name="dmz-c51"
_uuid               : ea604369-14a9-4e25-998f-ec99c2e7e47e
addresses           : ["02:ac:10:ff:01:33 172.16.255.133"]
dhcpv4_options      : c0c29381-c945-4507-922a-cb87f76c4581
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : []
external_ids        : {}
name                : "dmz-c51"
options             : {}
parent_name         : "dmz-vm5"
port_security       : ["02:ac:10:ff:01:33 172.16.255.133"]
tag                 : 51
tag_request         : []
type                : ""
up                  : false

Configuring vm5

The first thing to remember about this lab is that we’re not using real VMs, but rather simulating them as ovs internal ports directly on the Ubuntu hosts. For vm1-vm4 we’ve been creating these internal ports directly on br-int, but for vm5 our requirements are a bit different, so we’ll be using a dedicated ovs bridge. This bridge, called br-vm5, will not be managed by OVN and will simulate how you might actually configure an ovs bridge internal to a real container host VM. This bridge will provide local networking to both the VM and its containers and will be configured to perform VLAN tagging. Below is a diagram illustrating how things will look when we’re done.

ovn container lab

My lab setup is simple in that I’m placing all of the containers on the same logical switch. However, there is no requirement to do this: a container logical port may be placed on any logical switch I wish, as the hypothetical example below shows.
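As a purely hypothetical illustration (the port name, mac, and IP below are made up for the example and are not part of this lab), a third container on vm5 could be attached to the “inside” switch simply by parenting its logical port to dmz-vm5 and choosing an unused tag:

# hypothetical: container c53 on vm5, attached to the "inside" switch
ovn-nbctl lsp-add inside inside-c53
ovn-nbctl lsp-set-addresses inside-c53 "02:ac:10:ff:01:96 172.16.255.196"
ovn-nbctl set Logical_Switch_Port inside-c53 parent_name=dmz-vm5 tag=53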

Let’s get started. The first step is to create the setup for vm5. We’re going to do this on the ubuntu2 host. From ubuntu2:

# create the bridge for vm5
ovs-vsctl add-br br-vm5

# create patch port on br-vm5 to br-int
ovs-vsctl add-port br-vm5 brvm5-brint -- set Interface brvm5-brint type=patch options:peer=brint-brvm5

# create patch port on br-int to br-vm5. set external id to dmz-vm5 since this is our connection to vm5 
ovs-vsctl add-port br-int brint-brvm5 -- set Interface brint-brvm5 type=patch options:peer=brvm5-brint
ovs-vsctl set Interface brint-brvm5 external_ids:iface-id=dmz-vm5

# create vm5 within a namespace. vm5 traffic will be untagged
ovs-vsctl add-port br-vm5 vm5 -- set interface vm5 type=internal
ip link set vm5 address 02:ac:10:ff:01:32
ip netns add vm5
ip link set vm5 netns vm5
ip netns exec vm5 dhclient vm5

Verify connectivity from vm5 by pinging its default gateway.

root@ubuntu2:~# ip netns exec vm5 ping 172.16.255.129
PING 172.16.255.129 (172.16.255.129) 56(84) bytes of data.
64 bytes from 172.16.255.129: icmp_seq=1 ttl=254 time=0.797 ms
64 bytes from 172.16.255.129: icmp_seq=2 ttl=254 time=0.509 ms
64 bytes from 172.16.255.129: icmp_seq=3 ttl=254 time=0.404 ms

Configuring the vm5 “Containers”

Now that vm5 is up and working we can configure its fake “containers”. These are going to look almost exactly like our fake “VMs”, with the exception that we’re configuring them for VLAN tagging.

# create c51 within a namespace. c51 traffic will be tagged with vlan 51
ip netns add c51
ovs-vsctl add-port br-vm5 c51 tag=51 -- set interface c51 type=internal
ip link set c51 address 02:ac:10:ff:01:33
ip link set c51 netns c51
ip netns exec c51 dhclient c51

# create c52 within a namespace. c52 traffic will be tagged with vlan 52
ip netns add c52
ovs-vsctl add-port br-vm5 c52 tag=52 -- set interface c52 type=internal
ip link set c52 address 02:ac:10:ff:01:34
ip link set c52 netns c52
ip netns exec c52 dhclient c52

Verify connectivity.

root@ubuntu2:~# ip netns exec c51 ping 172.16.255.129
PING 172.16.255.129 (172.16.255.129) 56(84) bytes of data.
64 bytes from 172.16.255.129: icmp_seq=1 ttl=254 time=1.33 ms
64 bytes from 172.16.255.129: icmp_seq=2 ttl=254 time=0.420 ms
64 bytes from 172.16.255.129: icmp_seq=3 ttl=254 time=0.371 ms

root@ubuntu2:~# ip netns exec c52 ping 172.16.255.129
PING 172.16.255.129 (172.16.255.129) 56(84) bytes of data.
64 bytes from 172.16.255.129: icmp_seq=1 ttl=254 time=1.53 ms
64 bytes from 172.16.255.129: icmp_seq=2 ttl=254 time=0.533 ms
64 bytes from 172.16.255.129: icmp_seq=3 ttl=254 time=0.355 ms

Final Words

As per the ovn-architecture guide, if you run containers directly on a hypervisor or otherwise patch them directly into the integration bridge then they have the potential to completely bog down an OVN system depending on the scale of your setup. The “nested” network solution is nice in that it greatly reduces the number of VIFs on the integration bridge and thus minimizes the performance hit which containers would otherwise entail. Again, the point of this exercise was not to set up a real world container simulation, but rather to demonstrate the in-built container networking feature set of OVN.

Monday, October 3, 2016

OVN and ACLs


Overview

Building upon my previous post I will now examine basic network security using OVN Access Control Lists.

OVN Access Control Lists and Address Sets

ACLs within OVN are stored in the ACL table of the northbound DB and may be managed using the acl commands of ovn-nbctl. At present, ACLs may only be applied to logical switches, but it is conceivable that the ability to apply them to routers will be added as a future enhancement.

ACLs are applied and evaluated in one of two directions: traffic destined to a logical port, i.e. toward the workload (to-lport), and traffic arriving from a logical port, i.e. sent by the workload (from-lport). Additionally, every ACL is assigned a priority which determines its order of evaluation: the highest priority is evaluated first. ACLs may be given identical priorities; however, if two ACLs have an identical priority and both match a given packet, only one will be evaluated. Exactly which one is indeterminate, meaning you can’t really be sure which rule will be applied in a given situation. Moral of the story: use unique priorities in most cases.

The rules for matching in ACLs are based on the flow syntax from OVS and should look familiar to anyone with a programming background. The syntax is explained in the “Logical_Flow table” section of the man page for ovn-sb. It’s worth a read. In particular, you should pay attention to the section discussing the “!=” match rule. It is also worth highlighting that you cannot create an ACL matching on a port with type=router.
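As a quick illustration of the match syntax (a hypothetical rule, not one we’ll use in this lab), the following would drop traffic from ls1-vm1 to any IPv4 destination outside of 172.16.255.0/24:

# hypothetical: drop traffic from ls1-vm1 leaving 172.16.255.0/24
ovn-nbctl acl-add ls1 from-lport 900 'inport == "ls1-vm1" && ip4.dst != 172.16.255.0/24' drop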

In order to reduce the number of entries in the ACL table you may make use of address sets, which define groups of addresses of a single type. For example, a group of IPv4 addresses/networks, a group of mac addresses, or a group of IPv6 addresses may be placed within a named address set. Address sets can then be referenced by name (in the form $name) within the match clause of an ACL.

Let’s run through some samples.

# allow all ip traffic from port "ls1-vm1" on switch "ls1" and allow related connections back in
ovn-nbctl acl-add ls1 from-lport 1000 "inport == \"ls1-vm1\" && ip" allow-related

# allow ssh to ls1-vm1
ovn-nbctl acl-add ls1 to-lport 999 "outport == \"ls1-vm1\" && tcp.dst == 22" allow-related

# block all IPv4/IPv6 traffic to ls1-vm1
ovn-nbctl acl-add ls1 to-lport 998 "outport == \"ls1-vm1\" && ip" drop

Note the use of “allow-related”. This does exactly what it sounds like: under the covers, it permits related traffic through in the opposite direction (e.g. responses, fragments, etc.). In the second rule I have used allow-related in order to allow responses to ssh back from the server.

Let’s take a look at address sets. As mentioned earlier, address sets are groups of addresses of the same type. Address sets are created using the database commands of ovn-nbctl and are referenced in ACLs by the name of the address set. Here are some examples:

ovn-nbctl create Address_Set name=wwwServers addresses=172.16.1.2,172.16.1.3
ovn-nbctl create Address_Set name=www6Servers addresses=\"fd00::1\",\"fd00::2\"
ovn-nbctl create Address_Set name=macs addresses=\"02:00:00:00:00:01\",\"02:00:00:00:00:02\"

Note the use of double quotes with address sets containing the “:” character. You’ll get an error if you don’t quote these.
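Tying the two together, a hypothetical ACL referencing the wwwServers set from above might look like the following. Note the single quotes, which keep bash from expanding $wwwServers (more on this later):

# hypothetical: allow http traffic destined to the www servers
ovn-nbctl acl-add ls1 to-lport 1000 'ip4.dst == $wwwServers && tcp.dst == 80' allow-related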

Lab Testing

Let’s experiment with ACLs in our lab environment. Here is a quick review of the setup.

The lab network:

ovn lab

The OVN logical network:

ovn logical components

As a first step we’ll open up external access to the servers in our DMZ tier by creating a static NAT rule for each of them.

From ubuntu1:

# create snat-dnat rule for vm1 & apply to edge1
ovn-nbctl -- --id=@nat create nat type="dnat_and_snat" logical_ip=172.16.255.130 \
external_ip=10.127.0.250 -- add logical_router edge1 nat @nat

# create snat-dnat rule for vm2 & apply to edge1
ovn-nbctl -- --id=@nat create nat type="dnat_and_snat" logical_ip=172.16.255.131 \
external_ip=10.127.0.251 -- add logical_router edge1 nat @nat

Test connectivity from ubuntu1.

root@ubuntu1:~# ping 10.127.0.250
PING 10.127.0.250 (10.127.0.250) 56(84) bytes of data.
64 bytes from 10.127.0.250: icmp_seq=1 ttl=62 time=2.57 ms
64 bytes from 10.127.0.250: icmp_seq=2 ttl=62 time=1.23 ms
64 bytes from 10.127.0.250: icmp_seq=3 ttl=62 time=0.388 ms

root@ubuntu1:~# ping 10.127.0.251
PING 10.127.0.251 (10.127.0.251) 56(84) bytes of data.
64 bytes from 10.127.0.251: icmp_seq=1 ttl=62 time=3.15 ms
64 bytes from 10.127.0.251: icmp_seq=2 ttl=62 time=1.52 ms
64 bytes from 10.127.0.251: icmp_seq=3 ttl=62 time=0.475 ms

Also, check that the VMs can connect to the outside world using the proper IPs.

root@ubuntu2:~# ip netns exec vm1 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
64 bytes from 10.127.0.130: icmp_seq=1 ttl=62 time=3.05 ms

root@ubuntu3:~# ip netns exec vm2 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
64 bytes from 10.127.0.130: icmp_seq=1 ttl=62 time=1.87 ms

root@ubuntu1:~# tcpdump -i br-eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
17:51:01.055258 IP 10.127.0.250 > 10.127.0.130: ICMP echo request, id 4565, seq 12, length 64
17:51:01.055320 IP 10.127.0.130 > 10.127.0.250: ICMP echo reply, id 4565, seq 12, length 64

17:51:56.378089 IP 10.127.0.251 > 10.127.0.130: ICMP echo request, id 4301, seq 6, length 64
17:51:56.378160 IP 10.127.0.130 > 10.127.0.251: ICMP echo reply, id 4301, seq 6, length 64

Excellent. We can see from the tcpdump on ubuntu1 that the VMs are using their proper NAT addresses. Let’s apply some security policy. First, we’ll completely lock down the DMZ.

# default drop
ovn-nbctl acl-add dmz to-lport 900 "outport == \"dmz-vm1\" && ip" drop
ovn-nbctl acl-add dmz to-lport 900 "outport == \"dmz-vm2\" && ip" drop

Let’s do a quick access check from ubuntu1.

root@ubuntu1:~# ping 10.127.0.250
PING 10.127.0.250 (10.127.0.250) 56(84) bytes of data.
^C
--- 10.127.0.250 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms

root@ubuntu1:~# ping 10.127.0.251
PING 10.127.0.251 (10.127.0.251) 56(84) bytes of data.
^C
--- 10.127.0.251 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms

The DMZ servers are now unreachable externally; however, we’ve also managed to kill their outbound access.

root@ubuntu2:~# ip netns exec vm1 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
^C
--- 10.127.0.130 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1008ms

root@ubuntu3:~# ip netns exec vm2 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
^C
--- 10.127.0.130 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1008ms

Let’s fix that.

# allow all ip traffic and allow related connections back in
ovn-nbctl acl-add dmz from-lport 1000 "inport == \"dmz-vm1\" && ip" allow-related
ovn-nbctl acl-add dmz from-lport 1000 "inport == \"dmz-vm2\" && ip" allow-related

And verify.

root@ubuntu2:~# ip netns exec vm1 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
64 bytes from 10.127.0.130: icmp_seq=1 ttl=62 time=4.16 ms
64 bytes from 10.127.0.130: icmp_seq=2 ttl=62 time=3.07 ms

root@ubuntu3:~# ip netns exec vm2 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
64 bytes from 10.127.0.130: icmp_seq=1 ttl=62 time=3.59 ms
64 bytes from 10.127.0.130: icmp_seq=2 ttl=62 time=2.30 ms

Let’s allow inbound https to the DMZ servers.

# allow tcp 443 in and related connections back out
ovn-nbctl acl-add dmz to-lport 1000 "outport == \"dmz-vm1\" && tcp.dst == 443" allow-related
ovn-nbctl acl-add dmz to-lport 1000 "outport == \"dmz-vm2\" && tcp.dst == 443" allow-related

Let’s verify. For this we’ll need something listening on tcp 443. I like to use ncat, so the first step is to install it on all three Ubuntu hosts. It is actually part of the nmap package.

apt-get -y install nmap

Now we can start a process to listen on 443. The process will terminate at the end of the connection but you can use the -k flag to keep it going if you want.

From ubuntu2:

ip netns exec vm1 ncat -l -p 443

From ubuntu3:

ip netns exec vm2 ncat -l -p 443

And check connectivity from ubuntu1. If the connection succeeds then it will stay open until you terminate it. If not, then it should time out after 1 second.

root@ubuntu1:~# ncat -w 1 10.127.0.250 443
^C

root@ubuntu1:~# ncat -w 1 10.127.0.251 443
^C

That worked. Let’s secure the “inside” servers as well. We’ll make it really tight, blocking all outbound access and only allowing access to tcp 3306 from the dmz. We’ll use an address set for allowing access from the DMZ. Note the use of single quotes in the “acl-add” commands which allow access to 3306. This is important: we’re referring to the address set by its literal name prefixed with a ‘$’ character, and we don’t want bash to interpret this as a variable, which is why we use single quotes.

# create an address set for the dmz servers. they fall within a common /31
ovn-nbctl create Address_Set name=dmz addresses=\"172.16.255.130/31\"

# allow from dmz on 3306
ovn-nbctl acl-add inside to-lport 1000 'outport == "inside-vm3" && ip4.src == $dmz && tcp.dst == 3306' allow-related
ovn-nbctl acl-add inside to-lport 1000 'outport == "inside-vm4" && ip4.src == $dmz && tcp.dst == 3306' allow-related

# default drop
ovn-nbctl acl-add inside to-lport 900 "outport == \"inside-vm3\" && ip" drop
ovn-nbctl acl-add inside to-lport 900 "outport == \"inside-vm4\" && ip" drop

Again, we’ll use ncat to listen on our VMs, but this time we’ll start it on vm3/vm4.

From ubuntu2:

ip netns exec vm3 ncat -l -p 3306

From ubuntu3:

ip netns exec vm4 ncat -l -p 3306

Check connectivity from dmz to inside:

root@ubuntu2:~# ip netns exec vm1 ncat -w 1 172.16.255.195 3306
^C

root@ubuntu3:~# ip netns exec vm2 ncat -w 1 172.16.255.194 3306
^C

That seems to have worked. One final check: let’s make sure that vm3/vm4 are isolated from each other.

root@ubuntu2:~# ip netns exec vm3 ncat -w 1 172.16.255.195 3306
Ncat: Connection timed out.

root@ubuntu3:~# ip netns exec vm4 ncat -w 1 172.16.255.194 3306
Ncat: Connection timed out.

Clean up

Be sure to remove the ACLs and address sets prior to quitting. From ubuntu1:

ovn-nbctl acl-del dmz
ovn-nbctl acl-del inside
ovn-nbctl destroy Address_Set dmz

Final Words

Traditionally, firewalling has been performed by an in-path, layer-3 device such as a router or dedicated firewall appliance. In this regard it may seem odd that OVN applies policy on the logical switch, but in reality this approach is advantageous in that it creates an easy way to secure workloads in an east-west fashion by enforcing security at the logical port level. This approach to security has come to be known as “micro-segmentation” due to the fact that it allows an administrator to apply security policy in a very fine-grained manner. When you think about it, much of the hierarchy that exists in traditional network design (think web-tier/db-tier) stems from the fact that network security could previously only be enforced on some central appliance which sat between tiers. The micro-segmentation approach allows you to flatten your designs to the point where you may end up having a single logical switch for everything, with the “tiers” describing the security policy rather than the network layout.

In the next post I will cover the OVN model for enabling container networking.

Thursday, September 29, 2016

The OVN Load Balancer


Overview

Building upon my previous post I will explore the load balancing feature of OVN. But before getting started, let’s review the setup created in the last lab.

The lab network:

ovn lab

The OVN logical network:

ovn logical components

The OVN Load Balancer

The OVN load balancer is intended to provide very basic load balancing services to workloads within the OVN logical network space. Due to its simple feature set it is not designed to replace dedicated appliance-based load balancers, which provide many more bells & whistles for advanced use cases.

The load balancer uses a hash-based algorithm to balance requests for a VIP across an associated pool of IP addresses within logical space. Since the hash is calculated using the headers of the client request, the balancing should appear fairly random, with each individual client connection sticking to a particular member of the load balancing pool for its duration. Load balancing in OVN may be applied to either a logical switch or a logical router. The choice of where to apply the feature depends on your specific requirements, and there are caveats to each approach.

When applied to a logical router, the following considerations need to be kept in mind:

  1. Load balancing may only be applied to a “centralized” router (i.e. a gateway router).
  2. Due to point #1, load balancing on a router is a non-distributed service.

When applied to a logical switch, the following considerations need to be kept in mind:

  1. Load balancing is “distributed” in that it is applied on potentially multiple OVS hosts.
  2. Load balancing on a logical switch is evaluated only on traffic ingressing from a VIF. This means that it must be applied on the “client” logical switch rather than on the “server” logical switch.
  3. Due to point #2, you may need to apply the load balancing to many logical switches depending on the scale of your design.

Using Our Fake “VMs” as Web Servers

In order to demonstrate the load balancer we want to create a pair of web servers in our “dmz” which will serve up uniquely identifiable files. To keep things simple, we’ll use a single-line python web server running in our vm1/vm2 namespaces. Let’s kick things off by starting up our web servers.

From ubuntu2:

mkdir /tmp/www
echo "i am vm1" > /tmp/www/index.html
cd /tmp/www
ip netns exec vm1 python -m SimpleHTTPServer 8000

From ubuntu3:

mkdir /tmp/www
echo "i am vm2" > /tmp/www/index.html
cd /tmp/www
ip netns exec vm2 python -m SimpleHTTPServer 8000

The above commands will create a web server, listening on TCP 8000, which serves up a file identifying the vm that served it.

We’ll also want to be able to test connectivity to our web servers. For this we’ll use curl from the global namespace of our Ubuntu hosts. Be sure to install curl on them if it’s not already installed.

apt-get -y install curl

Configuring the Load Balancer Rules

As a first step we’ll need to define our load balancing rules; namely the VIP and the back-end server IP pool. All that is involved here is to create an entry in the OVN northbound DB and capture the resulting UUID. For our testing we’ll use the VIP 10.127.0.254 which resides within the “data” network in the lab. We’ll use the addresses of vm1/vm2 as our pool IPs.

From ubuntu1:

uuid=`ovn-nbctl create load_balancer vips:10.127.0.254="172.16.255.130,172.16.255.131"`
echo $uuid

The above command creates an entry in the load_balancer table of the northbound DB and stores the resulting UUID to the variable “uuid”. We’ll reference this variable in later commands.
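If you’d like to double-check your work, the generic database commands will display the new entry:

ovn-nbctl list load_balancer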

The Gateway Router As a Load Balancer

Let’s apply our load balancer profile to the OVN gateway router “edge1”.

From ubuntu1:

ovn-nbctl set logical_router edge1 load_balancer=$uuid

You can verify that this was applied by checking the database entry for edge1.

ovn-nbctl get logical_router edge1 load_balancer

Now, from the global namespace of any of the Ubuntu hosts we can attempt to connect to the VIP.

root@ubuntu1:~# curl 10.127.0.254:8000
i am vm2
root@ubuntu1:~# curl 10.127.0.254:8000
i am vm1
root@ubuntu1:~# curl 10.127.0.254:8000
i am vm2

I ran the above tests several times and the load balancing appeared to be quite random.

Let’s see what happens if we disable one of our web servers. Try stopping the python process running in the vm1 namespace. Here is what I got:

root@ubuntu1:~# curl 10.127.0.254:8000
curl: (7) Failed to connect to 10.127.0.254 port 8000: Connection refused
root@ubuntu1:~# curl 10.127.0.254:8000
i am vm2

As you can see, the load balancer is not performing any sort of health checking. At present the assumption is that health checks would be performed by an orchestration solution such as Kubernetes, but it would be reasonable to assume that this feature will be added at some future point.

Restart the python web server on vm1 before moving to the next test.

Load balancing works externally, but let’s see what happens when we try to access the VIP from an internal VM. Try using curl from vm3 on ubuntu2:

root@ubuntu2:~# ip netns exec vm3 curl 10.127.0.254:8000
i am vm1
root@ubuntu2:~# ip netns exec vm3 curl 10.127.0.254:8000
i am vm2

Nice. This seems to work as well, but also raises an interesting point. Take a second look at the logical diagram for our OVN network and think about the traffic flow for the curl request from vm3. Also, look at the logs from the python web server. Mine are below:

10.127.0.130 - - [29/Sep/2016 09:53:44] "GET / HTTP/1.1" 200 -
10.127.0.129 - - [29/Sep/2016 09:57:42] "GET / HTTP/1.1" 200 -

Notice the client IP addresses in the logs. The first is from ubuntu1, per the previous round of tests. The second IP is edge1 itself and is from the request from vm3. Why is the request coming from edge1 rather than from vm3 directly? The answer is that the OVN developer who implemented load balancing took into account something known as “proxy mode”, where the load balancer hides the client-side IP under certain circumstances. Why is this necessary? Think about what would happen if the web server saw the real IP of vm3. The response from the server would route back directly to vm3, bypassing the load balancer on edge1. From the perspective of vm3 it would look like it made a request to the VIP but received a reply from the real IP of one of the web servers. This obviously would not work, which is why the proxy-mode functionality is important.

Let’s remove the load balancer profile and move on to a second round of tests.

ovn-nbctl clear logical_router edge1 load_balancer
ovn-nbctl destroy load_balancer $uuid

Configuring the Load Balancer On a Logical Switch

Let’s see what happens when we apply the load balancing rules to various logical switches within our setup. Since we’re moving load balancing away from the edge, the first step is to create a new load balancer profile with an internal VIP. We’ll use 172.16.255.62 for this.

From ubuntu1:

uuid=`ovn-nbctl create load_balancer vips:172.16.255.62="172.16.255.130,172.16.255.131"`
echo $uuid

As a first test let’s apply it to the “inside” logical switch.

From ubuntu1:

# apply and verify
ovn-nbctl set logical_switch inside load_balancer=$uuid
ovn-nbctl get logical_switch inside load_balancer

And test from vm3 (which resides on “inside”):

root@ubuntu2:~# ip netns exec vm3 curl 172.16.255.62:8000
i am vm1
root@ubuntu2:~# ip netns exec vm3 curl 172.16.255.62:8000
i am vm1
root@ubuntu2:~# ip netns exec vm3 curl 172.16.255.62:8000
i am vm2

This seems to work. Let’s remove the load balancer from “inside” and apply it to “dmz”.

From ubuntu1:

ovn-nbctl clear logical_switch inside load_balancer
ovn-nbctl set logical_switch dmz load_balancer=$uuid
ovn-nbctl get logical_switch dmz load_balancer

And again test from vm3:

root@ubuntu2:~# ip netns exec vm3 curl 172.16.255.62:8000
^C

No good. It hangs. Let’s try from vm1 (which resides on “dmz”):

root@ubuntu2:~# ip netns exec vm1 curl 172.16.255.62:8000
^C

Also no good. This highlights the requirement that load balancing be applied on the client’s logical switch rather than the server’s logical switch.

Be sure to clean up. From ubuntu1:

ovn-nbctl clear logical_switch dmz load_balancer
ovn-nbctl destroy load_balancer $uuid

Final Words

Basic load balancing is a very “nice to have” feature. The fact that it is built directly into OVN means one less piece of software to deploy within your SDN. While the feature set is minimal, I actually think that it covers the needs of a very broad set of users. Given time, I also expect that certain limitations, such as the lack of health checking, will be addressed.

In my next post I will look into network security using OVN.

Tuesday, September 27, 2016

The OVN Gateway Router


Overview

Building upon my previous post I will now add an OVN gateway router into the lab setup. This gateway router will provide access to the lab network from within our overlay network.

The Lab

In order to demonstrate the gateway router we will need to add another physical network to our Ubuntu hosts. For my purposes I will add the network 10.127.0.128/25 via eth1 of each of my hosts. The final lab diagram is illustrated below.

ovn lab

Introducing the OVN L3 Gateway

An OVN gateway serves as an onramp/offramp between the overlay network and the physical network. Gateways come in two flavors: layer-2, which bridges an OVN logical switch into a VLAN, and layer-3, which provides a routed connection between an OVN router and the physical network. For the purposes of this lab we will focus on creating a layer-3 gateway which will serve as the demarcation point between our physical and logical networks.

Unlike a distributed logical router (DLR), an OVN gateway router is centralized on a single host (chassis) so that it may provide services which cannot yet be distributed (NAT, load balancing, etc.). As of this publication there is a restriction that gateway routers may only connect to other routers via a logical switch, whereas DLRs may connect to one another directly via a peer link. Work is in progress to remove this restriction.

It should be noted that it is possible to have multiple gateway routers tied into an environment, which means that it is possible to perform ingress ECMP routing into logical space. However, it is worth mentioning that OVN currently does not support egress ECMP between gateway routers. Again, this is being looked at as a future enhancement.

Make ubuntu1 an OVN Host

Rather than using a host which houses VMs, let’s use ubuntu1 as our OVN gateway router host. Begin by installing the proper OVN packages for the host role:

dpkg -i ovn-host_2.6.0-1_amd64.deb

And then register with OVN Central (itself):

ovs-vsctl set open . external-ids:ovn-remote=tcp:127.0.0.1:6642
ovs-vsctl set open . external-ids:ovn-encap-type=geneve
ovs-vsctl set open . external-ids:ovn-encap-ip=10.127.0.2

And verify connectivity:

root@ubuntu1:~# netstat -antp | grep 127.0.0.1
tcp        0      0 127.0.0.1:6642          127.0.0.1:55566         ESTABLISHED 4999/ovsdb-server
tcp        0      0 127.0.0.1:55566         127.0.0.1:6642          ESTABLISHED 15212/ovn-controlle

Also, be sure to create the integration bridge if it wasn’t created automatically by OVN:

ovs-vsctl add-br br-int -- set Bridge br-int fail-mode=secure

OVN Logical Design

Let’s review the planned design before we start configuring things. The OVN logical network we are creating is illustrated below.

ovn logical components

As you can see we are adding the following new components:

  • OVN gateway router (edge1)
  • logical switch (transit) used to connect the edge1 and tenant1 routers
  • logical switch (outside) used to connect edge1 to the lab network

Adding the L3 Gateway

As mentioned earlier, the gateway router will be bound to a specific chassis (ubuntu1 in our case). In order to accomplish this binding we will need to locate the chassis id for ubuntu1. Using the ovn-sbctl show command from ubuntu1 you should see output similar to this:

ovn-sbctl show
Chassis "833ae1bd-ced3-494a-a95b-f2dc54172b71"
        hostname: "ubuntu1"
        Encap geneve
                ip: "10.127.0.2"
                options: {csum="true"}
Chassis "239f2c28-90ff-468f-a701-655585c630bf"
        hostname: "ubuntu3"
        Encap geneve
                ip: "10.127.0.3"
                options: {csum="true"}
        Port_Binding "dmz-vm2"
        Port_Binding "inside-vm4"
Chassis "517d558e-158a-4cb2-8870-283e9d39685e"
        hostname: "ubuntu2"
        Encap geneve
                ip: "10.127.0.129"
                options: {csum="true"}
        Port_Binding "inside-vm3"
        Port_Binding "dmz-vm1"

Copy the Chassis UUID of the ubuntu1 host for use below.
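If you’d rather capture it programmatically, something like the following should also work (a sketch using the generic db commands; the name column holds the UUID-style identifier displayed by ovn-sbctl show):

chassis_uuid=$(ovn-sbctl --bare --columns=name find Chassis hostname=ubuntu1)
echo $chassis_uuid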

Create the new logical router. Be sure to substitute {chassis_uuid} with a valid UUID (or use the $chassis_uuid variable captured above). From ubuntu1:

# create router edge1
ovn-nbctl create Logical_Router name=edge1 options:chassis={chassis_uuid}

# create a new logical switch for connecting the edge1 and tenant1 routers
ovn-nbctl ls-add transit

# edge1 to the transit switch
ovn-nbctl lrp-add edge1 edge1-transit 02:ac:10:ff:00:01 172.16.255.1/30
ovn-nbctl lsp-add transit transit-edge1
ovn-nbctl lsp-set-type transit-edge1 router
ovn-nbctl lsp-set-addresses transit-edge1 02:ac:10:ff:00:01
ovn-nbctl lsp-set-options transit-edge1 router-port=edge1-transit

# tenant1 to the transit switch
ovn-nbctl lrp-add tenant1 tenant1-transit 02:ac:10:ff:00:02 172.16.255.2/30
ovn-nbctl lsp-add transit transit-tenant1
ovn-nbctl lsp-set-type transit-tenant1 router
ovn-nbctl lsp-set-addresses transit-tenant1 02:ac:10:ff:00:02
ovn-nbctl lsp-set-options transit-tenant1 router-port=tenant1-transit

# add static routes
ovn-nbctl lr-route-add edge1 "172.16.255.128/25" 172.16.255.2
ovn-nbctl lr-route-add tenant1 "0.0.0.0/0" 172.16.255.1

ovn-sbctl show

Notice the port bindings on the ubuntu1 host. You can now test connectivity to edge1 from vm1 on ubuntu2:

root@ubuntu2:~# ip netns exec vm1 ping 172.16.255.1
PING 172.16.255.1 (172.16.255.1) 56(84) bytes of data.
64 bytes from 172.16.255.1: icmp_seq=1 ttl=253 time=1.07 ms
64 bytes from 172.16.255.1: icmp_seq=2 ttl=253 time=1.13 ms
64 bytes from 172.16.255.1: icmp_seq=3 ttl=253 time=1.00 ms

Connecting to the “data” Network

We’re going to use the eth1 interface of ubuntu1 as our connection point between the edge1 router and the “data” network. In order to accomplish this we’ll need to set up OVN to use the eth1 interface directly through a dedicated OVS bridge. This type of connection is known as a “localnet” in OVN.

# create new port on router 'edge1'
ovn-nbctl lrp-add edge1 edge1-outside 02:0a:7f:00:01:29 10.127.0.129/25

# create new logical switch and connect it to edge1
ovn-nbctl ls-add outside
ovn-nbctl lsp-add outside outside-edge1
ovn-nbctl lsp-set-type outside-edge1 router
ovn-nbctl lsp-set-addresses outside-edge1 02:0a:7f:00:01:29
ovn-nbctl lsp-set-options outside-edge1 router-port=edge1-outside

# create a bridge for eth1
ovs-vsctl add-br br-eth1

# create bridge mapping for eth1. map network name "dataNet" to br-eth1
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=dataNet:br-eth1

# create localnet port on 'outside'. set the network name to "dataNet"
ovn-nbctl lsp-add outside outside-localnet
ovn-nbctl lsp-set-addresses outside-localnet unknown
ovn-nbctl lsp-set-type outside-localnet localnet
ovn-nbctl lsp-set-options outside-localnet network_name=dataNet

# connect eth1 to br-eth1
ovs-vsctl add-port br-eth1 eth1

Test connectivity to edge1-outside from vm1:

root@ubuntu2:~# ip netns exec vm1 ping 10.127.0.129
PING 10.127.0.129 (10.127.0.129) 56(84) bytes of data.
64 bytes from 10.127.0.129: icmp_seq=1 ttl=253 time=1.74 ms
64 bytes from 10.127.0.129: icmp_seq=2 ttl=253 time=0.781 ms
64 bytes from 10.127.0.129: icmp_seq=3 ttl=253 time=0.582 ms

Giving the Ubuntu Hosts Access to the “data” Network

Let’s give the Ubuntu hosts a presence on the data network. For ubuntu2/ubuntu3 it’s simply a matter of setting an IP on their physical nics (eth1 in my setup). For ubuntu1 we’ll set an IP on the br-eth1 interface.

On ubuntu1:

ip addr add 10.127.0.130/25 dev br-eth1
ip link set br-eth1 up

On ubuntu2:

ip addr add 10.127.0.131/25 dev eth1
ip link set eth1 up

On ubuntu3:

ip addr add 10.127.0.132/25 dev eth1
ip link set eth1 up

Testing from ubuntu1 to edge1:

root@ubuntu1:~# ping 10.127.0.129  
PING 10.127.0.129 (10.127.0.129) 56(84) bytes of data.
64 bytes from 10.127.0.129: icmp_seq=1 ttl=254 time=0.563 ms
64 bytes from 10.127.0.129: icmp_seq=2 ttl=254 time=0.290 ms
64 bytes from 10.127.0.129: icmp_seq=3 ttl=254 time=0.333 ms

Configuring NAT

Let’s see what happens when we attempt to ping ubuntu1 from vm1:

root@ubuntu2:~# ip netns exec vm1 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
^C
--- 10.127.0.130 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2016ms

Unsurprisingly, this does not work. Why not? Let’s look at the output of a tcpdump from ubuntu1:

root@ubuntu1:~# tcpdump -i br-eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:41:53.057993 IP 172.16.255.130 > 10.127.0.130: ICMP echo request, id 19359, seq 1, length 64
14:41:54.065696 IP 172.16.255.130 > 10.127.0.130: ICMP echo request, id 19359, seq 2, length 64
14:41:55.073667 IP 172.16.255.130 > 10.127.0.130: ICMP echo request, id 19359, seq 3, length 64

We can see the requests coming in; however, our responses are returning through a different interface (not seen in the tcpdump output). This is due to the fact that ubuntu1 has no route to 172.16.255.130 and is responding via its own default gateway. In order to get things working we will need to take one of two possible approaches:

  1. add static routes on the Ubuntu hosts
  2. set up NAT on the OVN gateway router

We’ll opt for option 2 because it is much less of a hassle than trying to manage static routes.

With OVN there are three types of NAT which may be configured:

  • DNAT – used to translate requests to an externally visible IP to an internal IP
  • SNAT – used to translate requests from one or more internal IPs to an externally visible IP
  • SNAT-DNAT – used to create a “static NAT” where an external IP is mapped to an internal IP, and vice versa

Since we don’t need (or want) the public network to be able to directly access our internal VMs, let’s focus on allowing outbound SNAT from our VMs. In order to create NAT rules we’ll need to manipulate the OVN northbound database directly. The syntax is a bit strange, but I’ll explain it below. From ubuntu1:

# create snat rule which will nat to the edge1-outside interface
ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=172.16.255.128/25 \
external_ip=10.127.0.129 -- add logical_router edge1 nat @nat

In brief, this command is creating an entry in the “nat” table of the northbound database, storing the resulting UUID within the ovsdb variable “@nat”, and then adding the UUID stored in @nat to the “nat” field of the “edge1” entry in the “logical_router” table of the northbound database. If you want to know the details of the northbound database then be sure to check out the man page for ovn-nb. The man page for ovn-nbctl also explains the command syntax used above.
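Before testing, you can confirm both the new row and its attachment to the router using the generic db commands:

ovn-nbctl list nat
ovn-nbctl get logical_router edge1 nat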

Testing connectivity from vm1:

root@ubuntu2:~# ip netns exec vm1 ping 10.127.0.130
PING 10.127.0.130 (10.127.0.130) 56(84) bytes of data.
64 bytes from 10.127.0.130: icmp_seq=40 ttl=62 time=2.39 ms
64 bytes from 10.127.0.130: icmp_seq=41 ttl=62 time=1.61 ms
64 bytes from 10.127.0.130: icmp_seq=42 ttl=62 time=1.28 ms

As seen above, we can now ping the outside world from our internal VMs.

Final Words

Overlay networks are almost entirely useless unless you can connect them to the outside world. OVN gateways provide a means for making such a connection.

In the next post I will explore another important feature of OVN: the OVN load balancer.

Wednesday, September 21, 2016

An Introduction to OVN Routing


Overview

Building upon my previous post I will now add basic layer-3 networking into the OVN setup. The end result will be a pair of logical switches connected by a logical router. As an added bonus, the router will be configured to serve IP addresses via the DHCP service which is built into OVN.

Re-Architecting the Logical Components

Since the setup is starting to become more complex we’re going to re-architect a bit. The new setup will be as follows:

  • 2 logical switches: “dmz” and “inside”
  • logical router “tenant1” connecting the two logical switches
  • IP network 172.16.255.128/26 for “dmz”
  • IP network 172.16.255.192/26 for “inside”
  • a pair of “VMs” on each logical switch

The proposed logical network is illustrated below.

ovn logical components

A Word On Routing

As part of our setup we will be creating an OVN router, also known as a “distributed logical router” (DLR). A DLR differs from a traditional router in that it is not an actual appliance but rather a logical construct (not unlike a logical switch). DLRs exist solely as a function in OVS: in other words each OVS instance is capable of simulating a layer-3 router hop locally prior to forwarding the traffic across the overlay network.

Creating the Logical Switches and Router

Define the new logical switches. From ubuntu1:

ovn-nbctl ls-add inside
ovn-nbctl ls-add dmz

Add the logical router with its associated router and switch ports:

# add the router
ovn-nbctl lr-add tenant1

# create router port for the connection to dmz
ovn-nbctl lrp-add tenant1 tenant1-dmz 02:ac:10:ff:01:29 172.16.255.129/26

# create the dmz switch port for connection to tenant1
ovn-nbctl lsp-add dmz dmz-tenant1
ovn-nbctl lsp-set-type dmz-tenant1 router
ovn-nbctl lsp-set-addresses dmz-tenant1 02:ac:10:ff:01:29
ovn-nbctl lsp-set-options dmz-tenant1 router-port=tenant1-dmz

# create router port for the connection to inside
ovn-nbctl lrp-add tenant1 tenant1-inside 02:ac:10:ff:01:93 172.16.255.193/26

# create the inside switch port for connection to tenant1
ovn-nbctl lsp-add inside inside-tenant1
ovn-nbctl lsp-set-type inside-tenant1 router
ovn-nbctl lsp-set-addresses inside-tenant1 02:ac:10:ff:01:93
ovn-nbctl lsp-set-options inside-tenant1 router-port=tenant1-inside

ovn-nbctl show

Adding DHCP

DHCP within OVN works a bit differently than most server solutions. The general idea is that the administrator will:

  1. define a set of DHCP options for use with a given subnet
  2. create logical switch ports which define both the mac address and the IP address expected to exist behind that port
  3. assign the DHCP options to that port
  4. set port security to allow only the assigned addresses

Let’s get started by configuring the logical ports for the four VMs we’ll be adding. From ubuntu1:

ovn-nbctl lsp-add dmz dmz-vm1
ovn-nbctl lsp-set-addresses dmz-vm1 "02:ac:10:ff:01:30 172.16.255.130"
ovn-nbctl lsp-set-port-security dmz-vm1 "02:ac:10:ff:01:30 172.16.255.130"

ovn-nbctl lsp-add dmz dmz-vm2
ovn-nbctl lsp-set-addresses dmz-vm2 "02:ac:10:ff:01:31 172.16.255.131"
ovn-nbctl lsp-set-port-security dmz-vm2 "02:ac:10:ff:01:31 172.16.255.131"

ovn-nbctl lsp-add inside inside-vm3
ovn-nbctl lsp-set-addresses inside-vm3 "02:ac:10:ff:01:94 172.16.255.194"
ovn-nbctl lsp-set-port-security inside-vm3 "02:ac:10:ff:01:94 172.16.255.194"

ovn-nbctl lsp-add inside inside-vm4
ovn-nbctl lsp-set-addresses inside-vm4 "02:ac:10:ff:01:95 172.16.255.195"
ovn-nbctl lsp-set-port-security inside-vm4 "02:ac:10:ff:01:95 172.16.255.195"

ovn-nbctl show

You may have noted that, unlike the previous lab, we are defining both mac and IP addresses as part of the logical switch definition. The IP address definition serves two purposes for us:

  1. It enables ARP suppression by allowing OVN to locally answer ARP requests for IP/mac combinations it knows about.
  2. It acts as a DHCP host assignment mechanism by issuing the defined IP address to any DHCP requests it sees from that port.

Next we need to define our DHCP options and assign them to our logical ports. The process here is going to be a bit different than we’ve seen before in that we will be directly interacting with the OVN NB database. The reason for this approach is that we need to capture the UUID of the DHCP_Options entry we create so that we can assign it to our switch ports. To do this we will capture the output of the ovn-nbctl command to a pair of bash variables.

dmzDhcp="$(ovn-nbctl create DHCP_Options cidr=172.16.255.128/26 \
options="\"server_id\"=\"172.16.255.129\" \"server_mac\"=\"02:ac:10:ff:01:29\" \
\"lease_time\"=\"3600\" \"router\"=\"172.16.255.129\"")" 
echo $dmzDhcp

insideDhcp="$(ovn-nbctl create DHCP_Options cidr=172.16.255.192/26 \
options="\"server_id\"=\"172.16.255.193\" \"server_mac\"=\"02:ac:10:ff:01:93\" \
\"lease_time\"=\"3600\" \"router\"=\"172.16.255.193\"")"
echo $insideDhcp

ovn-nbctl dhcp-options-list

See the man page for ovn-nb if you want to know more about the OVN NB database.

Now we’ll assign the DHCP_Options to our logical switch ports using the UUID stored within the variables.

ovn-nbctl lsp-set-dhcpv4-options dmz-vm1 $dmzDhcp
ovn-nbctl lsp-get-dhcpv4-options dmz-vm1

ovn-nbctl lsp-set-dhcpv4-options dmz-vm2 $dmzDhcp
ovn-nbctl lsp-get-dhcpv4-options dmz-vm2

ovn-nbctl lsp-set-dhcpv4-options inside-vm3 $insideDhcp
ovn-nbctl lsp-get-dhcpv4-options inside-vm3

ovn-nbctl lsp-set-dhcpv4-options inside-vm4 $insideDhcp
ovn-nbctl lsp-get-dhcpv4-options inside-vm4

Configuring the VMs

As in the last lab, we will be using fake “VMs” built from OVS internal ports and network namespaces. The difference now is that we will use DHCP for address assignment. Let’s set up the VMs.

On ubuntu2:

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 address 02:ac:10:ff:01:30
ip link set vm1 netns vm1
ovs-vsctl set Interface vm1 external_ids:iface-id=dmz-vm1
ip netns exec vm1 dhclient vm1
ip netns exec vm1 ip addr show vm1
ip netns exec vm1 ip route show

ip netns add vm3
ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ip link set vm3 address 02:ac:10:ff:01:94
ip link set vm3 netns vm3
ovs-vsctl set Interface vm3 external_ids:iface-id=inside-vm3
ip netns exec vm3 dhclient vm3
ip netns exec vm3 ip addr show vm3
ip netns exec vm3 ip route show

On ubuntu3:

ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 address 02:ac:10:ff:01:31
ip link set vm2 netns vm2
ovs-vsctl set Interface vm2 external_ids:iface-id=dmz-vm2
ip netns exec vm2 dhclient vm2
ip netns exec vm2 ip addr show vm2
ip netns exec vm2 ip route show

ip netns add vm4
ovs-vsctl add-port br-int vm4 -- set interface vm4 type=internal
ip link set vm4 address 02:ac:10:ff:01:95
ip link set vm4 netns vm4
ovs-vsctl set Interface vm4 external_ids:iface-id=inside-vm4
ip netns exec vm4 dhclient vm4
ip netns exec vm4 ip addr show vm4
ip netns exec vm4 ip route show

And testing connectivity from vm1 on ubuntu2:

# ping the default gateway on tenant1
root@ubuntu2:~# ip netns exec vm1 ping 172.16.255.129
PING 172.16.255.129 (172.16.255.129) 56(84) bytes of data.
64 bytes from 172.16.255.129: icmp_seq=1 ttl=254 time=0.689 ms
64 bytes from 172.16.255.129: icmp_seq=2 ttl=254 time=0.393 ms
64 bytes from 172.16.255.129: icmp_seq=3 ttl=254 time=0.483 ms

# ping vm2 through the overlay
root@ubuntu2:~# ip netns exec vm1  ping 172.16.255.131
PING 172.16.255.131 (172.16.255.131) 56(84) bytes of data.
64 bytes from 172.16.255.131: icmp_seq=1 ttl=64 time=2.16 ms
64 bytes from 172.16.255.131: icmp_seq=2 ttl=64 time=0.573 ms
64 bytes from 172.16.255.131: icmp_seq=3 ttl=64 time=0.446 ms

# ping vm3 through the router, via the local ovs bridge
root@ubuntu2:~# ip netns exec vm1  ping 172.16.255.194
PING 172.16.255.194 (172.16.255.194) 56(84) bytes of data.
64 bytes from 172.16.255.194: icmp_seq=1 ttl=63 time=1.37 ms
64 bytes from 172.16.255.194: icmp_seq=2 ttl=63 time=0.077 ms
64 bytes from 172.16.255.194: icmp_seq=3 ttl=63 time=0.076 ms

# ping vm4 through the router, across the overlay
root@ubuntu2:~# ip netns exec vm1  ping 172.16.255.195
PING 172.16.255.195 (172.16.255.195) 56(84) bytes of data.
64 bytes from 172.16.255.195: icmp_seq=1 ttl=63 time=1.79 ms
64 bytes from 172.16.255.195: icmp_seq=2 ttl=63 time=0.605 ms
64 bytes from 172.16.255.195: icmp_seq=3 ttl=63 time=0.503 ms

Final Words

OVN makes layer-3 overlay networking relatively easy and pain free. The fact that services such as DHCP are built directly into the system should help to reduce the number of external dependencies needed to build an effective SDN solution. In the next post I will discuss ways of connecting our (currently isolated) overlay network to the outside world.

Monday, September 19, 2016

A Primer on OVN


Overview

OVN is a virtual networking platform developed by the fine folks over at openvswitch.org. The project was announced in early 2015 and has just recently released its first production-ready version, version 2.6. In this posting I’ll walk through the basics of configuring a simple layer-2 overlay network between three hosts. But first, a brief overview of how the system functions.

OVN works on the premise of a distributed control plane where components are co-located on each node in the network. The roles within OVN are:

  • OVN Central – Currently a single host supports this role, acting as a central point of API integration for external resources such as a cloud management platform. The central host houses the OVN northbound database, which keeps track of high-level logical constructs such as logical switches/ports, and the OVN southbound database, which determines how to map logical constructs in ovn-northdb to the physical world.
  • OVN Host – This role is distributed amongst all nodes which contain virtual networking end points such as VMs. The OVN host contains a “chassis controller” which connects upstream to the ovn-southdb as its authoritative source of record for physical network information, and southbound to OVS, for which it acts as an openflow controller.

The Lab

My lab is running as a nested setup on a single esxi machine. The OVN lab itself will consist of 3 Ubuntu 16.04 servers connected to a common management subnet (10.127.0.0/25). The hosts and their IP addresses are as follows:

  • ubuntu1 10.127.0.2 – will serve as OVN Central
  • ubuntu2 10.127.0.3 – will serve as an OVN Host
  • ubuntu3 10.127.0.4 – will serve as an OVN Host

The network setup for the lab is illustrated below.

ovn lab

For the sake of ease of testing I will simulate virtual machine workloads on the Ubuntu hosts by creating OVS internal interfaces and sandboxing them within a network namespace. The namespaces will ensure that our OVN overlay network is completely isolated from the lab network.

Building Open vSwitch 2.6

Open vSwitch version 2.6 was released on 2016/09/28 and may be downloaded from here. Since I am using an Ubuntu-based system I found the instructions in the file INSTALL.Debian.md included with the download to work well. Below is a brief summary of those instructions. You should build ovs either on one of the three Ubuntu machines or on a dedicated build machine with an identical kernel version.

Update & install dependencies

apt-get update
apt-get -y install build-essential fakeroot

Install Build-Depends from debian/control file

apt-get -y install graphviz autoconf automake bzip2 debhelper dh-autoreconf libssl-dev libtool openssl
apt-get -y install procps python-all python-twisted-conch python-zopeinterface python-six

Check the working directory & build

cd openvswitch-2.6.0

# if everything is ok then this should return no output
dpkg-checkbuilddeps

DEB_BUILD_OPTIONS='parallel=8 nocheck' fakeroot debian/rules binary

The .deb files for ovs will be built and placed in the parent directory (i.e. in ../). The next step is to build the kernel modules.

Install datapath sources

cd ..
apt-get -y install module-assistant
dpkg -i openvswitch-datapath-source_2.6.0-1_all.deb 

Build kernel modules using module-assistant

m-a prepare
m-a build openvswitch-datapath

Copy the resulting deb package. Note that your version may differ slightly depending on your specific kernel version.

cp /usr/src/openvswitch-datapath-module-*.deb ./

Transfer the following to all three Ubuntu hosts:

  • openvswitch-datapath-module-*.deb
  • openvswitch-common_2.6.0-1_amd64.deb
  • openvswitch-switch_2.6.0-1_amd64.deb
  • ovn-common_2.6.0-1_amd64.deb
  • ovn-central_2.6.0-1_amd64.deb
  • ovn-host_2.6.0-1_amd64.deb

Installing Open vSwitch

Install OVS/OVN + dependencies on ubuntu1:

apt-get update
apt-get -y install python-six python2.7
dpkg -i openvswitch-datapath-module-*.deb
dpkg -i openvswitch-common_2.6.0-1_amd64.deb openvswitch-switch_2.6.0-1_amd64.deb
dpkg -i ovn-common_2.6.0-1_amd64.deb ovn-central_2.6.0-1_amd64.deb ovn-host_2.6.0-1_amd64.deb

Install OVS/OVN + dependencies on ubuntu2 and ubuntu3:

apt-get update
apt-get -y install python-six python2.7
dpkg -i openvswitch-datapath-module-*.deb
dpkg -i openvswitch-common_2.6.0-1_amd64.deb openvswitch-switch_2.6.0-1_amd64.deb
dpkg -i ovn-common_2.6.0-1_amd64.deb ovn-host_2.6.0-1_amd64.deb

Once the packages are installed you should notice that ubuntu1 is now listening on the TCP ports associated with ovn-northd (6641) and the OVN southbound database (6642). Here is the output of a netstat on my system:

root@ubuntu1:~# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:6641            0.0.0.0:*               LISTEN      1798/ovsdb-server
tcp        0      0 0.0.0.0:6642            0.0.0.0:*               LISTEN      1806/ovsdb-server

If you do not see ovsdb-server listening on 6641/6642 then you may need to manually start the daemons via the init scripts (/etc/init.d/openvswitch-switch and /etc/init.d/ovn-central).
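For example (using the init script paths mentioned above; exact service handling may differ on your system):

/etc/init.d/openvswitch-switch start
/etc/init.d/ovn-central start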

Creating the Integration Bridge

OVN will be responsible for managing a single bridge within OVS and anything that we wish to be connected to an OVN logical switch must be connected to this bridge. Although we can name this bridge anything we want, the standard convention is to name it “br-int”. We’ll want to verify that this bridge does not already exist and add it if needed.

On both ubuntu2 and ubuntu3:

ovs-vsctl list-br

If you do not see “br-int” listed in the output then you’ll need to create the bridge manually. Remember, the integration bridge must exist on all hosts which will house VMs.

ovs-vsctl add-br br-int -- set Bridge br-int fail-mode=secure
ovs-vsctl list-br

The “fail-mode=secure” is a security feature which configures the bridge to drop traffic by default. This is important since we do not want tenants of the integration bridge to be able to communicate if our flows were somehow erased or if we plugged tenants into the bridge before the OVN controller was started.
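You can confirm the setting on the bridge like so:

ovs-vsctl get Bridge br-int fail-mode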

Connecting the Chassis Controllers to the Central Controller

The next step in the setup is to connect the chassis controllers on ubuntu2/ubuntu3 to our central controller on ubuntu1.

On ubuntu2:

ovs-vsctl set open . external-ids:ovn-remote=tcp:10.127.0.2:6642
ovs-vsctl set open . external-ids:ovn-encap-type=geneve
ovs-vsctl set open . external-ids:ovn-encap-ip=10.127.0.3

On ubuntu3:

ovs-vsctl set open . external-ids:ovn-remote=tcp:10.127.0.2:6642
ovs-vsctl set open . external-ids:ovn-encap-type=geneve
ovs-vsctl set open . external-ids:ovn-encap-ip=10.127.0.4

These commands will trigger the OVN chassis controller on each host to open a connection to the OVN Central host on ubuntu1. We’ve specified that our overlay networks should use the geneve protocol to encapsulate our data plane traffic.

You may verify the connectivity with netstat. The output on my ubuntu3 machine is as follows:

root@ubuntu3:~# netstat -antp | grep 10.127.0.2
tcp        0      0 10.127.0.4:39256        10.127.0.2:6642         ESTABLISHED 3072/ovn-controller

Defining the Logical Network

Below is a diagram which illustrates the OVN logical components which will be created as part of this lab.

ovn logical components

Let’s start by defining the OVN logical switch “ls1” along with its associated logical ports. From ubuntu1:

# create the logical switch
ovn-nbctl ls-add ls1

# create the logical port for vm1
ovn-nbctl lsp-add ls1 ls1-vm1
ovn-nbctl lsp-set-addresses ls1-vm1 02:ac:10:ff:00:11
ovn-nbctl lsp-set-port-security ls1-vm1 02:ac:10:ff:00:11

# create the logical port for vm2
ovn-nbctl lsp-add ls1 ls1-vm2
ovn-nbctl lsp-set-addresses ls1-vm2 02:ac:10:ff:00:22
ovn-nbctl lsp-set-port-security ls1-vm2 02:ac:10:ff:00:22

ovn-nbctl show

In each command set we’ve created a uniquely named logical port. Normally these logical ports would be named with a UUID to ensure uniqueness, but for our purposes we’ll use names which are more human friendly. We’re also defining the mac addresses we expect to be associated with each logical port, and we take the further step of locking the port down with port security (only the macs we’ve listed may source traffic from the logical port).
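If you ever need to confirm what was configured, ovn-nbctl can read these settings back per port. For example, to inspect ls1-vm1:

ovn-nbctl lsp-get-addresses ls1-vm1
ovn-nbctl lsp-get-port-security ls1-vm1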

Adding “Fake” Virtual Machines

The next step is to create “virtual machines” which we’ll connect to our logical switch. As mentioned before, we’ll use OVS internal ports and network namespaces to simulate virtual machines. We’ll use the address space 172.16.255.0/24 for our logical switch.

On ubuntu2:

ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 02:ac:10:ff:00:11
ip netns exec vm1 ip addr add 172.16.255.11/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ovs-vsctl set Interface vm1 external_ids:iface-id=ls1-vm1

ip netns exec vm1 ip addr show

On ubuntu3:

ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 02:ac:10:ff:00:22
ip netns exec vm2 ip addr add 172.16.255.22/24 dev vm2
ip netns exec vm2 ip link set vm2 up
ovs-vsctl set Interface vm2 external_ids:iface-id=ls1-vm2

ip netns exec vm2 ip addr show

If you’ve not worked much with namespaces or OVS then the above commands may be a bit confusing. In brief, we’re creating a network namespace named for our fake VM, adding an internal OVS port, moving the port into the namespace, setting up the IP interface from within the namespace (the netns exec commands), and finally setting an external_id on the OVS interface which matches the ID we defined for our logical port in OVN. It’s this last bit which triggers OVS to alert OVN that a logical port has just come online. OVN acts upon this notification by pushing instructions down to the local chassis controller of the host machine which, in turn, pushes network flows down to OVS.
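To see this plumbing from the OVS side, you can inspect the interface record and the flows which ovn-controller pushed down. The exact flow contents will vary; the point is simply that br-int is no longer empty once the port binds. Depending on your build you may need the OpenFlow 1.3 flag shown here. On ubuntu2:

ovs-vsctl get interface vm1 external_ids
ovs-ofctl -O OpenFlow13 dump-flows br-int | head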

Note that I have explicitly set the mac addresses of these interfaces to match what we’ve defined in our OVN logical switch configuration. This is important. The logical network will not work if the mac addresses are not properly mapped. Keep in mind that I have actually reversed the workflow a bit here. Normally you would not change the mac address of the VM, but rather push an existing, known mac into OVN/OVS. My goal was to make the mac/IP easily visible for demonstration purposes, thus I manually set them.

Verifying and Testing Connectivity

From ubuntu1 we can verify the logical network configuration using ovn-sbctl. Here is the output from my system:

root@ubuntu1:~# ovn-sbctl show
Chassis "239f2c28-90ff-468f-a701-655585c630bf"
        hostname: "ubuntu3"
        Encap geneve
                ip: "10.127.0.4"
                options: {csum="true"}
        Port_Binding "ls1-vm2"
Chassis "517d558e-158a-4cb2-8870-283e9d39685e"
        hostname: "ubuntu2"
        Encap geneve
                ip: "10.127.0.3"
                options: {csum="true"}
        Port_Binding "ls1-vm1"

Note the port bindings associated with each of our OVN host machines. To test connectivity we’ll simply launch a ping from the namespace of vm1. Here is the output from my machine:

root@ubuntu2:~# ip netns exec vm1 ping 172.16.255.22
PING 172.16.255.22 (172.16.255.22) 56(84) bytes of data.
64 bytes from 172.16.255.22: icmp_seq=1 ttl=64 time=1.60 ms
64 bytes from 172.16.255.22: icmp_seq=2 ttl=64 time=0.638 ms
64 bytes from 172.16.255.22: icmp_seq=3 ttl=64 time=0.344 ms

Adding a 3rd “VM” and Migrating It

Let’s add a third “VM” to our setup and then simulate migrating it between hosts. First, define its logical port using ovn-nbctl on ubuntu1:

ovn-nbctl lsp-add ls1 ls1-vm3
ovn-nbctl lsp-set-addresses ls1-vm3 02:ac:10:ff:00:33
ovn-nbctl lsp-set-port-security ls1-vm3 02:ac:10:ff:00:33

ovn-nbctl show

Next, create the interface for this VM on ubuntu2.

ip netns add vm3
ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ip link set vm3 netns vm3
ip netns exec vm3 ip link set vm3 address 02:ac:10:ff:00:33
ip netns exec vm3 ip addr add 172.16.255.33/24 dev vm3
ip netns exec vm3 ip link set vm3 up
ovs-vsctl set Interface vm3 external_ids:iface-id=ls1-vm3

ip netns exec vm3 ip addr show

Test connectivity from vm3.

root@ubuntu2:~# ip netns exec vm3 ping 172.16.255.22
PING 172.16.255.22 (172.16.255.22) 56(84) bytes of data.
64 bytes from 172.16.255.22: icmp_seq=1 ttl=64 time=2.04 ms
64 bytes from 172.16.255.22: icmp_seq=2 ttl=64 time=0.337 ms
64 bytes from 172.16.255.22: icmp_seq=3 ttl=64 time=0.536 ms

Note the OVN southbound DB configuration on ubuntu1. We see that ubuntu2 now has two registered port bindings.

root@ubuntu1:~# ovn-sbctl show
Chassis "239f2c28-90ff-468f-a701-655585c630bf"
        hostname: "ubuntu3"
        Encap geneve
                ip: "10.127.0.4"
                options: {csum="true"}
        Port_Binding "ls1-vm2"
Chassis "517d558e-158a-4cb2-8870-283e9d39685e"
        hostname: "ubuntu2"
        Encap geneve
                ip: "10.127.0.3"
                options: {csum="true"}
        Port_Binding "ls1-vm3"
        Port_Binding "ls1-vm1"

In order to simulate a migration of vm3 we’ll delete the “vm3” namespace on ubuntu2, remove its port from br-int, and then recreate the setup on ubuntu3. From ubuntu2:

ip netns del vm3
ovs-vsctl --if-exists --with-iface del-port br-int vm3
ovs-vsctl list-ports br-int

From ubuntu3:

ip netns add vm3
ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ip link set vm3 netns vm3
ip netns exec vm3 ip link set vm3 address 02:ac:10:ff:00:33
ip netns exec vm3 ip addr add 172.16.255.33/24 dev vm3
ip netns exec vm3 ip link set vm3 up
ovs-vsctl set Interface vm3 external_ids:iface-id=ls1-vm3

And test connectivity:

root@ubuntu3:~# ip netns exec vm3 ping 172.16.255.11
PING 172.16.255.11 (172.16.255.11) 56(84) bytes of data.
64 bytes from 172.16.255.11: icmp_seq=1 ttl=64 time=1.44 ms
64 bytes from 172.16.255.11: icmp_seq=2 ttl=64 time=0.407 ms
64 bytes from 172.16.255.11: icmp_seq=3 ttl=64 time=0.395 ms

Again, note the OVN southbound DB configuration on ubuntu1. We see that the port binding has moved.

root@ubuntu1:~# ovn-sbctl show
Chassis "239f2c28-90ff-468f-a701-655585c630bf"
        hostname: "ubuntu3"
        Encap geneve
                ip: "10.127.0.4"
                options: {csum="true"}
        Port_Binding "ls1-vm2"
        Port_Binding "ls1-vm3"
Chassis "517d558e-158a-4cb2-8870-283e9d39685e"
        hostname: "ubuntu2"
        Encap geneve
                ip: "10.127.0.3"
                options: {csum="true"}
        Port_Binding "ls1-vm1"
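If you prefer a more targeted view than the full ovn-sbctl show output, the southbound Port_Binding table can also be queried directly; the chassis column for “ls1-vm3” should now reference the ubuntu3 chassis UUID.

ovn-sbctl --columns=logical_port,chassis list Port_Binding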

Cleaning Up

Let’s clean up the environment before quitting.

From ubuntu1:

# delete the logical switch and its ports
ovn-nbctl ls-del ls1

From ubuntu2:

# delete vm1
ip netns del vm1
ovs-vsctl --if-exists --with-iface del-port br-int vm1

From ubuntu3:

# delete vm2 and vm3
ip netns del vm2
ovs-vsctl --if-exists --with-iface del-port br-int vm2

ip netns del vm3
ovs-vsctl --if-exists --with-iface del-port br-int vm3

Final Words

As you can see, creating layer-2 overlay networks is relatively straightforward using OVN. In the next post I will discuss OVN layer-3 networking by introducing the OVN logical router.

Tuesday, August 9, 2016

Adding PXE support to Dnsmasq


Overview

Expanding upon my previous post I will set up Dnsmasq to provide PXE services for my lab environment.

Installing syslinux

The first step is to copy the relevant syslinux files into my already existing /tftpboot directory. Since my primary use case for PXE is booting esxi hypervisors, I’ll need syslinux version 3.86 (evidently later versions are not compatible with the mboot.c32 included with esxi).

cd /tmp
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/3.xx/syslinux-3.86.tar.gz
tar xzvf syslinux-3.86.tar.gz
cp /tmp/syslinux-3.86/core/pxelinux.0 /tftpboot/
cp /tmp/syslinux-3.86/memdisk/memdisk /tftpboot/
cp /tmp/syslinux-3.86/com32/menu/menu.c32 /tftpboot/
cd

Now I can create a basic PXE menu. To start, my menu will do nothing but boot local disk.

mkdir /tftpboot/pxelinux.cfg
cat > /tftpboot/pxelinux.cfg/default << EOF
DEFAULT menu.c32
PROMPT 0
TIMEOUT 300
ONTIMEOUT local

MENU TITLE PXE Server

LABEL local
MENU LABEL Boot local hard drive
LOCALBOOT 0

EOF

I can now append additional entries to the menu in order to support booting other operating systems.

Configuring an iPXE boot image

One use case I have in the lab is to perform an initial boot of a VM with the iPXE ISO image. I then use iPXE to bootstrap other images onto the VM. For this, I’ll need to prepare the PXE server to serve up the iPXE image.

Grab the iPXE ISO image.

cd /tftpboot
wget http://boot.ipxe.org/ipxe.iso
cd

Add a PXE menu entry for ipxe.

cat >> /tftpboot/pxelinux.cfg/default << EOF
LABEL ipxe
MENU LABEL ipxe
linux memdisk
initrd ipxe.iso
append iso raw

EOF

Done. You should now be able to PXE boot an iPXE image… as odd as that sounds.

Configuring an esxi boot image

Another use case I have in the lab is the ability to kickstart physical hypervisors with esxi. For this, I’ll enable an esxi boot image with support for a kickstart file to automate the install.

Download the ISO image for esxi to /tmp of the CoreOS machine. For ease of scripting later on, store the esxi ISO file name in a variable. Adjust this per your exact version.

ESX_ISO=VMware-VMvisor-Installer-6.0.0-3620759.x86_64.iso

Mount the ISO and copy the files to /tftpboot, modifying the boot.cfg file to reflect the local directory structure.

mkdir /mnt/iso
mount /tmp/${ESX_ISO} /mnt/iso/
mkdir /tftpboot/esxi
rsync -a /mnt/iso/ /tftpboot/esxi
cat /mnt/iso/boot.cfg | sed -e "s#/##g" -e "3s#^#prefix=esxi\n#" > /tftpboot/esxi/boot.cfg
umount /mnt/iso
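For reference, here is roughly what that sed accomplishes. The stock boot.cfg references its modules with absolute paths; stripping the slashes and inserting a prefix line tells mboot to fetch everything from the esxi/ directory on the TFTP server instead. An abbreviated, illustrative before/after (actual contents vary by ESXi build):

# before (excerpt)
bootstate=0
title=Loading ESXi installer
kernel=/tboot.b00

# after (excerpt)
bootstate=0
title=Loading ESXi installer
prefix=esxi
kernel=tboot.b00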

Add a PXE menu entry for esxi with support for a kickstart file. My environment listens on the interface mgmt0 port 8080, so change the MGMT_IF and HTTP_PORT variables below per your environment.

MGMT_IF=mgmt0
HTTP_PORT=8080
MGMT_IP=$(ip -o -4 addr show ${MGMT_IF} | awk '{print $4}' | cut -d/ -f1)

cat >> /tftpboot/pxelinux.cfg/default << EOF
LABEL esxi
MENU LABEL esxi
kernel esxi/mboot.c32
append -c esxi/boot.cfg ks=http://${MGMT_IP}:${HTTP_PORT}/esxi/kickstart.cfg
ipappend 2

EOF

Create the kickstart file. This assumes a working http server (as per this post).

mkdir /var/www/esxi
touch /var/www/esxi/kickstart.cfg

Edit /var/www/esxi/kickstart.cfg to look as below. Replace ‘aValidLicenseHere’ in the last line with an actual license, otherwise remove the line entirely. Also, replace the value of ‘rootpw’ with your own password.

accepteula
install --firstdisk --overwritevmfs
rootpw VMware1!
reboot

%include /tmp/networkconfig
%pre --interpreter=busybox

# extract network info from bootup
VMK_IF="vmk0"
VMK_NET=$(localcli network ip interface ipv4 get | grep "${VMK_IF}")
VMK_IP=$(echo "${VMK_NET}" | awk '{print $2}')
VMK_NETMASK=$(echo "${VMK_NET}" | awk '{print $3}')
GATEWAY=$(esxcfg-route | awk '{print $5}')
DNS=$(localcli network ip dns server list | grep 'DNS Servers' | awk '{print $3}'| cut -d, -f1)
HOSTNAME=$(nslookup "${VMK_IP}" | grep Address | grep "${VMK_IP}" | awk '{print $4}')

echo "network --bootproto=static --addvmportgroup=true --device=vmnic0 --ip=${VMK_IP} --netmask=${VMK_NETMASK} --gateway=${GATEWAY} --nameserver=${DNS} --hostname=${HOSTNAME}" > /tmp/networkconfig

%firstboot --interpreter=busybox

vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell
esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1
vim-cmd vimsvc/license --set aValidLicenseHere

You should now be able to PXE boot esxi within the environment.

Thursday, August 4, 2016

CoreOS Local Mirror


Overview

Expanding upon my previous post I will set up my CoreOS NAT gateway to provide a local etcd discovery service as well as to provide a local CoreOS mirror for use with installing additional nodes within my lab environment.

Enable etcd and configure the discovery service

In my original post I did not enable etcd. I will do so now.

Set up the systemd unit file. Adjust the value of ETC_ADV_CLIENT if needed.

ETC_ADV_CLIENT='http://10.127.0.1:2379'

cat > /etc/systemd/system/etcd2.service << EOF
[Unit]
Description=etcd2
Conflicts=etcd.service

[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd2
Environment=ETCD_NAME=%m
ExecStart=/usr/bin/etcd2 --listen-peer-urls 'http://0.0.0.0:2380' --listen-client-urls 'http://0.0.0.0:2379'  --advertise-client-urls '${ETC_ADV_CLIENT}'
Restart=always
RestartSec=10s
LimitNOFILE=40000
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target
EOF

Enable & start

systemctl enable etcd2.service
systemctl start etcd2.service

Allow etcd through iptables by adding the following to the *filter section of /var/lib/iptables/rules-save.

# allow etcd from mgmt network
-A INPUT -s 10.127.0.0/24 -p tcp --dport 2379:2380 -j ACCEPT

and apply the rules

iptables-restore /var/lib/iptables/rules-save

Using the local discovery service

In order to make use of the local discovery service you simply create an entry within the well-known key prefix as documented here, and then boot your cluster per the instructions. I make a slight syntactical change by using etcdctl rather than curl and by setting a TTL on my registration key. Change the value of TTL_SECONDS to something sane for your use case; the key will automatically self-destruct at the end of the TTL period.

UUID=$(uuidgen)
TTL_SECONDS=300
etcdctl mkdir _etcd/registry/${UUID} --ttl ${TTL_SECONDS}
etcdctl set _etcd/registry/${UUID}/_config/size 3
echo $UUID
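The UUID echoed above becomes part of the discovery URL handed to new cluster members. Assuming the well-known key layout from the etcd discovery documentation, each member’s cloud-config would point at something like the following (replace {UUID} with the value echoed above):

#cloud-config
coreos:
  etcd2:
    discovery: http://10.127.0.1:2379/v2/keys/_etcd/registry/{UUID}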

Create a web server app

In order to create a local mirror we need http services on our CoreOS machine. For this piece you can choose to use a pre-existing app container, or you can follow along and create your own using golang. If you choose the golang route you’ll need a machine with a go compiler installed in order to create the binary. My example roughly follows the one given as part of the rkt documentation.

Create a file httpd.go

package main

import (
    "log"
    "net/http"
)

// Serve the contents of /var/www over HTTP on port 8080.
func main() {
    log.Fatal(http.ListenAndServe("0.0.0.0:8080", http.FileServer(http.Dir("/var/www"))))
}

Compile with

CGO_ENABLED=0 go build -ldflags '-extldflags "-static"' httpd.go

Transfer the binary file httpd to the CoreOS machine (my machine is named natcore).

scp httpd natcore:/tmp

Create an app container

From the CoreOS machine we will create the directory structure for our ACI image and then use it to build the ACI. I am roughly following the example from the aci spec.

Create the directory structure for our app

mkdir /var/www
mkdir -p /tmp/go-httpd/rootfs/bin
mkdir -p /tmp/go-httpd/rootfs/var/www
mv /tmp/httpd /tmp/go-httpd/rootfs/bin

Create the manifest. Change the DOMAIN variable to your actual domain name.

DOMAIN=localdomain
cat > /tmp/go-httpd/manifest << EOF
{
    "acKind": "ImageManifest",
    "acVersion": "0.8.6",
    "name": "${DOMAIN}/go-httpd",
    "labels": [
        {"name": "version","value": "1.0.0"},
        {"name": "os", "value": "linux"},
        {"name": "arch", "value": "amd64"}
    ],
    "app": {
        "exec": [
            "/bin/httpd"
        ],
        "user": "0",
        "group": "0"
    }
}
EOF

Build the container

actool build /tmp/go-httpd/ /tmp/go-httpd.aci

Test the ACI image

rkt --insecure-options=image run --net=host /tmp/go-httpd.aci

From another shell verify that the app is listening on 8080

curl localhost:8080

Terminate the container with the escape sequence

^]^]^]

The ACI should now be visible to rkt.

rkt image list

Create the systemd unit file for go-httpd.service.

cat > /etc/systemd/system/go-httpd.service << EOF
[Unit]
Description=go-httpd

[Service]
TimeoutStartSec=0
ExecStartPre=/usr/bin/mkdir -p /var/www
ExecStart=/usr/bin/rkt run --hostname=${HOSTNAME} --net=host \
--volume var-www,kind=host,source=/var/www \
${DOMAIN}/go-httpd:1.0.0 \
--mount volume=var-www,target=/var/www
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

Enable, start, and verify the go-httpd service.

systemctl enable go-httpd.service
systemctl start go-httpd.service
curl localhost:8080

Allow access through iptables by adding the following to the *filter section of /var/lib/iptables/rules-save.

# allow go-http access from mgmt network
-A INPUT -s 10.127.0.0/24 -p tcp --dport 8080 -j ACCEPT

and apply the rules

iptables-restore /var/lib/iptables/rules-save

Populate the local mirror

We’ll now copy the CoreOS production image files into /var/www. This will effectively create a local mirror which may then be used by the coreos-install script. For my example I am going to mirror CoreOS version 1010.6.0 but you should adjust the version according to your needs.

COS_VERSION=1010.6.0
mkdir -p /var/www/CoreOS/${COS_VERSION}
wget -r -l1 -np -nd "https://stable.release.core-os.net/amd64-usr/${COS_VERSION}/" -P /var/www/CoreOS/${COS_VERSION} -A "coreos_production_image*"
wget -r -l1 -np -nd "https://stable.release.core-os.net/amd64-usr/${COS_VERSION}/" -P /var/www/CoreOS/${COS_VERSION} -A "coreos_production_pxe*"

In order to make use of the local mirror when installing additional CoreOS machines, you will need to provide the -b option to coreos-install and specify the base URL for the local mirror. For example:

coreos-install -b http://10.127.0.1:8080/CoreOS -d /dev/sda -c cloud-config.yml
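Before relying on the mirror, a quick sanity check from another machine on the mgmt network should return a 200 (adjust the file name for the version you mirrored):

curl -I http://10.127.0.1:8080/CoreOS/1010.6.0/coreos_production_image.bin.bz2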

Monday, August 1, 2016

CoreOS + Dnsmasq


Overview

Expanding upon my previous post I will add DHCP/TFTP services to my CoreOS NAT gateway using the dnsmasq image provided by quay.io.

Configure the Dnsmasq Container

All steps are performed from the console of the CoreOS machine.

Fetch the pre-made container from quay.io.

rkt fetch coreos.com/dnsmasq:v0.3.0

Set environment variables according to your setup. These will be used when generating config files for dnsmasq.

DNS_PRIMARY=8.8.8.8
DNS_DOMAIN=localdomain
PUB_IF=pub0
MGMT_IF=mgmt0
DATA_IF=data0
DMZ_IF=dmz0
MGMT_DHCP=10.127.0.128,10.127.0.254
DATA_DHCP=10.127.1.128,10.127.1.254
DMZ_DHCP=10.127.2.128,10.127.2.254

Create the systemd unit file for dnsmasq.service.

cat > /etc/systemd/system/dnsmasq.service << EOF
[Unit]
Description=dnsmasq

[Service]
TimeoutStartSec=0
ExecStartPre=/usr/bin/mkdir -p /etc/dnsmasq
ExecStartPre=/usr/bin/mkdir -p /tftpboot
ExecStart=/usr/bin/rkt run --hostname=natcore --net=host \
--volume etc-dnsmasq,kind=host,source=/etc/dnsmasq \
--volume tftpboot,kind=host,source=/tftpboot \
coreos.com/dnsmasq:v0.3.0 \
--mount volume=etc-dnsmasq,target=/etc/dnsmasq \
--mount volume=tftpboot,target=/tftpboot \
-- -d -C /etc/dnsmasq/dnsmasq.conf -R -S ${DNS_PRIMARY}
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

Create the required directories for dnsmasq and hard-link /etc/hosts into place (a symlink would not resolve from inside the container, since only /etc/dnsmasq is mounted into it).

mkdir /tftpboot
mkdir /etc/dnsmasq
ln /etc/hosts /etc/dnsmasq/hosts

Create dnsmasq.conf

cat > /etc/dnsmasq/dnsmasq.conf << EOF
### GENERAL SETTINGS ###    
local=/${DNS_DOMAIN}/
domain=${DNS_DOMAIN}
expand-hosts
addn-hosts=/etc/dnsmasq/hosts

### TFTP SETTINGS ###    
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/tftpboot

### DHCP SETTINGS ###    
# ntp
dhcp-option=42,0.0.0.0    
# default gw set to mgmt network
dhcp-option=${MGMT_IF},3,10.127.0.1
# no services on public interface
no-dhcp-interface=${PUB_IF}

# ${MGMT_IF} dhcp
dhcp-range=${MGMT_IF},${MGMT_DHCP},12h

# ${DATA_IF} dhcp
dhcp-range=${DATA_IF},${DATA_DHCP},12h

# ${DMZ_IF} dhcp
dhcp-range=${DMZ_IF},${DMZ_DHCP},12h
EOF

Add some entries to /etc/hosts for natcore (optional).

cat >> /etc/hosts << EOF
10.127.0.1 mgmt0.natcore
10.127.1.1 data0.natcore
10.127.2.1 dmz0.natcore
EOF

Start the Container

Enable and start services.

systemctl enable /etc/systemd/system/dnsmasq.service
systemctl start dnsmasq.service

Verify the service.

systemctl status dnsmasq.service

Verify dnsmasq is listening on UDP 53/69.

netstat -lnup
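As a functional test, you can resolve one of the optional hosts entries added above from a client on the mgmt network (CoreOS itself does not ship dig, so run this from a machine that has it):

dig @10.127.0.1 mgmt0.natcore +short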

If for some reason dnsmasq fails to start, you can view its logs by first using machinectl to find the machine ID of the container

machinectl list

and then using journalctl to view its logs (provide the id returned by machinectl):

journalctl -M {id}

Be sure to allow TFTP through iptables by adding the following to the *filter section of /var/lib/iptables/rules-save.

# allow tftp from mgmt network
-A INPUT -s 10.127.0.0/24 -p udp --dport 69 -j ACCEPT

and apply the rules

iptables-restore /var/lib/iptables/rules-save

As part of a later post I’ll walk through configuring PXE services using Dnsmasq.

Saturday, July 30, 2016

CoreOS as a NAT Gateway


Overview

Recently I’ve started working a bit with CoreOS and decided to test it as a NAT gateway for my lab environment. This guide provides a walkthrough of how I configured CoreOS for this purpose. You will need a workstation with access to (preferably) a bash shell in order to follow along.

This post is the first in a series on installing a CoreOS based lab environment. The other relevant posts are: CoreOS + Dnsmasq, Adding PXE support to Dnsmasq, and CoreOS Local Mirror.

Description of the Environment

My lab environment consists of a handful of physical servers running VMware esx and the following four infrastructure networks:

  • public – public facing external network x.x.x.x/y
  • mgmt – private management network 10.127.0.0/24
  • data – private data transfer network 10.127.1.0/24
  • dmz – private dmz network 10.127.2.0/24

All physical servers have access to all 4 of the infrastructure networks. The CoreOS VM (referred to as “natcore”) is connected to all 4 networks, is configured as the default gateway for the private nets, and is set up for IP masquerade on its public interface.

My personal preference is to name interfaces in such a way as to reflect their function. In order to accomplish this I will make use of systemd.link files to name interfaces based on their mac-address. Additionally, I will use systemd.network files to set up static IP addresses for the internal interfaces. Finally, I will provide iptables rules files (v4 and v6) to both filter requests to natcore and provide the actual nat functionality.

Preparing the configdrive

The installation will make use of the config drive feature of CoreOS to configure the system upon first boot. As a first step in this process we will define a few variables which will be used to populate the cloud-config file. These are as follows:

  • COS_CONFIG_DRIVE_PATH: the base path of the config drive
  • COS_HOSTNAME: the hostname of the machine
  • COS_LAB_NET: the summary address for the lab ip space
  • COS_PUB_IF: the name of the public interface
  • COS_PUB_IF_MAC: the mac address of the public net interface
  • COS_MGMT_IF: the name of the mgmt interface
  • COS_MGMT_IF_MAC: the mac address of the mgmt net interface
  • COS_MGMT_IF_IP: the ip/mask of the mgmt net interface
  • COS_DATA_IF: the name of the data interface
  • COS_DATA_IF_MAC: the mac address of the data net interface
  • COS_DATA_IF_IP: the ip/mask of the data net interface
  • COS_DMZ_IF: the name of the dmz interface
  • COS_DMZ_IF_MAC: the mac address of the dmz net interface
  • COS_DMZ_IF_IP: the ip/mask of the dmz net interface
  • COS_USERNAME: the primary user to configure
  • COS_AUTH_KEY: rsa public key to be used for key-based ssh

From the terminal of your workstation define the variables. You’ll need to adjust the username, auth key, and mac-addresses to fit your environment.

COS_CONFIG_DRIVE_PATH=/tmp/configdrive
COS_HOSTNAME=natcore
COS_LAB_NET=10.127.0.0/22
COS_PUB_IF=pub0
COS_PUB_IF_MAC=00:0c:29:83:dd:8e
COS_MGMT_IF=mgmt0
COS_MGMT_IF_MAC=00:50:56:00:00:01
COS_MGMT_IF_IP=10.127.0.1/24
COS_DATA_IF=data0
COS_DATA_IF_MAC=00:50:56:01:00:01
COS_DATA_IF_IP=10.127.1.1/24
COS_DMZ_IF=dmz0
COS_DMZ_IF_MAC=00:50:56:02:00:01
COS_DMZ_IF_IP=10.127.2.1/24
COS_USERNAME=user1
COS_AUTH_KEY="ssh-rsa AAAAB..."

Create the file structure for the config drive.

mkdir -p ${COS_CONFIG_DRIVE_PATH}/openstack/latest

Create the cloud-config file.

cat > ${COS_CONFIG_DRIVE_PATH}/openstack/latest/user_data << EOF
#cloud-config
hostname: "${COS_HOSTNAME}"
manage_etc_hosts: "localhost"
users:
  - name: "${COS_USERNAME}"
    groups:
      - "sudo"
      - "rkt"
      - "wheel"
    ssh-authorized-keys:
      - "${COS_AUTH_KEY}"

write_files:
  - path: "/etc/systemd/network/10-${COS_PUB_IF}.link"
    permissions: "0644"
    owner: "root:root"
    content: |
      [Match]
      MACAddress=${COS_PUB_IF_MAC}
      [Link]
      Name=${COS_PUB_IF}

  - path: "/etc/systemd/network/10-${COS_MGMT_IF}.link"
    permissions: "0644"
    owner: "root:root"
    content: |
      [Match]
      MACAddress=${COS_MGMT_IF_MAC}
      [Link]
      Name=${COS_MGMT_IF}

  - path: "/etc/systemd/network/10-${COS_DATA_IF}.link"
    permissions: "0644"
    owner: "root:root"
    content: |
      [Match]
      MACAddress=${COS_DATA_IF_MAC}
      [Link]
      Name=${COS_DATA_IF}

  - path: "/etc/systemd/network/10-${COS_DMZ_IF}.link"
    permissions: "0644"
    owner: "root:root"
    content: |
      [Match]
      MACAddress=${COS_DMZ_IF_MAC}
      [Link]
      Name=${COS_DMZ_IF}

  - path: "/etc/systemd/network/20-${COS_MGMT_IF}.network"
    permissions: "0644"
    owner: "root:root"
    content: |
      [Match]
      Name=${COS_MGMT_IF}
      [Network]
      Address=${COS_MGMT_IF_IP}

  - path: "/etc/systemd/network/20-${COS_DATA_IF}.network"
    permissions: "0644"
    owner: "root:root"
    content: |
      [Match]
      Name=${COS_DATA_IF}
      [Network]
      Address=${COS_DATA_IF_IP}

  - path: "/etc/systemd/network/20-${COS_DMZ_IF}.network"
    permissions: "0644"
    owner: "root:root"
    content: |
      [Match]
      Name=${COS_DMZ_IF}
      [Network]
      Address=${COS_DMZ_IF_IP}

  - path: "/var/lib/iptables/rules-save"
    permissions: "0644"
    owner: "root:root"
    content: |
      *filter
      :INPUT DROP [0:0]
      :FORWARD ACCEPT [0:0]
      :OUTPUT ACCEPT [0:0]

      # allow to loopback
      -A INPUT -i lo -j ACCEPT

      # allow established
      -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

      # allow icmp
      -A INPUT -p icmp --icmp-type any -j ACCEPT

      # allow ssh
      -A INPUT -p tcp --dport 22 -j ACCEPT

      # allow dhcp from internal interfaces
      -A INPUT -i ${COS_MGMT_IF} -p udp --dport 67:68 -j ACCEPT
      -A INPUT -i ${COS_DATA_IF} -p udp --dport 67:68 -j ACCEPT
      -A INPUT -i ${COS_DMZ_IF} -p udp --dport 67:68 -j ACCEPT

      # allow dns from internal networks
      -A INPUT -s ${COS_LAB_NET} -p udp --dport 53 -j ACCEPT

      # allow ntp from internal networks
      -A INPUT -s ${COS_LAB_NET} -p udp --dport 123 -j ACCEPT

      COMMIT

      *nat
      -A POSTROUTING -o ${COS_PUB_IF} -s ${COS_LAB_NET} -j MASQUERADE

      COMMIT

  - path: "/var/lib/ip6tables/rules-save"
    permissions: "0644"
    owner: "root:root"
    content: |
      *filter
      :INPUT DROP [0:0]
      :FORWARD DROP [0:0]
      :OUTPUT ACCEPT [0:0]
      COMMIT

coreos:
  update:
    # I don't want my nat gateway rebooting automatically
    reboot-strategy: "off"
  units:
    - name: iptables-restore.service
      enable: true
    - name: ip6tables-restore.service
      enable: true
    - name: ntpd.service
      enable: true
      command: start
    - name: systemd-timesyncd.service
      enable: true

EOF

Create an ISO image from our config drive directory structure.

mkisofs -R -V config-2 -o configdrive.iso ${COS_CONFIG_DRIVE_PATH}
rm -r ${COS_CONFIG_DRIVE_PATH}

Copy the file “configdrive.iso” to the datastore of the esx machine hosting the CoreOS machine (eg. /vmfs/volumes/datastore1).

Boot the machine

For the purposes of my lab I generally install from ISO. To do so, download the latest CoreOS image and transfer it to the datastore of the esx machine hosting the CoreOS machine (eg. /vmfs/volumes/datastore1).

Next, create a VM with (roughly) the following specs:

  • 2G memory
  • 1 disk (size based on your planned usage)
  • 4 nics (e1000) attached to networks public,mgmt,data,dmz
  • cdrom drive (with CoreOS image attached)
  • cdrom drive (with configdrive.iso attached)

Boot the VM from the CoreOS ISO image.

Once the machine boots you may ssh to it using the auth key provided as part of the cloud-config. Upon login you will notice that the config files specified as part of the cloud-config are present; however, they are not applied. This is because CoreOS creates these files after systemd has already registered the network interfaces and started iptables. This is not a big deal since the very next step is to install to disk and reboot.

sudo coreos-install -d /dev/sda -c /media/configdrive/openstack/latest/user_data

The above command will download and install CoreOS using the cloud-config created as part of our config drive. Once it completes, reboot the system. Upon reboot CoreOS will configure itself using the cloud-config file. As before, the config files are created but not applied; rebooting a second time will apply them.

After a second reboot you will notice that your network interfaces are named according to the systemd.link rules that we defined and that iptables rules are applied. CoreOS seems to have ip forwarding enabled by default, but it is good to verify that the following command returns a 1.

cat /proc/sys/net/ipv4/ip_forward
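If it returns 0 instead, forwarding can be enabled on the spot and persisted with a sysctl.d drop-in (the file name below is arbitrary):

echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/10-ip-forward.conf
sysctl --system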

Some additional rules

I tend to run an rdp machine within my lab and prefer to pipe rdp through directly to that host. In order to enable this we’ll need to set up a static nat rule which, in my case, will forward rdp to 10.127.0.2. Edit the file /var/lib/iptables/rules-save and make the nat section look as follows:

*nat
# rdp
-A PREROUTING -i pub0 -p tcp --dport 3389 -j DNAT --to 10.127.0.2

-A POSTROUTING -o pub0 -s 10.127.0.0/22 -j MASQUERADE

You may now apply this change by running

iptables-restore /var/lib/iptables/rules-save

You should now have a fully functional nat gateway. In a later post I’ll cover adding dhcp and tftp services to this machine.