Two years of Kubernetes on AWS

This is not the usual post about the experiences and discoveries of two years spent bringing Kubernetes to production on AWS. Instead, I wrote it to offer a look back at what it meant to run Kubernetes on AWS two years ago: I'll first describe some key facts from 2016 and then look at how things have evolved since, hoping this gives an idea of how much has changed and how we can keep changing things for the better. Last but not least, I will mention some interesting topics the community is focusing on in 2018 and what the community needs, all from my personal point of view. This is a write-up of a Meetup talk; you can find the slides here.

October 2016

A premise

In October 2016, Kubernetes has just celebrated its first birthday (July 2016) and is getting more and more popular thanks to a number of factors, including the rising adoption of containers and Google's incredible evangelization effort, mostly led by Kelsey Hightower.

In 2016, I am reasonably familiar with Kubernetes: I've installed it on GCE and used GKE extensively, but never really used it on AWS. Like many others, the company I'm working for mostly uses AWS as its cloud provider, and migrating to Google is simply not an option. For this reason, I start my journey by setting up my very first Kubernetes cluster on AWS.

The missing deployment architecture

A lot of the discussion around deploying Kubernetes on AWS revolved around the usual topics: do we deploy a cluster per availability zone, or should the cluster span multiple availability zones? Do we create multi-region clusters? Do we run etcd on the masters or outside of them? And do we use a single master containing the entire control plane, or adopt a highly available (HA) multi-master setup, which Kubernetes already supported?

The answers to those questions were nowhere to be found and varied from person to person and company to company. Let's go through them step by step.

To multi-AZ or not to multi-AZ?

As already mentioned, one of the common questions was whether the cluster should span multiple availability zones or not. Back then, most Kubernetes users (or rather, operators) reported deploying Kubernetes with autoscaling groups (at least for the worker nodes), a design that naturally fits AWS.

With autoscaling groups, the effort of going from a single availability zone to multiple availability zones was minimal, and the increased availability attracted many people. A multi-AZ setup introduced some challenges, though.

For example, EBS volumes in AWS are bound to a single availability zone. This implies that a pod requiring a volume can stay in Pending state forever if the cluster has no node with capacity in that volume's zone. This is a common mistake I've seen many times from people new to Kubernetes: someone creates a very small cluster with 2 nodes spanning 3 AZs and enables autoscaling (and thus scaling up and down over time). Then they run a stateful workload that never reaches the Running state because there is no node in the availability zone that satisfies the volume constraint… and that isn't easy to debug if you are not familiar with the subject and with how Kubernetes scheduling works.
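
To make this more concrete, here is a rough sketch of how such a debugging session might look (all resource names are made up; the zone label is the one Kubernetes used at the time):

kubectl get pod my-db-0 -o wide        # stuck in Pending, no node assigned
kubectl describe pod my-db-0           # the events usually point at a volume/zone scheduling conflict
kubectl get pv -o wide                 # the bound volume carries the zone it was created in
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone   # compare with the zones that currently have nodes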

Another issue came with the cluster autoscaler: the implementation available in 2016 was not aware of zones and was thus inefficient at scaling nodes that spanned multiple AZs. Still, most people decided to stick with multiple AZs in one autoscaling group for the worker nodes.

What about multi region?

One additional option would be to deploy clusters across regions. To my knowledge, this was not attempted at all with Kubernetes in 2016, or at least I don't know anyone who actually tried it back then.

In a multi-region setup, we have to keep in mind that the kubelet needs to talk to the API server, and the latencies involved in multi-region deployments would probably be a problem, or at least something to handle and test carefully. Even more importantly, the benefit would probably not be worth the pain of setting it up, and the provisioning tools (which I'll cover later) did not support it. The Kubernetes community instead made a bet on Federation as a way to orchestrate workloads across multiple federated clusters, which in a way addresses multi-region deployments.

Kubernetes cluster Federation

Cluster federation is based on a control plane that allows operators to orchestrate workloads across different clusters. It was promoted as closer to the way Borg works inside Google: Kubernetes clusters would map roughly to Borg cells, which can be deployed, for example, to a single rack in a datacenter, while a central control plane handles multi-cell deployments and so on.

The idea behind federation encouraged running small clusters and federating them with a control plane to achieve higher availability and to make it easier to orchestrate the Kubernetes clusters themselves. At this stage, for a lot of people federation looked like "something to look forward to next year"; read on to find out what happened ;-) .

Single or multiple master nodes?

Like pretty much everything in software and technology in general, the discussion around having a single master or a multi-master setup was highly opinionated: some people said that multi-master was a no-brainer and not something worth giving up, while others were just fine with a single master.

Kube-AWS (a provisioning tool we will cover later) already supported multi-master deployments at this stage, and Kops (another provisioning tool) supported it too, but defaulted to a single master.

This topic, which seems trivial, requires a lot of thinking and is anything but a detail. There are in fact several tradeoffs to consider:

  • a single master offers lower availability than a multi-master setup.
  • a multi-master setup allows for continuous cluster updates, as we can upgrade one master at a time while keeping the cluster operational.
  • a multi-master setup is more expensive than a single-master one (especially if we want to deploy etcd outside of the cluster).

It’s worth noting that GKE, which at this point was the most used (and probably the only) managed Kubernetes solution, was using a single master only. We can imagine that this decision was made partly to avoid setting expectations (or SLOs) that are too high while learning to operate a new system, something definitely worth keeping in mind when starting with Kubernetes.

etcd

This wouldn’t be a blog post about Kubernetes without talking about etcd. etcd is the main datastore behind Kubernetes, and it was certainly not ready for prime time in 2016. It had bugs and performance issues, and it sometimes required manual compactions and other operations to keep it alive under user load, especially when running a high number of pods and nodes.

Clearly, on very small clusters the issues were not visible, but they were still relevant things to take into account, and they made the experience more difficult for users who wanted to run real production workloads at scale.

And of course, Docker

In 2016, there were still a bunch of Docker bugs that were easy to hit with Kubernetes. Running a single EC2 instance with Docker on top (where Docker is essentially just used as a packaging mechanism) would not trigger such behaviours, but with a container orchestration system it was pretty easy to reach a situation where Docker would simply hang (i.e. docker ps would block and never return).

This was reported a bit all over the place, and I used to have a “golden version” of Docker that I compiled manually on my machine from an RC release and that would run perfectly… while other versions wouldn’t.

Even GKE, which at this time was the reference for everyone running Kubernetes on any cloud, used to run a script on every node called docker_monitoring, whose content was the following:

# We simply kill the process when there is a failure. Another systemd service will
# automatically restart the process.
function docker_monitoring {
  while [ 1 ]; do
    if ! timeout 10 docker ps > /dev/null; then
      echo "Docker daemon failed!"
      pkill docker
      # Wait for a while, as we don't want to kill it again before it is really up.
      sleep 30
    else
      sleep "${SLEEP_SECONDS}"
    fi
  done
}

That’s not a joke.

Provisioning tools

There are many provisioning tools out there today, but in 2016 the space was not really crowded (yet). To deploy a cluster on AWS, there were essentially only Kops (v1.4) and Kube-AWS (v0.8). At the same time, plenty of people were starting their own projects, especially given that the maturity of those two was not spectacular.

At the very same time, the Kubernetes community was starting an effort to unify the way a node is configured to run Kubernetes, an idea partly borrowed from Kops, which resulted in the work behind kubeadm.

Kops v1.4

In 2016, Kops already worked pretty well. It was indeed the easiest way to set up a cluster on AWS. The work of people like Justin Santa Barbara and Kris Nova was really good and already contained lots of ideas that inspired both Kubernetes itself and other cluster provisioners.

The downside of Kops back then was that it was a bit difficult to understand. Kops contained a lot of code, tried to work across different clouds and aimed to be a somewhat batteries-included solution. While that can be a good thing, it is also the reason why several people started their own projects, thinking they could make something simpler. That was sometimes true, but it arguably benefited the community less: it brought a bit of confusion for newcomers about what to use, a lot of duplication (which is bad) and even more opinions (which is good).

Kube-AWS

Kube-AWS took a different approach: it targeted only AWS (hence the name), supported only CoreOS as the operating system and generally tried to keep the amount of code small.

The community behind the project was a bit smaller compared to Kops (neither of the two was huge as of 2016) and the project seemed to be maintained by a smaller set of people. While it gained some good traction, it stayed behind Kops in terms of popularity.

More questions

While the topics above are the main ones around deploying Kubernetes on AWS, they are not everything that needs to be solved to have a production-ready cluster. To have something that can really host production workloads, teams need to figure out topics such as:

  • Monitoring
  • Logging
  • Autoscaling (nodes and pods)
  • Security best practices
  • Authn, Authz
  • Overlay network configuration
  • Load balancing / Ingress traffic (ELB, ELBv2)
  • How to do cluster upgrades

That’s really a lot, and there was no easy answer for some of those topics on AWS. CloudWatch was no replacement for something like Stackdriver on Google Cloud, there was no native integration with AWS IAM for authentication and authorization, overlay networks were all pretty young and buggy, and there was no support for ELBv2, a.k.a. the Application Load Balancer (ALB).

This meant that any team that wanted to start using Kubernetes on AWS had a lot of topics to figure out, and only the most basic things worked out of the box.

Back to the future!

Let’s jump to today. It’s October 2018 and the situation is pretty different. It’s easy to say that the core of Kubernetes is now quite stable in terms of basic functionality, while new features are still being added at an incredible pace.

As of today there are even more provisioning tools out there, plus a (partially) managed solution from AWS (EKS) which simplifies a lot of the operations around Kubernetes.

The architecture itself is (partially) settled, in the sense that most people are going with similar approaches without having to debate all the questions all over again, and, finally, the Kubernetes community is moving up the stack, trying to solve more problems than just how to get a cluster up and running.

Core (kind of) stable

Deployments, ConfigMaps, DaemonSets and so on are here to stay. Those objects are not seeing lots of change, and the functionality around them is solid enough. The code that deals with them feels robust, and most of the bugs have been fixed. Even so, there are still lots of quirks and weird bugs in the system.

Some of them are not Kubernetes-specific, but become more evident with Kubernetes: the intermittent 5-second DNS delays, the effect of CPU limits on application latency, or issues still found in CNI implementations like Weave that do not handle “simple” cases (at least on AWS) where nodes come and go frequently, which ultimately results in the cluster networking not working.
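
As an example of how people work around one of these quirks, a common mitigation for the intermittent 5-second DNS delays is to adjust the resolver options through the pod's dnsConfig. This is just a sketch with made-up names, and the single-request-reopen option only helps with glibc-based images:

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      dnsConfig:
        options:
          - name: single-request-reopen   # avoid the parallel A/AAAA lookup race
      containers:
        - name: my-app
          image: my-registry/my-app:1.0
EOF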

In terms of new features, Kubernetes sees lots of them at every release; it is enough to have a look at the release notes of the latest version (1.12) to see that.

This clearly also means a bit of instability and the feeling that we are dealing with a system that is never done.

Staying up to date with Kubernetes feels more difficult today than it used to be.

The project itself is moving really fast, and plenty of companies around the world come up with new ideas every day that complement the offering. But what if you just want to stay up to date with the software you run in your cluster? Well, the best approach seems to be continuous updates: the project is moving so fast that it is super easy to run into an API that changed, and therefore be unable to use some software, or to hit bugs that are already fixed in later versions (even if critical fixes are usually back-ported).

Another good idea is to use fully managed solutions as much as we can, as those ease cluster upgrades. An alternative is to build a bit of custom automation around open source tools, but this of course requires additional effort, as no solution out there perfectly fits all use cases.

On a similar note, it’s worth paying attention to the APIs we use: alpha resources are expected to change and we should treat them carefully. Being on the bleeding edge carries the same risks today as in 2016, and this aspect is often forgotten in a community that talks a lot about innovation.

Provisioning tools

EKS

Clearly the big change of 2018 is that AWS finally released EKS in June after announcing the preview last year at re:Invent in Las Vegas (shameless plug: if you are looking for a nice video on Kubernetes on AWS, here is my presentation at last year’s re:Invent).

EKS comes with an HA Kubernetes control plane for a reasonably cheap price ($0.20 per hour at the time of writing). The offering provides a vanilla version of Kubernetes, not a fork, which is really a great thing as it allows using lots of the other tools in the open source world that are supposed to work with Kubernetes. EKS has some things that are not ideal, though: it is currently still stuck at version 1.10 while Kubernetes is already at 1.12 (to be fair, GKE has also not completed the rollout of 1.11, so it’s partially still on 1.10).

The update policy is also a bit odd: the control plane can get updated without notice, potentially causing unplanned incompatibilities with the nodes of the cluster. It’s worth remembering that EKS is only a partially managed solution, as the worker nodes have to be self-managed and are not managed by AWS, similarly to ECS and differently from GKE on Google Cloud.

Possibly, a “Fargate” flavour of EKS will allow running containers via the AWS API without having to deal with nodes at all, but this is still not even in preview.

Additionally, EKS still exposes an API server endpoint with self-signed SSL certificates, which is unfortunate given that AWS has long provided a certificate management service and that even Kops now supports real certs.

Kops (v1.10)

Kops has matured a lot over the past years and has been adopted by many companies. The project currently has 461 contributors on GitHub, which is quite a massive number for a provisioning project.

The project still has its old downsides from 2016: it contains lots of code and still ships with a single master by default. It also looks like the project is lagging a bit in terms of Kubernetes releases (it currently ships 1.10 while 1.12 is available), even though the project’s goal is to never be more than one release behind Kubernetes. It also comes with a very opinionated view of the world: it installs its own Docker version on the nodes, etcd mostly runs on the masters even though there is an option to run it outside, and so on.
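
For reference, overriding the single-master default is mostly a matter of passing the right flags; a rough sketch (state store, cluster name and zones are placeholders):

export KOPS_STATE_STORE=s3://my-kops-state-store
kops create cluster \
  --name=cluster.example.com \
  --zones=eu-west-1a,eu-west-1b,eu-west-1c \
  --master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
  --node-count=3 \
  --yes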

What is interesting to notice, though, is that Kops quickly became a good playground for experiments: it contains work on etcd-manager, a tool for managing and backing up etcd; the Clusterbundle, a project from Google to package cluster components; and an experimental upgrade functionality that is aware of stateful workloads, which unfortunately was never merged.

More provisioners

In 2018 there are, as I said, more and more provisioners. While I don’t want to spend too much time on them, most of the new ones are based on kubeadm. The Heptio Quickstart is probably the easiest way to get a cluster up and running on AWS, interestingly even easier than creating an EKS cluster, even today.

Among the other provisioners, it’s worth noting Kubicorn, which also used to serve as a playground for experiments for the Cluster API.

Cluster what?

The Cluster API is a community-driven effort to have an API that fully describes all the resources of a cluster, including the cluster itself, machines, etc. It derives from ideas that can already be found in Kops: kubeadm as a replacement for the node agent, an API server component that was also in development inside Kops, and an object definition that fully describes the cluster (node groups, etc.). It’s a nice approach towards a more Kubernetes-native way of managing cloud resources, but it is indeed a big rewrite that is still at an early stage.

As we know, rewriting code is not always a good idea, and while I don’t want to judge the project (again, it is at a very early stage), we have to expect mistakes and bugs. It’s nothing worth considering for production or even testing setups for the moment, but it’s definitely worth keeping an eye on for the future.

Federation

Long story short: federation never delivered on its big promise. As of October 2018, there is “no clear path to evolve the API to GA”, and there is an effort to implement a dedicated federation API as part of the Kubernetes API that is still at an early stage and still mostly unused as far as I can tell.

The work on Federation v2 is ongoing though (more info here), and while it introduces some additional complexity (at least from what I can understand), it will develop further in 2018 and 2019 and will show us where the idea is going and whether it will be useful for the community.

Where are we going

Looking at the Kubernetes community, it’s interesting to see what its members are focusing on in 2018. Most of the attention is no longer on how to provision clusters but on other topics, and definitely one of the hot ones is the service mesh.

Without going into much detail, it’s interesting to see how the topic is dominating conferences like KubeCon and how different vendors are fighting to push their solutions. Currently the most talked-about ones seem to be Istio and Linkerd, but given how fast this area changes, this could be outdated quite soon. As cloud aficionados, we should keep an eye on how they evolve in the near future, but they don’t seem to me to be fully ready for production usage yet.

Both Istio and Linkerd have a similar architecture: a control plane in addition to the Kubernetes one, plus injected proxies, to achieve smart routing, better security and better out-of-the-box observability. That of course comes with additional complexity: another control plane, extra proxies in front of your applications, and many configuration files and Custom Resource Definitions. These tools will have to prove over time that they really add enough value to justify the complexity they bring.
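
To give an idea of what “injected proxies” means in practice, with Istio the typical flow is to label a namespace for automatic sidecar injection, after which every newly created pod gets an extra proxy container next to the application (a sketch; the namespace is arbitrary):

kubectl label namespace default istio-injection=enabled
kubectl get pods -n default    # after a redeploy, each pod reports one more container (the injected proxy)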

Another interesting topic is the focus on making Kubernetes a lot simpler for developers. Kubernetes is in fact often criticized for being too complex for developers (and I totally agree with that), and some efforts are trying to build higher-level abstractions that don’t necessarily require dealing with tons of low-level YAMLs.

One of them is Knative from Google, which was recently announced and currently requires Istio. The idea is to create an abstraction layer that lets developers focus on services rather than on all the details of how they run on a container orchestration system. In this first iteration, the solution seems a bit complex and has a lot of moving parts, which will probably make sense when operated by a cloud provider but could turn out to be too complex for other users.

There are a few other examples of higher-level abstractions out there, like the StackSet controller from Zalando, which simplifies the user experience quite a bit while providing traffic-switching functionality. It still exposes the same PodSpec and Ingress though, and for that reason it is not a full attempt at building a very high-level abstraction, but rather at simplifying the interaction and composability of the well-known Services, Deployments and Ingresses.

What is needed

As we’ve seen, Kubernetes has improved on a lot of fronts, but it is still neither done nor perfect. There are still plenty of bugs to fix and things to improve. Contributing upstream is not a perfect experience either: sometimes it takes ages to get things merged, even when everything is ready.

That said, as members of the community, we should contribute to the project as much as we can (and want) to make it even better than it is today.

There is of course something else that is important: we need to share our horror stories. Kubernetes is often criticized as “hipster technology”, and figuring out all the details of running it in production is a fundamental step to increase its adoption while learning from the process. It’s essential to understand how to operate Kubernetes and build systems on top of it, and to share those learnings with the rest of the community. Here are a few presentations I know of where this was done:

Conclusion

That’s it for this view on the last 2 years of Kubernetes on AWS. I hope you enjoyed this post and I hope that the next year will bring a lot of new topics and a lot of stability to our systems!

Things I've been doing recently

This blog post is not the usual write-up about how I got into a new job, about the things that have been awesome or that suck. It’s not about love for my employer or previous ones, and there is definitely no hate at all. So what’s left? This one is really about a bunch of things that I’ve been doing recently, in a somewhat different fashion:

  • I switched my default working mode to “pair by default”. I needed this to get used to working with new colleagues, get to know them better and build a connection with the remote members of the team. It turned out to be much more than that: it increases my concentration, improves the quality of the things I do and reduces risk. Everybody says that, but very few use this as their default way of working, and I definitely wasn’t doing it in my previous job, not because of any blocker, just because that was the way we worked (and it was fine). Well, I am enjoying this so much that I’m surprised myself.

  • I take time in isolation every day. Sounds like a contradiction, right? You cannot pair all the time: it is tiring and I need some time for myself as well. To solve that, every single day I look for some time to work in complete isolation, not long, maybe one hour. We have some nice little rooms in our office that are definitely not good if you suffer from claustrophobia, but they work beautifully to regain full focus and crush a single specific task alone. It feels good in there.

  • I learn new things. We have mandatory (!) personal development days, which means time dedicated to learning new things during working hours. During those days, I only do things I enjoy. If I haven’t written much code during the week (it happens, YAMLs are my life), I code. I can watch a long video or learn a new technology. What’s important, though, is that it has to be something I enjoy, because I feel I learn much more when things are fun.

  • I draw a lot of technical stuff. I do that because I realized that if I don’t draw things, I have trouble visualizing problems and understanding how to solve them. Sometimes I even draw YAMLs! Where I work right now I’m not surrounded by whiteboards as I was for a long time, and I really miss that… but guess what? You can just take some paper and make nice drawings on it. I’ve been using this technique for slides lately: instead of spending hours with digital diagramming tools, I just grab a bunch of sharpies and some paper and start drawing, then I snap pictures… and the slides are done!

  • I don’t allow myself to keep working hard when I’m tired. Would you drink alcohol and drive at the same time? I bet you wouldn’t, because it is irresponsible. The same applies to doing most work tasks while tired. Now, if you read my tweets you know I’ve been through a serious emotional storm lately due to the tragic loss of a dear friend. Well, sometimes my mood is not perfect, or I feel I get tired easily due to some stress that is sticking around… guess what I do when this happens? I go home and relax! Of course this doesn’t mean I only work a few hours, but I try to take a reasonable approach and not push myself too far over my limits.

  • I trust (and I feel trusted). I almost never need to know what my colleagues are doing, and they don’t ask too much either. This doesn’t mean we don’t care; it means letting go of control. We can work asynchronously. I can write this blog post. They can take their time to do things. None of us needs to know everything; we only make sure we enjoy working together and that we are more or less on track with what matters, which is to keep doing our best for the company while getting better at it.

  • I take time to enjoy life. Taking a long lunch break to meet a friend from time to time isn’t a bad idea; it’s actually really fun! I’ve been meeting friends and building better relationships, and if I can do this during the day, well, that’s great.

End of my random selection of things. It was not an exhaustive list and I don’t know if there is any benefit in it, but I hope nonetheless that you will find it useful!

EKS "Review"

NOTE: these are just early impressions of a very promising product that I am very happy to see available, and I am sure it will get better over time.

Today I spent some time playing with EKS (Elastic Container Service for Kubernetes), the newly announced Kubernetes service from AWS, and I decided to write down some notes, not only for my team but for anyone interested in knowing more about EKS.

I started by playing with the official tutorial that can be found here. Right from the beginning I was pretty surprised by the lack of automation in the whole process: the UI and CLI only help you get a very basic control plane up and running, nothing more, leaving a lot of steps to be executed manually.

While I was waiting for the cluster to be ready, I didn’t get much feedback on what was going on. In general, I’d love to see AWS take a shot at a more user-friendly experience: for example, putting a link to the documentation or tutorial directly in the UI, so that users can jump to interesting material while the cluster is still being created. After all, EKS was not only built for people who already know Kubernetes and how to operate it, but also for the other AWS users out there who are interested in containers and container orchestration and don’t want to operate a complex system like Kubernetes.

The getting started guide is unfortunately not super friendly from that point of view, but there are many other tutorials on the web that can get you up and running with the basics of Kubernetes.

Something else that surprised me is that there is no button to download the kubeconfig. This is the configuration file that kubectl (which I pronounce koob-cee-tee-el because I can’t unlearn it) needs to connect to the cluster, and it contains the cluster name and other information.

This file would be pretty easy to generate from the information available in the UI, and it would be trivial to make it available for download so that we could get started easily. Similarly, getting started is not exactly easy given that we have to download kubectl and Heptio’s authenticator. As a comparison, GKE on Google Cloud Platform has a way to easily get both kubectl and the relevant configuration via the official gcloud tool, which is indeed very handy.
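
For reference, the kubeconfig you end up writing by hand looks roughly like this (endpoint, certificate data and cluster name are placeholders taken from the console or the AWS CLI, and the command is Heptio's authenticator under whatever name it is installed as):

cat <<'EOF' > eks-kubeconfig
apiVersion: v1
kind: Config
clusters:
  - name: my-eks-cluster
    cluster:
      server: https://XXXXXXXX.yl4.eu-west-1.eks.amazonaws.com
      certificate-authority-data: <base64-encoded CA>
contexts:
  - name: my-eks-cluster
    context:
      cluster: my-eks-cluster
      user: aws
current-context: my-eks-cluster
users:
  - name: aws
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1alpha1
        command: heptio-authenticator-aws   # Heptio's authenticator binary
        args: ["token", "-i", "my-eks-cluster"]
EOF
export KUBECONFIG=$PWD/eks-kubeconfig
kubectl get nodes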

In my case, the cluster creation took quite a while, around 15 minutes. That’s much more than I was used to on AWS with tools like Kops, which takes around 5 minutes to get a cluster with very similar characteristics up and running. It’s true, though, that this is not an operation we will be doing very often, so it doesn’t really make a huge difference.

After the cluster creation, what we really get is a highly available control plane that we don’t see and can’t modify. It is hosted somewhere in the AWS infrastructure, but we have no access to it. What’s left to finish the cluster setup is to create worker nodes and attach them to the cluster to actually run some applications. For that, AWS provides a CloudFormation template that allows spinning up nodes easily. The output of the CloudFormation template is an AWS IAM role that needs to be put into a ConfigMap. Why do we need that?

Well, the nodes need to somehow join the cluster, and EKS uses the IAM identity and Heptio’s AWS authenticator to let them do that. That’s very AWS-native and, in my opinion, quite nice compared to other setups that use shared tokens or plain certificates.
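
Concretely, the result is roughly the following ConfigMap, mapping the instance role from the CloudFormation output to the node bootstrap groups (the role ARN below is a placeholder):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/my-eks-worker-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
EOF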

After following all those steps, we finally have a working cluster. Was it painful? A bit, especially given that there are plenty of tools out there that have made the experience of creating Kubernetes clusters really easy.

In that regard, I also tried eksctl from the folks at WeaveWorks, and the whole experience was so much better!

You can easily create a cluster with:

eksctl create cluster

While the whole process is still not super fast, you get a cluster from one simple command, worker nodes included. In my case, the command failed to set up the kubeconfig because it tried to use my local kubectl, which reports a weird version (it’s a version I compiled by hand, but that’s a story for another time). The rest, though, worked pretty smoothly.
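
If the defaults don't fit, eksctl also accepts flags for the basics, something along these lines (names, region and sizes are just examples, and the exact flags may differ between versions):

eksctl create cluster \
  --name my-cluster \
  --region eu-west-1 \
  --nodes 3 \
  --node-type m5.large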

The setup we get once the EKS cluster is running is really simple. Here is the list of pods that were running in my cluster (all in the kube-system namespace):

NAME                       READY     STATUS    RESTARTS   AGE
aws-node-6vrp2             1/1       Running   1          6h
aws-node-k2dc2             1/1       Running   1          6h
kube-dns-7cc87d595-cj9js   3/3       Running   0          6h
kube-proxy-8nccp           1/1       Running   0          6h
kube-proxy-lshj8           1/1       Running   0          6h

This is very minimal and has very little overhead, but it is also not a really complete setup. It lacks support for Ingress and has nothing in place to set up IAM roles for pods. That is a bit surprising, as AWS has talked multiple times about AWS IAM in Kubernetes, and I assumed they would start with Kube2IAM and then improve the setup with something different.

The decision on AWS IAM was probably to let users roll their own solution from the ones available as open source projects, while leaving the door open to developing a better solution together with the community. AWS is already working on it, and some details of a possible future implementation are available in this Google document, which was discussed in the last sig-aws meeting.
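
To give an idea of what “rolling your own” looks like, with Kube2IAM (once its DaemonSet is deployed) pods typically declare the role they want to assume through an annotation, roughly like this (role name and image are placeholders):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: s3-reader
  annotations:
    iam.amazonaws.com/role: my-app-s3-read-only   # role the pod may assume via kube2iam
spec:
  containers:
    - name: app
      image: my-registry/my-app:1.0
EOF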

Apart from some rough edges, I’m happy to finally be able to use EKS, and I’m looking forward to seeing it available in other regions. From my experience of using Kubernetes since version 1.0, operating the control plane is not trivial, especially at scale and in an HA setup; that’s why such an offering from AWS was so requested by AWS customers, and hopefully we will move towards a more managed solution in which managing the cluster is just a detail we won’t have to care about.

Junior Engineers

Last night I went to see Erlend Øye live in Berlin. He put on a great show, during which he played with some young musicians from the south of Italy, where he lives (Syracuse). Some of them were really new to playing live: Erlend told us that one of them had never played live before the start of the tour, and another one literally said “I wrote this song when I decided to start writing songs… last December”. Erlend left the stage, sat on the floor and listened to this not-so-tall guy play a very beautiful song of his.

When he finished, he got a long applause from the audience, which he truly deserved. Erlend came back on stage and said “a star is born” and “Prince was 1.5 meters tall”, meaning that being short, Italian or having started writing songs only six months ago is not an impediment to doing great stuff, and it shouldn’t be.

What does this have to do with the title of this post? Well, in case it’s not clear, this has many analogies with the work environment. Think about giving opportunities to the junior members of your team, about hiring juniors to bring a breath of fresh air to your songs (ehm, codebase)… doesn’t it all make sense?

And if you are a senior engineer, you could think about mentoring those juniors, giving them the opportunity to go on stage and demonstrate their skills (and of course allowing for mistakes), like Erlend did. And don’t be the only one in the spotlight: let them take ownership of some topics if they feel ready for it.

And go see Erlend Øye, it’s worth it.

Marco

Marco was a piece of me. Marco introduced me to Linux, before it was really cool, when Ubuntu didn’t exist yet. I still remember taking my first looks at Red Hat with him. If it weren’t for him, I would probably still be using Windows 98 or something like that.

With Marco I shared a home and some unique moments when we both lived in Milan. With Marco I played at being the teenager, the grown-up, the nerd, the brewer. I shared passions and some of the most beautiful moments of my life. We even bought two identical t-shirts with the famous Blue Screen of Death!

Of Marco I will always keep his CDs labelled “Mack”, catching up on Soulseek, the days spent on IRC, the Solero Smoover, the Pazzeria, Via Koch and the memory of so many unforgettable moments. With Marco I also shared some truly stupid moments, as in every good friendship worthy of the name. Stupid like when I brought him back to his parents rather drunk, or when he brought me back home rather drunk. We cared about each other and helped each other.

Last night Marco passed away, and with him goes a piece of me. I would never have imagined what the last thing we said to each other would be. I sent him a photo of an espresso cup with “Danesi” written on it, found in Berlin. We often joked about “Danesi”, which he (we) used as a synonym for coffee. Yesterday he replied with his usual “Buahahahahaha”. Apparently happy, positive.

Three hours later he would be gone, but I will remember him like that forever: happy, laughing.

It’s almost paradoxical what we said to each other a few days ago, talking about programming languages, about Go and Rust. Marco told me: “I would need infinite time, Rafè.”

Now you have it, Marco. I will never forget you.