Core Competencies for DevOps Engineers


DevOps jobs are in demand and probably will be for some time. I've done a lot of interviews for this position, and while candidates might have competencies in a few areas, they often miss many other key ones.

"Devops Engineer" is a bit of a misnomer - DevOps is a mindset that an engineering team has, and every team member should be a part of it. DevOps Engineers are expected to facilitate everything a team needs to do DevOps, but they are often expected to in isolation, which can be a frustrating experience.

Skills

0. Cloud Providers, Linux, and Git

This is the bare minimum for being a DevOps engineer, and every other skill depends on it. You should know AWS (at least the EC2 and networking side), GCP, or Azure, though AWS is the industry standard at the moment. Whatever your cloud of choice is, you should be able to do the following (a short CLI sketch follows the list):

  • Be able to deploy an environment from scratch, including networking, NATs, static IPs, etc.
  • Understand how their authentication (IAM) system works
  • Understand volumes and their different types
  • Be able to use their storage service (S3, GCS), and not get this confused with volumes
  • Understand their cloud offerings for managed services such as CloudSQL, RDS, EKS, and GKE. A managed service isn't always the best option, but it should always be on the table.
  • Be able to deploy load balancers with TLS termination
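
Here is that sketch: a few read-only spot checks along these lines, assuming the AWS CLI is installed and credentials are configured (the region is a placeholder):

  aws sts get-caller-identity                            # which IAM identity am I acting as?
  aws ec2 describe-vpcs --region us-east-1               # networking: list VPCs
  aws ec2 describe-volumes --region us-east-1            # block storage (EBS volumes)
  aws s3 ls                                              # object storage (S3 buckets), a different thing entirely
  aws elbv2 describe-load-balancers --region us-east-1   # load balancers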

1. Infrastructure-as-Code tools like Terraform/OpenTofu

When you are working with infrastructure, whether in the cloud, on-prem, or hybrid, manually creating servers and configuring networks is a non-starter. Even using CLI tools and scripts to automate this would be considered out of date. Today, Terraform and its fork OpenTofu are absolutely key to managing large infrastructure deployments. They are stateful and idempotent, which means that running the same terraform apply repeatedly leaves your end state exactly the same. Terraform also shows you what it's about to do before it does it, and lets you review a plan for large changes before applying it.
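
The day-to-day loop looks something like this (a minimal sketch; the state backend and module layout are whatever your project uses):

  terraform init                     # download providers and set up the state backend
  terraform plan -out=plan.tfplan    # show exactly what would change, without changing anything
  terraform apply plan.tfplan        # apply the reviewed plan and nothing else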

I personally like to use Terragrunt on top of Terraform to add a little bit more don't-repeat-yourself to the configuration. It's not strictly necessary, but I've found it helps when modules depend on other modules being set up and provisioned (for example, module 1 might set up Elasticsearch and module 2 might configure Elasticsearch's backups and auth).
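
As a rough sketch of what that looks like on disk with recent Terragrunt versions (module names and paths here are hypothetical):

  # live/
  #   elasticsearch/terragrunt.hcl          # module 1: the Elasticsearch cluster
  #   elasticsearch-config/terragrunt.hcl   # module 2: backups and auth, depends on module 1
  cd live
  terragrunt run-all plan     # plans every module, resolving dependencies between them
  terragrunt run-all apply    # applies them in dependency order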

2. Docker

Docker is not the new hotness; it is an industry standard and has been for at least six years now, depending on who you ask. Isolating dependencies and runtime environments and packaging them into a single Docker image artifact has been absolutely key to deploying consistently to development, staging, and production environments.

You should know how to build a reasonably secure and tidy Docker image, with no stray commands or unnecessary layers. Don't run apps as root, don't bundle unnecessary dev tools into a production image, and definitely don't package credentials into your image.
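
A minimal sketch of what that can look like, using a multi-stage build so the toolchain never reaches the final image (the base images, paths, and app are placeholders; the Dockerfile is written via a shell heredoc just to keep the example self-contained):

  cat > Dockerfile <<'EOF'
  # Build stage: has the compiler and build tools, never shipped
  FROM golang:1.22 AS build
  WORKDIR /src
  COPY . .
  RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

  # Runtime stage: no shell, no package manager, runs as a non-root user
  FROM gcr.io/distroless/static:nonroot
  COPY --from=build /out/app /app
  USER nonroot
  ENTRYPOINT ["/app"]
  EOF
  docker build -t registry.example.com/myapp:v1 .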

3. Container Orchestration (Kubernetes)

Kubernetes is the industry standard for container orchestration. If you only know Docker Swarm or Nomad, take the time to brush up on Kubernetes using GKE or EKS (or locally, MicroK8s or k3s). The basics are setting up a Deployment with a pod or two, but you should also know about:

  • StatefulSets
  • Services
  • Ingresses / Ingress Controllers
  • Persistent Volume Claims

Kubernetes has a bad rap for being overly complicated, but I think it's as complicated as it needs to be: this kind of work is just hard and has a lot of nuance. It's also not as hard as its reputation suggests.
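
To make the basics concrete, here is a minimal Deployment and Service applied straight from the shell (names and the image are placeholders; this assumes kubectl is pointed at a throwaway cluster):

  kubectl apply -f - <<'EOF'
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: hello
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: hello
    template:
      metadata:
        labels:
          app: hello
      spec:
        containers:
          - name: hello
            image: nginx:1.27
            ports:
              - containerPort: 80
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: hello
  spec:
    selector:
      app: hello
    ports:
      - port: 80
        targetPort: 80
  EOF
  kubectl get pods -l app=hello    # confirm the pods came up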

Also take a look at the basics of Helm. Helm templates out Kubernetes resources so you can package your whole app as a Helm chart. As a bonus, Terraform can deploy Helm charts.
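
A typical invocation, assuming a chart directory and per-environment values files (all names here are placeholders):

  helm upgrade --install myapp ./chart \
    --namespace myapp --create-namespace \
    -f values-staging.yaml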

4. Scripting with Bash, Python/Ruby

You're not expected to be a software engineer as a DevOps Engineer, but you should definitely be a competent script writer. Bash (or ash/zsh) is expected, and Python or Ruby is a bonus. As great as Terraform might be, there will always be a need for one-off scripts to accomplish some task. For example, during security audits you'll be asked to do things like export a list of all users in each production database, along with their roles. Not something you want to do manually once a month!
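
For example, a rough sketch of that audit export against Postgres (the hostnames, the auditor user, and the exact role query are assumptions):

  for host in prod-db-1 prod-db-2; do
    echo "== $host =="
    psql -h "$host" -U auditor -d postgres -Atc \
      "SELECT r.rolname,
              ARRAY(SELECT b.rolname
                    FROM pg_auth_members m
                    JOIN pg_roles b ON m.roleid = b.oid
                    WHERE m.member = r.oid) AS member_of
       FROM pg_roles r
       WHERE r.rolcanlogin;"
  done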

5. Monitoring and Alerting

The baseline of monitoring is outage detection, but you also want to keep an eye on key metrics such as disk usage, CPU usage, and so on. The tools of the trade are Grafana and Prometheus, but there are other tools like InfluxDB and even paid services like DataDog or SignalFx. InfluxDB and Prometheus are time-series databases commonly used for alerting (for example, alert if the sum of response codes over 499 is greater than 0 in the last 10 minutes), and Grafana is an amazing UI on top of these databases. Prometheus is the tool of choice for Kubernetes deployments, but whatever gets the job done is an equally good tool to use.
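
That example alert looks roughly like this as a Prometheus rule (the metric and label names depend on what your apps actually export, so treat them as placeholders):

  cat > alerts.yaml <<'EOF'
  groups:
    - name: availability
      rules:
        - alert: ServerErrors
          expr: sum(increase(http_requests_total{status=~"5.."}[10m])) > 0
          labels:
            severity: page
          annotations:
            summary: "5xx responses returned in the last 10 minutes"
  EOF
  promtool check rules alerts.yaml    # validate the rule file before deploying it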

PagerDuty and OpsGenie are popular alerting tools. Usually you configure Prometheus to send alerts to AlertManager which then sends alerts to PagerDuty/OpsGenie, which will give you a message on your phone. It's not fun, but you want to know when your site is having trouble long before your customers do.
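
Wiring that up is mostly configuration. A minimal Alertmanager receiver sketch (the integration key is a placeholder, and OpsGenie has an equivalent opsgenie_configs block):

  cat > alertmanager.yml <<'EOF'
  route:
    receiver: pagerduty-oncall
  receivers:
    - name: pagerduty-oncall
      pagerduty_configs:
        - routing_key: "REPLACE_WITH_PAGERDUTY_INTEGRATION_KEY"
  EOF
  amtool check-config alertmanager.yml    # sanity-check the config before reloading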

6. Configuration Management tools (Ansible, Puppet, Chef)

This one used to matter more; Docker and Kubernetes have reduced the need to provision live, long-running servers. Even so, there is a time and a place for it. Configuration management tools let you install packages and apply configuration across a fleet of servers in an idempotent, reproducible way. There is some overlap with Terraform, but sometimes config management tools are the better choice.

You should know how to:

  • Install a package
  • Copy a file onto a server
  • Ensure a service is running
  • Configure local firewalls with iptables or an equivalent

My personal choice is Ansible, and while I've used Puppet and Chef in the past, their popularity seems to be waning.
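
A minimal Ansible playbook covering the list above might look like this (the host group, package, and file paths are placeholders):

  cat > site.yml <<'EOF'
  - hosts: webservers
    become: true
    tasks:
      - name: Install nginx
        ansible.builtin.package:
          name: nginx
          state: present
      - name: Copy nginx config
        ansible.builtin.copy:
          src: files/nginx.conf
          dest: /etc/nginx/nginx.conf
      - name: Ensure nginx is running
        ansible.builtin.service:
          name: nginx
          state: started
          enabled: true
  EOF
  ansible-playbook -i inventory.ini site.yml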

7. CI/CD

Automatically running tests, building application packages, and deploying them are all key capabilities of present-day engineering teams. Jenkins is the popular choice, but there are also CircleCI, Travis, and my personal favorite, GitLab. GitLab's CI/CD jobs are cleanly defined in code, and it can take care of a lot of things for you, such as artifact caching and Docker registry authentication. It also has direct integration with Kubernetes.

You should be able to:

  • Write CI/CD jobs in code rather than configuring them manually in a UI
  • Run tests automatically
  • Run docker builds and pushes
  • Execute deployments

This should also be implemented in a secure manner - no long-lived super-admin passwords for Jenkins just waiting to be compromised.
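
As a sketch, a GitLab pipeline covering those points might look like the following (the images, chart path, and app name are placeholders; the CI_REGISTRY_* and CI_COMMIT_SHORT_SHA variables are provided by GitLab itself):

  cat > .gitlab-ci.yml <<'EOF'
  stages: [test, build, deploy]

  test:
    stage: test
    image: python:3.12
    script:
      - pip install -r requirements.txt   # assumed to include pytest
      - pytest

  build:
    stage: build
    image: docker:27
    services: [docker:27-dind]
    script:
      - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
      - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
      - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

  deploy:
    stage: deploy
    image: alpine/helm:3.15.2   # placeholder deploy image
    script:
      - helm upgrade --install myapp ./chart --set image.tag="$CI_COMMIT_SHORT_SHA"
    environment: production
  EOF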

8. Security

Securing the infrastructure and the app (to the extent that you can) is a key role for DevOps engineers. The basics are enforcing least privilege: developers shouldn't be able to SSH into production servers, container registries should be properly authenticated, and passwords should be rotated regularly (and ideally automatically).

This might seem like overkill, but in 2023 fixed passwords are largely unnecessary and a major security risk. GKE and EKS both offer ways to tie cloud IAM identities to Kubernetes service accounts, eliminating the need to pass cloud credentials into the app. CloudSQL and RDS databases also offer the ability to log in with IAM credentials, so rather than passing in an admin password that is inevitably shared with the whole company, both apps and individuals can log in using their cloud credentials.
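
For example, logging into an RDS Postgres instance with a short-lived IAM token instead of a fixed password (hostname, user, and region are placeholders; IAM authentication has to be enabled on the instance):

  TOKEN=$(aws rds generate-db-auth-token \
    --hostname mydb.abc123.us-east-1.rds.amazonaws.com \
    --port 5432 \
    --username app_user \
    --region us-east-1)
  PGPASSWORD="$TOKEN" psql "host=mydb.abc123.us-east-1.rds.amazonaws.com port=5432 dbname=app user=app_user sslmode=require"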

For other databases and systems such as Elasticsearch, HashiCorp's Vault can issue temporary passwords. Apps can log into Vault with their service-account token and request a short-lived (say, 10-hour) credential, then request a new one as the expiry approaches, and repeat. This can also be abstracted away by running Envconsul or consul-template as a parent process to your app, so the app doesn't need to be aware of Vault at all.
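
A rough sketch of that flow from the CLI, assuming the Kubernetes auth method and the database secrets engine are already configured (the role names are placeholders):

  vault write auth/kubernetes/login \
    role=myapp \
    jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token
  vault read database/creds/myapp-readonly    # returns a short-lived username and password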

Containers and systems need some kind of vulnerability scanning. AWS and GCP offer solutions for this, as do GitLab and many other (mostly paid) services. You will be asked for this at some point, so you might as well get it out of the way.

9. Database Administration (a little)

You're not expected to be a full DBA, but you should have some idea of how to set up and administer Postgres, MySQL, Elasticsearch, or Cassandra/ScyllaDB. Administration mostly comes in the form of adding and removing users, as well as doing backups and restores. You'll want automation around backing up and restoring to a new cluster as well.
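
For Postgres, the core of that automation can be as small as this (hostnames, bucket, and database names are placeholders; scheduling would come from cron or a Kubernetes CronJob):

  # Nightly logical backup shipped to object storage
  pg_dump -h prod-db-1 -U backup_user --format=custom app \
    | aws s3 cp - "s3://example-backups/app/$(date +%F).dump"
  # Periodically restore into a fresh instance to prove the backups are usable
  aws s3 cp "s3://example-backups/app/$(date +%F).dump" - \
    | pg_restore -h restore-test -U backup_user -d app_restore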

Conclusion

You don't have to be an expert in every one of these areas, but you should have some level of knowledge of all of them. The more the better. I know mentioning specific tools might date this article a bit, but I don't see them going away anytime soon, and if they do, they'll be replaced with something similar in spirit. If you're in the market for a DevOps Engineering role, good luck, and I hope this list helps you!