The more of our lives we move onto the web, the more important it becomes to keep our information safe. You’ve probably had your fair share of security problems on the internet, most likely a compromised password or a very believable phishing email. Fortunately, there are measures we can take to handle our sensitive data as safely as possible.

Most people are familiar with the concept of encryption: scrambling and encoding our information in a way that makes it very, very difficult to turn it back into its original form without the right key. Say you want to pass a message to a friend, and you want to make sure no one else who might come across it is able to read it. So, you meet up in secret and create a new alphabet with symbols or shapes to represent each letter. Now, when you pass notes, only the two of you will be able to read them. You’ve created a method to encrypt your data that can only be decrypted by someone with the correct key.

But what if you can’t meet up in person? Or what if someone tries to impersonate your friend to get the key and intercept your messages? You’d have to develop methods of protecting your information and verifying the identity of the people you share the key with. This is where a public key infrastructure (PKI) can help.

PKI is a method of securing the transfer of your information online. To take a quick step back, it’s important to understand what a security measure has to provide before it can be considered reliable. There’s a security model called CIAAN. It represents the core principles of cybersecurity:

  1. Confidentiality – The assurance that your data is private and only authorized people can have access to it.
  2. Integrity – The knowledge that the data you’re seeing or sending isn’t being changed or altered in any way in transmission.
  3. Access Control – Only people with the right permissions or privileges can access data.
  4. Authentication – A means of identifying the different parties online, whether they are people or non-person entities like other computers.
  5. Non-repudiation – Proof of where data came from, so that whoever created or sent it can’t later deny having done so.

You’ll see different versions of CIAAN online, with different words or with more or fewer letters, but they all essentially amount to the same thing: making sure your information is secure. PKI should be able to provide all of the above principles and, as you’ll see, it does.

Let’s take a look at a very common example of keys. There are two types of keys for encryption: a public key and a private key. Say you have a mailbox; it has a locked slot on the top that can be used to place letters inside and a locked door to take the letters out. Since it’s your mailbox, you have a key for each of those locks. But you want your friends to be able to send you letters, so you take the key for the incoming mail slot and you give it out to everyone, maybe even leaving the key with the box so anyone can leave a letter. This is your public key. However, you don’t want anyone else to open the door to get the mail out, so you keep that key to yourself. This is your private key. In PKI, the idea is that anyone can have access to your public key and use it to encrypt messages to you. But the only way those messages can be decrypted is with your private key. No one else is able to decrypt them.
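To make the mailbox idea concrete, here is a minimal sketch of public key encryption using textbook RSA with tiny numbers. The primes, the exponent, and the function names are assumptions picked for illustration; real systems use a vetted cryptography library, much larger keys, and padding, none of which appear here. It needs Python 3.8 or later for the modular inverse.

```python
from typing import Tuple

Key = Tuple[int, int]  # (exponent, modulus)

# Toy key generation with tiny primes. Illustration only, never for real use.
p, q = 61, 53
n = p * q                   # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                      # public exponent (the "mail slot" key)
d = pow(e, -1, phi)         # private exponent (the "mailbox door" key)

public_key: Key = (e, n)    # safe to hand out to everyone
private_key: Key = (d, n)   # kept secret by the mailbox owner


def encrypt(message: int, key: Key) -> int:
    """Anyone holding the public key can lock a message."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: Key) -> int:
    """Only the holder of the private key can unlock it."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65  # a message encoded as a number smaller than n
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

The message has to be a number smaller than the modulus, which is one reason real systems typically use public key encryption to protect a short secret key rather than the whole message.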

To stretch the metaphor further, you open your mailbox and you see a letter from your friend, Bob, but it seems strange, and you question whether or not the message actually came from him. How can you verify that the letter is authentic? In PKI we have what are called signatures, a way for Bob to sign his letter so that you can be sure that he was the one who wrote it. Bob himself has two different kinds of keys: a private signing key and a public verification key. Bob encrypts his message with his private signing key to produce a signature, and if you can successfully decrypt that signature with his public verification key, which is available to anyone, then you know the message came from Bob.
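A signature can be sketched with the same kind of toy RSA arithmetic, just with the roles of the keys swapped: the private key creates, the public key checks. Everything named below (the key constants and the `sign`, `verify`, and `digest_as_number` functions) is hypothetical, the primes are publicly known Mersenne primes chosen only to keep the example short, and the message is first reduced to a number with SHA-256 so that it fits under the modulus. This is an illustration of the idea, not a secure signature scheme.

```python
import hashlib

# Bob's toy key pair. Real keys use secret, randomly generated primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
VERIFY_EXPONENT = 65537                                          # public
SIGNING_EXPONENT = pow(VERIFY_EXPONENT, -1, (P - 1) * (Q - 1))   # private


def digest_as_number(message: str) -> int:
    """Reduce a message to a number smaller than N."""
    full_digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(full_digest, "big") % N


def sign(message: str) -> int:
    """Bob transforms the message's number with his private signing key."""
    return pow(digest_as_number(message), SIGNING_EXPONENT, N)


def verify(message: str, signature: int) -> bool:
    """Anyone can undo the transformation with Bob's public verification key."""
    return pow(signature, VERIFY_EXPONENT, N) == digest_as_number(message)


letter = "Meet me at noon. Bob"
signature = sign(letter)
print(verify(letter, signature))                  # True
print(verify("Meet me at nine. Bob", signature))  # False
```

Only someone holding the signing exponent can produce a value that verifies, which is what lets you tie the letter to Bob.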

When Bob signs his message, the signature itself is unique to the message. This is because the signature is created by using something called a hashing algorithm on the message. What this basically means is the message is taken and, based on what data is inside of it, some code is produced. When talking about signatures, that code is called the message digest. If even a single letter is changed in the message, the code changes as well. To give a brief and over-simplified example, if I take the word “hello” and I turn it into numbers based on where each letter is in the alphabet, I get “8 5 12 12 15”. If I add those numbers to each other I get the number 52. If you saw the number 52, it’s very unlikely you’d be able to figure out that it stands for the word “hello”. And if you change just one letter, say to “hpllo”, the new code is 63. You can take Bob’s message, add all the letters together, and make sure you get the same number that Bob says it should be. Real hashing algorithms are far more thorough than this, but the principle is the same. In practice, this is all done by your computer for you when checking a signature.
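The letter-sum example is easy to reproduce in a few lines. The sketch below uses a hypothetical helper, `toy_digest`, for the over-simplified scheme and, for comparison, SHA-256 from Python’s standard `hashlib` module as an example of a real hashing algorithm.

```python
import hashlib


def toy_digest(word: str) -> int:
    """Sum of alphabet positions (a=1 ... z=26), ignoring other characters."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower() if "a" <= ch <= "z")


print(toy_digest("hello"))  # 52
print(toy_digest("hpllo"))  # 63

# A weakness of the toy scheme: rearranging the letters gives the same code.
print(toy_digest("olleh"))  # 52

# A real hashing algorithm produces a fixed-length digest that changes
# unpredictably when any part of the input changes.
digest_hello = hashlib.sha256(b"hello").hexdigest()
digest_hpllo = hashlib.sha256(b"hpllo").hexdigest()
print(digest_hello)
print(digest_hpllo)
```

The “olleh” case shows why a simple sum is not good enough in practice: two different messages should practically never share a digest.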

However, the security of all this depends entirely on Bob’s private signing key staying private. If it becomes compromised, we can no longer be sure that Bob is, in fact, the author of the message, and non-repudiation can no longer be maintained. It also depends on knowing that a public key really belongs to the person it claims to belong to. A classic example of what can go wrong is something called the man-in-the-middle attack. Suppose we have a third party, Eve. Eve can make her own key pair and send the public key to you pretending to be Bob, and to Bob pretending to be you. Now, Eve can intercept the messages passing between the two of you and decrypt, read, and possibly alter them before sending them on their way.

How do we make sure the public verification key can be trusted? We need a way to distribute public keys such that the identity of the owner is known and verified. This is where digital certificates come in. A digital certificate contains a lot of information about the identity of a person (or machine) such as their public keys, the name the certificate is issued to, the certificate version, the unique serial number, the algorithm used to sign the certificate, and other identifiers.

You can think of a digital certificate in the same way you might think of a driver’s license or passport. Anyone can claim to be “Bob Smith”, but his license confirms his identity, and you trust the license since it was issued by a trusted source. The trusted source that issues certificates is called a certificate authority (or a CA). As long as you trust the CA, and the CA has signed the certificate, you can be confident that it’s valid and authentic.

A CA’s certificate can be self-signed, meaning it’s the root certificate or trust anchor; it wasn’t issued by another CA. The root CA can issue certificates to intermediate or subordinate CAs. These CAs can issue their own certificates much like the root CA can. This takes some of the load off the root and allows for some added security. Each of those intermediates can create other intermediates, and so on. Visually, it starts to look a lot like a tree with the root at the top and all the branches of intermediate CAs and their certificates. This whole tree is called the certificate hierarchy or the hierarchy of trust. User certificates like Bob’s can’t be used to issue certificates and are called end user or end entity certificates because they’re at the end of the chain. In order for you to trust Bob’s certificate, you need to trust that whole chain all the way back to the root. If any link in the chain is compromised, so is Bob’s certificate.
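Walking a chain back to the root can be sketched as a simple loop. The `Certificate` class, the `build_chain` function, and the example names below are all hypothetical, and the sketch only follows issuer names; real validation also checks each signature, the validity dates, and revocation status.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Certificate:
    subject: str  # who the certificate was issued to
    issuer: str   # who signed it
    is_ca: bool   # whether it may be used to issue other certificates


def build_chain(
    leaf: Certificate,
    known_cas: Dict[str, Certificate],
    trusted_roots: Set[str],
) -> Optional[List[Certificate]]:
    """Follow issuer links from the leaf up to a trusted self-signed root."""
    chain = [leaf]
    current = leaf
    while current.subject != current.issuer:  # self-signed means root
        parent = known_cas.get(current.issuer)
        if parent is None or not parent.is_ca or parent in chain:
            return None  # missing link, non-CA issuer, or a loop
        chain.append(parent)
        current = parent
    return chain if current.subject in trusted_roots else None


root = Certificate("Example Root CA", "Example Root CA", True)
intermediate = Certificate("Example Intermediate CA", "Example Root CA", True)
bob = Certificate("Bob Smith", "Example Intermediate CA", False)

known_cas = {cert.subject: cert for cert in (root, intermediate)}
chain = build_chain(bob, known_cas, trusted_roots={"Example Root CA"})
print(" -> ".join(cert.subject for cert in chain))
```

If the root is not in the set of trusted roots, or any issuer along the way is missing, the function returns `None` and Bob’s certificate should not be trusted.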

But what happens if someone’s keys are compromised? The certificate can’t be trusted anymore and needs to be revoked. This is done using a certificate revocation list (or a CRL). This list contains the serial numbers of certificates that the CA has revoked before their expiration date, whether because of a compromise or for some other reason. Expired certificates generally don’t need to be listed, since their expiration date already marks them as invalid. The CA will release a CRL periodically but usually will release one immediately if there is a compromised certificate. Anyone relying on a certificate, such as your browser, should check this list before trusting it. You can imagine that list gets pretty long over time. So, a CA can split its CRL into smaller lists and publish them at what are called CRL distribution points, locations that are usually specified in the certificate itself. Now, to find out whether a certificate has been revoked, you only need to fetch the list from its CRL distribution point instead of the entire list of bad certificates. This way the PKI can run more efficiently.
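Checking a certificate against a CRL comes down to looking up its serial number in a list that is still current. The sketch below assumes a hypothetical `RevocationList` class and `is_revoked` function; the field names loosely mirror the issuer, update times, and revoked serial numbers that a real CRL carries, but no real CRL file is parsed here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Set


@dataclass
class RevocationList:
    issuer: str            # the CA that published the list
    this_update: datetime  # when this list was issued
    next_update: datetime  # when a newer list is due
    revoked_serials: Set[int] = field(default_factory=set)


def is_revoked(serial: int, issuer: str, crl: RevocationList, now: datetime) -> bool:
    """Return True if the certificate's serial number appears on the CRL."""
    if crl.issuer != issuer:
        raise ValueError("CRL was published by a different CA")
    if not (crl.this_update <= now < crl.next_update):
        raise ValueError("CRL is not current; fetch a fresh one")
    return serial in crl.revoked_serials


crl = RevocationList(
    issuer="Example Intermediate CA",
    this_update=datetime(2024, 1, 1, tzinfo=timezone.utc),
    next_update=datetime(2024, 1, 8, tzinfo=timezone.utc),
    revoked_serials={1001, 1042},
)
check_time = datetime(2024, 1, 3, tzinfo=timezone.utc)

print(is_revoked(1042, "Example Intermediate CA", crl, check_time))  # True
print(is_revoked(2000, "Example Intermediate CA", crl, check_time))  # False
```

Refusing to answer from an out-of-date list is a deliberate choice in this sketch: a stale CRL can’t tell you about certificates revoked since it was published.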

The certificates, keys, and CRLs are stored in a directory. This directory service holds a global listing of data which can be accessed by different users and applications. The directory contains objects and their attributes (e.g. name, address, email, etc.). Public components like the public verification key or the public encryption key are not encrypted; they are stored in clear text that can be read by anyone. We know the information is good because it’s all been signed by the CA. This directory is one of the most important parts of the PKI since it contains all the certificates and CRLs.

All these aspects together form the fundamentals of a PKI. There are many more intricacies and policies that go into it, but this should, hopefully, give some context and provide a basis of understanding. You see it working every day, even if you haven’t noticed. See the little lock icon next to a URL you type into your browser? That’s your browser telling you that the site’s certificate checked out against a CA it trusts and that the connection to the site is encrypted. The PKI allows users to be confident that their data is secure and so are their communications.