SSH Best Practices Overview
A discussion and demo from SSH access experts Ev Kontsevoy, CEO of Gravitational, Jack Naglieri, CEO of Panther Labs, and Gus Luxton of Gravitational, covering the lessons they learned about SSH access during their time at Facebook, Airbnb, and Rackspace. Learn how to SSH properly, as well as about the open source offerings of Teleport and Panther’s log analysis tooling. In this webinar, Gravitational CEO Ev Kontsevoy hosts a conversation with Gus Luxton, Gravitational DevOps engineer, and Jack Naglieri, CEO of Panther Labs, about SSH: why certificate authorities are a must-have, how to audit SSH activity, and what to do with those audit logs once you have them.
Key Topics on SSH Best Practices:
- Use certificates for SSH, not public/private key pairs.
- Use a proxy or SSH bastion.
- Derive your identity from a centralized database (SSO).
- Collect and centralize important security data.
- Detect threats and take action.
Introduction - Industry Best Practices for SSH Access
(The transcript of the session)
Ev: Shall we get started?
Jack: Let’s do it.
Ev: Let’s do it. All right. Hello, everyone, and thank you for joining us. I am Ev Kontsevoy, CEO of Gravitational, and today, we are going to be talking about SSH access, and just generally about industry best practices for accessing computing infrastructure. Please feel free to ask questions using the Zoom Q&A tool during the webinar. Notice there’s a little Q&A button you can click at any time, and that will direct your questions to our guests at the end. And speaking of guests, we’re lucky to have two experts in our virtual room today. Jack and Gus here both used to work on securing production environments for famous and large Silicon Valley companies: for Airbnb and for Facebook. And today, they have agreed to sit down with us and chat about how to SSH properly. Both of them are now involved in developing open source tools, next-generation tools, that are very relevant to this space, and we’ll talk about those as well. But before we jump into technical details and demos, let’s have our guests introduce themselves. Gus and Jack, what did you work on at Facebook and Airbnb? Gus, you go first.
Gus: Sure. Thanks, Ev. Well, hi everyone, I’m Gus Luxton, DevOps engineer here at Gravitational. One of my responsibilities here and what I do now at Gravitational is I assist people with designing and architecting and setting up Teleport deployments. And as a result, I’ve learned a lot about how people use SSH, what works well, what doesn’t work well, that kind of thing. I also used to work on the production engineering or SRE team at Facebook. And I’ve seen how SSH can scale out when you have thousands of employees accessing hundreds of thousands of servers. So I’m definitely keen to pass some of that knowledge along.
Jack: Hey. Thanks, Ev. My name is Jack Naglieri. I’m the founder and CEO of Panther Labs. Prior to Panther, I was an engineering manager at Airbnb on the security team, and we were responsible for collecting security logs at scale for the purposes of detection and response. Among those logs are SSH logs. So we want to get as much helpful signal as we can to determine that our production environments are safe, that our systems are being used responsibly, and that nothing is happening that could potentially lead to a breach. And that leads really well into what Panther is, which is a way to take that security data, collect it, aggregate it, centralize it, and take action on it. So I’ll talk a bit about that later on.
Ev: Great. Thank you, guys. I guess all of us are now in the business of spreading industry best practices from Silicon Valley companies like Airbnb and Facebook to the rest of the world. I myself came from Rackspace, by the way, where SSH access across seven different regions in multiple form factors, infrastructural form factors, and data centers was a huge problem. So Gravitational was basically started to solve this problem with open source. And Gus, recently, you’ve written a quite comprehensive and extremely popular blog post called How to SSH Properly. Now, if you google “how to SSH properly”, you actually find it, and it’s been quite a hit on Hacker News. And today, we are doing this webinar in response to so much interest in this topic. So from your guys’ perspective, what are the top three to five things every engineer in an organization can do to improve SSH access using tools they already have today?
Top Recommendations to Improve SSH Access
Gus: Yeah, absolutely. So the number one recommendation from me would be: use SSH certificates. Don’t use public keys for authentication. The reason for that is that public keys — private keys, more specifically — get stored on people’s laptops, and they can be easily lost. People have left laptops lying around, USB keys. They can be stolen. They can be compromised. And by default, if you issue an SSH private key, it’s valid forever. So if it gets compromised, you have to go and remove it from every single server where the public part of that key has been deployed, just to make sure that you’re not at risk anymore. And that doesn’t scale well at all. If you have many, many servers, you’ve got to make sure that you’ve got rid of that key from absolutely everywhere, and that’s very difficult to do. Certificates, on the other hand, can be issued so that they expire within as little as a minute, for example. Once they’ve expired, they’re of no use anymore, and you have to get a new one. And that issuing process can make sure that you should still have access to the infrastructure. So that’s number one.
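The expiry mechanics Gus describes can be sketched in a few lines of Python. This is a minimal illustration of the concept of a validity window, not Teleport’s implementation:

```python
from datetime import datetime, timedelta, timezone

def certificate_is_valid(issued_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """A short-lived certificate is only usable inside its validity window."""
    return issued_at <= now < issued_at + ttl

# A certificate issued with a one-minute TTL works immediately...
issued = datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc)
print(certificate_is_valid(issued, timedelta(minutes=1), issued))  # True
# ...and is useless two minutes later; no per-server key removal required.
print(certificate_is_valid(issued, timedelta(minutes=1),
                           issued + timedelta(minutes=2)))  # False
```

Contrast this with a plain key pair, where there is no `ttl` at all: revoking access means hunting down the public key on every host.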
Gus: Number two would be, use a proxy server or an SSH bastion server. Don’t allow access from just anywhere. SSH is an incredibly powerful tool, and it will allow you, if you want, to access your infrastructure from anywhere on the planet that you can get an internet connection. But you should be wise, and you should limit your attack surface. If you control where people can connect from, you make sure that the logging associated with those connections can be easily centralized, because you’re pushing all your connections through a narrow bastion or proxy to make sure it’s monitored. You’ve got a good handle on exactly who’s doing what and when. That’s number two. And number three: derive identity from single sign-on or a centralized database of some kind. Get your user data, and the teams that those users are a member of, from somewhere external, so that you have one source of truth for all of your users. You have one place where you add users when they join, and you have one place to remove them when they leave. That means you don’t have any stale records left lying around. You have one place to onboard and one place to offboard, and that gives you much greater control of your infrastructure.
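The bastion pattern Gus describes works with stock OpenSSH client configuration; a sketch, where the hostnames and user are hypothetical:

```
# ~/.ssh/config: reach every production host through one bastion,
# so all connections funnel through a single, monitored choke point.
Host bastion
    HostName bastion.example.com
    User alice

Host *.prod.internal
    ProxyJump bastion
```

With this in place, `ssh web1.prod.internal` transparently hops through the bastion, and the bastion becomes the one place where connection logging and firewall rules need to live.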
Ev: So that’s number three?
Gus: Number three. Hand over to Jack.
Ev: Yeah, I like that because it also allows you to just generally minimize the usage of SSH within your organization. You can just define a group of employees who can use SSH, and no one else should even be doing it. Jack, what else do organizations need to consider, from your perspective?
Jack: Yeah. So from the monitoring side, it’s really just collecting and centralizing a lot of this important security data that already exists or is out there. So in a giant fleet of machines, or, for example, in the [inaudible] cluster, there’s a lot of instrumentation that we can enable by default. For example, on Linux, there are logs about SSH connections. There’s auditd. There’s syslog. There’s all this really rich information that we can gather, and we can also do it in a very targeted way, specifically for authentication events or application-level events, things like that. So it’s really important. I mean, the biggest recommendation I always have when I speak with customers is to collect logs throughout the layers of the stack. So starting from what’s broad, you would say, “Okay, for our Amazon accounts or GCP accounts or whatever cloud environment we have, collect the API logs from there. Collect the net flow data.” Anything that the cloud allows us to collect, that’s one layer. Then digging deeper is the network layer: traffic to our hosts, things like that. Then getting more into the host side, which is more relevant for this conversation: looking at SSH activity, looking at commands executed, things like that. And then getting even more specific is instrumentation on applications.
Jack: So you run the application, or maybe a specific scanning tool or things you use. So we want to take all this data, and we want to centralize it in a single place, and we want to be able to tell the story of what’s going on in our environment. But then we also need to retain this data in the event of a breach, so that’s my number four.
Ev: That’s number four.
Jack: Yeah. So number five is kind of the next step after that. Once you’ve instrumented all your systems throughout the layers, and you’re getting all this really helpful data, you want to do detection, and you want to take some action. You want to be reactive and proactive. So you want to say, “If this activity is occurring in my environment, I want to know about it, and I want to put security controls in place to help with that.” And then we can go through popular frameworks — for example, MITRE ATT&CK is a great common framework that security teams use to track these types of attacker behaviors. So we can model things out like that in our rules, and then we can get notifications in Slack or PagerDuty, things like that, to take some action when that activity occurs.
Ev: Perfect. Gus, so you covered most of the best practices we just talked about in your blog post How to SSH Properly, and in that post — again, we’re not trying to sell people on anything — it’s written really for users of common Linux packages: how you can implement all of these things with tooling that you already have, that comes with your Linux distribution. Obviously, it’s complicated and takes a lot of work, and existing tooling can be too complicated, so that kind of brings us to this new generation of open source tools we’re all working on right now. So let’s talk about those. Gus, what are you working on?
Teleport: Part of a New Generation of Open Source Tools
Gus: For sure. So yeah, I work primarily on Teleport, and Teleport is a tool which can make the process of setting up centralized authentication using an SSH certificate authority much easier. It’s totally possible to set up your own certificate authority for SSH using ssh-keygen and other tools that you get just by installing sshd. It does take time, though, and it can be hard to do right. There are a lot of command line switches needed to get the right functionality: to implement expiry, to make sure that you have the correct metadata associated with your certificates, and so forth. And then once you have that, you probably want to be able to automate the renewal of certificates, or the issuance of the certificates in the first place. And that takes time and effort as well. One of the things that Teleport tries to do, and does well, in my opinion, is it helps you to do that automatically. So it sets up the certificate authority for you. It comes with sane, sensible default settings out of the box and attempts to make sure that you get exactly what you’re looking for the first time round.
Gus: And so on top of that, once you have the certificates, you need a process where people can add hosts to your estate, for example, because certificates don’t just authenticate users. They also authenticate hosts, to make sure that people trust the hosts that they’re logging into. So it’s entirely possible to do this with the tooling that you have already. You can write scripts. You can use things like Ansible or SaltStack, any kind of automation or configuration management, to assist you with this. But it’s all development work that would need to be done just to get to a baseline state of everything being happy and good. So Teleport assists with that too. Teleport can automate the adding of hosts. It can make that simpler for you, and it can integrate, again, as we mentioned, with those external databases. So you can derive identity from a centralized database like GitHub or another SSO provider. If you want to limit the principals — the Unix users that people can use to actually log in to machines — again, you can do that with SSH out of the box, too. You can use an authorized principals file, and you can make a list of which certificates should be allowed to log in as which users. SSH will enforce that for you, and it can totally do it. That, in fact, is exactly what we did at Facebook during the first iteration of this as well. We had principals for web servers, and we had principals for root everywhere and this kind of thing. So we would specify exactly what group of hosts people would be allowed to connect to based on the principals that they had in their certificates. But again, you have to set that up. It’s not something you can just do immediately; it needs a bit of work, and Teleport makes that simpler. It’s got built-in role-based access control and other primitives that you can configure, and you can make it either very simple or very complex, depending on exactly how you want it to work.
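The authorized principals mechanism Gus mentions is stock OpenSSH server configuration; a sketch, where the CA path and principal name are hypothetical:

```
# /etc/ssh/sshd_config: trust user certificates signed by this CA, and
# consult a per-user file listing which certificate principals may log in.
TrustedUserCAKeys /etc/ssh/user_ca.pub
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u

# /etc/ssh/auth_principals/root might then contain a single line, e.g.
# "root-everywhere", so only certificates carrying that principal
# can log in as root on this host.
```

The `%u` token expands to the local username being logged in as, which is what lets you say “these certificate principals may become this Unix user” on a per-host basis.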
Gus: And the other part is that from the perspective of a developer or a user who uses SSH to connect to systems on a regular basis, obviously, if doing things properly — if doing things right — makes it harder for you, then developers aren’t going to want to do that, in my experience. It’s very easy to become resistant to things that make your workflow harder. You have an existing way that you do things. You like doing things a certain way, and you don’t really want to change from that. I’m completely guilty of exactly the same thing, and I know exactly how that works. So with Teleport — it’s built by developers. We’re a very developer-oriented company, and it’s built for developers. We just try to get out of the way. We just want to provide a tool which lets you use your existing workflow, but in a more secure fashion.
Ev: Thank you, Gus. Jack, what are you working on? Tell us more about Panther, and then we’ll do a demo.
Panther: An Open Source Security Platform
Jack: Yes. So Panther is an open source security platform that enables teams to collect the important logs that I was just talking about and respond to suspicious or malicious activity in their environment. A big challenge for security teams today is that they’re really understaffed, and they really struggle with just getting started with security monitoring, because it is a very important and difficult task. This is compounded by the fact that the cloud introduced a huge explosion of data, and these teams need something that can get up and running really quickly and can adapt with these companies as they scale. To Gus’s point, he said, “Teleport is built by engineers for engineers,” and it’s really the same mantra with Panther. We as a company are a combination of security engineers who came from Amazon, Airbnb, companies like that, who had solved these problems internally for those companies, and now we’ve built an open source solution that other security teams can take and run with. The status quo in the industry today is really tools that were developed in the last 10 to 15 years and haven’t fundamentally adapted for the cloud yet. And I’m referring to tools like Splunk or Elastic that teams really struggle with deploying and scaling.
Jack: So our whole approach is, Panther is fully serverless, and we take a developer-first approach, very similar to what Gravitational does as well. And our whole thing is, because we’re serverless and cloud-centric, we can deliver a platform that can scale super easily to meet those demands, but also can run at a much lower operational and cost overhead. So we support data sources from S3 or SQS. We can actually pull directly from SaaS as well. Very popular log types that we often onboard are, for example, G Suite logs. And in the demo today, I’ll show how Teleport logs can get into Panther, and how we can write detections off of certain activity from SSH.
Ev: Yeah, makes sense. I like this theme that both of you guys identify: at the end of the day, everyone can do these things themselves, but it takes time. It takes expertise. You need to know the switches. You need to manage a ton of different config files, and some of this tooling is obsolete and just hard to use. It’s just unpleasant. So making things easy, that’s basically the business both of us are in. All right. Enough talking. Maybe we should show the audience how all of this might look in practice, how it actually looks and feels. So how can someone go download Teleport, set it up, and point the audit logs at Panther for threat detection and response? We are showing a diagram here. This is what the demo environment that we are about to show you looks like. You have your users on the left. They might be using SSH or, actually, Kubernetes, because Teleport provides the same proxy that supports all of these infrastructure protocols. And as they go through the proxy, they connect to a virtual private cloud, VPC, or just the LAN in a traditional data center, where there’s a certificate authority that automatically issues certificates, and different clusters of servers for different environments, staging and production. And then the logs are being fed to Panther. In this case, Teleport sends the logs to S3, and Panther picks them up. So let’s start the demo. Gus, show us the light.
Teleport and Panther Demo
Gus: Absolutely. I’ll do my best. So I’ll share my screen here. I’ve got a couple of windows here, a terminal window on the right and a Teleport cluster on the left. So we mentioned beforehand that Teleport is SSH-based underneath. Although you can use open source tools to do these things yourself — you can use ssh-keygen and do it yourself — Teleport can do it all for you. And the way that it integrates, the reason why it’s so [inaudible], is that it’s compatible with SSH certificates themselves. If I look at a certificate here that was issued by Teleport, what we can see is that this certificate is an SSH cert-v01 type certificate. It has a public key. It has a signing CA. It has an ID associated with it. It’s valid for a short period of time. It has a list of principals and various extensions associated with it too. So: should you be allowed to forward the SSH agent? Should you be allowed to forward ports? Are you allowed interactive sessions, for example? That kind of thing. So one of the reasons — I’m going to address this very quickly, because somebody asked why you would want to use certificates rather than private keys. Part of the reason is this extra metadata that you can associate with certificates. You can add extensions, and you can add a list of principals you should be allowed to log in as, and arbitrarily complex extensions to those as well. And when you have a big estate, that can be used to drill access down to a really fine level. So that’s one of the good reasons to do that.
Gus: So I’m going to show you logging into a Teleport cluster and kind of how that works. Teleport does support local users. You can add them into Teleport itself. But it also supports SSO providers. So the open source version of Teleport supports GitHub. I’m using Auth0 here as an example of this sort of thing. It works equally well with GitHub; it’s just configured slightly differently. So if I sign in with Auth0, I pick that I want to sign in with Google, and then pick my Gravitational identity, and it logs me into the cluster. So we’ve got a cluster set up here. We have the Panther demo cluster. We have two nodes. We’re running Teleport version 4.3.2 here. And if I go to log in to a node, we can see a list of nodes that I have access to. These nodes are in EC2, so they have EC2-style hostnames. We can see the internal IP address of each of them and the port where the Teleport node service is running. So when I drop down the connect menu, we can see my Teleport role here — when I log in, my role is giving me access to the ec2-user login. So if I click on this, I can start a session as ec2-user, and I can run anything that I would want to via the terminal here and access the machine. So I can look at the contents of the Teleport directory. I can look at logs. I can look at anything else that I need to.
Gus: And each of these events gets stored within Teleport as a kind of JSON-formatted event with the Teleport user ID. So this is my email address from my Google account. We can see exactly which user took these actions and what time it was. What was the source address? Where did they connect to? What port did they connect to? What user were they logged in as at the time? I was logged in here as ec2-user. What was the pid of the process that they used? What did they run? What host was it on? That kind of thing. So that’s very much the principle: Teleport logs all of this information. But as you can see, with the way that this works currently, there are a lot of entries in this audit log, and it can be quite hard to distill them down, and it can be kind of hard to see. You can see a list of interactive sessions and how long they lasted. If I go to play one back, for example, I can see exactly what happened during a given time. But this information by itself can be pretty hard to digest, and it can be pretty tough to figure out exactly what’s going on. So one of the things that you can do is push these audit logs off to somewhere else, and that’s where something like Panther would come in, for example, Jack.
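The JSON audit events Gus describes are easy to process programmatically; a minimal sketch, where the field names and values are illustrative examples based on the fields mentioned above, not an exact Teleport schema:

```python
import json

# An illustrative Teleport-style audit event (field names are assumptions).
raw = json.dumps({
    "event": "exec",
    "user": "gus@example.com",        # SSO identity that took the action
    "login": "ec2-user",              # Unix user they were logged in as
    "addr_remote": "10.0.0.5:53210",  # source address of the connection
    "command": "ls /var/lib/teleport",
})

event = json.loads(raw)
# Distill the entry down to the question an auditor actually asks:
# who ran what, as which login, from where?
summary = (f"{event['user']} ran {event['command']!r} "
           f"as {event['login']} from {event['addr_remote']}")
print(summary)
```

This is exactly the kind of per-event distillation that a log analysis pipeline automates at scale, instead of a human scrolling the audit log.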
Ev: Machines do the work.
Gus: Very much so.
Jack: I do have a confession. It was actually me running in there.
Gus: [laughter] Well, thanks for admitting to it.
Jack: I’m the culprit. Let me share my screen, and we can dive a little bit into that.
Ev: So these logs that Gus just showed, generated by Teleport, all the user activity is being pushed to AWS S3, and from there, it goes to Panther. And Jack is now going to show us what you can do with all that audit information.
Jack: Yeah. So it’s a common pattern to take this data from all of your systems. The benefit with something like Teleport is that you’re gaining that centralization out of the box, which is amazing, because that in itself is a really hard thing to do. Think about traditional aggregation of syslog data — that’s just very complicated and very difficult to scale. So this is already a huge improvement. We can browse the log, like Gus was saying, in the browser. We can see the JSON of what I did. And this is helpful, but what if we wanted to get a Slack ping when this happens, right? This is where something like Panther would come in. So I’ll switch over to the Panther dashboard that I have. Panther has a couple of different components. One is around log analysis, which is relevant for this conversation. And this dashboard just gives us a sense of how many alerts have been generated in the last couple of days, and on the event side, how many logs we have been getting from our sources. You can see Teleport is listed here, and if we just hover up here, we can get a sense, hour by hour, of how much data we’re getting. What we can do with Panther is write rules in Python that analyze JSON and look for certain patterns in those fields. And the way that we’re able to do that is by parsing out all the logs as they come in.
Jack: Luckily, Teleport uses JSON, but not every log format does. So this step of normalization and conversion into maps and things like that is helpful for detection purposes, but also for the sustainability of search. When this data comes into Panther, we actually put it into a very structured data warehouse. And this data warehouse can scale to terabytes, petabytes, etc. It can really grow with the demands of any customer. So for the purpose of detection, we have some Teleport audit rules that we’ve written, which are also open source, and let’s look at the Nmap activity. What we’ve done is write a rule that says, “Detect network scans.” And for those unfamiliar: there are multiple options when it comes to network scanning. You can use Nmap. You can use ARP scan. You can use other tools you can download from the internet. And we want to be able to detect all of these with Panther. So if we click this edit button, we’re taken to our rule editor, which gives us a way to control how the rule behaves: which log types do we want the rule to act on? Do we want to do thresholding? Do we want to [inaudible] some types to this rule? There are a lot of settings [inaudible] here. And then we have an actual Python editor in the browser. And some engineers might be thinking, “Oh, well, I don’t want to put code in the browser.” That’s perfectly fine. We actually have a way to upload rules programmatically, so you can do your development within your CI/CD pipelines. We support both. And there are a lot of benefits, actually, to having it in the browser, especially if you have to make a really quick fix or if you just want to do some experiments.
Jack: And then under that, we actually have a way of testing that our rule really works before it runs in production. For those who have done DevOps in the past and are familiar with Chef and Puppet, this is a very similar idea — or, more generally, this is just best practice in software engineering. So what we can do is create these sample Teleport events. Say I run Nmap with all these arguments; then our rule should fire. Our rule is really just looking at these commands right now — fping, Nmap, ARP scan. We could adjust this to look for curl, things like that. At the end of the day, the goal is really to prevent a huge flood of false alerts. So we always try to have these tests and these capabilities, which leads to the most accurate types of detections. And we can test these in the browser. We can say, “Cool, if someone runs Nmap on a Teleport host, then we’ll get an alert,” which is helpful. And then we have a way of configuring destinations based off of the severity of that rule. Right now, it goes to Slack. And again, this is an open source rule, so if you went over to our repo right now, panther-analysis, you could see the Teleport rules that we support. But let’s actually take a look at one of the generated alerts.
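The detection Jack describes can be sketched as a Panther-style rule: a Python function named `rule` that receives one parsed log event and returns True to raise an alert. The event shape and `command` field below are simplified assumptions, not the exact Teleport schema or the published Panther rule:

```python
import shlex

# Commands commonly used for network scanning, per the rule described above.
SCAN_COMMANDS = {"nmap", "fping", "arp-scan"}

def rule(event: dict) -> bool:
    """Return True to raise an alert, in the style of a Panther Python rule."""
    argv = shlex.split(event.get("command", ""))
    if not argv:
        return False
    # Compare only the program name, ignoring any leading path like /usr/bin/.
    return argv[0].rsplit("/", 1)[-1] in SCAN_COMMANDS

print(rule({"command": "nmap -sV docs.runpanther.io"}))  # True
print(rule({"command": "ls -la /var/log"}))              # False
```

Because the rule is plain Python, the sample-event tests Jack demos are just calls like these with known expected results, which is what keeps false alerts down.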
Jack: So when I ran that Nmap a second ago, I did Nmap on docs.runpanther.io, which is just our documentation site. And then we see there are a couple of reports. We see it actually resolves to GitBook, so you can figure out that that’s the hosting site for our documentation. The analyst or the engineer also gets a runbook for how to handle this alert. You get a status associated with it, and if you’re working on this [inaudible] if it’s a false positive, you can close it out, etc. And we see the actual events that triggered this alert. So we see the argument of docs.runpanther.io, and we see that /bin/nmap executed this. And then we see some other fields that are prefixed with p_, and these are enrichments that we’re actually adding to all the logs coming in. For certain log types, you might see p_ domain names or p_ IP addresses, things like that. We’re actually pulling out that content from anywhere within the record and enabling people to do investigations on those indicators of compromise. Which leads me to the next part: once we’ve done collection and detection, the final part is really investigation. Our open source supports those first two really, really well. And then for investigation, we have an enterprise version of Panther that allows you to run SQL in the browser over all this data. So if you have been collecting your Teleport logs for the last five years, and you want to search back to a very specific point in time, we have a very robust data warehouse that lets users do that at scale without waiting for days or hours for results.
Jack: So when you click this View in Data Explorer button — it’s only two events in this case — we get a way to browse it in SQL. So now I have this SQL editor, and I can select different fields. I can do DISTINCTs. I can do filters, and anything else here, and this query is auto-populated right when you click the button. And then the final thing is we can take this a step further and say, “This is great for understanding what happened at the time of the alert. But what happened more generally on that box?” So we can look at the actual raw logs, which is everything from Teleport, and then we can essentially get a trace of all the commands that this user — which is me — ran within a very relevant time period, which is super important for an investigation. So this is the activity within that window, and if I just look for Nmap, I can see what led me to run Nmap, but I can also see that I did other things too. I did a wget for some bitly address, which is really suspicious, right? So as an investigator, you want to know what else this attacker did that we didn’t catch. And the fact that we have such rich data from Teleport is really helpful in answering these critical questions, and it really plays into the greater security idea of taking in as much context as possible. So yeah, that’s essentially it from the Panther side.
Ev: Awesome. Awesome. Thank you, Jack. So let’s go over the takeaways that we want everyone to remember from this presentation, and then we’ll jump straight to Q&A. What Gus and Jack are telling us is that on the access management side, you have to use SSH certificates. Do not use keys. And we already have one or two questions here clarifying, “Why exactly certificates?” Well, the short answer: because certificates have metadata, and metadata allows you to have role-based access control. A key is a binary thing. And by the way, no, standard SSH keys do not have an expiration date; certificates do. The second thing: you have to be using proxies for SSH. Sometimes they’re called bastion hosts or jump hosts. Do not allow people to connect to machines directly. And finally — and I, personally, think it might be the most important one — you have to use multi-factor authentication with single sign-on. Teleport doesn’t really do authentication itself in production; that’s not what we recommend. You have products like Okta. You have Active Directory. You have Duo Security. Those things are connected to your database of all of your users, and they will tell Teleport if someone is allowed to log in. Teleport then simply issues the certificates. So that is how SSH needs to be done. Your engineers need to be using SSO.
Ev: Now, Jack’s advice on detection and response comes down to two things: collect access and activity logs into a centralized place from all of your infrastructure. And don’t just collect access logs — collect everything from your entire application stack, because that gives you a kind of three-dimensional view of what is actually happening. And look, it is going to be a lot of data, so essentially, what Panther is doing for you here is solving the scalability problem of managing audit information. And the second piece, which is equally important: once you have all this data, don’t just sit on it; define rules and set alerts that allow you to take action when something is happening that you don’t want to happen. So keep this in mind. You will have access to this slide later on. You can download it later.
Q & A
Ev: But now, let’s take a look at Q&A. I see there are a lot of duplicate questions here. Let me compress some of them, and I may be able to answer some as well just to save time. But then some of these will have to go to our experts. So first, I see two very similar questions. One is, “How does Teleport work with Duo second-factor authentication?” Also, “How does Teleport work with government authentication cards, like CAC cards?”
Ev: The answer is that, in both situations, you already have an identity manager with single sign-on. Even with CAC cards, you have Okta. I know the government uses Okta, so let users go in through the standard Okta process using a second factor, and then you configure Teleport to point to Okta, so users will get an SSH certificate issued based on the identity that Okta is managing. And the same is true with Duo Security: you configure Teleport to get identity from Duo. Another question here, and I think we answered this already, is about why exactly people should not be using public/private keys. Again, keys have no metadata, which means you can’t have meaningful role-based access control, which in turn prevents you from implementing the principle of least privilege — it’s just a key. And the second thing is, keys do not expire. Standard SSH keys have no expiration. Now, let’s look at more detailed questions for our experts. Jack, does Panther have a plugin for Elasticsearch?
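The “use a proxy or SSH bastion” takeaway from earlier can be sketched with standard OpenSSH client configuration. The fragment below is illustrative only (all hostnames are placeholders, and Teleport provides its own proxy rather than relying on raw `ProxyJump`), but it shows the pattern of never letting a client connect to internal hosts directly:

```shell
# Write a minimal ssh_config fragment that routes every connection to
# internal hosts through a single bastion/jump host (hostnames are made up)
cat > ssh_config_demo <<'EOF'
Host bastion
    HostName bastion.example.com
    User alice

# Every internal host is reached via the bastion -- never directly
Host *.internal.example.com
    ProxyJump bastion
EOF

# The equivalent one-off form, without a config file, would be:
#   ssh -J bastion.example.com web1.internal.example.com
```

Because all sessions funnel through one host, that host becomes the natural place to enforce certificate-only authentication and to record audit logs.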
Jack: Was this for us, or for —?
Ev: It’s for you.
Jack: Okay. No, we don’t. Our backend is actually based entirely on AWS serverless. But we also have a plugin for Snowflake. So if you’re a Snowflake customer, all of your Panther logs can feed into your Snowflake cluster.
Ev: Thank you. Gus, does Teleport log with a different credential so the same authority cannot delete logs?
Gus: Yeah. Absolutely. You can do that with Teleport as well. So if you’re writing logs into S3 or another kind of centralized storage provider, or using DynamoDB to store them, or Google Firestore, anything like that, you can absolutely do that — because we’re using the AWS SDK. Teleport’s written in Go, and the AWS SDK for Go is what we use to do authentication. So we follow the standard AWS authentication method, the credential chain. We read from ~/.aws/credentials in your home directory, or we can read from a machine-level EC2 role, for example. So if you have a role which gives a machine only limited permissions, you could create, for example, a write-only role for your auth servers. All your Teleport logs get centralized on your auth server, and then you give the auth server only permission to write to a given bucket. So it just pushes the logs out to S3. It can’t even see what’s in the bucket. It can’t modify logs once they’re there. It can write, but it can’t append, modify, or delete. It can’t do anything else. So from a non-repudiation standpoint — “We want to make sure that these logs get there and then can’t be tampered with” — yeah, you can absolutely configure Teleport to do that by giving it credentials which are only allowed to write to a given bucket and not read or delete or anything like that.
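The write-only role Gus describes can be sketched as an IAM policy. The policy below is an illustrative assumption (the bucket name, role name, and statement ID are placeholders, not anything from the webinar): it grants only `s3:PutObject`, so the auth server can deliver logs but can never list, read, overwrite, or delete them:

```shell
# Write an illustrative write-only IAM policy for the audit-log bucket
cat > write_only_log_policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WriteAuditLogsOnly",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-audit-logs/*"
    }
  ]
}
EOF

# It could then be attached with something like (not run here):
#   aws iam put-role-policy --role-name teleport-auth \
#       --policy-name write-only-logs \
#       --policy-document file://write_only_log_policy.json
```

Pairing this with S3 versioning or object lock on the bucket side would further guarantee that even a compromised auth server cannot tamper with logs already written.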
Ev: So a new question just popped up. Can you handle SCP with Teleport and audit that information with Panther? The answer is absolutely. scp is a standard SSH command, so the demo could have used scp, and you would have seen that. Also, I have a very interesting question. I like this one a lot because it points to the future, where humans should not be tinkering with infrastructure and robots should be managing it themselves. The question was asked during Jack’s portion of the demo, but I think it applies to both products. What’s the recommendation? How do we handle privileged or service accounts? How do we audit them? How do we provide credentials for them? Who wants to answer this one?
Gus: I mean, I can absolutely take that. So from the Teleport side, yeah, we absolutely support the use of service accounts. For example, say you have a CI server like Jenkins or Drone, and you want to allow it to have credentials to log into machines through Teleport, but using a limited-privilege role — you absolutely can do that as well. You can set up a limited role which only allows a Jenkins user access to a small subset of machines, for example, and then you can issue a slightly longer-lived certificate — a week, or 24 hours, or however long you like — and pass that to your CI server, Jenkins, Drone, whatever, or any other automated service that you want to use to log in to hosts. It comes out just like a private key file, but with a certificate attached to it. So you’re essentially using that embedded identity to connect as the automated process. And again, it’s subject to the same auditing and the same requirements as anything else that goes through Teleport. You can log the sessions. You can log the commands that were run. You can see everything that was done, and those go into the audit log the same way that an interactive session would, and then they can be exported out to another tool like Panther and have analysis run on them in the same way.
Gus: So again, with that centralized model — put all your connections through one place, my point two earlier — we encourage the same approach for CI and processes like that. And you can issue those tokens or credentials as often as you like. So rather than issuing a key and making it valid for a month, for example, I would recommend you set up a process to issue a certificate every 12 hours, every 6 hours, whatever — just refresh it in some kind of credential secret storage and have the CI server pick up the credentials whenever it needs to use them, and then they’ll be invalid after that period of time. That’s definitely a best practice in line, again, with point number one that I spoke about, the idea of using certificates rather than keys. Have a short-lived certificate and use that for authentication. It’s definitely much more secure that way.
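The rotation step Gus recommends can be sketched with plain `ssh-keygen` (names are illustrative; in practice your existing CA — or Teleport’s own tooling — would issue these certificates, and a scheduler would run the re-signing step):

```shell
# One-time setup (demo only): a CA and a CI service-account keypair
ssh-keygen -t ed25519 -f ci_ca -N "" -q
ssh-keygen -t ed25519 -f jenkins_key -N "" -q

# The part a scheduler would run every 12 hours: issue a fresh certificate
# valid for only 12 hours, restricted to the 'jenkins' principal, replacing
# the previous one. Old certificates simply age out -- nothing to revoke.
ssh-keygen -s ci_ca -I jenkins-ci -n jenkins -V +12h jenkins_key.pub

# Verify the identity and validity window embedded in the new certificate
ssh-keygen -L -f jenkins_key-cert.pub
```

The refreshed `jenkins_key-cert.pub` would then be pushed into secret storage for the CI server to pick up on its next run.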
Ev: Yep. Thank you, Gus. Also, there was a question about whether Teleport is AWS-dependent, and I think it probably applies to Panther as well. Teleport is actually a single binary. It’s literally just one file you download. You can run it on any infrastructure you want. Jack, what about Panther?
Jack: Yeah. So Panther is built on top of AWS serverless tech, so it’s very easy to deploy as well. We do everything in CloudFormation, and what you get is this massively scalable platform out of the box. It gets set up in about 20 minutes.
Ev: Awesome. Jack, there was another question for you here.
Jack: Yeah. I was actually going to say, there’s a question clarifying around how logs get into Panther.
Jack: Yeah, I can give some detail on that. In the more traditional sense, the way log aggregation used to work is you’d have syslog systems go to aggregators, which would go to something like Splunk or Elastic, right? But in a more cloud-native world, that doesn’t really exist the same way. Because we’re using tools like Fluentd and we can write directly to cloud services, the idea of a log server doesn’t exist anymore. So we put the data into things like S3 or SQS or EventBridge, or we just pull it directly from the API. And the way that that’s set up is with really granular access: we create IAM roles in the accounts that hold the data, which give Panther access to only the specific data needed. And then, as needed, it’s written into the buckets or into queues, we pull it out, we normalize it, and then it goes into the Panther data lake. And then also, on the theme of who’s privileged: in our enterprise version of Panther, we have RBAC. So you can create custom roles that give very specific privileges to your users. This is really helpful if you want to share the Panther dashboard out to other teams who may have caused those alerts. So multiple teams get value out of it, not just security teams.
Ev: Yep. Yep. And we have one last question that Gus would like to answer. The question goes like this: “Does anyone recommend port triggering or port knocking from the server side to open and close the SSH port for longer periods of time?”
Gus: Yeah. So I mean, port triggering or port knocking is an interesting concept. This is my personal opinion — take it with a grain of salt. But my opinion would be: in an environment where you have a traditional SSH server which might allow access via passwords or similar, port knocking can be very valuable. Because obviously, if you have an SSH server open on the internet, the moment it goes live, instantly you’ll have people trying to log into it, trying to spam password attempts, trying to do all those kinds of things. And there are various ways around that: implement firewall blocks, use things like Fail2ban or DenyHosts, those kinds of things. With something more like Teleport, where you’re pushing all your access through a bastion server, and that bastion server only allows authentication using certificates, I don’t think port knocking is as important as it would be in a password-based environment. I mean, people can knock on your server all they like; as long as they don’t have a valid certificate, they’re never going to get anywhere.
Gus: Now, this obviously doesn’t account for things like zero days. It doesn’t account for vulnerabilities discovered in either OpenSSH or Go SSH libraries or similar. But port knocking generally requires you to install extra software — a port-knocking tool or some kind of utility. Teleport doesn’t support it out of the box; there’s no ability to turn it on and use it. So it could be a tool that you add yourself. But my personal opinion, honestly, would be that I’m not sure it’s really so important when you have a system that only allows authentication by certificate. A certificate is a lot of data that an attacker would have to obtain to get access to anything, and I think the setup is relatively good without port knocking.
Recommended Next Steps
Ev: Thank you. So thank you, guys. And thank you, audience, for your fantastic questions. Looks like we are about done. So let’s thank our experts again for showing us the light, showing us how to do this properly. Everyone else, again, thank you for attending. And if you’d like to learn more about doing SSH properly, please check out our blog at gravitational.com/blog, and Gus’s famous article is on your screen, the URL. You can visit. It’s right there. Also, to learn more about Panther, go to https://runpanther.io. And on behalf of Panther and Gravitational, enjoy the rest of the day while you shelter in place and be safe.