{"_id":"55ca5587f3fb961900ef6672","initVersion":null,"user":{"_id":"55c50f4a7c199a2f00665cbf","username":"","name":"Buchi Reddy Busi Reddy"},"__v":0,"project":"55c505b41469ad2500fa2ab7","hidden":false,"createdAt":"2015-08-11T20:05:27.952Z","fullscreen":true,"htmlmode":false,"html":"","body":"[General](http://docs.neptune.io/page/frequently-asked-questions#general) \n[Getting Started](http://docs.neptune.io/page/frequently-asked-questions#getting-started)  \n[Data Model](http://docs.neptune.io/page/frequently-asked-questions#data-model)  \n[Rules](http://docs.neptune.io/page/frequently-asked-questions#rules) \n[Triggers](http://docs.neptune.io/page/frequently-asked-questions#triggers) \n[Actions](http://docs.neptune.io/page/frequently-asked-questions#actions) \n[Agent](http://docs.neptune.io/page/frequently-asked-questions#agent) \n[Security](http://docs.neptune.io/page/frequently-asked-questions#security) \n[Notifications](http://docs.neptune.io/page/frequently-asked-questions#notifications) \n[Import Export](http://docs.neptune.io/page/frequently-asked-questions#import-export) \n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"General\"\n}\n[/block]\n## **What is Neptune?**\n\nNeptune offers remediation-as-a-service for your DevOps teams. It fixes server and application alerts automatically for all your cloud or on-premise servers. Think of it as If This Then That for DevOps.\n\nNeptune seamlessly integrates with your existing monitoring and alerting tools such as Nagios, NewRelic, and PagerDuty, and lets you run automated actions in response to alerts. The actions can automatically fix the problems, auto-scale servers, or just collect diagnostics and snapshots to help you troubleshoot alerts faster. Previously, our founders architected auto-remediation tools for AWS to manage thousands of servers. Now we are bringing one such tool to everyone. \n\n## **What can I do with Neptune?**\n\n Neptune makes alert remediation fun and easy! 
\n\n You can start by simply hooking up your monitoring tool and installing the agent on a few servers. To remediate an alert, you can either pick and customize an industry best-practice runbook template from our public runbook library or write your own custom remediation script. \n\n Here are some sample use cases:\n * In response to a disk full alert, Neptune can automatically run a disk cleanup script.\n * In response to a high throughput or CPU alert, Neptune can automatically scale up servers or worker dynos to handle the increased load.\n * In response to a high memory alarm, Neptune can automatically capture a top snapshot and a JVM thread dump.\n * In response to an application error rate alert, Neptune can automatically email the runbook to the on-call engineer.\n\n## **What are the key benefits of Neptune?**\n\n1. Avoid outages and increase your uptime\nNeptune can fix problems even before your customers notice them. In fact, 50% of the time it fixes an alert before the alert even reaches your on-call engineer. So your uptime for apps and infrastructure increases dramatically. (This is how Amazon and Facebook achieve those 99.99999% availability levels.)\n\n2. Happier engineers \nEngineers are your most valuable resource. You can’t retain them if you make them do mundane maintenance and alert fixing at midnight. With Neptune, you can spare your engineers those 2:00 AM wake-up calls so that they stay productive and focus on the right things. \n\n3. Improve efficiency and streamline your IT operations\nEnsure knowledge doesn’t stay in the mind of a single engineer. Codify the remediation steps in a runbook, so that even a new engineer knows how to fix alerts on day one.\n\n## **How do I know if I need a product like Neptune?**\n\nYou can sign up for a free trial and get a super-quick FREE alert analytics report. This report will help you understand:\n\n1. Your alert patterns and statistics\n2. Top alerts that are occurring most frequently\n3. 
How much time your team spends resolving alerts (MTTR)\n\nEven if you get only a few alerts, you might be wasting valuable engineering resources on fixing alerts at midnight and on weekends. Secondly, your apps usually go down at critical moments when you can’t afford to be down for 1-2 hours. So you should think about automating with Neptune to increase your uptime and make your engineers more productive.\n\n## **Who will benefit from Neptune?**\n\nAny DevOps team managing a single server or thousands of servers. \n\nSpecifically, it’ll be most beneficial if you already have an Ops team in place for on-call rotation or are getting a ton of alerts per week.\n\n## **How do you run remediation actions? Do you use SSH or an agent-based approach?**\n\nWe don’t require SSH access; instead, we use an agent-based approach. \n\nWe offer two action channels:\n1. API actions: Calling cloud APIs via REST calls or CLI scripts.\n2. Agent-based actions: We require you to install an agent on a few of your servers. The agent need not be installed on all your servers. By default, the agent runs as a regular user (not as root). You can (1) change the run-as user for the agent, (2) configure specific permissions for the run-as user, and (3) put the agent in a specific user group. See the security FAQ.\n\n## **What kind of actions can I perform with Neptune?**\n\nNeptune supports three types of actions:\n1. Auto-fix-it actions: Automatically run a disk cleanup script in response to a disk full alert, or automatically scale up dynos whenever the load is too high.\n\n2. Collect snapshots or diagnostics actions (for root cause analysis): In response to a high memory alert, automatically capture thread dumps, heap dumps, top or process-level snapshots, and memory graph snapshots from monitoring tools.\n\n3. 
Email the playbook to the engineer on duty: Streamlines IT operations and makes onboarding new engineers fast and easy, since there is a runbook defined for every alert.\n\n## **Does Neptune replace my existing monitoring or alerting tools?**\n\nNo - Neptune operates like an add-on. It doesn’t replace your existing monitoring or alerting tools. Instead, it aims to integrate nicely with those tools to give you a seamless experience.\n\n## **What kind of infrastructure can I manage with Neptune?**\n\nNeptune can manage any server infrastructure, whether it’s in the cloud or on-premise (Linux or Windows). \n\nIt works seamlessly with many infrastructure providers including AWS, Rackspace, Heroku, OpenStack, Azure, Digital Ocean, Linode, Google Cloud, and SoftLayer.\n\n## **What monitoring tools do you currently support?**\n\nWe currently support NewRelic, DataDog, Nagios, Pingdom, Sensu, AWS CloudWatch, Scout, SignalFx, Zabbix and PagerDuty. \n\nWe are constantly adding more; contact us at [support](mailto:support@neptune.io) if you don’t see your tool listed here.\n\n\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Getting Started\"\n}\n[/block]\n## **How do I get started?**\n\nIt’s incredibly simple to set up, and you can get started in less than 15 minutes. \n\nStep 1. Add your monitoring tool’s read-only API keys.\nStep 2. Install a lightweight agent on a few servers (need not be all servers).\n\nThat’s it! See the getting started guide. \n\n## **I’m interested; can I get a quick demo?**\n\nSure - send a quick note to [support](mailto:support@neptune.io). We’ll be happy to show you a demo. \n\n## **Can I get someone on your team to help with my setup?**\n\nSure - send a quick note to [support](mailto:support@neptune.io). One of our team members will be happy to help. \n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Data Model\"\n}\n[/block]\n## **What is a Rule?**\nA rule is composed of a trigger and an action. 
Neptune will automatically perform the configured action whenever the trigger is raised. \n \n## **What is a Trigger?** \nA trigger is usually an alert coming from your existing monitoring tool. We also support custom triggers such as webhooks and scheduled cron triggers. \n\n## **What is an Action?** \nAn action is executed when a trigger is raised, as defined in the rule. We currently support various action types including Execute Script actions (on a single host or a cluster of hosts), REST API actions, CLI actions, and Email Runbook actions.\n\n## **What is an Agent?** \nAn agent is a piece of software that sits on a server and is responsible for executing scripts on that server and sending the results back to Neptune. The agent doesn’t require opening any incoming firewall ports – it only requires an outbound connection on port 443. \n\n## **What is an Alarm?** \nAn alarm is a violation condition defined in a monitoring tool that requires some human or machine intervention. For example, disk usage exceeds 99% or memory usage reaches 95%. The monitoring tool automatically generates an incident whenever an alarm is triggered. \n\n## **What is an Incident?** \nAn incident is an instance of an alarm configured in a monitoring tool. \n\n## **What is a Runbook?** \nA runbook (aka playbook) is a series of prescriptive steps to remediate an incident. \n\n## **What is an incident dashboard?** \nThe incident dashboard shows all open incidents and interesting analytics about resolved incidents. The analytics highlight the alerts that 1/ occur most often, and 2/ cause the most damage, i.e. have long resolution times (MTTR).\n\nFor open incidents, it provides an option to troubleshoot or fix them in real time. For resolved incidents that are repetitive and take a long time to fix, it provides a single-click option to automate them.\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Rules\"\n}\n[/block]\n## **Can I turn off a rule?**\n\nYes – you can turn off a rule in the rule dashboard. 
Similarly, you can turn it on again. If the rule is turned off, Neptune still captures the alarm but won’t perform any actions (so you still have the alarm history at your disposal).\n\n## **Can I limit the number of times an action is performed for a given rule?**\n\nYes. This addresses the concern of too much automation, where Neptune keeps running an action that is not really fixing the problem. If not prevented, this could sometimes cause more problems. Neptune allows you to configure these action limits in the rule configuration. Once the limits are reached, the default behavior is to escalate to a human. \n\nRule action limits are applied at the rule + host level. Let’s assume you have a disk space rule that cleans up disk for all your web servers, and that you’ve set the limit to 3 actions over 30 minutes. This means Neptune never executes the action more than 3 times in the last 30 minutes on a single host, while it could execute the same action more than 3 times in 30 minutes across distinct hosts. \n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Triggers\"\n}\n[/block]\n## **What types of triggers do you support?**\n\nA trigger is typically an alert coming from a monitoring tool. \n\nWe also support webhook triggers and cron triggers. Webhook triggers allow you to send custom JSON payloads. \n\n## **What is a webhook trigger?**\n\nWebhook triggers allow you to send a custom JSON payload to a URL. For example, you could send a simple HTTP POST with a JSON body to a URL to trigger a Neptune action. This allows you to standardize all such repetitive operations in one place. \n\nFor example: \nCalling the URL below could perform a restart Apache action on that host. 
\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"curl -X POST http://www.neptune.io/trigger/channel/webhook/api_key/trigger_guid -d '{ \\\"hostname\\\": \\\"hostxxx\\\", \\\"application\\\": \\\"apache-webserver\\\" }'\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\n## **What is a Cron Trigger?**\n\nA cron trigger allows you to run scheduled actions as a managed service. Your cron scripts are no longer tied to a single host; instead, they can run on any machine in a cluster. This is a dead simple way to eliminate the single point of failure for your cron jobs. For example, you can run a database backup job every day at midnight on any one of the database nodes. \n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Actions\"\n}\n[/block]\n## **What types of actions do you support?**\n\nWhenever a trigger is raised, Neptune runs the configured action. \n\nWe support various types of actions: 1/ Execute Script on a single host or a cluster of hosts (Auto-fix-it or Collect snapshots) 2/ REST API call 3/ CLI action 4/ Capture graph snapshots action 5/ Email Runbook action.\n\n## **What is an execute script action?**\n\nAn Execute Script action runs a script using the Neptune agent. We support two types of execute script actions: \n\na) Auto-fix-it action, e.g. perform disk cleanup automatically whenever the disk is full\nb) Collect snapshots/diagnostics action, e.g. capture a thread dump whenever memory utilization is too high\n\n## **What is a REST API action?**\n\nNeptune supports some basic REST API actions out of the box. \n\nWe currently support AWS EC2 API actions, including stopping, rebooting, terminating, and starting your EC2 instances. Azure integration is in progress. \n\n## **What is a CLI action?**\n\nNeptune lets you run typical CLI commands or scripts in response to a trigger. We currently support AWS and Heroku. Further support is being added for Digital Ocean and other cloud providers. 
For example, you can write a custom auto-scaling CLI action, and Neptune will run it automatically whenever the load is too high. \n\n## **What is a graph snapshot action?**\nNeptune can automatically capture graph snapshots for various metrics from your monitoring tool whenever an alert is raised. For example, whenever there is a high memory alarm, Neptune can automatically collect snapshot graphs from NewRelic for the three most important metrics that help you troubleshoot the problem faster: 1. Memory utilization 2. CPU load 3. Error rate. So you have all these graphs at your disposal right after the alarm is triggered.\n\n## **What is an Email Runbook action?**\n\nAn Email Runbook action simply emails a runbook or playbook to the on-call engineer to help resolve the alert. This bridges the gap between Dev and Ops teams and makes onboarding new engineers fast and easy because they don’t have to chase a bunch of wiki pages to figure out how to resolve an alarm. It’s already in their inbox as soon as an alert is fired.\n\n## **Where are Runbooks stored?**\n\nTwo options: 1. Store them in Neptune 2. Store them in your own GitHub repo. \n\nIf you are looking for simplicity, you can store runbooks on our platform. \n\nIf you are security-conscious, you can connect Neptune to your own private GitHub repo with read-only access to the runbooks.\n\n## **Do you support versioning for Runbooks?**\n\nYes – simply create a GitHub repo and give us a read-only API key to read those runbooks. You not only get versioning, but you can also do code reviews and all the fun stuff! \n\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Agent\"\n}\n[/block]\nPlease see [Agent FAQs](doc:agent-faqs)\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Security\"\n}\n[/block]\n## **How secure is Neptune?**\n\nNeptune is secure by default. 
\n\nFirstly, we don’t store any customer data except for metadata related to the alerts or incidents that we receive from your monitoring/alerting tools. We understand that alerts and remediation scripts may sometimes contain sensitive information about your IT operations. We take security very seriously and leverage industry-standard best practices to protect your data. For example, all communication happens only over an encrypted SSL channel. \n\nWe don’t require SSH access to your servers; instead, our architecture leverages an agent-based approach. By default, the agent runs as a regular user (not root), and you have full power to control exactly what commands or actions can be executed by the agent. We make this process simple and easy. \n\nFinally, each customer has a dedicated action queue, which no one else has access to. Neptune sends actions to the action queue, and an agent running on your server will execute an action only if that action is tagged for its host. Any communication between the agent and Neptune is authenticated via an API key and happens only over an encrypted SSL channel. Our agents don’t require you to open any ports in your firewall, and they only make outbound connections. See the security FAQ for more details. \n\n## **How reliable is Neptune?**\n\nBuilt by a team that architected and operated auto-remediation tools for Amazon Web Services, Neptune is designed to be highly scalable, available and reliable. We understand that we kick in when you are experiencing problems or outages. With that in mind, we’ve designed our architecture to be completely fault-tolerant so that we never miss an alarm. We also ensure there are no single points of failure in our system by introducing several levels of redundancy in our compute, storage, database and queuing layers.\n\n## **How do I control access to actions performed by Neptune? 
**\n\nYou have several knobs at your disposal:\n1) Restrict the agent to run as a specific user (you get this by default)\n2) Add the Neptune agent to a specific group to give it more permissions\n3) Run the agent itself as a different user that you already control\n4) Restrict the agent to run only specific commands \n\nSee the agent administration documentation for more info. \n\n## **What will you do with my read-only API keys from integration providers?**\n\nFirstly, we need only read-only API keys. In most cases, we use these keys only to perform REST API calls that load your existing alarms and incidents. This allows us to provide a seamless integration experience with the monitoring and alerting tools that you are already using.\n\nIn cases where we need more permissions (AWS IAM access keys), you can give controlled permissions to that application only. For example, if you want to do auto scaling, we will need keys just to perform auto-scaling actions on those services. But if you want to run custom scripts for automating alerts, read-only keys are more than sufficient.\n\n## **Is there a security white paper?**\n\nYes – contact us at [support](mailto:support@neptune.io).\n\n## **Is it possible for an attacker to perform a DDoS attack on webhook trigger endpoints?**\n\nWe have throttling mechanisms in place in our platform on a per-customer basis so that an attack won’t impact other customers. We’ll throttle anyone sending too many events into our system. In addition, we have an IP whitelisting feature, where Neptune will accept webhook alerts only from specific IP addresses. \n\n## **What happens if my Neptune API key is compromised?**\n\nYou’ll need to revoke the API key. We can quickly revoke the key and generate a new one for you. As soon as the key is revoked, an attacker will no longer be able to send events into Neptune.\n\nIn the worst case, an attacker can pump events into our system but won’t be able to control or change what actions will be performed. 
At any point, you’ll have full control over exactly which actions Neptune performs. Only those actions specified in your own GitHub runbooks can be executed. Also, no one can modify the content of the runbooks because you own them in your repository.\n\n## **How is GitHub-based runbook integration more secure?**\n\n1. Neptune has read-only access to the GitHub repo. So no one except you will be able to edit your runbooks. \n2. At most, Neptune or an attacker will only have access to the pointer to the GitHub runbook that needs to be executed on your servers. They will never be able to run something like “rm -rf” because you don’t have such a runbook in your GitHub repo!\n\nWhenever Neptune needs to perform an action, it refers directly to a pointer to a file stored in your GitHub repo, and the content is fetched from the repo before it’s executed. We’ll need read-only access to your GitHub repo to access your runbooks. This solves the problem for most customers. But if you are extremely paranoid about security, we also support, for a few customers, use cases where the entire repository is accessible only within your firewall.\n\n## **Can you support use cases where the entire runbook repository is stored and accessed within our firewall?**\nYes – this would require a custom setup. Contact us if you need this. \n\n## **Do you support SSO-based authentication?**\n\nYes – currently, we support Google OAuth and Okta. SAML 2.0 support is on our roadmap. Let us know if this does not fit the bill.\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Notifications\"\n}\n[/block]\n## **What notification channels do you support?**\n\nWe support Email and Slack notification channels. For each channel, you can configure Neptune to send notifications for all alerts, only upon errors, or not at all. 
\n\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Import Export\"\n}\n[/block]\n## **Can I import my existing alerts and incidents into your systems?**\n\nYes – it’s done automatically. As soon as you provide an API key, we automatically import your incidents and show you interesting analytics. \n\n## **Do you have a reporting SDK to export data out of Neptune?**\n\nNot yet - it’s on our roadmap. Let us know if you really need this, and we will be happy to make something work for you!","slug":"frequently-asked-questions","title":"Frequently Asked Questions"}
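
The rule + host action limit described under "Can I limit the number of times an action is performed for a given rule?" can be pictured as a sliding-window counter keyed by rule and host. The sketch below is only an illustration in Python, not Neptune's actual implementation; the `ActionLimiter` class, its `allow` method, and the rule and host names are all hypothetical.

```python
from collections import defaultdict, deque


class ActionLimiter:
    """Hypothetical sliding-window limit applied per (rule, host) pair."""

    def __init__(self, max_actions=3, window_seconds=30 * 60):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        # (rule, host) -> timestamps of actions that ran inside the window
        self._history = defaultdict(deque)

    def allow(self, rule, host, now):
        """Record the action and return True if the limit permits it.

        Returns False when the limit is reached, which is the point
        where the FAQ says the default is to escalate to a human.
        """
        runs = self._history[(rule, host)]
        # Drop timestamps that have aged out of the window.
        while runs and now - runs[0] >= self.window_seconds:
            runs.popleft()
        if len(runs) >= self.max_actions:
            return False
        runs.append(now)
        return True


# Up to 3 actions per 30 minutes, counted separately for every host.
limiter = ActionLimiter(max_actions=3, window_seconds=30 * 60)
decisions = [limiter.allow("disk-cleanup", "web-1", now=t) for t in (0, 60, 120, 180)]
other_host = limiter.allow("disk-cleanup", "web-2", now=180)
after_window = limiter.allow("disk-cleanup", "web-1", now=1800)
```

Here the fourth attempt on `web-1` is refused, while `web-2` is unaffected because the counter is kept per rule and host. Once the oldest run on `web-1` ages out of the 30-minute window, the action is allowed again.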

Frequently Asked Questions


[General](http://docs.neptune.io/page/frequently-asked-questions#general) [Getting Started](http://docs.neptune.io/page/frequently-asked-questions#getting-started) [Data Model](http://docs.neptune.io/page/frequently-asked-questions#data-model) [Rules](http://docs.neptune.io/page/frequently-asked-questions#rules) [Triggers](http://docs.neptune.io/page/frequently-asked-questions#triggers) [Actions](http://docs.neptune.io/page/frequently-asked-questions#actions) [Agent](http://docs.neptune.io/page/frequently-asked-questions#agent) [Security](http://docs.neptune.io/page/frequently-asked-questions#security) [Notifications](http://docs.neptune.io/page/frequently-asked-questions#notifications) [Import Export](http://docs.neptune.io/page/frequently-asked-questions#import-export) [block:api-header] { "type": "basic", "title": "General" } [/block] ## **What is Neptune?** Neptune offers remediation-as-a-service for your DevOps Teams. It fixes server and application alerts automatically for all your cloud or on-premise servers. Think of it as If this Then That for DevOps. Neptune seamlessly integrates with your existing monitoring and alerting tools like Nagios, NewRelic, PagerDuty etc., and lets you run automated actions in response to alerts. The actions can either automatically fix the problems, auto-scale #servers, or just collect diagnostics and snapshots to help you troubleshoot alerts faster. Previously, our founders architected auto-remediation tools for AWS to manage thousands of servers. Now, we are bringing one such tool for everyone. ## **What can I do with Neptune?** Neptune makes alert remediation fun and easy! You can start by simply hooking up your monitoring tool and installing the agent on a few servers. To remediate an alert, you can either pick and customize an industry best practice runbook template from our public runbook library or write your own custom remediation script. 
Here are some sample use cases: * In response to a disk full alert, Neptune can automatically run a disk cleanup script. * In response to a high throughput or a CPU alert, Neptune can automatically scale up servers or worker dynos to handle increased load * In response to a high memory alarm, Neptune can automatically capture top snapshot, and a JVM thread dump * In response to an application error rate alert, Neptune can automatically email the runbook to the on-call engineer on duty ## **What are key benefits of Neptune? ** 1. Avoid outages and increase your uptime Neptune can fix problems even before your customers notice them. In fact it fixes the alerts even before it reaches your on-call engineer 50% of the time. So your uptime for apps and infrastructure increases drastically (This is how Amazon and Facebook achieve those 99.99999% availability levels) 2. Happier engineers Engineers are your most valuable resource. You can’t retain them if you make them work on mundane maintenance and alert fixing work at midnight. With Neptune, you can avoid those 2:00 AM wake up calls for your engineers so that they stay productive and focus on right things. 3. Improve IT Operations efficiency and streamline your IT operations Ensure knowledge doesn’t stay in the mind of a single engineer. Codify the remediation steps in a runbook, so that even a new engineer knows how to fix alerts on day one ## **How do I know if I need a product like Neptune?** You can signup for a free trail and get a super-quick FREE alert analytics report. This report will help you understand 1. Your alert patterns and statistics 2. Top alerts that are occurring most frequently 3. How much time is your team spending on resolving alerts (MTTR) Even if you get less number of alerts, you might be wasting your valuable engineering resources to fix alerts at midnight and on weekends. Secondly, your apps usually go down at critical moments when you can’t afford to be down for 1-2 hours. 
So you should think about automating with Neptune to increase your uptime and make your engineers more productive. ## **Who will benefit from Neptune?** Any DevOps team managing a single server or thousands of servers. Specifically, it’ll be most beneficial if you already have an Ops team in place for on-call rotation or getting a ton of alerts per week. ## **How do you run remediation actions? Do you use SSH or Agent-based approach?** We don’t require SSH-Access, instead we use agent-based approach. We offer two action channels: 1. API actions: Calling cloud APIs via REST calls or CLI scripts 2. Agent based actions: We require you to install an agent on few of your servers. Agent need not installed on all your servers. Agent by default will run as a regular user (not as root user). You can 1. Change the run-as-user for the agent 2. Configure specific permissions of the run-as-user 3. Put the agent in specific user group. See security FAQ. ## **What kind of actions can I perform with Neptune?** Neptune supports three types of actions: 1. Auto-fix-it actions: Automatically run a disk cleanup script in response to a disk full alert, automatically scale up dynos whenever the load is too high. 2. Collect snapshots or diagnostics actions (for root cause analysis): In response to high memory alert, automatically capture snapshots thread dumps, heap dumps, top or process-level snapshots, memory graph snapshots from monitoring tools 3. Email the playbook the engineer on duty action: Streamlines IT operations and makes on-boarding new engineers fast and easy since there is a runbook defined in place for every alert. ## **Does Neptune replace my existing monitoring or alerting tools?** No - Neptune operates like an add-on. It doesn’t replace your existing monitoring or alerting tools. Instead, it is aimed to integrate nicely with those tools to give you a seamless experience. 
## **What kind of infrastructure can I manage with Neptune?** Neptune can manage any server infrastructure, whether it’s on cloud or on-premise (Linux or Windows). It seamlessly works with many infrastructure providers including AWS, Rackspace, Heroku, OpenStack, Azure, Digital Ocean, Linode, Google Cloud, and SoftLayer. ## **What monitoring tools do you currently support?** We currently support NewRelic, DataDog, Nagios, Pingdom, Sensu, AWS CloudWatch, Scout, SignalFx, Zabbix and PagerDuty. We are constantly adding more, contact us at [support](mailto:support@neptune.io) if you don’t see your tool listed here. [block:api-header] { "type": "basic", "title": "Getting Started" } [/block] ## **How do I get started?** It’s incredibly simple to setup and you can get started in less than 15 min. Step 1. Add your monitoring tool’s read-only API keys Step 2. Install a lightweight agent on few servers (need not be all servers). That’s it! See getting started guide. ## **I’m interested; can I get a quick demo?** Sure - send a quick note to [support](mailto:support@neptune.io). We’ll be happy to show you a demo. ## Can I get someone on your team to help with my setup? Sure - send a quick note to [support](mailto:support@neptune.io). One of our team members will be happy to help. [block:api-header] { "type": "basic", "title": "Data Model" } [/block] ## ** What is a Rule? ** Rule is composed of a trigger and an action. Neptune will automatically perform the configured action whenever trigger is raised. ## ** What is a Trigger? ** Trigger is usually an alert coming from your existing monitoring tool. We also support custom triggers like webhooks or scheduled cron triggers. ## ** What is an Action? ** An Action is executed when a trigger gets raised as defined in the rule. We currently support various actions types including Execute Script Action (on a single host or cluster of hosts), REST API Action, CLI Action, and Email Runbook actions. ## ** What is an Agent? 
** Agent is a piece of software that sits on a server and is responsible for executing scripts on that server and sending the results back to Neptune. Agent doesn’t require opening any incoming firewall ports – it only requires outbound connection for port 443. ## ** What is an Alarm? ** Alarm is a violation condition defined in monitoring tool that requires some human or machine intervention. For example, disk exceeds 99%, memory is 95% etc. Monitoring tool automatically generates an incident whenever alarm is triggered. ## ** What is an Incident? ** Incident is an instance of an alarm configured in monitoring tool. ## ** What is a Runbook? ** Runbook (aka playbook) is a series of prescriptive steps to remediate an incident. ## ** What is an incident dashboard? ** Incident dashboard shows all the open incidents and interesting analytics about resolved incidents. The analytics include alerts 1/ that are occurring most often, and 2/ that are causing most damage aka long resolution times (MTTR). For open incidents, it provides an option to troubleshoot or fix the open incidents in real time. For resolved incidents that repetitive and are taking long time to fix, it provides a single click option to automate them. [block:api-header] { "type": "basic", "title": "Rules" } [/block] ## **Can I turn off a rule?** Yes – you can turn off a rule in rule dashboard. Similarly, you can turn it on again. If the rule is turned off, Neptune still captures the alarm but it won’t perform any actions (so you’ve alarm history at your disposal). ## **Can l limit the number of times an action is performed for a given rule?** Yes. This avoids too much automation concern where Neptune is not really fixing the problem. Sometimes if not prevented, this could cause more problems. Neptune allows you to configure these action limits in the rule configuration. Once the limits are reached, the default behavior is to escalate to human. Rule action limits are applied at the rule + host level. 
Let’s assume you’ve a disk space rule that cleans up disk for all your web servers. Assume you’ve set up limit as up to 3 actions over 30 min. This means Neptune never executes an action more than 3 times in last 30 min for a single host, while it could execute the same action more than 3 actions in 30 min on distinct hosts. [block:api-header] { "type": "basic", "title": "Triggers" } [/block] ## **What types of triggers do you support?** Trigger is typically an alert coming from monitoring tool. We also support webhook triggers and cron triggers. Webhook triggers allow you to send custom JSONs. ## **What is a webhook trigger?** Webhook triggers allow you to send a custom JSON as a part of URL. For example, you could send a simple HTTP post with JSON body to a URL to trigger a Neptune action. This allows you to standardize all such repetitive operations in one place. For example: Calling below URL could perform restart apache action on that host. [block:code] { "codes": [ { "code": "curl -X POST http://www.neptune.io/trigger/channel/webhook/api_key/trigger_guid \n-d '{ “hostname”: “hostxxx”, “application”: “apache-webserver” }'", "language": "shell" } ] } [/block] ## **What is a Cron Trigger?** Cron trigger allows you to run scheduled actions as a managed service. So, your cron scripts are not isolated to single host instead they can be run on any machine in a cluster. This is dead simple option to eliminate single point failure for your cron jobs. For example, you can run database backup job every day at midnight on any one of the database nodes. [block:api-header] { "type": "basic", "title": "Actions" } [/block] ## **What types of actions do you support?** Whenever trigger is raised, Neptune runs the configured action. We support various types of actions: 1/ Execute Script on a single host or cluster of hosts (Auto-fix-it or Collect snapshots) 2/ REST API call 3/ CLI action 4/ Capture graph snapshots actions 5/ Email runbook action. 
## **What is an execute script action?**

An Execute Script action runs a script using the Neptune agent. We support two types of execute script actions:

* Auto-fix-it action, e.g. perform disk cleanup automatically whenever the disk is full
* Collect snapshots/diagnostics action, e.g. capture a thread dump whenever memory utilization is too high

## **What is a REST API action?**

Neptune supports some basic REST API actions out of the box. We currently support AWS EC2 API actions, including stopping, rebooting, terminating and starting your AWS EC2 instances. Azure integration is in progress.

## **What is a CLI action?**

Neptune lets you run typical CLI commands or scripts in response to a trigger. We currently support AWS and Heroku. Support for Digital Ocean and other cloud providers is being added. For example, whenever the load is too high, you can write a custom auto-scaling CLI action and Neptune will run that action automatically.

## **What is a graph snapshot action?**

Neptune can automatically capture graph snapshots for various metrics from your monitoring tool whenever an alert is raised. For example, whenever there is a high memory alarm, Neptune can automatically collect snapshot graphs from NewRelic for the three most important metrics that you use to troubleshoot the problem:

1. Memory utilization
2. CPU load
3. Error rate

So you have all these graphs at your disposal right after the alarm is triggered.

## **What is an Email Runbook action?**

The Email Runbook action simply emails a runbook (or playbook) to the on-call engineer to resolve the alert. This bridges the gap between Dev and Ops teams and makes onboarding new engineers fast and easy, because they don’t have to chase a bunch of wiki pages to figure out how to resolve an alarm. The runbook is already in their inbox as soon as an alert is fired.

## **Where are Runbooks stored?**

Two options:

1. Store in Neptune
2. Store in your own GitHub repo

If you are looking for simplicity, you can store runbooks on our platform. If you are conscious about security, you can connect Neptune to your own private GitHub repo with read-only access to runbooks.

## **Do you support versioning for Runbooks?**

Yes – simply create a GitHub repo and give us a read-only API key to read those runbooks. You not only get versioning, but you can also do code reviews and all the fun stuff!
[block:api-header]
{
  "type": "basic",
  "title": "Agent"
}
[/block]
Please see [Agent FAQs](doc:agent-faqs)
[block:api-header]
{
  "type": "basic",
  "title": "Security"
}
[/block]
## **How secure is Neptune?**

Neptune is secure by default. First, we don’t store any customer data except for metadata related to the alerts or incidents that we receive from your monitoring/alerting tools. We understand that alerts and remediation scripts may sometimes contain sensitive information about your IT operations. We take security very seriously and leverage industry-standard best practices to protect your data. For example, all communication happens only over an encrypted SSL channel. We don’t require SSH access to your servers; instead, our architecture uses an agent-based approach. By default, the agent runs as a regular user (not root), and you have full power to control exactly what commands or actions the agent can execute. We make this process simple and easy.

Finally, each customer has a dedicated action queue, which no one else has access to. Neptune sends actions to this queue, and an agent running on your server executes an action if it is tagged for that host. All communication between the agent and Neptune is authenticated via an API key and happens only over an encrypted SSL channel. Our agents don’t require you to open any ports in your firewall; they only make outbound connections. See the security FAQ for more details.
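
The host tagging and command restrictions described above can be pictured with a small Python sketch. It is a simplified illustration under stated assumptions, not the agent's real code: the `QueuedAction` fields, the `actions_for_host` helper, and the host and script names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedAction:
    host: str     # hypothetical field: host the action is tagged for
    command: str  # hypothetical field: command the action wants to run


def actions_for_host(queue, hostname, allowed_commands):
    """Keep only actions tagged for this host whose command is on the
    locally configured allowlist."""
    return [
        action
        for action in queue
        if action.host == hostname and action.command in allowed_commands
    ]


# Assumed contents of one customer's action queue.
queue = [
    QueuedAction(host="web-1", command="cleanup_disk.sh"),
    QueuedAction(host="web-2", command="cleanup_disk.sh"),
    QueuedAction(host="web-1", command="rm -rf /"),
]
runnable = actions_for_host(queue, "web-1", allowed_commands={"cleanup_disk.sh"})
# Only the cleanup_disk.sh action tagged for web-1 remains in `runnable`.
```

In this sketch both checks happen on the agent side, so an action tagged for another host, or a command you never allowed, is simply ignored.
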
## **How reliable is Neptune?**

Built by a team that architected and operated auto-remediation tools for Amazon Web Services, Neptune is designed to be highly scalable, available and reliable. We understand that we kick in when you are experiencing problems or outages. With that in mind, we’ve designed our architecture to be completely fault-tolerant so that we never miss an alarm. We also ensure there are no single points of failure in our system by introducing several levels of redundancy in our compute, storage, database and queuing layers.

## **How do I control access to actions performed by Neptune?**

You have several knobs at your disposal:

1. Restrict the agent to run as a specific user (you get this by default)
2. Add the Neptune agent to a specific group to give it more permissions
3. Run the agent itself as a different user that you already control
4. Restrict the agent to only run specific commands

See the agent administration documentation for more info.

## **What will you do with my read-only API keys from integration providers?**

First, we only need read-only API keys. In most cases, we use these keys to perform REST API calls that load your existing alarms and incidents only. This allows us to provide a seamless integration experience with the monitoring and alerting tools that you are already using. In cases where we need more permissions (e.g. AWS IAM access keys), you can give controlled permissions to that application only. For example, if you want to do auto scaling, we will need keys just to perform auto-scaling actions on those services. But if you want to run custom scripts for automating alerts, read-only keys are more than sufficient.

## **Is there a security white paper?**

Yes – contact us at [support](mailto:support@neptune.io).

## **Is it possible for an attacker to perform a DDoS attack on webhook trigger endpoints?**

We have throttling mechanisms in place in our platform on a per-customer basis so that an attack won’t impact other customers.
We’ll throttle anyone sending too many events into our system. In addition, we have an IP whitelisting feature, where Neptune will only accept webhook alerts coming from specific IP addresses.

## **What happens if the Neptune API key is compromised?**

You’ll need to revoke the API key. We can quickly revoke the key and generate a new one for you. As soon as the key is revoked, an attacker will no longer be able to send events into Neptune. In the worst case, an attacker can pump events into our system but won’t be able to control or change what actions are performed. At any point, you have full control over exactly which actions Neptune performs. Only those actions specified in your own GitHub runbooks can be executed. Also, no one can modify the content of the runbooks because you own them in your repository.

## **How is GitHub-based runbook integration more secure?**

1. Neptune has read-only access to your GitHub repo. So no one except you will be able to edit your runbooks.
2. At best, Neptune or an attacker will only have access to the pointer to the GitHub runbook that needs to be executed on your servers. They will never be able to run stuff like “rm -rf” because you don’t have such a runbook in your GitHub repo!

Whenever Neptune needs to perform an action, it refers directly to a pointer to a file stored in your GitHub repo, and the content is fetched from the repo before it’s executed. We’ll need read-only access to your GitHub repo to access your runbooks. This solves the problem for most customers. But if you are extremely paranoid about security, we also support, for a few customers, use cases where the entire repository is accessible only within your firewall.

## **Can you support use cases where the entire runbook repository is stored and accessed within our firewall?**

Yes – this would require a custom setup. Contact us if you need this.

## **Do you support SSO-based authentication?**

Yes – currently, we support Google OAuth and Okta.
SAML 3.0 support is on our roadmap. Let us know if this does not fit the bill.
[block:api-header]
{
  "type": "basic",
  "title": "Notifications"
}
[/block]
## **What notification channels do you support?**

We support Email and Slack notification channels. For each channel, you can configure Neptune to send notifications for all alerts, to send them upon errors only, or to disable notifications entirely.
[block:api-header]
{
  "type": "basic",
  "title": "Import Export"
}
[/block]
## **Can I import my existing alerts and incidents into your systems?**

Yes – it’s done automatically. As soon as you provide an API key, we automatically import your incidents and show you interesting analytics.

## **Do you have a reporting SDK to export data out of Neptune?**

Not yet – it’s on our roadmap. Let us know if you really need this, and we will be happy to make something work for you!