{"__v":19,"_id":"55d7c2a59510f00d007ec6ff","category":{"__v":3,"_id":"55d7c2939510f00d007ec6fe","pages":["55d7c2a59510f00d007ec6ff","55d7c45f9510f00d007ec705","55e3ddd48435a40d0025f009"],"project":"55c505b41469ad2500fa2ab7","version":"55d3b644f77e6d0d00b1b273","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2015-08-22T00:30:11.015Z","from_sync":false,"order":5,"slug":"runbooks","title":"Runbooks"},"parentDoc":null,"project":"55c505b41469ad2500fa2ab7","user":"55d3eb3196dc260d00cdba70","version":{"__v":6,"_id":"55d3b644f77e6d0d00b1b273","project":"55c505b41469ad2500fa2ab7","createdAt":"2015-08-18T22:48:36.632Z","releaseDate":"2015-08-18T22:48:36.632Z","categories":["55d3b645f77e6d0d00b1b274","55d3b645f77e6d0d00b1b275","55d3b645f77e6d0d00b1b276","55d3b645f77e6d0d00b1b277","55d3b645f77e6d0d00b1b278","55d3b645f77e6d0d00b1b279","55d3b645f77e6d0d00b1b27a","55d3b645f77e6d0d00b1b27b","55d3b645f77e6d0d00b1b27c","55d3b645f77e6d0d00b1b27d","55d7c2939510f00d007ec6fe","56fac9925df15a20002972a2","56fb2f7668e1d30e00a0b672","583498d411e8af2500f6b334","58e52a180ab7b03b00f4a97a"],"is_deprecated":false,"is_hidden":false,"is_beta":true,"is_stable":true,"codename":"","version_clean":"1.1.0","version":"1.1"},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2015-08-22T00:30:29.186Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":0,"body":"Our philosophy is that a runbook is the key bridge between Dev and Ops. Today, developers write code, test, deploy, and then move on to another project or feature. The entire burden of setting up alarms and troubleshooting them falls squarely on the Ops team. So when an alert triggers, many Ops teams fumble in the dark without the right context around the alert. 
For lack of knowledge, restarting the server or process is often the first option; in other situations, incidents are escalated to Dev teams immediately.\n\nThe situation gets worse when a new engineer comes on board and has to troubleshoot issues. Likewise, in vacation-coverage situations, the on-call engineer often lacks a good understanding of the troubleshooting process.\n\nThis is where a *runbook* can help immensely:\n1. It acts as a bridge between Dev and Ops: both teams set up alarms together, and Dev teams provide input on the conditions that can trigger those alarms and on which key metrics or diagnostics should be collected to troubleshoot incidents.\n\n2. We have spoken with over 200 DevOps teams and collected best-practice runbooks for various alert and automation use cases. We open-sourced these runbooks in our public [GitHub](https://github.com/neptuneio/Community-runbooks) repository. We encourage you to use them and customize them to suit your needs. *Please contribute to the public runbooks by sending a pull request anytime.*\n\nIf you want to manage your runbooks on GitHub, find the instructions here: [Private GitHub runbooks](doc:private-github-runbooks)\n\n## Add a new runbook\n\nThe only way to create a runbook on Neptune is while creating a rule. In the Action part of rule creation, you can write the script in the editor and save it under a name of your choice. The runbook is executed whenever the rule is triggered. The saved runbook is also available for linking to other rules. 
\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/VNhVlcqHTCyJ79gWMONO_Screen%20Shot%202015-09-02%20at%2016.29.30.png\",\n        \"Screen Shot 2015-09-02 at 16.29.30.png\",\n        \"1389\",\n        \"700\",\n        \"#824c06\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n## Link an existing runbook\n\nYou can link an existing runbook to a rule by selecting it from the drop-down.\nYou can then make changes to the script and either save a copy or overwrite the existing file.\n[block:image]\n{\n  \"images\": [\n    {\n      \"caption\": \"Select public runbook\",\n      \"image\": [\n        \"https://files.readme.io/uyTnYzPFQ9WOF23OcvdB_select_runbook.png\",\n        \"select_runbook.png\",\n        \"1378\",\n        \"733\",\n        \"#3b74d3\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n## Versioning Runbooks\n\nBy using a source control system such as GitHub, you get the added benefit of versioning your runbooks, so you can rest assured that your engineers execute *only vetted runbooks* rather than ad hoc ones.\n\nNeptune integrates with your existing runbook repositories on GitHub or GitHub Enterprise. To load runbooks from GitHub, refer [here](http://docs.neptune.io/docs/private-github-runbooks).\nTo load runbooks from GitHub Enterprise, refer [here](http://docs.neptune.io/docs/private-github-enterprise-runbooks).\n\n## Language support\n\nRunbooks can be written in any language, including shell/bash, Python, Ruby, Perl, and PowerShell. As long as the host on which the agent is installed can execute those commands, the Neptune agent can execute the scripts.\n\nWhen writing a script in a specific language, make sure its first line is an appropriate shebang, for example `#!/bin/bash` or `#!/bin/python` (adjust the path to wherever the interpreter lives on your host). 
\n\n## Using variables in Runbooks\n\nNeptune exposes some variables that can be used in your runbooks. It parses the alerts from various monitoring and alerting tools and makes certain variables available by default. While we don't expose every field of the incoming incident, we expose most of the popular ones as variables. If you don't see a variable that you want to use, just let us know. We'll be happy to expose it, as we are constantly adding new fields. The key idea is that once a variable (VAR) is exposed, you can use it inside the runbook as \"$VAR\", and $VAR will be replaced with the appropriate value.\n\nThe variables exposed by Neptune vary by monitoring tool. The screenshots below give the exhaustive list of variables Neptune exposes for each tool.\n\n**NewRelic**\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/fe61d78-Screen_Shot_2017-03-08_at_3.50.10_PM.png\",\n        \"Screen Shot 2017-03-08 at 3.50.10 PM.png\",\n        2382,\n        600,\n        \"#ececec\"\n      ]\n    }\n  ]\n}\n[/block]\n**Librato**\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/ede79f2-Screen_Shot_2017-03-08_at_3.50.47_PM.png\",\n        \"Screen Shot 2017-03-08 at 3.50.47 PM.png\",\n        2380,\n        766,\n        \"#ececec\"\n      ]\n    }\n  ]\n}\n[/block]\n**PaperTrail**\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/a35c5e2-Screen_Shot_2017-03-08_at_3.54.18_PM.png\",\n        \"Screen Shot 2017-03-08 at 3.54.18 PM.png\",\n        2380,\n        760,\n        \"#eeeeee\"\n      ]\n    }\n  ]\n}\n[/block]\n**LogEntries**\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/0780964-Screen_Shot_2017-03-08_at_3.51.30_PM.png\",\n        \"Screen Shot 2017-03-08 at 3.51.30 PM.png\",\n        2370,\n        702,\n        \"#ededed\"\n      ]\n   
 }\n  ]\n}\n[/block]\n**DataDog**\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/86babc2-Screen_Shot_2017-03-08_at_3.54.18_PM.png\",\n        \"Screen Shot 2017-03-08 at 3.54.18 PM.png\",\n        2380,\n        760,\n        \"#eeeeee\"\n      ]\n    }\n  ]\n}\n[/block]\n**Pingdom**\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/f212624-Screen_Shot_2017-03-08_at_3.54.35_PM.png\",\n        \"Screen Shot 2017-03-08 at 3.54.35 PM.png\",\n        2368,\n        736,\n        \"#efefef\"\n      ]\n    }\n  ]\n}\n[/block]\n**Nagios**\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/ab44c93-Screen_Shot_2017-03-08_at_3.55.01_PM.png\",\n        \"Screen Shot 2017-03-08 at 3.55.01 PM.png\",\n        2368,\n        664,\n        \"#ececec\"\n      ]\n    }\n  ]\n}\n[/block]\nIf this list is missing a variable you are looking for, just send a note to [support:::at:::neptune.io](mailto:support@neptune.io).","excerpt":"","slug":"runbooks-overview","type":"basic","title":"Overview"}
Our philosophy is that a runbook is the key bridge between Dev and Ops. Today, developers write code, test, deploy, and then move on to another project or feature. The entire burden of setting up alarms and troubleshooting them falls squarely on the Ops team. So when an alert triggers, many Ops teams fumble in the dark without the right context around the alert. For lack of knowledge, restarting the server or process is often the first option; in other situations, incidents are escalated to Dev teams immediately.

The situation gets worse when a new engineer comes on board and has to troubleshoot issues. Likewise, in vacation-coverage situations, the on-call engineer often lacks a good understanding of the troubleshooting process.

This is where a *runbook* can help immensely:
1. It acts as a bridge between Dev and Ops: both teams set up alarms together, and Dev teams provide input on the conditions that can trigger those alarms and on which key metrics or diagnostics should be collected to troubleshoot incidents.

2. We have spoken with over 200 DevOps teams and collected best-practice runbooks for various alert and automation use cases. We open-sourced these runbooks in our public [GitHub](https://github.com/neptuneio/Community-runbooks) repository. We encourage you to use them and customize them to suit your needs. *Please contribute to the public runbooks by sending a pull request anytime.*

If you want to manage your runbooks on GitHub, find the instructions here: [Private GitHub runbooks](doc:private-github-runbooks)

## Add a new runbook

The only way to create a runbook on Neptune is while creating a rule. In the Action part of rule creation, you can write the script in the editor and save it under a name of your choice. The runbook is executed whenever the rule is triggered. The saved runbook is also available for linking to other rules.
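
To make the idea of "writing the script in the editor" more concrete, here is a minimal sketch of what a diagnostic runbook could look like. It is an illustration only: the script and the helper names `collect_diagnostics` and `format_report` are hypothetical and are not taken from Neptune or its community repository. It assumes a Python 3 interpreter on the host and uses only the standard library.

```python
#!/usr/bin/env python3
"""Hypothetical diagnostic runbook: gather basic host context for an alert."""
import os
import platform
import shutil


def collect_diagnostics(path="/"):
    """Return a few basic host facts that give the on-call engineer context."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    used_percent = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "host": platform.node(),
        "disk_used_percent": used_percent,
        "cpu_count": os.cpu_count(),
    }


def format_report(diagnostics):
    """Render the diagnostics as 'key: value' lines, sorted by key."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(diagnostics.items()))


if __name__ == "__main__":
    # Print the report so it ends up in the runbook's output.
    print(format_report(collect_diagnostics()))
```

A script like this only reads state and prints it, which makes it a low-risk first runbook to attach to a rule before moving on to scripts that restart or reconfigure anything.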
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/VNhVlcqHTCyJ79gWMONO_Screen%20Shot%202015-09-02%20at%2016.29.30.png", "Screen Shot 2015-09-02 at 16.29.30.png", "1389", "700", "#824c06", "" ] } ] }
[/block]
## Link an existing runbook

You can link an existing runbook to a rule by selecting it from the drop-down.
You can then make changes to the script and either save a copy or overwrite the existing file.
[block:image]
{ "images": [ { "caption": "Select public runbook", "image": [ "https://files.readme.io/uyTnYzPFQ9WOF23OcvdB_select_runbook.png", "select_runbook.png", "1378", "733", "#3b74d3", "" ] } ] }
[/block]
## Versioning Runbooks

By using a source control system such as GitHub, you get the added benefit of versioning your runbooks, so you can rest assured that your engineers execute *only vetted runbooks* rather than ad hoc ones.

Neptune integrates with your existing runbook repositories on GitHub or GitHub Enterprise. To load runbooks from GitHub, refer [here](http://docs.neptune.io/docs/private-github-runbooks).
To load runbooks from GitHub Enterprise, refer [here](http://docs.neptune.io/docs/private-github-enterprise-runbooks).

## Language support

Runbooks can be written in any language, including shell/bash, Python, Ruby, Perl, and PowerShell. As long as the host on which the agent is installed can execute those commands, the Neptune agent can execute the scripts.

When writing a script in a specific language, make sure its first line is an appropriate shebang, for example `#!/bin/bash` or `#!/bin/python` (adjust the path to wherever the interpreter lives on your host).

## Using variables in Runbooks

Neptune exposes some variables that can be used in your runbooks. It parses the alerts from various monitoring and alerting tools and makes certain variables available by default. While we don't expose every field of the incoming incident, we expose most of the popular ones as variables.
If you don't see a variable that you want to use, just let us know. We'll be happy to expose it, as we are constantly adding new fields. The key idea is that once a variable (VAR) is exposed, you can use it inside the runbook as "$VAR", and $VAR will be replaced with the appropriate value.

The variables exposed by Neptune vary by monitoring tool. The screenshots below give the exhaustive list of variables Neptune exposes for each tool.

**NewRelic**
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/fe61d78-Screen_Shot_2017-03-08_at_3.50.10_PM.png", "Screen Shot 2017-03-08 at 3.50.10 PM.png", 2382, 600, "#ececec" ] } ] }
[/block]
**Librato**
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/ede79f2-Screen_Shot_2017-03-08_at_3.50.47_PM.png", "Screen Shot 2017-03-08 at 3.50.47 PM.png", 2380, 766, "#ececec" ] } ] }
[/block]
**PaperTrail**
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/a35c5e2-Screen_Shot_2017-03-08_at_3.54.18_PM.png", "Screen Shot 2017-03-08 at 3.54.18 PM.png", 2380, 760, "#eeeeee" ] } ] }
[/block]
**LogEntries**
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/0780964-Screen_Shot_2017-03-08_at_3.51.30_PM.png", "Screen Shot 2017-03-08 at 3.51.30 PM.png", 2370, 702, "#ededed" ] } ] }
[/block]
**DataDog**
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/86babc2-Screen_Shot_2017-03-08_at_3.54.18_PM.png", "Screen Shot 2017-03-08 at 3.54.18 PM.png", 2380, 760, "#eeeeee" ] } ] }
[/block]
**Pingdom**
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/f212624-Screen_Shot_2017-03-08_at_3.54.35_PM.png", "Screen Shot 2017-03-08 at 3.54.35 PM.png", 2368, 736, "#efefef" ] } ] }
[/block]
**Nagios**
[block:image]
{ "images": [ { "image": [ "https://files.readme.io/ab44c93-Screen_Shot_2017-03-08_at_3.55.01_PM.png", "Screen Shot 2017-03-08 at 3.55.01 PM.png", 2368, 664, "#ececec" ] } ] }
[/block]
If this list is missing a variable you are
looking for, just send a note to [support@neptune.io](mailto:support@neptune.io).
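
The "$VAR" replacement described under "Using variables in Runbooks" can be pictured with a small sketch. This is not Neptune's implementation, only an illustration of the behaviour using Python's standard `string.Template`, which happens to use the same `$NAME` placeholder syntax. The helper `render_runbook` and the variable names `ALERT_NAME`, `HOSTNAME`, and `NOT_EXPOSED` are hypothetical; the real variable names for each tool are the ones shown in the screenshots above.

```python
from string import Template


def render_runbook(script_text, alert_fields):
    """Replace $VAR placeholders with alert values.

    Placeholders with no matching field are left untouched, which mimics
    a variable that has not been exposed.
    """
    return Template(script_text).safe_substitute(alert_fields)


# Hypothetical runbook body and alert fields, for illustration only.
runbook = 'echo "Alert $ALERT_NAME fired on $HOSTNAME"; echo "$NOT_EXPOSED"'
fields = {"ALERT_NAME": "HighCPU", "HOSTNAME": "web-01"}

print(render_runbook(runbook, fields))
```

Here `$ALERT_NAME` and `$HOSTNAME` are filled in from the alert, while `$NOT_EXPOSED` stays as written because no value was supplied for it. How Neptune itself treats an unknown variable is not stated in this page, so treat that part as an assumption of the sketch.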