
The ELK Stack as an Apache Log Analyzer


It’s no secret that Apache is the most popular web server in use today. Netcraft has Apache usage at 47.8% as of February 2015, and according to w3techs, Apache is used by 55% of all of the websites they monitor (with NGINX trailing behind at 27%).

Why is Apache so popular? It’s free and open source – and open source is becoming vastly more popular than proprietary software. It’s maintained by a community of dedicated developers, it provides strong security, it is well suited to small and large websites alike, it can be easily set up on all major operating systems, and it is extremely powerful and flexible. Does that sound about right?

The big “but” here is that this popularity does not necessarily reflect the challenges facing organizations that run business-critical apps on Apache, one of which is log analytics. Being able to gain insight into Apache access and error logs is crucial for analyzing crashes, load times, and other data on app performance. But in production environments in which huge numbers of requests hit the web server every second, extracting actionable data from thousands of log files is virtually impossible.

This tutorial will show you one easy way to do just that by describing how to ship and analyze Apache logs using Logz.io, our predictive, cloud-based log management platform that’s built on top of the open-source ELK Stack (Elasticsearch, Logstash, Kibana). The same steps can be followed with any on-premise installation of the ELK Stack; I’ll just use Logz.io for simplicity’s sake.

This guide will take you through the steps of using our service on a vanilla Linux environment (Ubuntu 12.04) — setting up your environment, shipping logs, and then creating visualizations in Kibana.

Let’s get started!

Prerequisites

To complete the steps below, you’ll need the following:

A common Linux distribution with TCP traffic allowed to port 5000
An active Logz.io account. If you don’t have one yet, you can create a free account here.
5 minutes of free time!

Step 1: Setting up your environment

The first step will help you install the various prerequisites required for shipping logs to Logz.io.

Installing Apache

If you’ve already got Apache up and running, great! You can skip to the next step.

If you’re not sure (yes, this happens!), use the following command to see a list of all your Apache packages:

dpkg --get-selections | grep apache

If Apache is not installed, enter the following commands:

$ sudo apt-get update
$ sudo apt-get install apache2

This may take a few seconds as Apache and its required packages are installed. Once apt-get exits, Apache is installed.

By default, Apache listens on port 80, so to test if it’s installed correctly, simply point your browser to: http://localhost:80.
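If you’re working on a headless server, you can perform the same check from the command line with cURL (a quick sanity check; an HTTP/1.1 200 OK status line means Apache is up):

$ curl -I http://localhost:80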

Installing Rsyslog

Logz.io uses Rsyslog for shipping logs. If you already have Rsyslog installed, excellent. Before skipping to the next step, however, make sure your installed version meets Logz.io’s minimum requirement (version 5.8.0 and above):

$ rsyslogd -v

If Rsyslog is not installed, use:

$ sudo apt-get install rsyslog

Another good option for shipping logs to Logz.io is Filebeat. A dedicated tutorial on installing and using Filebeat is forthcoming.

Installing Curl

As the Logz.io automatic installation script uses cURL, you will need to install it before continuing on:

$ sudo apt-get install curl
Step 2: Shipping Apache logs to Logz.io

There are two ways to configure the shipping of your Apache logs to Logz.io: one uses an automated cURL script, and the other requires some manual spooling and configuration and is better suited to shipping larger chunks of data.

In this tutorial, though, we won’t be handling large amounts of data, so the automatic script will do just fine. Remember to replace the <token> placeholder with your Logz.io token, which can be found in your user settings:

curl -sLO https://github.com/logzio/logzio-shipper/raw/master/dist/logzio-rsyslog.tar.gz && tar xzf logzio-rsyslog.tar.gz && sudo rsyslog/install.sh -t apache -a "<token>"

Next, restart Rsyslog:

$ sudo service rsyslog restart
Step 3: Verifying the shipping pipeline

Our next step is to make sure our log pipeline is configured correctly.

Place a new HTML file called ‘hello.html’ in the web server’s root directory (Apache’s root directory varies according to your Linux distribution) with some simple static code:

<html>
<h1>Watch out sir! Logs on the way!</h1>
</html>

To make things interesting, let’s simulate some load on the server using ApacheBench (which is bundled with Apache):

$ sudo ab -k -c 350 -n 1000 localhost/hello.html

This will simulate some traffic and create a batch of log entries.

Wait a minute or two, access the Logz.io interface, and open the Kibana dashboard. Then, select the Discover tab and enter ‘200’ in the search field at the top of the page.

Apache access logs are displayed for any request returning a 200 response code.
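If you want to query the response field directly rather than run a free-text search, you can use a field-level query (the field name here assumes Logz.io’s default Apache access log parsing):

response:200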

If you want to play around with more complex data, you can download some sample access logs and upload them to Logz.io using the following cURL command. Be sure to replace the placeholders with your info — the full path to the file and your Logz.io token (which can be found in the Logz.io user settings):

curl -T <Full path to file> http://listener.logz.io:8021/file_upload/<Token>/apache_access

Step 4: Visualizing Apache logs in Kibana

Now that our pipeline is up and running, it’s time to have some fun.

You can begin to use Kibana to search for specific data strings. You can search for specific fields, use logical statements, or perform proximity searches — Kibana’s search options are varied and are covered extensively in our Kibana tutorial.

But how about taking these searches to an entirely new level? Kibana allows you to create visualizations from your search results, meaning that the data you’re interested in is reflected in easy-to-use, easy-to-create, and shareable graphical dashboards.

To create a new visualization from a custom search, first save the search by clicking the “Save Search” icon in the top-right corner of the Kibana “Discover” tab.

Once saved, select the Visualize tab:

You have a variety of visualization types to select from, including pie charts, line charts, and gauges.

You then need to select a data source to use for the visualization. You can choose a new or saved search to serve as the data source. Go for the ‘From a saved search’ option, and select the search you saved just a minute ago.

Please note that the search you selected is now bound to this specific visualization, so when you make changes to this search from now on, the visualization will update automatically (though you can unlink the two, if you like).

You can now use the Visualization Editor to customize your dashboard (more information on this will be published soon) and save the visualization. If you wish, you can also add it to your Kibana Dashboard or even share it by embedding it in HTML or by sharing a public link.

You also have the option of using ELK Apps, which is our free library of pre-made dashboards that have already been fine-tuned by Logz.io to suit specific types of log data.

For Apache logs, there are ten available ELK Apps to use, including an “Apache Average Byte” app that monitors the average number of bytes sent from your Apache web server and the extremely popular “Apache Access” app that shows a map of your users, response times and codes, and more.

Installing these visualizations is easy — simply select the ELK Apps tab and search for “Apache” (or click here directly).

To use a specific visualization, simply click the Install button and then the Open button.

The ELK app will then be loaded in the Visualization editor, so you can then fine-tune it to suit your personal needs and preferences and then load it in the Dashboard tab:

What Next?

Once you’ve set up your dashboard in Kibana for monitoring and analyzing Apache logs, you can set up an alerting system to notify you (via either email or Slack) when something in your environment deviates from how Apache and the apps it serves are meant to perform. Logz.io’s alerting feature allows you to do just that, and we’ll dive deeper into the rabbit hole in our next tutorial.


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


Shipping Logs to Logz.io with Filebeat


Filebeat, the replacement for Logstash Forwarder, is ELK’s next-generation shipper for log data. It tails log files and sends the traced information to Logstash for parsing or to Elasticsearch for storage.

Logz.io, our enterprise-grade ELK as a service with added features, allows you to ship logs from Filebeat easily using an automated script. Once the logs are shipped and loaded in Kibana, you can use Logz.io’s features to monitor your logs and predict issues.

Here, I will explain how to establish a pipeline for shipping your logs to Logz.io using Filebeat. (Note: You can also ship logs to Logz.io using Topbeat, Packetbeat, or Winlogbeat — see this knowledge base article for more information.)

Prerequisites

To complete the steps below, you’ll need the following:

A common Linux distribution, with TCP traffic allowed to port 5000
An active Logz.io account. If you don’t have one yet, create a free account here.
5 minutes of free time!

Step 1: Installing Filebeat

I’m running Ubuntu 12.04, and I’m going to install Filebeat 1.1.1 from the repository. If you’re using a different OS, additional installation instructions are available here.

First, I’m going to download and install the Public Signing Key:

curl https://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -

Next, I’m going to save the repository definition to /etc/apt/sources.list.d/beats.list:

echo "deb https://packages.elastic.co/beats/apt stable main" | sudo tee -a /etc/apt/sources.list.d/beats.list

Finally, I’m going to run apt-get update and install Filebeat:

sudo apt-get update && sudo apt-get install filebeat
Step 2: Downloading the Certificate

Our next step is to download a certificate and move it to the correct location, so first run:

wget https://raw.githubusercontent.com/cloudflare/cfssl_trust/master/intermediate_ca/COMODORSADomainValidationSecureServerCA.crt

And then:

sudo mkdir -p /etc/pki/tls/certs

sudo cp COMODORSADomainValidationSecureServerCA.crt /etc/pki/tls/certs/
Step 3: Configuring Filebeat

Our next step is to configure Filebeat to ship logs to Logz.io by tweaking the Filebeat configuration file, which on Linux is located at: /etc/filebeat/filebeat.yml

Before you begin to edit this file, make a backup copy just in case of problems.

Copy and paste the following configuration example into the file:
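Here is a minimal sketch of such a configuration for Filebeat 1.x (a starting point; the listener host and port are assumptions based on Logz.io’s standard Beats endpoint, so verify them against the shipping instructions in your Logz.io account, and replace the token placeholder with your own):

filebeat:
  prospectors:
    -
      paths:
        - /var/log/*.log
      fields:
        logzio_codec: plain
        token: <yourToken>
      fields_under_root: true
      document_type: apache_access

output:
  logstash:
    # Assumed Logz.io Beats listener; verify the host and port in your account
    hosts: ["listener.logz.io:5015"]
    tls:
      certificate_authorities: ["/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt"]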

Defining the Filebeat Prospector

Prospectors are where we define the log files that we want to tail. You can tail JSON files and simple text files. In the example above, the paths setting tails any log file under the /var/log/ directory ending with .log.

Please note that when harvesting JSON files, you need to add ‘logzio_codec: json’ to the fields object. When harvesting plain text lines, you need to add ‘logzio_codec: plain’ instead.

Two additional properties are important for defining the prospector:

First, the fields_under_root property should always be set to true.
Second, the document_type property is used to identify the type of log data and should be defined. While not mandatory, defining this property will help optimize Logz.io’s parsing and grokking of your data.

A complete list of known types is available here, and if your type is not listed there, please let us know.

Defining the Filebeat Output

Outputs are responsible for sending the data in JSON format to Logstash. In the example above, the output defines the Logz.io Logstash host along with the location of the certificate that you downloaded earlier.

Be sure to use your Logz.io token in the fields object of each prospector; you can find the token in the Settings section of the Logz.io user interface.

Step 4: Verifying the pipeline

That’s it. You’ve successfully installed Filebeat and configured it to ship logs to Logz.io!

Make sure Filebeat is running:

$ cd /etc/init.d
$ ./filebeat status

And if not, enter:

$ sudo ./filebeat start

To verify the pipeline, head over to your Kibana and see if the log files are being shipped. It may take a minute or two for the pipeline to work — but once you’re up and running, you can start to analyze your logs by performing searches, creating visualizations, using the Logz.io alerting feature to get notifications on events, and using our free ELK Apps library.

Please note that Filebeat saves the offset of the last data read from the file in the registry, so if the agent restarts, it will continue from the saved offset.


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


MySQL Log Analysis with the ELK Stack


No real-world web application can exist without a data storage backend, and most applications today use relational database management systems (RDBMS) for storing and managing data. The most commonly used database is MySQL, an open-source RDBMS that is the ‘M’ in the open-source LAMP Stack (Linux, Apache, MySQL, and PHP).

Mid-sized and large applications send multiple database queries per second, and slow queries are often the cause of slow page loading and even crashes. The task of analyzing query performance is critical in determining the root cause of these bottlenecks, and most databases come with built-in profiling tools to help us.

In the case of MySQL, one useful method is to analyze the logs. Queries that take longer to process than a predefined length of time will be logged in the MySQL slow query log. The error log also contains useful information, reporting when MySQL was started or stopped and when critical errors occurred. But as with any type of log data, analysis and management are tasks that challenge even the most experienced of teams, especially in enterprise production environments that produce log files containing thousands of entries every day.

This is why more and more companies are choosing the open source ELK Stack (Elasticsearch, Logstash, and Kibana) as their log analysis platform. It’s fast, it’s free, and it’s simple.

This tutorial will describe how to ship and analyze MySQL logs using Logz.io, our predictive, cloud-based log management platform that’s built on top of the ELK Stack, but you can follow the steps in the tutorial with any on-premise installation of the ELK Stack as well.

Note: The method described for shipping the logs to Logz.io includes the use of rsyslog. There are alternative methods as well, such as using a Docker image or using Filebeat. More on these methods in upcoming articles.

Prerequisites

A common Linux distribution with TCP traffic allowed to port 5000
An active Logz.io account. If you don’t have one yet, you can create a free account here.
5 minutes of free time!

Step 1: Setting up your environment

The first step will help you to install the various prerequisites required to ship logs to Logz.io.

Installing MySQL

If you’ve already got MySQL installed, you can skip to the next step. If not, enter the following command to update your system:

$ sudo apt-get update

And then install MySQL:

$ sudo apt-get install mysql-server

During installation, you’ll need to set the root password. Make note of it, as you’ll be needing it in the following steps.

Installing Rsyslog

Most Unix systems these days come with rsyslog pre-installed, but even so, make sure your installed version meets Logz.io’s minimum requirement (version 5.8.0 and above):

$ rsyslogd -v

If Rsyslog is not installed or you have an old version, enter the following commands to install version 8:

$ sudo add-apt-repository ppa:adiscon/v8-stable

$ sudo apt-get update

$ sudo apt-get install rsyslog
Installing cURL

The Logz.io automatic installation script uses cURL, so you will need to install it before continuing:

$ sudo apt-get install curl
Step 2: Shipping logs to Logz.io

Our next step is to set up the shipping pipeline into Logz.io.

Configure MySQL to write log files

First, we need to configure MySQL to write general and slow query log files because those configurations are disabled by default. Then, we need to set the threshold for creating logs.

To do this, first open the ‘my.cnf’ configuration file:

$ sudo vim /etc/mysql/my.cnf

Next, uncomment and edit the following lines:

general_log_file = /var/log/mysql/mysql.log
general_log = 1
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 1
log-queries-not-using-indexes = 1

Be sure to restart MySQL after making these changes:

$ sudo service mysql restart
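To confirm that the new settings took effect, you can query the server variables (you’ll be prompted for the root password you set during installation):

$ mysql -uroot -p -e "SHOW VARIABLES LIKE 'general_log%';"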
Creating some logs

Let’s create some MySQL logs by using sample data. Clone this GitHub repo (you’ll need Git installed), cd into it, and install the database data:

$ git clone https://github.com/datacharmer/test_db.git

$ cd test_db

$ mysql -uroot -p < employees.sql

To create some slow query logs, access MySQL, select the ‘employees’ database, and use the following query:

> SELECT * FROM employees;
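Since long_query_time is set to 1 second, this full-table scan should take long enough to be captured in the slow query log. You can confirm this on the server:

$ sudo tail -n 20 /var/log/mysql/mysql-slow.log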
Running the installation script

We’re now going to run the Logz.io automatic installation script for MySQL logs:

$ curl -sLO https://github.com/logzio/logzio-shipper/raw/master/dist/logzio-rsyslog.tar.gz && tar xzf logzio-rsyslog.tar.gz && sudo rsyslog/install.sh -t mysql -a "<yourToken>"

Remember to insert your token (which can be found in your settings within the Logz.io user interface).

The script assumes the following locations for the MySQL log files, but they can be overridden:

General log – /var/log/mysql/mysql.log (use --generallog to override)
Slow queries log – /var/log/mysql/mysql-slow.log (use --slowlog to override)
Error log – /var/log/mysql/error.log (use --errorlog to override)

Step 3: Analyzing the logs

Our slow query event will show up in Kibana after a few seconds:

To make sure that the shipping pipeline is working correctly, search for MySQL log file types in Kibana: ‘mysql’, ‘mysql_error’ and ‘mysql_slow_query’.
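For example, to display all three MySQL log types at once, you can combine them in a single query using the OR logical statement:

type:mysql OR type:mysql_error OR type:mysql_slow_query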

Once your logs have arrived, you can begin to use Kibana to query Elasticsearch, filter the logs based on your needs, and save your searches to create visualizations.

Creating alerts for slow queries

Setting up an alert for slow MySQL queries is a great way to become proactive in analyzing performance. Logz.io allows you to configure an alert that will automatically send you a notification by either email or any application that can use Webhooks (e.g., Slack, JIRA, etc.).

To set an alert for a specific event, use search and filtering to narrow down to the relevant log and click the “Create Alert” button in the top-right corner.

Your Elasticsearch query is already loaded in the alert settings, and all that’s left for you to do is to decide the threshold for when you want to receive a notification and via what method you will receive it. Read more about creating alerts here on our blog.

Installing a MySQL dashboard

Logz.io allows you to install pre-made visualizations and dashboards easily from the ELK Apps tab. There are currently 15 apps for MySQL logs you can use — just search for “MySQL” and select the app that is most relevant for your environment:

One useful app is mysql_monitor, a dashboard app that loads a complete set of visualizations to help you monitor your MySQL logs.

Of course, you’ll need to ship a larger amount of logs to really enjoy these visualizations and dashboards. So, what are you waiting for?

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack and can be used for MySQL log analysis. Start your free trial today!


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


A Fluentd Tutorial: Shipping Logs to Logz.io


Fluentd is an open source data collector developed by Treasure Data that acts as a unifying logging layer between input sources and output services. Fluentd is easy to install and has a light footprint along with a fully pluggable architecture.

In the world of ELK, Fluentd acts as a log collector — aggregating logs, parsing them, and forwarding them on to Elasticsearch. As such, Fluentd is often compared to Logstash, which has similar traits and functions (see a detailed comparison between the two here).

Both Logstash and Fluentd are supported by us at Logz.io, and we see quite a large number of customers using the latter to ship logs to us. This Fluentd tutorial describes how to establish the log shipping pipeline — from the source (Apache in this case), via Fluentd, to Logz.io.

Prerequisites

To complete the steps below, you’ll need the following:

HTTPS traffic allowed to port 8071
An installed cURL and Apache web server
An active Logz.io account. If you don’t have one yet, create a free account here.
5 minutes of free time!

Step 1: Installing Fluentd

The stable distribution of Fluentd is called ‘td-agent.’ To install it, use this cURL command (this command is for Ubuntu 12.04 — if you’re using a different Linux distribution, click here):

$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-precise-td-agent2.sh | sh

The command will automatically install Fluentd and start the daemon. To make sure all is running as expected, run:

$ sudo /etc/init.d/td-agent status
Step 2: Installing the Logz.io plugin

Our next step is to install the Logz.io plugin for Fluentd. To do this, we need to use the gem supplied with the td-agent:

$ /opt/td-agent/usr/sbin/td-agent-gem

To install the Logz.io plugin, run:

$ sudo /opt/td-agent/usr/sbin/td-agent-gem install fluent-plugin-logzio
Step 3: Configuring Fluentd

We now have to configure the input and output sources for Fluentd. In this tutorial, we’ll be using Apache as the input and Logz.io as the output.

Open the Fluentd configuration file:

$ sudo vi /etc/td-agent/td-agent.conf

Define Apache as the input source for Fluentd:

<source>
  @type tail
  format none
  path /var/log/apache2/access.log
  pos_file /tmp/access_log.pos
  tag apache
</source>

Note: Make sure you have full permissions to access Apache files. If you do not, Fluentd will fail to pull the logs and send them on to Logz.io.

Next, we’re going to define Logz.io as a “match” (the Fluentd term for an output destination):

<match **.**>
  type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=<token>&type=<logtype>
  output_include_time true
  output_include_tags true
  buffer_type file
  buffer_path <pathtobuffer>
  flush_interval 10s
  buffer_chunk_limit 1m # Logz.io has a bulk limit of 10M; we recommend setting this to 1M to avoid oversized bulks
</match>

Fine-tune this configuration as follows:

<token> – Use your token in the token placeholder (it can be found in the Logz.io Settings section)
<logtype> – Specify the log type (e.g. ‘apache-access’) in the type placeholder. This helps Logz.io to parse and grok your data. A complete list of known types is available here. If your type is not listed there, please let us know.
<pathtobuffer> – Enter a path to a folder in your file system for which you have full permissions (e.g. /tmp/buffer). The buffer file helps to aggregate logs together and ship them in bulk.

Last but not least, restart Fluentd:

$ sudo /etc/init.d/td-agent restart

That’s it. After a minute or two, your Apache logs will show up in the Logz.io user interface. To create some log files, run this ab command to simulate traffic (you’ll need to place a file on your web server to use first):

$ sudo ab -k -c 350 -n 1000 localhost/<file.html>
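If the logs do not show up after a few minutes, a good place to start troubleshooting is the td-agent log file, which will surface configuration and permission errors (this is the default log location for td-agent installations):

$ tail -f /var/log/td-agent/td-agent.log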

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack and can be used for log analysis and management. Start your free trial today!


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


Spreading the Goodness: Sharing Kibana Searches and Visualizations


One of the great features in Kibana is the ability to share data with team members. You can publicly share specific visualizations or even entire dashboards with anyone that needs to see the same information that you’re viewing in Kibana.

Logz.io has taken it up a notch by applying some extra enhancements to this feature and making it enterprise-grade: user access tokens allow you to share Kibana visualizations and dashboards safely and securely with the outside world. The latest addition to this feature allows you to filter the information you wish to share by associating log field filters with specific user tokens.

Using tokens and filters is one method that you can use to enable role-based access to the ELK Stack in your organization by which different team members have access to different information. Logz.io also has a user-management system in place that allows you to manage your organization. More about that in a future post.

This article will show you how to use tokens and filters to spread the Kibana goodness with an added layer of security.

Using user tokens

When sharing Kibana data, there is no established mechanism to make sure that data is safe. Logz.io provides a solid, advanced method of securing this information with access tokens. Using tokens — as opposed to using the regular share URL function in Kibana — will enable you to share visualizations and dashboards with people who are not even Logz.io users.

The access tokens are managed within the Logz.io interface in the Settings section:

Here, you can add and remove new user tokens as you see fit and as the need arises.

When sharing a visualization or dashboard within Kibana, these tokens can then be easily selected and applied to the request URL that you are sharing.

Needless to say, the person with whom you are sharing the data does not have to be a Logz.io user — they will be able to see exactly the same information, even if they have never heard of Logz.io before.

Sharing filtered Kibana information

What if you do not want to share all of the data? There are a number of reasons you might want to filter the data such as wanting to maintain strict security or not wanting to drown teammates in unnecessary log noise. Logz.io enables you to narrow the access granted by a token to a specific field type and value.

For example, say that you have a dashboard set up in Kibana showing various statistics on all the logs coming into your system, but you would like to share only data relating to Apache logs. To filter the data, first define a new filter type in the Token Filters section on the User Tokens settings page.

When creating a new filter, you are required to enter the following information:

Description – A short description of the filter for display purposes
Field – The exact field name you want to filter the data by (e.g. type)
Value – The exact value for the field you entered above you want to filter the data by (e.g. apache_access)

Save the filter once done.

Then, click Attach Filter for the token with which you wish to associate the filter, and select the filter.

Don’t forget to click Save to apply the changes.

Back in Kibana, open the visualization or dashboard you wish to share, click the Share icon and select the token you want to use in the request URL. There is no need to define the filter since it is automatically associated with the token.

All that’s left to do is share the URL.

If you think your visualization will be useful to other users, you can contribute it to the Logz.io ELK Apps — a free library of pre-made Kibana searches, alerts, visualizations and dashboards that is tailored for specific log types and use cases. (More information on ELK Apps is here.)

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


AWS Log Analysis with the ELK Stack


Amazon Web Services log data is an extremely valuable resource that can be used to gain insight into the various services that comprise your environment.

The problem is that AWS-based architectures often use multiple components, each of which can potentially produce a huge amount of log files per second (depending on the environment that you have set up).

Not only that, but the monitoring services provided by AWS do not offer a complete logging solution. Making correlations between the various layers of your application is crucial for troubleshooting problems in real time, but the functionality of CloudTrail and CloudWatch in this context is extremely limited. These services cannot be used to analyze application logs, and they are missing features such as querying and visualization capabilities.

This article describes a different approach — defining S3 as the endpoint for all AWS logs and ingesting them into our Logz.io cloud-based ELK Stack (Elasticsearch, Logstash, Kibana), which is the world’s most popular open source log analysis platform. We chose Logz.io for reasons of simplicity, but you can use your own open source ELK Stack in the exact same way.

Prerequisites

To follow the steps, you will need AWS and Logz.io accounts and a basic knowledge of AWS architecture.

Important! Please note that this article assumes you have full permissions and access to your S3 buckets. If not, you will need to configure permissions to allow S3 integration with Logz.io.

Writing Logs to S3

Your first step is to make sure that your AWS instances are writing their access logs to S3. This can be to a separate bucket or a directory within a bucket. These log files will then be pulled into Logz.io for analysis and visualization.

In most cases, logging for your instances is disabled by default. To enable logs, you will need to access the management console for the service and manually enable logs. You will then be required to set the interval for shipping logs to S3 from the instance and the name of the S3 bucket. You can also use the AWS CLI to do the same.

In any case, I recommend referring to the AWS documentation for specific instructions, as they may vary based on the type of instance and service.

Analyzing S3 Bucket Logs

Once you’ve made sure that instance logs are being written to S3 buckets, the process for shipping these logs to Logz.io for analysis is simple.

First, access the Logz.io interface and open the Log Shipping tab.

Then, go to the AWS -> S3 Bucket section, and enter the details of the S3 bucket from which you would like to ingest log files:

Enter the following information:

S3 bucket – the name of the S3 bucket.
Prefix – the directory within the bucket that contains the log files. This field is optional, and you can leave it empty if you are pulling from the root directory of the bucket.
S3 access key – your S3 access key ID.
S3 secret key – your S3 secret access key.

Be sure to select the log type (such as “ELB”) — this makes sure that the log files are parsed and enriched as they are ingested into Logz.io.

Click Save to save your bucket configurations. Log files will be shipped into Logz.io and displayed in the Kibana user interface within a minute or two.

If you are using your own open source ELK Stack, you will need to add the relevant input and output configurations to your Logstash instance.

Here’s an example of what that configuration would look like for ELB logs:

input {
  s3 {
    bucket => "elb-logs"
    credentials => [ "my-aws-key", "my-aws-token" ]
    region_endpoint => "us-east-1"
    # keep track of the last processed file
    sincedb_path => "./last-s3-file"
    codec => "json"
    type => "elb"
  }
}

filter {
  if [type] == "elb" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} (?:%{IP:backend_ip}:%{NUMBER:backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{NUMBER:elb_status_code:int}|-) (?:%{NUMBER:backend_status_code:int}|-) %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} \"(?:%{WORD:verb}|-) (?:%{GREEDYDATA:request}|-) (?:HTTP/%{NUMBER:httpversion}|-( )?)\" \"%{DATA:userAgent}\"( %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol})?" ]
    }
    grok {
      match => [ "request", "%{URIPROTO:http_protocol}" ]
    }
    geoip {
      source => "client_ip"
      target => "geoip"
      add_tag => [ "geoip" ]
    }
    useragent {
      source => "userAgent"
    }
    date {
      match => ["timestamp", "ISO8601"]
    }
  }
}

output {
  elasticsearch_http {
    host => "localhost"
    port => "9200"
  }
}
Analyzing S3 Access Logs

S3 buckets produce logs themselves each time they are accessed, and the log data contains information on the requester, bucket name, request time, request action, response status, and error code (should an error occur).

Analyzing these logs helps you to understand who is using the buckets and how. Access log data can also be useful for security and access audits, and it plays a key role in securing your AWS environment.

So, how do you ship these access logs to the Logz.io ELK Stack?

Your first step is to enable access logs for a specific bucket. To do this, select your S3 bucket in the S3 Console and then open the Logging section in the Properties pane:

Enable logging, and select the name of the target bucket in which you want Amazon S3 to save the access logs as objects. You can have logs delivered to any bucket that you own, including the source bucket. AWS recommends that you save access logs in a different bucket so that you can easily manage the logs.

Once saved, S3 access logs are written to the S3 bucket that you had chosen. Your next step is to point Logz.io to the relevant S3 Bucket.

In the Log Shipping tab in Logz.io, open the AWS -> S3 Access configurations:

Add the details of the S3 bucket that you selected as the target bucket as well as your S3 secret and access keys.

Once saved, your S3 access data will begin to appear in Kibana.

Again, if you’re using your own open source ELK Stack, you’ll need to add the correct configuration to your Logstash instance (the same configuration as shown above would apply in this case).

Installing and Using AWS ELK Apps

Once your logs have arrived, you can begin to use Kibana to query Elasticsearch. Querying Elasticsearch is an art in itself, and this tutorial on Elasticsearch queries does a good job at describing the main query types and options.

You can then create visualizations to visualize the data you’re interested in.

To hit the ground running, you can install an AWS-specific ELK App. ELK Apps is our free library of pre-made searches, visualizations, and dashboards tailored for specific log types. There are countless ELK Apps for AWS services including for S3, CloudTrail, and VPC Flow.

To install an ELK App, open the ELK Apps tab in Logz.io, and use the search box in the top-right corner of the page (more information about the library is here):

Click Install for any one of the available apps and then Open to have it displayed in Kibana. You can, of course, customize the app to suit your individual environment and personal preferences.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack and AWS. Start your free trial today!


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


How to Analyze Salesforce Data with the ELK Stack


The ELK Stack (Elasticsearch, Logstash, and Kibana) is most commonly used for analyzing log files produced by networks, servers, and applications. Powerful indexing and searching capabilities coupled with rich visualization features make ELK the most popular open-source log analytics platform.

If you’re an IT manager or a DevOps engineer, you’re probably already using ELK to analyze and manage your company’s logs. But ELK is extremely useful in other use cases as well. In a previous post, we described how marketers can use ELK to analyze server logs for technical SEO, and others are using the stack for business intelligence, log-driven development, and various other purposes as well.

This article introduces yet another interesting use case — ingesting data from Salesforce for sales and marketing analysis. If you’re managing your company’s Salesforce account, you also know that analyzing Salesforce data to provide your CEO with actionable insights is not an easy task. This article provides a way to solve this challenge. Specifically, we will be showing how to create a pipeline from a Salesforce account, into Logstash, and then into Logz.io — our ELK as a service platform. Of course, you can perform the exact same procedure using your own hosted ELK Stack.

By ingesting Salesforce data into ELK, sales and marketing teams can go back in time and perform deep analyses of cross-team metrics. For example, you could measure the performance of your sales representatives over time or get an overall picture of lead generation efforts and the numbers of closed deals over a specific length of time.

Step 1: Creating a connected app in Salesforce

Before we begin to install and configure Logstash, we need to retrieve some credentials from Salesforce that will allow us to access the data and stream it into Elasticsearch. This involves the creation of a new connected app and the generation of a user token.

In Salesforce, open the Setup page. In the Build section on the left, select Create → Apps.

Here, you’ll see a list of the Apps in your Salesforce organization. In the Connected Apps section at the bottom of the page, click New to create a new connected app:

Enter the following information (see the Salesforce documentation for a description of all of the available settings):

Connected App Name – the name of the new connected app (e.g. ELK)
API Name – the name of the API to be used by the connecting program (e.g. ELK)
Contact Email – an email address that Salesforce can use to contact you

Select the Enable OAuth Setting checkbox, and enter the following information:

Callback URL – You can enter ‘http://localhost/’ since the Logstash plugin will be using password authentication
Selected OAuth Scopes – Add Full Access (full) to the Selected OAuth Scopes list

When done, click Save. The new app is added and displayed.

Make note of the following fields for use later on when configuring Logstash: Consumer Key and Consumer Secret (click to reveal).

The last piece of the puzzle needed for authentication and for our Logstash configuration is your Salesforce security token. This is given to the administrator of the Salesforce account, but a new token can be created (by the correct user) after resetting it.

Installing Logstash and the Salesforce plugin

We can now begin setting up Logstash.

Logstash, the ‘L’ in the “ELK Stack,” is used at the beginning of the logging pipeline — ingesting and collecting data before sending it to Elasticsearch. Although log files are the most common use case, any other type of event can be forwarded into Logstash and transformed using plugins. We will first install Logstash itself and then the community-maintained plugin for Salesforce.

To install Logstash, first download and install the public signing key:

$ curl https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Then, add the repository definition to your /etc/apt/sources.list file:

$ echo "deb http://packages.elastic.co/logstash/2.2/debian stable main" | sudo tee -a /etc/apt/sources.list

Update your system so that the repository will be ready for use and then install Logstash with:

$ sudo apt-get update && sudo apt-get install logstash

Next, install the Logstash plugin for Salesforce:

$ cd /opt/logstash
$ sudo bin/plugin install logstash-input-salesforce
Configuring Logstash

Now that we have the correct packages installed, we need to configure Logstash to receive input from Salesforce and then forward it to Elasticsearch.

Logstash configuration files are written in a JSON-like format and reside in /etc/logstash/conf.d. The configuration consists of three sections: inputs, filters, and outputs. (This is also discussed in our guide to Logstash plugins.)

Let’s create a configuration file called ‘salesforce.conf’:

$ sudo vi /etc/logstash/conf.d/salesforce.conf

First, enter the input:

input {
   salesforce {
      client_id => 'consumer_key' #Salesforce consumer key
      client_secret => 'consumer_secret' #Salesforce consumer secret
      username => 'email' #the email address used to authenticate
      password => 'password' #the password used to authenticate
      security_token => 'security_token' #Salesforce security token
      sfdc_object_name => 'object' #Salesforce object to track
   }
}

You can track any Salesforce object that you want, but you need to copy the input configuration for each. The most common objects tracked are ‘Opportunity,’ ‘Lead,’ ‘Account,’ ‘Event,’ and ‘Contact.’ Check the list of available Salesforce API objects to see what objects can be ingested by the Logstash plugin.
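For example, here is a minimal sketch of what tracking two objects looks like (assuming the same credentials for both; each object gets its own salesforce block):

input {
   salesforce {
      client_id => 'consumer_key'
      client_secret => 'consumer_secret'
      username => 'email'
      password => 'password'
      security_token => 'security_token'
      sfdc_object_name => 'Lead'
   }
   salesforce {
      client_id => 'consumer_key'
      client_secret => 'consumer_secret'
      username => 'email'
      password => 'password'
      security_token => 'security_token'
      sfdc_object_name => 'Opportunity'
   }
}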

By default, the input will ingest all of the available fields for the Salesforce object that you had selected. You have the option to define the specific fields that you would like to see with the ‘sfdc_fields’ parameter:

sfdc_fields => ['Id','Name','Type']

For a full list of the available parameters for this Logstash plugin, see Elastic’s documentation.

Next, enter a filter as follows (enter your Logz.io token in the relevant placeholder):

filter {
   mutate {
      add_field => { "token" => "<yourToken>" }
      add_field => { "type" => "SalesForce" }
   }
}

The filter in this case is adding two fields to our messages — one is our Logz.io token, and the other a type field called ‘SalesForce.’ This will help to differentiate the data coming from Salesforce from other input sources.

Last but not least, define the output to Logz.io as follows:

output {
   tcp {
      host => "listener.logz.io"
      port => 5050
      codec => json_lines
     }
}

If you were using your own Elasticsearch instance, you would need to define the output accordingly:

output {
   elasticsearch {
      index => "salesforce"
      index_type => "lead"
      host => "localhost"
   }
}

That’s it. Once you’re done, restart Logstash:

$ sudo service logstash restart

Salesforce data should begin to show up in the Kibana interface integrated into Logz.io almost immediately (if not, double-check the credentials that you had used in the Logstash configurations. Those are crucial for connecting to Salesforce).

Analyzing the data

Once the data is indexed in Elasticsearch and displayed in Kibana, you can begin to analyze it.

If you’re pulling data from various input sources, your first step is to isolate the data from Salesforce. Use this simple search query that will isolate all of the messages coming into your system with the ‘SalesForce’ type:

type:SalesForce

Your next step is to figure out what information you’d like to inspect and visualize. Of course, how you analyze the data depends on your objectives and the specific Salesforce object that you’ve decided to track.

In this specific configuration, we’ve ingested existing data for the Salesforce “Lead” object. So, say you’d like to see a visualization that depicts the status of your leads. Cross-reference the above saved search with the ‘status’ field to create a new pie-chart visualization:

Or, you could create a bar chart that shows the number of conversions over time (cross-reference the search with the ‘ConversionDate’ and ‘IsConverted’ fields):

The sky’s the limit. Once you have a few visualizations lined up in Kibana, combine them into one comprehensive dashboard that you can use to see how your sales and marketing teams are performing.

That wasn’t hard, was it?

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


Announcing the Logz.io Add-On for Heroku Apps


We’re happy to announce that Logz.io is now available as an add-on for Heroku applications.

Logz.io provides the ELK Stack (Elasticsearch, Logstash, Kibana) — the world’s most popular open source log analytics platform — as an easy, secure, and scalable service on the cloud with a bunch of enterprise-grade features such as archiving, alerts, security, and more.

Heroku makes developing and deploying applications on the cloud extremely simple by taking care of your applications’ underlying infrastructure for you. Supporting applications written in Ruby, Python, node.js, PHP, Java, Scala, and Go, Heroku has a rich ecosystem of third-party add-ons that add more functionality to the provided services.

Heroku users can now install Logz.io on their application and leverage the full power of the ELK Stack to manage and analyze their logs.

How to install Logz.io on your Heroku app

In Heroku, all logs created by your app and Heroku components are aggregated and collected into a single channel by a log delivery system called Logplex, which can be accessed via any of the logging add-on providers supported by Heroku or a custom log drain.

The simplest way to install the Logz.io add-on is via the Heroku dashboard:

Log into the Heroku dashboard. Go to the Add-Ons page, and in the Logging section, select the Logz.io add-on.

On the Logz.io Add-On page, click the Install Logz.io button at the top-right corner of the page.
When prompted, select the application on which you want to install the add-on and click Submit.
You’ll then be taken to the application page and asked to select the plan name. Logz.io is currently free of charge, so simply click Provision. Logz.io is then installed on your application.
To access Logz.io, simply click the Logz.io icon displayed in your application’s add-ons list. The Logz.io interface opens up, with the Kibana Discover tab displayed:
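Alternatively, you can provision the add-on from the command line with a single Heroku CLI command:

$ heroku addons:create logzio:test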

Once your pipeline into the Logz.io ELK is established, you can start to query the data as well as create visualizations and dashboards. For an idea of how the ELK Stack can be used for log analysis, check out this video, our collection of ELK Stack guides, and our Learn section.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


Heroku Log Analysis with the ELK Stack


Heroku is a cloud PaaS that allows developers to create and deploy applications without having to worry about all the underlying infrastructure. Heroku supports applications written in Ruby, Python, node.js, PHP, Java, Scala, and Go and packs scores of third-party add-ons and complementary services.

The rich and ever-growing ecosystem, scalability, and overall simplicity of Heroku are all reasons why the platform is rapidly gaining popularity with companies of all sizes and independent developers alike.

Heroku’s logging system is extremely robust, aggregating application, system, and API logs into one central stream and allowing you to hook into this stream using either add-on logging services or logging hooks called “drains.”

Following our recent announcement of the Logz.io add-on for Heroku, we will now describe in this tutorial how to ship and analyze Heroku application logs using the enterprise-grade ELK Stack provided by Logz.io — our predictive, cloud-based log management platform that’s built on top of ELK.

Prerequisites

To follow this tutorial, make sure that your environment has the following requirements:

A Heroku account. If you don’t have one, sign up here.
A Logz.io account. If you don’t have one yet, you can create a free account here.
5 minutes of free time!

Step 1: Getting ready

Let’s first set up our environment by installing the Heroku CLI and creating our sample PHP application. If any of the following components are already installed, skip to the next step.

Installing PHP

The application used in this article is a PHP application. If you intend to use a different application or already have PHP installed (Mac OS X comes with a built-in PHP stack), skip this step:

$ sudo apt-get update
$ sudo apt-get install php5
Installing Git

We’ll need Git to clone our sample application and then deploy it to Heroku:

$ sudo apt-get install git
Installing the Heroku Toolbelt

The Heroku Toolbelt allows you to use the Heroku CLI, which can be used in turn for managing your applications and added tools:

$ wget -O- https://toolbelt.heroku.com/install-ubuntu.sh | sh
Step 2: Creating a new application

Our next step is to create a new application that will generate some logging data that we can ship and analyze. I chose to demonstrate the process using a PHP application, but you can, of course, use any application that you like.

First, log into Heroku with:

$ heroku login

The Heroku toolbelt CLI tool will now be installed. This might take a few minutes.

You will be required to enter your account credentials — the email address that you used to register with Heroku and the password you defined.

We’re now going to clone a new application from GitHub. For the purpose of convenience, I’m going to use Heroku’s “Getting Started with PHP application” guide:

$ git clone https://github.com/heroku/php-getting-started.git

To create the application, cd into the folder and run:

$ cd php-getting-started
$ heroku create <yourAppName>

Note: You can leave the app name out, and Heroku will pick a random name for your application.

Next, deploy your application using the git push command:

$ git push heroku master

Your application is now deployed. To make sure an instance is running, run:

$ heroku ps:scale web=1

Visit the app at the URL displayed at the end of the deployment, for example:

https://logzioapp.herokuapp.com/

Step 3: Analyzing Heroku logs

We’ve created and deployed a new PHP application on Heroku. Now, let’s see how to handle logging.

In Heroku, all logs created by your app and Heroku components are aggregated and collected into a single channel by a log delivery system called Logplex, which can be accessed via any of the logging add-on providers supported by Heroku or a custom log drain.

Using logs --tail

The easiest way to take a look at your Heroku events is to use the following log command:

$ heroku logs --tail

Refresh your app in the browser to see the fresh logs added.

Using the ELK Stack

These logs will multiply and become more complex as you develop your application, and gaining real-time, actionable insights will become a business problem in no time.

The ELK Stack (an acronym for Elasticsearch, Logstash, and Kibana) is the world’s most popular open source log analytics platform and can now be installed on your Heroku application using the Logz.io add-on.

Logz.io provides the ELK Stack as an easy, secure and scalable service on the cloud, with a bunch of enterprise-grade features such as archiving, alerts, security and more. To send logs to Logz.io, we’re going to install our add-on, which is currently in Beta mode and completely free of charge.

You can do this via the Heroku dashboard if you like, but we’re going to use the following command in CLI:

$ heroku addons:create logzio:test

Logz.io is added, and is now displayed in the Heroku dashboard under your application:

It’s time to access your ELK stack — to do this, simply click Logz.io in the add-ons list and a new window is opened with your first Heroku logs displayed in Kibana:

Once your pipeline into the Logz.io ELK is established, you can start to query the data and create visualizations and dashboards. For an idea of how the ELK Stack can be used for log analysis, check out this video.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack and can be used in Heroku log analysis as described. Start your free trial today!


Logz.io offers enterprise-grade ELK as a service with alerts, unlimited scalability, and collaborative analytics

Start your free trial!


Introducing the Logz.io Docker Log Collector


Docker environments produce a large number of log messages, system events, and statistical data that, taken together, can provide an accurate picture of how our containers are performing.

Docker-specific constraints however, make extracting insights from this valuable data a real challenge.

To overcome this challenge, Logz.io has released a new Docker log collector that collects logs and monitoring statistics from your Docker environment and continuously streams them into the Logz.io cloud-based, enterprise-grade ELK Stack.

The challenge of Docker logging

The reason that Docker log analysis and management is difficult stems from the very fact that Docker is a distributed system. A typical Docker setup consists of multiple containers and produces numerous types of logs including Docker container logs and Docker service logs. Also, containers are not static beings. They are constantly on the move — starting, restarting, and dying. When a container shuts down, any saved files are lost — making logging to a file in the container extremely risky. Add the fact that sometimes there are multiple services being executed within a single container — each of which is producing its own set of logs — and you can understand why Docker logging is quite the task.

Unfortunately, the tools currently used by the Docker community offer only partial solutions.

Docker drivers can output container logs to a specified endpoint such as Fluentd, Syslog, or Journald.
Logspout can route container logs to Syslog or a third-party module such as Redis, Kafka, or Logstash.
Despite the risks specified above, the good old method of application logging — in which an application (Java, PHP, etc.) writes application-specific log messages to a file within a container — is still very much in use.
Data volumes are another method that allows you to share data between a host machine and a dedicated container that stores data.

All of these methods require additional setup and can ship only container logs.

The Logz.io approach

This post introduces a new Logz.io Docker image that provides a unified and comprehensive logging solution for Docker environments.

Wrapping docker-loghose and docker-stats, and running as a separate container, this log collector fetches logs and monitors stats from your Docker environment and ships them to the Logz.io ELK Stack.

The log collector ships the following types of messages:

Docker container logs — logs produced by the containers themselves (the equivalent of the output of the ‘docker logs’ command)
Docker events — Docker daemon “admin” actions (e.g., kill, attach, restart, and die)
Docker stats — monitoring statistics for each of the running Docker containers (e.g., CPU, memory, and network)

Note: To follow the procedure below and analyze Docker logs in this manner, you will need to install Docker and create a free Logz.io account (you can do that here).

Running the Docker container

The first step is to pull the Logz.io Docker Image:

$ docker pull logzio/logzio-docker

Next, run the container. The most important parameter in the following command is the token parameter (-t) as this defines the Logz.io endpoint to where you are shipping the data (you can locate your token in the Settings section of the Logz.io user interface):

$ docker run -v /var/run/docker.sock:/var/run/docker.sock logzio/logzio-docker -t UfKqCazQjUYnBNcJqSryIRyDIjExjwIZ

To run the collector in the background and have Docker restart it automatically if it stops, run it detached instead:

$ docker run -d --restart=always -v /var/run/docker.sock:/var/run/docker.sock logzio/logzio-docker -t UfKqCazQjUYnBNcJqSryIRyDIjExjwIZ

There are several additional options you can use when running the image.

You can select which type of information to ship. By default, the container is configured to send all three types of information specified above. However, you can limit this as follows:

Pass the --no-logs flag if you do not want Docker logs to be shipped.
Pass the --no-dockerEvents flag if you do not want Docker events to be shipped.

You can also create a whitelist or blacklist of containers and images for which you want to ship logs.

If you want to ship the logs of only a specific container/image, add these parameters:

--matchByName REGEX
--matchByImage REGEX

If you would like to refrain from shipping the logs for a specific container/image, add:

--skipByName REGEX
--skipByImage REGEX
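To put these options together, here is a hypothetical invocation that ships only container logs (no Docker events) for containers whose names start with “web-” (the token and the regex are placeholders — substitute your own):

$ docker run -d --restart=always -v /var/run/docker.sock:/var/run/docker.sock logzio/logzio-docker -t <ACCOUNT-TOKEN> --no-dockerEvents --matchByName "web-.*"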

There are additional configuration options available, and I recommend you refer to the Docker Hub for more information.

Analyzing the data

Once you run the container, Docker data will begin shipping to the Logz.io ELK Stack.

Access the Logz.io user interface, and open Kibana. If you have other shipping pipelines active and sending logs into Logz.io, the best way to filter the logs is by searching for the three different log types using the OR logical statement:

type:docker-logs OR type:docker-events OR type:docker-stats
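If other containers are noisy, you can also narrow a search down to a single container by its name field (the same field that appears in the sample document below), for example:

type:docker-logs AND name:tiny_williams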

Add some fields from the list of fields on the left. This will help you to read the various entries and understand the available information indexed by Elasticsearch.

For example, start by adding the “image,” “name,” and “type” fields.

Expand the entries and take a look at the data as ingested into Elasticsearch. Select the JSON tab to view all of the available data in JSON format:

{
  "_index": "logz-dkdhmyttiiymjdammbltqliwlylpzwqb-160501_v1",
  "_type": "docker-events",
  "_id": "AVRsEEG-hmInuJ9vaOze",
  "_score": null,
  "_source": {
    "image": "sha256:2359fa12fdedef2af79d9b836a26175808d4b1433b5e7022d2d73c72b2a43b60",
    "action_type": "attach",
    "type": "docker-events",
    "execute": "bash ",
    "tags": [
      "_logz_http_bulk_json_8070"
    ],
    "@timestamp": "2016-05-01T11:20:04.859+00:00",
    "name": "tiny_williams",
    "host": "c9d742de8d0a",
    "from": "linode/lamp",
    "id": "9ee3562bbd6104711b2faf7c588e9127299b9db6a843e50327d99545ec63476a"
  },
  "fields": {
    "@timestamp": [
      1462101604859
    ]
  },
  "highlight": {
    "type": [
      "@kibana-highlighted-field@docker-events@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1462101604859
  ]
}

Once you have a general idea of what information is available, you can start to think about how to aggregate and visualize the data — which is the next step required to be able to create correlations between the various container messages.

Visualizing the data

Visualizations are one of the most popular features in the entire ELK Stack, and the ability to create graphical depictions of your Docker container data is extremely useful. You could, for example, use the docker-stats logs to create a chart of CPU usage over time for each of your running containers. Or, you could create a table listing the five containers that send and receive the biggest amounts of data. The sky’s the limit.

To illustrate this point, we’re going to create a new table that lists the containers that are consuming the most resources.

To do this, first enter a search for the “docker-stats” type:

type:docker-stats

Save the new search by clicking the Save Search icon in the top-left corner of the page, and then select the Visualize tab in Kibana.

For the visualization type, select the Data Table, and use your newly-saved search as the source for the new visualization.

Our next step is to configure the various metrics and aggregations for the graph’s X and Y axes.

Using the “docker-stats” search as our data source, we’re going to configure our table columns by defining metrics aggregated by “Sum” for each of the three resource types: network, CPU, and memory.

Next, we’re going to cross-reference this information with the names of the top five containers.

Hit the Play button to see the end result.

You can save the new visualization for future use or create a dashboard with additional visualizations.

A Bonus Docker Dashboard!

To make the deal even sweeter, we at Logz.io have put together a Docker dashboard that contains a number of useful visualizations.

To install the dashboard, select the ELK Apps tab in the Logz.io user interface and search for Docker (ELK Apps is a free gallery of pre-defined and customized Kibana searches, visualizations and dashboards). Click the Install button for the Docker dashboard ELK App.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


The Animosity Index: What Big Data Reveals About #TheDonald


There is now little doubt that the 2016 U.S. presidential election campaign has been one of the most controversial and heated campaigns to date. There are a number of reasons for this, including the candidates themselves (some more colorful than others), the rhetoric used by them, and the overall political atmosphere and circumstances in which the campaign is taking place.

With emotions running high, it’s no surprise that passionate political debate is rampant on all social media outlets. Thanks (or no thanks) to the digital social revolution, opinions that used to be bottled up within confined media channels or the four walls of one’s private life are out there for all to see.

Wouldn’t it be intriguing to get insights into this public sentiment?

We thought so. To see the overall trends in what people were saying about the candidates and the issues that matter to them, Logz.io created the 2016 U.S. Election Real-Time Dashboard by ingesting Twitter data into our machine-learning log analysis platform. The data was ingested into Elasticsearch and then displayed in a Kibana dashboard.

Here is one of the visualizations in the dashboard:

We wanted to provide the world with a unique view of the public sentiment expressed toward the candidates — and some of the most hotly debated topics in the campaign — by analyzing the data streamed from Twitter with the open source technology upon which our platform is based.

Aggregating more than 4.6 million tweets in one month and then analyzing them to ascertain where tweeps stand is a technological, social, and political exercise that should interest political pundits, the campaigns themselves, and, of course, techies such as ourselves.

After opening, just refresh the window to see updated metrics!

How did we gather the data?

Any Big Data analysis requires a fully secure, scalable, and reliable storage and indexing platform.

Our tool of choice is the ELK Stack (Elasticsearch, Logstash, Kibana) — the world’s most popular open-source log analysis platform. But instead of ingesting log files, we fed the system with tweets using Twitter’s streaming API. On top of the aggregated data, we created a series of graphic visualizations that best depict the Twitter trends.

A more technical description of how we executed the data aggregation and analysis will be explored in a future article.

What data did we analyze?

The graphs in the dashboard depict public sentiment, as expressed on Twitter, regarding the four leading presidential candidates (in alphabetical order): Hillary Clinton, Ted Cruz, Bernie Sanders, and Donald Trump. As we approach the general election and the nominated candidates are officially selected, we will update the dashboard.

The Legend in the dashboard explains the various indexes that we are measuring and how they are being measured. The selected keywords tracked include both the personal Twitter handles of these candidates and their most commonly-used hashtags. We cross-referenced the tweets with additional relevant keywords for each visualization.

For example, for the “Animosity Index,”  we cross-referenced tweets to each of the candidates with the words “f— you” (we’re not kidding!).

The timeframe of the analysis is the prior seven days.

What data are we showing?

So, what information are you actually looking at?

Mentions Over Time — The number of mentions over time
The Lying Index — The percentage of times the words “lying” and “liar” are mentioned in conjunction with each candidate
The Animosity Index — The percentage of times the words “f— you” are mentioned in conjunction with each candidate
Top Election Topics — The percentages of tweets that are talking about a given topic. We selected the top five trending topics according to the Associated Press.
The Honesty Index — The percentage of times the words “honest” and “honesty” are mentioned in conjunction with each candidate
The Trump Geo Index — The locations of people who are mentioning either Trump’s name or his handle in tweets (due to Twitter API limitations, not all tweets are shown)

We’re always testing and refining our parameters to return the most accurate visualizations possible. Feel free to comment below with feedback on what you would like to see!

Our conclusions

Everyone is entitled to their own interpretations, and hey — we’re not political experts. We just collect and analyze big data. But still, there are some pretty obvious conclusions one can draw:

Donald Trump is the candidate most associated with both “honesty” and “lying,” likely meaning that his supporters view him as the most honest while his opponents think he is a liar.
The greatest numbers of tweets are almost always about Donald Trump, meaning that he is the most effective at generating publicity (or trolling, if you prefer). As a result, he also generates the most animosity.
The top election issue being discussed on Twitter is immigration — which is Trump’s biggest issue — followed closely by the economy. Terrorism is a more-distant third.

What do you see in the data? We invite your comments below. And as the election continues through the ongoing news coverage, the conventions, and the debates, check our real-time dashboard to get more insights into what is occurring.

Editor’s note: Here is our follow-up technical documentation on how we built the dashboard.

Want to analyze and visualize your own data? Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


Analyzing Twitter with the ELK Stack


Yesterday, we shared the Logz.io 2016 U.S. Election Real-Time Dashboard, a series of Kibana visualizations that depict public sentiment towards the presidential candidates and the topics that are being hotly debated in the media. These visualizations were created based on data streamed via the Twitter API and then analyzed by Logz.io — our ELK as a service platform.

It’s no secret that the ELK Stack is the world’s most popular open-source log analysis platform. But a fact that is less well-known is that companies worldwide are using ELK to do a lot more than just log analysis. In fact, not a week goes by without us hearing stories from our customers about new use cases, whether for technical SEO, log-driven development, or business intelligence.

This article describes the technical story of how we created the election dashboard. Specifically, we will show how to ship Twitter data to Logz.io with the Logstash Twitter plugin and then create visualizations in Kibana. Please note that to follow the steps outlined here, you need both a Twitter and a Logz.io account.

Creating a Twitter App

To establish a connection with Twitter and extract data, we will need Twitter API keys. To get your hands on these keys, you will first need to create a Twitter app.

Go to the Twitter apps page, and create a new app. You will need to enter a name, description, and website URL for the app. Don’t worry about the particulars — your entries here will not affect how the data is shipped into Elasticsearch.

Once created, open the app’s Keys and Access Tokens tab, and click the button at the bottom of the page to generate a new access token:

Keep this page open in your browser because we will need the data to set up the feed in Logstash.

Installing Logstash

Our next step is to install Logstash.

Logstash, the “L” in the “ELK Stack,” is used at the beginning of the log pipeline to ingest and collect logs before sending them on to Elasticsearch for indexing. Log analysis is the most common use case, but any type of event can be forwarded into Logstash and parsed using plugins.

To install Logstash, first download and install the public signing key:

$ wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Then, add the repository definition to your /etc/apt/sources.list file:

$ echo "deb http://packages.elastic.co/logstash/2.2/debian stable main" | sudo tee -a /etc/apt/sources.list

Finally, update your system so the repository is ready for use, and install Logstash with:

$ sudo apt-get update && sudo apt-get install logstash
Configuring Logstash

Now that Logstash is installed, we need to configure it to receive input from Twitter and then forward it to the Logz.io-hosted Elasticsearch.

Logstash configuration files use a JSON-like syntax and reside in /etc/logstash/conf.d. The configuration consists of three sections: inputs, filters, and outputs.

Let’s create a configuration file called ‘twitter.conf’:

$ sudo vi /etc/logstash/conf.d/twitter.conf

First, enter the input:

input {
  twitter {
      consumer_key => "consumer_key"
      consumer_secret => "consumer_secret"
      oauth_token => "access_token"
      oauth_token_secret => "access_token_secret"
      keywords => ["keyword","keyword","keyword"]
      full_tweet => true
  }
}

Be sure to update the consumer_key, consumer_secret, oauth_token, and oauth_token_secret values with the values from the Twitter app that was created in the first step.

You can choose any keywords you like, but you must maintain this specific syntax. For the U.S. Elections dashboard, we used the most commonly used hashtags for the leading candidates in both parties (Clinton, Trump, Sanders, Cruz) as well as their Twitter handles, but you could, of course, enter any keyword that you want to track.
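For illustration, a keywords line for an election-style dashboard might look like the following (an example set — swap in whatever terms you want to track):

keywords => ["realDonaldTrump","HillaryClinton","BernieSanders","tedcruz","#Trump2016","#FeelTheBern"]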

Next, enter a filter as follows (enter your Logz.io token in the relevant placeholder):

filter {
        mutate {
                add_field => { "token" => "<yourToken>" }
        }
}

Last but not least, define the output to the Logz.io ELK Stack as follows:

output {
        tcp {
        host => "listener.logz.io"
        port => 5050
        codec => json_lines
        }
}

If you’re shipping to a local instance of Elasticsearch, your Logstash configuration would look like this:

output {
    elasticsearch {}
}
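Before restarting, it’s worth validating the configuration file. Logstash 2.x ships with a --configtest flag (a quick sanity check; the path to the Logstash binary may differ on your system):

$ sudo /opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/twitter.conf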

Once done, restart Logstash:

$ sudo service logstash restart

Data from Twitter should start showing up in the Kibana interface integrated in Logz.io almost immediately.

Logstash configuration options

There are additional configurations that you can apply to Logstash to tweak the Twitter input into Elasticsearch. For example, you can configure Logstash to exclude retweets using ignore_retweets and setting it to true:

ignore_retweets => true

All the various configuration options are available here.

Analyzing Trends

While we can query Elasticsearch as soon as data begins to appear, it’s best to allow the feed from Twitter to run for a day or two to have a larger pool of data from which to pull.

Searching

You can begin to use the Kibana integrated within the Logz.io user interface to search for the data you’re looking for. If you’re tracking public sentiment regarding your company’s brand, for example, you could query the brand name itself and check the correlation with sentimental expressions.

Querying options in Kibana are varied — you can start with a free-text search for a specific string or use a field-level search. Field-level searches allow you to search for specific values within a given field with the following search syntax:

<fieldname>:search

For example, you could search all the ingested tweets for mentions of your brand using the Twitter ‘text’ field (this field represents the actual tweet text):

text:<yourBrand>

Or, you could try something slightly more advanced using logical statements or proximity searches. We cover these search options in this Kibana tutorial.
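As a sketch of such a query, you could combine a field-level search with a logical statement — for example, a hypothetical brand-monitoring query that looks for your brand alongside sentiment words:

text:<yourBrand> AND (text:love OR text:hate)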

For the U.S. Elections dashboard, we used free-text searches, regular expression searches, and logical statements to make correlations between the various candidates and specific sentiments.

For example, we used the following regular expression query to create the Animosity Index (the number of times that people tweet a certain phrase to each candidate):

text:/.*fuck you.*/
Visualizing

Once you have narrowed the available data down to the information that interests you, the next step is to create a graphical depiction of the data so that you can identify trends over time.

As an example, I’ll describe how we created the Mentions Over Time visualization, showing mentions of the four candidates over time.

For this visualization, we used the Line Chart visualization type.

Using the entire data pool as our search base, we configured the following settings:

Y-Axis – Count aggregation
X-Axis – Date Histogram using the @timestamp field on an hourly interval, together with Split Lines using filters (trump, clinton, cruz, sanders)

Another example is creating a map depicting the geographic locations of tweeps.

Using a saved search, open the Tile Map visualization and then select the Geo Coordinates bucket type and ‘coordinates.coordinates’ in the field drop-down:

It’s important to point out that the data ingested into Elasticsearch via the Twitter API is not 100% complete. Some fields have null values and the values of others depend on how the original tweets were composed. In this case, the ‘coordinates.coordinates’ field reflects Twitter users who used Twitter’s location feature.

These are just simple examples of what can be done with your Twitter data in Kibana. We would love to hear in the comments below what you thought of this article and what additional ways you are using the ELK Stack.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


Network Analysis with Packetbeat and the ELK Stack


Packetbeat is an open-source data shipper and analyzer for network packets that is integrated into the ELK Stack (Elasticsearch, Logstash, and Kibana). A member of Elastic’s Beats family of shippers (alongside Filebeat, Topbeat, and Winlogbeat, all built on the Libbeat framework), Packetbeat provides real-time monitoring metrics on web, database, and other network protocols by monitoring the actual packets being transferred across the wire.

Monitoring data packets with the ELK Stack can help to detect unusual levels of network traffic and unusual packet characteristics, identify packet sources and destinations, search for specific data strings in packets, and create a user-friendly dashboard with insightful statistics. Packet monitoring can complement other security measures (such as the creation of SIEM dashboards) and help to improve your response times to malicious attacks.

In this article, I will demonstrate most of the above-mentioned benefits. Specifically, we will use Packetbeat to monitor the HTTP transactions of an e-commerce web application and analyze the data using the Logz.io cloud-based, enterprise-grade ELK Stack.

Installing and configuring Packetbeat

Our first step is to install and configure Packetbeat (full installation instructions are here):

$ sudo apt-get install libpcap0.8

$ curl -L -O https://download.elastic.co/beats/packetbeat/packetbeat_1.2.2_amd64.deb
$ sudo dpkg -i packetbeat_1.2.2_amd64.deb

Open the configuration file at /etc/packetbeat/packetbeat.yml:

$ sudo vim /etc/packetbeat/packetbeat.yml

The Sniffer section of the configuration file determines which network interface to “sniff” (i.e., monitor). In our case, we’re going to listen to all the messages sent or received by the server:

interfaces:
    device: any

In the Protocols section, we need to configure the ports on which Packetbeat can find each protocol. Usually, the default values in the configuration file will suffice, but if you are using non-standard ports, this is the place to add them.

My e-commerce application is serviced by an Apache web server and a MySQL database, so my protocols are defined as follows:

dns:
  ports: [53]

  include_authorities: true
  include_additionals: true

http:
  ports: [80, 8080, 8081, 5000, 8002]

mysql:
  ports: [3306]

The Output section is the next section we need to configure. Here, you can define the outputs to use to export the data. You can output to Elasticsearch or Logstash, for example, but in our case, we’re going to output to a file:

### File as output
  file:
    path: "/tmp/packetbeat"
    filename: packetbeat
    rotate_every_kb: 10000
    number_of_files: 7

An output configuration to Elasticsearch would look something like this:

output:
    elasticsearch:
         hosts: ["192.168.1.42:9200"]
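Similarly, if you were forwarding the data to a local Logstash instance instead (a sketch, assuming Logstash’s Beats input is listening on its default port, 5044), the output section would look like this:

output:
    logstash:
        hosts: ["localhost:5044"]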

And last but not least, we’re going to configure the Logging section to define a log file size limit that once reached, will trigger an automatic rotation:

logging:
      files:
           rotateeverybytes: 10485760

Once done, start Packetbeat:

$ sudo /etc/init.d/packetbeat start
Installing and configuring Filebeat

Packetbeat data can be ingested directly into Elasticsearch or forwarded to Logstash before ingestion into Elasticsearch. Since Logz.io does not yet have a native shipper for Packetbeat, we’re going to use Filebeat to ship the file exported by Packetbeat into Logz.io.

First, download and install the Public Signing Key:

$ curl https://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -

Then, save the repository definition to /etc/apt/sources.list.d/beats.list:

$ echo "deb https://packages.elastic.co/beats/apt stable main" |  sudo tee -a /etc/apt/sources.list.d/beats.list

Now, update the system and install Filebeat:

$ sudo apt-get update && sudo apt-get install filebeat

The next step is to download a certificate and move it to the correct location, so first run:

$ wget http://raw.githubusercontent.com/cloudflare/cfssl_trust/master/intermediate_ca/COMODORSADomainValidationSecureServerCA.crt

And then:

$ sudo mkdir -p /etc/pki/tls/certs
$ sudo cp COMODORSADomainValidationSecureServerCA.crt /etc/pki/tls/certs/

We now need to configure Filebeat to ship our Packetbeat file into Logz.io.

Open the Filebeat configuration file:

$ sudo vim /etc/filebeat/filebeat.yml
Defining the Filebeat Prospector

Prospectors are where we define the files that we want to tail. You can tail JSON files and simple text files. In our case, we’re going to define the path to our Packetbeat JSON file.

Please note that when harvesting JSON files, you need to add ‘logzio_codec: json’ to the fields object. Also, the fields_under_root property must be set to ‘true’. Be sure to enter your Logz.io token in the token field.

A complete list of known types is available here, and if your type is not listed there, please let us know.

prospectors:
  -
    paths:
      - /tmp/packetbeat/*
    fields:
      logzio_codec: json
      token: UfKqCazQjUYnBN***********************
    fields_under_root: true
    ignore_older: 24h
Defining the Filebeat Output

Outputs are responsible for sending the data in JSON format to Logstash. In the configuration below, the Logz.io Logstash host is already defined along with the location of the certificate that you downloaded earlier and the log rotation setting:

output:
  logstash:
    # The Logstash hosts
    hosts: ["listener.logz.io:5015"]
    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ['/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt']
logging:
  # To enable logging to files, to_files option has to be set to true
  files:
    # Configure log file size limit.
    rotateeverybytes: 10485760 # = 10MB

Like before, be sure to put your Logz.io token in the required fields.

Once done, start Filebeat:

$ sudo service filebeat start
Analyzing the data

To verify the pipeline is up and running, access the Logz.io user interface and open the Kibana tab. After a minute or two, you should see a stream of events coming into the system.

You may be shipping other types of logs into Logz.io, so the best way to filter them out is to open one of the messages coming in from Packetbeat and filter via the ‘source’ field.

The messages list is then filtered to show only the data outputted by Packetbeat:

To help to identify the different types of messages, add the ‘type’ field from the list of available fields on the left. In our case, we can see Apache, MySQL and DNS messages.

I’m going to focus on HTTP traffic by entering the following query:

type:http
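You can refine this further with field-level filters — for example, to isolate error responses (assuming Packetbeat’s http.code field holds the HTTP response status):

type:http AND http.code:[400 TO 599]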

Our next step is to visualize the data. To do this, we’re going to save the search and then select the Visualize tab in Kibana.

We’re going to create a new line chart based on the saved search that depicts the amount of HTTP transactions over time.

The specific configuration of this visualization looks like this:

Hit the Play button to see a preview of the visualization:

Save the visualization.

Another way to use Kibana to visualize Packetbeat data is to create a vertical bar chart stacking the different HTTP codes over time.

The specific configuration of this visualization looks like this:

The end result:

As this image shows, this visualization helps to identify traffic peaks in conjunction with HTTP codes.

After saving the visualization, it’s time to create your own personalized dashboard. To do this, select the Dashboard tab, and use the + icon in the top-right corner to add your two visualizations.

If you’re using Logz.io, you can use a ready-made dashboard that will save you the time spent on creating your own set of visualizations.

Select the ELK Apps tab:

ELK Apps are free and pre-made visualizations, searches and dashboards customized for specific log types. (You can see the library directly or learn more about them.) Enter ‘Packetbeat’ in the search field:

Install the HTTP dashboard, and then open it in Kibana:

In just a few seconds, you can have your own network monitoring dashboard up and running, giving you a real-time picture of the packets being transmitted over the wire.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


Homebrewed APM with Docker and the ELK Stack


In a previous post, we introduced the Logz.io Docker log collector. This log collector does an excellent job of collecting and shipping container logs, Docker daemon events, and container monitoring metrics to the Logz.io-hosted ELK Stack. But what about the machine hosting Docker? How can one determine the health of that system or monitor what is happening right now?

The Logz.io Docker performance agent is dedicated to monitoring the performance of hosts and can be used in a Docker environment together with the log collector to give you a comprehensive picture of all of the different layers that comprise your Docker environment.

The agent collects performance data using collectl, an open source tool that allows you to monitor various resource utilization metrics such as CPU, disk, memory, and inode use. The data is outputted into a log file, which is then picked up by RSYSLOG and forwarded into the Logz.io ELK Stack (for more information on how this agent works, read this article on monitoring performance with ELK).

This guide will describe how to install the agent and use its shipped data to analyze the performance of a Docker host in the ELK Stack.

Note: To follow the procedures described below, you’ll need Docker installed (preferably with some containers running) and a Logz.io account (create a free one here).

Installing the agent

First, pull the image from the Docker hub:

$ docker pull logzio/logzio-perfagent

Before we run the image, here is a brief explanation of the various environment variables used in the run command — both the mandatory and optional ones.

LOGZ_TOKEN (mandatory). This variable defines the Logz.io account to which the data will be shipped. You can find your token in the Settings section of your Logz.io user interface.
USER_TAG (optional). This variable assigns an entered string to the user_tag field. This can be useful for monitoring a number of Docker hosts and can help when creating visualizations in Kibana. One recommended use case for this variable is to denote the host role.
HOSTNAME (optional). This variable defines the hostname with which to associate the performance data that is sent by the container. This string will be provided in the syslog5424_host field of each entry.
INSTANCE (optional). This variable defines the IP address that will be provided in the instance field of each entry.

Now, let’s get down to business. Here is an example of the run command used for running the image:

$ docker run -d --net="host" -e LOGZ_TOKEN="UfKqCazQjUYnBNcJqSryIRyDIjExjwIZ" -e USER_TAG="workers" -e HOSTNAME=`hostname` -e INSTANCE="10.1.2.3" --restart=always logzio/logzio-perfagent
Analyzing the data

After running the image, data should begin to show up in Kibana in a matter of seconds.

Usually, you’ll be shipping a number of other log types into Elasticsearch. To filter out the noise, query Elasticsearch by entering, in the Kibana search field, the USER_TAG value that we used when running the image:

user_tag:workers

Now, you can begin to analyze the logs by adding some fields. For example, add the ‘type’, ‘instance’ and ‘mem_used’ fields. This will give you some more insight into the list of logs:

Select one of the entries to view all of the available fields. This will give you a better idea of what data is being shipped into the system and indexed by Elasticsearch.

Visualizing the data

Next, let’s see how to transform the data into a more user-friendly visualization. To do this, first save the search above. The saved search can now be the basis of any visualization or dashboard that you create.

Next, select the Visualize tab. You will get a selection of various visualization types from which you can choose. In this example, we’re going to go for the line-chart visualization type.

What we’re going to visualize is the average CPU usage over time. To do this, the configuration of the X and Y axes is as follows:

Y axis – Aggregation by the average value of the ‘cpu_sys_percent’ field
X axis – Aggregation by a date histogram using the ‘@timestamp’ field

Hit the green Play button to see a preview of the visualization:

This is just one example of how to visualize the performance data that is collected by the agent. Read on to learn how to hit the ground running with a ready-made monitoring dashboard.

Installing the Docker Performance Dashboard

Logz.io provides Docker users with a ready-made dashboard for monitoring the performance of the host machine. This dashboard is available in the ELK Apps tab within the Logz.io user interface. ELK Apps is a free collection of pre-made and customized Kibana searches, visualizations and  dashboards for specific log types.

To install the Docker Performance Dashboard, select the ELK Apps tab and search for Docker:

Click the Install button in the performance tab, and the dashboard will be displayed in Kibana:

The dashboard contains the following visualizations:

CPU User Mode %
CPU Wait % (Disk IO)
CPU Avg. Load 1
Memory Free
Net RX Total KB
Net TX Total KB
CPU System %
Total Sockets Used
CPU Idle %
Disk Total Write KB
Disk Total Read KB
Memory Used
Total Inodes Used
User Tags

In just a few seconds, you will have an entire monitoring dashboard up and running that will paint a real-time picture of how your Docker host is performing. As mentioned in the introduction, this agent should be used together with the Docker log collector to get a comprehensive view of your Docker environment.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


How to Build a PCI-DSS Dashboard with ELK and Wazuh


The Payment Card Industry Data Security Standard (PCI-DSS) is a common proprietary IT compliance standard for organizations that process major credit cards such as Visa and MasterCard. The standard was created to increase control of cardholder data to reduce credit card fraud. The PCI-DSS specifies twelve requirements for compliance (PDF) and explicitly requires that organizations implement log management (requirement 10):

Track and monitor all access to network resources and cardholder data. Logging mechanisms and the ability to track user activities are critical in preventing, detecting and minimizing the impact of a data compromise. The presence of logs in all environments allows thorough tracking, alerting and analysis when something does go wrong. Determining the cause of a compromise is very difficult, if not impossible, without system activity logs.

In this post, we will describe how to build a PCI Compliance dashboard with the ELK (Elasticsearch, Logstash, Kibana) log management stack. We will use several tools including OSSEC Wazuh and demonstrate how to build a PCI-DSS dashboard.

For simplicity, we will use the Logz.io-hosted ELK-as-a-Service, but everything that is described below can be done with any installation of the open source ELK Stack. Two additional methods for integrating ELK using the OSSEC Docker image and Logstash are included at the end of this post.

Deploying OSSEC Wazuh

OSSEC is a scalable, multi-platform, open-source host-based intrusion detection system (HIDS). OSSEC helps to implement PCI-DSS by performing log analysis, checking file integrity, monitoring policy, detecting intrusions, and alerting and responding in real time. It is also commonly used as a log analysis tool that supports the monitoring and analyzing of network activities, web servers, and user authentications. OSSEC is comprised of two components: the central manager component, which receives and monitors the incoming log data, and agents that collect and send information to the central manager.

Wazuh has developed modules for integrating OSSEC with log management platforms. To integrate OSSEC HIDS with the ELK Stack, we will create the PCI dashboard using the Wazuh HIDS modules because they extend the capabilities of the OSSEC manager.

In our example below, we used two servers – one for the manager and one for a single agent. For testing purposes, it’s also possible to have these both on the same server.

OSSEC is multi-platform, but for the sake of simplicity we will use Ubuntu Servers (in our example, we used AWS EC2 instances).

Start by downloading OSSEC Wazuh from GitHub and installing the development tools and compilers.

For Ubuntu the commands are:

sudo apt-get update
sudo apt-get install gcc make git

The following are the commands to download the project from GitHub, compile it and install:

mkdir wazuh_ossec && cd wazuh_ossec
git clone https://github.com/wazuh/ossec-wazuh.git
cd ossec-wazuh
sudo ./install.sh

The image below illustrates the Wazuh HIDS installation phase. We want to install the central manager service, so in our case this field will contain the value “server”.

The other settings are related to other services we want to use. These include services such as emailed notifications and file integrity checks, which monitor files on servers and calculate the checksums on every change of a particular file. This is important to detect unauthorized modification of critical OS and other system files (PCI requirement # 11.5).

Once the installation is done, you can start the OSSEC manager with the command:

sudo /var/ossec/bin/ossec-control start
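To see which daemons are actually running, ossec-control also accepts a status action (a quick sanity check):

$ sudo /var/ossec/bin/ossec-control status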

The following manager commands can validate that everything is working as expected:

$ ps aux | grep ossec

root     1017  0.0  0.1  15024 1524 ?        S    23:01   0:00 /var/ossec/bin/ossec-execd
ossec    1021  0.1  0.4  20248  4236 ?        S    23:01   0:00 /var/ossec/bin/ossec-analysisd
root     1025 0.0  0.1  31604   1624 ?        S    23:01   0:00 /var/ossec/bin/ossec-logcollector
root     1037  0.0  0.1  7380   2732 ?        S    23:01   0:00 /var/ossec/bin/ossec-syscheckd
ossec    1040  0.0  0.1  18312   1708 ?        S    23:01   0:00 /var/ossec/bin/ossec-monitord

$ sudo lsof /var/ossec/logs/alerts/alerts.json
COMMAND     PID  USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
Ossec-ana    1021    ossec    11w    REG    252,1        7388    402765    /var/ossec/logs/alerts/alerts.json

$ sudo cat /var/ossec/logs/alerts/alerts.json
{"rule": {"level": 3, "comment": "Login session opened.", "sidid": 5501, "firedtimes": 2, "groups": ["pam", "syslog", "authentication_success"], "PCI_DSS": ["10.2.5"]....}}

The next step is to install the agent on our second server. For Ubuntu Server (the Wily release — for other Ubuntu versions, check the other Wazuh repositories), you’ll have to add the Wazuh repository with the following command:

echo -e "deb http://ossec.wazuh.com/repos/apt/ubuntu wily main" >> /etc/apt/sources.list.d/ossec.list

After adding the repositories, update the repository with the command apt-get update and install the OSSEC agent with the commands:

$ apt-get update 

$ apt-get install ossec-hids-agent

Just before the end of the installation process, the agent will prompt you for the manager IP address. Enter the manager IP address or just leave it as 127.0.0.1.

To configure the manager and agent, run the following command on your OSSEC manager server:

$ sudo /var/ossec/bin/manage_agents

Select ‘A’ to add an agent:

(A)dd an agent (A).
(E)xtract key for an agent (E).
(L)ist already added agents (L).
(R)emove an agent (R).
(Q)uit.
Choose your action: A,E,L,R or Q: A

You’ll then need to enter the agent’s name, an IP address, and an ID.

In our case, we entered agent-001 as the name (note that the ID will be generated for you if you leave that field empty).

The next step is to extract the agent’s key because this is the key we will use to import data from the agent. Now, instead of using the “A” option, we will type “E” and then enter the agent ID.

Note: The configuration file /var/ossec/etc/ossec.conf contains the section client, where you can also type the manager’s IP address (if not done previously):

<ossec_config>
  <client>
    <server-ip>MANAGER_IP_ADDRESS</server-ip>
  </client>
</ossec_config>

On the agent server, run the program /var/ossec/bin/manage_agents and use option “I” to import the key from the agent’s server (the one we want to monitor):

 (I)mport key from the server (I).
   (Q)uit.
Choose your action: I or Q: I

* Provide the Key generated by the server.
* The best approach is to cut and paste it.
*** OBS: Do not include spaces or new lines.

Paste it here (or '\q' to quit): <KEY FROM SERVER>

Agent information:
   ID:<ID>
   Name:<agent name>
   IP Address:<agent IP address>

Confirm adding it?(y/n): y

Your agent has now been added, and you should use the following command to restart it to apply the changes:

$ /var/ossec/bin/ossec-control restart
Integrating Wazuh with the ELK Stack

Now that you have Wazuh installed, the next step is to integrate it with the ELK Stack hosted by Logz.io (see the bonus section at the end of the article for different methods of integrating these two stacks). If you do not have a Logz.io account, you can begin your free trial here.

S3 Syncing and Shipping

To ingest your Wazuh data into the Logz.io ELK Stack, you can use the Amazon S3 storage service: sync the OSSEC folder containing your logs with a specific S3 bucket (which we named ossec-logs in this example) and then establish a shipping pipeline from S3 into Logz.io.

Use the AWS CLI sync command to copy all new and updated files consistently from the OSSEC server to a specified S3 bucket. On the OSSEC manager server, we created a job that executes the following command:

$ aws s3 sync /var/ossec/logs/alerts/ s3://ossec-logs/$(date +"%Y-%m-%d-%H")

The $(date +"%Y-%m-%d-%H") portion of the command helps to group the logs hourly on S3.
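As a sketch of how that job might be scheduled, here is an hourly crontab entry (note that % must be escaped as \% inside crontab, and the path to the aws binary may differ on your system):

0 * * * * /usr/bin/aws s3 sync /var/ossec/logs/alerts/ s3://ossec-logs/$(date +\%Y-\%m-\%d-\%H)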

To ship the data from S3 to Logz.io, select the Log Shipping tab located at the top of the Logz.io dashboard, expand the AWS item on the right menu of the Log Shipping page, and click S3 Bucket.

As shown above, enter the S3 bucket name and the S3 access and secret keys. You can leave the prefix empty; Logz.io will find all sub-groups inside this bucket for you.

Next, select the region where the bucket is located. In the last step, select “other” as the log type, and enter “json” in the field that appears after selecting “other” from the drop-down box. This ensures that Logz.io parses the JSON correctly. Click Save, and you will be ready to ship the logs from S3.

Creating a PCI Compliance Dashboard

Once your pipeline is established, you can begin to analyze the logs in Kibana. Your next step is to create a visual dashboard that will contain relevant PCI compliance visualizations for identifying trends and correlations within the data.

As an example, we’re going to create a line chart depicting PCI requirements over time.

Open the Visualize tab in Kibana and select the Line Chart visualization tab. Configure the X axis of the visualization to aggregate by time and use the field rule.PCI_DSS as a sub-aggregation. The configurations should look as follows:

Hit the Play button to view a preview of the visualization.

This is one example of visualizing Wazuh data that is being ingested into Elasticsearch. We created a PCI Compliance dashboard that contains a series of relevant PCI compliance visualizations that are all available in the ELK Apps gallery — our library of pre-made Kibana visualizations, dashboards, and searches that are customized for specific types of data.

To install the dashboard, open the ELK Apps tab and search for “PCI Compliance.” All you have to do next is hit the Install button.

In addition to the PCI Requirements Over Time visualization described above, here’s a list of the other available visualizations in this dashboard:

PCI-DSS Requirements – the time distribution of PCI-DSS requirement 6.1
OSSEC Alerts – alerts triggered by the OSSEC manager
File Integrity – two visualizations that show the number of file changes on the host and a list of files with details on checksums and relevant PCI-DSS requirements
High Risk Alerts – the number of high-risk alerts over time (the high-risk alerts for the current configuration of Wazuh OSSEC are those that have an AlertLevel greater than 10)
High Risk Traffic Meter – a general indicator of high-risk alerts in your environment

Each OSSEC alert log stored in Elasticsearch is tagged with the PCI requirements to which it relates. As a result, it is possible to track logs based on their PCI requirement numbers — and that gives you a better picture of the state of the system over time.

Using Wazuh’s PCI Dashboard

Wazuh also provides an easy way of adding a PCI dashboard to Kibana.

In the Objects section of the Kibana Settings, click the Import button, select the dashboard file, and then refresh the Kibana page to see the imported dashboards:

Now, you can go back to the Dashboard section of Kibana and click the Open icon to load the PCI Compliance dashboard:

PCI-DSS Dashboard for AWS

Amazon’s physical infrastructure already complies with PCI-DSS; however, you are still responsible for monitoring your own environment and detecting security misconfigurations and vulnerabilities. To do that, you need to ship the logs covering your account activity and network security events. In this section, we will show how to create a few alerts and build a dashboard.

Note: Information on enabling different kind of logs such as CloudTrail and VPC can be found in a prior guide of ours on building a SIEM dashboard for your AWS environment.

In real-life scenarios, it is recommended that you deploy additional security tooling. Tools such as rootsh and Snort, for example, can help prevent intrusions and give you better insight into overall network traffic as well as the intentions of any attackers.

We will start by creating automated alerts that fire when log data from CloudTrail or VPC Flow Logs reveals that specific, defined events are occurring. In this example, we will create two types of alerts: one that uses CloudTrail logs to detect multiple failed attempts to log into the AWS console, and one that uses VPC Flow Logs to check connection destination ports.

The first alert satisfies the aforementioned PCI-DSS compliance requirement #10 to “track and monitor all access to network resources and cardholder data” by tracking user login failures from CloudTrail.

The query responseElements.ConsoleLogin:"Failure" is used for filtering logs that contain information on failed logins due to authentication failure:

The next alert is tagged with PCI requirements #1 and #5. In this case, we created an alert that checks the VPC destination port.

The query in the alert for requirement #5 checks the VPC Flow Logs and — for server 10.0.0.2 — tries to determine whether any ports other than 80 (HTTP) are open:

After creating a few charts using the existing logs to monitor login failures, vulnerabilities, and account changes, we can put all of the widgets into a single dashboard:

A Final Note

We hope that this guide will help you to take the next step in the implementation of PCI-DSS compliance, and we invite you to get in touch with us for any specific questions or support.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!

Bonus 1: Using the OSSEC Docker

Another way of integrating Wazuh with the ELK Stack is to use the OSSEC Docker image.

The ossec-elk container includes the OSSEC manager and the ELK Stack.

To install the ossec-elk container, run the following command:

$ docker run -d -p 55000:55000 -p 1514:1514/udp -p 1515:1515 -p 514:514/udp -p 5601:5601 -v ~/mnt/ossec:/var/ossec/data --name ossec wazuh/ossec-elk

Mounting the /var/ossec/data directory allows the container to be replaced without any configuration or data loss.
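To confirm the container started cleanly, you can tail its output with standard Docker tooling (using the container name from the run command above):

$ docker logs -f ossec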

The next step is to install the agent as already explained in the sections above. Once done, you will need to add the new agent using the following command:

$ docker exec -it ossec /var/ossec/bin/manage_agents

After adding the new agent, restart the OSSEC manager with the following command:

$ docker exec -it ossec /var/ossec/bin/ossec-control restart

When the manager and agent are ready, we can access Kibana using the URL: http://<your_docker_ip>:5601 to create the dashboard.

Kibana will ask you to configure an index pattern, and we need to point it to the correct one. Select the “Use event times to create index names” option, and then select the Daily value (as illustrated in the screen below) for the index pattern interval. For the index name or pattern, enter [ossec-]YYYY.MM.DD. By default, Kibana will select the @timestamp value in the Time-field name box. To finish, click Create.

In the Discover section, you will see the log events that the OSSEC manager has received and processed:

Bonus 2: Using Logstash

One final way to forward Wazuh data into Logz.io or your own ELK Stack is with Logstash.

Whether your architecture is single-host or distributed, we will configure the Logstash server to read OSSEC alerts directly from the OSSEC log file.

First, we will install Logstash on the OSSEC manager server and then configure it to ship the log data to Logz.io. As illustrated in the configuration below, we use the mutate filter plugin with add_field to add the token:

The final Logstash configuration for shipping OSSEC alerts to Logz.io is as follows:

input {
  file {
    type => "ossec-alerts"
    path => "/var/ossec/logs/alerts/alerts.json"
    codec => "json"
  }
}
filter {
    geoip {
      source => "srcip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][location]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][location]", "%{[geoip][latitude]}"  ]
    }
    date {
        match => ["timestamp", "YYYY MMM dd HH:mm:ss"]
        target => "@timestamp"
    }
    mutate {
      convert => [ "[geoip][location]", "float"]
      rename => [ "hostname", "AgentName" ]
      rename => [ "geoip", "GeoLocation" ]
      rename => [ "file", "AlertsFile" ]
      rename => [ "agentip", "AgentIP" ]
      rename => [ "[rule][comment]", "[rule][description]" ]
      rename => [ "[rule][level]", "[rule][AlertLevel]" ]
      remove_field => [ "timestamp" ]
      add_field => { "token" => "<TOKEN>" }
    }
}
output {
  tcp {
    host => "listener.logz.io"
    port => 5050
    codec => json_lines
  }
}

This configuration will send logs from the OSSEC alert file to the Logz.io service, which automatically stores the logs in Elasticsearch with some renamed fields in order to fit the dashboard that Wazuh created for Kibana. The important piece of the Logstash configuration is the addition of the token field. Without it, Logz.io will discard the logs.


10 Resources You Should Bookmark If You Run Your Own ELK Stack


Sometimes it’s hard to see the forest for the trees. As a newbie in the world of the ELK Stack, you may be feeling overwhelmed by the number of articles and tutorials on the Web. But even fully grown ELKs have been known to wander lost among the trees.

At Logz.io, we’ve compiled a number of ELK guides to help beginners hit the ground running, but there are some additional excellent resources out there containing extremely valuable information. This article lists ten resources that we believe you must read if you’re doing ELK. We’ve listed public sites, company blogs, and personal sites. We hope you find it helpful!

1. Elastic

The Elastic site has the content one would expect from the commercial entity behind the ELK Stack. From getting started guides explaining how to install and set up each of the stack’s components to videos, user stories, and forums — the Elastic site must be among the first sources of information to explore when starting out with the stack. Familiarize yourself with the various pages — your logs depend on it!

2. EagerElk

EagerElk is an excellent blog containing information for slightly more advanced users. Written by Jurgens du Toit, this blog details the author’s lessons learned from years of experience working with Elasticsearch, Logstash, and Kibana. The blog also contains a series of white papers and a list of links leading to additional sources of information. Jurgens also contributed an Elasticsearch tutorial, Logstash tutorial, and a guide to Elasticsearch queries to the Logz.io Blog.

3. Technology Explained

This personal blog by Alexandre Lourenco contains some extremely useful posts on Elasticsearch and also shows how to set up a centralized logging system with the ELK Stack. Some of the commands need to be updated to reflect the latest versions of the stack, but as a whole, the articles are well worth the extra copy-paste needed to complete the steps.

4. Tim Roes’ Blog

Tim Roes has put together an excellent compilation of articles describing how to work with Kibana — the pretty face of the ELK Stack. His four-part Kibana tutorial here provides a good way to get acquainted with the basics, and the more advanced “Writing Kibana Plugins” series takes it up a notch by explaining how to customize Kibana for your own environment.

5. StackOverflow

Much has been written on the wealth of technical ELK information to be found on StackOverflow, and yet it still seems that people underestimate this treasure trove. As you venture into the world of ELK, you will undoubtedly encounter problems and issues. But do not despair! Remember, hundreds of thousands of people download ELK every year, so there is a good chance someone else has already encountered the same issue that you’re experiencing. And if not, a dedicated community of ELKers will be quick to answer any new question.

6. DigitalOcean

DigitalOcean provides excellent step-by-step tutorials that often rank among the first results when searching for an ELK-related topic on Google. Especially useful are the installation tutorials, which specify the exact requirements for setting up the entire ELK Stack, together with guidelines for shipping specific types of logs. Here, for example, is a tutorial describing how to install the stack on Ubuntu 14.04 with Nginx. While a touch too lengthy, these how-tos are extremely detailed and cover all the bases.

7. DZone

DZone contains a large number of technology-related articles that cover a wide array of topics. ELK-related articles written by experienced ELKers are published almost every week, and taken together, they comprise a solid knowledge base to learn from. The articles are a bit scattered, though, so you will need to search for the specific topic in which you’re interested.

8. Reddit

Reddit is one of my favorite sources of information on ELK-related topics, if not the most useful one. To stay up to date with the latest developments in the world of ELK, I recommend subscribing to most, if not all, of the following subreddits: Elasticsearch, Elastic, ELK Stack, Kibana, and Logstash (the last two are not very active but are worth monitoring just in case).

9. GitHub

While not the most obvious place to look for information about how to use the ELK Stack, the Elasticsearch, Logstash, and Kibana repos on GitHub can provide information on open issues and their status. Of course, once you become a fully grown ELKer, this is the place to contribute back to the community. If you have a few minutes to spare, it’s always a nice exercise to measure the strength of a community by checking out its GitHub graphs!

10. Logz.io

Ah, yes. We couldn’t let this one pass. As mentioned above, we’ve put a lot of effort into putting together in-depth resources on using the ELK Stack that our readers enjoy. For those beginning the long and winding ELK path, it’s worth highlighting The Definitive Guide to AWS Log Analytics Using ELK and The Complete Guide to the ELK Stack, the latter of which is a compilation of getting-started and best-practice articles written by six different writers. And there are some other interesting posts that I’ll let you discover on your own.

This list is by no means complete. We know there are other resources out there, so please feel free to comment below so our next article in the series will be able to list the top 50 resources!


10 Resources You Should Bookmark If You Run Your Own ELK Stack was first posted on May 25, 2016 at 12:04 pm.

Proactive Log Analysis with Logz.io


The DevOps Toolbox – Open Source Log Analytics


Logz.io co-founders Tomer Levy and Asaf Yigal discuss how to do log analytics with the ELK Stack at a DevOps meetup held by Akamai Technologies.

Log Analysis with Logz.io


Take a deeper look into Logz.io in this short demo of how to perform end-to-end log analysis with visualizations, alerts, and more.

Working with ELK Apps in Logz.io
