In previous blog posts, my colleagues have introduced Prometheus and explained in detail how it works. For those who did not read those posts; Prometheus is basically pull-style monitoring. You have a service which sends requests for metrics and target nodes respond with them via http/s protocols. When using Prometheus, you get to a point when you wonder if there is a way to have redundant metric collecting. Since there is a thing called Federation you might think that this is the redundancy that you want.
Our task: Let’s suppose we have a set of 100,000 files placed in 100,000 paths. We need to know the size of each and then make a list of the ones larger than n megabytes with full paths while not spending ages on it. Simple methods like bash’s find and grep are too slow, so in this article we will talk about how we can use python multiprocessing library for our files.
Sometimes we are all in need of doing some quick and basic setup to monitor our key services. In these cases, this super simple cheatsheet comes into play. This exact guideline is meant to be used for several node Elasticsearch clusters but can also be used to monitor almost everything – just replace ES exporter with anything that suits your own taste and set up a nice informative dashboard. Setting up Node Exporters.
If we are working with monitoring systems, we usually want to know if we have some unusual behavior in our graphs, for example if disk I/O graph is briefly increased. This behavior is called spikes. But how can we catch the spikes correctly if we use Prometheus in our infrastructure? Prometheus is a TSDB (time series database), it can export data to monitoring systems such as Grafana. Prometheus has 4 types of metrics:
Elasticsearch is the name of a full-text search engine in computer science, distributed for free under the Apache license. It has a RESTful interface and offers high availability, speed, and scalability. It is developed in Java and can be communicated with via the web interface. Elasticsearch is a schematic database, therefore it is not necessary to define the database structure because it is created based on embedded data. It can be included on the list of NoSQL databases.
Why Prometheus? Prometheus is an open-source system monitoring and alerting toolkit originally built at SoundCloud. Since its release in 2012, many companies and organizations have adopted Prometheus, and the project has a very active community. It is developed as an open project, independent of any company or organization.“ It is based on metrics and is designed to measure and visualise the overall health and performance of services, it is similar to tools like Graphana/Graphite, but offering a more robust and comprehensive feature set, including:
In our infrastructure we manage mainly Linux hosts, but there are also a few Windows servers that meet clients’ requirements. The best way to manage cloud infrastructure is by automation, using Puppet or Ansible for example. Unfortunately, it is only effective with a vast amount of hosts with similar features. We decided to manage all Windows hosts manually because in this case automation processes (Puppet, Ansible) would be more time consuming.