The Cloudinfrastack Blog

2020.01.17  

Our task: Let’s suppose we have a set of 100,000 files placed in 100,000 paths. We need to know the size of each file and then list the ones larger than n megabytes, with full paths, without spending ages on it. Simple methods like find and grep in bash are too slow, so in this article we will talk about how we can use Python’s multiprocessing library to process our files.
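The task described above can be sketched roughly as follows. This is a minimal illustration, not the code from the article: the function names (`file_size`, `iter_files`, `keep_large`, `find_large_files`), the default of 4 workers, and the chunk size are all assumptions made for the example.

```python
import os
from multiprocessing import Pool


def file_size(path):
    """Return (path, size in bytes), or (path, None) if the file cannot be read."""
    try:
        return path, os.path.getsize(path)
    except OSError:
        return path, None


def iter_files(root):
    """Yield the full path of every file found below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def keep_large(sizes, min_megabytes):
    """Keep the (path, size) pairs whose size exceeds min_megabytes, sorted by path."""
    threshold = min_megabytes * 1024 * 1024
    return sorted((p, s) for p, s in sizes if s is not None and s > threshold)


def find_large_files(root, min_megabytes, workers=4):
    """Measure file sizes in a pool of worker processes and list the large ones.

    The worker count and chunk size are hypothetical defaults, not tuned values.
    """
    with Pool(processes=workers) as pool:
        # imap_unordered hands paths to the workers in chunks and returns
        # results as they finish, so no worker waits for a slow neighbor.
        sizes = pool.imap_unordered(file_size, iter_files(root), chunksize=256)
        return keep_large(sizes, min_megabytes)
```

To try it, call `find_large_files("/some/root", 100)` from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by re-importing the main module. Whether the pool actually beats a single process depends on the storage: the work is dominated by filesystem calls, so the gain is likely to be largest on network or distributed file systems.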

2019.12.04  

Sometimes we all need a quick and basic setup to monitor our key services. In these cases, this super simple cheatsheet comes into play. This guideline is meant for multi-node Elasticsearch clusters but can also be used to monitor almost anything: just replace the ES exporter with whatever suits your needs and set up a nice informative dashboard. Setting up Node Exporters.

2019.11.04  

Cloudinfrastack was challenged with designing a CDN, storing 200M images, and serving them to the web with high-performance caching. In this talk we present the complete solution, consisting of the Ceph object storage gateway RadosGW, frontend image serving, and on-the-fly resizing, along with essential tips for designing, running, and optimizing such a demanding task with Ceph. Recording from the Cloud and DevOps Meetup on October 16th.

2019.11.04  

This technical talk covers aspects of designing a data-center fabric and creating it in a virtual space for testing and verification, including a roll-out based on open-source tools. Andreas is a System Engineer with more than 20 years of experience, currently working as a Pre-Sales System Engineer at Cumulus Networks. Recording from the Cloud and DevOps Meetup on October 16th.

2019.11.04  

In recent years, APIs have become an essential part of modern web applications. APIs have taken over most of the work historically done on the server side: authentication and authorization, combining data sources, calling 3rd party services, etc. This puts a lot of pressure on API stability and scalability, yet we still need a sustainable pace for feature development. Karel is a cloud enthusiast with long-term experience in architecture on both AWS and Google Cloud Platform.

2019.10.31  

When working with monitoring systems, we usually want to know about unusual behavior in our graphs, for example when a disk I/O graph briefly increases. Such brief increases are called spikes. But how can we catch spikes correctly if we use Prometheus in our infrastructure? Prometheus is a TSDB (time series database) that can export data to monitoring systems such as Grafana. Prometheus has 4 types of metrics:

2019.07.30  

When it comes to storing data in the cloud, most users and companies use Google Drive. It is a good place to save your files online and reach them from anywhere. Of course, even Google needs to sustain its services, so restrictions such as limited cloud space are common. When your business requires more, you will reach a point where you have two options.

2019.07.22  

Elasticsearch is a full-text search engine distributed for free under the Apache license. It has a RESTful interface and offers high availability, speed, and scalability. It is developed in Java, and you communicate with it through its web interface. Elasticsearch is a schema-free database: it is not necessary to define the database structure up front, because the structure is created based on the inserted data. It can be included on the list of NoSQL databases.

2019.07.15  

On hot summer days when the heat is in the air, my mind starts to think about vacation and the time passing by, but business never stops and it’s nice to have everything prepared before you leave the office. Especially when you can use the OpenStack tool called Heat. So, let’s take a look at it. Heat is a very useful orchestration tool for OpenStack users, as it provides a way to automate the creation of cloud components.

2019.06.06  

What is meant by “infrastructure as code”? Infrastructure as code is a way to maintain infrastructure through automated processes and to minimize the human effort needed to configure anything from physical bare-metal machines to services running on many virtual hosts. There are 2 ways to do this. The first way is to have automation software running on every host and pulling configuration from servers that hold all the configuration ‘recipes’. The second way is for configuration servers to push configuration to the hosts (less secure, because the configuration server needs to have access to all servers).
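The difference between the two approaches can be illustrated with a small in-memory sketch. Everything here is hypothetical: `Host`, `ConfigServer`, `pull`, and `push` are names invented for the example and do not correspond to any particular configuration management tool.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A managed machine; `applied` holds the settings configured so far."""
    name: str
    applied: dict = field(default_factory=dict)

    def apply(self, recipe):
        self.applied.update(recipe)


@dataclass
class ConfigServer:
    """Holds the configuration 'recipes', keyed by host name."""
    recipes: dict

    def recipe_for(self, host_name):
        return self.recipes.get(host_name, {})


def pull(host, server):
    """Pull model: software on the host asks the server for its own recipe."""
    host.apply(server.recipe_for(host.name))


def push(server, hosts):
    """Push model: the server must be able to reach and modify every host."""
    for host in hosts:
        host.apply(server.recipe_for(host.name))
```

In the pull variant each host only needs to reach the configuration server, while in the push variant the server is handed the full list of hosts, which mirrors the access concern mentioned above.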

2019.05.31  

Since our infrastructure is powered by OpenStack, Cinder takes care of exposing our block devices to virtual machines. And because we value open source software, we use Ceph as the storage backend (as well as LVM in certain setups). In today’s article, I will give you an overview of the Ceph architecture, pinpoint its advantages and disadvantages, and show you a demo of Ceph snapshotting to demonstrate its power and ease of administration.

2019.05.03  

Let’s say you have just finished installing Prometheus. Full of enthusiasm, you want to take the next step: create the structure of exporters and sort out exactly which services you want to harvest metrics from. If you use it on a small scale, source code control is not your biggest concern, but when you want to collect metrics from your whole infrastructure, you definitely want to know which binaries you are running.

2019.04.24  

Golang, as a very ops/admin-focused language, has a huge community and thus a lot of useful packages that can help us in everyday development for monitoring, graphing, and automation. I’m going to demonstrate a few that I use in most of my programs, either as a substitute for a default package with similar functionality or as entirely new functionality that I consider a core need of modern ops/admin tool development.

2019.03.29  

Why Prometheus? “Prometheus is an open-source system monitoring and alerting toolkit originally built at SoundCloud. Since its release in 2012, many companies and organizations have adopted Prometheus, and the project has a very active community. It is developed as an open project, independent of any company or organization.” It is based on metrics and is designed to measure and visualise the overall health and performance of services. It is similar to tools like Grafana/Graphite, but offers a more robust and comprehensive feature set, including:

2019.03.29  

Our Infrastructure: We are currently managing over 2000 virtual machines, hundreds of bare-metal servers, tens of services, and tens of user accounts. You can imagine how difficult it was to add users or change existing ones (permissions, access, ssh keys, and so on). The Pains of Locally Managed Users: Previously, our users were deployed only with Puppet, which is great. However, searching for users in different git repositories and different branches wasn’t the right way.

2019.03.29  

In our infrastructure we manage mainly Linux hosts, but there are also a few Windows servers that meet clients’ requirements. The best way to manage cloud infrastructure is through automation, using Puppet or Ansible for example. Unfortunately, it is only effective with a large number of hosts with similar features. We decided to manage all Windows hosts manually because, in this case, setting up automation (Puppet, Ansible) would be more time-consuming.

2019.03.20  

Our transition from a lower Puppet version to Puppet 5 was (and still is) somewhat tricky. For us, managing more than 2000 hosts with Puppet, this task is really time-consuming. Every host must be switched manually to make sure no critical changes are applied. Luckily, there are some steps that can simplify this task, some of which I’m going to explain. They will allow you to switch to a higher version of Puppet without losing your precious data, and to use other Puppet 5 features in the future.
