Monitoring System: What Can We Do
In the last article, “Monitoring System: How I Found Out We Need It”, I told you how we got to our first monitoring system and why you should get one. Today I will do the opposite: instead of theory, I will go straight to what our most productive “employee” can do.
The primary function, as the name implies, is monitoring. We strive to monitor everything that is important to us and helps us with the prevention or timely detection of errors. The list of checked items is on our website (only in Czech). But there is never enough self-praise 😀 so I will share it here as well:
- Servers: availability, load (CPU, HDD, RAM), update status, hardware health (CPU, RAM, PSU, HDD, RAID, fans, temperatures), event logs, antivirus updates, backup success rate, failed logins, disk space, service operations [including advanced AD DS, MS Exchange, MS SQL, DFSR], warranty expirations, certificate validity, various minor settings (VSS, FW, WPAD, UAC, clock accuracy …).
- Stations: availability, antivirus updates, update status, disk space, disk SMART status, BSOD occurrence, failed logins, presence of unauthorized SW, presence in the ERA and TeamViewer consoles, executing restarts and various minor settings (FW, UAC, RemoteUAC, local users, VSS, checked items after launch, disabled guest account …).
- Storage: availability, remaining space, FW version, availability of updates, hardware health (CPU, FANs, temperatures, RAID, PSU, RAM).
- UPS: availability, battery status (age, load, remaining runtime, temperature), power status, FW version.
- Switches, routers, Wifi APs, cameras: availability, FW version.
- Detection of unknown devices on the network: the system scans all networks (ARP scan) and compares the detected devices with the database. Unknown devices are scanned with nmap and handed over for a manual check. We thus have an overview of what is connected to the network and whether someone has connected a home PC or an unauthorized Wifi AP.
- Monitoring itself: based on the rules (our standards), the system knows what should be checked on every device and reports any inconsistency it finds. E.g. somebody connects a new NAS but forgets to set up its checks in the monitoring system.
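The unknown-device detection described in the list above boils down to comparing what a scan found with what the database already knows. Here is a minimal Python sketch of that comparison only. The function names, the data layout and the sample addresses are my assumptions for illustration, not how our system or SolarWinds RMM actually stores things, and the ARP scan itself is not shown.

```python
def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def find_unknown_devices(scanned, known_macs):
    """Return scanned devices whose MAC is not in the known-device set."""
    known = {normalize_mac(m) for m in known_macs}
    return [d for d in scanned if normalize_mac(d["mac"]) not in known]


# Hypothetical data standing in for an ARP scan result and the device database.
scanned = [
    {"ip": "192.168.1.10", "mac": "00-11-22-33-44-55"},
    {"ip": "192.168.1.57", "mac": "AA:BB:CC:DD:EE:FF"},
]
known_macs = ["00:11:22:33:44:55"]

for device in find_unknown_devices(scanned, known_macs):
    # In the workflow described above, this is where the nmap scan would follow.
    print(f"Unknown device {device['mac']} at {device['ip']}: queue for nmap scan")
```

Normalizing the MAC format first matters because different tools print the same address with different separators and letter case.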
If we had to check everything manually, it would take us about a month to go through it once. The monitoring system handles it every 5 minutes.
Why is monitoring important?
In my case, for the peace of mind that everything is all right, and for knowing that the customer is getting the best service we can provide.
You can object that if a server or router fails, you will find out even without a monitoring system, because the customer will call you immediately to say the network is not working. But many problems do not show up right away, and if they are not fixed in time, they can have serious consequences. For example:
- One disk in a RAID1 or RAID5 array fails. If it is not replaced before the next disk fails, business stops for a day and X hours of work are lost.
- Backups become corrupt and this is discovered only when something needs to be restored. In the best-case scenario, the customer will just be angry.
- A service on a server stops (for example, a measuring service) and this is discovered only at the end of the month, when the data needs to be checked but there is none.
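To make the "silent failure" idea concrete, here is a small Python sketch of one such check: flagging backup jobs that have not succeeded recently. It is only an illustration under my own assumptions (job names, the 24-hour threshold and the data layout are made up), not the check our system actually runs.

```python
from datetime import datetime, timedelta


def stale_backups(last_success, now, max_age=timedelta(hours=24)):
    """Return names of backup jobs with no successful run within max_age."""
    return sorted(
        name
        for name, finished in last_success.items()
        if finished is None or now - finished > max_age
    )


# Hypothetical last-successful-run times per backup job (None = never succeeded).
now = datetime(2018, 12, 13, 8, 0)
last_success = {
    "srv-files": datetime(2018, 12, 13, 2, 0),
    "srv-sql": datetime(2018, 12, 10, 2, 0),
    "nas-01": None,
}

for job in stale_backups(last_success, now):
    print(f"ALERT: backup job {job} has no recent successful run")
```

Nobody calls about a backup that quietly stopped three days ago, which is exactly why a check like this has to run on its own schedule.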
It's written everywhere that patching (updating) is the very foundation. I agree, and my colleagues try to keep everything up to date, even if Microsoft does not help with its occasional poorly released updates.
We used WSUS previously, but it did not fit our workflow. We have dozens of separate companies and we did not want to run our own WSUS in each of them. In addition, it is HW-intensive, does not have centralized management (over multiple instances), and has limited support for third-party application patching.
Our monitoring system has integrated patch management. Everything is in one console (learning is faster, everything is in one place, and fewer “agents” are needed on stations), we have a local cache for customers (so that 100 computers do not download the same update separately) and we also do software updates from third-party manufacturers other than Microsoft (overview of supported SW).
We are not able to automatically patch network devices just yet. In the meantime, the monitoring system gives us their inventory (device type, manufacturer, model, and FW version), and this quarter we are working on a module that will alert us to which devices need updating when new firmware comes out.
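The core of such a firmware alert could look roughly like the Python sketch below: compare each device's FW version from the inventory with the latest known version for its model. This is my assumption of the logic, not the module itself. The manufacturer and model names are invented, and it assumes purely numeric, dot-separated version strings, which real vendors do not always use.

```python
def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def devices_needing_update(inventory, latest_firmware):
    """Return (name, installed, latest) for devices running older firmware."""
    outdated = []
    for device in inventory:
        latest = latest_firmware.get((device["manufacturer"], device["model"]))
        if latest is None:
            continue  # no known latest version for this model, nothing to compare
        if parse_version(device["fw_version"]) < parse_version(latest):
            outdated.append((device["name"], device["fw_version"], latest))
    return outdated


# Hypothetical inventory rows and a hypothetical list of latest firmware versions.
inventory = [
    {"name": "sw-office", "manufacturer": "ExampleNet", "model": "S24", "fw_version": "1.9.3"},
    {"name": "rt-edge", "manufacturer": "ExampleNet", "model": "R5", "fw_version": "2.4.0"},
]
latest_firmware = {("ExampleNet", "S24"): "1.10.0", ("ExampleNet", "R5"): "2.4.0"}

for name, installed, latest in devices_needing_update(inventory, latest_firmware):
    print(f"{name}: firmware {installed} installed, {latest} available")
```

Parsing the version into numbers avoids the classic trap where "1.9.3" looks newer than "1.10.0" when compared as plain text.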
We also handle asset management through the monitoring system. The system keeps track of servers, computers, laptops (including configuration) and network devices (routers, switches, cameras, phones, Wifi APs, and other boxes). It's great to have an overview, and it does not cost us any extra time or work.
Occasionally, the customer wants to know what computer “park” they have. Or you want to find out which machines are the oldest, or you are just looking for a PC with certain software installed.
Sometimes I see some sort of inventory kept in Excel. Its disadvantage is that it is exceedingly difficult to keep up to date. In addition, I'm not sure whether the customer would like to pay an invoice with the item “device list update”. Our inventory is read regularly from the devices themselves, so it's up to date and accurate. 🙂
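Once the inventory exists as data rather than as a spreadsheet, questions like "which machines are the oldest" or "where is this software installed" become simple queries. A minimal Python sketch follows, with hypothetical field names and sample records that are not the actual export format of our system.

```python
from datetime import date


def oldest_machines(inventory, count):
    """Return the `count` machines with the earliest purchase date."""
    return sorted(inventory, key=lambda m: m["purchased"])[:count]


def machines_with_software(inventory, name):
    """Return hostnames that have the given software installed (case-insensitive)."""
    wanted = name.lower()
    return [
        m["hostname"]
        for m in inventory
        if any(s.lower() == wanted for s in m["software"])
    ]


# Hypothetical inventory records.
inventory = [
    {"hostname": "pc-01", "purchased": date(2016, 5, 1), "software": ["Office", "AutoCAD"]},
    {"hostname": "pc-02", "purchased": date(2012, 3, 15), "software": ["Office"]},
    {"hostname": "nb-03", "purchased": date(2018, 9, 30), "software": ["autocad"]},
]

print([m["hostname"] for m in oldest_machines(inventory, 1)])
print(machines_with_software(inventory, "AutoCAD"))
```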
What is our monitoring system built from?
So as not to take credit for the work of others, I have to admit that our monitoring system is built on SolarWinds RMM, which you can purchase yourself, either directly from SolarWinds or through PB Com.
We have been using the system since 2013 and we are among the 3 largest users in the Czech Republic and Slovakia. During this time, we have gained a lot of experience with the system and modified it significantly. We basically use it just as a “chassis” to run checks on the devices, and we did the rest ourselves. Now it's 40% SolarWinds and 60% PATRON-IT. 😀
If you do not use any monitoring system yet, SolarWinds RMM is surely a good start. You only need to be prepared for it being a tool rather than a solution. Installation is just the beginning. Then you need to identify what you want to monitor, how you want to monitor it, and how the system should alert you. Then you will probably find things you do not actually need to monitor. After that, you will begin to discover errors in the system (which anyone using this tool can tell you about: FB SolarWinds group – Best Practices, LinkedIn Closed Group). It takes a lot of time, but if you struggle through, you will have a great little helper.
Or, if you do not have the time to set it all up by yourself, let us know. We at PATRON-IT cooperate with internal IT departments to take care of their business environment together. We've found that we can do better together than each on our own. We have the knowledge and experience in monitoring, security, networks, and servers (it is our day-to-day activity), while internal IT knows the business processes and information systems (which we never can) and is closest to the users. Together we can create an environment where everything is in order, there are no outages, everything runs fast, and users and management are satisfied with IT.
Getting the system to its current state took us years of work and gradual adjustments. Just a long evolution, no revolution. There is a pile of ideas for further development, but it is a shame that, with the system as big as it is now, development is not as fast as it was at the beginning. Every new technology must be deployed to all customers, so it takes a while before it is implemented.
How did you like the article? Do you have any insights and ideas? Did I make a mistake, or do you disagree with me? Leave me a comment or send me an email; I would like to learn something new.
EDIT 13.12.2018: We’ve created a page where we’ve described how the monitoring system will help your work and how to test it for free. We have also prepared a live production environment video clip (so you can see how the system works in practice). Please have a look and share your feedback. 🙂