How Difficult Is It to Detect a Data Leak
Today, I would like to open a topic with you: how to detect a data leak. At the end of September, cybercriminals successfully attacked Universal Health Services (UHS). This is a big deal because UHS is not an ordinary company. It operates more than 400 medical facilities! Think of what the situation looked like here in the Czech Republic when individual hospitals were attacked, fortunately one at a time and with gaps in between. Now imagine what it must look like when over 400 such facilities are shut down at the same time.
In the meantime, there is speculation that the attack may have cost four people their lives, reportedly because doctors did not receive laboratory results in time. 🙁 You can read more in the article “UHS hospitals hit by reported country-wide Ryuk ransomware attack.”
After the attack, UHS began disconnecting its networks from the internet and turning off every device. At the same time, UHS issued a statement saying, “No patient or employee data appears to have been accessed, copied, or misused.” And this brings us to the topic of today’s article.
Officially, no data ever leaks
Whenever an attacked company comes out with a quick statement, it always surprises me how sure it is that no personal data has been affected (read, stolen, abused). It follows from the very nature of a ransomware attack that if somebody encrypted the data, the attackers (the ransomware) must have had access to it (i.e., “data has been accessed”).
Of course, I can imagine a case where only some of the servers (e.g., those running applications) are encrypted while the database servers (including the data) survive. However, that is more of an exception that proves the rule.
Moreover, in attacks that end with the complete shutdown of the encrypted company, the hackers have been on the network for some time. Sometimes it is hours, sometimes days, and sometimes weeks or months. These companies do not notice the hackers’ presence or activities. They detect them only when their systems stop working. So it is pretty strange that, after such a short time, they are so sure that no data has leaked.
A growing number of cybercriminal groups steal part of the victim’s data so that they can blackmail the victim by threatening to publish it. For example, the following groups do this: Ako, Avaddon, Clop, CryLock, DoppelPaymer, Maze, MountLocker, Nemty, Nephilim, Netwalker, Pysa/Mespinoza, Ragnar Locker, REvil, Sekhmet, Snatch, and Snake.
And last but not least, proving or finding out whether any of your data has gone missing is not easy at all. Even if you search for a week and find nothing, that is no proof that nothing leaked. Maybe you just didn’t search hard enough. 😊
Now let’s think about how we can detect data leaks in our networks.
Amount of transferred data
The first place I can look is the internet usage chart. Almost everyone has one. At worst, your ISP (Internet Service Provider) can provide it.
I would expect hackers to steal data at night, when the rest of the devices are idle, so everything should be visible in such a graph. One real graph is shown below, and the question is: do you think there was a data leak or not?
As far as I know, there was no data leak. Yet the graph shows that more data left the network during the night than during the day. 😊 This brings us to the weaknesses of this method.
The device never sleeps
Unlike humans, network devices never sleep. They are constantly exchanging data with the world. Especially during the night, many devices and programs back up to the cloud or download updates.
A medium-sized network contains hundreds of computers and mobile devices, tens of servers, printers, camera systems, VPN connections, publicly available services, sometimes some IoT, and network elements (switches, routers, Wi-Fi APs). That’s a lot of communication. Besides, there is no rule that hackers work only at night. 😉
Unless you have a tiny or very strictly configured network that you know well, such a graph will probably not tell you much. And even if it did, you still would not see what data leaked, from whom, and where to.
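To make this weakness concrete, here is a minimal sketch of what an automated reading of such a graph could look like. It assumes you already have hourly outbound byte counts for the past days. All the numbers, the threshold factor, and the function names are made up for illustration.

```python
from statistics import median

MB = 1_000_000

def hourly_baseline(history):
    """history: list of days, each a list of 24 outbound byte counts (one per hour)."""
    return [median([day[h] for day in history]) for h in range(24)]

def unusual_hours(today, baseline, factor=3.0):
    """Hours in which today's outbound volume exceeds factor x the usual volume."""
    return [h for h in range(24) if today[h] > factor * max(baseline[h], 1)]

# Hypothetical network: 200 MB leaves every hour, plus a 5 GB cloud backup at 02:00.
normal_day = [200 * MB] * 24
normal_day[2] = 5_000 * MB
history = [normal_day[:] for _ in range(14)]

today = normal_day[:]
today[2] += 5 * MB      # a 5 MB leak hidden inside the backup window
today[15] = 900 * MB    # a large but harmless upload in the afternoon

print(unusual_hours(today, hourly_baseline(history)))
```

With these made-up numbers, only the harmless afternoon upload is reported. The small leak during the backup window disappears in the noise, which is exactly the problem with this method.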
How much leaked data is a problem?
Another question to think about: which do you think is worse, a leak of 5 MB or of 2,000 MB? Yes, correct: what matters is not the size of the data but its content. A database containing the names, addresses, and telephone numbers of HIV-positive people takes up only a few megabytes. In contrast, photos from a corporate party take up hundreds of megabytes. Although the leak of the photos involves far more data, its impact will probably be far smaller.
When sensitive data can be only a few megabytes, it is practically unrealistic to detect its leakage from a transmitted data graph.
Data also leaks from the cloud
Almost all the companies we manage use some cloud storage, whether it is Dropbox, SharePoint, OneDrive, Exchange Online, or some cloud storage for backups.
If data leaks from these services, it will not be visible in the graph of transferred data. 😏 Another advantage for attackers is the download speed. These services usually run in data centers with gigabit connectivity, so downloading all of a company’s data within a few hours is no problem.
Attackers know this, so they sometimes steal data directly from the cloud. It is faster, and there is practically no risk of being discovered (they avoid the company’s on-site DLP and IDS systems).
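The “within a few hours” estimate can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation. The 1 TB data size, the link speeds, and the 80 % usable-bandwidth factor are assumptions chosen for illustration.

```python
def transfer_hours(size_gb, link_mbit_s, efficiency=0.8):
    """Hours needed to move size_gb gigabytes over a link of link_mbit_s megabits per second.

    efficiency is an assumed share of the nominal bandwidth that is really usable.
    """
    bits = size_gb * 8 * 1000**3
    seconds = bits / (link_mbit_s * 1000**2 * efficiency)
    return seconds / 3600

for label, mbit in [("office uplink, 100 Mbit/s", 100), ("data center, 1 Gbit/s", 1000)]:
    print(f"{label}: {transfer_hours(1000, mbit):.1f} h for 1 TB")
```

Under these assumptions, 1 TB takes more than a day over a 100 Mbit/s office uplink but under three hours from a data center with a gigabit link.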
Network analyzers
More and more companies are deploying a network traffic analyzer. Among other things, it stores information about all network connections. Surely a data leak can be found here. What do you think?
A network analyzer is much better to work with, and a data leak can be detected with it. However, it is still not a “click, click, done” task. The screenshot shows almost 3 million connections in our network in a single day (these are routed connections only; connections within one VLAN are not visible).
Larger companies will have even more logs. Add to that the fact that you have to go through traffic from a few days to weeks back. This is a task worthy of Cinderella. 😊 You have to go through everything, sort it, and find the needle in the haystack. For somebody who does not work with network data regularly, it feels like a punishment.
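One way to shrink the haystack is to let a script do the sorting. The following is a minimal sketch, assuming the analyzer can export connection records with a source, a destination, and an outbound byte count. The field names, the IP addresses, and the 50 MB threshold are hypothetical and not tied to any particular product.

```python
from collections import defaultdict

def summarize(flows):
    """Sum outbound bytes per (internal source, external destination) pair."""
    totals = defaultdict(int)
    for f in flows:
        totals[(f["src"], f["dst"])] += f["bytes_out"]
    return totals

def new_heavy_destinations(baseline_flows, recent_flows, min_bytes=50_000_000):
    """Pairs that sent at least min_bytes to a destination never seen in the baseline."""
    known = {f["dst"] for f in baseline_flows}
    hits = [(src, dst, total)
            for (src, dst), total in summarize(recent_flows).items()
            if dst not in known and total >= min_bytes]
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Hypothetical export: the backup target is known, 203.0.113.50 is new.
baseline = [{"src": "10.0.0.5", "dst": "198.51.100.10", "bytes_out": 10_000_000}]
recent = [
    {"src": "10.0.0.5", "dst": "198.51.100.10", "bytes_out": 900_000_000},
    {"src": "10.0.0.7", "dst": "203.0.113.50", "bytes_out": 40_000_000},
    {"src": "10.0.0.7", "dst": "203.0.113.50", "bytes_out": 30_000_000},
    {"src": "10.0.0.9", "dst": "203.0.113.99", "bytes_out": 2_000},
]

for src, dst, total in new_heavy_destinations(baseline, recent):
    print(f"{src} -> {dst}: {total / 1_000_000:.0f} MB")
```

Such a filter only narrows down where to look. A leak to a destination that is already in use (a webmail or cloud storage service, for example) passes straight through it, and so does a leak of only a few megabytes.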
Oh, those encrypted connections
As if the job were not hard enough, encrypted connections make it even more difficult. Almost all websites are already available over HTTPS. On top of that, TLS 1.3 with ESNI and DNS-over-HTTPS are on their way to us (administrators). They will cost us a lot of information about connections. The options are to actively block the new technologies (within managed networks) or to do SSL inspection.
So it seems that deciding which connections are legitimate and which are not will be even harder in the future. Not that it is easy today. For example, access to gmail.com may be legitimate, but it can also be a data leak. How do you tell the two apart? In my opinion, it is not possible from the network analyzer alone.
You still do not know what leaked
After all, even if you find some suspicious connections and know where they came from, where they went, how long they lasted, and how much data was transferred… unfortunately, you will still be missing one crucial piece of information: what actually escaped.
As with the transferred data graph, you can also simply overlook the leak. And if the data escaped directly from the cloud, you will not find anything here at all.
Logs from systems and stations
Fortunately, information systems also keep “audit logs” (i.e., who did what and when in the system). Likewise, Windows lets you audit access to files (included in the price of the OS 😉).
To be optimistic, I will start with one significant advantage. From these logs, you are finally able to find out what data could have leaked.
The disadvantage is that there will be far more of this data than in the case of network connections. Anyone who has already deployed auditing on NTFS can confirm that. You will probably not get anywhere without a tool for automatic processing.
The trouble with information system audit logs is that hackers usually steal data directly from the databases or backups. In other words, the information system’s own logging mechanism is bypassed, unless the attack goes through the (web) application itself.
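To give an idea of what automatic processing of file access logs might involve, here is a minimal sketch. It assumes the audit events have already been exported into simple records with a timestamp, a user, and a file path. The record layout, the account names, the dates, and the threshold are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def bulk_readers(events, max_files_per_hour=200):
    """Return {(user, hour): number of distinct files read} for users above the threshold."""
    seen = defaultdict(set)
    for e in events:
        hour = datetime.fromisoformat(e["time"]).replace(minute=0, second=0, microsecond=0)
        seen[(e["user"], hour)].add(e["path"])
    return {key: len(paths) for key, paths in seen.items()
            if len(paths) > max_files_per_hour}

# Hypothetical events: alice reopens the same 20 files, svc_backup touches 500 files at night.
events = [{"time": "2020-10-01T09:15:00", "user": "alice",
           "path": f"/share/hr/doc{i % 20}.docx"} for i in range(300)]
events += [{"time": f"2020-10-01T02:{i % 60:02d}:00", "user": "svc_backup",
            "path": f"/share/hr/doc{i}.docx"} for i in range(500)]

for (user, hour), count in bulk_readers(events).items():
    print(f"{user} read {count} distinct files in the hour starting {hour}")
```

Counting distinct files rather than raw events keeps a user who repeatedly opens the same documents below the threshold. Whether a hit is a backup job or a thief still has to be decided by a person.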
Ransomware destroys traces
I have one more observation for you that is easy to overlook (unless you have already encountered a similar situation). When cybercriminals attack an organization and encrypt everything, the logging systems often end up encrypted too. That is, the attackers (knowingly or not) encrypt the very systems the victim was supposed to use to investigate the incident.
So how to correctly detect a data leak?
So far, I have only written about “how it cannot be done.” But it would also be nice to talk about how to do it. The most important thing is to ask yourself in time: “How would I detect a data leak in our company?” All the necessary systems must be deployed before an incident occurs. Only then will you have data to investigate.
If you only ask yourself this question after a security incident, you will probably have no choice but to watch the ransomware group’s “shaming website” to see whether it has published any of your data. 🙁
Below you will find some of my observations and ideas for solutions. We are all in a different situation, and something different will work for each of you. If you know of anything else, I will be grateful if you share it (two heads are better than one).
Canary tokens
A simple and relatively effective way to detect data leaks or system attacks is the following: create special files (Word, PDF, images) in your systems that a regular user should never access. If one of them is opened or viewed, you receive an email telling you so. Often you also learn the IP address it was opened from (and hence the approximate location: city, country).
For more information, see Canarytokens.org or the video below.
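The principle behind such a token can be sketched in a few lines. This is not how Canarytokens.org is implemented, only an illustration of the general idea under some assumptions: the decoy document contains a unique URL, opening the document triggers a request to that URL, and your web server writes the request to a common-format access log. The host name, file name, and log lines are hypothetical.

```python
import re
import secrets

def new_token(label):
    """Create a hard-to-guess token and pair it with the decoy file it belongs to."""
    return secrets.token_urlsafe(16), label

def token_url(token, base="https://canary.example.com/t/"):
    """URL to embed in the decoy document (hypothetical host)."""
    return base + token

LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "GET (?P<path>\S+)')

def find_hits(log_lines, tokens):
    """Yield (label, ip, time) for each access-log line that requests a known token URL."""
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        token = m.group("path").rsplit("/", 1)[-1]
        if token in tokens:
            yield tokens[token], m.group("ip"), m.group("time")

token, label = new_token("salaries-2020.docx")
tokens = {token: label}
log = [
    f'198.51.100.23 - - [01/Oct/2020:02:14:07 +0000] "GET /t/{token} HTTP/1.1" 200 43',
    '203.0.113.9 - - [01/Oct/2020:02:15:00 +0000] "GET /favicon.ico HTTP/1.1" 404 0',
]

for hit in find_hits(log, tokens):
    print("Decoy opened:", hit)
```

In practice, a hit would trigger the email notification described above instead of a print statement.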
Data Loss Prevention (DLP) systems
These are systems that specialize in preventing data/information leakage. If data does leak, they help you figure out how it happened. I would like to write more here and recommend some systems, but I do not have enough practical experience yet (so far, I have only discovered which paths not to take).
Azure Information Protection
A service offered by Microsoft as part of its cloud services (Microsoft Azure Information Protection). I like this solution. The subscription costs “a few cents,” it runs in the cloud (so it deploys faster and does not require its own infrastructure), and it is integrated with the rest of the Microsoft 365 services and software (Azure Active Directory, MS Office, Windows, mobile applications). We use this service ourselves and deploy it for our customers.
What I like most about this solution is that the data itself is protected (each file is encrypted separately). The average user cannot remove this protection from a document. Even if they manage to copy all the files and take them out of the company, they will not be able to open them at home. 😊
If you have not seen this service yet, I recommend you take a detailed look.
Conclusion
In my opinion, protecting a company against data leaks is technically the hardest task of all, harder than defending it against ransomware. Data can leak in multiple ways. It does not always have to be hackers; sometimes the data is taken out by one of the employees.
At the same time, believe me, a company’s size says nothing about the level of its IT security. I have heard many stories from managers of large enterprises about how things work in their networks, or rather, do not work. 😊 The bigger the company, the more equipment, people, interests, and technologies there are, and the harder it is to keep everything secure.
I wish you safe networks and no incidents.
Martin