Tuesday, November 2, 2010

Logging: Do they think?

Last week I had to track down the cause of an obscure entry in the ESXi vmkernel log. It looked something like this:

Nov  2 11:34:36 vmkernel: 16:21:06:29.145 cpu2:2446675)WARNING: UserObj: 569: Failed to crossdup fd 6, fs: def5 oid: 19000000030000003 type CHAR: Busy

Do I need to say that I found nothing?

This is a very nice entry: try your best google-fu on it and you'll end up with nothing. And it is only a single example. There is a multitude of such log entries, not only from VMware but from almost every software vendor.

So, what exactly is the problem, you may ask? Well, the problem is that the message contains many variable parts that depend on a particular installation! If you paste the whole string into Google search, it will find nothing! This is a problem not only for novice users but also for experienced ones. If an error message consists of a single word plus a PID, a pathname, maybe an IP address or something like that, it is completely useless to google for, because once you remove all the variable parts you end up with nothing.
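To make the point concrete, here is a minimal Python sketch. The log message in it is invented for illustration (it is not taken from any real product), and the rule "anything containing a digit or a slash is variable" is my own crude assumption about what changes from one installation to another.

```python
def fixed_words(line: str) -> list[str]:
    """Return the tokens of a log line that look installation-independent.

    Assumption (crude heuristic): a token that contains a digit or a '/'
    is a variable part such as a PID, an IP address, or a pathname.
    """
    return [
        token
        for token in line.split()
        if not any(ch.isdigit() for ch in token) and "/" not in token
    ]


# Hypothetical message: one word, a PID, a pathname and an IP address.
message = "error: 4312 /var/run/app.pid 192.168.1.10"
remaining = fixed_words(message)
print(remaining)
```

All that is left to search for is the single word `error:`, which matches practically everything and therefore helps with nothing.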

Now, administrators/users could agree to always use some standard values (e.g. the address 1.1.1.1), but this is not a solution either. First, there would have to be many predefined parameters, and second, no one could keep up with them! And yes, on top of that, people are very good at making exceptions (i.e. being ignorant).

So, what is the conclusion?

Well, the conclusion is that those who produce applications have to think about logging, and have to accept that today each log entry will be googled for! Furthermore, this is in their best interest. Because, if users/administrators quickly find the solution to some problem, they will be happier (and think better of your software) and at the same time the pressure on the helpdesk will be lower. Both translate into money gained!

And what is the recommendation?

Well, I recommend that software vendors carefully design log messages so that they contain some fixed and unique string that is easy to find with Google. Variable parts should be kept separate from it.
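One possible shape for such a message, as a minimal Python sketch: a fixed prefix that can be searched for verbatim, followed by the variable parts as key=value pairs after a separator. The event ID `USEROBJ-0569`, the ` | ` separator and the function names are my own inventions for illustration; this is not how VMware or any other vendor actually formats its logs.

```python
def format_event(event_id: str, text: str, **fields) -> str:
    """Build a log line: fixed searchable prefix, then variable key=value pairs."""
    fixed = f"[{event_id}] {text}"
    if not fields:
        return fixed
    # Sort the keys so the same event always produces the same field order.
    variable = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    return f"{fixed} | {variable}"


def searchable_part(line: str) -> str:
    """Return the part of the line that is identical on every installation."""
    return line.split(" | ", 1)[0]


# Hypothetical event ID; the field values come from the vmkernel entry above.
line = format_event("USEROBJ-0569", "Failed to crossdup fd: Busy",
                    fd=6, fs="def5", type="CHAR")
print(line)
print(searchable_part(line))
```

The user pastes only the part before the separator into the search box, and a log processing tool can match on the same fixed prefix without guessing which tokens are variable.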

Finally, I have to say that I particularly hate log4j logs. It's not that they are bad; they are wonderful, but only if you are a developer! Again, if you try to google for them, or to process them in OSSEC, you'll have big problems.

Friday, June 11, 2010

Software patching strategy...

There's much controversy about a security flaw found in Windows XP and published by a Google researcher. You can read about it across the Internet, but here's one story. The short version is this: after the researcher found the flaw, he gave Microsoft five days to react. But Microsoft has something called 'Patch Tuesday', which means it delivers patches on a fixed schedule, the second Tuesday of each month. I agree that this brings predictability to IT departments, but this is only true if the number of patches is high. So in the end, Microsoft didn't react in the given time frame and the researcher published the exploit. As I said in one comment, I don't believe that Microsoft failed to react on purpose. It is more likely that they didn't know how to react and/or that their procedures are not up to the task. Some are criticizing the researcher's approach, while others are not. And it is true that this put some IT managers, CISOs and who knows who else into a very dangerous situation. But then, something like this should be made all but impossible by a responsible company that produces such a critical piece of software as an OS.

Contrast this researcher's behavior with that of some cracker who happened to find the same flaw. He could either sell it or develop an exploit himself. In either case the exploit would be used at some point and the software producer wouldn't be notified about the flaw. By the time it becomes clear that there is a flaw, five days to react is a huge time frame!

What is my point? The point is that at present, and even more so in the future, it is a luxury to take more than a day to react, maybe even more than a few hours. Patches and workarounds have to be available immediately. Of course, someone could object that with complex software it is hard to devise a patch in such a short time frame. But, actually, I think it is possible with a few changes in how software is developed.

First, software has to be clearly modularized, and it must be possible to disable or enable each module. Obviously, when a flaw is found in a module, the first reaction would be to disable that module. Of course, there are two problems with this. The first one is that some modules are absolutely critical for a functional system. For example, in the Linux kernel it is not possible to disable kernel locks because the kernel cannot function without them. But such critical software should be kept small and controllable so that it can be fixed in a matter of hours.

The second problem is that some modules are not critical for the system itself, but are critical for the environment in which they work. Take, for example, the driver for a network card. It can be disabled, and the system itself will still be functional, but it will also be useless. This case should be handled in such a way that each user can take predefined functionality definitions (profiles) and alter them or create new ones. To return to the network card driver: it is common wisdom that no system today can work without a network card. So, some profile will define the network card as critical, and as such it will not be disabled without explicit confirmation from the administrator. But such modules have to be developed with principles somewhere in between those for the absolutely necessary modules and those for modules that aren't necessary at all (like the Help system that is the cause of the problem with Windows XP that started this blog entry).
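The profile idea can be sketched in a few lines of Python. Everything here (the class names, the `confirmed` flag, the module names) is a hypothetical illustration of the principle, not a description of how any existing operating system manages its modules.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A named set of modules that this environment considers critical."""
    name: str
    critical: set = field(default_factory=set)


class ModuleRegistry:
    """Tracks which modules are enabled and enforces the active profile."""

    def __init__(self, profile: Profile):
        self.profile = profile
        self.enabled = {}  # module name -> True/False

    def register(self, name: str) -> None:
        self.enabled[name] = True

    def disable(self, name: str, confirmed: bool = False) -> bool:
        """Disable a module; critical ones need explicit administrator confirmation."""
        if name not in self.enabled:
            raise KeyError(f"unknown module: {name}")
        if name in self.profile.critical and not confirmed:
            return False  # refused: the profile marks this module as critical
        self.enabled[name] = False
        return True


# Hypothetical profile for a networked server.
server = Profile("networked-server", critical={"network_driver"})
registry = ModuleRegistry(server)
registry.register("help_system")
registry.register("network_driver")

help_disabled = registry.disable("help_system")       # not critical: disabled at once
net_refused = registry.disable("network_driver")      # critical: refused
net_forced = registry.disable("network_driver", confirmed=True)
```

With something like this in place, the immediate reaction to a flaw in a non-critical module is a single call, while a flaw in a module the profile marks as critical forces a conscious decision by the administrator.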

About Me

scientist, consultant, security specialist, networking guy, system administrator, philosopher ;)