Forensic Case Studies - Carving and Parsing Solaris WTMPX Files
A few weeks back I was analyzing a raw partition image from a Solaris 10 (SPARC) system, trying to determine from the wtmpx files who had
logged into the system, from which remote IP addresses, and when. To be more precise, I was tracking the nagios account that had been used to
compromise this machine. The problem was that the file system had been completely wiped - all files were gone.
Fortunately, this had been done at the file system level with the rm -rf / command, which only unlinks files and does not overwrite their data blocks.
This means the data should still be on disk until those blocks are reused. But how do we recover it?
Solaris wtmpx file format
Solaris records logins in /var/adm/wtmpx, which is similar in purpose to /var/log/wtmp on Linux but unfortunately has an incompatible format.
In addition, this system is based on the SPARC architecture, which is big-endian, so in contrast to Intel x86
(little-endian) the bytes of each multi-byte integer are stored in the opposite order. This means we cannot use native Linux tools such as
last to parse the contents of a wtmpx file from Solaris. In order to recover it we need to know the exact record structure. The easiest way to
understand the format is to look at the source code of programs that read and write wtmpx files. Since the target system is Solaris, the
format is very likely to be found in the /usr/include/utmpx.h C include file.
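To make the byte-order issue concrete, here is a minimal sketch using Python's struct module. The four sample bytes are made up for illustration; the point is only that the same bytes decode to different integers depending on the byte order assumed.

```python
import struct

# Four raw bytes as they might appear on disk for a 32-bit integer (made-up sample).
raw = bytes([0x00, 0x00, 0x01, 0x02])

# '>' selects big-endian (SPARC), '<' selects little-endian (x86).
big = struct.unpack('>I', raw)[0]
little = struct.unpack('<I', raw)[0]

print(big)     # 258
print(little)  # 33619968
```

A tool that assumes the wrong byte order will therefore report nonsensical PIDs, types and timestamps even if it gets the field offsets right.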
Here is an excerpt from Solaris 10’s utmpx.h:
struct timeval32
{
    int tv_sec, tv_usec;
};
struct futmpx {
    char ut_user[32];        /* user login name */
    char ut_id[4];           /* inittab id */
    char ut_line[32];        /* device name (console, lnxx) */
    pid32_t ut_pid;          /* process id */
    int16_t ut_type;         /* type of entry */
    struct {
        int16_t e_termination; /* process termination status */
        int16_t e_exit;        /* process exit status */
    } ut_exit;               /* exit status of a process */
    struct timeval32 ut_tv;  /* time entry was made */
    int32_t ut_session;      /* session ID, user for windowing */
    int32_t pad[5];          /* reserved for future use */
    int16_t ut_syslen;       /* significant length of ut_host */
    char ut_host[257];       /* remote host name */
};
Data carving
Each wtmpx entry is exactly 372 bytes long (the structure is padded to a multiple of 4 bytes!) and starts with a username stored in a
32-byte, NUL-padded field. Based on this information we can create a pattern for scalpel, a well-known file carving utility. In this case, we
want scalpel to scan for a specified string of bytes (the header) and then save a 372-byte chunk of data starting at each match. The header
used here is the account name nagios padded with NUL bytes to 32 bytes, followed by \x74\x73 (ASCII "ts"), the first two bytes of the ut_id
field. If you want to learn more about the configuration file syntax, I encourage you to review the manual page or the configuration file
itself, where you will find many examples.
wtmpx y 372 nagios\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x74\x73
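The same header-plus-fixed-length idea can be sketched in a few lines of Python, which may help if scalpel is not at hand. This is an illustration under assumptions, not what scalpel does internally: build_header and carve are hypothetical helper names, and the "image" below is a tiny synthetic byte string rather than a real partition.

```python
RECORD_SIZE = 372  # size of one futmpx record, as stated above

def build_header(user, id_prefix=b"ts"):
    """Return ut_user padded with NULs to 32 bytes, plus the first bytes of ut_id."""
    name = user.encode("ascii")
    if len(name) > 32:
        raise ValueError("ut_user holds at most 32 bytes")
    return name.ljust(32, b"\x00") + id_prefix

def carve(image, header, size=RECORD_SIZE):
    """Yield (offset, record) for every occurrence of header in image."""
    pos = image.find(header)
    while pos != -1:
        record = image[pos:pos + size]
        if len(record) == size:  # skip a match too close to the end of the image
            yield pos, record
        pos = image.find(header, pos + 1)

# Tiny synthetic "image": junk, one record-sized block, more junk.
header = build_header("nagios")
fake_record = header + b"\x00" * (RECORD_SIZE - len(header))
image = b"\xff" * 100 + fake_record + b"\xff" * 50

hits = list(carve(image, header))
print(len(header), [offset for offset, _ in hits])
```

For a real multi-gigabyte image, the bytes object would have to be replaced by something like a memory-mapped file (mmap objects also provide find() and slicing), since reading the whole image into memory is rarely practical.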
Let’s run it on the partition image and see the results!
elceef@cerebellum:~$ scalpel -o scalpel_out/ -O dd_nj090240-var
Scalpel version 1.60
Written by Golden G. Richard III, based on Foremost 0.69.
Opening target "/home/elceef/dd_nj090240-var"
Image file pass 1/2.
dd_nj090240-var: 100.0% |***************************************************************************************************| 20.0 GB 00:00 ETA
Allocating work queues...
Work queues allocation complete. Building carve lists...
Carve lists built. Workload:
wtmpx with header
"\x6e\x61\x67\x69\x6f\x73\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x74\x73" and
footer "" --> 8 files
Carving files from image.
Image file pass 2/2.
dd_nj090240-var: 100.0% |***************************************************************************************************| 20.0 GB 00:00 ETA
Processing of image file complete. Cleaning up...
Done.
Scalpel is done, files carved = 8, elapsed = 67 seconds.
After a minute the tool carved eight files out of the image.
elceef@cerebellum:~/scalpel_out$ ll
total 44
drwxrwxr-x 2 elceef elceef 4096 aug 16 05:39 ./
drwxrwxr-x 4 elceef elceef 4096 aug 16 05:36 ../
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000000.wtmpx
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000001.wtmpx
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000002.wtmpx
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000003.wtmpx
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000004.wtmpx
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000005.wtmpx
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000006.wtmpx
-rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000007.wtmpx
-rw-rw-r-- 1 elceef elceef 963 aug 16 05:39 audit.txt
The entries look valid. We can easily spot the account name, the console and the source IP address each session originated from. We are
still missing other important pieces of the puzzle: the timestamp and the event type. We need to write a parser that extracts the details of
each event (entry), similar to what the last command does.
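One quick way to eyeball a carved entry is to pull out its runs of printable characters, much like the strings utility. The sketch below assumes nothing about the record layout; printable_strings is a hypothetical helper, and the record is synthetic with made-up values (ts/1, pts/1, 192.0.2.10).

```python
import re

def printable_strings(record, min_len=3):
    """Return runs of printable ASCII of at least min_len characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(record)]

# Synthetic 372-byte record: text fields NUL-padded, binary fields zeroed.
record = (
    b"nagios".ljust(32, b"\x00")         # ut_user
    + b"ts/1"                            # ut_id (fills all 4 bytes)
    + b"pts/1".ljust(32, b"\x00")        # ut_line
    + b"\x00" * 46                       # binary fields (pid, type, time, ...)
    + b"192.0.2.10".ljust(257, b"\x00")  # ut_host
    + b"\x00"                            # trailing pad byte
)

print(printable_strings(record))
```

Because ut_id can fill all four of its bytes, it may run straight into ut_line with no NUL in between. That is one more reason a real parser has to cut fields by offset rather than by string boundaries.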
Parsing
I created a quick and dirty Python script that relies mostly on the struct module to handle binary data. This module has a function called
unpack() designed to parse structured binary data according to a given format. Format strings are used to specify the
expected layout when unpacking data. They are built up from format characters which specify the type and size of the data being unpacked. I
strongly encourage you to review the documentation for the struct module first in order to better understand
the meaning of the format characters.
It is worth mentioning that I had to use pad bytes in the format string in order to maintain proper alignment for the futmpx struct.
Don’t be surprised if summing the field sizes does not match sizeof(struct futmpx): the compiler inserts padding so that members are properly
aligned in memory.
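Where the padding lands can be worked out with struct.calcsize(). The sketch below is my reading of the layout, assuming the usual 4-byte alignment for int members; it agrees with the 372-byte record size, but the field table itself is an illustration rather than something taken from the header file.

```python
import struct

# Field-by-field formats for struct futmpx; 'x' marks a pad byte.
FIELDS = [
    ("ut_user", "32s"),
    ("ut_id", "4s"),
    ("ut_line", "32s"),
    ("ut_pid", "i"),
    ("ut_type", "h"),
    ("e_termination", "h"),
    ("e_exit", "h"),
    ("(alignment pad)", "2x"),  # brings ut_tv to a 4-byte boundary (78 -> 80)
    ("tv_sec", "i"),
    ("tv_usec", "i"),
    ("ut_session", "i"),
    ("pad", "5i"),
    ("ut_syslen", "h"),
    ("ut_host", "257s"),
    ("(trailing pad)", "x"),    # rounds the total up from 371 to 372
]

offset = 0
for name, fmt in FIELDS:
    # '>' means big-endian with no implicit alignment, so pads must be explicit.
    size = struct.calcsize(">" + fmt)
    print(f"{offset:4d}  {size:4d}  {name}")
    offset += size

print("total:", offset)
```

Without the two pad entries the sizes add up to 369 bytes, which is exactly the mismatch with sizeof(struct futmpx) described above.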
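#!/usr/bin/env python
import struct
import sys
import datetime

# Size of one futmpx record in bytes, including alignment padding
RECORD_SIZE = 372

# Big-endian layout: ut_user, ut_id, ut_line, ut_pid, ut_type, e_termination,
# e_exit, 2 pad bytes, tv_sec, tv_usec, ut_session, pad[5], ut_syslen,
# ut_host, 1 trailing pad byte
FUTMPX = struct.Struct('>32s 4s 32s i H H H b b I I I 5I H 257s b')

def type_name(x):
    # Map the numeric ut_type to its symbolic name
    return {
        0: 'EMPTY',
        1: 'RUN_LVL',
        2: 'BOOT_TIME',
        3: 'NEW_TIME',
        4: 'OLD_TIME',
        5: 'INIT_PROCESS',
        6: 'LOGIN_PROCESS',
        7: 'USER_PROCESS',
        8: 'DEAD_PROCESS',
        9: 'ACCOUNTING'
    }.get(x, 'UNKNOWN')

def text(raw):
    # Cut a fixed-width field at the first NUL byte and decode it
    return raw.split(b'\x00', 1)[0].decode('ascii', 'replace')

with open(sys.argv[1], 'rb') as data:
    while True:
        chunk = data.read(RECORD_SIZE)
        if len(chunk) < RECORD_SIZE:
            # End of file, or a truncated trailing record
            break
        unpacked = FUTMPX.unpack(chunk)
        #TODO: timezone
        timestamp = datetime.datetime.fromtimestamp(unpacked[9]).strftime('%Y-%m-%d %H:%M:%S')
        print('\t'.join([text(unpacked[0]), str(unpacked[3]), text(unpacked[2]), timestamp, text(unpacked[18]), type_name(unpacked[4])]))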
Now it’s time to see this code in action. My script takes only a single file as an argument so I use the following command line kung-fu to
parse all files (in this case single wtmpx entries) at once and sort by the timestamp:
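Works like a charm! But there is still room for improvement. This code does not convert the time to the correct time zone:
datetime.fromtimestamp() renders it in the local time zone of the analysis machine, not that of the compromised system. Take this
into account before building a timeline.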
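One way to address the time zone issue is to treat tv_sec as seconds since the Unix epoch in UTC and produce timezone-aware datetimes, converting to the source system's zone only for display. The following is a minimal sketch: SOURCE_TZ (UTC+1) is a made-up offset, entry_time and make_record are hypothetical helpers, and the records are synthetic.

```python
import datetime
import struct

# Same layout as the parser above, with pad bytes written as 'x' so they are skipped.
FUTMPX = struct.Struct(">32s 4s 32s i h h h 2x i i i 20x h 257s x")

# Hypothetical fixed offset of the compromised system's time zone (UTC+1 here).
SOURCE_TZ = datetime.timezone(datetime.timedelta(hours=1))

def entry_time(record, tz=datetime.timezone.utc):
    """Return the record's tv_sec as a timezone-aware datetime."""
    tv_sec = FUTMPX.unpack(record)[7]
    return datetime.datetime.fromtimestamp(tv_sec, tz=tz)

def make_record(user, line, host, tv_sec):
    """Build a synthetic USER_PROCESS record for demonstration only."""
    return FUTMPX.pack(user, b"ts/1", line, 1234, 7, 0, 0,
                       tv_sec, 0, 0, len(host), host)

records = [
    make_record(b"nagios", b"pts/1", b"192.0.2.10", 1300000000),
    make_record(b"nagios", b"pts/2", b"192.0.2.20", 1299999000),
]

# Sort by the absolute instant, then show both UTC and source-zone renderings.
for record in sorted(records, key=entry_time):
    print(entry_time(record).isoformat(),
          entry_time(record, SOURCE_TZ).isoformat())
```

Keeping the timeline in UTC and attaching the offset explicitly avoids silently mixing the analyst's local time with that of the examined machine.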
Happy end
Solving this case would not have been possible without this technique. The compromised system was configured to keep track of only
unsuccessful authentication attempts, leaving the wtmpx records as the only reliable source of information about the origin of the attack. The
person responsible for the destruction of this system was too confident - deleting all files is not enough to cover all tracks. Now the
personal details of this individual are known and the case is closed. Cheers!