Last year we posted an article about graphical representations of malware, in
which we commented that it's possible to automatically identify and classify
malware into a family based on its graphical structure representation. This
representation is based on the relationship between function calls in the
executable.
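
To make the idea concrete, here is a minimal Python sketch of such a representation. It is only an illustration, not our production code: the function names and the list of (caller, callee) pairs are made up, and in practice the pairs would come from a disassembler.

```python
def build_call_graph(calls):
    """Build an adjacency map {function: set of functions it calls}.

    `calls` is an iterable of (caller, callee) pairs. Functions that are
    only ever called still appear as nodes, with an empty set of callees.
    """
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())
    return graph


# Hypothetical function names, for illustration only.
example_calls = [
    ("start", "decrypt_config"),
    ("start", "connect_server"),
    ("decrypt_config", "xor_buffer"),
    ("connect_server", "send_data"),
]
call_graph = build_call_graph(example_calls)
```

Each key is a function and each value is the set of functions it calls, which is all the structure a function-call graph needs.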

These relationships create a graph of the internal structure of the
executable. These graphs are very similar among samples of the same family or
among samples which share the same source code. There are several publications
about this technique (Ero Carrera & Gergely Erdélyi [VB2004]), and all of us
have heard about the Sabre Security VxClass project, which is a system to
automatically unpack and classify a binary into a family.
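
As a rough illustration of why structurally similar graphs can be matched even when function names and addresses differ, the sketch below compares two call graphs by the multiset of (in-degree, out-degree) pairs of their nodes. This is a deliberately simple stand-in chosen for the example; it is an assumption, not the comparison used by the systems mentioned above, and all function names are hypothetical.

```python
from collections import Counter


def degree_signature(edges):
    """Return a multiset of (in_degree, out_degree) pairs, one per node.

    The signature ignores function names, so two binaries with the same
    call structure get the same signature regardless of symbol names.
    """
    unique_edges = set(edges)
    in_deg, out_deg = Counter(), Counter()
    nodes = set()
    for caller, callee in unique_edges:
        out_deg[caller] += 1
        in_deg[callee] += 1
        nodes.update((caller, callee))
    return Counter((in_deg[n], out_deg[n]) for n in nodes)


def signature_similarity(sig_a, sig_b):
    """Jaccard similarity of two multisets, between 0.0 and 1.0."""
    intersection = sum((sig_a & sig_b).values())
    union = sum((sig_a | sig_b).values())
    return intersection / union if union else 1.0


# Two hypothetical samples with the same shape but different names.
variant_1 = [("f1", "f2"), ("f1", "f3"), ("f3", "f4")]
variant_2 = [("sub_A", "sub_B"), ("sub_A", "sub_C"), ("sub_C", "sub_D")]
unrelated = [("x", "y")]

same_family_score = signature_similarity(
    degree_signature(variant_1), degree_signature(variant_2))
unrelated_score = signature_similarity(
    degree_signature(variant_1), degree_signature(unrelated))
```

A degree signature is far too coarse to classify real malware on its own, but it shows the principle: the comparison works on the shape of the graph rather than on names.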

PandaLabs is 'two or three steps ahead' too, and we have developed our own
system to automatically identify and classify the samples we receive daily. Of
course, this system only works with unpacked samples, which is why we use it
together with our generic unpacker engine. We have made a flash video [14 MB]
to show you how this system works. Basically, the steps are:

  • Unpack the sample (the system only works with unpacked binaries)
  • Drag and drop it into the client application
  • The client application sends it to the graph server
  • The server analyzes it with IDA and uses several Python scripts to extract:
    • Graph of function calls
    • Control flow graph (CFG) of functions
    • Entropy
    • CRC32 and custom CRC of functions
  • Preselect samples from the database, applying several filters: entropy,
    compiler, file size, … Then the resulting ones are compared with our
    sample.
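
Two of the extracted features, entropy and CRC32, are easy to sketch with the Python standard library. The snippet below is a minimal illustration that works on raw bytes. It assumes the function bytes have already been pulled out of the binary, and it does not attempt to reproduce the custom CRC, which is not described here.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def function_crc32(code: bytes) -> int:
    """Standard CRC32 of a function's bytes, as an unsigned 32-bit value."""
    return zlib.crc32(code) & 0xFFFFFFFF


# Illustrative inputs: constant bytes have no entropy, while a buffer
# containing every byte value exactly once has the maximum of 8 bits.
low_entropy = shannon_entropy(b"\x00" * 64)
high_entropy = shannon_entropy(bytes(range(256)))
checksum = function_crc32(b"123456789")
```

Values close to 8 bits per byte usually indicate compressed or encrypted data, which is one reason entropy is a useful filter.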

This data is used to compare the sample with our entire graph database (so
far, we have analyzed and stored 185,000 samples in it).
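
With a database of this size, the preselection step matters, because only a handful of stored samples should reach the expensive graph comparison. The sketch below shows one way such a filter could look. The record layout, the compiler labels and both thresholds are assumptions made for the example, not the values used by our system.

```python
from dataclasses import dataclass


@dataclass
class SampleRecord:
    """Hypothetical per-sample metadata kept next to the stored graphs."""
    name: str
    entropy: float
    compiler: str
    filesize: int


def preselect(sample, database, entropy_tol=0.5, min_size_ratio=0.5):
    """Keep only records that are cheap-to-check matches for `sample`.

    A record passes if it has the same compiler, an entropy within
    `entropy_tol` of the sample, and a file size whose ratio to the
    sample's size (smaller / larger) is at least `min_size_ratio`.
    """
    candidates = []
    for record in database:
        if record.compiler != sample.compiler:
            continue
        if abs(record.entropy - sample.entropy) > entropy_tol:
            continue
        smaller, larger = sorted((record.filesize, sample.filesize))
        if larger and smaller / larger < min_size_ratio:
            continue
        candidates.append(record)
    return candidates


# Made-up records, for illustration only.
database = [
    SampleRecord("known_a", 6.1, "msvc", 40_960),
    SampleRecord("known_b", 7.9, "msvc", 41_000),
    SampleRecord("known_c", 6.0, "delphi", 40_000),
    SampleRecord("known_d", 6.3, "msvc", 500_000),
]
incoming = SampleRecord("incoming", 6.2, "msvc", 45_056)
shortlist = [record.name for record in preselect(incoming, database)]
```

Only the records on the shortlist would then go through the full graph comparison against the incoming sample.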