Learning detector of malicious network traffic from weak labels

Abstract. We address the problem of learning a detector of malicious
behavior in network traffic. The malicious behavior is detected based on
the analysis of network proxy logs that capture malware communication
between client and server computers. The conceptual problem in using
standard supervised learning methods is the lack of a sufficiently representative
training set containing examples of malicious and legitimate
communication. Annotation of individual proxy logs is an expensive process
involving security experts and does not scale with constantly evolving
malware. However, weak supervision can be achieved on the level of
properly defined bags of proxy logs by leveraging internet domain blacklists,
security reports, and sandboxing analysis. We demonstrate that
an accurate detector can be obtained from the collected security intelligence
data by using a Multiple Instance Learning algorithm tailored
to the Neyman-Pearson problem. We provide a thorough experimental
evaluation on a large corpus of network communications collected from
various company network environments.
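
To make the two ideas in the abstract concrete, here is a minimal sketch of bag-level scoring combined with a Neyman-Pearson style threshold. It is not the algorithm from the paper: it assumes that some instance-level scorer has already assigned a suspiciousness score to every proxy log, uses the common Multiple Instance Learning convention that a bag is as suspicious as its most suspicious instance, and then picks a threshold that keeps the false positive rate on negative bags below a chosen bound. The names `bag_score`, `np_threshold`, and `max_fpr`, as well as all the numbers, are hypothetical and chosen only for illustration.

```python
import math

def bag_score(instance_scores):
    """Score a bag by its most suspicious instance (a common MIL assumption)."""
    return max(instance_scores)

def np_threshold(negative_bag_scores, max_fpr):
    """Return a threshold whose false positive rate on negative bags is <= max_fpr.

    A bag is flagged when its score is strictly greater than the threshold.
    """
    ranked = sorted(negative_bag_scores, reverse=True)
    # Number of negative bags we are allowed to flag by mistake.
    allowed = math.floor(max_fpr * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # every negative bag may be flagged
    # With a strict comparison, at most `allowed` negatives exceed this value.
    return ranked[allowed]

# Made-up per-log scores grouped into bags (illustrative values only).
negative_bags = [[0.1, 0.2], [0.05, 0.3], [0.4, 0.1], [0.2, 0.6]]
positive_bags = [[0.1, 0.9], [0.5, 0.2]]

neg_scores = [bag_score(b) for b in negative_bags]
pos_scores = [bag_score(b) for b in positive_bags]

threshold = np_threshold(neg_scores, max_fpr=0.25)
false_positive_rate = sum(s > threshold for s in neg_scores) / len(neg_scores)
detected = sum(s > threshold for s in pos_scores)

print(threshold, false_positive_rate, detected)
```

The point of the sketch is the asymmetry: the false positive rate is treated as a hard constraint, and detections are maximized only within that budget, which matches how a detector would be tuned when false alarms are costly for analysts.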

Source: http://cmp.felk.cvut.cz/ftp/articles/franc/Franc-Malware-ECML2015.pdf

