Large Scale Graph Mining For Web Reputation Inference
Yonghong Huang, Paula Greve

Abstract:
The explosion in the number of devices and users on the Internet produces massive amounts of data and information. This poses the most complex security challenges we have ever faced. The detection of malicious domains and Internet Protocol (IP) addresses is an active topic in cybersecurity. We present a scalable and effective graph inference system for detecting malicious domains and IP addresses, with the goal of protecting Internet users from network threats. Based on the loopy belief propagation algorithm, the system infers a reputation for every domain and IP address and flags those with a high reputation score as malicious. We evaluated the system on a 75-million-node graph constructed from a 500-gigabyte dataset. The system attains an area under the receiver operating characteristic curve (AUC) of 86% for inferring domain reputations and 87% for inferring IP reputations. We demonstrate on McAfee's data that the graph-based solution provides rapid assessment of safe or risky sites. It provides an automatic tool for web reputation inference in the field and serves as an assisting tool for "first pass" classification and triaging.
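
To make the inference step concrete, the following is a minimal sketch of loopy belief propagation on a small domain-IP graph. It is not the system described above: the binary state space (benign or malicious), the homophily edge potential, the parameter `eps`, the function name `loopy_bp`, and the example hostnames and addresses are all assumptions chosen for illustration.

```python
from collections import defaultdict


def loopy_bp(edges, priors, eps=0.1, iters=20):
    """Return an estimated P(malicious) per node for a binary pairwise model.

    edges:  iterable of undirected (u, v) pairs, e.g. domain-to-IP resolutions.
    priors: dict mapping node -> prior P(malicious); missing nodes get 0.5.
    eps:    assumed homophily strength, 0 < eps < 0.5.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Node potentials over the two states (index 0 = benign, 1 = malicious).
    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in nbrs}
    # Edge potential: connected nodes tend to share the same state.
    psi = ((0.5 + eps, 0.5 - eps), (0.5 - eps, 0.5 + eps))

    # msgs[(i, j)] is the message from i to j; start from uniform messages.
    msgs = {(i, j): (0.5, 0.5) for i in nbrs for j in nbrs[i]}

    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = [0.0, 0.0]
            for xi in (0, 1):
                # Prior of i times all incoming messages except the one from j.
                prod = phi[i][xi]
                for k in nbrs[i]:
                    if k != j:
                        prod *= msgs[(k, i)][xi]
                for xj in (0, 1):
                    out[xj] += prod * psi[xi][xj]
            z = out[0] + out[1]
            new[(i, j)] = (out[0] / z, out[1] / z)  # normalize for stability
        msgs = new  # synchronous update of all messages

    beliefs = {}
    for i in nbrs:
        b = [phi[i][0], phi[i][1]]
        for k in nbrs[i]:
            b[0] *= msgs[(k, i)][0]
            b[1] *= msgs[(k, i)][1]
        beliefs[i] = b[1] / (b[0] + b[1])
    return beliefs


# Hypothetical toy graph: two labeled domains, two unlabeled ones.
edges = [
    ("bad.example", "10.0.0.1"),
    ("unknown-a.example", "10.0.0.1"),
    ("good.example", "10.0.0.2"),
    ("unknown-b.example", "10.0.0.2"),
]
priors = {"bad.example": 0.95, "good.example": 0.05}
scores = loopy_bp(edges, priors)
```

In this toy graph, the unlabeled domain that shares an IP address with a known-bad domain ends up with a score above 0.5, and the one that shares an IP address with a known-good domain ends up below 0.5. A graph with tens of millions of nodes would need a distributed or out-of-core implementation rather than in-memory dictionaries.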