A MACHINE-LEARNING TOOLKIT FOR LARGE-SCALE ECRIME FORENSICS.
DefPloreX, uses a combination of machine-learning and visualization techniques to practically turn original unstructured data into meaningful high-level descriptions. Real-time information on incidents, breaches, attacks and vulnerabilities, for example, are efficiently processed and condensed into objects that are easily browsable — making them suitable for efficient large-scale eCrime forensics and investigations.
DefPloreX ingests plain CSV inputs about web incidents to analyze, explores their resources with headless browsers, extracts features from deface pages, and uploads the resulting data to an Elastic index. Distributed headless browsers are coordinated via Celery. Using Python Panda, NumPy and PyTables, DefPloreX provides offline “views” of the data, allowing easy pivoting and exploration.
This toolkit automatically groups similar deface pages in clusters and organizes web incidents in campaigns. Requiring only one pass, clustering is intrinsically parallel and not memory bound. DefPloreX offers text- and web-based UIs, which can be queried using a simple language for investigations and forensics.