Emergent Mind

Abstract

Modern network sensors continuously produce enormous quantities of raw data that are beyond the capacity of human analysts. Cross-correlating network sensors compounds this challenge by enriching every network event with additional metadata. These large volumes of enriched network data present an opportunity to statistically characterize network traffic and quickly answer a key question: "What are the primary cyber characteristics of my network data?" The Python GraphBLAS and PyD4M analysis frameworks enable anonymized statistical analysis to be performed quickly and efficiently on very large network data sets. This approach is tested using billions of anonymized network data samples from the largest Internet observatory (the CAIDA Telescope) and tens of millions of anonymized records from the largest commercially available background enrichment capability (GreyNoise). The analysis confirms that most of the enriched variables follow the expected heavy-tail distributions and that a large fraction of the network traffic is due to a small number of cyber activities. This information can simplify the cyber analyst's task by enabling prioritization of cyber activities based on their statistical prevalence.
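To make "prioritization based on statistical prevalence" concrete, here is a minimal sketch in plain Python. It is not the authors' GraphBLAS or PyD4M code; it uses `collections.Counter` as a stand-in for the sum-reductions those associative-array frameworks perform at scale. It assumes each event has already been labeled with an activity tag (the labels `scan_a` through `scan_e` are hypothetical). It ranks activities by count, reports each activity's share and the cumulative share, and builds the count distribution n(d), the number of items seen exactly d times, which is where a heavy tail would show up.

```python
from collections import Counter


def prevalence_ranking(events):
    """Rank activity labels by frequency.

    Returns a list of (label, count, share, cumulative_share) tuples,
    most prevalent first, so an analyst can triage from the top down.
    """
    counts = Counter(events)
    total = sum(counts.values())
    ranking = []
    cumulative = 0
    for label, n in counts.most_common():  # descending by count
        cumulative += n
        ranking.append((label, n, n / total, cumulative / total))
    return ranking


def top_k_share(ranking, k):
    """Fraction of all events explained by the k most prevalent labels."""
    if not ranking or k <= 0:
        return 0.0
    return ranking[min(k, len(ranking)) - 1][3]


def count_distribution(counts):
    """n(d): how many items occur exactly d times.

    `counts` maps item -> occurrence count (a dict or Counter). In a
    heavy-tailed data set, most items have small d and a few have very large d.
    """
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    # Hypothetical, illustrative labels only; not data from the paper.
    events = (["scan_a"] * 70 + ["scan_b"] * 20 + ["scan_c"] * 6
              + ["scan_d"] * 3 + ["scan_e"] * 1)
    ranking = prevalence_ranking(events)
    for label, n, share, cum in ranking:
        print(f"{label:8s} {n:4d} {share:6.2%} {cum:7.2%}")
    print("top-2 share:", top_k_share(ranking, 2))
    print("n(d):", count_distribution(Counter(events)))
```

The cumulative-share column is the quantity an analyst would read off to decide how many activities to examine first. At the scales described in the abstract, the same reductions would be expressed as sparse-matrix or associative-array sums rather than Python dictionaries.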
