The sensor component of INFER is responsible for packet capture. The sensor program captures packets using libpcap, and feeds captured packets to various modules. The modules are located in src/lib/sensor/. Each module is responsible for writing its own synopses of the captured traffic. Currently, the following modules have been developed: - http (src/lib/sensor/http) Parses and stores information from http headers. - hbf (src/lib/sensor/hbf) Stores HBFs for all packet payloads above a configurable size. - neoflow (src/lib/sensor/neoflow) Stores neoflow, a flow synopses which includes all layer 2 and 3 info, as well as flow statistics and traffic-content type. - live_ips (src/lib/sensor/live_ips) Keeps track of which hosts on the network were "alive". A live host is defined as a host that was seen as a source of network traffic. - dns (src/lib/sensor/dns) Parses and stores DNS information. ie. who queried which names, who resolved which IPs, and so on. * * MIDDLEWARE (src/bin/analysis, among others) * The middleware component of infer is responsible for processing the data that was written by the sensor. It is not necessary for the middleware to run on the same machine as the sensor; it just needs access to the data written by the sensor modules. The middleware is run via the system cron daemon. The daily (src/bin/scripts/infer_daily.sh) and hourly (src/bin/scripts/infer_hourly.sh) scripts are run daily and hourly, respectively. The hourly script currently has one responsibility: insert the http data that was output from the sensor into PostgreSQL. The daily script is responsible for launching the rest of the middleware processes: (src/bin/analysis) Much like the sensor, the analysis does not do any processing itself; it it merely responsible for loading modules, reading data, and passing the data to the modules. Each module is responsible for keeping its own state and storing the results of its analysis in a PostgreSQL database. Modules are run in stages. That is, all data is fed to each module in a stage. When that stage is complete, the next stage of modules is fed the same data. This allows for later-stage modules to make use of state possibly accumulated by earlier-stage modules. The following modules are currently in use: - darkSpaceSources (src/lib/analysis/darkSpaceSources) Determines which external hosts attempted to contact internal hosts that were not "alive". - infectedContacts (src/lib/analysis/infectedContacts) Determines which internal hosts have contacted hosts which were on one of several publically-accessible black lists. - nonDNSTraffic (src/lib/analysis/nonDNSTraffic) Determines which internal hosts have contacted external hosts without having first resolved their IP addresses. This is often indicative of P2P traffic. - ftpBruteForcers (src/lib/analysis/ftpBruteForcers) - sqlBruteForcers (src/lib/analysis/sqlBruteForcers) - sshBruteForcers (src/lib/analysis/sshBruteForcers) - telnetBruteForcers (src/lib/analysis/telnetBruteForcers) Determines which external hosts have attempted to "brute force" unauthorized access to various services running on internal hosts. - hostTraffic (src/lib/analysis/hostTraffic) Accumulates various traffic statistics about each host on the network, including the number of flows, number of packets, bytes transferred, etc. - evasiveTraffic (src/lib/analysis/evasiveTraffic) Isolates traffic that exibits behavior in line with methods used to evade detection by older IDSs, including traffic with a low ttl and highly-fragmented traffic. - darkSpaceTargets (src/lib/analysis/darkSpaceTargets) Determines which "live" internal hosts were in contact with known external dark-space sources. - hostPairs (src/lib/analysis/hostPairs) Records pairs of internal/external IP address that have been in communication with each other. - networkTraffic (src/lib/analysis/networkTraffic) Accumulates basic network-wide statistics, namely: the number of "live" IPs, the total number of flows, the total number of packets, and the number of bytes transferred. - topPorts (src/lib/analysis/topPorts) Determines the most-active ports for both ingress and egress traffic for each internal host. - commChannels (src/lib/analysis/commChannels) Isolates flows exibiting command-and-control-type behavior. If a host is determined to be infected, these could possibly be the channels used to control the host remotely. - bandwidth_utilization (src/lib/analysis/bandwidth_utilization) Records the overall ingress and egress bandwidth usilization of the network. - host_bandwidth_utilization (src/lib/analysis/host_bandwidth_utilization) Records the overall ingress and egress bandwidth usilization of each host on the network. - network_exposure (src/lib/analysis/network_exposure) Records the exposure of the network to each AS and Country. - host_exposure (src/lib/analysis/host_exposure) Records the exposure of each internal host to each AS and Country. (src/bin/interesting_ips) interesting_ips is responsible for ranking how malicious each host in the network is. (src/bin/scripts/infer_index_symptom_tables.sh) This script simply creates indecies on various symptom tables so as to allow for shorter query times by the frontend. (src/bin/scripts/infer_http_domain_relevance.pl) This script calculates the "relevance" of a domain name with regard to http traffic. (src/bin/scripts/infer_http_host_relevance.pl) This script calculates the "relevance" of an IP address with regard to http traffic. (src/bin/browser_stats) This program accumulates statistics about which web browsers are being utilized by internal hosts. (src/bin/web_server_browsers) This program accumulates statistics about which browsers are being used to contact internal web servers. (src/bin/web_server_crawlers) This program accumulates statistics about which crawlers are being used to contact internal web servers. (src/bin/web_server_servers) This program accumulates statistics about which http server software are being run on internal web servers. (src/bin/web_server_top_hosts) This program accumulates statistics about which http domains are most commonly served by internal web servers. (src/bin/web_server_top_urls) This program accumulates statistics about which urls are most commonly served by internal web servers. (src/bin/mutual_contacts) This program determines which hosts are related to each other using an algorithm developed by Baris Coskun. (src/bin/privacy_matrix) This program generates the privacy graph used to determine how closely- related two internal hosts are. (src/bin/pruned_network_graph) This program generates a pruned copy of the entire network graph that is used to generate visualizations in the frontend. (src/bin/external_contacts) This program accumulates statistics about how many internal hosts have been contacted by each external host. (src/bin/internal_contacts) This program accumulates statistics about how many external hosts have been contacted by each internal host. (src/bin/scripts/infer_smtp_notify.pl) This script sends out email alerts for the top-20 ranked hosts.