Dr. Giona Casiraghi is our next guest in the PhD Seminar on Network Science on 10 December.

Title: Analysis of Empirical Networks with the Generalised Hypergeometric Ensemble of Random Graphs

Abstract:
Statistical ensembles of networks are probability spaces of all networks that are consistent with given aggregate statistics.
They have become instrumental in the analysis of various types of data in the form of complex networks.
Their numerical and analytical study provides the foundation for the inference of topological patterns, the definition of network-analytic measures, as well as for model selection and statistical hypothesis testing.
Contributing to the foundation of these data analysis techniques, we introduce the generalised hypergeometric ensemble of random graphs (gHypEG).
This ensemble is a broad class of analytically tractable statistical ensembles of finite, directed multi-edge networks.

The gHypEG is a generalisation of the classical configuration model, which is commonly used to generate random networks with a given degree sequence or degree distribution.
Unlike the configuration model, we use edge-centric sampling: m edges are drawn from the set of all possible edges such that the sequence of expected vertex degrees is preserved.
Each pair (i, j) of the n vertices has a set of possible multi-edges, and edges are sampled from these sets uniformly at random.
Such a process corresponds to an urn problem where edges map to balls in an urn.
Specifically, we obtain an urn with M = m^2 balls of n^2 different colours, where each colour corresponds to one pair of vertices and the balls of that colour represent all possible edges between that pair.
Each adjacency matrix A whose entries sum to m corresponds to one particular realisation drawn from this ensemble.
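A minimal sketch of this urn, assuming (as in the usual gHypEG formulation of the configuration model) that colour (i, j) has Xi_ij = k_out_i * k_in_j balls, which gives M = m^2 in total. A realisation A is obtained by drawing m balls without replacement, here via NumPy's `Generator.multivariate_hypergeometric`. The function names `configuration_urn` and `sample_adjacency` and the toy degree sequences are our own illustrative choices.

```python
import numpy as np

def configuration_urn(k_out, k_in):
    """Ball counts of the (unbiased) urn: Xi[i, j] = k_out[i] * k_in[j]."""
    return np.outer(np.asarray(k_out, dtype=np.int64), np.asarray(k_in, dtype=np.int64))

def sample_adjacency(xi, m, rng):
    """Draw m balls without replacement; return the n x n multi-edge count matrix A."""
    counts = rng.multivariate_hypergeometric(xi.ravel(), m)
    return counts.reshape(xi.shape)

# Toy directed multigraph with m = 4 edges (in- and out-degrees both sum to m).
k_out, k_in = [2, 1, 1], [1, 2, 1]
m = sum(k_out)
xi = configuration_urn(k_out, k_in)
print(xi.sum())  # 16 == m**2 balls in n**2 = 9 colours

A = sample_adjacency(xi, m, np.random.default_rng(0))
print(A.sum())   # 4: every realisation has exactly m edges

# Expected counts m * Xi / M reproduce the out-degrees on average.
expected = m * xi / xi.sum()
print(expected.sum(axis=1))  # [2. 1. 1.]
```

The last line illustrates why the expected degree sequence is preserved: the expected number of edges between i and j is k_out_i * k_in_j / m, whose row sums are exactly the out-degrees (and whose column sums are the in-degrees).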
The probability of drawing exactly A_{ij} edges between each pair of vertices (i, j), i.e. of obtaining the adjacency matrix A = {A_{ij}}, is given by the multivariate hypergeometric distribution.
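For an unbiased urn with Xi_ij balls of colour (i, j), the multivariate hypergeometric probability of a count matrix A is the product of binomial coefficients C(Xi_ij, A_ij) divided by C(M, m). The sketch below computes it exactly with Python's `math.comb` and `fractions.Fraction`; the helper name `hypergeometric_pmf` and the 2x2 urn (built as in the configuration case, assuming Xi_ij = k_out_i * k_in_j) are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import comb

def hypergeometric_pmf(A, xi):
    """P(A) = prod_ij C(Xi_ij, A_ij) / C(M, m) for an unbiased urn with ball counts xi."""
    a = [v for row in A for v in row]
    x = [v for row in xi for v in row]
    m, M = sum(a), sum(x)
    numerator = 1
    for a_ij, xi_ij in zip(a, x):
        numerator *= comb(xi_ij, a_ij)  # comb returns 0 if A_ij > Xi_ij
    return Fraction(numerator, comb(M, m))

xi = [[1, 2], [2, 4]]   # urn for k_out = k_in = [1, 2], m = 3, M = 9
print(hypergeometric_pmf([[0, 1], [1, 1]], xi))  # 4/21

# Sanity check: probabilities of all 2x2 matrices with m = 3 edges sum to 1.
total = sum(
    hypergeometric_pmf([[a, b], [c, d]], xi)
    for a, b, c, d in product(range(4), repeat=4)
    if a + b + c + d == 3
)
print(total)  # 1
```

Exact rational arithmetic keeps the toy example transparent; for realistic networks one would work with log-probabilities instead, since the binomial coefficients grow very quickly.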
By biasing this sampling process, we can further generalise the ensemble.
We do so by assigning to each pair of vertices a given propensity to form an edge, i.e., arbitrary degree-corrected tendencies of pairs of vertices to form edges with each other.
This new sampling process is described by a biased urn, whose sampling probability is the multivariate Wallenius' non-central hypergeometric distribution.
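Wallenius' distribution arises from drawing balls one at a time without replacement, where at each draw a colour is chosen with probability proportional to its weight times the number of its balls still in the urn. The sketch below simulates that sequential process with propensities omega_ij; it samples from the distribution rather than evaluating its probability mass function, which involves an integral not shown here. The name `sample_biased_urn` and the example urn and propensity values are hypothetical.

```python
import numpy as np

def sample_biased_urn(xi, omega, m, rng):
    """Draw m balls one at a time, without replacement, from a biased urn.

    At each draw, colour (i, j) is chosen with probability proportional to
    omega[i, j] * (number of balls of colour (i, j) still in the urn), which
    is the sequential process behind Wallenius' distribution.
    """
    shape = np.shape(xi)
    remaining = np.array(xi, dtype=float).ravel()  # copy, so xi is not modified
    weights = np.array(omega, dtype=float).ravel()
    counts = np.zeros(remaining.size, dtype=np.int64)
    for _ in range(m):
        p = weights * remaining
        colour = rng.choice(remaining.size, p=p / p.sum())
        counts[colour] += 1
        remaining[colour] -= 1
    return counts.reshape(shape)

xi = np.array([[3, 3], [3, 3]])
omega = np.array([[4.0, 1.0], [1.0, 4.0]])  # pairs on the diagonal are favoured
A = sample_biased_urn(xi, omega, 6, np.random.default_rng(1))
print(A, A.sum())
```

With all propensities equal, the weights cancel and the process reduces to the unbiased urn, i.e. the multivariate hypergeometric case.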

Studying empirical and synthetic data, we show that this class of ensembles provides a robust framework for the analysis of empirical complex networks.
In particular, we demonstrate how gHypEG can be used to develop statistical regression models to analyse data in the form of complex networks.
The resulting non-linear parametric models take as independent variables diverse hypotheses about the network structure, e.g., community membership or more complicated vertex-vertex relations.
These models then allow us to regress the influence of such hypotheses on the network topology, estimating the intensity and significance of their effects.
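The abstract does not spell out the functional form of these regression models. One plausible choice, assumed here purely for illustration, lets each hypothesis matrix H_l enter the propensities multiplicatively, omega_ij = prod_l (H_l[i, j]) ** beta_l, so that beta_l = 0 means hypothesis l has no effect and the fitted beta_l measures its intensity. The sketch only builds the propensity matrix for a community-membership hypothesis; estimating the beta_l (and their significance) is not shown. The names `propensities`, `groups` and `same_group` are hypothetical.

```python
import numpy as np

def propensities(hypotheses, betas):
    """Hypothetical log-linear form: omega[i, j] = prod_l H_l[i, j] ** beta_l."""
    omega = np.ones(np.shape(hypotheses[0]), dtype=float)
    for h, beta in zip(hypotheses, betas):
        omega *= np.asarray(h, dtype=float) ** beta
    return omega

# Community-membership hypothesis: H[i, j] = 2 if i and j share a group, else 1.
groups = np.array([0, 0, 1])
same_group = 1.0 + (groups[:, None] == groups[None, :])
print(propensities([same_group], [0.0]))  # all ones: the hypothesis has no effect
print(propensities([same_group], [1.5]))  # within-group pairs get 2**1.5 ≈ 2.83
```

Because beta_l enters as an exponent, the model is non-linear in its parameters, and the resulting omega can be plugged into a biased urn as propensities.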