1

I am looking to create a series of graphs with a fixed number of nodes (say n = 100), with varying average degree but with (almost) the same transitivity in the network.

At the moment, the best I can come up with is

require(igraph)

g1 <- sample_pa(n = 100, power = 0.5, m = 2) # degree 3.94, transitivity 0.075
g2 <- sample_pa(n = 100, power = 0.5, m = 4) # degree 7.8, transitivity 0.135
g3 <- sample_pa(n = 100, power = 0.5, m = 4) # degree 11.58, transitivity 0.214

and so on. However, the transitivity is moving quite a bit and I am looking for a built-in function or an easy algorithm to create such graphs.

buzaku
  • 331
  • 1
  • 8

1 Answers1

0

There are several methods to rewire a graph to achieve a given transitivity / global clustering coefficient (GCC below):

1) Naive approach

On each iteration, a random subset of the edges is rewired whilst preserving the degree sequence of the graph. If the global clustering coefficient of the rewired graph is closer to the target value, the rewired graph is kept into the next iteration. The algorithm halts either when the GCC is sufficiently close to the target, or when a maximum number of iterations is reached.

2) Bansal et al. 2009

On each iteration a motif consisting of 5 nodes and 4 edges is randomly selected from the set of all such motifs, and two of the outer edges are rewired to produce a new candidate graph, G', which has the same degree sequence as G:

                       x                       x
                      / \                     / \
                    y1   y2     <=====>     y1---y2
                    |    |
                    z1   z2                 z1---z2

The rewiring step can either introduce a new triangle or break up an existing one depending on whether the current GCC is greater or smaller than the target value. If rewiring brings the GCC closer to the target value then G' is used in the next iteration. The algorithm halts either when the GCC is sufficiently close to the target value, or when a maximum number of iterations is reached.

Code

I work in python so I don't have these routines in R. But I am sure that you can work it out:

import numpy as np
import igraph
from itertools import izip, combinations, product
from random import Random


def bansal_shuffle(G, target_gcc, tol=1E-3, maxiter=None, inplace=True,
                   require_connected=False, seed=None, verbose=False):
    r"""
    Bansal et al's Markov chain method for generating random graphs with
    prescribed global clustering coefficients.

    On each iteration a motif consisting of 5 nodes and 4 edges is randomly
    selected from the set of all such motifs, and two of the outer edges are
    rewired to produce a new candidate graph, G', which has the same degree
    sequence as G:
                           x                       x
                          / \                     / \
                        y1   y2     <=====>     y1---y2
                        |    |
                        z1   z2                 z1---z2

    The rewiring step can either introduce a new triangle or break up an
    existing one depending on whether the current GCC is greater or smaller
    than the target value. If rewiring brings the GCC closer to the target
    value then G' is used in the next iteration. The algorithm halts either
    when the GCC is sufficiently close to the target value, or when a maximum
    number of iterations is reached.

    Arguments:
    ----------
        G: igraph.Graph
            input graph to be shuffled; can be either undirected or directed
        target_gcc: float
            target global clustering coefficient
        tol: float
            halting tolerance: abs(gcc - target_gcc) < tol
        maxiter: int
            maximum number of rewiring steps to perform (default is no limit)
        inplace: bool
            modify G in place
        require_connected: bool
            impose the additional constraint that G' must be connected on each
            rewiring step, in which case the input graph must also be connected
        seed: int
            seed for the random number generator, see numpy.random.RandomState
        verbose: bool
            print convergence messages

    Returns:
    --------
        G_shuf: igraph.Graph
            shuffled graph, has the same same degree sequence(s) as G
        niter: int
            total rewiring iterations performed
        gcc_best: float
            final global clustering coefficient

    Reference:
    ----------
    Bansal, S., Khandelwal, S., & Meyers, L. A. (2009). Exploring biological
    network structure with clustered random networks. BMC Bioinformatics, 10,
    405. doi:10.1186/1471-2105-10-405

    """

    if maxiter is None:
        maxiter = np.inf

    if require_connected:
        if not G.is_connected():
            raise ValueError(
                'if require_connected, the input graph must also be connected')

    if not inplace:
        G = G.copy()

    # seed RNG
    gen = np.random.RandomState(seed)

    # get initial GCC and loss
    gcc_initial = G.transitivity_undirected()
    gcc = gcc_best = gcc_initial
    loss = loss_best = target_gcc - gcc

    # if we're already close enough, return immediately
    if abs(loss_best) < tol:
        return G, 0, gcc

    # find all nodes with degrees >= 2 (ignoring direction). each of these must
    # be part of at least one triplet motif
    degrees = G.degree()
    candidate_x = _filter_by_degree(range(G.vcount()), degrees, 2)

    niter = 0
    terminating = False

    while not terminating:

        # for moderately-sized graphs it's faster to modify a deep copy of G
        # than to undo the rewiring whenever the GCC does not improve
        G_prime = G.copy()

        if loss > 0:
           _make_triangle(G_prime, candidate_x, gen)

        elif loss < 0:
            _break_triangle(G_prime, candidate_x, gen)

        # compute the new clustering coefficient and loss
        gcc = G_prime.transitivity_undirected()
        loss = target_gcc - gcc

        # did we improve?
        improved = abs(loss) < abs(loss_best)

        # if desired, we also confirm that the new graph is connected
        if require_connected:
            improved &= G_prime.is_connected()

        if improved:
            gcc_best = gcc
            loss_best = loss
            G = G_prime

        # print progress
        if verbose:
            print("iter=%-6i GCC=%-8.3g loss=%-8.3g improved=%-5s"
                  % (niter, gcc, loss, improved))

        niter += 1

        # check termination conditions
        if abs(loss_best) < tol:
            terminating = True
        elif niter >= maxiter:
            print('failed to reach targed GCC within maxiter')
            terminating = True

    return G, niter, gcc_best


def _make_triangle(G, candidate_x, gen):

    have_rewire_candidates = False

    # get candidate edges to rewire
    while not have_rewire_candidates:

        # a random node with a degree of at least 2
        x = gen.choice(candidate_x)

        # find all neighbors of this node with a degree of at least 2 (ignoring
        # direction)
        x_neighbors = G.neighbors(x)
        x_valid_neighbors = _filter_by_degree(
            x_neighbors, G.degree(x_neighbors), 2
        )

        # skip this candidate if there aren't at least two neighbors with
        # degrees >= 2
        if len(x_valid_neighbors) < 2:
            continue

        else:

            # shuffle the valid neighbors of x, iterate over every possible
            # pair
            gen.shuffle(x_valid_neighbors)

            for (y1, y2) in combinations(x_valid_neighbors, 2):

                # if y1 and y2 are already connected, skip them
                if _any_direction(G, y1, y2):
                    continue

                # find all outputs from y1
                from_y1 = G.neighbors(y1, igraph.OUT)

                # find all inputs to y2
                to_y2 = G.neighbors(y2, igraph.IN)

                # shuffle them
                gen.shuffle(from_y1)
                gen.shuffle(to_y2)

                # iterate over all possible pairs consisting of one output from
                # y1 and one input to y2.
                for z1, z2 in product(from_y1, to_y2):

                    # we're only interested in unconnected pairs
                    if _any_direction(G, z1, z2):
                        continue

                    # must satisfy z1 != x; z2 != x; z1 != z2
                    elif x not in (z1, z2) and z1 != z2:
                        have_rewire_candidates = True
                        break

                if have_rewire_candidates:
                    break

    # perform double edge swap
    G.delete_edges([(y1, z1), (z2, y2)])
    G.add_edges([(y1, y2), (z2, z1)])

    pass


def _break_triangle(G, candidate_x, gen):

    have_rewire_candidates = False
    n_edges = G.ecount()

    # get candidate edges to rewire
    while not have_rewire_candidates:

        # a random node with a degree of at least 2
        x = gen.choice(candidate_x)

        # find all neighbors of this node with a degree of at least 2 (ignoring
        # direction)
        x_neighbors = G.neighbors(x)
        x_valid_neighbors = _filter_by_degree(
            x_neighbors, G.degree(x_neighbors), 2
        )

        # skip this candidate if there aren't at least two neighbors with
        # degrees >= 2
        if len(x_valid_neighbors) < 2:
            continue

        else:

            # shuffle the valid neighbors of x, iterate over every possible
            # pair
            gen.shuffle(x_valid_neighbors)

            for (y1, y2) in combinations(x_valid_neighbors, 2):

                # if a connection exist from y1 --> y2...
                if G.are_connected(y1, y2):
                    y1_neighbors = G.neighbors(y1)
                    y2_neighbors = G.neighbors(y2)

                    exclude = x_neighbors + y1_neighbors + y2_neighbors

                    while not have_rewire_candidates:

                        # pick a random edge
                        eidx = gen.randint(0, n_edges)
                        e = G.es[eidx]
                        z2, z1 = e.source, e.target

                        # neither z1 nor z2 can be connected to either x, y1,
                        # or y2
                        if z1 not in exclude and z2 not in exclude:

                            have_rewire_candidates = True
                            break

                if have_rewire_candidates:
                    break

    # perform double edge swap
    G.delete_edges([(y1, y2), (z2, z1)])
    G.add_edges([(y1, z1), (z2, y2)])

    pass


def _filter_by_degree(indices, degrees, min_deg=2):
    return [ii for ii, dd in izip(indices, degrees) if dd >= min_deg]


def _any_direction(G, a, b):
    """
    if a --> b, return 1
    if a <-- b, return -1
    else,       return 0
    """
    if G.are_connected(a, b):
        return 1
    elif G.are_connected(b, a):
        return -1
    else:
        return 0


def naive_diffusion(G, target_gcc, tol=1E-3, maxiter=None, inplace=True,
                    seed=None, verbose=False):
    r"""
    Naive approach to obtaining a shuffled version of the input graph G with a
    prescribed target global clustering coefficient.

    On each iteration, a random subset of the edges is rewired whilst
    preserving the degree sequence of the graph. If the global clustering
    coefficient of the rewired graph is closer to the target value, the rewired
    graph is kept into the next iteration. The algorithm halts either when the
    GCC is sufficiently close to the target, or when a maximum number of
    iterations is reached.

    Arguments:
    ----------
        G: igraph.Graph
            input graph to be shuffled; can be either undirected or directed
        target_gcc: float
            target global clustering coefficient
        tol: float
            halting tolerance: abs(gcc - target_gcc) < tol
        maxiter: int
            maximum number of rewiring steps to perform (default is no limit)
        inplace: bool
            modify G in place
        seed: int
            seed for the random number generator, see random.Random
        verbose: bool
            print convergence messages

    Returns:
    --------
        G_shuf: igraph.Graph
            shuffled graph, has the same same degree sequence(s) as G
        niter: int
            total rewiring iterations performed
        gcc_best: float
            final global clustering coefficient

    """

    if maxiter is None:
        maxiter = np.inf

    if not inplace:
        G = G.copy()

    # use a Python random.Random instance as the RNG for igraph (probably
    # slightly slower, but allows seeding for repeatability)
    igraph.set_random_number_generator(Random(seed))

    loss_best = 1.
    n_edges = G.ecount()

    # start with a high rewiring rate
    rewiring_rate = n_edges

    n_iter = 0
    done = False

    while not done:

        # operate on a copy of the current best graph
        G_prime = G.copy()

        # adjust the number of connections to rewire according to the current
        # best loss
        n_rewire = min(max(int(rewiring_rate * loss_best), 1), n_edges)
        G_prime.rewire(n=n_rewire)

        # compute the global clustering coefficient
        gcc = G_prime.transitivity_undirected()
        loss = abs(gcc - target_gcc)

        # did we improve?
        improved = loss < loss_best

        if improved:

            # keep the new graph
            G = G_prime
            loss_best = loss
            gcc_best = gcc

            # increase the rewiring rate
            rewiring_rate *= 1.1

        else:

            # reduce the rewiring rate
            rewiring_rate *= 0.9

        if verbose:
            print("iter %4i: n_rewired=%5i, new GCC=%-8.3g, loss=%-8.3g, "
                  "improved=%5s, rewiring rate=%-8.3g"
                  % (n_iter, n_rewire, gcc, loss, improved, rewiring_rate))

        n_iter += 1

        # check termination conditions
        if loss_best <= tol:
            done = True
        elif n_iter > maxiter:
            print('failed to converge within maxiter (%i)' % maxiter)
            done = True

    return G, n_iter, gcc_best
Paul Brodersen
  • 7,169
  • 14
  • 30