Genetic Interactions: Different Models for Expectation

In the paper, we consider that the evident bias of genetic interactions toward hubs could have at least two underlying explanations:

Figure 1(b): Possible interpretations for Fig 1a. In scenario 1, physical interaction hubs (A and B) are biased to genetically interact with each other; in scenario 2, hubs (B) genetically interact with many partners (A) of both low and high physical connectivity.

We model the expectation in scenario 2 as follows:

A Null-Hypothesis: Scenario 2 hub bias
There are three types of genetic interactions in our ontology, given a particular hub size h: Hub-Hub, Hub-Leaf (or Leaf-Hub), and Leaf-Leaf. These three categories actually apply to all pairs of genes, whether or not they genetically interact.

Above, we show the space of possible pairs of genes, delineated according to whether the genes are hubs or leaves. The bottom right half of the square is shaded (red) because it contains only duplicates of pairs that appear in the upper left half.

In our most basic model of a hub bias (scenario 2), we take the probabiliy density to be constant with one density in the area labeled LL, and constant with another, higher, density in the areas labeled HL and HH. Thus, under this model, we can make a prediction, and compare it to our actual observations.

Given the number of genetic interactions involving a hub is g; the number of hubs is n; and the the number of non-hubs is m: g is the number of genetic interactions in regions HH and HL, so the number of genetic interactions that occur in region HH should be equal to g multiplied by the proportion of the HH+HL area that HH occupies. There are m ^. n pairs in area HL, and there are (n choose 2) pairs in area HH, so the proportion is simply (n choose 2) / ((n choose 2) + m ^. n), and thus, our prediction for the number of hub-hub interactions is:

which, incidentally, reduces to:

In Figure 1(c), we then compare our observations against this prediction at various hub sizes h.

Fig 1c: Ratio of Observed/Expected

Based on the data above, we reject Scenario 2 in favor of a model more like Scenario 1, as shown below.

The above figure shows probability density as darkness in each area, LL, HL, and HH. The (red) shaded area in the lower right half of each square still indicates a region that would be double-counting, which we therefore ignore.

Comparing to the physical network
Maslov and Sneppen report that

...for both interaction and regulatory networks, links between highly connected proteins are systematically suppressed, whereas those between a highly connected and low-connected pairs of proteins are favored. One test that we can perform is to superimpose a copy of they protein-protein interaction network upon itself, with the copy relabeled to be genetic interactions. In this network, we can re-run the test shown in Figure 1(c), to check that we no longer observe the hub-hub preference shown in our original Figure 1(c). See the upper right panel of the figure below, labeled "Physical vs physical, Scenario 2."

Yet we do observe a bias towards hub-hub interactions, relative to the hub-leaf interactions. This is the Scenario 1 from the earlier chart; how do we reconcile this with Maslov and Sneppen's finding?

Maslov and Sneppen vs Scenario 2
Maslov and Sneppen randomize in the following way:

P(K₀,K₁)

K₀

K₁

P_r(K₀,K₁)

In this model, a randomized network will have more hub-hub interactions than our original model would predict. For example, in a network with ten "hubs" of degree five, and fifty "leaves" of degree one, then in all, there are fifty hub endpoints and fifty leaf endpoints. Of the fifty edges that are dropped randomly on the network, it would be expected that about quarter (12 or 13) are hub-hub interactions, one half (25) are hub-leaf interactions, and one quarter (12 or 13) are leaf-leaf interactions. This would allow self-loops, and is a rough estimate.

Of the possible pairs of genes, (discounting self-loops, (60 choose 2) = 30 ^. 59 = 1770), (10 choose 2) = 45 are hub-hub pairs; 10 ^. 50 = 500 are hub-leaf pairs; (50 choose 2) = 25 ^. 49 = 1225 are leaf-leaf pairs. 1225 + 500 + 45 = 1770. Of the hub-hub pairs, one could crudely estimate that 12.5/45 would be realized (28%); of the hub-leaf pairs, 50/500 would be realized (10%); of the leaf-leaf pairs, 12.5/1225 would be realized (1%). This is clearly more like scenario 1 than scenario 2. (This is also a crude estimate, since the issues of self-loops and double-counting of interactions are not carefully handled.)

This much more closely resembles a different model:

Another model
Assume the selections of the two endpoints of a genetic interacion are independent. Estimate the probability of an endpoint being a hub by taking all genetic interactions, tabulating whether their endpoints fall on hubs or not, and calculating a percentage. In the case of the Maslov and Sneppen vs Scenario 2 example above, half the endpoints would be on hubs, half on leaves. In this case, we predict exactly what Maslov and Sneppen's scrambling produces, and in their randomized networks, we observe no hub-hub enrichment.

In fact, when we superimpose the physical network on itself, and check for the ratio of observed to expected hub-hub interactions using this model, we get a very different chart: See the lower right panel of the figure below, labeled "Physical vs physical, new model."

We actually observe fewer hub-hub interactions than expected under this model, which is exactly what Maslov and Sneppen report.

Applying the new model to the genetic interaction network
When we apply this model to the genetic interaction network, superimposed on the physical interaction network, we observe the following chart: See the lower left panel of the figure below, labeled "Original genetic data, new model."

So it appears that this new model, which selects the endpoints of genetic interactions independently with a bias toward hubs, does a much better job of explaining the genetic interactions than the original Scenario 2. This new model is much like Scenario 1, but even still, genetic interactions occur between hubs more frequently than the new model would predict. At many hub thresholds, this enrichment is not statistically significant, but in several instances it is (hub cutoffs 20, 25, and 30). This preference towards hub-hub interactions (lower left panel of figure above) is contrasted with the physical network's own preference away from hub-hub interactions (lower right panel of figure above), consistent with Masolv and Sneppen's observation. Even using the Scenario 2 test, we can see that the tendency towards interactions between physical hubs is stronger in the genetic network than in the physical network itself (upper left vs upper right panels of figure above).

Genetic Interactions: Different Models for Expectation

Online supplement to: