
Generate binding effects

Generate enrichment effects for genes using vectorized operations. Each gene's experiment hops are drawn from a separate range depending on whether the gene is designated bound or unbound.

Note that the default values are a scaled-down version of actual data. See also https://github.com/cmatKhan/callingCardsTools/blob/main/callingcardstools/PeakCalling/yeast/enrichment.py
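Reading the return statement of the source code below, the enrichment for gene $i$ works out to the expression that follows. The symbols are introduced here only for readability and do not appear in the source: $E_i$ and $B_i$ are the gene's sampled experiment and background hops, $T_E$ and $T_B$ are `total_experiment_hops` and `total_background_hops`, and $\epsilon$ is `pseudocount`.

$$
\text{enrichment}_i = \frac{E_i / (T_E + \epsilon)}{B_i / (T_B + \epsilon) + \epsilon}
$$

With the default `background_hops_range` of `(1, 100)`, $B_i \ge 1$, so the outer $\epsilon$ in the denominator only matters if that range is changed to include 0.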

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `gene_population` | `GenePopulation` | A GenePopulation object. See `generate_gene_population()`. | required |
| `background_hops_range` | `tuple[int, int]` | Inclusive range from which each gene's background hops are drawn. | `(1, 100)` |
| `unbound_experiment_hops_range` | `tuple[int, int]` | Inclusive range from which experiment hops are drawn for unbound genes. | `(0, 1)` |
| `bound_experiment_hops_range` | `tuple[int, int]` | Inclusive range from which experiment hops are drawn for bound genes. | `(1, 6)` |
| `total_background_hops` | `int` | Total number of background hops. | `1000` |
| `total_experiment_hops` | `int` | Total number of experiment hops. | `76` |
| `pseudocount` | `float` | Small value added to the denominators to avoid division by zero. Must be a `float`. | `1e-10` |
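All three `*_hops_range` tuples are treated as inclusive on both ends: the source adds 1 to each upper bound because `torch.randint` excludes `high`. The plain-Python sketch below illustrates what the defaults mean. It substitutes the standard library's `random.randint`, which already includes both endpoints, for `torch.randint`, and the helper name `sample_hops` is hypothetical. Under these defaults, unbound genes receive 0 or 1 experiment hops and bound genes receive 1 to 6.

```python
import random


def sample_hops(hops_range: tuple[int, int], n: int, rng: random.Random) -> list[int]:
    """Draw n hop counts uniformly from the inclusive range hops_range.

    Hypothetical helper: random.randint includes both endpoints, which matches
    torch.randint(low, high + 1, ...) as used in generate_binding_effects.
    """
    low, high = hops_range
    return [rng.randint(low, high) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
unbound = sample_hops((0, 1), 1000, rng)  # default unbound_experiment_hops_range
bound = sample_hops((1, 6), 1000, rng)  # default bound_experiment_hops_range
print(sorted(set(unbound)))
print(sorted(set(bound)))
```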

Returns:

| Type | Description |
| --- | --- |
| `torch.Tensor` | A tensor of enrichment values, one per gene. |
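To make the return value concrete, the sketch below evaluates the same expression as the source's return statement for a single gene in plain Python. The function name `enrichment` and the hop counts are illustrative only; they are not output from the library.

```python
def enrichment(
    experiment_hops: int,
    background_hops: int,
    total_experiment_hops: int = 76,
    total_background_hops: int = 1000,
    pseudocount: float = 1e-10,
) -> float:
    """Scalar version of the enrichment expression in generate_binding_effects."""
    experiment_fraction = experiment_hops / (total_experiment_hops + pseudocount)
    background_fraction = background_hops / (total_background_hops + pseudocount)
    return experiment_fraction / (background_fraction + pseudocount)


# A bound gene with 5 experiment hops and 10 background hops:
# (5 / 76) / (10 / 1000) is roughly 6.58
print(round(enrichment(5, 10), 2))
# A gene with 0 experiment hops has enrichment 0 regardless of its background
print(enrichment(0, 10))
```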

Raises:

| Type | Description |
| --- | --- |
| `TypeError` | If `gene_population` is not a `GenePopulation` object |
| `TypeError` | If `total_background_hops` is not an integer |
| `TypeError` | If `total_experiment_hops` is not an integer |
| `TypeError` | If `pseudocount` is not a float |
| `TypeError` | If `background_hops_range` is not a tuple |
| `TypeError` | If `unbound_experiment_hops_range` is not a tuple |
| `TypeError` | If `bound_experiment_hops_range` is not a tuple |
| `TypeError` | If any element of `background_hops_range`, `unbound_experiment_hops_range`, or `bound_experiment_hops_range` is not an integer |
| `ValueError` | If `background_hops_range` is not a tuple of length 2 |
| `ValueError` | If `unbound_experiment_hops_range` is not a tuple of length 2 |
| `ValueError` | If `bound_experiment_hops_range` is not a tuple of length 2 |
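The range checks behind these exceptions can be exercised in isolation. The sketch below reproduces the per-range validation from the source as a standalone function; the name `check_hops_range` is hypothetical. As in the source, the tuple check runs first, then the length check, then the integer check, so a list is rejected with `TypeError` before its length is examined.

```python
def check_hops_range(name: str, value: object) -> None:
    """Mirror the range validation in generate_binding_effects (hypothetical helper)."""
    if not isinstance(value, tuple):
        raise TypeError(f"{name} must be a tuple")
    if len(value) != 2:
        raise ValueError(f"{name} must be a tuple of length 2")
    if not all(isinstance(i, int) for i in value):
        raise TypeError(f"{name} must be a tuple of integers")


# Each bad input trips a different check
for bad in ([1, 100], (1, 50, 100), (1.0, 100)):
    try:
        check_hops_range("background_hops_range", bad)
    except (TypeError, ValueError) as err:
        print(type(err).__name__, err)
```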

Source code in yeastdnnexplorer/probability_models/generate_data.py
```python
def generate_binding_effects(
    gene_population: GenePopulation,
    background_hops_range: tuple[int, int] = (1, 100),
    unbound_experiment_hops_range: tuple[int, int] = (0, 1),
    bound_experiment_hops_range: tuple[int, int] = (1, 6),
    total_background_hops: int = 1000,
    total_experiment_hops: int = 76,
    pseudocount: float = 1e-10,
) -> torch.Tensor:
    """
    Generate enrichment effects for genes using vectorized operations, based on their
    bound designation, with separate experiment hops ranges for unbound and bound genes.

    Note that the default values are a scaled-down version of actual data. See also
    https://github.com/cmatKhan/callingCardsTools/blob/main/callingcardstools/PeakCalling/yeast/enrichment.py

    :param gene_population: A GenePopulation object. See `generate_gene_population()`
    :type gene_population: GenePopulation
    :param background_hops_range: The range of hops for background genes. Defaults to
        (1, 100)
    :type background_hops_range: tuple[int, int], optional
    :param unbound_experiment_hops_range: The range of hops for unbound genes. Defaults
        to (0, 1)
    :type unbound_experiment_hops_range: tuple[int, int], optional
    :param bound_experiment_hops_range: The range of hops for bound genes. Defaults to
        (1, 6)
    :type bound_experiment_hops_range: tuple[int, int], optional
    :param total_background_hops: The total number of background hops. Defaults to 1000
    :type total_background_hops: int, optional
    :param total_experiment_hops: The total number of experiment hops. Defaults to 76
    :type total_experiment_hops: int, optional
    :param pseudocount: A pseudocount to avoid division by zero. Defaults to 1e-10
    :type pseudocount: float, optional
    :return: A tensor of enrichment values for each gene.
    :rtype: torch.Tensor
    :raises TypeError: If gene_population is not a GenePopulation object
    :raises TypeError: If total_background_hops is not an integer
    :raises TypeError: If total_experiment_hops is not an integer
    :raises TypeError: If pseudocount is not a float
    :raises TypeError: If background_hops_range is not a tuple
    :raises TypeError: If unbound_experiment_hops_range is not a tuple
    :raises TypeError: If bound_experiment_hops_range is not a tuple
    :raises TypeError: If any element of a hops range is not an integer
    :raises ValueError: If background_hops_range is not a tuple of length 2
    :raises ValueError: If unbound_experiment_hops_range is not a tuple of length 2
    :raises ValueError: If bound_experiment_hops_range is not a tuple of length 2

    """
    # NOTE: torch.randint excludes `high` (the interval is half-open on the
    # right), so 1 is added to each upper bound to make the ranges inclusive

    # check input
    if not isinstance(gene_population, GenePopulation):
        raise TypeError("gene_population must be a GenePopulation object")
    if not isinstance(total_background_hops, int):
        raise TypeError("total_background_hops must be an integer")
    if not isinstance(total_experiment_hops, int):
        raise TypeError("total_experiment_hops must be an integer")
    if not isinstance(pseudocount, float):
        raise TypeError("pseudocount must be a float")
    for arg, tup in {
        "background_hops_range": background_hops_range,
        "unbound_experiment_hops_range": unbound_experiment_hops_range,
        "bound_experiment_hops_range": bound_experiment_hops_range,
    }.items():
        if not isinstance(tup, tuple):
            raise TypeError(f"{arg} must be a tuple")
        if not len(tup) == 2:
            raise ValueError(f"{arg} must be a tuple of length 2")
        if not all(isinstance(i, int) for i in tup):
            raise TypeError(f"{arg} must be a tuple of integers")

    # Generate background hops for all genes
    background_hops = torch.randint(
        low=background_hops_range[0],
        high=background_hops_range[1] + 1,
        size=(gene_population.labels.shape[0],),
    )

    # Generate experiment hops for unbound genes
    unbound_experiment_hops = torch.randint(
        low=unbound_experiment_hops_range[0],
        high=unbound_experiment_hops_range[1] + 1,
        size=(gene_population.labels.shape[0],),
    )
    # Generate experiment hops for bound genes
    bound_experiment_hops = torch.randint(
        low=bound_experiment_hops_range[0],
        high=bound_experiment_hops_range[1] + 1,
        size=(gene_population.labels.shape[0],),
    )

    # Use bound designation to select appropriate experiment hops
    experiment_hops = torch.where(
        gene_population.labels == 1, bound_experiment_hops, unbound_experiment_hops
    )

    # Calculate enrichment for all genes
    return (experiment_hops.float() / (total_experiment_hops + pseudocount)) / (
        (background_hops.float() / (total_background_hops + pseudocount)) + pseudocount
    )
```
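The `torch.where` call above picks, element by element, the bound draw wherever `labels == 1` and the unbound draw everywhere else, and the result is then divided through in one vectorized expression. The plain-Python sketch below mirrors those two steps, assuming the labels are a list of 0/1 integers standing in for `gene_population.labels`; the function name `select_and_enrich` and the example hop counts are hypothetical.

```python
def select_and_enrich(
    labels: list[int],
    bound_hops: list[int],
    unbound_hops: list[int],
    background_hops: list[int],
    total_experiment_hops: int = 76,
    total_background_hops: int = 1000,
    pseudocount: float = 1e-10,
) -> list[float]:
    """List-based analogue of the torch.where selection plus enrichment."""
    # Equivalent of torch.where(labels == 1, bound_hops, unbound_hops)
    experiment_hops = [
        b if label == 1 else u for label, b, u in zip(labels, bound_hops, unbound_hops)
    ]
    # Same expression as the return statement, applied gene by gene
    return [
        (e / (total_experiment_hops + pseudocount))
        / (bg / (total_background_hops + pseudocount) + pseudocount)
        for e, bg in zip(experiment_hops, background_hops)
    ]


labels = [1, 0, 1]
result = select_and_enrich(labels, [4, 6, 2], [0, 1, 1], [20, 20, 40])
print([round(x, 2) for x in result])
```

Note the design choice in the source: it draws both a bound and an unbound hop count for every gene and discards one of each pair. That costs an extra random draw per gene but keeps the selection branch-free and fully vectorized.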