
Generate Gene Population

Generate two sets of genes: a bound group and an unbound group. The return value is a one-dimensional boolean tensor in which ‘1’ means the gene at that index is part of the bound group and ‘0’ means it is part of the unbound group. The length of the tensor is the number of genes in this simulated organism.
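As a rough illustration of the label layout described above, the sketch below builds the same kind of shuffled 0/1 vector with plain Python lists instead of a torch tensor. The helper name `sketch_gene_labels` and its `seed` parameter are hypothetical and not part of yeastdnnexplorer; the real function returns a GenePopulation built from a boolean tensor.

```python
import random


def sketch_gene_labels(total=1000, bound_group=0.3, seed=None):
    """Hypothetical stand-in: returns a shuffled list of booleans, not a tensor."""
    bound_size = int(total * bound_group)  # fractional results are truncated
    labels = [True] * bound_size + [False] * (total - bound_size)
    random.Random(seed).shuffle(labels)  # assign genes to groups at random
    return labels


labels = sketch_gene_labels(total=10, bound_group=0.5, seed=0)

# Indices with a True label form the bound group; the rest are unbound.
bound_idx = [i for i, is_bound in enumerate(labels) if is_bound]
unbound_idx = [i for i, is_bound in enumerate(labels) if not is_bound]

print(len(labels), len(bound_idx), len(unbound_idx))  # prints: 10 5 5
```

Which positions end up in each group depends on the shuffle; only the group sizes are fixed by `total` and `bound_group`.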

Parameters:

Name | Type | Description | Default
total | int | The total number of genes. | 1000
bound_group | float | The proportion of genes in the bound group. | 0.3

Returns:

Type | Description
GenePopulation | A one-dimensional tensor of boolean values where the set of indices with a value of ‘1’ are the bound group and the set of indices with a value of ‘0’ are the unbound group.

Raises:

Type | Description
TypeError | If total is not an integer.
ValueError | If bound_group is not between 0 and 1 (inclusive).
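The two checks listed above can be sketched in isolation as follows. `check_args` and `outcome` are hypothetical helpers that mirror the validation at the top of the function; they are not part of the library's API.

```python
def check_args(total, bound_group):
    """Hypothetical mirror of the two argument checks; returns None if both pass."""
    if not isinstance(total, int):
        raise TypeError("total must be an integer")
    if not 0 <= bound_group <= 1:
        raise ValueError("bound_group must be between 0 and 1")


def outcome(total, bound_group):
    """Return the name of the exception raised, or 'ok' if none is raised."""
    try:
        check_args(total, bound_group)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


print(outcome(1000, 0.3))    # ok
print(outcome(1000.0, 0.3))  # TypeError: a float total is rejected
print(outcome(1000, 1.5))    # ValueError: proportion above 1
print(outcome(1000, 1.0))    # ok: both endpoints are accepted
print(outcome(True, 0.3))    # ok: bool is a subclass of int in Python
```

The type check only inspects the type of `total`, so a negative integer also passes these two checks; callers may want to validate the sign themselves.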

Source code in yeastdnnexplorer/probability_models/generate_data.py
def generate_gene_population(
    total: int = 1000, bound_group: float = 0.3
) -> GenePopulation:
    """
    Generate two sets of genes: a bound group and an unbound group. The return value is
    a one-dimensional boolean tensor where '1' means the gene at that index is part of
    the bound group and '0' means it is part of the unbound group. The length of the
    tensor is the number of genes in this simulated organism.

    :param total: The total number of genes. Defaults to 1000.
    :type total: int, optional
    :param bound_group: The proportion of genes in the bound group. Defaults to 0.3.
    :type bound_group: float, optional
    :return: A one-dimensional tensor of boolean values where the set of indices with a
        value of '1' are the bound group and the set of indices with a value of '0' are
        the unbound group.
    :rtype: GenePopulation
    :raises TypeError: If total is not an integer.
    :raises ValueError: If bound_group is not between 0 and 1 (inclusive).

    """
    if not isinstance(total, int):
        raise TypeError("total must be an integer")
    if not 0 <= bound_group <= 1:
        raise ValueError("bound_group must be between 0 and 1")

    bound_group_size = int(total * bound_group)
    logger.info("Generating %s genes with bound", bound_group_size)

    labels = torch.cat(
        (
            torch.ones(bound_group_size, dtype=torch.bool),
            torch.zeros(total - bound_group_size, dtype=torch.bool),
        )
    )[torch.randperm(total)]

    return GenePopulation(labels)
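
Because the bound group size is computed as `int(total * bound_group)`, a fractional product is truncated rather than rounded. The snippet below only reproduces that arithmetic, using proportions that are exactly representable as floats; it does not call the library, and the variable names are illustrative.

```python
# Reproduce the size arithmetic used for the bound group.
cases = [(10, 0.25), (7, 0.5), (8, 0.5)]

# Each entry is (total, bound_size, unbound_size).
sizes = []
for total, bound_group in cases:
    bound_size = int(total * bound_group)  # 2.5 -> 2, 3.5 -> 3, 4.0 -> 4
    sizes.append((total, bound_size, total - bound_size))

print(sizes)  # prints: [(10, 2, 8), (7, 3, 4), (8, 4, 4)]
```

With proportions that are not exactly representable in binary floating point, the product can land slightly below a whole number, so the bound group may come out one gene smaller than expected.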