The Database Interface Classes¶

For each API endpoint exposed in the Django app, there is a corresponding class that provide methods to execute CRUD operations asynchronously.

There are two types of API endpoints -- those that contain only records data, and those that store both records and pointers to files.

Connecting to the Database¶

The database currently runs on HTCF service partition. This is a single node with 8 CPU and 30 GB that is meant for long running low resource jobs. The components that need to run are a postgres database, a redis instance and the django app. As long as these components are running on the service partition, you can connect via an ssh tunnel with:

ssh username@login.htcf.wustl.edu -N -L 8001:n240:8000

where the 8001:n240:8000 takes the form of local_port:cluster_node:app_port. The django app will always be served on port 8000, and n240 is the only service partition node. You may choose a different local port.

If you do this and cannot connect, let me know and I'll check the status of the jobs on the cluster.

Database username and password¶

Once you have a tunnel, you can access the database frontend at 127.0.0.1:8001 (or a different local port, if you changed that number). If you haven't already signed up, you'll need to click the 'sign up' button and follow the instructions. The e-mail server is not hooked up at the moment, so when it says "see the e-mail", send a slack message and let me know. I'll give you a link to complete the sign up process. After that, you can just use the "sign in" button.

For computational tasks, including generating rank response data, celery workers must be launched on the HTCF general partition. There is currently a script that is meant to monitor the redis queue and launch/kill these workers automatically, but this functionality is new and largely untested. You can monitor the workers/tasks if you create another tunnel with:

ssh username@login.htcf.wustl.edu -N -L 8002:n240:5555

You'd access this dashboard at 127.0.0.1:5555

The login is currently:

raw
username: "flower"
password: "daisy"

(yes, really -- the security comes from the fact that you need to login with HTCF)

Configuring the Database Interface Classes¶

The database classes expect the following environmental variables to be set.

raw
BASE_URL='http://127.0.0.1:8001'
TOKEN='<your token>'
BINDING_URL='http://127.0.0.1:8001/api/binding'
BINDINGMANUALQC_URL='http://127.0.0.1:8001/api/bindingmanualqc'
CALLINGCARDSBACKGROUND_URL='http://127.0.0.1:8001/api/callingcardsbackground'
DATASOURCE_URL='http://127.0.0.1:8001/api/datasource'
EXPRESSION_URL='http://127.0.0.1:8001/api/expression'
EXPRESSIONMANUALQC_URL='http://127.0.0.1:8001/api/expressionmanualqc'
FILEFORMAT_URL='http://127.0.0.1:8001/api/fileformat'
GENOMICFEATURE_URL='http://127.0.0.1:8001/api/genomicfeature'
PROMOTERSET_URL='http://127.0.0.1:8001/api/promoterset'
PROMOTERSETSIG_URL='http://127.0.0.1:8001/api/promotersetsig'
REGULATOR_URL='http://127.0.0.1:8001/api/regulator'

This can be achieved in the package during development with a .env file at the top most level of the package. The .env file is loaded in the package __init__.py.

If you are importing yeastdnnexplorer into a different environment, then you'll need to add the package dotenv and execute load_dotenv(dotenv_path=env_path). If the .env file is in the same PWD in which you execute that command, there is no need to specify a path.

Token Authentication¶

Once you have a username and password to the database, you can retrieve your token. Make sure that you put this token, at least, in a .env file, and make sure that .env file is in your .gitignore.

Alternatively, you could retrieve and store in memory the token at the beginning of each session -- this is more secure if you are not using a .env file.

The .env file is already in the yeastddnexplorer .gitignore

curl -X 'POST' \
  'http://127.0.0.1:8001/auth-token/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "username": "username",
  "password": "password"
}'

Or with python:

import requests

url = "http://127.0.0.1:8001/auth-token/"
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
}
data = {
    "username": "username",
    "password": "password",
}

response = requests.post(url, json=data, headers=headers)
print(response.text)

Using the Interface Classes¶

In [1]:

Copied!

from yeastdnnexplorer.interface import *
import matplotlib.pyplot as plt
from yeastdnnexplorer.interface import *
import matplotlib.pyplot as plt

Records Only Endpoints¶

The records only endpoints are:

BindingManualQC
DataSource
ExpressionManualQC
FileFormat
GenomicFeature
PromoterSetSig
Regulator

When the read() method is called on the corresponding API classes, a dataframe will be returned in the response.

All of the read() methods, for both types of API endpoints, return the result of a callable. By default, the callable returns a dictionary with two keys: metadata and data. For response only tables, the metadata value will be the records from the database as a pandas dataframe and the data will be `None.

Example -- RegulatorAPI¶

In [2]:

Copied!

regulator = RegulatorAPI()

result = await regulator.read()
result.get("metadata")
regulator = RegulatorAPI()

result = await regulator.read()
result.get("metadata")

Out[2]:

	id	uploader	modifier	regulator_locus_tag	regulator_symbol	upload_date	modified_date	under_development	notes	genomicfeature
0	1	chasem	chasem	YAL051W	OAF1	2024-07-01	2024-07-01T12:47:18.619129-05:00	False	none	24
1	2	chasem	chasem	YBL103C	RTG3	2024-07-01	2024-07-01T12:47:19.667722-05:00	False	none	140
2	3	chasem	chasem	YBL066C	SEF1	2024-07-01	2024-07-01T12:47:20.523161-05:00	False	none	186
3	4	chasem	chasem	YBL054W	TOD6	2024-07-01	2024-07-01T12:47:21.309606-05:00	False	none	199
4	5	chasem	chasem	YBL052C	SAS3	2024-07-01	2024-07-01T12:47:22.161007-05:00	False	none	201
...	...	...	...	...	...	...	...	...	...	...
1809	1810	chasem	chasem	YOR262W	GPN2	2024-07-01	2024-07-01T14:14:36.164403-05:00	False	none	6387
1810	1811	chasem	chasem	YPR190C	RPC82	2024-07-01	2024-07-01T14:14:38.921261-05:00	False	none	7070
1811	1812	chasem	chasem	YPL228W	CET1	2024-07-01	2024-07-01T14:15:51.518999-05:00	False	none	6603
1812	1813	chasem	chasem	YKL049C	CSE4	2024-07-01	2024-07-01T14:15:56.555122-05:00	False	none	4083
1813	1814	chasem	chasem	YMR168C	CEP3	2024-07-01	2024-07-01T14:22:14.060524-05:00	False	none	5258

1814 rows × 10 columns

Record and File Endpoints¶

The record and file endpoints are the following:

CallingCardsBackground
Expression
PromoterSet
PromoterSetSig
RankResponse *

The default read() method is the same as the Records only Endpoint API classes. However, there is an additional argument, retrieve_files which if set to True will retrieve the file for which each record provides metadata. The return value of read() is again a callable, and by default the data key will store a dictionary where the keys correspond to the id column in the metadata.

In [3]:

Copied!





# First, retrieve only the records -- you'll want to filter these results down before
# retrieving the files most likely
pss_api = PromoterSetSigAPI()
result = await pss_api.read()
result.get("metadata")
# First, retrieve only the records -- you'll want to filter these results down before
# retrieving the files most likely
pss_api = PromoterSetSigAPI()
result = await pss_api.read()
result.get("metadata")

Out[3]:

	id	uploader	modifier	single_binding	composite_binding	fileformat	background	upload_date	modified_date	file	promoter	source	regulator_symbol	regulator_locus_tag	condition	rank_recall	data_usable	background_name
0	1	chasem	chasem	1.0	NaN	1	NaN	2024-07-01	2024-07-01T13:51:48.611781-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	NaN	harbison_chip	OAF1	YAL051W	YPD	unreviewed	unreviewed	NaN
1	2	chasem	chasem	2.0	NaN	1	NaN	2024-07-01	2024-07-01T13:51:49.643452-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	NaN	harbison_chip	PDR3	YBL005W	YPD	unreviewed	unreviewed	NaN
2	3	chasem	chasem	3.0	NaN	1	NaN	2024-07-01	2024-07-01T13:51:50.744384-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	NaN	harbison_chip	HIR1	YBL008W	YPD	unreviewed	unreviewed	NaN
3	4	chasem	chasem	4.0	NaN	1	NaN	2024-07-01	2024-07-01T13:51:51.507918-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	NaN	harbison_chip	HAP3	YBL021C	YPD	unreviewed	unreviewed	NaN
4	5	chasem	chasem	5.0	NaN	1	NaN	2024-07-01	2024-07-01T13:51:52.277595-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	NaN	harbison_chip	TOD6	YBL054W	YPD	unreviewed	unreviewed	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2237	7011	admin	admin	NaN	145.0	5	1.0	2024-07-30	2024-07-30T16:39:36.457965-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4.0	mitra_cc	CBF1	YJR060W	NaN	unreviewed	pass	ad1
2238	7012	admin	admin	NaN	146.0	5	1.0	2024-07-30	2024-07-30T16:39:36.848168-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4.0	mitra_cc	GCN4	YEL009C	NaN	unreviewed	pass	ad1
2239	7013	admin	admin	NaN	147.0	5	1.0	2024-07-30	2024-07-30T16:39:37.234144-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4.0	mitra_cc	OAF1	YAL051W	NaN	unreviewed	pass	ad1
2240	7014	admin	admin	NaN	148.0	5	1.0	2024-07-30	2024-07-30T16:39:38.547155-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4.0	mitra_cc	YOX1	YML027W	NaN	unreviewed	pass	ad1
2241	7015	admin	admin	NaN	149.0	5	1.0	2024-07-30	2024-07-30T16:39:39.713590-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4.0	mitra_cc	LEU3	YLR451W	NaN	unreviewed	pass	ad1

2242 rows × 18 columns

Filtering¶

All API classes have a params attribute which stores the filtering parameters which will be applied to the HTTP requests.

In [4]:

Copied!

pss_api.push_params({"regulator_symbol": "GZF3",
                     "workflow": "nf_core_callingcards_1_0_0",
                     "data_usable": "pass"})
pss_api.push_params({"regulator_symbol": "GZF3",
                     "workflow": "nf_core_callingcards_1_0_0",
                     "data_usable": "pass"})

In [5]:

Copied!

pss_api.params
pss_api.params

Out[5]:

ParamsDict({'regulator_symbol': 'GZF3', 'workflow': 'nf_core_callingcards_1_0_0', 'data_usable': 'pass'})

Retrieving files from a Records and Files Object¶

To retrieve files from a Records and Files endpoint object, do the following:

In [6]:

Copied!

# note that retrieve_files is set to True
result = await pss_api.read(retrieve_files = True)

# the metadata slot is the same as before
result.get("metadata")
# note that retrieve_files is set to True
result = await pss_api.read(retrieve_files = True)

# the metadata slot is the same as before
result.get("metadata")

Out[6]:

	id	uploader	modifier	single_binding	composite_binding	fileformat	background	upload_date	modified_date	file	promoter	source	regulator_symbol	regulator_locus_tag	condition	rank_recall	data_usable	background_name
0	6577	admin	admin	1837	NaN	5	1	2024-07-01	2024-07-01T16:43:50.145871-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	GZF3	YJL110C	unknown	pass	pass	ad1
1	6580	admin	admin	1841	NaN	5	1	2024-07-01	2024-07-01T16:43:50.968078-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	GZF3	YJL110C	unknown	pass	pass	ad1
2	6642	admin	admin	1902	NaN	5	1	2024-07-01	2024-07-01T16:43:54.969507-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	GZF3	YJL110C	unknown	pass	pass	ad1
3	6651	admin	admin	1911	NaN	5	1	2024-07-01	2024-07-01T16:43:55.326651-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	GZF3	YJL110C	unknown	pass	pass	ad1
4	6717	admin	admin	1960	NaN	5	1	2024-07-01	2024-07-01T16:44:00.500038-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	GZF3	YJL110C	unknown	pass	pass	ad1

In [7]:

Copied!

# but now the data slot is a dictionary where the `id` are keys and the values
# are the files parsed into pandas dataframes
result.get("data").get("6568")
# but now the data slot is a dictionary where the `id` are keys and the values
# are the files parsed into pandas dataframes
result.get("data").get("6568")

Filtering on multiple items¶

Some filters, and eventually all, will accept multiple arguments as a comma separated string without spaces. For example:

In [8]:

Copied!

pss_api.push_params({"regulator_symbol": "GZF3,RTG3"})

print(pss_api.params)
pss_api.push_params({"regulator_symbol": "GZF3,RTG3"})

print(pss_api.params)

regulator_symbol: GZF3,RTG3, workflow: nf_core_callingcards_1_0_0, data_usable: pass

Parameters can be removed one by one

In [9]:

Copied!

print(pss_api.params)
print(pss_api.params)

regulator_symbol: GZF3,RTG3, workflow: nf_core_callingcards_1_0_0, data_usable: pass

In [10]:

Copied!

pss_api.pop_params('data_usable')

print(pss_api.params)
pss_api.pop_params('data_usable')

print(pss_api.params)

regulator_symbol: GZF3,RTG3, workflow: nf_core_callingcards_1_0_0

or cleared entirely

In [11]:

Copied!

pss_api.pop_params(None)

print(pss_api.params)
pss_api.pop_params(None)

print(pss_api.params)

Another example with Expression¶

This is used to get an expression_id to use in the RankResponseAPI() below

In [12]:

Copied!

expression = ExpressionAPI()

expression.push_params({"regulator_symbol": "GZF3",
                        "lab": "mcisaac",
                        "time": "15"})

expression_res = await expression.read()

expression_res.get("metadata")
expression = ExpressionAPI()

expression.push_params({"regulator_symbol": "GZF3",
                        "lab": "mcisaac",
                        "time": "15"})

expression_res = await expression.read()

expression_res.get("metadata")

Out[12]:

	id	uploader	modifier	regulator_id	regulator_locus_tag	regulator_symbol	source_name	assay	upload_date	modified_date	...	replicate	control	mechanism	restriction	time	file	notes	regulator	source	promotersetsig_processing
0	2516	chasem	chasem	3578	YJL110C	GZF3	mcisaac_oe	overexpression	2024-07-01	2024-07-01T13:36:13.814201-05:00	...	1	undefined	gev	N	15.0	https://yeastregulatorydb-htcf-data.s3.amazona...	strain_id:SMY156n ; date:20150101	135	7	False
1	2510	chasem	chasem	3578	YJL110C	GZF3	mcisaac_oe	overexpression	2024-07-01	2024-07-01T13:36:08.810276-05:00	...	1	undefined	gev	P	15.0	https://yeastregulatorydb-htcf-data.s3.amazona...	strain_id:SMY156 ; date:20150101	135	7	False

2 rows × 22 columns

Rank Response¶

The rank response endpoint is slightly different than the others. It is implemented asynchronously on the database side, and will run many tasks simultaneously. As such, it uses submit() and retrieve() methods.

Additionally, it is a POST request and all parameters are passed in a json list. If you include, for a given dictionary, multiple items to promoetersetsig_ids and/or expression_ids, those datasets will be aggregated prior to calculating the rank response. The current parameters that may be passed for each are:

promotersetsig_ids: a list of promotersetsig_ids. If more than 1, then the data will be aggregated prior to calcuating rank response
expression_ids: a list of expression_ids. If more than 1, then the data will be aggregated prior to calcuating rank response
expression_effect_colname: name of the column to use for the rank response effect
expression_effect_threshold: The threshold to use on abs(effect) to label responsive/ unresponsive genes
expression_pvalue_threshold: the threshold to use below which to label responsive/ unresponsive genes
rank_bin_size: the size of the bins by which rank response is summarized.
rank_by_binding_effect: if this is "true", then rank by the binding effect first rather than pvalue. This is used for harbison_chip and mcisaac_oe
summarize_by_rank_bin: if this is set to false, the unsummarized rank response (merged binding/response) is returned

In [13]:

Copied!





rr_api = RankResponseAPI()

data = [
    {
        "promotersetsig_ids": ["5555"],
        "expression_ids": ["2510"],
        "rank_by_binding_effect": "true",
    }
]

group_id = await rr_api.submit(post_dict=data)

result = await rr_api.retrieve(group_id)
rr_api = RankResponseAPI()

data = [
    {
        "promotersetsig_ids": ["5555"],
        "expression_ids": ["2510"],
        "rank_by_binding_effect": "true",
    }
]

group_id = await rr_api.submit(post_dict=data)

result = await rr_api.retrieve(group_id)

In [14]:

Copied!

result.get("metadata")
result.get("metadata")

Out[14]:

	regulator_symbol	promotersetsig_ids	expression_ids	n_responsive	total_expression_genes	id
0	GZF3	5555	2510	810	6175.0	a46a15db-8318-476e-80aa-af5588909eef

In [15]:

Copied!

result.get("data").get(result.get("metadata").id[0])
result.get("data").get(result.get("metadata").id[0])

Out[15]:

	rank_bin	n_responsive_in_rank	random	n_successes	response_ratio	pvalue	ci_lower	ci_upper
0	5	2	0.131174	2	0.400000	0.131211	0.052745	0.853367
1	10	3	0.131174	5	0.500000	0.005510	0.187086	0.812914
2	15	4	0.131174	9	0.600000	0.000027	0.322870	0.836636
3	20	0	0.131174	9	0.450000	0.000490	0.230578	0.684722
4	25	2	0.131174	11	0.440000	0.000149	0.244024	0.650718
5	30	1	0.131174	12	0.400000	0.000224	0.226558	0.593965
6	35	1	0.131174	13	0.371429	0.000296	0.214732	0.550769
7	40	0	0.131174	13	0.325000	0.001278	0.185729	0.491295
8	45	1	0.131174	14	0.311111	0.001363	0.181659	0.466491
9	50	0	0.131174	14	0.280000	0.004946	0.162311	0.424905
10	55	1	0.131174	15	0.272727	0.004378	0.161380	0.409619
11	60	0	0.131174	15	0.250000	0.011526	0.147186	0.378596

In [16]:

Copied!





df = result.get("data").get(result.get("metadata").id[0]).iloc[1:50,:]

# Plot
plt.figure(figsize=(10, 6))
plt.plot(df['rank_bin'], df['response_ratio'], marker='o')
plt.title('GZF3 - promotersetsig_id 6577. McIsaac 15 2510')
plt.xlabel('Rank Bin')
plt.ylabel('Response Ratio')
plt.title('Response Ratio vs Rank Bin')
plt.grid(True)
plt.show()
df = result.get("data").get(result.get("metadata").id[0]).iloc[1:50,:]

# Plot
plt.figure(figsize=(10, 6))
plt.plot(df['rank_bin'], df['response_ratio'], marker='o')
plt.title('GZF3 - promotersetsig_id 6577. McIsaac 15 2510')
plt.xlabel('Rank Bin')
plt.ylabel('Response Ratio')
plt.title('Response Ratio vs Rank Bin')
plt.grid(True)
plt.show()

No description has been provided for this image

CallingCards Aggregated Data¶

For regulators which have data generated in the Brentlab and processed through the nf-core/callingcards:1.0.0 workflow, if that data has been manually (or eventually automatically) QC reviewed, and if there are at least 2 samples which are labeled as data_usable, then there will exist a BindingConcatenated record to which both a BindingManualQC record and a PromoterSetSig record are foreign keyed.

To view the set of samples for which there is aggregated data, you may do the following:

In [17]:

Copied!

pss_api.pop_params(None)

pss_api.push_params({"source_name": "brent_nf_cc", "aggregated": "true"})

callingcards_aggregated_meta_res = await pss_api.read()

callingcards_aggregated_meta_df = callingcards_aggregated_meta_res.get("metadata")
pss_api.pop_params(None)

pss_api.push_params({"source_name": "brent_nf_cc", "aggregated": "true"})

callingcards_aggregated_meta_res = await pss_api.read()

callingcards_aggregated_meta_df = callingcards_aggregated_meta_res.get("metadata")

In [18]:

Copied!

callingcards_aggregated_meta_df
callingcards_aggregated_meta_df

Out[18]:

	id	uploader	modifier	single_binding	composite_binding	fileformat	background	upload_date	modified_date	file	promoter	source	regulator_symbol	regulator_locus_tag	rank_recall	data_usable	background_name
0	6867	admin	admin	NaN	6	5	1	2024-07-06	2024-07-06T11:08:33.125644-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	WTM1	YOR230W	pass	pass	ad1
1	6868	admin	admin	NaN	3	5	1	2024-07-06	2024-07-06T11:08:33.125640-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	MIG2	YGL209W	pass	pass	ad1
2	6869	admin	admin	NaN	1	5	1	2024-07-06	2024-07-06T11:08:33.119818-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	CAT8	YMR280C	pass	pass	ad1
3	6870	admin	admin	NaN	5	5	1	2024-07-06	2024-07-06T11:08:33.125605-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	PDR1	YGL013C	pass	pass	ad1
4	6871	admin	admin	NaN	4	5	1	2024-07-06	2024-07-06T11:08:33.129564-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	PHO4	YFR034C	pass	pass	ad1
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
66	6933	admin	admin	NaN	67	5	1	2024-07-06	2024-07-06T11:08:45.567814-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	ABF2	YMR072W	fail	pass	ad1
67	6934	admin	admin	NaN	68	5	1	2024-07-06	2024-07-06T11:08:45.921894-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	USV1	YPL230W	pass	pass	ad1
68	6935	admin	admin	NaN	69	5	1	2024-07-06	2024-07-06T11:08:46.166036-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	MGA1	YGR249W	pass	pass	ad1
69	6936	admin	admin	NaN	70	5	1	2024-07-06	2024-07-06T11:08:46.246482-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	CIN5	YOR028C	pass	pass	ad1
70	6937	admin	admin	NaN	71	5	1	2024-07-06	2024-07-06T11:08:47.987665-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	ROX1	YPR065W	pass	pass	ad1

71 rows × 17 columns

In [19]:

Copied!

pss_api.push_params({"regulator_symbol": "GZF3"})

callingcards_res = await pss_api.read()

gzf3_callingcards_res = callingcards_res.get("metadata")

gzf3_callingcards_res

#gzf3_callingcards_res[~gzf3_callingcards_res.composite_binding_id.isna()]
pss_api.push_params({"regulator_symbol": "GZF3"})

callingcards_res = await pss_api.read()

gzf3_callingcards_res = callingcards_res.get("metadata")

gzf3_callingcards_res

#gzf3_callingcards_res[~gzf3_callingcards_res.composite_binding_id.isna()]

Out[19]:

	id	uploader	modifier	single_binding	composite_binding	fileformat	background	upload_date	modified_date	file	promoter	source	regulator_symbol	regulator_locus_tag	rank_recall	data_usable	background_name
0	6873	admin	admin	NaN	7	5	1	2024-07-06	2024-07-06T11:08:33.117009-05:00	https://yeastregulatorydb-htcf-data.s3.amazona...	4	brent_nf_cc	GZF3	YJL110C	pass	pass	ad1

In [20]:

Copied!





data = [
    {
        "promotersetsig_ids": ["6873"],
        "expression_ids": ["2510"],
    }
]

group_id = await rr_api.submit(post_dict=data)

agg_result = await rr_api.retrieve(group_id)
data = [
    {
        "promotersetsig_ids": ["6873"],
        "expression_ids": ["2510"],
    }
]

group_id = await rr_api.submit(post_dict=data)

agg_result = await rr_api.retrieve(group_id)

In [21]:

Copied!





agg_df = agg_result.get("data").get(agg_result.get("metadata").id[0]).iloc[1:50,:]

# Plot
plt.figure(figsize=(10, 6))
plt.plot(agg_df['rank_bin'], agg_df['response_ratio'], marker='o')
plt.title('Aggregated GZF3. McIsaac 15 2510')
plt.xlabel('Rank Bin')
plt.ylabel('Response Ratio')
plt.title('Response Ratio vs Rank Bin')
plt.grid(True)
plt.show()
agg_df = agg_result.get("data").get(agg_result.get("metadata").id[0]).iloc[1:50,:]

# Plot
plt.figure(figsize=(10, 6))
plt.plot(agg_df['rank_bin'], agg_df['response_ratio'], marker='o')
plt.title('Aggregated GZF3. McIsaac 15 2510')
plt.xlabel('Rank Bin')
plt.ylabel('Response Ratio')
plt.title('Response Ratio vs Rank Bin')
plt.grid(True)
plt.show()

Caveats¶

I have written the scripts to automatically check the redis queue for work and to both launch celery worker nodes, and kill them when they are finished. But, though they work if I run them manually, they have not worked when scheduled through a cronjob. I'll work with Brian and Eric next week to figure out why.
I haven't tested each of the endpoint APIs individually. Help is welcome.