CLI Reference Overview¶
tfbpmodeling provides a comprehensive command-line interface for transcription factor binding and perturbation modeling. The CLI is designed to be user-friendly while offering advanced options for power users.
Main Command Structure¶
python -m tfbpmodeling [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS]
Global Options¶
Options that apply to all commands:
| Option | Description | Default | Choices |
|---|---|---|---|
| --log-level | Set logging verbosity | INFO | DEBUG, INFO, WARNING, ERROR, CRITICAL |
| --log-handler | Log output destination | console | console, file |
Available Commands¶
Currently, tfbpmodeling provides one main command:
linear_perturbation_binding_modeling: Complete workflow for TFBP analysis
Command Help System¶
Getting Help¶
Display main help:
python -m tfbpmodeling --help
Display command-specific help:
python -m tfbpmodeling linear_perturbation_binding_modeling --help
Help Output Format¶
The CLI uses a custom help formatter that organizes options into logical groups:
- Input: Data files and basic parameters
- Feature Options: Feature engineering and selection
- Binning Options: Data stratification parameters
- Parameters: Model configuration and thresholds
- Output: Result directories and naming
- System: Performance and logging options
Common Usage Patterns¶
Basic Analysis¶
python -m tfbpmodeling linear_perturbation_binding_modeling \
--response_file data.csv \
--predictors_file binding.csv \
--perturbed_tf YPD1
Reproducible Analysis¶
python -m tfbpmodeling linear_perturbation_binding_modeling \
--response_file data.csv \
--predictors_file binding.csv \
--perturbed_tf YPD1 \
--random_state 42 \
--log-level DEBUG \
--log-handler file
High-Performance Analysis¶
python -m tfbpmodeling linear_perturbation_binding_modeling \
--response_file data.csv \
--predictors_file binding.csv \
--perturbed_tf YPD1 \
--n_cpus 16 \
--n_bootstraps 5000
Exit Codes¶
The CLI returns one of the following exit codes:
- 0: Success
- 1: General error (invalid arguments, file not found, etc.)
- 2: Modeling error (convergence failure, insufficient data, etc.)
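When the CLI is driven from another program, the exit codes listed above can be translated into readable messages. The following is a minimal sketch, not part of tfbpmodeling itself: the helper names `describe_exit_code` and `run_cli` are hypothetical, and the mapping simply mirrors the list in this section.

```python
import subprocess
import sys

# Meanings copied from the exit-code list in this section.
EXIT_CODE_MEANINGS = {
    0: "Success",
    1: "General error (invalid arguments, file not found, etc.)",
    2: "Modeling error (convergence failure, insufficient data, etc.)",
}


def describe_exit_code(code: int) -> str:
    """Return a readable description for a tfbpmodeling exit code."""
    return EXIT_CODE_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_cli(cli_args: list) -> int:
    """Run `python -m tfbpmodeling` with cli_args and report the outcome.

    Hypothetical wrapper: it only forwards arguments and reads the
    return code; it does not inspect the CLI's output.
    """
    completed = subprocess.run([sys.executable, "-m", "tfbpmodeling", *cli_args])
    print(describe_exit_code(completed.returncode), file=sys.stderr)
    return completed.returncode
```

A caller could, for example, retry with different modeling parameters only when `run_cli` returns 2, and stop immediately on 1.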
Logging¶
Log Levels¶
| Level | Description | When to Use |
|---|---|---|
| DEBUG | Detailed diagnostic information | Development, troubleshooting |
| INFO | General information about progress | Normal operation |
| WARNING | Warning messages about potential issues | Production monitoring |
| ERROR | Error messages for recoverable problems | Error investigation |
| CRITICAL | Critical errors that stop execution | System failures |
Log Handlers¶
Console Handler (default)¶
Outputs log messages to the terminal with color coding:
--log-handler console
File Handler¶
Saves log messages to a timestamped file:
--log-handler file
Creates log files named: tfbpmodeling_YYYYMMDD-HHMMSS.log
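Because the file handler writes timestamped names, the newest log can be found by parsing the timestamp out of each filename. The sketch below assumes the naming pattern shown above and that the logs sit in a directory you pass in (this page does not say where log files are written). `parse_log_timestamp` and `latest_log` are hypothetical helper names.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

LOG_PREFIX = "tfbpmodeling_"
LOG_SUFFIX = ".log"


def parse_log_timestamp(filename: str) -> Optional[datetime]:
    """Extract the timestamp from a name like tfbpmodeling_20240115-143022.log."""
    if not (filename.startswith(LOG_PREFIX) and filename.endswith(LOG_SUFFIX)):
        return None
    stamp = filename[len(LOG_PREFIX):-len(LOG_SUFFIX)]
    try:
        return datetime.strptime(stamp, "%Y%m%d-%H%M%S")
    except ValueError:
        # The name matched the prefix and suffix but not the timestamp format.
        return None


def latest_log(directory: str = ".") -> Optional[Path]:
    """Return the most recent tfbpmodeling log in directory, or None."""
    dated = []
    for path in Path(directory).glob(f"{LOG_PREFIX}*{LOG_SUFFIX}"):
        timestamp = parse_log_timestamp(path.name)
        if timestamp is not None:
            dated.append((timestamp, path))
    if not dated:
        return None
    return max(dated, key=lambda item: item[0])[1]
```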
Example Log Output¶
2024-01-15 14:30:22,123 - INFO - Starting linear perturbation binding modeling
2024-01-15 14:30:22,125 - INFO - Loading response data from: data/expression.csv
2024-01-15 14:30:22,234 - INFO - Loading predictor data from: data/binding.csv
2024-01-15 14:30:22,456 - INFO - Perturbed TF: YPD1
2024-01-15 14:30:22,458 - INFO - Starting Stage 1: Bootstrap modeling on all data
2024-01-15 14:30:22,459 - DEBUG - Bootstrap parameters: n_bootstraps=1000, random_state=None
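Log lines in the format shown above can be filtered by severity after a run. This is a rough sketch that assumes every line follows the `timestamp - LEVEL - message` layout of the example; lines that do not match (such as tracebacks) are skipped. `filter_log_lines` is a hypothetical helper, not something tfbpmodeling provides.

```python
import logging
import re

# Matches lines such as:
# 2024-01-15 14:30:22,123 - INFO - Starting linear perturbation binding modeling
LINE_PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<level>[A-Z]+) - (?P<message>.*)$"
)

# Numeric severities taken from the standard logging module.
LEVELS = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
}


def filter_log_lines(lines, min_level="WARNING"):
    """Return (level, message) pairs at or above min_level."""
    threshold = LEVELS[min_level]
    kept = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match and LEVELS.get(match["level"], 0) >= threshold:
            kept.append((match["level"], match["message"]))
    return kept
```

Passing an open log file as `lines` works, since file objects iterate line by line.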
Configuration Files¶
While tfbpmodeling doesn't currently support configuration files, you can create shell scripts or aliases for commonly used parameter combinations:
Shell Script Example¶
#!/bin/bash
# run_analysis.sh
RESPONSE_FILE="$1"
PREDICTORS_FILE="$2"
PERTURBED_TF="$3"
OUTPUT_DIR="${4:-./results}"
python -m tfbpmodeling linear_perturbation_binding_modeling \
--response_file "$RESPONSE_FILE" \
--predictors_file "$PREDICTORS_FILE" \
--perturbed_tf "$PERTURBED_TF" \
--output_dir "$OUTPUT_DIR" \
--n_bootstraps 2000 \
--squared_pTF \
--ptf_main_effect \
--iterative_dropout \
--random_state 42 \
--log-level INFO \
--log-handler file
Usage:
./run_analysis.sh expression.csv binding.csv YPD1 my_results
Bash Alias Example¶
# Add to ~/.bashrc or ~/.bash_profile
alias tfbp-basic='python -m tfbpmodeling linear_perturbation_binding_modeling'
alias tfbp-advanced='python -m tfbpmodeling linear_perturbation_binding_modeling --n_bootstraps 2000 --squared_pTF --ptf_main_effect --iterative_dropout --random_state 42'
Usage:
tfbp-basic --response_file data.csv --predictors_file binding.csv --perturbed_tf YPD1
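The same idea can be approximated from Python by keeping a parameter preset in a JSON file and expanding it into CLI flags. This is only a sketch of a workaround: tfbpmodeling does not read this file itself, the JSON layout is an assumption made for illustration, and `preset_to_args` and `build_command` are hypothetical helper names. Keys are passed through verbatim as flag names, so they must match the flags used elsewhere on this page.

```python
import json
import sys


def preset_to_args(preset: dict) -> list:
    """Convert a preset mapping into CLI flags.

    True becomes a bare flag, False or None is omitted,
    and any other value becomes `--name value`.
    """
    args = []
    for name, value in preset.items():
        if value is True:
            args.append(f"--{name}")
        elif value is False or value is None:
            continue
        else:
            args.extend([f"--{name}", str(value)])
    return args


def build_command(preset_path, response_file, predictors_file, perturbed_tf):
    """Build the argv list for one run from a JSON preset file."""
    with open(preset_path, encoding="utf-8") as handle:
        preset = json.load(handle)
    return [
        sys.executable, "-m", "tfbpmodeling",
        "linear_perturbation_binding_modeling",
        "--response_file", response_file,
        "--predictors_file", predictors_file,
        "--perturbed_tf", perturbed_tf,
        *preset_to_args(preset),
    ]
```

A preset mirroring the `tfbp-advanced` alias would contain `{"n_bootstraps": 2000, "squared_pTF": true, "ptf_main_effect": true, "iterative_dropout": true, "random_state": 42}`, and the resulting list can be passed to `subprocess.run`.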
Error Handling¶
Common Error Messages¶
File Not Found¶
ERROR: Response file not found: data/missing_file.csv
Invalid Perturbed TF¶
ERROR: Perturbed TF 'INVALID_TF' not found in response file columns
Insufficient Data¶
ERROR: Insufficient data after filtering. Found 5 samples, minimum required: 10
Convergence Issues¶
WARNING: LassoCV failed to converge for 15/1000 bootstrap samples
If this warning appears, try increasing --max_iter or check data preprocessing.
Debugging Tips¶
- Start with defaults: Use minimal parameters first
- Enable debug logging: Add --log-level DEBUG
- Use file logging: Add --log-handler file to preserve logs
- Check input data: Verify file formats and content
- Reduce complexity: Lower bootstrap samples for initial testing
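The "check input data" tip can be partly automated before launching a long run. The sketch below is an assumption-laden pre-flight check, not the CLI's own validation: it assumes both inputs are comma-separated files with a header row, and it only tests the two conditions named in the error messages above (missing files, and a perturbed TF absent from the response file columns). `preflight_check` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def preflight_check(response_file, predictors_file, perturbed_tf):
    """Return a list of problems found before launching the CLI."""
    problems = []
    for label, path in (("Response", response_file), ("Predictors", predictors_file)):
        if not Path(path).is_file():
            problems.append(f"{label} file not found: {path}")

    if Path(response_file).is_file():
        # Assumption: the first row of the CSV holds the column names.
        with open(response_file, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        if perturbed_tf not in header:
            problems.append(
                f"Perturbed TF '{perturbed_tf}' not found in response file columns"
            )
    return problems
```

An empty list means none of these specific problems were detected; it does not guarantee that the modeling step will succeed.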
Performance Considerations¶
Memory Usage¶
- Memory usage scales with: number of features × number of samples × bootstrap samples
- For large datasets, consider reducing --n_bootstraps or --top_n
CPU Usage¶
- Set --n_cpus to match your system capabilities
- Each LassoCV call uses the specified number of CPUs
- Default of 4 CPUs works well for most systems
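One way to pick a value for --n_cpus is to cap the requested count at what the machine reports. This is a small sketch under the assumption that `os.cpu_count()` reflects the cores you are allowed to use; on shared clusters or in containers the usable number may be lower. `choose_n_cpus` is a hypothetical helper, and its default of 4 simply mirrors the default stated above.

```python
import os


def choose_n_cpus(requested: int = 4) -> int:
    """Cap the requested CPU count at the number the system reports."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))
```

The result can be formatted into the command line as `--n_cpus` followed by `str(choose_n_cpus(16))`.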
Runtime Estimation¶
Approximate runtime factors:
- Bootstrap samples: Linear scaling
- Feature count: Quadratic scaling with regularization
- Sample count: Linear scaling
- CPU cores: Near-linear speedup
For typical datasets (1000 features, 100 samples):
- 1000 bootstraps: ~10-30 minutes
- 2000 bootstraps: ~20-60 minutes
- 5000 bootstraps: ~1-3 hours
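The scaling factors above can be combined into a back-of-the-envelope estimate from one measured run. The sketch below treats the stated relationships as exact (linear in bootstraps and samples, quadratic in features, perfectly linear CPU speedup), which is an idealization, so the result is a rough guide only. `estimate_runtime_minutes` and the dictionary keys are hypothetical names chosen for this example.

```python
def estimate_runtime_minutes(baseline_minutes, baseline, target):
    """Scale a measured runtime to a new configuration.

    baseline and target are dicts with the keys
    n_bootstraps, n_features, n_samples and n_cpus.
    """
    factor = (
        (target["n_bootstraps"] / baseline["n_bootstraps"])    # linear
        * (target["n_features"] / baseline["n_features"]) ** 2  # quadratic
        * (target["n_samples"] / baseline["n_samples"])         # linear
        * (baseline["n_cpus"] / target["n_cpus"])               # idealized speedup
    )
    return baseline_minutes * factor


# A 10-minute run with 1000 bootstraps, scaled to 5000 bootstraps.
baseline = {"n_bootstraps": 1000, "n_features": 1000, "n_samples": 100, "n_cpus": 4}
target = dict(baseline, n_bootstraps=5000)
print(estimate_runtime_minutes(10, baseline, target))  # 50.0
```

Timing a short run with a reduced bootstrap count on your own data gives a more trustworthy baseline than the typical figures listed above.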
Next Steps¶
- Linear Perturbation Binding Modeling: Detailed documentation for the main command
- Tutorials: Step-by-step examples
- API Reference: Programmatic usage documentation