Contributing to tfbpmodeling¶
We welcome contributions to tfbpmodeling! This guide will help you get started with contributing code, documentation, or bug reports.
Getting Started¶
Development Setup¶
- Fork and Clone
  # Fork the repository on GitHub
  git clone https://github.com/YOUR_USERNAME/tfbpmodeling.git
  cd tfbpmodeling
  # Add upstream remote
  git remote add upstream https://github.com/BrentLab/tfbpmodeling.git
- Install Dependencies
  # Install Poetry if you haven't already
  pip install poetry
  # Configure Poetry (recommended)
  poetry config virtualenvs.in-project true
  # Install all dependencies including dev tools
  poetry install
- Set Up Pre-commit Hooks
  poetry run pre-commit install
Development Workflow¶
- Create Feature Branch
  # Start from dev branch
  git checkout dev
  git pull upstream dev
  # Create feature branch
  git checkout -b feature/your-feature-name
- Make Changes
  - Write code following our style guidelines
  - Add tests for new functionality
  - Update documentation as needed
- Test Your Changes
  # Run tests
  poetry run pytest
  # Run with coverage
  poetry run pytest --cov --cov-branch --cov-report=xml
  # Check code style
  poetry run black .
  poetry run flake8
  poetry run mypy tfbpmodeling
- Commit and Push
  git add .
  git commit -m "Add feature: description of your changes"
  git push origin feature/your-feature-name
- Create Pull Request
  - Open a pull request against the dev branch
  - Provide clear description of changes
  - Reference any related issues
Project Structure¶
tfbpmodeling/
├── tfbpmodeling/ # Main package
│ ├── __main__.py # CLI entry point
│ ├── interface.py # Main workflow functions
│ ├── modeling_input_data.py
│ ├── bootstrapped_input_data.py
│ ├── bootstrap_model_results.py
│ ├── evaluate_interactor_significance_*.py
│ ├── stratified_cv*.py
│ ├── utils/ # Utility functions
│ └── tests/ # Test suite
├── docs/ # Documentation (MkDocs)
├── tmp/ # Exploratory development
├── pyproject.toml # Poetry configuration
├── mkdocs.yml # Documentation configuration
├── CLAUDE.md # Claude Code instructions
└── README.md
Core Modules¶
- __main__.py: CLI entry point with argparse setup
- interface.py: Main workflow orchestration and CLI functions
- modeling_input_data.py: Data loading and preprocessing
- bootstrapped_input_data.py: Bootstrap resampling functionality
- bootstrap_model_results.py: Results aggregation and statistics
- evaluate_interactor_significance_*.py: Statistical significance testing
- stratified_cv*.py: Cross-validation with stratification
- utils/: Helper functions for data manipulation
Exploratory Development¶
The tmp/ directory is set up for exploratory data analysis and interactive development:
- Jupyter notebooks: Can be run from tmp/ in the virtual environment
- IPython kernel: Installed in the development environment
- Version control: Files in tmp/ are excluded from git tracking
- Testing: tmp/ directory is ignored by pytest
- Experimentation: Safe space for prototyping and data exploration
See tmp/README.md for more information about using this directory.
Code Standards¶
Style Guidelines¶
We use automated tools to maintain consistent code style:
- Black: Code formatting (88 character line length)
- Flake8: Style checking and linting
- MyPy: Type checking
- isort: Import sorting
Code Quality¶
- Type Hints: All functions should have type hints
- Docstrings: Use Sphinx-style docstrings for all public functions
- Tests: Write tests for all new functionality
- No Secrets: Never commit API keys, passwords, or other secrets
Example Code Style¶
from typing import List, Tuple

import pandas as pd


def process_data(
    data: pd.DataFrame,
    threshold: float = 0.05,
    normalize: bool = True,
) -> Tuple[pd.DataFrame, List[str]]:
    """
    Process input data with filtering and normalization.

    :param data: Input dataframe with features as columns
    :param threshold: Minimum variance a feature must have to be kept
    :param normalize: Whether to scale features to zero mean and unit variance
    :return: Processed dataframe and list of excluded features
    """
    excluded_features: List[str] = []

    # Filter low-variance features
    for col in data.columns:
        if data[col].var() < threshold:
            excluded_features.append(col)

    processed_data = data.drop(columns=excluded_features)

    if normalize:
        # Standardize each remaining feature (z-score)
        processed_data = (processed_data - processed_data.mean()) / processed_data.std()

    return processed_data, excluded_features
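A function in this style is straightforward to cover with a small pytest test. The sketch below is illustrative only: it uses a compact stand-in for process_data (filtering step only, no normalization) so that it runs on its own, and the test name and sample values are assumptions rather than part of the tfbpmodeling test suite.

```python
from typing import List, Tuple

import pandas as pd


def process_data(
    data: pd.DataFrame, threshold: float = 0.05
) -> Tuple[pd.DataFrame, List[str]]:
    """Stand-in for the style example: drop columns with variance below threshold."""
    excluded = [col for col in data.columns if data[col].var() < threshold]
    return data.drop(columns=excluded), excluded


def test_low_variance_columns_are_excluded() -> None:
    """Columns below the variance threshold are dropped and reported."""
    data = pd.DataFrame(
        {
            "constant": [1.0, 1.0, 1.0],  # sample variance 0.0, below threshold
            "varying": [0.0, 1.0, 2.0],  # sample variance 1.0, kept
        }
    )
    processed, excluded = process_data(data, threshold=0.05)
    assert excluded == ["constant"]
    assert list(processed.columns) == ["varying"]
```

Keeping the input tiny makes the expected variances easy to check by hand, which helps reviewers confirm the test asserts the intended behavior.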
Testing¶
Test Structure¶
Tests are located in tfbpmodeling/tests/:
tests/
├── test_interface.py
├── test_modeling_input_data.py
├── test_bootstrapped_input_data.py
├── test_utils.py
└── fixtures/
├── sample_expression.csv
└── sample_binding.csv
Writing Tests¶
Use pytest for all tests:
import pandas as pd
import pytest

from tfbpmodeling.modeling_input_data import ModelingInputData


class TestModelingInputData:
    """Test suite for ModelingInputData class."""

    def test_initialization(self, sample_data_files):
        """Test basic initialization."""
        data = ModelingInputData(
            response_file=sample_data_files['response'],
            predictors_file=sample_data_files['predictors'],
            perturbed_tf='YPD1'
        )
        assert data is not None
        assert len(data.get_feature_names()) > 0

    def test_invalid_perturbed_tf(self, sample_data_files):
        """Test error handling for invalid perturbed TF."""
        with pytest.raises(KeyError, match="not found in response"):
            ModelingInputData(
                response_file=sample_data_files['response'],
                predictors_file=sample_data_files['predictors'],
                perturbed_tf='INVALID_TF'
            )

    @pytest.mark.parametrize("normalize", [True, False])
    def test_normalization_options(self, sample_data_files, normalize):
        """Test normalization parameter."""
        data = ModelingInputData(
            response_file=sample_data_files['response'],
            predictors_file=sample_data_files['predictors'],
            perturbed_tf='YPD1',
            normalize_weights=normalize
        )
        # Test that normalization was applied correctly
        assert data.normalize_weights == normalize


@pytest.fixture
def sample_data_files(tmp_path):
    """Create sample data files for testing."""
    # Create sample response data
    response_data = pd.DataFrame({
        'sample1': [0.1, 0.2, 0.3],
        'sample2': [0.4, 0.5, 0.6],
        'YPD1': [0.7, 0.8, 0.9]
    }, index=['gene1', 'gene2', 'gene3'])

    # Create sample predictor data
    predictor_data = pd.DataFrame({
        'TF1': [0.1, 0.2, 0.3],
        'TF2': [0.4, 0.5, 0.6]
    }, index=['gene1', 'gene2', 'gene3'])

    # Save to temporary files
    response_file = tmp_path / "response.csv"
    predictor_file = tmp_path / "predictors.csv"
    response_data.to_csv(response_file, index_label='gene_id')
    predictor_data.to_csv(predictor_file, index_label='gene_id')

    return {
        'response': str(response_file),
        'predictors': str(predictor_file)
    }
Running Tests¶
# Run all tests
poetry run pytest
# Run specific test file
poetry run pytest tfbpmodeling/tests/test_interface.py
# Run with coverage
poetry run pytest --cov=tfbpmodeling --cov-report=html
# Run tests matching pattern
poetry run pytest -k "test_modeling"
# Run tests with verbose output
poetry run pytest -v
Documentation¶
Documentation Structure¶
Documentation is built with MkDocs and uses the Material theme:
docs/
├── index.md
├── getting-started/
│ ├── installation.md
│ └── quickstart.md
├── cli/
│ ├── overview.md
│ └── linear-perturbation-binding-modeling.md
├── tutorials/
│ ├── basic-workflow.md
│ └── advanced-features.md
├── api/
│ ├── interface.md
│ └── modeling_input_data.md
└── development/
├── contributing.md
└── testing.md
Writing Documentation¶
- Use clear, concise language
- Include code examples for all features
- Provide both basic and advanced usage examples
- Link between related sections
Building Documentation¶
# Serve documentation locally with live reload
poetry run mkdocs serve
# Build documentation
poetry run mkdocs build
# Deploy to GitHub Pages (maintainers only)
poetry run mkdocs gh-deploy
Issue Reporting¶
Bug Reports¶
When reporting bugs, please include:
- Environment Information
  - Python version
  - tfbpmodeling version
  - Operating system
- Reproduction Steps
  - Minimal code example
  - Input data characteristics
  - Expected vs actual behavior
- Error Messages
  - Complete error traceback
  - Log output if available
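The environment details requested above can be gathered with a few standard-library calls. The following is a minimal sketch, not a tool shipped with tfbpmodeling: the function name collect_environment_info is hypothetical, and it assumes the package is installed under the distribution name tfbpmodeling.

```python
import platform
from importlib import metadata
from typing import Dict


def collect_environment_info(package: str = "tfbpmodeling") -> Dict[str, str]:
    """Gather Python version, package version, and OS for a bug report."""
    try:
        # Look up the installed distribution's version without importing it
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "python": platform.python_version(),
        "package": package,
        "package_version": package_version,
        "os": f"{platform.system()} {platform.release()}",
    }


if __name__ == "__main__":
    for key, value in collect_environment_info().items():
        print(f"{key}: {value}")
```

Running this inside the project environment (for example with poetry run python) and pasting the output into the issue covers the Environment Information items in one step.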
Feature Requests¶
For feature requests, provide:
- Use Case: Describe the problem you're trying to solve
- Proposed Solution: How you envision the feature working
- Alternatives: Other approaches you've considered
- Examples: Code examples of how it would be used
Development Guidelines¶
Branch Management¶
- main: Stable release branch
- dev: Development branch for integration
- feature/*: Feature development branches
- hotfix/*: Critical bug fixes
Commit Messages¶
Use clear, descriptive commit messages:
Add bootstrap confidence interval calculation
- Implement percentile method for CI estimation
- Add support for custom confidence levels
- Include tests for edge cases
- Update documentation with examples
Code Review Process¶
- All changes require pull request review
- CI tests must pass
- Code coverage should not decrease
- Documentation must be updated for new features
- At least one maintainer approval required
Release Process¶
- Features merged to dev branch
- Testing and integration on dev
- Release candidate created
- Final testing and documentation review
- Merge to main and tag release
- Update changelog and documentation
Getting Help¶
Communication Channels¶
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Questions and general discussion
- Pull Request Reviews: Code-specific feedback
Maintainer Contact¶
- Chase Mateusiak: Lead developer
- Michael Brent: Principal investigator
Troubleshooting¶
Development Environment Issues¶
Poetry Installation Problems¶
If installing Poetry with pip fails, try installing it with pipx instead:
pipx install poetry
Virtual Environment Issues¶
If you encounter virtual environment problems:
# Remove existing environment
poetry env remove python
# Reinstall dependencies
poetry install
Pre-commit Hook Failures¶
If pre-commit hooks fail during commits:
# Run pre-commit manually to see specific issues
poetry run pre-commit run --all-files
# Fix any reported issues and commit again
Documentation Build Issues¶
If mkdocs build fails:
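# Check for missing dependencies
poetry install
# Try building with verbose output
poetry run mkdocs build --verbose
# Check configuration: mkdocs.yml is validated on every build; strict mode aborts on warnings
poetry run mkdocs build --strict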
Test Failures¶
If tests fail unexpectedly:
# Run tests with verbose output
poetry run pytest -v
# Run specific failing test
poetry run pytest path/to/failing_test.py::test_name -v
# Check test dependencies
poetry run pytest --collect-only
Common Development Issues¶
Import Errors¶
# Ensure package is installed in development mode
poetry install
# Check Python path
python -c "import tfbpmodeling; print(tfbpmodeling.__file__)"
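When the import itself raises, it can help to check whether Python can locate the package at all, without executing it. A minimal sketch using the standard library follows; the helper name locate_package is hypothetical and it is intended for top-level package names only.

```python
import importlib.util
from typing import Optional


def locate_package(name: str) -> Optional[str]:
    """Return the path a top-level package would be imported from, or None."""
    # find_spec searches sys.path but does not run the package's code
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print(locate_package("tfbpmodeling"))
```

A result of None suggests the package is not installed in the active environment, while a path outside your clone suggests a different copy is shadowing the development install.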
Module Not Found¶
# Verify virtual environment is activated
which python
poetry env info
# Reinstall only the project itself in development mode (skips dependencies)
poetry install --only-root
Permission Errors¶
# Check file permissions
ls -la
# Fix permissions if needed
chmod +x scripts/your_script.sh
Getting Help with Development¶
If you encounter issues not covered here:
- Search existing issues: Check if someone else has faced the same problem
- Create a detailed issue: Include error messages, environment info, and steps to reproduce
- Join discussions: Use GitHub Discussions for questions and help
- Contact maintainers: Reach out directly for urgent issues
Recognition¶
Contributors are recognized in:
- CHANGELOG.md: Feature and bug fix credits
- AUTHORS.md: Comprehensive contributor list
- Release Notes: Major contribution highlights
- Documentation: Author attributions for significant additions
Thank you for contributing to tfbpmodeling!