ByteFlow AI Labs

Methods, tools & published work

Tool selection follows community benchmarks and peer-reviewed comparisons — not vendor relationships. Each tool listed here is in active use in project delivery. Where multiple tools address the same problem, the choice is documented in project methods sections.

Benchmark-guided selection

Tools are evaluated against independent benchmarks before adoption: GIAB for variant calling, DREAM challenge data for somatic variant calling, and PBMC datasets for single-cell analysis.
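As an illustration of what a benchmark comparison typically reports, the sketch below computes precision, recall and F1 from counts of true positives, false positives and false negatives against a truth set. The class name, function names and example counts are hypothetical and are not taken from any specific project or benchmark result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    """Counts from comparing a tool's calls against a truth set (hypothetical container)."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(counts: BenchmarkCounts) -> float:
    """Fraction of reported calls that are present in the truth set."""
    reported = counts.true_positives + counts.false_positives
    return counts.true_positives / reported if reported else 0.0


def recall(counts: BenchmarkCounts) -> float:
    """Fraction of truth-set entries that the tool recovered."""
    expected = counts.true_positives + counts.false_negatives
    return counts.true_positives / expected if expected else 0.0


def f1_score(counts: BenchmarkCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(counts), recall(counts)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Illustrative counts only; not real benchmark results.
    example = BenchmarkCounts(true_positives=90, false_positives=10, false_negatives=30)
    print(f"precision={precision(example):.3f}")
    print(f"recall={recall(example):.3f}")
    print(f"F1={f1_score(example):.3f}")
```

Comparing candidate tools on the same truth set with the same metric definitions keeps the comparison independent of how each tool reports its own accuracy.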

Explicit version pinning

Every analysis records exact software versions in an environment.yml file or as container image digests. Results are labelled with the pipeline version that produced them.
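One way such a record could be assembled for the Python side of an analysis is sketched below, using only the standard library. The function names, the record layout, and the example pipeline version and digest are assumptions for illustration, not the format used in project deliverables.

```python
import json
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def collect_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_provenance(
    pipeline_version: str,
    packages: Iterable[str],
    image_digest: Optional[str] = None,
) -> dict:
    """Assemble a provenance record (hypothetical layout) to store next to results."""
    return {
        "pipeline_version": pipeline_version,
        "python_version": platform.python_version(),
        "container_image_digest": image_digest,
        "packages": collect_versions(packages),
    }


if __name__ == "__main__":
    # Placeholder values; substitute the real pipeline version and image digest.
    record = build_provenance(
        pipeline_version="0.0.0-example",
        packages=["numpy", "pandas"],
        image_digest=None,
    )
    print(json.dumps(record, indent=2))
```

Writing the record as JSON alongside each output file means a result can be traced back to the environment that produced it without re-reading the pipeline logs.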

Methods written to publication standard

Delivered methods sections follow the level of detail required by journals such as Nature Methods and Bioinformatics — sufficient for independent replication.

Methods Catalog

Tools by analysis domain

Quality Control & Preprocessing

5 tools
FastQC: Per-base sequence quality, adapter content, k-mer overrepresentation
MultiQC: Aggregate QC reports across samples; Nextflow/nf-core integration
fastp: Adapter trimming, quality filtering, and per-read QC (preferred over Trimmomatic for speed)
Trimmomatic: Sliding-window trimming; used for legacy or specific adapter configurations
NanoPlot: Long-read QC for Oxford Nanopore data (read length, quality distributions)

Read Alignment & Mapping

6 tools
STAR: Splice-aware alignment for RNA-seq; supports two-pass mapping and fusion detection
HISAT2: Memory-efficient RNA-seq aligner; graph-based alignment using known splice sites
BWA-MEM2: Short-read alignment for DNA-seq (WGS, WES, ChIP-seq); 2× faster than BWA-MEM
minimap2: Long-read alignment (PacBio CLR/HiFi, ONT); splice-aware mode for long-read RNA-seq
Bowtie2: Short-read alignment for ChIP-seq and small RNA workflows
samtools: SAM/BAM manipulation, sorting, indexing, flagstat, and depth calculation

Variant Analysis

7 tools
GATK4 / HaplotypeCaller: Germline SNP and indel calling; cohort joint genotyping with GenomicsDBImport
Mutect2: Somatic variant calling for tumour–normal pairs and tumour-only modes
DeepVariant: Deep learning–based germline caller; consistently top-ranked in benchmarks (GIAB)
Strelka2: Somatic small variant and indel calling; computational efficiency for large cohorts
bcftools: VCF/BCF manipulation, filtering, merging, and stats
VEP: Variant Effect Predictor (Ensembl); consequence annotation and CADD/SIFT/PolyPhen integration
SnpEff / ANNOVAR: Functional annotation for non-Ensembl reference genomes (crop/livestock species)

RNA-seq Quantification & Differential Expression

7 tools
STAR + RSEM: Transcript-level quantification with alignment-based estimates
Salmon: Alignment-free transcript quantification; ultrafast, GC-bias correction
featureCounts: Gene-level count matrices from BAM files (Subread package)
DESeq2: Negative binomial model for differential expression; recommended for most bulk RNA-seq designs
edgeR: Empirical Bayes dispersion estimation; preferred for very small sample sizes (n < 3 per group)
limma/voom: Linear modelling for complex experimental designs; ANOVA-type contrasts
fgsea: Fast gene set enrichment analysis (GSEA) and MSigDB pathway analysis

Single-cell & Spatial Transcriptomics

6 tools
Cell Ranger / STARsolo: Barcode demultiplexing and UMI counting for 10x Genomics Chromium and similar platforms
scanpy: Python-based single-cell analysis (clustering, trajectory, velocity); AnnData format
Seurat (R): R-based scRNA-seq analysis; reference-based label transfer with Azimuth
scVI / scANVI: Deep generative models for batch correction and semi-supervised cell annotation
Monocle3 / PAGA: Pseudotime and trajectory inference for developmental analyses
Squidpy / MERFISH tools: Spatial transcriptomics analysis; neighbourhood enrichment and spatial statistics

Structural Biology & Molecular Modelling

5 tools
AlphaFold2 / ColabFold: Protein structure prediction from sequence; multimer complex prediction
ESMFold: Language model–based structure prediction; faster for large-scale screening
AutoDock Vina / Gnina: Molecular docking for drug candidate screening; CNN scoring with Gnina
GROMACS: Molecular dynamics simulation; free energy perturbation for binding affinity
PyMOL / ChimeraX: Structure visualisation and figure preparation for publications

AI & Machine Learning

6 tools
PyTorch / Lightning: Primary deep learning framework; multi-GPU training and experiment logging
scikit-learn: Classical ML (random forests, SVM, logistic regression) for biomarker discovery
XGBoost / LightGBM: Gradient-boosted trees for tabular genomic and clinical data
HuggingFace Transformers: Pre-trained genomic language models (DNABERT, Nucleotide Transformer, ESM-2)
SHAP: Shapley value–based feature importance for model interpretability
MLflow: Experiment tracking, hyperparameter logging, and model versioning

Metagenomics & Microbiome

5 tools
Kraken2 / Bracken: Taxonomic classification and abundance estimation from short reads
MetaPhlAn4: Strain-level profiling using clade-specific markers
HUMAnN3: Functional pathway abundance from shotgun metagenomics
MEGAHIT / MetaSPAdes: De novo metagenomic assembly
QIIME2: Amplicon (16S/ITS) diversity analysis; DADA2 denoising integration

This list reflects tools in active use. It is not exhaustive — project-specific tools are documented in the methods section of each deliverable.

Publications

Published work

Peer-reviewed articles, preprints, and datasets produced in or with project engagements will be listed here as they are submitted or published. Records include DOI, PMID, and author attribution.

Research & Collaboration

Working on a research question?

We work with academic groups, research institutes, and industry R&D teams. Co-authorship and research collaboration terms are available for suitable projects.

Get in touch