Introducing ClawBio and what 86% European GWAS participation means for the AI tools we build

Feb 25, 2026

Tomorrow night I will stand in front of 40 bioinformaticians at the London Bioinformatics Meetup and I will ask a simple question: where does your genomic data go when you use AI?

Nobody can answer this with certainty.

This is the problem. Not that AI isn’t powerful enough for bioinformatics. It absolutely is. The problem is that the tools we have today were not built for us.

The three failures of general-purpose AI in genomics

Privacy. Genomic data is the most sensitive data a person can produce. It identifies you, your relatives, and your disease risk, permanently. Yet most AI tools route data through cloud APIs. For clinical or biobank data, this is often a non-starter. For Indigenous and underrepresented populations, it is an ethical minefield.
Reproducibility. When a collaborator asks “how did you generate this figure?”, clicking through a ChatGPT conversation is not an answer. Science demands exact reproduction. That means the commands, the dependencies, and the data checksums. General-purpose AI does not think in terms of reproducibility bundles.
Domain expertise. A general model does not know that VCF files need ancestry-aware annotation. It does not know that single-cell RNA-seq data requires doublet removal before clustering. It does not know that a GWAS pipeline must account for population structure. Every bioinformatics analysis carries domain assumptions that generic tools miss.

ClawBio is my answer to all three problems.

What ClawBio is

ClawBio is a skill library built on OpenClaw (180k+ GitHub stars) that runs entirely on your laptop. No genomic data leaves your machine. Every analysis ships with commands.sh, environment.yml, SHA-256 checksums, and a full audit log.
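
To make the checksum part of that bundle concrete, here is a minimal sketch of how a SHA-256 manifest could be produced with the Python standard library. It is not ClawBio's own code: the function names and the manifest filename are assumptions for illustration. The manifest uses the `<digest>  <filename>` layout that `sha256sum -c` reads.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files: list[Path], manifest: Path) -> None:
    """Write one '<digest>  <filename>' line per input file (hypothetical helper)."""
    lines = [f"{sha256_of_file(f)}  {f.name}" for f in sorted(files)]
    manifest.write_text("\n".join(lines) + "\n")
```

Because only the bare filename is recorded, the manifest is meant to sit in the same directory as the files it describes. Reading in chunks keeps memory use flat even for multi-gigabyte VCFs.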

At launch, three skills are ready for production use.

The Equity Scorer computes a Health Equity Index from any VCF file. It produces a single number between 0 and 100 that quantifies how well your dataset represents global population diversity. In the live demo, a 50-sample dataset scored 76 out of 100. The message was “Good, but European samples are overrepresented at 44%.” One number. Instant accountability.
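
The formula behind the Health Equity Index is not spelled out here, so the following is only an illustration of how a 0 to 100 representation score could work. It assumes each sample already carries an ancestry group label and uses normalised Shannon entropy as a stand-in metric. The function name, the labels, and the choice of entropy are all assumptions, not the Equity Scorer's actual method.

```python
import math
from collections import Counter


def diversity_score(ancestry_labels: list[str], n_groups: int) -> float:
    """Hypothetical 0-100 score: Shannon entropy of group shares over its maximum."""
    if n_groups < 2:
        raise ValueError("n_groups must be at least 2")
    counts = Counter(ancestry_labels)
    if not counts:
        return 0.0
    if len(counts) > n_groups:
        raise ValueError("more distinct labels than reference groups")
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Clamp to absorb floating-point noise at the two extremes.
    return min(100.0, max(0.0, 100.0 * entropy / math.log(n_groups)))
```

Under this stand-in metric, a dataset split evenly across all reference groups scores close to 100, a dataset drawn from a single group scores 0, and a skewed dataset lands in between.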

The PharmGx Reporter takes consumer genetic data such as 23andMe or AncestryDNA and profiles 12 pharmacogenes across 51 medications using CPIC guidelines. It runs in under one second with zero cloud dependencies.
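
For readers who have not opened a consumer genotype file before, the sketch below shows the kind of first step such a reporter needs: pulling genotypes for a set of rsIDs out of a 23andMe-style raw file (tab-separated rsid, chromosome, position, genotype, with `#` comment lines). It is an assumption-laden illustration, not the PharmGx Reporter's code. The function name is hypothetical, and other layouts, AncestryDNA's included, would need their own handling.

```python
import csv
from typing import Iterable


def read_genotypes(handle: Iterable[str], wanted: set[str]) -> dict[str, str]:
    """Return {rsid: genotype} for the requested rsIDs (hypothetical helper)."""
    # Skip the '#' comment header that precedes the data rows.
    data_lines = (line for line in handle if not line.startswith("#"))
    found: dict[str, str] = {}
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) < 4:
            continue  # blank or malformed line
        rsid, _chrom, _pos, genotype = row[:4]
        if rsid in wanted:
            found[rsid] = genotype
    return found
```

Everything happens on the local file handle, which is what allows this style of analysis to run with no network access at all.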

The Bio Orchestrator routes natural-language requests such as “analyse this VCF for population diversity” to the right skill automatically.
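
How the orchestrator decides is not described here, and in practice the agent itself may do the matching. As a deliberately simple stand-in, the sketch below routes by keyword overlap. The skill identifiers and keyword lists are invented for illustration.

```python
from typing import Optional

# Hypothetical skill identifiers and trigger keywords, for illustration only.
ROUTES = {
    "equity-scorer": ("diversity", "equity", "ancestry", "representation"),
    "pharmgx-reporter": ("pharmacogen", "medication", "drug", "23andme", "ancestrydna"),
}


def route(request: str) -> Optional[str]:
    """Return the skill whose keywords best match the request, or None."""
    text = request.lower()
    scores = {
        skill: sum(keyword in text for keyword in keywords)
        for skill, keywords in ROUTES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With these keyword lists, `route("analyse this VCF for population diversity")` selects `equity-scorer`, and a request that matches nothing returns `None` so the caller can ask for clarification instead of guessing.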

But the real announcement is not the software. It is the model for how that software gets built.

Skills as infrastructure, not features

Every skill in ClawBio is a SKILL.md file. The file uses YAML frontmatter to describe capabilities and dependencies. It then includes markdown instructions that any Claude-compatible agent can follow. Optional Python or R scripts handle the computation. That is all.
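
As a rough picture of that layout, the sketch below embeds a made-up SKILL.md and splits it into frontmatter fields and markdown body. The field names and values in the example are assumptions rather than ClawBio's actual schema, and the parser only handles flat `key: value` lines. A real loader would hand the frontmatter to a proper YAML parser.

```python
# Illustrative file content; field names and values are assumptions.
EXAMPLE_SKILL = """\
---
name: equity-scorer
description: Compute a Health Equity Index from a VCF file
dependencies: python>=3.10
---
# Instructions

1. Run the scoring script on the user's VCF.
2. Report the score and the most overrepresented group.
"""


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("missing closing '---' frontmatter delimiter") from None
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])
```

Calling `split_skill(EXAMPLE_SKILL)` yields the three frontmatter fields as a dictionary and the instructions as plain markdown. The metadata is for tooling and the body is for the agent, which is the whole contract a contributor needs to satisfy.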

This means any bioinformatician can contribute a skill by writing a structured markdown file and some scripts. There is no framework to learn. There is no SDK to install. There is no vendor lock-in.

We need skills for GWAS pipelines, metagenomics classifiers, clinical variant reporters, pathway enrichment, spatial transcriptomics, and proteomics analysis. The community that builds these skills is the community that shapes how AI gets used in our field.

Why this matters beyond bioinformatics

Around 86% of GWAS participants are of European ancestry. Polygenic risk scores derived from these studies lose accuracy when applied to people of other ancestries. This is not just a data gap. It is a structural bias baked into the tools, the pipelines, and the assumptions.

If we build AI tools that do not account for population diversity by default, we amplify the problem. ClawBio builds equity into the infrastructure instead of adding it as an afterthought.

The repository is MIT licensed and live now: github.com/manuelcorpas/ClawBio

The talk will be broadcast live on my YouTube channel.