Base quality scores are essential to short read variant calling

27 May 2020

In an earlier post a few days ago, I said “discarding base quality dramatically reduces variant calling accuracy”. I didn’t provide evidence. This certainly doesn’t sound persuasive. In this post, I will show an experiement to support my claim.

I downloaded high-coverage short reads for sample HG002 from GIAB ftp, converted to unsorted FASTQ with samtools collate, mapped them to hs37d5 (for compatibility with GIAB) with bwa-mem, called variants with GATK v4 and compared the calls to the GIAB truth v4.1. I then estimated the false negative rate (FNR=1-sensitivity) and false discovery rate (FDR=1-precision) with RTG’s vcfeval. I optionally applied the hard filters proposed in my earlier paper. For “no quality”, I set all base quality to Q25 which corresponds to the average empirical error rate of this dataset.

# qual bins	Filtered	SNP FNR	SNP FDR	INDEL FNR	INDEL FDR
Lossless	No	0.58%	0.46%	0.63%	0.25%
8 bins	No	0.58%	0.45%	0.66%	0.26%
2 bins	No	0.59%	0.95%	0.55%	0.34%
No quality	No	0.60%	6.38%	0.64%	0.44%
Lossless	Yes	2.54%	0.07%	2.27%	0.06%
8 bins	Yes	2.52%	0.07%	2.30%	0.06%
2 bins	Yes	2.53%	0.11%	2.24%	0.07%
No quality	Yes	2.71%	0.20%	2.51%	0.08%
HiFi; no qual	No	0.80%	0.10%	1.46%	1.29%

Several comments:

If we completely drop base quality, the SNP FDR becomes 10 times higher. Most of additional false calls are due to low ALT allele fraction. Hard filtering can improve this metric but the resulting SNP FDR is still twice as high. Base quality scores are essential to accurate variant calling. For somatic mutation calling, short reads without base quality are virtually useless.
Using 2 quality bins (i.e. good/bad) gives a dramtic improvement over no-quality, though the result is not as good as 8-binning.
The accuracy of variants called with 8 quality bins is indistinguishable from the accuracy with the original quality. The file size of the sorted 8-binning alignment in CRAM is less than a quarter of the size of the orignal input in gzip’d FASTQ.
I guess using 4 quality bins may achieve the best balance between storage and accuracy. The GATK team reached this conclusion years ago. I forgot what was the exact binning scheme in use, so I am not including an experiment here.
The last line in the table evaluates dipcall variants called from a HiFi trio binning assembly. Hifiasm is the only assembler to date that can achieve this accuracy.