07 July 2014

“Abreak” is a subcommand of htsbox, which is a little toy forked from on the lite branch of htslib. It takes an assembly-to-reference alignment as input and counts the number of alignment break points. An earlier version was used in my fermi paper to measure the missassembly rate of human de novo assemblies. A typical output looks like:

Number of unmapped contigs: 239
Total length of unmapped contigs: 54588
Number of alignments dropped due to excessive overlaps: 0
Mapped contig bases: 2933399461
Mapped N50: 6241
Number of break points: 102146
Number of Q10 break points longer than (0,100,200,500)bp: (28719,7206,4644,3222)
Number of break points after patching gaps short than 500bp: 94298
Number of Q10 break points longer than (0,100,200,500)bp after gap patching: (23326,5320,3369,2194)

Here it gives the mapped contig bases, mapped N50 and number of break points with flanking sequences longer than 0, 100, 200 and 500bp.

Although GAGE-B and QUAST are more powerful, the use of MUMmer limits them to small genomes only. In contrast, “abreak” works with any aligners supporting chimeric alignment. When BWA-SW or BWA-MEM is used to map contigs, “abreak” can easily and efficiently work with mammal-sized assemblies.

UPDATE on 11/24/2014: changed links to htslib/htsbox. BTW, the results vary with mapping parameters. If contigs and references are very close to each other, it is recommended to use:

bwa mem -B9 -O16 -E1

for mapping. The default works better when the divergence is high.



blog comments powered by Disqus