Benchmark 3 v1: oriented contig and region order results analysis

Only the instances that did not fully succeed are analysed.


Sufficient fail causes

  • \(1_{mult} = 2\), while it should be \(3\).

    • Indeed, \(1_{cov} = 223\), while the starter, contig \(19\), has \(19_{cov} = 121\).
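
The two coverage values can be related to the observed multiplicity with a small calculation. This is a minimal sketch that assumes (the analysis does not state it) that a contig's multiplicity is estimated as its coverage divided by the starter's coverage, rounded to the nearest integer; `estimate_multiplicity` is a hypothetical helper, not a function of the benchmarked tool.

```python
def estimate_multiplicity(contig_cov: float, starter_cov: float) -> int:
    """Round the coverage ratio to the nearest integer, with a floor of 1.

    Assumption: multiplicity is derived from the coverage ratio to the starter.
    """
    return max(1, round(contig_cov / starter_cov))


ratio = 223 / 121  # 1_cov / 19_cov, values taken from the analysis
print(f"{ratio:.2f}")                   # 1.84
print(estimate_multiplicity(223, 121))  # 2
```

Under that assumption the ratio rounds to \(2\), which matches the obtained \(1_{mult} = 2\) rather than the expected \(3\).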

Other observations

  • Contigs \(0\), \(1\) and \(2\) have no protein-DNA alignment length.


Sufficient fail causes

  • The starter has \(0_{mult} = 1\), while contig \(0\) should belong to an inverted repeat.


Sufficient fail causes

  • Inversion in a single-copy region: (\(6_f\)) \(10_f 12_r 1_f\) (\(11_f\)) becomes (\(6_f\)) \(1_r 12_f 10_r\) (\(11_f\)).

    • Due to links \((6_f, 10_f)\), \((6_f, 1_r)\), \((1_f, 11_f)\) and \((10_r, 11_f)\)
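
The inversion and the four links involved can be reproduced mechanically. The sketch below assumes oriented contigs are written as `"<id>_f"` or `"<id>_r"` strings; `flip`, `invert` and `junction_links` are hypothetical helper names introduced for illustration.

```python
def flip(oriented: str) -> str:
    """Switch the orientation suffix of an oriented contig, e.g. '10_f' -> '10_r'."""
    name, orient = oriented.rsplit("_", 1)
    return f"{name}_{'r' if orient == 'f' else 'f'}"


def invert(path: list[str]) -> list[str]:
    """Read a path on the opposite strand: reverse the order and flip each contig."""
    return [flip(contig) for contig in reversed(path)]


def junction_links(left: str, path: list[str], right: str) -> list[tuple[str, str]]:
    """Links needed to attach a path between its two flanking contigs."""
    return [(left, path[0]), (path[-1], right)]


expected = ["10_f", "12_r", "1_f"]
obtained = invert(expected)

print(obtained)                                   # ['1_r', '12_f', '10_r']
print(junction_links("6_f", expected, "11_f"))    # [('6_f', '10_f'), ('1_f', '11_f')]
print(junction_links("6_f", obtained, "11_f"))    # [('6_f', '1_r'), ('10_r', '11_f')]
```

Both orientations of the inner path are supported at the junctions because all four links are present, so the links alone cannot discriminate between them.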


The links \((11_r, 1_r)\) and \((10_r, 6_r)\) hide a small inverted repeat (74 to 75 bp), which QUAST reveals:

  • \(ir\)

    • reference: 5 581 – 5 654 (74) bp

    • contig: 77 384 – 77 457 (74) bp

  • \(\overline{ir}\)

    • reference: 67 134 – 67 208 (75) bp

    • contig: 77 457 – 77 383 (75) bp
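
The lengths in parentheses follow from the coordinates if these are read as 1-based inclusive positions (an assumption about how the coordinates are reported). In this quick check, `span_length` is a hypothetical helper, and a span listed from high to low is taken to lie on the reverse strand.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive span, whichever strand it is listed on."""
    return abs(end - start) + 1


spans = {
    "ir, reference": (5581, 5654),
    "ir, contig": (77384, 77457),
    "reversed ir, reference": (67134, 67208),
    "reversed ir, contig": (77457, 77383),  # listed high to low: reverse strand
}

for label, (start, end) in spans.items():
    print(f"{label}: {span_length(start, end)} bp")
```

The two contig spans cover almost the same positions (77 383/77 384 to 77 457), read once forward and once in reverse.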


Sufficient fail causes

  • The IRs are not equal: one IR contains an SC of length \(1\).

    • Although \(7_{mult} = 1\) (as expected), contig \(7\) is not taken because taking it would decrease the value of the objective function.


Sufficient fail causes

  • Missing link \((6_f, 4_f)\)

    • The link set contains \((6_f, 4_r)\) (obtained in the result) but not \((6_f, 4_f)\) (expected).

    • Surprisingly, it contains both \((9_f, 4_f)\) and \((9_f, 4_r)\).
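
A membership test of this kind can be sketched as follows. It assumes that a link \((a, b)\) and its mirror \((\overline{b}, \overline{a})\) denote the same adjacency read on opposite strands, so both forms are looked up; the `links` set holds only the links named above, not the full link set of the instance, and `flip`, `mirror` and `has_link` are hypothetical helpers.

```python
def flip(oriented: str) -> str:
    """Switch the orientation suffix of an oriented contig, e.g. '4_f' -> '4_r'."""
    name, orient = oriented.rsplit("_", 1)
    return f"{name}_{'r' if orient == 'f' else 'f'}"


def mirror(link: tuple[str, str]) -> tuple[str, str]:
    """The same adjacency read on the opposite strand."""
    a, b = link
    return (flip(b), flip(a))


def has_link(link_set: set[tuple[str, str]], link: tuple[str, str]) -> bool:
    """True if the link, or its mirror, is in the set."""
    return link in link_set or mirror(link) in link_set


# Only the links mentioned in the analysis; the real link set is larger.
links = {("6_f", "4_r"), ("9_f", "4_f"), ("9_f", "4_r")}

print(has_link(links, ("6_f", "4_f")))  # False: the expected link is missing
print(has_link(links, ("6_f", "4_r")))  # True: the link used in the result
print(has_link(links, ("4_f", "6_r")))  # True: mirror of (6_f, 4_r)
```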


Sufficient fail causes

  • The starter \(4_{mult} = 1\), while it should be \(2\) as contig \(4\) belongs to an inverted repeat.


Sufficient fail causes

  • Contig \(12_f\) is missing from the single-copy region, where \(15_r 12_f 13_r\) was expected: it could have been part of the path, but because \(12_{prob} = 0\), including it would not increase the value of the objective function.
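
Why a zero-probability contig can be dropped is easiest to see on a toy objective. The sketch below assumes, purely for illustration, that each contig occurrence adds its \(prob\) value to the objective; the actual objective function is not given here and may contain other terms. Only \(12_{prob} = 0\) comes from the analysis; the other values and the `path_score` helper are hypothetical.

```python
def path_score(path: list[str], prob: dict[str, float]) -> float:
    """Toy objective: sum of the prob value of every contig occurrence in the path."""
    return sum(prob[contig.rsplit("_", 1)[0]] for contig in path)


# 12_prob = 0 is from the analysis; the values for 15 and 13 are hypothetical.
prob = {"15": 1.0, "12": 0.0, "13": 1.0}

with_12 = ["15_r", "12_f", "13_r"]
without_12 = ["15_r", "13_r"]

print(path_score(with_12, prob) == path_score(without_12, prob))  # True
```

With such a tie, nothing in the objective pushes the optimiser to include the contig, so either path may be returned.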


Sufficient fail causes

  • Inversion of (\(10_f\)) \(2_f 1_r 3_r\) (\(5_f\)) due to links \((10_f, 3_f)\), \((10_f, 2_f)\), \((2_r, 5_f)\) and \((3_r, 5_f)\).


The links \((10_f, 2_f)\) and \((3_r, 5_f)\) hide a small inverted repeat (171 bp):

  • Aligning the reference at positions 40 kb to 44 kb against itself at positions 73 kb to 76 kb, BLAST finds a perfect-identity match between 42547-42717 and 74795-74625 (plus/minus).

  • Note that \((10_f, 2_f)\) and \((10_f, 3_f)\) are highly similar (they differ by a few SNPs).
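
A plus/minus hit with perfect identity means that the first span equals the reverse complement of the second. The sketch below checks this on a toy sequence (the reference itself is not reproduced here) and verifies the 171 bp length from the reported coordinates, which are assumed to be 1-based and inclusive; `reverse_complement` and `is_plus_minus_match` are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_plus_minus_match(seq: str, plus_start: int, plus_end: int,
                        minus_start: int, minus_end: int) -> bool:
    """1-based inclusive spans; the minus span is given high to low."""
    plus = seq[plus_start - 1:plus_end]
    minus = seq[minus_end - 1:minus_start]
    return plus == reverse_complement(minus)


# Toy sequence: GATTCG at 3-8, and its reverse complement CGAATC at 14-19.
toy = "AA" + "GATTCG" + "TTTTT" + "CGAATC" + "AA"
print(is_plus_minus_match(toy, 3, 8, 19, 14))  # True

# Lengths of the two reported spans.
print(42717 - 42547 + 1, 74795 - 74625 + 1)    # 171 171
```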


Sufficient fail causes

  • The IR obtained is \(2_r\) instead of \(2_r 10_f 9_f 2_r\): \(2_{mult} = 3\) while it should be \(4\).


Sufficient fail causes

  • Link \((6_f, 11_r)\) has been taken instead of \((6_f, 7_r)\), so link \((2_f, 7_r)\) has been taken instead of \((2_f, 10_f)\).


Sufficient fail causes

  • Everything is correct except that contig \(7\) is not chosen in the single-copy region, as \(7_{prob} = 0\).