Question

Remove scaffolds and other unplaced sequences before mapping?

0


8.9 years ago

yongxpeng • 0

Hi,
I downloaded reference genomes from Ensembl in FASTA format, but they contain many sequences labelled "dna:scaffold": https://github.com/CTLife/TEMP/tree/master/RefGenomes

For example, in Mouse_GRCm38 (mm10), should everything except chromosomes 1-19, Mt, X and Y be removed before mapping? https://github.com/CTLife/TEMP/blob/master/RefGenomes/Mouse_GRCm38.p4.txt

Similarly, Human_GRCh38.p5 (hg38), https://github.com/CTLife/TEMP/blob/master/RefGenomes/Human_GRCh38.p5.txt, contains 516 sequences. Apart from chromosomes 1-22, Mt, X and Y, should the others (such as CHR_HG2241_PATCH and KI270728.1) be removed before mapping?

RNA-Seq ChIP-Seq genome sequencing next-gen • 4.7k views


score 1 · Answer 1 · 2016-03-29

1


8.9 years ago

abascalfederico ★ 1.2k

The latest release of the human genome (I don't know about mouse) contains alternative contigs. To map against them you need an aligner that is aware of alternative contigs, such as BWA: https://github.com/lh3/bwa/blob/master/README-alt.md

If you are not using an aligner of this kind, it is better to remove the alternative contigs. Otherwise a read may map equally well to several alternative contigs and be (incorrectly) treated as a non-uniquely mapped read.
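If you do decide to drop the extra sequences, a minimal Python sketch of the filtering step could look like the one below. It assumes Ensembl-style headers where the sequence name is the first token after ">", and an illustrative keep list for human (1-22, X, Y, MT); `filter_fasta` and `PRIMARY` are hypothetical names, and the toy headers in the demo are made up for illustration rather than copied from a real file.

```python
import io

# Assumed keep list for an Ensembl human FASTA (1-22, X, Y, MT).
# For mouse use range(1, 20); for UCSC-style files add the "chr" prefix.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}

def filter_fasta(lines, keep=PRIMARY):
    """Yield only the lines of FASTA records whose name is in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after ">".
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keeping = name in keep
        if keeping:
            yield line

# Toy input: one chromosome, one scaffold, one patch and the mitochondrion.
demo = io.StringIO(
    ">1 dna:chromosome toy_header\nACGTACGT\n"
    ">KI270728.1 dna:scaffold toy_header\nGGCC\n"
    ">CHR_HG2241_PATCH dna:chromosome toy_header\nTTAA\n"
    ">MT dna:chromosome toy_header\nCCGG\n"
)
kept = list(filter_fasta(demo))
kept_names = [l[1:].split()[0] for l in kept if l.startswith(">")]
print(kept_names)
```

On a real genome you would pass an open file handle instead of `demo` and write the yielded lines to a new FASTA file before building the aligner index.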

HTH


0


OK, thank you. I am using BWA, Bowtie2 and Subread to map ChIP-seq reads. For RNA-seq reads, must the alternative contigs be removed?
What do you think about https://sequencing.qcfail.com/articles/genomic-sequence-not-in-the-genome-assembly-creates-mapping-artefacts/ ? It is a nice explanation of why we might not want to remove those extra sequences until after mapping.

Reply by yongxpeng • 8.9 years ago

0


If I understood correctly, that link is about repetitive sequences, not alternative contigs.

For RNA-seq... it depends. For example, if you want to analyse HLA genes, which are highly diverse, you would need the alternative contigs. I guess most people just ignore alternative contigs because of the increase in complexity.

Reply by abascalfederico ★ 1.2k • 8.9 years ago