Question

Alignment To Selected Region Of The Reference Genome

12.8 years ago

Houkto ▴ 220

Hi all,

I usually align NGS reads from my rat strains to the Brown Norway (rat) reference genome. However, the rat reference genome is still a draft and contains many gaps. Recently, the group that will release the next assembly gave us the sequence for one of the gaps we are interested in (3 FASTA files). My question is: how can I integrate these FASTA files (we know the coordinates) into the reference genome .fa file?

Would aligning our reads against these FASTA files alone give us the SNPs and indels in the new regions, or is integrating the files into the genome (I would like to know how) the only way to get the variants in the new regions?

Many thanks

reference alignment fasta sequence • 3.2k views

ADD COMMENT • link updated 12.8 years ago by Swbarnes2 ★ 1.6k • written 12.8 years ago by Houkto ▴ 220

Are you mapping RNA-Seq reads, with the new contigs containing missing exons, or do you have genomic NGS reads only?

ADD REPLY • link 12.8 years ago by Darked89 4.7k

Hi darked89, I am mapping DNA-seq reads

ADD REPLY • link 12.8 years ago by Houkto ▴ 220

score 1 · Answer 1 · 2012-02-17

12.8 years ago

Swbarnes2 ★ 1.6k

In general, you will get the most accurate results if the fasta you align your reads to contains all of the genome that the sample has. If you have a read that aligns with one difference to your new sequence, but aligns with no differences somewhere else, you want your reference fasta to have both, so that the read goes to the right place. You don't want the aligner forcing it to align to the wrong place because you didn't put the right place in your reference.
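To make that concrete, here is a toy sketch of the effect. It is not how BWA or any real aligner works: it only tries ungapped placements and counts mismatches, and the sequences and the names `chrA` and `gap1` are made up for illustration.

```python
def mismatches(read, ref, pos):
    """Count mismatching bases when `read` is placed at `pos` in `ref`."""
    return sum(1 for a, b in zip(read, ref[pos:pos + len(read)]) if a != b)


def best_hit(read, reference):
    """Return (name, position, mismatches) of the best ungapped placement."""
    best = None
    for name, seq in reference.items():
        for pos in range(len(seq) - len(read) + 1):
            hit = (name, pos, mismatches(read, seq, pos))
            if best is None or hit[2] < best[2]:
                best = hit
    return best


old_locus = "ACGTACGTTAGC"  # made-up sequence already in the draft assembly
gap_locus = "ACGTACCTTAGC"  # made-up gap sequence, one base different
read = "TACGTTAG"           # read that truly comes from the old locus

gap_only = {"gap1": gap_locus}
combined = {"gap1": gap_locus, "chrA": old_locus}

print(best_hit(read, gap_only))  # ('gap1', 3, 1): forced onto the gap, looks like a SNP
print(best_hit(read, combined))  # ('chrA', 3, 0): goes to its true origin
```

With only the gap sequence as reference, the read has nowhere else to go and its single mismatch would show up as a variant that is not really there.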

So make a combined multi-fasta with all the sequences you believe the rat genome has, and realign to that.

If you are asking how to concatenate two files together, use the unix command 'cat'.
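If you would rather script it, the following is a minimal Python sketch that does the same job as `cat`, with two extra safety checks that are my own additions: it refuses duplicate sequence names, and it makes sure a file without a trailing newline does not glue its last line onto the next header. The function name `combine_fasta` and the output name `combined.fa` are hypothetical.

```python
def combine_fasta(in_paths, out_path):
    """Concatenate FASTA files into one multi-FASTA, like `cat a.fa b.fa > out.fa`.

    Raises ValueError if two records share a sequence name, since each
    reference sequence needs a unique name. Returns the sorted list of names.
    """
    seen = set()
    with open(out_path, "w") as out:
        for path in in_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first word after '>'.
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                    # Guarantee a trailing newline so records never run together.
                    out.write(line if line.endswith("\n") else line + "\n")
    return sorted(seen)
```

Usage would look like `combine_fasta(["RatreferenceGenome.fa", "gap1.fa", "gap2.fa"], "combined.fa")`, followed by re-indexing the combined file (for BWA, `bwa index combined.fa`) before realigning.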

ADD COMMENT • link 12.8 years ago by Swbarnes2 ★ 1.6k

Hi swbarnes2, my new gaps are small and cover specific genes we asked for, and they are in BLAST "-m 8" format. We are not sure whether they contain only new sequence, or also some old sequence (identical to the reference genome), or sequence that differs from the old sequence. I am just trying to find the simplest way to get SNPs and indels from these new gaps. What I have in mind now is to align the reads to one of the gaps alone (indexed with BWA), without the reference genome, and then try to call variants. But from what you are saying, I need to integrate it into the reference genome (cont.)

ADD REPLY • link 12.8 years ago by Houkto ▴ 220

The way I can achieve that is either to open the reference genome in vim, scroll to the chromosomes in which the gaps are located, and paste the gap sequences in; or to run cat gap1.fa gap2.fa RatreferenceGenome.fa. If that is the way to do it, what will happen if the new sequences contain some sequence that already exists in the reference genome? Will that affect the SNPs and indels? Or is there another way of integrating the gap sequences into the reference genome, so that they are integrated in their right place and we can trim the duplicated reads? Which approach should I use, and why? Thanks.

ADD REPLY • link 12.8 years ago by Houkto ▴ 220
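For the "paste it into the right place" option, a script is less error-prone than editing a chromosome by hand in vim. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the chromosome is already loaded as one string, and each patch is given as 0-based, half-open coordinates on the original chromosome (BLAST tabular output is 1-based and inclusive, so a hit at 101..200 would become start=100, end=200).

```python
def fill_gaps(chrom_seq, patches):
    """Replace intervals of `chrom_seq` with new sequence.

    `patches` is a list of (start, end, new_seq) in 0-based, half-open
    coordinates on the ORIGINAL chromosome. Patches are applied from right
    to left so that upstream coordinates stay valid as the length changes.
    """
    ordered = sorted(patches, key=lambda p: p[0], reverse=True)
    prev_start = len(chrom_seq)
    for start, end, new_seq in ordered:
        if not (0 <= start <= end <= prev_start):
            raise ValueError(f"overlapping or out-of-range patch: {start}-{end}")
        chrom_seq = chrom_seq[:start] + new_seq + chrom_seq[end:]
        prev_start = start
    return chrom_seq


def to_fasta(name, seq, width=60):
    """Format one record as FASTA with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Toy chromosome with two runs of N standing in for assembly gaps.
chrom = "AAAANNNNCCCCNNGG"
patched = fill_gaps(chrom, [(4, 8, "TTTTTT"), (12, 14, "G")])
print(patched)  # AAAATTTTTTCCCCGGG
```

Because the replaced interval is removed, any old sequence that overlaps the patch is not duplicated in the reference. The trade-off is that every position downstream of a patch shifts by the length difference, so annotations and earlier variant calls on that chromosome no longer share coordinates with the patched reference.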