Question

How does a read, which is complementary to the reference genome, map to the reference?

1

Entering edit mode

9.2 years ago

yanweng ▴ 80

I have used STAR to map my reads to hg19. I was wondering how a read, which is complementary to the reference genome, gets aligned to the reference? I assume the software will automatically generate the complementary strand of the reference, and map the reads to the correct position, but how I could know whether the read has the same sequence as the reference itself or its complementary strand?

Another question is about "samtools mpileup". I am doing a Illumina paired-end RNA-seq, and the sequencing results contain read 1 and read 2. If I see a dot, does it mean there is a read (either from read 1 or read 2) map to the reference; or a read 1 map to reference (maybe the reference genome itself or its complementary)? Same question for the common. Does it mean a read (either read1 or read 2) map to the reference; or one read from read 1 map to the the reference sequence or its complementary? I will appreciate any help!

RNA-Seq genome next-gen-sequencing alignment • 5.8k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by yanweng ▴ 80

Ram · Answer 1 · 2016-02-11

5

Entering edit mode

9.2 years ago

Devon Ryan 105k

Aligners just reverse complement the reads. You can know whether a given read is reverse complemented by looking at the SAM flag. If bit 16 is set, then it's been reverse complemented.

In mpileup, a dot means a match to the + strand, a comma is the same for the reverse complement. For variant calling, of course, it doesn't really matter which strand is covered (though it's good to know if you have a bias).

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks Devon! Now I understand the mapper actually revers complement the reads instead of generating the complementary strand of reference for mapping. Just want to clarify, when you mention "+ strand", you mean the reference sequence, or the sequencing reads which has not been reverse complemented?

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by yanweng ▴ 80

0

Entering edit mode

I need to know whether the gene is on plus reference strand, or on the minus strand. Because I am studying mRNA post-transcriptional modification, and I will call A to G change if the gene is on plus strand; T to C change if the gene is on minus strand. How should I separate these two conditions?

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by yanweng ▴ 80

1

Entering edit mode

Ah, in this case you're going to want to use pysam, since I assume you have strand-specific RNAseq data and you'll need to process reads according to the strand from which each fragment originated, rather than how the read itself aligned. If you're unfamiliar with that then I would strongly encourage you to find a bioinformatician (ideally locally) to collaborate with. That'll make your life a lot easier :)

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by Devon Ryan 105k