Question

Next Generation Sequencing

10.1 years ago

Elnaaz ▴ 40

Hi ,

I have NGS data from Illumina, sequenced to find SNPs, and I am viewing the BAM files in the Tablet software, but I cannot understand the options and the meaning of items like the ones below. Could anybody guide me?

What do M00358:6:000000000-A6HCE:1:1114:8253:87, or Cigar ...M, mean? Do the blue and green bands show orientation?
What exactly is the difference between read length and fragment length?
What is the coverage depth?
What is insert length?
What are properly and improperly paired reads?

snp genome next-gen-sequencing • 3.8k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Elnaaz ▴ 40

read the spec : http://samtools.github.io/hts-specs/SAMv1.pdf

ADD REPLY • link 10.1 years ago by Pierre Lindenbaum 164k

Dear Pierre,

Thank you so much. I found the website linked below your email, but I still have trouble understanding exactly what depth of coverage, insert size, fragment length, properly and improperly paired, and so on mean in the BAM files in Tablet. I have to find SNPs in my sequenced data.

I would be so thankful if you could help me.

Best regards, Eli

ADD REPLY • link 10.1 years ago by Elnaaz ▴ 40

Search and you will find: What Is The Sequencing 'Depth'?

ADD REPLY • link 10.1 years ago by Pierre Lindenbaum 164k

In addition to reading the SAM spec., also google around for terms like "coverage depth" (aka "depth of coverage"), "insert length" and so on.

ADD REPLY • link 10.1 years ago by Devon Ryan 104k

But I do not understand: what do properly paired read (1/2) or (2/2) and improperly paired (1/2) or (2/2) mean?

ADD REPLY • link 10.1 years ago by Elnaaz ▴ 40

Google more (though the 1/2 and 2/2 have no meaning without context).

ADD REPLY • link 10.1 years ago by Devon Ryan 104k

Yes, in Tablet, where I visualize my Illumina-sequenced data, I see information like this for a read.

For example:

M00358:6:000000000-A6HCE:1:1114:9091:13603

From: 2.261 to 2.400

Length: 140 U140 (140 mismatches)

Cigar: 140M

Improperly paired (1/2 ) ,insert size 163

Read direction is FORWARD

>M00358:6:000000000-A6HCE:1:1114:9091:13603
GACAATATCCCCCCGTTATGACCAATACAAAGATGCTTGGGATACTGGCG
TTGCGGTTGAGGTACATCTTCCTATATTGATACGGTACAATATTGTTCTC
TTACATTTCCTGGTTCAAGAATGTGATCCGCTACTTTATC

What does it mean? Sometimes it says properly paired (2/2) instead. Could you please make a graphic example to simplify this, like your other nice explanations?

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Elnaaz ▴ 40

Ah, in that context, (1/2) just means read #1 from the pair (you have paired-end reads). There are any number of reasons it could be improperly paired. Perhaps the observed insert size is much shorter than expected. Perhaps the other read has the wrong orientation relative to read #1. It's tough to say without knowing what read #2 is.
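For illustration, the paired/properly-paired status and the read number are stored as bits in the FLAG field of each BAM record, and a viewer such as Tablet presumably derives its "(1/2)" and "properly paired" labels from them. Here is a minimal sketch that decodes those bits. The bit values come from the SAM spec; the function name `describe_flag` and the label wording are just made up for this example.

```python
# Bit values are those defined for the FLAG field in the SAM specification.
# The label strings are informal wording chosen for this example.
FLAG_BITS = {
    0x1: "paired",
    0x2: "properly paired",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair (1/2)",
    0x80: "second in pair (2/2)",
}


def describe_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


# 99 = 1 + 2 + 32 + 64: read 1 of a properly paired pair, mate on the reverse strand
print(describe_flag(99))
# 65 = 1 + 64: read 1 of a pair, but the "properly paired" bit is not set
print(describe_flag(65))
```

Whether the proper-pair bit gets set is decided by the aligner, so the exact criteria (insert size range, orientation) depend on the aligner and its settings.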

ADD REPLY • link 10.1 years ago by Devon Ryan 104k

So, it depends on the reading conditions.

But I was thinking it is related to mate-pair versus paired-end reads, which I also cannot tell apart. And what about the CIGAR, in the example above?

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Elnaaz ▴ 40

The only way to tell the difference between mate-pair and paired-end is to look at read orientation and insert size. If two reads point toward each other and have a shortish insert size, then they aligned as paired-end. If they point away from each other and have a very large insert size, then they aligned as mate-pairs. Obviously, telling an aligner that you have one type of data and feeding it the other will result in a lot of "improperly paired" reads. Keep in mind that it at least used to be the case that mate-pair libraries still contained a lot of paired-end reads (no clue if that's still the case).
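As a rough sketch of that rule of thumb: given the leftmost mapped positions of the forward-strand and reverse-strand reads of a pair plus the insert size, one can guess how the pair aligned. The function name `guess_library` and the 1000 bp cutoff for "shortish" are assumptions made up for this example, not values from any aligner.

```python
def guess_library(fwd_start, rev_start, insert_size, max_pe_insert=1000):
    """Guess how a pair aligned from orientation and insert size.

    fwd_start / rev_start: leftmost mapped positions of the forward-strand
    and reverse-strand reads. max_pe_insert is a hypothetical cutoff for
    what counts as a "shortish" insert.
    """
    # The forward read lying to the left of the reverse read means
    # the two reads point toward each other.
    points_inward = fwd_start <= rev_start
    if points_inward and insert_size <= max_pe_insert:
        return "paired-end"
    if not points_inward and insert_size > max_pe_insert:
        return "mate-pair"
    return "ambiguous"


print(guess_library(100, 400, 450))     # inward, short insert
print(guess_library(5000, 100, 5050))   # outward, large insert
print(guess_library(100, 400, 5000))    # inward but very large insert
```

Real data is messier (overlapping reads, chimeric pairs, both reads on the same strand), so treat this only as a way to picture the two cases.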

For the CIGAR string, just read the SAM spec. It's mentioned in there. You really should be able to find most of this yourself via Google.
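To make the CIGAR idea concrete, here is a small sketch that splits a CIGAR string into (length, operation) pairs and counts how many reference bases the alignment spans. The operation letters and which of them consume the reference come from the SAM spec; the function names are made up for this example.

```python
import re

# One CIGAR element is a count followed by an operation letter
# (the set of letters is the one listed in the SAM specification).
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '140M' into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered: M, D, N, = and X consume the reference."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


print(parse_cigar("140M"))          # [(140, 'M')]
print(reference_span("140M"))       # 140
print(reference_span("100M2I38M"))  # 138, the 2 inserted bases are not on the reference
```

So "140M" for the read above means 140 aligned bases (match or mismatch) with no insertions, deletions, or clipping, which fits the read running from position 2,261 to 2,400.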

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Devon Ryan 104k

Yes, it's true that I can find them, but I just wanted to confirm with your explanation, since your descriptions are correct.

Thanks again,

ADD REPLY • link 10.1 years ago by Elnaaz ▴ 40

Ram · Answer 1 · 2014-10-17

Okay.

When you sequence, you make libraries by cutting your genomic DNA into small pieces (say 500 bp).

Then you attach adapters (say 25 bp each) to both ends of each piece.

This is now your fragment, and its length would be (25 + 500 + 25) = 550 bp.

The insert is the sequence between your adapters, so the insert length/size would be 500 bp. (Usage varies: many tools use "fragment length" and "insert size" interchangeably for the sequence between the adapters.)

Now for paired-end sequencing, you take your fragment and sequence some 'n' bases (say 100 bp) inward from each end of the insert (this is your read length, 100 bp). Why only 100 bp, why not the whole fragment, you ask? It's mainly because base-call quality drops as the read gets longer, so it gets harder to sequence. Newer sequencing technologies are overcoming this and can achieve read lengths of several kb (look at Oxford Nanopore's MinION). Coming back: you cover 100 bp from the left and right ends of your insert. The uncovered middle part of ~300 bp (500 - 200) is the inner mate distance.
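The arithmetic can be written out as a tiny sketch. It assumes the reads start at the first base of the insert (the adapters are not part of the reads), so the unsequenced middle is the insert length minus both read lengths. The function name `library_lengths` is made up for this example.

```python
def library_lengths(insert_len, adapter_len, read_len):
    """Return (fragment length with adapters, inner mate distance).

    Assumes one adapter of adapter_len on each end and that both reads
    begin at the insert boundaries, not inside the adapters.
    """
    fragment_len = insert_len + 2 * adapter_len
    inner_distance = insert_len - 2 * read_len
    return fragment_len, inner_distance


fragment_len, inner_distance = library_lengths(insert_len=500, adapter_len=25, read_len=100)
print(fragment_len)    # 550
print(inner_distance)  # 300
```

A negative inner distance would simply mean the two reads overlap in the middle of the insert.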

Coverage and depth are often used interchangeably, but strictly they differ. Coverage is the percentage of your target sequence that is covered by sequencing. Say your genome is 1000 bp long; you sequence it, align it, open it in Tablet/IGV, and see an area of 100 bp with no reads. That part was not covered by the sequencer for some reason, so your coverage would be 90% (genome covered [900] / genome size [1000]).

Depth is how many reads cover a given position.
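A toy sketch of both ideas, using made-up read intervals on a 1000 bp genome in which 100 bp receive no reads. Intervals are 0-based and half-open, and the function names are invented for this example; real tools work from the BAM file instead.

```python
def per_base_depth(genome_size, reads):
    """Count how many reads cover each position.

    reads is a list of (start, end) intervals, 0-based and half-open.
    """
    depth = [0] * genome_size
    for start, end in reads:
        for pos in range(max(start, 0), min(end, genome_size)):
            depth[pos] += 1
    return depth


def breadth_of_coverage(depth):
    """Fraction of positions covered by at least one read."""
    covered = sum(1 for d in depth if d > 0)
    return covered / len(depth)


# Toy data: positions 400..499 get no reads at all
reads = [(0, 400), (200, 400), (500, 1000)]
depth = per_base_depth(1000, reads)

print(breadth_of_coverage(depth))  # 0.9, i.e. 90% of the genome is covered
print(depth[300])                  # 2 reads cover position 300
print(depth[450])                  # 0 reads cover position 450
```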

Frequently you see articles state the coverage in terms of 'X'. If they say 50X, that means that on average each base in the target sequence is covered by 50 reads. One can calculate this X coverage with a simple calculation:

(Number of mapped reads * read length) / genome size
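As a quick worked example of that formula (the read count and genome size below are invented numbers, picked only to make the arithmetic easy to follow):

```python
def mean_coverage(mapped_reads, read_length, genome_size):
    """Average X coverage: total mapped bases divided by genome size."""
    return mapped_reads * read_length / genome_size


# 2.5 million mapped reads of 100 bp on a 5 Mb genome
print(mean_coverage(2_500_000, 100, 5_000_000))  # 50.0, i.e. 50X
```

This is only an average: individual positions can still have much higher depth or none at all.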

Sincere apologies if anything is wrong here.

Source: Numerous Biostar and Seqanswers posts, this blog.