Question

Ns in FASTQ data

6.9 years ago

analytical • 0

Hi

I recently received WGS data for 3 samples of the same species for SNP analysis. I see that my FASTQ files contain Ns. How should I handle them? If I do not trim them and go straight to mapping against my reference genome, will the aligner ignore the Ns during mapping?

Total number of bases   45191688940
Number of base N    884912

Also, the total number of bases differs between the 3 samples. How is this possible when all of them were sequenced from the same genome?

Sample   R1 bases      R2 bases
s1       45191688940   45191688940
s2       43709052900   43709052900
s3       53171402300   53171402300

fasta • 2.5k views

Updated 23 months ago by Ram 44k • written 6.9 years ago by analytical • 0

score 1 · Answer 1 · 2018-01-04

6.9 years ago

Chris Miller 22k

I think you have some fundamental misunderstandings about how FASTQ files and sequencing work. A FASTQ file can hold any number of short reads from a genome, and that number has no fixed relationship to the size of your target genome: it depends on how much sequencing was done for each sample. And yes, aligners will generally handle Ns appropriately.

Written 6.9 years ago by Chris Miller 22k

So what does the total number of bases mean?

Reply written 6.9 years ago by analytical • 0

Each FASTQ record holds one read, and each read has a length (for example, 100 bp). Multiply that length by the number of FASTQ records (or sum the read lengths if they vary) and that's how many bases of sequence you have. Only after you align the data can you talk about sequence coverage across the genome.
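
To make the arithmetic concrete, here is a small Python sketch. It sums read lengths to get the total number of bases, then divides by a genome size to get a naive average depth. That ratio is only an optimistic pre-alignment estimate (it ignores unmapped reads, duplicates and trimming), which is why real coverage has to be measured after alignment. `GENOME_SIZE` is a hypothetical placeholder, not a value from this thread; the per-file base counts are the ones from the question.

```python
# Sketch: how "total number of bases" relates to reads and genome size.
# GENOME_SIZE is a hypothetical placeholder, not taken from the question.
from typing import Iterable


def total_bases(read_lengths: Iterable[int]) -> int:
    """Sum of read lengths; equals read_length * n_reads when all reads are the same length."""
    return sum(read_lengths)


def naive_coverage(total: int, genome_size: int) -> float:
    """Average depth if every sequenced base mapped once (optimistic, not a measured coverage)."""
    return total / genome_size


# Per-file base counts from the question; R1 and R2 totals are identical for each sample.
PER_FILE_BASES = {"s1": 45191688940, "s2": 43709052900, "s3": 53171402300}

GENOME_SIZE = 3_000_000_000  # hypothetical: substitute the length of your reference genome

if __name__ == "__main__":
    # 1,000,000 reads of 100 bp each -> 100000000 bases
    print(total_bases([100] * 1_000_000))
    for sample, per_file in PER_FILE_BASES.items():
        pair_total = 2 * per_file  # R1 + R2
        print(sample, pair_total, f"~{naive_coverage(pair_total, GENOME_SIZE):.1f}x")
```

The differing totals between samples simply reflect different amounts of sequence generated per sample; the genome size enters only when you divide by it.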

Seriously, though: there are many resources explaining the sequencing and alignment process. I recommend that you find and read some of them so that you understand this before proceeding.

Reply written 6.9 years ago by Chris Miller 22k