N in fastq data
6.9 years ago
analytical • 0

Hi

I recently received WGS data for 3 samples of the same genome for SNP analysis. I see that my FASTQ files contain Ns. How should I handle this? If I do not trim them and go straight to mapping against my reference genome, will the aligner ignore the Ns while mapping?

Total number of bases   45191688940
Number of base N    884912

Also, the total number of bases differs across the 3 samples. How is this possible when the same genome was sequenced in each sample?

s1_R1        s1_R2        s2_R1        s2_R2        s3_R1        s3_R2
45191688940  45191688940  43709052900  43709052900  53171402300  53171402300
fasta • 2.5k views
6.9 years ago

I think you have some fundamental misunderstandings about how fastq files and sequencing work. You can have any number of short reads from a genome stored in a fastq file. That has no relationship to how large your target genome is. And yes, aligners will generally handle Ns appropriately.
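To put the N count in perspective, here is a quick back-of-the-envelope calculation using only the two figures posted in the question. The variable names are mine, and this is just arithmetic, not a statement about how any particular aligner scores Ns.

```python
# Figures copied from the question (they appear to describe one FASTQ file).
total_bases = 45_191_688_940
n_bases = 884_912

# Fraction of all sequenced bases that were called as N.
n_fraction = n_bases / total_bases

print(f"N fraction: {n_fraction:.2e} ({n_fraction * 100:.4f}% of bases)")
```

That works out to roughly 0.002% of bases, or about 2 Ns per 100,000 bases.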


So what do you mean by total number of bases?


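Each FASTQ record holds one read of some length (for example, 100 bp). Multiply that length by the number of records in the file and that's how many bases of sequence you have. The total depends on how many reads were sequenced for each sample, which is why it differs between samples of the same genome. Only after you align the data can you talk about sequence coverage across the genome.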
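If it helps, the counting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an uncompressed FASTQ with exactly four lines per record (no wrapped sequences); `fastq_stats` is a hypothetical helper name, not part of any library, and the two reads are made-up toy data.

```python
import io


def fastq_stats(handle):
    """Return (records, total_bases, n_bases) for a 4-line-per-record FASTQ stream."""
    records = total_bases = n_bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            seq = line.strip()
            records += 1
            total_bases += len(seq)
            n_bases += seq.upper().count("N")
    return records, total_bases, n_bases


# Toy input: two 5 bp reads, made up for illustration.
example = io.StringIO(
    "@read1\nACGTN\n+\nIIIII\n"
    "@read2\nNNGTA\n+\nIIIII\n"
)
records, total_bases, n_bases = fastq_stats(example)
print(records, total_bases, n_bases)  # prints: 2 10 3
```

For a real file you would pass an open file handle instead of the `io.StringIO` object (and `gzip.open(path, "rt")` for a gzipped file).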

Seriously, though - There are many resources explaining the sequencing and alignment process. I recommend that you seek out and read some of them so that you understand this before proceeding.
