6.9 years ago
raya.girish
Hi, I have two datasets:
Metric       1_R1.fastq.gz   1_R2.fastq.gz   2_R1.fastq.gz   2_R2.fastq.gz
%QC >= 20    97.967          94.239          98.206          94.948
%QC >= 30    95.687          88.76           95.124          87.051
Is this quality good? The data is from an allotetraploid genome. I am looking for SNPs and indels.
Also, should I do a de novo or a reference-based analysis of the allotetraploid genome, given that a reference genome is available?
I don't know how you calculated these quality scores (which tool was used...), but I guess these are the fractions of reads whose average quality score is above a threshold (20 and 30 in your example).
In that case, those reads look pretty good. But you say that you want to do variant calling on them, and quite a lot more things matter for that than these scores, such as the number of reads and the distribution of the reads over the genome. Based on these numbers alone, there is nothing to worry about.
A tool for looking at the quality of reads is FastQC. It makes plots of various aspects of the data (but there is also no real reason to panic if something doesn't look great there).
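To make the guess above concrete, here is a minimal Python sketch of how a "%QC >= 20 / >= 30" figure could be computed, assuming it means the percentage of reads whose mean Phred quality reaches the threshold. That definition, the Phred+33 encoding, the four-lines-per-record FASTQ layout, and the function names `fraction_above` and `fraction_above_file` are all assumptions for illustration; the tool that produced the numbers in the question may define the metric differently (for example per base rather than per read).

```python
import gzip

def fraction_above(handle, thresholds=(20, 30), offset=33):
    """Percent of reads whose mean Phred quality is >= each threshold.

    Assumes an unwrapped FASTQ (exactly 4 lines per record) and
    Phred+33 quality encoding; both are assumptions, not known facts
    about the data in the question.
    """
    counts = {t: 0 for t in thresholds}
    total = 0
    for i, line in enumerate(handle):
        if i % 4 != 3:              # the quality string is the 4th line of a record
            continue
        qual = line.rstrip("\n")
        if not qual:                # skip zero-length reads
            continue
        mean_q = sum(ord(c) - offset for c in qual) / len(qual)
        total += 1
        for t in thresholds:
            if mean_q >= t:
                counts[t] += 1
    return {t: (100.0 * counts[t] / total if total else 0.0) for t in thresholds}

def fraction_above_file(path, thresholds=(20, 30)):
    """Same as fraction_above, reading a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return fraction_above(handle, thresholds)
```

Calling `fraction_above_file("1_R1.fastq.gz")` would return a dictionary keyed by threshold. If the result does not match the table in the question, the original tool most likely uses another definition.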
Aggregate QC numbers like these say little about how well the dataset will work for downstream analysis. For what it is worth, as others have already said, the quality is good. You should move on to the next step of alignment/SNP calling (assuming you have scanned/trimmed the data already).
You need to add more details. And please format your question better, using the code (101010) button in the formatting bar.
What's that?
Good for what purpose?
This is WGS data from an allotetraploid genome, obtained from Illumina sequencing.
Edit your question and add that information in there. Also add in what you would like to do with your data - where you're looking to go, that is. Without the ultimate goal in mind, there's no telling if you're headed in the right direction.
Ram, I have added some detail; please check the question.