Hello,
I have downloaded SRA data from NCBI and converted it into FASTQ files (paired-end sequences), then analysed the reads with FastQC. I got the following results, which I think look OK, but I am still confused. Can anybody shed some light on this? (Per base sequence quality: http://dropcanvas.com/#aG53s1AtEBgLv1)
##FastQC 0.11.2
>>Basic Statistics                 pass
#Measure                           Value
Filename                           PPlf.fastq
File type                          Conventional base calls
Encoding                           Sanger / Illumina 1.9
Total Sequences                    24000000
Sequences flagged as poor quality  0
Sequence length                    75
%GC                                44
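To make the numbers above less opaque, here is a minimal Python sketch that approximates FastQC's Basic Statistics and the per-base mean quality from a plain 4-lines-per-record FASTQ file. It assumes the "Sanger / Illumina 1.9" encoding, i.e. Phred+33 (quality = ASCII code minus 33). The function names are hypothetical, and the real FastQC plot also shows medians and quartiles and may group positions into bins, so this only illustrates what is being measured.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a plain 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def basic_stats(handle: TextIO, offset: int = 33) -> Dict[str, object]:
    """Rough equivalents of FastQC's Basic Statistics plus per-position mean quality.

    offset=33 corresponds to the 'Sanger / Illumina 1.9' (Phred+33) encoding.
    """
    total = 0
    lengths = set()
    gc = called = 0
    qual_sums: List[int] = []
    qual_counts: List[int] = []
    for _, seq, qual in read_fastq(handle):
        total += 1
        lengths.add(len(seq))
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        called += sum(upper.count(base) for base in "ACGT")  # N is not counted
        for pos, ch in enumerate(qual):
            if pos == len(qual_sums):  # first read reaching this position
                qual_sums.append(0)
                qual_counts.append(0)
            qual_sums[pos] += ord(ch) - offset  # ASCII character -> Phred score
            qual_counts[pos] += 1
    return {
        "total_sequences": total,
        "length_range": (min(lengths), max(lengths)) if lengths else (0, 0),
        "percent_gc": round(100 * gc / called) if called else 0,
        "per_base_mean_quality": [s / c for s, c in zip(qual_sums, qual_counts)],
    }


if __name__ == "__main__":
    # Two toy reads; 'I' = Q40, '5' = Q20, '#' = Q2 under Phred+33.
    demo = io.StringIO("@r1\nACGG\n+\nII#5\n@r2\nATGT\n+\nIIII\n")
    print(basic_stats(demo))
```

Pure Python will be slow on 24 million reads; FastQC remains the practical tool, and the sketch is only meant to show how a quality character such as 'I' maps to Q40.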
Can I use FASTQ reads converted from SRA format directly for assembly and scaffolding, or do I have to pre-process them first, e.g. remove low-quality reads, trim low-quality bases and remove adapters?
regards
rahul
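A minimal sketch of the quality-trimming step for paired-end reads, assuming Phred+33 qualities and reads held as (sequence, quality) string pairs. The 3' trimming rule is the running-sum approach used by BWA, which cutadapt's documentation describes for its -q (--quality-cutoff) option. The cutoff of 20 and the minimum length of 36 are illustrative assumptions, not recommendations, and the helper names are hypothetical; on real data you would run a dedicated trimmer such as cutadapt or Trimmomatic.

```python
from typing import Optional, Tuple

Read = Tuple[str, str]  # (sequence, Phred+33 quality string)


def quality_trim_index(quality: str, cutoff: int, offset: int = 33) -> int:
    """Return the position at which to cut the 3' end (BWA-style running sum).

    Walking from the 3' end, add (cutoff - score) for each base; the cut point
    is where this running sum is largest, and the walk stops once it goes negative.
    """
    running = 0
    best = 0
    cut = len(quality)
    for i in range(len(quality) - 1, -1, -1):
        running += cutoff - (ord(quality[i]) - offset)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def trim_read(read: Read, cutoff: int) -> Read:
    """Quality-trim the 3' end of one read."""
    seq, qual = read
    cut = quality_trim_index(qual, cutoff)
    return seq[:cut], qual[:cut]


def trim_pair(r1: Read, r2: Read, cutoff: int = 20,
              min_length: int = 36) -> Optional[Tuple[Read, Read]]:
    """Trim both mates and drop the whole pair if either mate becomes too short,
    so that the R1 and R2 files stay in the same order."""
    t1, t2 = trim_read(r1, cutoff), trim_read(r2, cutoff)
    if len(t1[0]) < min_length or len(t2[0]) < min_length:
        return None
    return t1, t2
```

Dropping the whole pair keeps the two files synchronized, which assemblers that read two paired files generally expect; Trimmomatic, for example, instead writes surviving single mates to separate unpaired output files.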
Thank you very much for showing interest in my post.
What about masking instead of trimming? And can I then use the trimmed sequences for assembly and scaffolding?
regards
rahul
Yes, you should use the trimmed reads for assembly. Adapters and low-quality bases left in your reads will significantly affect the assembly.
Why do you want to mask instead of trim?
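A minimal sketch of 3' adapter clipping with exact matching only, assuming a single known adapter. The sequence AGATCGGAAGAGC (a common Illumina TruSeq adapter prefix) is used purely as an example; confirm which adapter your library actually contains, for instance from FastQC's Adapter Content or Overrepresented sequences modules. The function name and the minimum overlap of 3 are hypothetical choices; real tools such as cutadapt also tolerate sequencing errors in the adapter, which this sketch does not.

```python
def clip_3prime_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    1. If the whole adapter occurs in the read, cut at its first occurrence.
    2. Otherwise, look for the longest adapter prefix (>= min_overlap bases)
       hanging off the 3' end of the read and cut it off.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    longest = min(len(adapter) - 1, len(seq))
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:-overlap]
    return seq


# Example adapter only (common Illumina TruSeq prefix); confirm against your own library.
ADAPTER = "AGATCGGAAGAGC"
```

A short minimum overlap removes more partial adapters but also clips genuine read ends that happen to match the first few adapter bases by chance, so the threshold is a trade-off.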
Two days ago I attended an online NCBI NGS workshop where I heard about masking being preferred over trimming. I have not gone through the paper myself, but the presenter cited it.
Paper link: http://www.ncbi.nlm.nih.gov/pubmed/25494997
regards
rahul
I have never done it and I don't think it would change much, but you can try masking if you prefer. Thanks for the link, by the way!
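Masking, as opposed to trimming, keeps every base position but replaces low-quality calls with N, so read length is preserved. A minimal sketch, assuming Phred+33 qualities and an illustrative cutoff of Q20 (the function name is hypothetical):

```python
def mask_low_quality(seq: str, quality: str, cutoff: int = 20, offset: int = 33) -> str:
    """Replace bases whose Phred score is below cutoff with 'N'; length is unchanged."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have the same length")
    return "".join(
        base if ord(q) - offset >= cutoff else "N"  # keep good bases, mask the rest
        for base, q in zip(seq, quality)
    )
```

How an assembler handles N bases varies from tool to tool, so check its documentation before feeding it masked rather than trimmed reads.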
That paper refers specifically to SNP calling.
Yes, you are right... that was my mistake.