6.7 years ago
mikysyc2016
Hi all, I downloaded an SRA file from pubmed and converted it to FASTQ, but when I open it, the file looks weird. Please see below:
@SRR653007.1 ILLUMINA-8879DC:227:2:1:1030:943:0:1 length=35
NTGACTGGATGCCTGGGTTGATGCTGTTGTTATTT
+SRR653007.1 ILLUMINA-8879DC:227:2:1:1030:943:0:1 length=35
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@SRR653007.2 ILLUMINA-8879DC:227:2:1:1063:946:0:1 length=35
NATTGTGTTTTTCTCATTTTCCGTAATTTTCAGTT
+SRR653007.2 ILLUMINA-8879DC:227:2:1:1063:946:0:1 length=35
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@SRR653007.3 ILLUMINA-8879DC:227:2:1:1102:948:0:1 length=35
NGGTTCAGATCATCTGCACAGGTCACGTCAGCCCT
I do not know whether the %%%%%% will affect the analysis. Can I get your suggestions? Thanks.
Nothing is wrong with your file. It's a FASTQ file, which encodes the nucleotide sequences and their base qualities. The "%" is a low quality value on the Phred+33 scale: its ASCII code is 37, so it corresponds to a Phred score of 4. Check https://en.wikipedia.org/wiki/FASTQ_format
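To make the encoding concrete, here is a minimal Python sketch that decodes a Phred+33 quality line. The function names `phred33_scores` and `error_probability` are made up for illustration, and the sketch assumes the file really uses the Phred+33 offset, which is what SRA-derived FASTQ files normally use.

```python
def phred33_scores(quality_line):
    """Convert a Phred+33 quality string into a list of integer Phred scores."""
    # Each character encodes one score: ASCII code minus the offset of 33.
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Quality line of the first read shown in the question.
scores = phred33_scores("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
print(scores[0])                               # 4
print(round(error_probability(scores[0]), 2))  # 0.4
```

So a "%" means the sequencer estimated roughly a 40% chance that the base call is wrong, which is why it counts as low quality.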
I also downloaded other SRA files, and they look normal. Thank you for your reply. Does that mean that the quality of some parts of the file is not good enough?
You can draw no useful conclusions by looking at a few reads out of possibly millions in a file. Go ahead and do some proper QC/analysis.
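As a rough illustration of looking at the whole file rather than the first few reads, the sketch below counts how many reads have a low mean quality. It is only a sketch under stated assumptions: four lines per record, Phred+33 encoding, and an arbitrary cutoff of 10. The names `read_fastq` and `summarize_quality` are hypothetical, and a dedicated QC tool such as FastQC will tell you far more.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    handle = iter(handle)
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:  # end of input or truncated last record
            break
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def summarize_quality(handle, low_cutoff=10):
    """Return (total_reads, reads_with_mean_quality_below_cutoff).

    low_cutoff=10 is an arbitrary illustrative threshold, not a standard.
    """
    total = 0
    low = 0
    for _header, _seq, qual in read_fastq(handle):
        if not qual:
            continue
        mean_q = sum(ord(ch) - 33 for ch in qual) / len(qual)
        total += 1
        if mean_q < low_cutoff:
            low += 1
    return total, low


# The two complete records from the question, used as a tiny in-memory example.
example = io.StringIO(
    "@SRR653007.1 ILLUMINA-8879DC:227:2:1:1030:943:0:1 length=35\n"
    "NTGACTGGATGCCTGGGTTGATGCTGTTGTTATTT\n"
    "+SRR653007.1 ILLUMINA-8879DC:227:2:1:1030:943:0:1 length=35\n"
    "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n"
    "@SRR653007.2 ILLUMINA-8879DC:227:2:1:1063:946:0:1 length=35\n"
    "NATTGTGTTTTTCTCATTTTCCGTAATTTTCAGTT\n"
    "+SRR653007.2 ILLUMINA-8879DC:227:2:1:1063:946:0:1 length=35\n"
    "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n"
)
print(summarize_quality(example))  # (2, 2)
```

For a real file you would pass an open file handle instead of the in-memory example. If most reads in the full file fall below the cutoff, the run as a whole is poor; if only a small fraction do, the reads at the top of the file are not representative.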
That's a bit unlikely. PubMed doesn't have SRA files; it is a database of academic literature.
SRA has sra files.
Before digging deeper into this: What is the question you want to answer, what kind of data is this and most importantly, why do you find the % qualities strange? Are you familiar with fastq files in general?