9.5 years ago
dpidad
Given a sequence as below:
@SRR211279.25468524 HWUSI-EAS404_106009863:7:120:17892:21339 length=200
CCAACCTCTACCCATNACCCAGTTCCGAAGTTGCTTCCACATTTTCAGGTATCTTTATAGNNATGCTCCAGTCCTCATTTGCCATTTTTGGTAANANTTANCTNTGTANTCTCCGNNNTNNNCNCTNGCNATNTNANANNNTTCANTNNNNNNNNNNNNNNNANNNNNANTNANANATTTCNGAGCCCCCCCAGANGCAG
+SRR211279.25468524 HWUSI-EAS404_106009863:7:120:17892:21339 length=200
IIIGIIIIIIGGGGG%DEEEEDBDEIHIIIHIIIIIIIHIDHIHHIIIGGIGIIIGEEEE%%;==><;>>IIIIHIIIIIIIIIIIIIIIGDDD%8%;;8%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
What is the length of the sequence: 200 bases, 200 base pairs (bp), or 100 bp? What tool is recommended for measuring read length in bp for a FASTA/FASTQ file?
Devon, you can use echo -n (no need to subtract 1) :-)
dpidad, try FastQC for measuring the lengths of your reads in FASTQ. More solutions here: Sequence Length Distribution From A Fastq File
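If you would rather not install anything, here is a minimal Python sketch of the same idea. It assumes a plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequences); the function name and the tiny example records are made up for illustration.

```python
from collections import Counter
import io


def read_length_distribution(handle):
    """Count how many reads have each sequence length.

    Assumes a 4-line-per-record FASTQ stream (header, sequence, '+', qualities).
    """
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the bases
            lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


# Two made-up records, only to show the call; not real data.
example = io.StringIO("@r1\nACGTN\n+\nIIIII\n@r2\nACG\n+\nIII\n")
dist = read_length_distribution(example)
for length, count in sorted(dist.items()):
    print(length, count)
```

For a real file, pass an open handle instead, e.g. `with open("SRR211279.fastq") as fh: dist = read_length_distribution(fh)`. The counts are in bases (characters on the sequence line), N calls included.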
Just goes to show, regardless of how long I've been using the CLI, there's always something useful to learn! :o)
Thanks for the clarification.
I got this read file from ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR211/SRR211279/SRR211279.sra. When I extracted it (using fastq-dump), I got the SRR211279.fastq file, with each read 200 bp long. However, I came across a paper referring to "SRR211279 (25.23M 100bp paired-end reads generated by Illumina GAIIx) from the Washington University Genome".
Where can I get the SRR211279 100 bp paired-end read files?
I'm using these files with Soap3-dp, which needs 2 read files for paired-end runs. I'm new to these topics, so any pointers would be helpful.
You forgot the --split-files or --split-spot option, so fastq-dump wrote both 100 bp mates of each pair joined end to end as a single 200-base record. fastq-dump is not the best designed program in the world, since it really should do this automatically.
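To make it concrete why the reads look 200 bases long, here is a rough Python illustration of what splitting a spot means. It is only a sketch: it assumes every spot holds equal-length mates and cuts it evenly, whereas fastq-dump, as far as I know, uses the read lengths recorded in the SRA file. The function name, the "/1" and "/2" suffixes, and the toy record are hypothetical. In practice, re-run fastq-dump with one of the split options rather than post-processing the FASTQ.

```python
def split_spot(name, seq, qual, n_reads=2):
    """Cut one concatenated spot into n_reads equal-length reads.

    Assumption for illustration: all mates in the spot have the same length.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) % n_reads != 0:
        raise ValueError("spot length is not divisible by the number of reads")
    step = len(seq) // n_reads
    return [
        (f"{name}/{i + 1}", seq[i * step:(i + 1) * step], qual[i * step:(i + 1) * step])
        for i in range(n_reads)
    ]


# Toy 16-base spot standing in for a 200-base one; not real data.
mates = split_spot("spot1", "ACGTACGTTTGGCCAA", "IIIIIIII%%%%%%%%")
for mate_name, mate_seq, mate_qual in mates:
    print(mate_name, mate_seq, mate_qual)
```

With --split-files, each mate goes to its own file, which gives you the two read files Soap3-dp asks for.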