Measuring sequence length in "bp"
9.5 years ago
dpidad • 0

Given a sequence as below:

@SRR211279.25468524 HWUSI-EAS404_106009863:7:120:17892:21339 length=200
CCAACCTCTACCCATNACCCAGTTCCGAAGTTGCTTCCACATTTTCAGGTATCTTTATAGNNATGCTCCAGTCCTCATTTGCCATTTTTGGTAANANTTANCTNTGTANTCTCCGNNNTNNNCNCTNGCNATNTNANANNNTTCANTNNNNNNNNNNNNNNNANNNNNANTNANANATTTCNGAGCCCCCCCAGANGCAG
+SRR211279.25468524 HWUSI-EAS404_106009863:7:120:17892:21339 length=200
IIIGIIIIIIGGGGG%DEEEEDBDEIHIIIHIIIIIIIHIDHIHHIIIGGIGIIIGEEEE%%;==><;>>IIIIHIIIIIIIIIIIIIIIGDDD%8%;;8%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

What is the length of this sequence: 200 bases, 200 base pairs (bp), or 100 bp? Which tool is recommended for measuring length in "bp" for a FASTA/FASTQ file?

genome sequence • 4.6k views
9.5 years ago

Technically, "base pairs" refers to double-stranded sequences. Practically speaking, however, "base pairs" and "bases" are used interchangeably, so the length is 200 either way. BTW, wc -c will count the number of characters in a line for you (it also counts the trailing newline, so you'll need to subtract 1 from the result).


Devon, you can use echo -n, which omits the trailing newline (no need to subtract 1) :-)

echo -n ACGT | wc -c
4

dpidad, try fastqc for measuring lengths of your reads in fastq. More solutions here: Sequence Length Distribution From A Fastq File
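For a quick look without installing anything, the read lengths can also be tallied straight from the file. This is only a rough sketch: it assumes a plain four-line-per-record FASTQ (no wrapped sequence lines), and example.fastq with its three short reads is a made-up stand-in for your real file.

```shell
# Build a tiny example FASTQ file (hypothetical reads, four lines per record).
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\nIIII\n@read3\nACGTACGT\n+\nIIIIIIII\n' > example.fastq

# Line 2 of every 4-line record is the sequence; count how often each length occurs.
length_counts=$(awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' example.fastq | sort -n)

# Each output line is "<read length> <number of reads>".
echo "$length_counts"
```

For the read in the question this would report a single length of 200.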


Just goes to show, regardless of how long I've been using the CLI, there's always something useful to learn! :o)


Thanks for the clarification.

I got this read file from ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR211/SRR211279/SRR211279.sra. Extracting it with fastq-dump gave me SRR211279.fastq, in which every read is 200 bp long. However, I came across a paper referring to "SRR211279 (25.23M 100bp paired-end reads generated by Illumina GAIIx) from the Washington University Genome".
Where can I get the SRR211279 100 bp paired-end read files?

I'm using these files with Soap3-dp, which needs two read files for a paired-end run. I'm new to these topics, so any pointers would be helpful.


You forgot the --split-files or --split-spot option. Without one of them, fastq-dump writes both 100 bp mates of each pair as a single 200-base record. fastq-dump is not the best-designed program in the world; it really should do this automatically.
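To see what the unsplit output contains, here is a small illustration that cuts each record at its midpoint. It is not a substitute for re-running fastq-dump with --split-files on the .sra file; it just assumes both mates have the same length, and spot.fastq, mate_1.fastq and mate_2.fastq are made-up names, with an 8-base toy record standing in for a 200-base one.

```shell
# One hypothetical spot: an 8-base record standing in for a 200-base one.
printf '@spot1\nAAAACCCC\n+\nIIIIHHHH\n' > spot.fastq

# Sequence and quality lines are the even-numbered lines of a 4-line FASTQ;
# cut each at its midpoint. Header and "+" lines are copied unchanged.
awk -v out1=mate_1.fastq -v out2=mate_2.fastq '
  NR % 2 == 0 {
    half = int(length($0) / 2)
    print substr($0, 1, half)  > out1
    print substr($0, half + 1) > out2
    next
  }
  { print > out1; print > out2 }
' spot.fastq

# Show the sequence line of each mate file.
mate1_seq=$(sed -n '2p' mate_1.fastq)
mate2_seq=$(sed -n '2p' mate_2.fastq)
echo "$mate1_seq $mate2_seq"
```

The two resulting files are the kind of pair that a paired-end aligner such as Soap3-dp expects, one file per mate.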
