I received 8 RNA-seq datasets from a company. I would like to check the sequencing depth and make sure that the company sequenced to the depth I requested. Could you please let me know how I can do this?
Assuming that you have gzipped fastq files, a simple zcat foo.fq.gz | wc -l will let you know. Just divide the resulting number by 4 and that will tell you whether they met the minimum read number they guaranteed. Note that "depth", as typically defined, has no real meaning in RNAseq and should almost never be used.
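To avoid doing the division by hand, the same idea can be wrapped in a small shell function. This is only a sketch: the name count_reads is made up for illustration, and it assumes gzipped files with the usual four lines per read (no wrapped sequence lines).

```shell
# count_reads FILE
# Print the number of reads in a gzipped FASTQ file.
# Assumption: every record is exactly 4 lines (ID, sequence, '+', quality).
count_reads() {
    # gzip -dc decompresses to stdout, like zcat, without touching the file
    lines=$(gzip -dc "$1" | wc -l)
    # 4 lines per read, so the read count is the line count divided by 4
    echo $(( lines / 4 ))
}

# Example use over several files (file names are placeholders):
#   for f in sample1.fq.gz sample2.fq.gz; do
#       printf '%s\t%s\n' "$f" "$(count_reads "$f")"
#   done
```

For paired-end data, each file of a pair should report the same number, and that number is the count of read pairs.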
Thank you for your comment. I am very new to RNA-seq data analysis. Could you please let me know why we should divide the number by 4?
updated 2.8 years ago by Ram • written 10.0 years ago by hana
Because in a fastq file every read takes up four lines. The read sequences are on lines 2, 6, 10, 14, 18, 22, 26 and so on; the other lines serve different purposes:
1st line starts with '@' and is the sequence ID of your read
2nd line is your read sequence
3rd line starts with '+' and is something something
4th line holds the quality values for the read sequence on the 2nd line
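Since the divide-by-4 trick relies on this layout, it may be worth checking that a file really follows it. The sketch below uses a hypothetical function name, check_fastq, and assumes a gzipped file without wrapped sequence lines. It tests that each record starts with '@', has a '+' on its third line, and has a quality line as long as its sequence line.

```shell
# check_fastq FILE
# Return 0 if a gzipped FASTQ file follows the 4-lines-per-read layout,
# non-zero otherwise. Assumption: sequences are not wrapped over several lines.
check_fastq() {
    gzip -dc "$1" | awk '
        BEGIN { bad = 0 }
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }   # ID line
        NR % 4 == 2 { seqlen = length($0) }                  # sequence line
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }   # separator line
        NR % 4 == 0 && length($0) != seqlen { bad = 1 }      # quality line
        END {
            if (NR % 4 != 0) bad = 1   # truncated last record
            exit bad
        }
    '
}

# Example use (file name is a placeholder):
#   check_fastq sample1.fq.gz && echo "layout looks fine"
```

If the check fails, the line count divided by 4 should not be trusted as a read count.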
The purpose of the + line was to indicate that the sequence lines were finished (the sequence can be multiline, even if that will break most tools since it's almost never done). These days, however, the + line is just an extra useless 2 bytes.
I figured out a one-liner for that!
Thanks for your reply. Could you please let me know why we should divide the resulting number by 4?
Thank you
There are 4 lines per read.
The 3rd line is usually left as a bare '+' to conserve space. It used to hold the ID of the read (again). Had a hearty chuckle at "something something" though :)
Ha ha ha! I was not sure what to write there, because I never understood the importance of the third line :)
The third line seems to have a rich glorious heritage!
Thank you for sharing the information