Difference between total number of reads in fastq file and no of bases/nt sequences in fastq file?
3.1 years ago
Fizzah ▴ 30

Hello there. I am a beginner in data analysis and want to clarify the concepts of number of reads and number of bases in a fastq file. What is the difference between the total number of reads in a fastq file and the number of bases (nucleotides) in it?

What command will return the total number of reads and the total number of bases?

3.1 years ago
Mensur Dlakic ★ 28k

Each read represents a short piece of DNA that was sequenced. Reads consist of a certain number of bases. When you add up all the bases from all the reads, you get a total number of bases.

There is a program called stats.sh in the BBtools package that will report the number of reads (it calls them scaffolds/contigs) and the total number of bases in a fastq file:

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                  1,000           1,000         149,982         149,981   100.00%
    100                  1,000           1,000         149,982         149,981   100.00%

If you want a simple way to get these numbers using standard Linux commands, these two lines will give you the number of reads and the total number of bases, respectively:

awk '{if (NR % 4 == 2) print $0}' myfile.fastq | wc | awk '{print $1}'
awk '{if (NR % 4 == 2) print $0}' myfile.fastq | wc | awk '{print ($3-$1)}'

This assumes that your file is called myfile.fastq.

If you are curious, the first part takes the second line of every four-line fastq record, because those lines contain the nucleotide sequence. The wc command counts lines, words and bytes, and the final awk selects which of those are printed. The line count is the number of reads, and the byte count minus the line count (one newline character per line) is the total number of bases. This assumes a standard fastq file in which each sequence sits on a single line.
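Both numbers can also be obtained in one pass over the file. The following is only a minimal sketch, assuming a standard four-line fastq with unwrapped sequences; toy.fastq and its two reads are made up for illustration.

```shell
# Build a tiny, made-up fastq file with two reads (8 and 6 bases).
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGTAC\n+\nIIIIII\n' > toy.fastq

# Line 2 of every 4-line record holds the sequence:
# count those lines (reads) and sum their lengths (bases).
counts=$(awk 'NR % 4 == 2 {reads++; bases += length($0)} END {print reads+0, bases+0}' toy.fastq)

echo "$counts"   # prints: 2 14
```

The first number is the read count and the second is the total number of bases, so the toy file has 2 reads and 8 + 6 = 14 bases.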


Thank you so much for such a detailed answer. I get the point now.


Can you please explain how to run that command? I keep trying to get the total read count with the BBtools stats.sh program, but it keeps failing.


I don't know what the problem is, so it is difficult to help. If you have Java and have downloaded and installed BBtools, it is as simple as:

stats.sh myfile.fastq

It says:

fixu@DESKTOP-KJMSKGU:/mnt/c/bbmap$ stats.sh 19213R-08-01_S16_L002_R1_001.fastq
stats.sh: command not found


I suggest you spend some time learning the basics of Linux, specifically how to set the $PATH variable, which tells the system where to look for programs.

I am assuming from your command that you unpacked the files in the /mnt/c/bbmap directory, which would mean adding this command to your bash startup file (for example ~/.bashrc):

export PATH="/mnt/c/bbmap:$PATH"

Or this one for (t)csh shell:

setenv PATH "/mnt/c/bbmap:${PATH}"

If you want to run it from the directory where it was installed, which is in general not a good idea, you would need to add ./ to the start of your command to tell the system to look for stats.sh in the current directory:

./stats.sh 19213R-08-01_S16_L002_R1_001.fastq
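To check whether a PATH change like the one described above took effect, something along these lines can be run in a bash session. It is only a sketch: the /mnt/c/bbmap location is the one assumed from the prompt earlier in the thread, and the bbdir and found variable names are made up for illustration.

```shell
# Assumed install location, taken from the prompt shown in the thread.
bbdir="/mnt/c/bbmap"

# Prepend the directory to PATH for the current shell session only.
export PATH="$bbdir:$PATH"

# command -v succeeds only if the shell can resolve the program name.
if command -v stats.sh >/dev/null 2>&1; then
    found="yes"
else
    found="no"
fi

echo "stats.sh found on PATH: $found"
```

If this reports "no", the directory is probably wrong or stats.sh is not executable there; a PATH entry only helps when the program file actually exists in that directory.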

Please take some time to figure out basic Linux commands and things about system setup, as it is impossible to guide you through all possible problems one command at a time.
