Hello friends! I am new to the Biostars community and to NGS, and I am running into a lot of problems analysing my NGS data. Please correct me if the following definitions are wrong. Read length is the number of sequencing cycles that were run. Total sequence is the actual length of the genome or target that needs to be sequenced. Reads are the bases that were sequenced.
If the above is correct, then why does my FastQC report give the read length as 32-151? If read length means the number of cycles, why is it reported as a range?
Also, can anyone explain these parts of the FastQC report: Per base sequence content, Per sequence quality scores, Sequence length distribution, and Kmer content?
Welcome to Biostars!
Read length - the length of the read (DNA fragment) that has been sequenced.
Read length: 32-151 - the shortest read is 32 bases and the longest read is 151 bases (BTW, which instrument was used to generate the data?)
Fastqc report explained here
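To make the 32-151 range concrete, here is a minimal Python sketch of how a shortest-longest read length range can be derived from a FASTQ file. It assumes an uncompressed file with exactly four lines per record. The function name `read_lengths` and the three example reads are made up for illustration and are not part of FastQC.

```python
import io


def read_lengths(handle):
    """Yield the length of each read in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record holds the bases
            yield len(line.strip())


# Made-up example data: three reads of different lengths.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACG\n+\nIII\n"
    "@read3\nACGTA\n+\nIIIII\n"
)

lengths = list(read_lengths(example))
print(f"Read length: {min(lengths)}-{max(lengths)}")
```

For a real file you would pass `open("your_file.fastq")` instead of the in-memory example (the file name here is only a placeholder), or `gzip.open(path, "rt")` for a gzipped file.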
If it means the length of the DNA fragment sequenced, then what is total sequence? Does total sequence mean DNA + adapters?
I was confused by 'Total sequence'; it is actually 'Total Sequences'. According to the FastQC manual linked above, it is the total number of reads present in your fastq file. To check it yourself, take the first 4-5 characters of a read ID (which are the same in all read IDs) and count the lines that start with them, which gives the total number of reads present.
I think it means the total number of sequences. Each sequence can have a different length (here the sequence length is 32-151) or they can all have the same length (for example, a sequence length of 150). Am I correct?
And what is the meaning of 'the overall %GC of all bases in all sequences'? If %GC means the GC content of the entire genome, then what does 'all bases in all sequences' mean?
'All bases in all sequences' refers to the bases that are actually present in your sequence file.
That number should match the value for your genome (unless the sampling was non-uniform or you have contamination).
%GC means the GC content of my sample, I mean of the sequences. Then what does 'all bases' mean here? Is it comparing every base at every position of my sequences?
Out of the total bases (A/C/G/T) present in your file, %GC is the percentage that are G or C, with no consideration of their position/location.
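If it helps, the two numbers discussed above (total sequences and overall %GC) can be sketched in a few lines of Python. This is only an illustration, not FastQC's own code. It assumes an uncompressed FASTQ with four lines per record, and it leaves ambiguous bases such as N out of the denominator, following the A/C/G/T wording above. The name `fastq_summary` and the example reads are made up.

```python
import io


def fastq_summary(handle):
    """Return (total_sequences, percent_gc) for a 4-line-per-record FASTQ stream."""
    total_sequences = 0
    gc_bases = 0
    acgt_bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record holds the bases
            seq = line.strip().upper()
            total_sequences += 1
            gc_bases += seq.count("G") + seq.count("C")
            # Assumption: only A/C/G/T count towards the total (N is ignored).
            acgt_bases += sum(seq.count(base) for base in "ACGT")
    percent_gc = 100.0 * gc_bases / acgt_bases if acgt_bases else 0.0
    return total_sequences, percent_gc


# Made-up example data: 6 G/C bases out of 11 A/C/G/T bases in 3 reads.
example = io.StringIO(
    "@read1\nGGCC\n+\nIIII\n"
    "@read2\nATAT\n+\nIIII\n"
    "@read3\nACGN\n+\nIIII\n"
)

total_sequences, percent_gc = fastq_summary(example)
print(f"Total sequences: {total_sequences}")
print(f"%GC: {percent_gc:.1f}")
```

Note that the position of a base within a read never enters the calculation: every G or C counts once, wherever it occurs.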
Hi,
I have an Illumina MiSeq dataset for a parasite genome. The machine itself gave the paired-end reads as two separate datasets, one forward (R1) and the other reverse (R2). When using the FastQC tool on one set, e.g. filtering reads <70 bp in the R1 dataset, should we treat R1 as paired-end or not?
Thanks