7.4 years ago
t2g4free
After QC, my reads have different lengths, and I want to get the read length distribution. Any suggestions?
using gnuplot:
$ curl -sL "https://raw.githubusercontent.com/MedicineAndTheMicrobiome/AnalysisTools/master/FASTQ/Split_Fastq/Example.fastq" | \
paste - - - - | \
awk -F '\t' '{printf("%d\n",10*int(length($2)/10.0));}' |\
sort |uniq -c | sort -n |\
gnuplot -e "set terminal dumb 80 50 ; set title 'Fastq sequence len.'; set xlabel 'length'; set auto x;set style data histogram; plot '-' using 1:xticlabels(2) with lines notitle;"
[ASCII plot from gnuplot's dumb terminal. Title: 'Fastq sequence len.'. Y axis: read count, 0 to 20. X axis: 'length', with tick labels in the order 100 120 170 240 110 140 160 180 230 190 200 210 220 150 250.]
reformat.sh from the BBMap suite, used like this: reformat.sh in=your.fq lhist=filename_you_want.txt
Use FastQC to check the read length distribution. If you want to retain only reads of a certain length after QC trimming, use cutadapt.
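For the filtering step, cutadapt has length filters (-m/--minimum-length and -M/--maximum-length). If cutadapt is not at hand, the same idea can be sketched with paste and awk. This is not cutadapt, only a rough stand-in: the function name filter_by_length and the three-read sample are made up for illustration, and it assumes strict 4-line FASTQ records with no tabs in the headers.

```shell
# Keep only reads whose length lies in [min, max]; reads FASTQ on stdin.
# Assumes strict 4-line records (no wrapped sequence lines).
filter_by_length() {
    paste - - - - |
    awk -F '\t' -v min="$1" -v max="$2" '
        length($2) >= min && length($2) <= max {
            print $1 "\n" $2 "\n" $3 "\n" $4
        }'
}

# Hypothetical three-read sample with lengths 4, 8 and 12.
sample=$(printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n')

# Only the 8-base read falls inside [5, 10].
kept=$(printf '%s\n' "$sample" | filter_by_length 5 10)
printf '%s\n' "$kept"
```

On a real file this would be called as: filter_by_length 50 150 < your.fq > filtered.fq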
+1 for creativity. The BBMap solution may be easier for a giant file.
The X-axis goes "100, 120, 170, 240, 110, 140" etc. I suspect that's not intentional... but it might be related to the two odd jaggies in the graph.
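A likely cause of the shuffled axis: the pipeline ends with "sort | uniq -c | sort -n", so the rows reach gnuplot ordered by count rather than by length bin. A minimal sketch of one way to keep the bins in ascending order is to sort numerically before uniq -c and drop the final sort. The names length_hist and make_read, and the six-read sample, are hypothetical and only there so the example runs without a download.

```shell
# Count reads per 10-bp length bin, ordered by bin rather than by count.
# Reads FASTQ on stdin; assumes strict 4-line records.
length_hist() {
    paste - - - - |
    awk -F '\t' '{ printf("%d\n", 10 * int(length($2) / 10)) }' |
    sort -n | uniq -c |
    awk '{ print $1, $2 }'    # normalise uniq output to "count bin"
}

# Hypothetical helper: emit one FASTQ record with a sequence of the given length.
make_read() {
    awk -v name="$1" -v len="$2" 'BEGIN {
        for (i = 0; i < len; i++) { seq = seq "A"; qual = qual "I" }
        printf("@%s\n%s\n+\n%s\n", name, seq, qual)
    }'
}

# Six made-up reads with lengths 100, 120, 125, 105, 250 and 110.
hist=$( { make_read r1 100; make_read r2 120; make_read r3 125
          make_read r4 105; make_read r5 250; make_read r6 110; } | length_hist )
printf '%s\n' "$hist"
```

The output keeps the "count bin" column order, so it can be piped into the same gnuplot command as in the answer. Empty bins are still skipped, so the tick spacing is not proportional to length.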