How do I find out the read length of a FASTQ file?
4.2 years ago
sandKings ▴ 40

Sorry for the basic question, but I have several RNA-Seq datasets and none of the papers specify the read length of the FASTQ files. I use STAR for alignment, so I need a STAR index built for the correct read length. Is there a tool that can detect this? Thanks!

RNA-Seq • 11k views
4.2 years ago

How do I find out the read length of a fastq file?

If that is what you are asking, then:

seqkit fx2tab -nl <your.fastq.file>

You can find seqkit here


I think it worked.

It returned a series of lines in this format.

SRR9108806.4388388 4388388/1    75
SRR9108806.4388389 4388389/1    75

Does that mean my FASTQ file has 75 bp reads?


Yes. The last column is the sequence length, so those reads are 75 bp. Here is the relevant excerpt from the help for seqkit fx2tab:

convert FASTA/Q to tabular format, and provide various information,
like sequence length, GC content/GC skew.

Usage:
  seqkit fx2tab [flags]

Flags:
  -a, --alphabet               print alphabet letters
  -q, --avg-qual               print average quality of a read
  -B, --base-content strings   print base content. (case ignored, multiple values supported) e.g. -B AT -B N
  -I, --case-sensitive         calculate case sensitive base content
  -g, --gc                     print GC content
  -G, --gc-skew                print GC-Skew
  -H, --header-line            print header line
  -h, --help                   help for fx2tab
  -l, --length                 print sequence length
  -n, --name                   only print names (no sequences and qualities)
  -i, --only-id                print ID instead of full head
  -b, --qual-ascii-base int    ASCII BASE, 33 for Phred+33 (default 33)
  -s, --seq-hash               print hash of sequence (case sensitive)
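
Two lines showing 75 do not prove that every read is 75 bp, so it can help to count how often each length occurs. Below is a minimal sketch, assuming the fx2tab output is tab-separated with the length in the last column (as in the lines posted above). The function name summarise_lengths and the file name reads.fastq are placeholders of mine, not part of seqkit.

```shell
# Count how often each read length occurs in `seqkit fx2tab -nl` output.
# Assumption: input is tab-separated and the length is the last column.
summarise_lengths() {
  awk -F'\t' '{ count[$NF]++ }
       END { for (len in count) printf "%s\t%d\n", len, count[len] }' |
    sort -n
}

# Usage (reads.fastq is a placeholder file name):
#   seqkit fx2tab -nl reads.fastq | summarise_lengths
```

Each output line is a length followed by the number of reads with that length. A single line means all reads share one length; several lines usually mean the reads were trimmed.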
4.2 years ago

seqkit stats -a input.fastq (or input.fastq.gz) should give you basic statistics for the FASTQ file, including the minimum, average, and maximum read length. A quick one-liner that reports the most common read length would be:

awk 'NR%4==2 {print length}' input.fastq | sort -n | uniq -c | sort -rn | head -1

The first column is the number of reads and the second column is the read length.
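
The awk one-liner above reports only the most common length and expects an uncompressed file. If the reads may have been trimmed, or the file is gzipped, a variant along these lines might be more convenient. It is only a sketch, assuming standard 4-line FASTQ records; the function name fastq_length_histogram and the file name input.fastq.gz are placeholders.

```shell
# Print "count<TAB>length" for every read length in a FASTQ file,
# most common length first.
# Assumptions: 4-line FASTQ records; names ending in .gz are gzip-compressed.
fastq_length_histogram() {
  case "$1" in
    *.gz) gzip -cd -- "$1" ;;   # decompress to stdout
    *)    cat -- "$1" ;;
  esac |
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) printf "%d\t%d\n", count[len], len }' |
    sort -k1,1nr -k2,2n
}

# Usage (input.fastq.gz is a placeholder file name):
#   fastq_length_histogram input.fastq.gz
```

Counting inside awk avoids sorting one line per read, which matters for large files. Only the short per-length summary is sorted.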
