Hello,
Please tell me about unmapped reads in a BLAT run.
If reads are not mapped to the reference sequence by BLAT, what is written in the output PSL file?
How do I extract unmapped reads?
Thank you very much.
I don't think BLAT has a straightforward option for collecting unmapped reads. BLAT writes only alignments to the PSL file, so a read with no hit simply does not appear in it. Maybe you can try this.
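First, collect the names of your mapped reads using the cut command.
cut -f 10 output.psl | sort -u >mapped_header.txt ## column 10 is the query (read) name; PSL is tab-separated, which is cut's default delimiter. If your file is space-separated, add -d " ".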
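Now collect all of your read headers.
LC_ALL=C grep -F ':N:' sample.fastq | sed 's/^@//; s/[[:space:]].*//' >all_header.txt
I used the pattern ":N:" because it is present in all my headers. If your reads have a similar pattern you can probably use it, or use something common to all the reads like "@HISEQ" or "@HWI" etc. The sed step strips the leading "@" and everything after the first whitespace, because BLAT reports only the first word of the header as the query name; without it the two lists would not be comparable.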
Now collect the headers that are unmapped using the following command.
awk 'NR==FNR{a[$0];next}!($0 in a)' mapped_header.txt all_header.txt >unmapped.txt
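Now grep those reads from the original FASTQ file.
LC_ALL=C grep --no-group-separator -A 3 -F -f unmapped.txt sample.fastq >unmapped.fastq
The --no-group-separator option (GNU grep) stops grep from writing "--" lines between non-adjacent matches, which would otherwise corrupt the FASTQ output.
Since we are searching for fixed strings (-F) in the C locale (LC_ALL=C), grep should not take too much time.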
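If it helps, the same idea can be folded into a single awk pass that compares read names exactly rather than as substrings (with grep -F, a name like read1 would also match read10). This is only a rough sketch under some assumptions: the PSL file is tab-separated with the query name in column 10, every FASTQ record is exactly four lines, and the read name is the first word of the header line without the leading "@". The function name extract_unmapped is just something I made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the FASTQ records whose read name has no alignment in a PSL file.
# Assumptions: tab-separated PSL (query name in column 10), four-line FASTQ records.
# extract_unmapped is a hypothetical helper name, not part of BLAT.
extract_unmapped() {
    local psl=$1 fastq=$2
    awk -F'\t' '
        # PSL file: remember every query name that has at least one alignment.
        # Requiring >= 21 fields and a numeric first field skips the optional header.
        FILENAME == ARGV[1] {
            if (NF >= 21 && $1 ~ /^[0-9]+$/) mapped[$10]
            next
        }
        # FASTQ file: on each header line (1st of every 4), take the first word,
        # drop the leading "@", and decide whether to keep the whole record.
        FNR % 4 == 1 {
            split($0, parts, /[ \t]/)
            name = substr(parts[1], 2)
            keep = !(name in mapped)
        }
        # Print all four lines of a record that was marked as unmapped.
        keep
    ' "$psl" "$fastq"
}
```

You would call it as: extract_unmapped output.psl sample.fastq >unmapped.fastq. Testing FILENAME instead of the usual NR==FNR trick means it still works when the PSL file is empty, i.e. when nothing mapped at all.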