Unmapped reads in BLAT output ( psl file )
10.5 years ago
syrup16g_TO ▴ 40

Hello,

Please tell me about unmapped reads in a BLAT run.

If reads are not mapped to the reference sequence by BLAT, what is written in the output psl file?

How do I extract unmapped reads?

Thank you very much.

alignment BLAT • 3.2k views
10.5 years ago
Prakki Rama ★ 2.7k

I think there is no straightforward option in BLAT to collect unmapped reads. Maybe you can try this.

  1. First, collect the names of your mapped reads with the cut command.

    cut -f 10 output.psl | sort -u >mapped_header.txt ## column 10 of the psl file holds the mapped read name; cut splits on tabs by default
    
  2. Now collect all your read headers.

    LC_ALL=C grep -F ':N:' sample.fastq >all_header.txt 
    

    I used the pattern ":N:" because it is present in all my headers. If your reads have a similar pattern, you can use it, or use something else common to all the read headers, like "@HISEQ" or "@HWI".

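  3. Now collect the headers of the unmapped reads with the following command. The read name in the psl file has no leading "@" and is usually only the first word of the fastq header, so each header is reduced to that form before the lookup.

    awk 'NR==FNR{a[$1];next}{id=$1;sub(/^@/,"",id)}!(id in a)' mapped_header.txt all_header.txt >unmapped.txt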
    
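  4. Now grep those reads from the original fastq file. With GNU grep, -x matches whole header lines only and --no-group-separator keeps the "--" separator lines out of the output.

    LC_ALL=C grep -A 3 -x -F -f unmapped.txt --no-group-separator sample.fastq >unmapped.fastq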
    

Since we are searching for fixed strings with LC_ALL=C, grep should not take too long.
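The four steps above can also be folded into a single awk pass. This is only a rough sketch, not a BLAT feature: the function name `extract_unmapped` is made up for illustration, and it assumes 4-line fastq records and that the read name in column 10 of the psl file equals the first word of the fastq header without the leading "@". As far as I know, BLAT writes no row at all for a read without hits, which is why the unmapped reads have to be found by comparison.

```shell
#!/bin/sh
# Sketch: print the fastq records whose read name never appears in
# column 10 (query name) of a psl file.
# Assumptions: 4-line fastq records; psl query name == first word of
# the fastq header with the leading "@" removed.
extract_unmapped() {
    psl=$1
    fastq=$2
    awk -F '\t' '
        # psl file: remember every query name found in column 10.
        # psl header lines, if present, only add harmless extra keys.
        FILENAME == ARGV[1] { mapped[$10]; next }
        # fastq file: on each header line, decide whether to keep the record.
        FNR % 4 == 1 {
            id = $0
            sub(/^@/, "", id)        # drop the leading "@"
            sub(/[ \t].*$/, "", id)  # keep only the first word
            keep = !(id in mapped)
        }
        # Print all four lines of a record that was marked as unmapped.
        keep { print }
    ' "$psl" "$fastq"
}
```

It could be called as `extract_unmapped output.psl sample.fastq >unmapped.fastq`. Testing `FILENAME` instead of `NR==FNR` means an empty psl file (no hits at all) still returns every read.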
