Extract sequences which do not have blast hits
6.2 years ago
karthic ▴ 130

Hi,

I have a FASTA file with around 1 million sequences. I ran a BLAST search and got hits for around 7500 of them. Now I want to extract the sequences that do not have a hit and take them forward for further analysis.

So far I am using a custom sed script, which is very slow. Judging from its current speed, it might take several days to complete. Please help me with a fast and robust solution.

The script I am currently using is below:

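    # Work on a copy so the original FASTA file is left untouched
    cp CG061MR_S20_R1_001_AR_filter_un_ren.fa CG061MR_S20_R1_001_AR_filter_unblasted.fa

    # For each hit ID, delete the matching header line and the line after it
    # (assumes two-line FASTA records). The whole file is rewritten once per ID.
    while read -r j
    do
        sed -i -e "/$j/{N;d}" CG061MR_S20_R1_001_AR_filter_unblasted.fa
    done < CG061MR_blastids.txt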

Thank You KK

RNA-Seq sequence blast extraction • 1.4k views
6.2 years ago
karthic ▴ 130

Sorry for bothering you, guys.

I found the solution with Jim Kent's faSomeRecords.

Thank You

KK
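
For readers who do not have the Kent utilities installed, the same filtering can be sketched in a single pass with awk. This is only a sketch and not what was actually run in this thread. It assumes the ID file holds one ID per line without the leading `>`, and that each ID equals the first whitespace-delimited word of the FASTA header. The function name `filter_unblasted` is made up for illustration. Unlike the sed loop, it reads the FASTA file once, matches IDs exactly rather than as substrings (so `seq1` does not also remove `seq10`), and handles sequences wrapped over several lines.

```shell
#!/usr/bin/env bash

# filter_unblasted IDS FASTA
# Hypothetical helper: prints every FASTA record whose ID is NOT listed in IDS.
# Assumption: IDS has one ID per line, without '>', matching the first word
# of the header line.
filter_unblasted() {
    local ids="$1" fasta="$2"
    awk '
        # First file (the ID list): remember every non-empty ID.
        FILENAME == ARGV[1] { if ($1 != "") hit[$1] = 1; next }
        # Header line: strip ">" and decide whether to keep this record.
        /^>/ { id = substr($1, 2); keep = !(id in hit) }
        # Print header and sequence lines of records being kept.
        keep
    ' "$ids" "$fasta"
}

# Usage with the file names from the question:
# filter_unblasted CG061MR_blastids.txt CG061MR_S20_R1_001_AR_filter_un_ren.fa \
#     > CG061MR_S20_R1_001_AR_filter_unblasted.fa
```

The ID list is loaded into an in-memory table, so each header costs one lookup instead of one full rewrite of the file per ID.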

karthic, was it faSomeRecords or GetFaRecords?


Sorry, it's faSomeRecords. I have corrected it.
