Question

Extract sequences which do not have blast hits

6.2 years ago

karthic ▴ 130

Hi,

I have a FASTA file with around 1 million sequences. I ran a BLAST search and got hits for around 7,500 of them. Now I want to extract the sequences that do not have a hit and use them for further analysis.

So far I am using a custom sed script, which is very slow; judging from its speed, it might take several days to complete. Please help me with a fast and robust solution.

The script I am currently using is below:

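    # Start from a copy of the full FASTA file
    cat CG061MR_S20_R1_001_AR_filter_un_ren.fa > CG061MR_S20_R1_001_AR_filter_unblasted.fa

    # For every BLAST hit ID, delete the matching header line and the line after it
    # (each iteration rewrites the whole file in place)
    for j in $(cat CG061MR_blastids.txt)
    do
        sed -i -e '/'"$j"'/{N;d}' CG061MR_S20_R1_001_AR_filter_unblasted.fa
    done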

Thank you,
KK

RNA-Seq sequence blast extraction • 1.4k views

score 0 · Answer 1 · 2018-09-17

6.2 years ago

karthic ▴ 130

Sorry for bothering you, guys.

I found the solution with Jim Kent's faSomeRecords.

Thank you,

KK
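
As far as I know, faSomeRecords is called as `faSomeRecords in.fa listFile out.fa` and has an `-exclude` option that writes the records not in the list; its usage message should confirm this. For readers without the UCSC tools, the same filtering can be sketched as a single pass with awk. This is only a minimal sketch under a few assumptions: the function name `exclude_fasta_records` is made up for illustration, the ID file holds one ID per line without the leading `>`, and an ID is taken to be the first whitespace-delimited word of the header. Unlike the sed loop in the question, it matches IDs exactly rather than as patterns, and it also handles sequences wrapped over several lines.

```shell
# exclude_fasta_records IDS_FILE FASTA_FILE
# Hypothetical helper: prints every FASTA record whose ID is NOT listed in IDS_FILE.
exclude_fasta_records() {
    awk '
        # First file: remember every non-empty ID in a lookup table.
        FILENAME == ARGV[1] { if ($1 != "") skip[$1] = 1; next }
        # Header line: strip the leading ">" and decide whether to keep this record.
        /^>/ { id = substr($1, 2); keep = !(id in skip) }
        # Print header and sequence lines only while inside a kept record.
        keep
    ' "$1" "$2"
}
```

With the file names from the question, a call would look like `exclude_fasta_records CG061MR_blastids.txt CG061MR_S20_R1_001_AR_filter_un_ren.fa > CG061MR_S20_R1_001_AR_filter_unblasted.fa`. Because the IDs are loaded into a lookup table once and the FASTA file is read once, the run time no longer grows with the number of IDs times the file size.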

karthic, is it faSomeRecords or GetFaRecords?

6.2 years ago by cpad0112 21k

Sorry, it's faSomeRecords. Corrected it.

6.2 years ago by karthic ▴ 130