I have recently used nhmmer to search for a sequence in a FASTA file, I'd like to compare the output of the nhmmer search to a bed file I have and think it would be easy to do if the nhmmer output is also in bed format.
Is there a tool to convert the nhmmer output? I have had a look in the nhmmer manual and can't find any specific flags for generating either a bed file output or a table output indicating locations of the hits.
Example of nhmmer output:
Annotation for each hit (and alignments):
>> VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence
score bias Evalue hmmfrom hmm to alifrom ali to envfrom env to sq len acc
------ ----- --------- ------- ------- --------- --------- --------- --------- --------- ----
! 134.7 1.6 3.9e-39 1 206 [] 3509332 3509126 .. 3509332 3509126 .. 3744637 0.99
Alignment:
score: 134.7 bits
All_mimps_comb_trimmed.aln 1 AcagTgggatGcaAtaAGtttgaatacgtgtgtaagTagg 40
AcagTgggatGcaA+aAGt+t++++a+gtgt taagT
VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509332 ACAGTGGGATGCAAAAAGTATTCGCAGGTGTATAAGTC-- 3509295
79***********************************8.. PP
All_mimps_comb_trimmed.aln 41 ttcccaagctagcctagtta.tggggtaccttagcttagg 79
+ccc+agct +ctagtt t+g ta+ctt+ cttagg
VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509294 -GCCCTAGCTCCTCTAGTTCgTAGCTTAGCTTGACTTAGG 3509256
.*************************************** PP
All_mimps_comb_trimmed.aln 80 gctttgctacttagatAtcacactgaaattagtataatat 119
+ +ttg+tact +g+tAtc c ctga+ t agt+taatat
VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509255 TACTTGATACTAGGTTATCCCCCTGATCTGAGTGTAATAT 3509216
**************************************** PP
All_mimps_comb_trimmed.aln 120 taaggtaTcgagtacctaaaaaa.aactcctagcTagcta 158
aaggta +gag acct+a aa+ ac+cct++cTagct
VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509215 -AAGGTACTGAGGACCTGACAAGcTACCCCTGACTAGCTT 3509177
.8************************************** PP
All_mimps_comb_trimmed.aln 159 gcgagtgtc...ggtacttcaacacgtattcatacttatt 195
cgag++tc tac+t acac+t++++atactt+tt
VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509176 CCGAGCATCagaCCTACCTGCACACCTGCGAATACTTTTT 3509137
************99************************** PP
All_mimps_comb_trimmed.aln 196 gCatcccaCtg 206
gCatcccaCtg
VMNF01000014.1_Fusarium_oxysporum_f._sp._cubense_strain_TR4_isolate_UK0001_scf_28419_14,_whole_genome_shotgun_sequence 3509136 GCATCCCACTG 3509126
*********96 PP
Thanks,
Jamie
@Fatima, great - that should work. Thank you :)
@Fatima, Strand information would be useful also but I can't work out where it is?
I assume since you're aligning the whole genome, it would be + for all the entries.
.
for score columnI usually get the genes in the genome, and then use
hmmer
. I haven't usednhmmer
.https://bedtools.readthedocs.io/en/latest/content/general-usage.html#:~:text=BED%20format,Galaxy%20browser%20and%20for%20bedtools.
@Fatima, thank you. I'm trying to find a transposable element on specific scaffolds so there is variation in strandedness. Perhaps I will try annotating first using hmmer. Thank you for your help.
Ps I have found also that --tblout will provide location information and strandedness.
Aha! Have you tried ISFinder (for finding transposable elements)?
No, I haven't heard of it! I'll take a look at it. Thank you :)