I have headers from a BLAST output file, and I would like to use them to build a subset database for HMMER. To do that, I pulled some code from a class I took: a Perl script that takes a list of headers, pulls the sequences with matching headers out of the main FASTA file, and writes them to a subset FASTA file. It works on the class test files, but not on my own research files. I've tried different scripts that do the same thing, and they all behave the same way. Even if the input FASTA contains just one sequence that I know matches, the output is always empty. I'm not sure what's wrong. My files appear to be formatted just like the test files (multi-line wrapping, > included in the headers file, not tab-delimited, etc.), and I can't tell whether the problem is in my input FASTA, the headers file, or something else. It's not the different file extensions either, since the scripts worked on other files. Does anyone know what's wrong? It has to be something in my files themselves.
For reference, my own files:
- My database file: https://www.dropbox.com/s/1dehw638mm6soyo/assembly71_201206_P1.fa?dl=0
- The sequences I want to extract from my database file: https://www.dropbox.com/s/k3tcrxp69xwdx7s/test_2_filtered_GIs.txt?dl=0
A couple of the scripts I've been using:
- https://www.dropbox.com/s/wp2obg0eta4zl56/subset_fasta.pl?dl=0 and
- https://www.dropbox.com/s/86dpnjory6i472i/Mike_HW_part3.pl?dl=0
The test files:
Some of your links are dead. Empty or wrong files might be one of the problems.
I just checked them and they all work (only one didn't, because of a Dropbox parsing error), and the permissions are set. These are also only copies of what I'm working from; I uploaded them to Dropbox just as examples. The actual files are stored on my computer.
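For anyone hitting the same symptom (an empty output even though a header visibly matches), here is a minimal Python sketch of the header-matching step described in the question. It is not a port of the linked Perl scripts, and it does not claim to identify the actual cause here. It only assumes two common culprits: invisible characters such as a trailing carriage return from Windows line endings, and headers that carry a description after the ID in one file but not in the other. The names `normalize_id` and `subset_fasta` and the toy data are made up for illustration.

```python
import io


def normalize_id(line):
    """Reduce a header line to a bare ID.

    Drops the leading '>', surrounding whitespace (including a stray
    carriage return from Windows line endings), and any description
    that follows the first whitespace-separated token.
    """
    text = line.strip().lstrip(">").strip()
    return text.split()[0] if text else ""


def subset_fasta(fasta_handle, wanted_ids):
    """Yield (header, sequence) for records whose normalized ID is wanted."""
    header, chunks = None, []
    for line in fasta_handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one if it was requested.
            if header is not None and normalize_id(header) in wanted_ids:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them and join later.
            chunks.append(line.strip())
    # Emit the last record in the file.
    if header is not None and normalize_id(header) in wanted_ids:
        yield header, "".join(chunks)


# Toy data (hypothetical): the headers file has Windows line endings and
# one header carries a description that the FASTA record does not have.
headers_text = ">seq2\r\n>seq3 some description\r\n"
fasta_text = ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\nTTAA\n>seq3\nAAAA\n"

wanted = {normalize_id(l) for l in io.StringIO(headers_text) if l.strip()}
records = list(subset_fasta(io.StringIO(fasta_text), wanted))

for header, seq in records:
    print(header)
    print(seq)
```

If a script that compares whole header lines finds nothing while a normalized comparison like this one does, the mismatch is probably in whitespace, line endings, or the description part of the headers rather than in the sequences themselves.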