Hey everyone,
I am working with 16S data and have identified my chimeric sequences using vsearch, which wrote all suspect sequences to a .txt file. I am trying to remove these sequences from my original fasta file using QIIME's filter_fasta.py command.
This is what I have tried: filter_fasta.py -f <filename>.fasta -o <newfilename>.fasta -s chimeraout.txt -n
But when I count the headers in the original fasta file and the new fasta file using the command below, they have the same number of sequences. The chimeras are not being removed from the original file.
grep "^>" <filename>.fasta | wc -l
I have tried troubleshooting this in the following ways:
First, I noticed that my original fasta had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247 1:N:0:GCGTAGTA+CGTCTAAT
while my chimeric sequence txt file had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247
so I edited the original fasta file to remove the barcode portion of the header. No luck.
Then, I realized that my original fasta file had each sequence on one line, while my .txt file had the sequences wrapped across separate lines, as below:
M00307:50:000000000-BT3VT:1:1101:15779:1247
TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAAGGATGAAGGTTTTCGGATCGTAAACTTTT
GTCTTAGGGGACGAGGAAGGACGGTACCCTAGGAGGAAGCCACGGCTAATTACGTGCCAGCAGCCGCGGTAACACGTAAG
CCCCTAGCGTTGTTCGGAATTATTGGGCGTAAAGGGCATGTAGGCGGTCAGGCAAGTCTGGTGTGAAATCTCGTGGCTCA
so I removed the spacing and tried again, but still nothing was removed. If I remove the -n parameter from my command, the output file is empty, so I know that QIIME is reading the command properly; it just does not appear to recognize the chimeric sequences. Any suggestions on how I can fix this would be greatly appreciated!
Have you had a look at the qiime webpage for the filter_fasta.py command?
http://qiime.org/scripts/filter_fasta.html
It looks as if the file passed with the -s parameter should contain just a list of the IDs of the sequences you want to remove, rather than the sequences themselves. Try that and see if it helps.
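If it is useful, here is a minimal Python sketch for building such an ID list from the vsearch output. It assumes chimeraout.txt is a FASTA-style file whose header lines start with ">", and it treats the first whitespace-delimited token of each header as the sequence ID, so the barcode part of the header is dropped. The function names and the output filename chimera_ids.txt are made up for illustration; they are not part of QIIME or vsearch.

```python
def extract_ids(fasta_lines):
    """Return the ID of every FASTA header line in an iterable of lines.

    The ID is taken to be the first whitespace-delimited token after '>'
    (an assumption about how the headers should be matched).
    """
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no ID after it
                ids.append(fields[0])
    return ids


def write_id_list(fasta_path, out_path):
    """Write one sequence ID per line to out_path (hypothetical helper)."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for seq_id in extract_ids(fasta):
            out.write(seq_id + "\n")
```

After calling write_id_list("chimeraout.txt", "chimera_ids.txt"), the original command should work with the new file in place of the old one: filter_fasta.py -f <filename>.fasta -o <newfilename>.fasta -s chimera_ids.txt -n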