Question

Assistance Needed: Constructing a Phylogenetic Tree from GATK PathSeqPipelineSpark BAM Files

0

Entering edit mode

17 days ago

xutongzhen1996 • 0

Hi,

I am currently using the GATK PathSeqPipelineSpark to extract microbial sequences from bulk-RNA sequencing data. During this process, a .bam file was generated, which I subsequently filtered using grep -v "*", retaining only the sequences that were successfully matched to microbes, resulting in a bacteria.bam file.

My goal is to construct a phylogenetic tree based on the sequences within this bacteria.bam file. To achieve this, I need to annotate each sequence's header line in the BAM file with the corresponding tax_id. However, I am unable to locate the taxonomic information (tax_id) within the BAM file.

I attempted the following approach:

I used grep -v "*" to retain the successfully aligned microbial sequences.
I extracted the reference sequence IDs ($3) and the sequences ($10) from the BAM file and generated a FASTA file.
I downloaded the accession2taxid file from NCBI and attempted to match the reference sequence IDs from the BAM file with the accessionID in the accession2taxid file. I then replaced the header lines in the FASTA file with the corresponding tax_id.
This resulted in a FASTA file with headers representing tax_id and their corresponding sequences.
To validate this approach, I compared the tax_id in the resulting FASTA file with those in the score.txt file generated by PathSeqPipelineSpark, but I found that the tax_id from score.txt could not be found in the FASTA file.

I am unsure where I went wrong in this process. Could someone please guide me on how to correctly process the bacteria.bam file and score.txt file from PathSeqPipelineSpark to construct a phylogenetic tree? Alternatively, if there is another method to achieve this, I would greatly appreciate your advice.

Thank you for your assistance!

Best regards,
Tongzhen

pathseq Phylogenetic-tree • 159 views

ADD COMMENT • link updated 17 days ago by Ram 44k • written 17 days ago by xutongzhen1996 • 0