Assistance Needed: Constructing a Phylogenetic Tree from GATK PathSeqPipelineSpark BAM Files
0
0
Entering edit mode
17 days ago

Hi,

I am currently using the GATK PathSeqPipelineSpark to extract microbial sequences from bulk-RNA sequencing data. During this process, a .bam file was generated, which I subsequently filtered using grep -v "*", retaining only the sequences that were successfully matched to microbes, resulting in a bacteria.bam file.

My goal is to construct a phylogenetic tree based on the sequences within this bacteria.bam file. To achieve this, I need to annotate each sequence's header line in the BAM file with the corresponding tax_id. However, I am unable to locate the taxonomic information (tax_id) within the BAM file.

I attempted the following approach:

  1. I used grep -v "*" to retain the successfully aligned microbial sequences.
  2. I extracted the reference sequence IDs ($3) and the sequences ($10) from the BAM file and generated a FASTA file.
  3. I downloaded the accession2taxid file from NCBI and attempted to match the reference sequence IDs from the BAM file with the accessionID in the accession2taxid file. I then replaced the header lines in the FASTA file with the corresponding tax_id.
  4. This resulted in a FASTA file with headers representing tax_id and their corresponding sequences.
  5. To validate this approach, I compared the tax_id in the resulting FASTA file with those in the score.txt file generated by PathSeqPipelineSpark, but I found that the tax_id from score.txt could not be found in the FASTA file.

I am unsure where I went wrong in this process. Could someone please guide me on how to correctly process the bacteria.bam file and score.txt file from PathSeqPipelineSpark to construct a phylogenetic tree? Alternatively, if there is another method to achieve this, I would greatly appreciate your advice.

Thank you for your assistance!

Best regards,
Tongzhen

pathseq Phylogenetic-tree • 159 views
ADD COMMENT

Login before adding your answer.

Traffic: 1327 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6