21 months ago
annaA
Hello,
I am using GATK PathSeq to detect bacterial sequences in RNA-seq mouse samples.
The command I am using is the following:
gatk --java-options "-Xms750g -Xmx750g" PathSeqPipelineSpark \
--input sample.bam \
--filter-bwa-image mm10_genome.fa.img \
--kmer-file mm10_reference.bfi \
--min-clipped-read-length 60 \
--microbe-dict Bacteria_Sequences_removedPlasmid_Contigs_Scaffolds_Oct232015_Fusonec_Fusoval.dict \
--microbe-bwa-image Bacteria_Sequences_removedPlasmid_Contigs_Scaffolds_Oct232015_Fusonec_Fusoval.fasta.img \
--taxonomy-file Microbe.db \
--output sample.pathseq.complete.bam \
--scores-output sample.pathseq.complete_scores.csv \
--is-host-aligned false \
--filter-duplicates false \
--min-score-identity .7
I got some sequences that map to Bacillus, but when I BLAST them against Bacillus I get no hits.
How could that be explained?
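One way to check this is to pull out exactly the reads PathSeq assigned to a given taxon and BLAST only those. The sketch below assumes the output BAM annotates each microbe-mapped read with a `YP:Z:` tag holding a comma-separated list of NCBI taxonomy IDs, as described in the GATK PathSeq documentation. The function name `sam_reads_for_taxid` and the file name `candidates.fasta` are made up for illustration, and the taxonomy ID has to be one that actually occurs in the tags (for example one listed in the scores file), which may be a species-level ID rather than the genus.

```shell
# Hypothetical helper: print FASTA records for reads whose YP tag lists a taxonomy ID.
# Input: SAM text on stdin. Argument $1: the taxonomy ID to look for.
sam_reads_for_taxid() {
  awk -v taxid="$1" -F'\t' '
    /^@/ { next }                          # skip SAM header lines
    {
      for (i = 12; i <= NF; i++) {         # optional tags start at column 12
        if ($i ~ /^YP:Z:/) {
          n = split(substr($i, 6), ids, ",")   # drop the "YP:Z:" prefix, split the IDs
          for (j = 1; j <= n; j++) {
            if (ids[j] == taxid) {
              printf(">%s\n%s\n", $1, $10)     # read name and read sequence
              break
            }
          }
        }
      }
    }'
}
```

Assuming samtools is installed, it could be used as `samtools view sample.pathseq.complete.bam | sam_reads_for_taxid "$TAXID" > candidates.fasta`, where `TAXID` is set to the ID of interest. Comparing the BLAST hits for these reads with the alignments PathSeq reported may show whether the assignments come from short or low-identity matches, which the `--min-score-identity .7` setting above would permit.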
Hi, has your problem been solved? This repository might help: https://github.com/SamGa3/microbiome_reconstruction
I also have a problem of my own, with the input BAM file. A BAM file produced by STAR alignment of human paired-end sequencing data fails to run when given as input to PathSeqPipelineSpark. Do you know why?