Hi,
I have created a combined genome index with a human genome and 7 HPV genomes. I am running STAR aligner against this index with a number of cancer samples. From the idxstats file, I can see some mapped reads for a particular HPV chromosome:
M14119.1 7931 6 0
If I view the sorted bam file and grep for this chromosome, I get the following information:
samtools view B01.Aligned.out.sorted.bam | grep M14119
SRR16874293.1084319 99 M14119.1 3414 255 3S21M3S = 3414 21 CCAACGGCGTGTCGGCGCCGCATAAGT CCCCCCCCCCCCCCCCCCCCCCCCCC- NH:i:1 HI:i:1 AS:i:37 nM:i:2 NM:i:1 MD:Z:18C2 jM:B:c,-1 jI:B:i,-1 MC:Z:4S21M2S
SRR16874293.679795 99 M14119.1 3414 255 3S21M3S = 3414 21 CCAACGGCGTGTCGGCGCCGCATAAGT CCCC;C;CCCCCCCCCCCCCCCCCCCC NH:i:1 HI:i:1 AS:i:37 nM:i:2 NM:i:1 MD:Z:18C2 jM:B:c,-1 jI:B:i,-1 MC:Z:4S21M2S
SRR16874293.1375037 99 M14119.1 3414 255 3S21M3S = 3414 21 CCAACGGCGTGTCGGCGCCGCATAAGT CCCCCCCCCCCCCCCCCCCCCCCCCCC NH:i:1 HI:i:1 AS:i:37 nM:i:2 NM:i:1 MD:Z:18C2 jM:B:c,-1 jI:B:i,-1 MC:Z:4S21M2S
SRR16874293.1084319 147 M14119.1 3414 255 4S21M2S = 3414 -21 TCCAACGGCGTGTCGGCGCCGCATAAG -CCCCCCCCCCCCCCCCCCCCCCCCCC NH:i:1 HI:i:1 AS:i:37 nM:i:2 NM:i:1 MD:Z:18C2 jM:B:c,-1 jI:B:i,-1 MC:Z:3S21M3S
SRR16874293.679795 147 M14119.1 3414 255 4S21M2S = 3414 -21 ACCAACGGCGTGTCGGCGCCGCATAAG ;CC;CCCCCCCCCCCCCCCCCCCCCCC NH:i:1 HI:i:1 AS:i:37 nM:i:2 NM:i:1 MD:Z:18C2 jM:B:c,-1 jI:B:i,-1 MC:Z:3S21M3S
SRR16874293.1375037 147 M14119.1 3414 255 4S21M2S = 3414 -21 TCCAACGGCGTGTCGGCGCCGCATAAG -CCCCCCCCCCCCCCCCCCCCCCCC;C NH:i:1 HI:i:1 AS:i:37 nM:i:2 NM:i:1 MD:Z:18C2 jM:B:c,-1 jI:B:i,-1 MC:Z:3S21M3S
Could you please point me to how to link this to the genes in the HPV chromosome?
In terms of workflow, I'm doing STAR align on the combined human-7HPV genome against the samples > unsorted bam > coordinate sorted bam, then reviewing the idx file to spot mapped reads. Is there an easy way to output where there are alignments with HPV and then to get the genes per HPV chromosome? I'm not sure of the best tools to use for this.
Sorry for the total newbie questions.
Thanks,
Richard
you don't have to grep if the bam is sorted and indexed:
samtools view B01.Aligned.out.sorted.bam M14119.1
get a BED with the HPV genes (CHROM/START/END/GENE_NAME) , quey the BAM using
samtools view -M -L hpv.bed B01.Aligned.out.sorted.bam
and then use
bedtools intersect
to associate each read to a gene.Thank you very much Pierre. I will try that.