Identify genes for mapped reads with combined human-7HPV genome index
0
0
Entering edit mode
15 months ago
r.j.lock • 0

Hi,

I have created a combined genome index with a human genome and 7 HPV genomes. I am running STAR aligner against this index with a number of cancer samples. From the idxstats file, I can see some mapped reads for a particular HPV chromosome:

M14119.1        7931    6       0

If I view the sorted bam file and grep for this chromosome, I get the following information:

samtools view B01.Aligned.out.sorted.bam | grep M14119

SRR16874293.1084319     99      M14119.1        3414    255     3S21M3S =       3414    21      CCAACGGCGTGTCGGCGCCGCATAAGT     CCCCCCCCCCCCCCCCCCCCCCCCCC-     NH:i:1  HI:i:1  AS:i:37 nM:i:2  NM:i:1  MD:Z:18C2       jM:B:c,-1       jI:B:i,-1       MC:Z:4S21M2S
SRR16874293.679795      99      M14119.1        3414    255     3S21M3S =       3414    21      CCAACGGCGTGTCGGCGCCGCATAAGT     CCCC;C;CCCCCCCCCCCCCCCCCCCC     NH:i:1  HI:i:1  AS:i:37 nM:i:2  NM:i:1  MD:Z:18C2       jM:B:c,-1       jI:B:i,-1       MC:Z:4S21M2S
SRR16874293.1375037     99      M14119.1        3414    255     3S21M3S =       3414    21      CCAACGGCGTGTCGGCGCCGCATAAGT     CCCCCCCCCCCCCCCCCCCCCCCCCCC     NH:i:1  HI:i:1  AS:i:37 nM:i:2  NM:i:1  MD:Z:18C2       jM:B:c,-1       jI:B:i,-1       MC:Z:4S21M2S
SRR16874293.1084319     147     M14119.1        3414    255     4S21M2S =       3414    -21     TCCAACGGCGTGTCGGCGCCGCATAAG     -CCCCCCCCCCCCCCCCCCCCCCCCCC     NH:i:1  HI:i:1  AS:i:37 nM:i:2  NM:i:1  MD:Z:18C2       jM:B:c,-1       jI:B:i,-1       MC:Z:3S21M3S
SRR16874293.679795      147     M14119.1        3414    255     4S21M2S =       3414    -21     ACCAACGGCGTGTCGGCGCCGCATAAG     ;CC;CCCCCCCCCCCCCCCCCCCCCCC     NH:i:1  HI:i:1  AS:i:37 nM:i:2  NM:i:1  MD:Z:18C2       jM:B:c,-1       jI:B:i,-1       MC:Z:3S21M3S
SRR16874293.1375037     147     M14119.1        3414    255     4S21M2S =       3414    -21     TCCAACGGCGTGTCGGCGCCGCATAAG     -CCCCCCCCCCCCCCCCCCCCCCCC;C     NH:i:1  HI:i:1  AS:i:37 nM:i:2  NM:i:1  MD:Z:18C2       jM:B:c,-1       jI:B:i,-1       MC:Z:3S21M3S

Could you please point me to how to link this to the genes in the HPV chromosome?

In terms of workflow, I'm doing STAR align on the combined human-7HPV genome against the samples > unsorted bam > coordinate sorted bam, then reviewing the idx file to spot mapped reads. Is there an easy way to output where there are alignments with HPV and then to get the genes per HPV chromosome? I'm not sure of the best tools to use for this.

Sorry for the total newbie questions.

Thanks,
Richard

star samtools • 530 views
ADD COMMENT
0
Entering edit mode

samtools view B01.Aligned.out.sorted.bam | grep M14119

you don't have to grep if the bam is sorted and indexed:

samtools view B01.Aligned.out.sorted.bam M14119.1

Could you please point me to how to link this to the genes in the HPV chromosome?

get a BED with the HPV genes (CHROM/START/END/GENE_NAME) , quey the BAM using

samtools view -M -L hpv.bed B01.Aligned.out.sorted.bam

and then use bedtools intersect to associate each read to a gene.

ADD REPLY
0
Entering edit mode

Thank you very much Pierre. I will try that.

ADD REPLY

Login before adding your answer.

Traffic: 1944 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6