retain UMIs when converting bam to vcf file
4.1 years ago

I have 10X data from which I would like to extract all of the SNPs at one specific site in one gene, so that I can annotate the downstream data. I am having trouble getting the UMIs to appear in the final VCF output of my pipeline. I am using SAMtools to subset the BAM to only the site of interest, then freebayes to generate the VCF output:

samtools view -b possorted_genome_bam.bam "3:103060270-103060270" > nras_snp.bam
freebayes -f genome.fa nras_snp.bam > nras.vcf

Does anyone know of additional flags I might need to pass to these command-line tools to make my UMIs appear in the final VCF output?

vcf bam RNA-Seq sequencing • 775 views
4.1 years ago

I don't understand how you wish to display "UMIs" in the output. Do you want to display all the unique UMI sequences, or the number of unique UMIs for that gene?

Keep in mind that if multiple reads share a UMI and align to the same gene at different positions, each of them gives you correct sequence information, but all but one will be marked as duplicates, which means freebayes will ignore them. That might not be the behavior you want.

I would like to have UMIs as a column in the VCF file and for freebayes to ignore duplicates.
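
As far as I know, a VCF record summarizes all reads at a site, so there is no obvious per-read column where a UMI could go, and I am not aware of a freebayes flag that copies read tags into the output. One possible workaround is to pull the barcodes straight from the subset BAM and join them to the variant calls afterwards. The sketch below assumes Cell Ranger-style tags (`CB:Z:` for the cell barcode, `UB:Z:` for the corrected UMI). The function name `extract_umis` and the output file name are made up for illustration.

```shell
# Print read name, cell barcode (CB) and corrected UMI (UB) for each
# SAM record on stdin, tab-separated.
# Assumption: tags follow the Cell Ranger convention (CB:Z:, UB:Z:).
# Reads lacking a tag get "NA" in that column.
extract_umis() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^@/ { next }                      # skip SAM header lines
    {
      cb = "NA"; ub = "NA"
      for (i = 12; i <= NF; i++) {     # optional tags start at column 12
        if ($i ~ /^CB:Z:/) cb = substr($i, 6)
        if ($i ~ /^UB:Z:/) ub = substr($i, 6)
      }
      print $1, cb, ub
    }'
}

# Usage (needs samtools; nras_umis.tsv is a hypothetical output name):
#   samtools view nras_snp.bam | extract_umis | sort -u > nras_umis.tsv
```

This only lists which barcodes cover the site. It does not report which allele each read carries, so linking a UMI to the reference or alternate base would need a pileup step on top of this.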
