Question

Phased Genome, Want reads From Both Alleles

1

Entering edit mode

7.1 years ago

kpr ▴ 80

I have a phased genome, and I am trying to generate a counts matrix that contains the number of reads that map to both alleles. For example, something like this.

                  Wild1     Wild2     Wild3     MT1      MT2    MT3
C1_000001          0          0         0        0        0      0      
C1_000002          10         9         8        1        1      1      
C1_000003          0          0         0        10       12     10      
C1_000004          6          5         7        0        0      0

Right now I have used bowtie, tophat, and have an accepted_hits file, aln.bam.

I have used samtools to sort the aln.bam but am having trouble with the next step. Based on what I have read so far, I think my next step is to generate a consensus sequence?

samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

I want to make sure I am understanding this step, and subsequent steps.

I don't understand this portion of the above line of code, and am having trouble finding documents on it :

vcfutils.pl vcf2fq > cns.fq
I have read documentation saying that we can use some of these functions to generate a consensus across samples, or alleles. I want to make sure I am doing it by alleles.
It looks like this will generate a consensus fastq file. Not sure what the next steps would be.
Any other suggestions besides "Google Allele Specific Pipelines" would be helpful. Perhaps a link of one in particular that you would recommend.

Update: I also don't need the whole matrix. Figuring out a way just to do it for one gene would be sufficient.

Thanks in advance!

rna-seq • 1.5k views

ADD COMMENT • link 7.1 years ago by kpr ▴ 80

1

Entering edit mode

Point 4 is quite demanding. People help whichever way they can.

ADD REPLY • link 7.1 years ago by GouthamAtla 12k

0

Entering edit mode

I'm not trying to be rude, I just need something a little more than that at this point.

ADD REPLY • link 7.1 years ago by kpr ▴ 80

score 1 · Answer 1 · 2018-05-01

1

Entering edit mode

7.1 years ago

GouthamAtla 12k

Use ASEReadCounter from GATK.

ADD COMMENT • link 7.1 years ago by GouthamAtla 12k

score 1 · Answer 2 · 2018-05-02

1

Entering edit mode

7.1 years ago

colindaven 7.5k

This whole project is very demanding. I would recommend looking at Phaser https://github.com/secastel/phaser

However, I spent a long time working on this, and got very little out. Nanopore is a better option for getting phased alleles in my opinion. Good luck.