Question

BAM to gene names

0

7.3 years ago

hbrwatkins • 0

I am very new to NGS analysis and need some help.

I have a BAM file which I have filtered to contain only mapped pairs of duplicates. I want to find out the gene names that my reads are covering. What should the next step in my workflow be?

Many, many thanks,

Heather Watkins

BAM targeted • 4.4k views

ADD COMMENT • link updated 7.3 years ago by Bastien Hervé 6.4k • written 7.3 years ago by hbrwatkins • 0

1

Do you have a genome annotation (generally in the GTF or GFF3 formats)? If you do, you can count reads mapping to each gene using featureCounts.

ADD REPLY • link 7.3 years ago by h.mon 35k

0

Thank you. I mapped against hg38. Will this work?

ADD REPLY • link 7.3 years ago by hbrwatkins • 0

0

Yes, that should be fine. Just download a GTF (preferably from the same source as your genome fasta) for featureCounts.

ADD REPLY • link 7.3 years ago by WouterDeCoster 48k

0

Thank you very much. I will give this a try.

ADD REPLY • link 7.3 years ago by hbrwatkins • 0

0

Is your BAM file mapped against a reference? Is it one of the commonly available genomes?

ADD REPLY • link 7.3 years ago by GenoMax 152k

0

I mapped it against hg38.

ADD REPLY • link 7.3 years ago by hbrwatkins • 0

score 2 · Answer 1 · 2018-04-11

As you said in the comments, you mapped against the hg38 reference genome, so I suggest downloading the GTF or GFF annotation for this genome ( https://www.gencodegenes.org/releases/current.html ). It will be useful for the last step.

You can process your BAM file with bedtools:

bedtools genomecov -bg -ibam your.bam > bedgraph.csv

That gives you output in BedGraph format: chromosome, start and end positions, and the coverage over each interval. Note that the file is tab-separated despite the .csv name.

Then you can write a script (Python, Perl, whatever you want) that does the following:

For each chromosome/interval in bedgraph.csv (you can also filter out low-coverage intervals here)
Look up this chromosome/interval in your GTF file
Add the corresponding gene to a list if it is not already in it
After the loop, save the list to a text file
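
The steps above could look roughly like this in Python. This is only a minimal sketch, not a tested pipeline: the function names (load_genes, covered_genes, main), the min_coverage parameter and the file names are made up for illustration, and it assumes an uncompressed GTF that has "gene" records with a gene_name (or at least gene_id) attribute.

```python
import re
from collections import defaultdict


def load_genes(gtf_lines):
    """Collect (start, end, name) per chromosome from the 'gene' records of a GTF."""
    genes = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Prefer gene_name; fall back to gene_id if the annotation lacks it.
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match is None:
            match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive.
            genes[fields[0]].append((int(fields[3]), int(fields[4]), match.group(1)))
    return genes


def covered_genes(bedgraph_lines, genes, min_coverage=1):
    """Return the sorted names of genes overlapped by sufficiently covered intervals."""
    found = set()
    for line in bedgraph_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] in ("track", "browser") or line.startswith("#"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if float(fields[3]) < min_coverage:
            continue  # filter out low-coverage intervals
        # BedGraph is 0-based, half-open: [start, end) equals 1-based [start + 1, end].
        for g_start, g_end, name in genes.get(chrom, ()):
            if g_start <= end and g_end >= start + 1:
                found.add(name)  # a set keeps each gene only once
    return sorted(found)


def main(gtf_path, bedgraph_path, out_path, min_coverage=1):
    """Read both files and write one gene name per line to out_path."""
    with open(gtf_path) as gtf:
        genes = load_genes(gtf)
    with open(bedgraph_path) as bedgraph:
        names = covered_genes(bedgraph, genes, min_coverage)
    with open(out_path, "w") as out:
        out.write("\n".join(names) + "\n")
```

You would call it with something like main("annotation.gtf", "bedgraph.csv", "genes.txt", min_coverage=5). The inner loop scans every gene on the chromosome for each interval, which is simple but slow on a whole-genome BedGraph; sorting the genes and using a binary search or an interval tree would scale better.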

Some tools may already handle this task, but this is how I would do it.

Or (as suggested in the comments):

Run featureCounts on your BAM with your GTF
Filter out genes with 0 counts
Use BioMart to get the current gene names
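
For the filtering step, here is a small Python sketch. It assumes the usual featureCounts output layout: a first comment line starting with "#", then a tab-separated header with Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. The function name expressed_genes and the min_count parameter are made up for this example.

```python
import csv


def expressed_genes(count_lines, min_count=1):
    """Return the IDs of genes whose summed count across all BAMs is >= min_count."""
    # Drop the leading '# Program:featureCounts ...' comment line(s).
    rows = csv.reader(
        (line for line in count_lines if not line.startswith("#")),
        delimiter="\t",
    )
    next(rows)  # skip the header row (Geneid, Chr, Start, End, Strand, Length, ...)
    first_sample = 6  # assumed: count columns start after the six annotation columns
    kept = []
    for row in rows:
        if len(row) <= first_sample:
            continue  # skip malformed or empty lines
        # float() also accepts fractional counts, in case they were requested.
        if sum(float(value) for value in row[first_sample:]) >= min_count:
            kept.append(row[0])
    return kept
```

You would use it as expressed_genes(open("counts.txt")), where counts.txt stands for whatever file you gave to featureCounts with -o. By default the Geneid column holds the gene_id attribute from the GTF (Ensembl IDs for a GENCODE annotation), which is why the BioMart step is needed to turn them into gene names.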