I take the BAM file as input and perform RNA-Seq. The program prints out a list of genes to which the reads match. Some of the genes in the list overlapping in the same region on the chromosome. I want to add code to the program that reports overlapping genes. How can I do that ?
For example in the picture below, the blast result of 7 overlapping genes in the gene list.
Thank you for the help.
While you are performing RNAseq/BAM the image you show above is from a blast search. Just to clarify, you are referring to "genes" from different hits(organisms) when you refer to the image above, correct?
Is your question about the RNAseq or are you using
magicblast
to create a BAM file? What is in your RNAseq reference? It would be unusual to have multiple species/transcriptomes at the same time.I've included the blast image just to explain the overlapping genes. I have a program that takes the genome aligned BAM file as input and assigns the reads to the genes. All of the genes belong to a single organism (homo sapiens). The program simply takes a bam file as input and scores the reads from the samples to the genes. I want to add the following feature to the program; finding which of the reported genes overlap and reporting them in a separate tsv etc. format. (My question is about RNA-Seq and for genes from the same organism.)
You also provide it with a list of gene descriptions, such as a GTF or GFF file - correct? (if not, how would your program know the gene locations?). There are many ways to deal with overlapping entries in GFF files, and various ways for a program to count reads on entries originating in GFF files. What program are you using? What is the source of your annotation file (e.g. GFF file)?
I just give the program a bam file and an optional bai index file. It counts the reads and assignment to the genes done with the pysam module.
You're still leaving out information which is important for answering your question - is this a program that you wrote? If you didn't write the program, is it a known program? Publicly available? Whatever program you're using must be reaching for gene descriptions in some format, as these are not contained within a BAM file. Without knowing more details, I don't see how this question can be answered.
I'm tiring you out because I'm a beginner, sorry for that. I want to explain it to you in a more understandable way. My code do this: Quantifies scRNA-seq BAM files. Takes alignment and single cell barcode annotated BAM files as input, and writes the quantification results to barcodes.tsv, genes.tsv and matrix.mtx format files, and produces MatrixMarket coordinate format barcode-by-gene read tables. (Note: I am reading the BAM file with the pysam.AlignmentFile function of the pysam module.) I want to add a new function to the code. I will add a flag that finds out which of the reported genes (written in genes.tsv file) are overlapping in the same region of the chromosome. I want to share my code with you for better understanding. Can you give me your e-mail address?