Extracting reads mapping to CDS regions but not the first 45bp after ATG
1
0
Entering edit mode
5.4 years ago
Adrian Pelin ★ 2.6k

I have an RNA-seq (RPFl Ribosomal Profiling) experiment where I want to focus my analysis on reads mapping to CDS of the human genome, but not the first 15 codons (i.e. first 45bp after ATG start codon).

For this I am trying to build a bed file with coordinates of all CDS and then remove the first 45bp after ATG. There are a lot of issues doing this manually, as for a lot of genes, the first CDS features in a GTF file is shorter than 45bp.

Is there a tool that can do this easily? Thanks.

RNA-Seq bed bam • 1.9k views
ADD COMMENT
2
Entering edit mode
5.4 years ago

using BioAlcidaeJdk http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

(not checked)

$ java -jar dist/bioalcidaejdk.jar -F GTF -f bioalcidae.code  Homo_sapiens.GRCh37.87.gtf.gz 

3   57557938    57558025    ENST00000303436
3   57561274    57561400    ENST00000303436
3   57563042    57563114    ENST00000303436
3   57569624    57569734    ENST00000303436
3   57570110    57570191    ENST00000303436
3   57582800    57582822    ENST00000303436
3   57557938    57558025    ENST00000496292
3   57561274    57561400    ENST00000496292
3   57563042    57563114    ENST00000496292
3   57569624    57569734    ENST00000496292
(...)
ADD COMMENT
0
Entering edit mode

Ok got it to work. It seems that it only picks one transcript per gene. For instance RSAD2 has 5 different transcripts with 2 of these transcripts represented by CDS features (ENST00000382040 and ENST00000442639). However command above only pulls out ENST00000382040. Based on what criteria does it decide which transcript to fish out? Thanks.

ADD REPLY
1
Entering edit mode

ENST00000442639 is discarded because there is no stop codon (Incomplete CDS) http://www.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000134321;r=2:6865828-6887053;t=ENST00000442639

ADD REPLY
0
Entering edit mode

Understood, makes sense. Pierre, thanks a lot for your help. I expected a simpler code and doubt I would be able to tailor it for an additional function. Is it difficult to also remove the last 5 codons (15 bp before stop codon) simultaneous to removing the first 45bp? Sorry I didn't ask right away, thought I would figure it out myself.

ADD REPLY
1
Entering edit mode

also remove the last 5 codons

should be something like

positions.subList(45,positions.size()-15);

for + strand. The function sublist is https://docs.oracle.com/javase/8/docs/api/java/util/List.html#subList-int-int-

ADD REPLY
0
Entering edit mode

Oh that's simple! And "positions = positions.subList(15,positions.size()-45);" after "else" indicating negative strand?

ADD REPLY
0
Entering edit mode

yep ! :-)

ADD REPLY

Login before adding your answer.

Traffic: 2310 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6