Entering edit mode
5.4 years ago
Adrian Pelin
★
2.6k
I have an RNA-seq (RPFl Ribosomal Profiling) experiment where I want to focus my analysis on reads mapping to CDS of the human genome, but not the first 15 codons (i.e. first 45bp after ATG start codon).
For this I am trying to build a bed file with coordinates of all CDS and then remove the first 45bp after ATG. There are a lot of issues doing this manually, as for a lot of genes, the first CDS features in a GTF file is shorter than 45bp.
Is there a tool that can do this easily? Thanks.
Ok got it to work. It seems that it only picks one transcript per gene. For instance RSAD2 has 5 different transcripts with 2 of these transcripts represented by CDS features (ENST00000382040 and ENST00000442639). However command above only pulls out ENST00000382040. Based on what criteria does it decide which transcript to fish out? Thanks.
ENST00000442639 is discarded because there is no stop codon (Incomplete CDS) http://www.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000134321;r=2:6865828-6887053;t=ENST00000442639
Understood, makes sense. Pierre, thanks a lot for your help. I expected a simpler code and doubt I would be able to tailor it for an additional function. Is it difficult to also remove the last 5 codons (15 bp before stop codon) simultaneous to removing the first 45bp? Sorry I didn't ask right away, thought I would figure it out myself.
should be something like
for + strand. The function sublist is https://docs.oracle.com/javase/8/docs/api/java/util/List.html#subList-int-int-
Oh that's simple! And "positions = positions.subList(15,positions.size()-45);" after "else" indicating negative strand?
yep ! :-)