Identifying 3' UTRs for multiple transcripts of genes from a SummarizedExperiment
23 days ago
RNAseqer ▴ 280

Hello all,

I have a SummarizedExperiment containing Salmon-estimated transcript counts. I am primarily interested in getting the 3' UTR sequences for all of a gene's alternative transcripts that have a distinct 3' UTR, and I'm wondering what the most efficient way of getting this data would be. Previously, when working with gene counts, I used the BioMart MANE Select option to download a single representative 3' UTR, but I am unsure how to get transcript-specific 3' UTRs from BioMart, if that is even possible. Alternatively, is there another database you would recommend? Please let me know your thoughts.

Thanks!

biomart transcriptome utr • 316 views
23 days ago

Get the exons of the protein-coding genes and subtract the CDS coordinates.

# Requires bash (process substitution), wget, awk and bedtools.
# -a: exons annotated as protein_coding; -b: CDS. Both are converted from 1-based GTF to 0-based BED and sorted.
# The subtraction leaves the non-CDS parts of exons (5' and 3' UTRs together); merge collapses overlapping intervals.
bedtools subtract \
  -a <(wget -qO - "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz" | gunzip -c | awk -F '\t' '($3=="exon" && $9 ~ /protein_coding/) {printf("%s\t%d\t%s\n",$1,int($4)-1,$5);}' | LC_ALL=C sort -T . -t $'\t' -k1,1 -k2,2n ) \
  -b <(wget -qO - "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz" | gunzip -c | awk -F '\t' '($3=="CDS" && $9 ~ /protein_coding/) {printf("%s\t%d\t%s\n",$1,int($4)-1,$5);}' | LC_ALL=C sort -T . -t $'\t' -k1,1 -k2,2n ) |\
  LC_ALL=C sort -T . -t $'\t' -k1,1 -k2,2n | bedtools merge
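
The command above pools 5' and 3' UTR intervals across all transcripts, so the link to individual transcripts is lost. If transcript-specific 3' UTRs are needed, one possible variation is sketched below. It is only a sketch, and it assumes that the GTF marks untranslated regions with the feature type `UTR` (as the GENCODE GTF appears to) and that every CDS and UTR line carries a `transcript_id` attribute. The function name `three_prime_utrs` and the file name `genome.fa` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: per-transcript 3' UTR intervals from a GENCODE-style GTF read on stdin.
# Assumption: UTRs are annotated with feature type "UTR" and carry a transcript_id.
# Output: BED6 (chrom, 0-based start, end, transcript_id, ".", strand).
three_prime_utrs() {
  awk -F '\t' '
    # Pull the transcript_id value out of the attribute column.
    function tid(attr) {
      if (!match(attr, /transcript_id "[^"]+"/)) return ""
      return substr(attr, RSTART + 15, RLENGTH - 16)
    }
    /^#/ { next }
    # Track the lowest and highest CDS coordinate of each transcript.
    $3 == "CDS" {
      t = tid($9)
      if (!(t in cds_min) || $4 + 0 < cds_min[t]) cds_min[t] = $4 + 0
      if (!(t in cds_max) || $5 + 0 > cds_max[t]) cds_max[t] = $5 + 0
    }
    # Store every UTR line; it is classified once all CDS lines are known.
    $3 == "UTR" {
      n++
      chrom[n] = $1; s[n] = $4 + 0; e[n] = $5 + 0; strand[n] = $7; tx[n] = tid($9)
    }
    END {
      for (i = 1; i <= n; i++) {
        t = tx[i]
        if (!(t in cds_max)) continue   # no CDS seen, so the UTR side is undefined
        # Three-prime side: after the CDS on "+", before it in coordinates on "-".
        if ((strand[i] == "+" && s[i] > cds_max[t]) ||
            (strand[i] == "-" && e[i] < cds_min[t]))
          printf("%s\t%d\t%d\t%s\t.\t%s\n", chrom[i], s[i] - 1, e[i], t, strand[i])
      }
    }'
}

# Possible usage (genome.fa is a placeholder for a matching genome FASTA):
#   gunzip -c gencode.v47.annotation.gtf.gz | three_prime_utrs > three_prime_utrs.bed
#   bedtools getfasta -s -name -fi genome.fa -bed three_prime_utrs.bed
```

A 3' UTR that spans several exons comes out as several intervals with the same transcript ID, so the per-interval sequences would still need to be joined in transcript order (reversed for minus-strand transcripts). The transcript IDs in the fourth column can then be matched against the row names of the Salmon-derived object, provided both use the same annotation release and ID versioning.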

Thanks! This worked like a charm. I appreciate the help.
