Where to find hg38 CDS without UTRs
6.3 years ago
ta_awwad ▴ 350

Hello everyone, I am looking for human protein-coding transcript sequences without UTRs, in FASTA format if possible. Any idea where to find such a file?

Best, TA

RNA-Seq genome assembly alignment • 1.3k views
6.3 years ago
GenoMax 147k

You should be able to get them from BioMart at Ensembl. A video tutorial for BioMart is available.

6.3 years ago
  1. Get a GTF with all annotations (for all protein-coding genes).
  2. Remove the 3' and 5' UTR features from the GTF.
  3. Use a tool such as bedtools getfasta to extract the protein-coding transcript sequences.
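
In case it helps, steps 1 and 2 could be sketched roughly like this in Python. This is only a minimal sketch: it assumes a standard tab-separated, 9-column GTF with a `transcript_id` attribute. The function name `gtf_cds_to_bed` and the toy records are made up for illustration. Instead of deleting the UTR rows, it keeps only the `CDS` rows, which has the same effect, and converts the 1-based GTF start to the 0-based start that BED uses.

```python
import re


def gtf_cds_to_bed(gtf_lines):
    """Yield BED6 tuples for every CDS feature found in GTF lines."""
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only CDS rows; exon, UTR, start/stop codon rows are dropped.
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        chrom, start, end, strand, attrs = (
            fields[0], fields[3], fields[4], fields[6], fields[8]
        )
        match = re.search(r'transcript_id "([^"]+)"', attrs)
        name = match.group(1) if match else "."
        # GTF is 1-based inclusive; BED is 0-based half-open.
        yield (chrom, int(start) - 1, int(end), name, "0", strand)


# Toy GTF records (hypothetical gene G1 / transcript T1).
example_gtf = [
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tHAVANA\tUTR\t100\t119\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tHAVANA\tCDS\t120\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
]

bed_rows = list(gtf_cds_to_bed(example_gtf))
for row in bed_rows:
    print("\t".join(map(str, row)))
```

The printed rows are in BED6 layout, so written to a file they should be usable as the interval input for step 3.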

Simply grepping or awking for 'CDS' as the feature type in the GTF, then running bedtools getfasta on the resulting coordinates, does the job.
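
One thing to watch: extracting sequence per CDS row gives one sequence per coding segment, not one per transcript. A rough Python sketch of stitching the segments back together is below. It assumes the genome is already loaded as a dict of chromosome name to sequence string and that the CDS rows are BED6-style tuples with the transcript ID in the name column. `stitch_cds`, `toy_genome`, and `toy_rows` are hypothetical names used only for this example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def stitch_cds(bed_rows, genome):
    """Join CDS segments per transcript into one coding sequence.

    bed_rows: iterable of (chrom, start0, end, transcript_id, score, strand)
    genome:   dict mapping chromosome name -> sequence string
    """
    segments = defaultdict(list)
    for chrom, start, end, tid, _score, strand in bed_rows:
        segments[tid].append((chrom, start, end, strand))

    cds = {}
    for tid, segs in segments.items():
        # Concatenate in genomic order first ...
        segs.sort(key=lambda s: s[1])
        seq = "".join(genome[c][s:e] for c, s, e, _ in segs)
        # ... then flip minus-strand transcripts so they read 5' to 3'.
        if segs[0][3] == "-":
            seq = reverse_complement(seq)
        cds[tid] = seq
    return cds


# Toy data: T1 on the plus strand, T2 on the minus strand.
toy_genome = {
    "chr1": "CCATGGCAGTTTTAAATGACC",
    "chr2": "GGTTTACCGTCATAA",
}
toy_rows = [
    ("chr1", 13, 16, "T1", "0", "+"),
    ("chr1", 2, 8, "T1", "0", "+"),
    ("chr2", 10, 13, "T2", "0", "-"),
    ("chr2", 2, 5, "T2", "0", "-"),
]

cds = stitch_cds(toy_rows, toy_genome)
for tid, seq in sorted(cds.items()):
    print(f">{tid}\n{seq}")
```

Also note that in GTF the stop codon is annotated as a separate `stop_codon` feature rather than as part of `CDS`, so sequences built this way would normally end just before the stop.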
