Hi Everyone :)
I have a list of genomic coordinates and want to get only exon sequence for them.
I can get the whole sequence (exon+intron) by Bedrolls getfasta but I want JUST exon sequences.
Thank you :)
Bedrolls sounds like some new kind of sushi roll :-D It is bedtools. Anyway, what you can do is intersect your genomic coordinates with a GFF/GTF file that contains exon coordinates. GFF files, depending on the organism you are working on, are available from GENCODE, NCBI, etc. First, isolate the exons from the GFF and convert them to BED (0-based start):
awk 'BEGIN{FS=OFS="\t"} /^#/ {next} $3 == "exon" {print $1, $4-1, $5}' in.gff3 | sort -k1,1 -k2,2n > exon.bed
Then intersect this exon file with your coordinates:
bedtools intersect -a your_file.bed -b exon.bed > intersection.bed
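If you want the entire exon (even if part of the exon does not overlap with your_file.bed), add the option -wb to the command. It appends the full exon record from exon.bed to each output line, so take the exon coordinates from those appended columns.
Then proceed with bedtools getfasta.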
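To make the exon-extraction step concrete, here is a minimal sketch run on a tiny made-up GFF3. The file name toy.gff3 and its records are hypothetical, invented only for illustration. It shows the shift from 1-based GFF starts to 0-based BED starts.

```shell
# Build a toy GFF3 (hypothetical records, tab-separated).
printf '##gff-version 3\n' > toy.gff3
printf 'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=g1\n' >> toy.gff3
printf 'chr1\tdemo\texon\t300\t500\t.\t+\t.\tParent=g1\n' >> toy.gff3
printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tParent=g1\n' >> toy.gff3

# Skip comment lines, keep exon features only, and write BED columns:
# chromosome, start minus 1 (BED is 0-based), end (unchanged).
awk 'BEGIN{FS=OFS="\t"} /^#/ {next} $3 == "exon" {print $1, $4-1, $5}' toy.gff3 \
  | sort -k1,1 -k2,2n > exon.bed

cat exon.bed
```

The two exons come out sorted by start position, with the gene record and the header line dropped. The resulting exon.bed is what would go into the -b argument of the bedtools intersect call above.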
Thank you ATpoint. Sorry about "Bedrolls", that was a typo ;) I will give it a go. Just wondering if there is a way to get a single exon sequence per interval (the multiple exons joined together), because each interval contains multiple exons? Something like below. THANKS FOR YOUR REPLY :)
chr1 110743176 110749172 gaatctgggtgagcaaatgcttcctgtgaccaacagggtatagtagaagtgatgctatgtgacttccaaggctagattaggaaaggccgtgccacttccacctggtgttctagggatactcattctagaggcagccagctgccatgtaagacagccaaccaccctgagactgccatgctagggaggcgatatgtttgcagatgcttaggttgacagcttcagctgagcttccagccaacagccagtgtcaactgccagccacatgaacacagcatactgaacgtttagcccagctgagcttcagatgtttgcagcccgctgacatctgattgtagctgcataagagaccctaagcaagaactgttcaactgagccctt
gffread has options for extracting sequences given a GFF and a reference sequence (for example, its -w option writes the spliced exon sequence of each transcript): http://ccb.jhu.edu/software/stringtie/gff.shtml
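If you would rather stay with the bedtools output, joining the exon pieces into one sequence per interval could be sketched with awk as below. This is only a sketch under assumptions: the input file pieces.tsv is hypothetical and stands for two tab-separated columns (an interval name, then one exon sequence), with the pieces already sorted by position within each interval. Output like that should be obtainable from bedtools getfasta with -tab and a name-based header option, provided your_file.bed carries a name column, but check this against your bedtools version.

```shell
# Hypothetical input: interval name <TAB> exon sequence, one exon piece per line,
# already in genomic order within each interval.
printf 'interval1\tACGT\ninterval1\tGGCC\ninterval2\tTTAA\n' > pieces.tsv

# Concatenate all pieces that share the same interval name,
# keeping intervals in the order they first appear.
awk 'BEGIN{FS=OFS="\t"}
     !($1 in seq) {order[++n] = $1}
     {seq[$1] = seq[$1] $2}
     END {for (i = 1; i <= n; i++) print order[i], seq[order[i]]}' pieces.tsv > joined.tsv

cat joined.tsv
```

Two caveats: overlapping exons from different transcripts would be duplicated in the joined sequence (running bedtools merge on exon.bed beforehand avoids that), and this plain concatenation ignores strand, so minus-strand genes would still need a reverse complement.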
If you're using R, you could use the biomaRt package.
Thanks caggtaagtat! There is no assembly/annotation for rat (rnor 4) in biomart. Only rnor 6 is available :(
Hi, could you tell me how you solved the last problem about the joining of multiple fasta exons? Thank you!
Please explain what you mean.