cuffcompare output results
9.8 years ago
hana ▴ 190

Hi

I'm interested in identifying potential novel isoforms from my RNA-seq data. After running cuffcompare, how can I get a list of only the novel isoforms (class codes "u" and "j"), extract their sequences, and validate them?

Thank you

RNA-Seq • 6.6k views
9.8 years ago
Manvendra Singh ★ 2.2k

Actually, transcripts that cuffcompare assigns the class codes "x" (cis-antisense), "i" (intronic), "u" (intergenic) and "j" (alternatively spliced) are the ones not annotated in the GTF file you provided during RABT assembly.

You can fetch these transcripts by their class codes, e.g. for alternatively spliced transcripts:

awk '/class_code "j"/' cuffcompare_combined.gtf > Alternatively_spliced.gtf
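
Since the question asks for both "u" and "j", the two codes can be selected in one pass by matching the class_code attribute by name rather than by column position, because the position of class_code in column 9 can shift when optional attributes are present. A minimal sketch, assuming the combined GTF stores the code as class_code "j"; the toy records and file names below are made up for illustration:

```shell
set -eu

# Toy stand-in for cuffcompare's combined GTF (hypothetical records and file names).
workdir=$(mktemp -d)
gtf="$workdir/toy_combined.gtf"
{
  printf 'chr1\tCufflinks\texon\t100\t400\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; class_code "j";\n'
  printf 'chr1\tCufflinks\texon\t600\t900\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; class_code "j";\n'
  printf 'chr2\tCufflinks\texon\t50\t700\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; class_code "u";\n'
  printf 'chr3\tCufflinks\texon\t10\t500\t.\t+\t.\tgene_id "XLOC_000003"; transcript_id "TCONS_00000003"; exon_number "1"; class_code "=";\n'
} > "$gtf"

# Keep records whose class_code attribute is exactly "u" or "j",
# wherever that attribute sits within column 9.
awk '/class_code "[uj]"/' "$gtf" > "$workdir/novel_u_j.gtf"

cat "$workdir/novel_u_j.gtf"
```

On real data, replace the toy file with your own combined GTF; the awk line is the only part that matters.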

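Next you need to do some filtering, e.g. keep only features longer than 200 bp. Note that this command tests each GTF line separately, so it measures individual exon records rather than the summed transcript length (GTF coordinates are 1-based and inclusive, hence the +1):

awk -F'\t' '$5 - $4 + 1 > 200' Alternatively_spliced.gtf > Alternatively_spliced_200.gtf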
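
If the intent is a cutoff on whole-transcript length, the exon lengths have to be summed per transcript first. A rough sketch of that idea, assuming every line is an exon record carrying a transcript_id attribute; the toy records, file names and the min_len variable are illustrative, not part of cuffcompare:

```shell
set -eu

# Toy GTF (hypothetical): TCONS_00000001 has two short exons that together exceed
# 200 bp; TCONS_00000002 has a single 151 bp exon.
workdir=$(mktemp -d)
gtf="$workdir/toy_spliced.gtf"
{
  printf 'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; class_code "j";\n'
  printf 'chr1\tCufflinks\texon\t300\t450\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; class_code "j";\n'
  printf 'chr2\tCufflinks\texon\t1000\t1150\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; class_code "j";\n'
} > "$gtf"

# The file is read twice: the first pass sums exon lengths per transcript_id,
# the second pass prints the lines of transcripts whose total exceeds min_len.
awk -F'\t' -v min_len=200 '
function tid(attrs) {
    if (match(attrs, /transcript_id "[^"]+"/))
        return substr(attrs, RSTART + 15, RLENGTH - 16)
    return ""
}
NR == FNR { if ($3 == "exon") len[tid($9)] += $5 - $4 + 1; next }
len[tid($9)] > min_len
' "$gtf" "$gtf" > "$workdir/spliced_min200.gtf"

cat "$workdir/spliced_min200.gtf"
```

Here the first transcript is kept even though neither of its exons is longer than 200 bp on its own, which a per-line filter would have discarded.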

You also get a separate file, cuffcompare.tracking, which contains FPKM values for the transcripts at each detected locus.

You can then set an FPKM threshold and filter out the transcripts that are less abundant.
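
The FPKM cutoff could be sketched as below. This assumes the tracking layout described in the Cufflinks manual: tab-separated columns for the transfrag id, locus id, reference id and class code, followed by one column per sample of the form qJ:gene_id|transcript_id|FMI|FPKM|conf_lo|conf_hi|cov|len, with "-" where a sample has no matching transcript. Check this against your own file first; the toy rows, file names and the min_fpkm threshold are invented for illustration:

```shell
set -eu

# Toy tracking file (hypothetical rows in the assumed layout).
workdir=$(mktemp -d)
tracking="$workdir/toy.tracking"
{
  printf 'TCONS_00000001\tXLOC_000001\t-\tu\tq1:CUFF.1|CUFF.1.1|100|12.500000|8.000000|17.000000|20.100000|1500\n'
  printf 'TCONS_00000002\tXLOC_000002\tGeneA|NM_0001\tj\tq1:CUFF.2|CUFF.2.1|100|0.400000|0.000000|1.000000|0.800000|900\n'
  printf 'TCONS_00000003\tXLOC_000003\tGeneB|NM_0002\tj\t-\n'
} > "$tracking"

# Print the ids of transcripts whose FPKM reaches min_fpkm in at least one sample.
# Sample columns start at column 5; FPKM is the 4th "|"-separated value.
awk -F'\t' -v min_fpkm=1 '
{
    keep = 0
    for (i = 5; i <= NF; i++) {
        if ($i == "-") continue
        n = split($i, f, "[|]")
        if (n >= 4 && f[4] + 0 >= min_fpkm) keep = 1
    }
    if (keep) print $1
}' "$tracking" > "$workdir/expressed_ids.txt"

cat "$workdir/expressed_ids.txt"
```

The resulting id list can then be used to subset the class-code-filtered GTF. The threshold of 1 is arbitrary here; pick a value suited to your data.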

Convert the resulting file to BED format and fetch the sequences with bedtools:

bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> -s

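-fo names the output FASTA file, which will contain the sequences for the coordinates you give with -bed, extracted from the reference genome FASTA you give with -fi; -s makes the extraction strand-aware, so features on the minus strand are reverse-complemented.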
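
The conversion and extraction steps might look like the following sketch. It assumes exon records with a transcript_id attribute and writes a 6-column BED (BED starts are 0-based, so the GTF start is shifted by one). The toy records, file names and the GENOME_FA variable are placeholders, and the bedtools call only runs if bedtools is installed and GENOME_FA points to a genome FASTA:

```shell
set -eu

# Toy GTF (hypothetical records and file names).
workdir=$(mktemp -d)
gtf="$workdir/toy_novel.gtf"
bed="$workdir/toy_novel.bed"
{
  printf 'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; class_code "j";\n'
  printf 'chr2\tCufflinks\texon\t1000\t1150\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; class_code "u";\n'
} > "$gtf"

# GTF (1-based, inclusive) -> BED6 (0-based start, exclusive end):
# chrom, start - 1, end, transcript id as name, placeholder score, strand.
awk -F'\t' 'BEGIN { OFS = "\t" }
$3 == "exon" {
    name = "."
    if (match($9, /transcript_id "[^"]+"/))
        name = substr($9, RSTART + 15, RLENGTH - 16)
    print $1, $4 - 1, $5, name, 0, $7
}' "$gtf" > "$bed"

cat "$bed"

# Strand-aware sequence extraction, one FASTA record per BED line (i.e. per exon).
if command -v bedtools >/dev/null 2>&1 && [ -s "${GENOME_FA:-}" ]; then
    bedtools getfasta -fi "$GENOME_FA" -bed "$bed" -s -name -fo "$workdir/novel_exons.fa"
fi
```

Because each BED line is a single exon, this yields exon sequences rather than spliced transcripts. If full transcript sequences are needed, the gffread utility distributed with Cufflinks may be a more direct route.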

HTH
