Question

cuffcompare output results

9.8 years ago

hana ▴ 190

Hi

I'm interested in identifying potential novel isoforms from my RNA-seq data. After running Cuffcompare, how can I get a list of only the novel isoforms (class codes "u" and "j"), extract their sequences, and validate them?

Thank you

RNA-Seq • 6.6k views

updated 2.6 years ago by Ram 44k • written 9.8 years ago by hana ▴ 190

Ram · Answer 1 · 2015-01-22

Actually, the transcripts that Cuffcompare assigns the class codes "x" (cis antisense), "i" (intronic), "u" (intergenic) and "j" (alternatively spliced) are those that are not annotated in the reference GTF file you provided for the RABT assembly.

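You can fetch these transcripts by their class code. Match the class_code attribute itself rather than a fixed field such as $22: the number of attributes (gene_name, nearest_ref, ...) varies from line to line, so the field position is not stable. For example, for the alternatively spliced transcripts:

awk '/class_code "j"/' cuffcompare_combined.gtf > Alternatively_spliced.gtf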
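To get the list of novel transcripts the question asks for, here is a minimal POSIX sh/awk sketch. It assumes column 9 of the combined GTF contains `class_code "..."` and `transcript_id "..."` attributes, as Cuffcompare writes them; `novel_exons` and `novel_ids` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Assumption: tab-separated GTF whose attribute column (9) holds
# class_code "<c>" and transcript_id "<id>" entries, as in Cuffcompare output.

# novel_exons GTF CODES
# Print the exon lines whose one-character class code appears in CODES,
# e.g. CODES=xiuj for antisense, intronic, intergenic and novel-junction transcripts.
novel_exons() {
    awk -F'\t' -v codes="$2" '
        match($9, /class_code "[^"]*"/) {
            # strip the leading class_code " (12 chars) and the closing quote
            code = substr($9, RSTART + 12, RLENGTH - 13)
            if (length(code) == 1 && index(codes, code) > 0) print
        }' "$1"
}

# novel_ids GTF CODES
# Print each matching transcript_id (TCONS_...) once, in order of first appearance.
novel_ids() {
    novel_exons "$1" "$2" | awk -F'\t' '
        match($9, /transcript_id "[^"]+"/) {
            # strip the leading transcript_id " (15 chars) and the closing quote
            id = substr($9, RSTART + 15, RLENGTH - 16)
            if (!(id in seen)) { seen[id] = 1; print id }
        }'
}
```

Usage, for the "u" and "j" transcripts from the question:

novel_exons cuffcompare_combined.gtf uj > novel_uj.gtf
novel_ids cuffcompare_combined.gtf uj > novel_uj_ids.txt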

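Next, apply some filtering, e.g. keep only transcripts longer than 200 nt. Each line of the combined GTF is a single exon, so $5-$4 measures an exon rather than a transcript. Read the file twice to sum the exon lengths per transcript_id first:

awk -F'\t' 'match($9, /transcript_id "[^"]+"/) { id = substr($9, RSTART, RLENGTH) } NR == FNR { len[id] += $5 - $4 + 1; next } len[id] > 200' Alternatively_spliced.gtf Alternatively_spliced.gtf > Alternatively_spliced_200.gtf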

Cuffcompare also writes a separate file, cuffcompare.tracking, which reports the FPKM of each transfrag in each input sample.

You can then set an FPKM threshold and discard the less abundant transcripts.
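A sketch of the FPKM filter, assuming the .tracking layout described in the Cufflinks manual: tab-separated, with columns 5 onward holding one entry per input sample of the form qJ:<gene_id>|<transcript_id>|<FMI>|<FPKM>|<conf_lo>|<conf_hi>|<cov>|<len>, or "-" when the transfrag is absent from that sample. The helper names `expressed_ids` and `keep_ids` are hypothetical, and the cutoff is yours to choose.

```shell
#!/bin/sh
# expressed_ids TRACKING MIN_FPKM
# Print the TCONS ID (column 1) of every transfrag whose FPKM (4th |-separated
# subfield) is >= MIN_FPKM in at least one sample column (columns 5..NF).
expressed_ids() {
    awk -F'\t' -v min="$2" '{
        for (i = 5; i <= NF; i++) {
            if ($i == "-") continue          # transfrag not seen in this sample
            n = split($i, f, "|")
            if (n >= 4 && f[4] + 0 >= min + 0) { print $1; break }
        }
    }' "$1"
}

# keep_ids IDS GTF
# Keep only the GTF lines whose transcript_id is listed in IDS (one ID per line).
keep_ids() {
    awk -F'\t' '
        NR == FNR { keep[$1] = 1; next }
        match($9, /transcript_id "[^"]+"/) {
            id = substr($9, RSTART + 15, RLENGTH - 16)
            if (id in keep) print
        }' "$1" "$2"
}
```

For example, with an illustrative cutoff of 1 FPKM:

expressed_ids cuffcompare.tracking 1 > expressed_ids.txt
keep_ids expressed_ids.txt Alternatively_spliced_200.gtf > Alternatively_spliced_200_expressed.gtf

This keeps a transcript if it passes the cutoff in any one sample; requiring every sample to pass would be a stricter choice.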

Convert the resulting file to BED format (getfasta also accepts GFF/GTF directly) and extract the sequences with bedtools getfasta:

bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> -s

-fi is the reference genome FASTA, -bed gives the coordinates to extract, -fo names the output FASTA file, and -s makes the extraction strand-aware (features on the minus strand are reverse-complemented).
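If you want the BED conversion step explicitly, this hypothetical `gtf_to_bed` helper turns the exon lines of a Cufflinks-style GTF into BED6. GTF is 1-based and inclusive while BED is 0-based and half-open, so only the start coordinate shifts by one; the transcript_id becomes the BED name and the score is set to 0.

```shell
#!/bin/sh
# gtf_to_bed GTF
# Exon lines -> BED6: chrom, start-1, end, transcript_id, score 0, strand.
# Assumption: column 9 carries a transcript_id "..." attribute on every exon line.
gtf_to_bed() {
    awk -F'\t' -v OFS='\t' '
        $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
            print $1, $4 - 1, $5, substr($9, RSTART + 15, RLENGTH - 16), 0, $7
        }' "$1"
}
```

Then, with genome.fa standing in for your reference FASTA:

gtf_to_bed Alternatively_spliced_200.gtf > Alternatively_spliced_200.bed
bedtools getfasta -fi genome.fa -bed Alternatively_spliced_200.bed -fo novel_exons.fa -s -name

This yields one sequence per exon. For spliced transcript sequences you would instead build one BED12 record per transcript and pass -split to getfasta.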

HTH