Hello everyone,
I identified some lincRNAs from data covering different light treatments and tissues. Now I want to look at differential expression of these lincRNAs across the tissue- and light-specific samples. To do that, I aligned the raw reads of each sample (tissue- and light-specific) to the predicted lincRNA sequences and then tried to generate a count file with featureCounts, but it gave only zeros, while samtools idxstats reports mapped reads. The lincRNA file looks like:
>SL3.0ch00:1230877-1231088
GCCTATATATAGAGTACTAAATTCCTTAAAAAGGCATCTCGGAAGTTCCATAAATAGATCAAGATATCGAATAAGAGAGGTCAAATGTAAATCATCCTAGTTCGAGAAATACGCCACTAACGACCCTCGAATCATACAAAATCATGGAGAGTAGAATTAAGGGATCAATAGAATTGTACCGCTG
>SL3.0ch00:1147815-1150318
GAGCAGCCATAGAACAAAAGCAGTTGTGGGTGAGCTGGTTTAAACCTCCTCAATAAGAGGCGTGCGCACCAACAAGCGAGGGTTTGAATCCCACCAGTAGCATTTATTTTTTTAAAAAAATT
samtools idxstats accepted_hits.sort.bam > samid.txt
which looks like:
SL3.0ch00:1230877-1231088 211 0 0
SL3.0ch00:1147815-1150318 2503 0 0
SL3.0ch00:1147485-1147759 274 0 0
SL3.0ch00:1150608-1151254 646 4 0
Because these lincRNAs come from intergenic regions, they are not covered by the original GTF file (available on the Sol Genomics website), so I created a GTF file from the lincRNA file itself, which looks like this:
SL3.0ch00 myIntergenic intergenic 1230877 1231088 . + . gene_id "xxx"; trancript_id "SL3.0ch00:1230877-1231088";
SL3.0ch00 myIntergenic intergenic 1147815 1150318 . + . gene_id "xxx"; trancript_id "SL3.0ch00:1147815-1150318";
featureCounts -T 60 -F GTF -t intergenic -a intergenic_created.gtf -o feature4.bed SLY_veg06_sorted.bam
This is what I used for featureCounts, but I got zero counts for all transcripts. Kindly guide me on how to solve this.
As I showed above, the GTF was created manually from the idxstats file only, sir, and it still did not give.... I changed my database also:
I even pasted SL3.0ch00:1230877-1231088 into the first column of the GTF to get a match (I know it is not valid), but this also did not work. I tried every combination.
Get the reads from samtools idxstats instead of featureCounts, although this would include multimappers in the counting.
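If you go the idxstats route, a minimal Python sketch for turning the idxstats output into a count table could look like the following. It assumes the standard four idxstats columns (reference name, length, mapped reads, unmapped reads); the function names and the sample label are made up for illustration, and the example rows are the ones posted above.

```python
def parse_idxstats(text):
    """Return {reference_name: mapped_reads} from `samtools idxstats` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, _length, mapped, _unmapped = fields
        if name == "*":
            continue  # the '*' row holds reads that have no reference
        counts[name] = int(mapped)
    return counts


def count_matrix(samples):
    """Merge per-sample counts into rows: a header, then one row per transcript.

    `samples` maps a sample label to the dict returned by parse_idxstats.
    Transcripts missing from a sample get a count of 0.
    """
    labels = list(samples)
    names = sorted({name for counts in samples.values() for name in counts})
    rows = [["transcript"] + labels]
    for name in names:
        rows.append([name] + [samples[label].get(name, 0) for label in labels])
    return rows


# Rows copied from the idxstats output in the question.
example = """SL3.0ch00:1230877-1231088 211 0 0
SL3.0ch00:1147815-1150318 2503 0 0
SL3.0ch00:1147485-1147759 274 0 0
SL3.0ch00:1150608-1151254 646 4 0"""

# "sample1" is a placeholder label; add one entry per BAM file.
for row in count_matrix({"sample1": parse_idxstats(example)}):
    print("\t".join(str(value) for value in row))
```

The resulting table has one column per sample and holds raw integer counts. Keep in mind the caveat above: these numbers still include multimapping reads.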
But then your coordinates in the GTF are not 1230877 to 1231088, but rather 1 to 211, since the reads were aligned to the lincRNA sequences and not to the genome.
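To illustrate the point about coordinates, here is a rough Python sketch that writes GTF lines in transcript coordinates. It assumes the first column has to match the reference name in the BAM (the full FASTA header) and that each feature spans 1 to the sequence length. The function name is hypothetical, and the source and feature labels are simply copied from the GTF in the question. Each transcript gets its own gene_id here because featureCounts groups features by gene_id by default, so a shared "xxx" would be summed into a single row.

```python
def transcript_gtf(lengths, source="myIntergenic", feature="intergenic"):
    """Build GTF lines where every transcript is its own reference sequence.

    `lengths` maps the reference name as it appears in the BAM header
    (e.g. "SL3.0ch00:1230877-1231088") to its sequence length.
    """
    lines = []
    for name, length in lengths.items():
        attributes = f'gene_id "{name}"; transcript_id "{name}";'
        fields = [
            name,          # seqname: must equal the BAM reference name
            source,
            feature,
            "1",           # start: first base of the transcript sequence
            str(length),   # end: last base of the transcript sequence
            ".",           # score
            "+",           # strand of the reference sequence itself
            ".",           # frame
            attributes,
        ]
        lines.append("\t".join(fields))
    return lines


# Names and lengths taken from the idxstats output in the question.
lengths = {
    "SL3.0ch00:1230877-1231088": 211,
    "SL3.0ch00:1147815-1150318": 2503,
}
print("\n".join(transcript_gtf(lengths)))
```

The printed lines can be saved to a file and passed to featureCounts with the same -t intergenic option as in the question.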
I tried the -O parameter of featureCounts and it still did not work. And 211 is the length of the transcript. Can I create a count file from the idxstats output? Will it be OK for further differential gene expression analysis?