I have some non-coding region sequence around 5000 and some tissue specific reads data. I want to do differential gene expression analysis. So I want to check which tissue specific reads having these non coding sequences for that I aligned them (where I created database for those non coding sequences) by tophat2. Then I need to have count file from these bam file, used featureCount but it gave zero only.I also used bedtool also it also not worked. I suspect because of those non-coding seq it can't generate count file...may be I don't know. I convert bam to sam file then it looks like this:
> @HD VN:1.0 SO:coordinate
@SQ SN:0 LN:309
@SQ SN:0 LN:381
@SQ SN:1 LN:1085
@SQ SN:1 LN:353
@SQ SN:10 LN:239
@SQ SN:10 LN:205
@SQ SN:10008 LN:659
@SQ SN:1001 LN:291
Can you people tell me what should I do?
I changed "tool" to "question", since tool is meant for promotion of new tools. Also use the
101
button for code, it reads easier for your readers.Your post lacks any details (code) on what you did and which files you used). Please add this. Brief Reminder On How To Ask A Good Question
You first need to figure out why you have duplicate entries with different lengths in your BAM/SAM files. For example, you an a contig 0 and length 309 and another contig 0 with length 381. That's going to break most tools. Further, make sure that your annotations has contig names like 0, 1, 10008 and so on, otherwise featureCounts will always return 0.
Thanks Ryan...I removed duplicate. Here I should mention that I am working novel lincRNA so my annotation file does not have that information.Then how to calculate count? any other way to do it. It may look silly but do not have idea.
Add it to your annotation then.
here I am giving my gtf file which looks like:
from bam file samid generated by following command
samid.txt looks like this:
then give this command
ERROR: no features were loaded in format GTF
and I used bedtool also still it gave zero count.
That "GTF" file isn't in GTF format, you need to fix that. Also, the chromosome names don't match between the two files.
I created GTF file from this command:
so it looks like this:
but how to change bam file ids? in gtf file we can't give "SL3.0ch00:1230877-1231088"...so what should I do?
It looks like your reference fasta file has the contigs split apart into chunks. Honestly I'd try to fix that and realign the data. Then you'd have
SL3.0ch00
and such and the alignments would probably be more accurate.thanks for giving your time Ryan...one question I have..can I create count file from idxstats output? it will be OK for further differential gene expression analysis?
In general that won't produce correct results.