Here is what I plan to do:
- map reads to the reference genome
- count reads with featureCounts
- get DEGs with edgeR
- extract the promoter region (1000 bp upstream of the TSS) of each DEG
I'm now at step 2. After generating the count file, I found that some genes have more than one start site in the 'Start' column of the featureCounts output file. For example:
Geneid Chr Start End
4933401J01Rik chr1 3073253 3074322
Gm26206 chr1 3102016 3102125
Xkr4 chr1;chr1;chr1;chr1;chr1;chr1;chr1 3205901;3206523;3213439;3213609;3214482;3421702;3670552 3207317;3207317;3215632;3216344;3216968;3421901;3671498
Gm18956 chr1 3252757 3253236
If each gene had just one start site in the 'Start' column, I think I could extract the promoter region using bedtools. But since some genes have more than one TSS (e.g. Xkr4), how do I extract all of their promoter regions? Any suggestions?
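For the gene-level case, here is a minimal Python sketch of how one promoter interval per row could be derived. It rests on some assumptions: the semicolon-separated Start/End values are read as one entry per exon merged into the gene (which is how I understand the featureCounts output), the Strand column that featureCounts also writes is available even though it is not shown in my excerpt, and the coordinates are 1-based as in the GTF. The function name `gene_promoter_bed` and the `upstream` parameter are made up for illustration.

```python
def gene_promoter_bed(geneid, chrom, starts, ends, strand, upstream=1000):
    """Return one BED interval (0-based, half-open) upstream of the gene's
    outermost 5' end, taken from a featureCounts annotation row.

    Assumes 1-based inclusive coordinates and semicolon-separated per-exon
    values in the Chr/Start/End/Strand columns.
    """
    chrom_name = chrom.split(";")[0]
    strand_char = strand.split(";")[0]
    start_list = [int(s) for s in starts.split(";")]
    end_list = [int(e) for e in ends.split(";")]

    if strand_char == "+":
        # 5' end of a plus-strand gene is the smallest exon start.
        tss = min(start_list)
        bed_start = max(0, tss - 1 - upstream)  # clamp at chromosome start
        bed_end = tss - 1
    else:
        # 5' end of a minus-strand gene is the largest exon end.
        # Note: not clamped to the chromosome length.
        tss = max(end_list)
        bed_start = tss
        bed_end = tss + upstream

    return (chrom_name, bed_start, bed_end, geneid, ".", strand_char)


# Xkr4 values from the table above; the "-" strand is an assumption here.
row = gene_promoter_bed(
    "Xkr4",
    "chr1;chr1;chr1;chr1;chr1;chr1;chr1",
    "3205901;3206523;3213439;3213609;3214482;3421702;3670552",
    "3207317;3207317;3215632;3216344;3216968;3421901;3671498",
    "-;-;-;-;-;-;-",
)
print("\t".join(str(field) for field in row))
```

This only gives the outermost 5' end per gene, so it would miss alternative TSSs of internal transcripts.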
Thanks.
I updated my post.
Yes, it looks like they are different transcripts. My questions are: 1. How do I determine which transcript is dominant? 2. If I want to extract the promoter sequences for all of them, is there an easy way to do it?
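For question 2, one possible route is to take the TSS of every transcript from the same GTF that was given to featureCounts, instead of from the count table. The sketch below assumes the GTF contains `transcript` records with `gene_id`, `transcript_id` and (optionally) `gene_name` attributes, which may not hold for every annotation source. The function name `transcript_promoters` and the example GTF lines are hypothetical.

```python
import re


def transcript_promoters(gtf_lines, upstream=1000):
    """Yield BED promoter intervals (0-based, half-open), one per distinct
    TSS per gene, from the 'transcript' records of a GTF."""
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        chrom, strand, attrs = fields[0], fields[6], fields[8]
        start, end = int(fields[3]), int(fields[4])  # 1-based inclusive

        gene = (re.search(r'gene_name "([^"]+)"', attrs)
                or re.search(r'gene_id "([^"]+)"', attrs))
        tx = re.search(r'transcript_id "([^"]+)"', attrs)
        if gene is None or tx is None:
            continue

        if strand == "+":
            bed_start, bed_end = max(0, start - 1 - upstream), start - 1
        else:
            # Minus strand: upstream lies beyond the transcript end.
            bed_start, bed_end = end, end + upstream

        # Transcripts sharing a TSS would give identical promoters; keep one.
        key = (chrom, bed_start, bed_end, strand, gene.group(1))
        if key in seen:
            continue
        seen.add(key)
        name = f"{gene.group(1)}|{tx.group(1)}"
        yield (chrom, bed_start, bed_end, name, ".", strand)


# Hypothetical GTF lines, for illustration only.
example_gtf = [
    'chr1\tsrc\ttranscript\t1001\t2000\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GeneA";',
    'chr1\tsrc\ttranscript\t1001\t1800\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "GeneA";',
    'chr1\tsrc\ttranscript\t500\t3000\t.\t-\t.\tgene_id "G2"; transcript_id "T3";',
    'chr1\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "G2"; transcript_id "T3";',
]
promoters = list(transcript_promoters(example_gtf))
for bed in promoters:
    print("\t".join(str(field) for field in bed))
```

The resulting BED lines could then be filtered to the DEGs and passed to something like `bedtools getfasta -fi genome.fa -bed promoters.bed -s` to get strand-aware sequences; `genome.fa` and `promoters.bed` are placeholder file names.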