Hello! I downloaded the GRCh38 human genome indexes from ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/ and aligned my paired-end reads against them. After getting the .bam file, I converted it to .sam using
**"samtools view -h -o output.sam input.bam"**
Then I ran
**"htseq-count output.sam Homo_sapiens.GRCh37.75.gtf"**
Then I sorted the .sam file using the command
**"samtools sort -O sam -T 13.sam -o 13.sort.sam 13.sam"**
but the output looks like this:
...........................................................................................................................................................
Warning: Read SRR993713.6118094 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read SRR993713.122141 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read SRR993713.10023628 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
...........................................................................................................................................................
After that, it lists the Ensembl gene IDs, all with 0 counts. Please tell me what is wrong. Am I using the wrong GTF file, or is the way I sorted my file wrong?
Why?
I learnt the pipeline from someone who told me that I need to align the reads and then get the output in .sam format, but TopHat gave me the result in BAM format, so I thought I should convert it to SAM. What is the desired TopHat output format for getting raw counts?
You can use featureCounts to get a count matrix directly from your BAM files.
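For reference, a featureCounts call for paired-end data might look roughly like the sketch below. The wrapper name `count_with_featurecounts` and the file names in the usage line are made up for illustration; `-p`, `-a` and `-o` are standard featureCounts options. As far as I know, recent Subread releases also want `--countReadPairs` next to `-p` if you want fragments counted rather than individual reads, so check the help output of your version. The GTF has to come from the same build as the index you aligned to.

```shell
# Hypothetical wrapper around featureCounts for one paired-end BAM file.
#   $1 = BAM file from the aligner
#   $2 = GTF annotation from the SAME genome build as the alignment index
#   $3 = output table (gene IDs plus counts)
count_with_featurecounts() {
    # -p : the library is paired-end
    # -a : annotation file (GTF)
    # -o : output file
    featureCounts -p -a "$2" -o "$3" "$1"
}

# Example call (file names are placeholders):
# count_with_featurecounts accepted_hits.bam annotation.GRCh38.gtf counts.txt
```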
It's not working at all :(
It is not working. I spent 3 hours on it :(
What is not working?
Yes, you are. As @Ram pointed out, why are you mixing genome builds? This is a surefire way of getting erroneous results. Consider yourself fortunate that you got 0 counts; otherwise you might have continued on your merry way with completely incorrect data.
Then what's the solution?
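One quick sanity check for an annotation mismatch is to compare the sequence names in the SAM header with those in the GTF. This is only a sketch: `shared_contigs` is a made-up helper, and it only catches naming differences (for example `chr1` versus `1`), which is a common cause of all-zero counts but is not confirmed as the cause here. It cannot detect that two files use different builds with the same names.

```shell
# Hypothetical helper: print sequence names that occur both in the @SQ header
# lines of a SAM file ($1) and in column 1 of a GTF file ($2).
# Empty output means no read can ever overlap a feature by name.
# Note: this reads the whole SAM file, so it can be slow on large files.
shared_contigs() {
    awk -F'\t' '
        NR == FNR {                          # first file: the SAM
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) sam[substr($i, 4)] = 1
            next
        }
        /^#/ { next }                        # second file: skip GTF comment lines
        ($1 in sam) && !seen[$1]++ { print $1 }   # report each shared name once
    ' "$1" "$2"
}

# Example call (file names are placeholders):
# shared_contigs output.sam annotation.gtf
```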
Use a GTF file for GRCh38 (since you used indexes for that build). Did you use Ensembl? The GTF files for that build can be found here.
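Regarding the mate warnings: htseq-count expects paired-end input sorted by read name unless told otherwise, while TopHat's accepted_hits.bam is sorted by coordinate. A minimal sketch of one way to handle this is below. The function name `count_with_htseq` and the output prefix are made up, and `-s no` assumes an unstranded library, which may not be true for your data. Reading BAM directly with `-f bam` needs pysam installed; alternatively, `-r pos` lets htseq-count take a coordinate-sorted file without re-sorting.

```shell
# Hypothetical wrapper: name-sort a BAM file, then count with htseq-count.
#   $1 = BAM file (e.g. accepted_hits.bam from TopHat)
#   $2 = GTF annotation for the same build as the index (GRCh38 here)
#   $3 = output prefix (placeholder)
count_with_htseq() {
    # Sort by read name so that the two mates of a pair sit next to each other.
    samtools sort -n -o "$3.namesorted.bam" "$1" || return 1

    # -f bam  : input is BAM, so no conversion to SAM is needed
    # -r name : declare that the input is sorted by read name
    # -s no   : ASSUMPTION that the library is unstranded; adjust for your protocol
    htseq-count -f bam -r name -s no "$3.namesorted.bam" "$2" > "$3.counts.txt"
}

# Example call (file names are placeholders):
# count_with_htseq accepted_hits.bam annotation.GRCh38.gtf sample13
```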
Why download the GRCh38 indexes and use them with the GRCh37 gtf file? Am I missing something here?