3.3 years ago
tul66893
Hi everyone,
I'm looking for some help because I just ran my alignment code last night for 12 RNAseq samples. After taking a look at the counts that were generated this morning, all of them are reading 0 for every gene. Oddly, here are the results in the terminal...
95887304 reads; of these:
95887304 (100.00%) were unpaired; of these:
2313614 (2.41%) aligned 0 times
76638154 (79.93%) aligned exactly 1 time
16935536 (17.66%) aligned >1 times
97.59% overall alignment rate
[bam_sort_core] merging from 32 files and 16 in-memory blocks...
Seems like the code is working for alignment but giving odd counts. Anyone have any ideas where to look?
Make sure the GTF and the FASTA reference file use the same chromosome names, not 1 in one file and chr1 in the other.
Here is an example of the first ten lines from one of the FASTA files.
I don't know exactly where the chromosome names are. Here are the first couple of lines from the GTF file. I don't see the "chr1" designation. Is it possible I am using an incorrect GTF file? It's titled Rattus_norvegicusRnor_6.0.104.gtf
I meant the FASTA reference, not the FASTQ reads: the file the index was built from.
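To compare the names on both sides, something like the following could work. This is a rough sketch assuming bash and standard Unix tools; the function names and the file names in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash

# List sequence names from a FASTA file: header lines start with '>',
# and the name is the text up to the first whitespace.
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
}

# List sequence names from a GTF file: the first tab-separated column,
# skipping comment lines that start with '#'.
gtf_names() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Print names that appear in the GTF but not in the FASTA.
# Arguments: FASTA path, then GTF path.
gtf_only_names() {
    comm -13 <(fasta_names "$1") <(gtf_names "$2")
}
```

Running `gtf_only_names genome.fa annotation.gtf` (placeholder file names) prints nothing when every name in the GTF also exists in the FASTA. If it prints 1, 2, 3 and so on while the FASTA has chr1, chr2, chr3, the two files use different naming schemes.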
I'm not actually sure where I would find the fasta file. I'm completely new to coding in general so I apologize. I do have a folder I made called rn6_index which includes all the genome.1.ht2, genome.2.ht2, etc. files. There's an additional file in there called make_rn6.sh. Would it be located in any of these?
No worries :) Ah I see, you downloaded an index. Then please run for any of your bam files
This will show you the chromosome names so you can check whether it matches the GTF.
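One way to list the chromosome names in a BAM file, assuming samtools is installed, is to print the header with `samtools view -H` and pull out the `SN:` field of each `@SQ` line. A minimal sketch; `sam_header_names` and `sample.bam` are made-up names for illustration.

```shell
#!/usr/bin/env bash

# Read a SAM header on stdin and print the reference sequence names,
# taken from the SN: field of each @SQ line.
sam_header_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Typical use (assumes samtools is on PATH; sample.bam is a placeholder):
#   samtools view -H sample.bam | sam_header_names | head
```

The names printed here are the ones the aligner index used, so they are what the first column of the GTF has to match.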
Thanks for understanding! Here are a couple of lines from that result.
Looks like it's using the chr notation... should I use a different GTF file then?
Indeed there is the mismatch. Can you link the source of the index (so the download link or website it is from)?
Here is the link to where I obtained the rat genome files.
https://useast.ensembl.org/Rattus_norvegicus/Info/Index
Actually, now that I can see the problem, it seems my sam/bam files are in UCSC format (chr1) while the GTF I downloaded is in the Ensembl format (1). I've been searching for rat GTF files in UCSC and cannot seem to find one. This leads me to believe I should maybe convert my sam/bam files to Ensembl? Not sure how to go about that or if that is even the logical answer.
You can't convert like that. You have to start from scratch. Make a genome index with your new matching genome + gtf with STAR, and align with that.
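A sketch of what starting over could look like, assuming STAR is installed and the genome FASTA and GTF come from the same source. `build_star_index` is a hypothetical helper, the thread count is arbitrary, and the name check is deliberately strict: it refuses to build the index if any sequence name in the GTF is missing from the FASTA.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build a STAR genome index only if every sequence
# name in the GTF also appears in the FASTA.
# Arguments: FASTA path, GTF path, output directory for the index.
build_star_index() {
    local fasta=$1 gtf=$2 outdir=$3
    local missing

    # Names present in the GTF (column 1) but absent from the FASTA headers.
    missing=$(comm -13 \
        <(grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u) \
        <(grep -v '^#' "$gtf" | cut -f1 | sort -u))

    if [ -n "$missing" ]; then
        echo "GTF sequence names missing from FASTA:" >&2
        echo "$missing" >&2
        return 1
    fi

    mkdir -p "$outdir"
    # --runThreadN 4 is an arbitrary choice; adjust to the machine.
    STAR --runMode genomeGenerate \
         --genomeDir "$outdir" \
         --genomeFastaFiles "$fasta" \
         --sjdbGTFfile "$gtf" \
         --runThreadN 4
}
```

Calling `build_star_index genome.fa annotation.gtf star_index` (placeholder names) stops early with the list of offending names when the files disagree, which is cheaper than discovering the mismatch after a full alignment and counting run.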
Can you provide your code?
Sure! Here is what I can show. Just for more reference, I have 12 samples, with 6 replicates each all located within a folder titled "Day1". This language is just because the last code I ran included day1 and day2 samples.
index=/hidden/rn6_index/genome
The FASTA file is almost certainly sitting with the index files; that's what you need to check. If you really can't find it, you need to start over with genome and GTF files you know match.
Hi tul66893, why did you delete this comment?