human genome files
2.7 years ago
adR ▴ 120

Hi all,

I have two questions. What is the main difference between the two genome files (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.dna.fa) in the Ensembl database? Which one should I use for whole-exome sequencing alignment?

I used Homo_sapiens.GRCh38.dna.fa for the alignment. Later, when I counted features with featureCounts as shown below, the whole count matrix was zero. I am wondering whether Homo_sapiens.GRCh38.dna.fa was the wrong file for my alignment.

featureCounts -t exon -g gene_id -a Homo_sapiens.GRCh38.105.gtf -o Ensembl_counts_gtf.txt *.bam 

Best, amare
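A quick way to see why the matrix is all zeros is to look at the `.summary` file that featureCounts writes next to its output (here presumably `Ensembl_counts_gtf.txt.summary`). The sketch below assumes the usual tab-separated layout of that file: a `Status` header row with one column per BAM, then one row per status such as `Assigned` or `Unassigned_NoFeatures`. The helper name `diagnose_summary` is hypothetical. If most reads land in `Unassigned_NoFeatures`, they did not overlap any feature in the GTF, and mismatched chromosome names are one common cause.

```python
import csv
import io


def diagnose_summary(text):
    """Return {bam: (assigned, total, most_common_unassigned_status)}.

    Assumes the featureCounts .summary layout: a header row
    'Status<TAB>bam1<TAB>bam2...', then one row per status with integer counts.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], [r for r in rows[1:] if r]
    result = {}
    for col, bam in enumerate(header[1:], start=1):
        counts = {r[0]: int(r[col]) for r in body}
        total = sum(counts.values())
        assigned = counts.get("Assigned", 0)
        unassigned = {k: v for k, v in counts.items() if k != "Assigned"}
        top = max(unassigned, key=unassigned.get) if unassigned else None
        result[bam] = (assigned, total, top)
    return result


if __name__ == "__main__":
    # Toy input with made-up numbers, only to show the output format.
    example = (
        "Status\tsample1.bam\n"
        "Assigned\t0\n"
        "Unassigned_Unmapped\t120\n"
        "Unassigned_NoFeatures\t9880\n"
    )
    for bam, (assigned, total, top) in diagnose_summary(example).items():
        print(f"{bam}: {assigned}/{total} assigned; most common failure: {top}")
```

On the toy input this prints `sample1.bam: 0/10000 assigned; most common failure: Unassigned_NoFeatures`. To use it on real output, read the summary file into a string and pass it to `diagnose_summary`.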

alignment featureCounts • 1.1k views

Have you read the README on the Ensembl website?


Yes, I did, but I could not understand it.


Well, first of all, I don't see any file in the Ensembl repository called "Homo_sapiens.GRCh38.dna.fa", so I guess the file you have is "Homo_sapiens.GRCh38.dna.toplevel.fa". The difference between Homo_sapiens.GRCh38.dna.toplevel.fa.gz and Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz is that the primary assembly excludes the alternative haplotype and patch sequences; both files contain the chromosomes and the scaffolds that are not assembled into chromosomes. This link is old, but maybe it helps you understand the files and how to make use of them.
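If you have both files locally, you can see the difference yourself by comparing their sequence names. A minimal sketch, assuming both FASTA files are on disk (plain or gzipped) and passed as command-line arguments; `fasta_ids` and `compare` are illustrative helper names. It takes the first whitespace-separated token after `>` as the sequence name, as most aligners do.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_ids(lines):
    """Yield sequence names: the first whitespace-separated token after '>'."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].split()[0]


def compare(toplevel_ids, primary_ids):
    """Return (names only in toplevel, names only in primary assembly), sorted."""
    top, prim = set(toplevel_ids), set(primary_ids)
    return sorted(top - prim), sorted(prim - top)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # e.g. Homo_sapiens.GRCh38.dna.toplevel.fa.gz
        #      Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
        with open_text(sys.argv[1]) as t, open_text(sys.argv[2]) as p:
            only_top, only_prim = compare(fasta_ids(t), fasta_ids(p))
        print(f"{len(only_top)} sequences only in toplevel, e.g. {only_top[:5]}")
        print(f"{len(only_prim)} sequences only in primary_assembly")
    else:
        print("usage: compare_fasta.py TOPLEVEL.fa[.gz] PRIMARY.fa[.gz]")
```

Scanning a full genome FASTA line by line takes a while in pure Python, but it only needs to be done once.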

When you run featureCounts, make sure the GTF and the genome FASTA file used for the alignment (and therefore your BAM files) share the same chromosome names.
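Because the BAM files take their chromosome names from the FASTA used for alignment, one way to check this is to compare the FASTA sequence names with the first column (seqname) of the GTF. A minimal sketch; the script and helper names are illustrative, not part of any tool. If `shared` is 0 or very small, featureCounts is likely to find no overlapping features, and the counts will be zero.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_names(lines):
    """Sequence names: first whitespace-separated token after '>'."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def gtf_names(lines):
    """Seqnames (column 1) of a GTF, skipping '#' comment lines and blanks."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def check_names(fasta, gtf):
    """Summarize how well the two sets of chromosome names overlap."""
    return {
        "shared": len(fasta & gtf),
        "gtf_only": sorted(gtf - fasta),
        "fasta_only_count": len(fasta - gtf),
    }


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open_text(sys.argv[1]) as fa, open_text(sys.argv[2]) as gtf:
            print(check_names(fasta_names(fa), gtf_names(gtf)))
    else:
        print("usage: check_names.py GENOME.fa[.gz] ANNOTATION.gtf[.gz]")
```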
