human genome files

2.7 years ago

adR ▴ 120

Hi all,

I have two questions. What is the main difference between the two genome files (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.dna.fa) in the Ensembl database? And which one should I use for whole-exome sequence alignment?

I used Homo_sapiens.GRCh38.dna.fa for the alignment. Later, when I counted features with featureCounts as shown below, the whole count matrix was zero. I am wondering whether Homo_sapiens.GRCh38.dna.fa was the wrong file for my alignment.

featureCounts -t exon -g gene_id -a Homo_sapiens.GRCh38.105.gtf -o Ensembl_counts_gtf.txt *.bam

Best, amare

alignment featureCounts • 1.1k views

ADD COMMENT • link updated 2.7 years ago by GenoMax 147k • written 2.7 years ago by adR ▴ 120

Have you read the README on the Ensembl website?

ADD REPLY • link 2.7 years ago by iraun 6.2k

Yes, I did, but I was not able to understand it.

ADD REPLY • link 2.7 years ago by adR ▴ 120

Well, first of all, I don't see any file in the repository called "Homo_sapiens.GRCh38.dna.fa", so I guess the file you have is "Homo_sapiens.GRCh38.dna.toplevel.fa". The difference between Homo_sapiens.GRCh38.dna.toplevel.fa and Homo_sapiens.GRCh38.dna.primary_assembly.fa is that the second excludes the alternative haplotype and patch sequences. This link is old, but maybe it helps you understand the files and how to make use of them.

Make sure when you run featureCounts that the GTF and the genome fasta file share the same chromosome names.
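
One way to check this is to compare the sequence names in the FASTA headers with the first column of the GTF. The following is only a minimal sketch: the helper names (fasta_names, gtf_names, compare_names) are made up for illustration, and it assumes the sequence name is the first whitespace-delimited word after ">" in the FASTA and the first tab-separated field in the GTF.

```python
import io


def fasta_names(handle):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(handle):
    """Collect sequence names from column 1 of a GTF, skipping '#' comments."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def compare_names(fasta_handle, gtf_handle):
    """Return the names shared by both files and the names unique to each."""
    fa = fasta_names(fasta_handle)
    gtf = gtf_names(gtf_handle)
    return {"shared": fa & gtf, "fasta_only": fa - gtf, "gtf_only": gtf - fa}


if __name__ == "__main__":
    # Tiny made-up inputs: a "chr"-prefixed FASTA against an Ensembl-style GTF.
    fasta = io.StringIO(">chr1 made-up description\nACGT\n")
    gtf = io.StringIO(
        "#!genome-build GRCh38\n"
        '1\tensembl\texon\t1\t4\t.\t+\t.\tgene_id "G1";\n'
    )
    result = compare_names(fasta, gtf)
    for key in ("shared", "fasta_only", "gtf_only"):
        print(key, sorted(result[key]))
```

For real files, pass handles from open(path) or gzip.open(path, "rt") instead of the in-memory strings. An empty "shared" set would be consistent with an all-zero count matrix. Since featureCounts matches the GTF against the BAM files, it is also worth looking at the @SQ lines printed by samtools view -H on one of the BAM files.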

ADD REPLY • link 2.7 years ago by iraun 6.2k

Also see: Why is human genome FASTA file on GENCODE much smaller than that on ENSEMBL?

ADD REPLY • link 2.7 years ago by GenoMax 147k
