Are Primary_assembly and GRCh38.84.gtf.gz the files for mapping?
8.6 years ago
bxia

I'm just starting to learn and need to confirm this:

Are Primary_assembly (reference genome) and GRCh38.84.gtf.gz (annotation) the files I need for aligning (mapping) reads?

Does sm_Primary_assembly (the soft-masked version) also work?

I have read some tutorials but just want to confirm I am doing the right thing.

Thanks

Tags: RNA-Seq, ChIP-Seq

From the README that accompanies the Ensembl FASTA files:

<sequence type>:
  * 'dna' - unmasked genomic DNA sequences.
  * 'dna_rm' - masked genomic DNA. Interspersed repeats and low
    complexity regions are detected with the RepeatMasker tool and masked
    by replacing repeats with 'N's.
  * 'dna_sm' - soft-masked genomic DNA. All repeats and low complexity regions
    have been replaced with lowercased versions of their nucleic base.
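The three sequence types differ only in how repeats are represented. Assuming the dna_sm and dna_rm files mask the same regions (an assumption; the README excerpt does not say so explicitly), one form can be derived from the other. A minimal Python sketch on an in-memory sequence string; the function names are hypothetical helpers, not part of any Ensembl tool:

```python
def soft_to_unmasked(seq: str) -> str:
    """dna_sm -> dna: soft-masked bases are lowercase, so uppercasing unmasks them."""
    return seq.upper()


def soft_to_hard_masked(seq: str) -> str:
    """dna_sm -> dna_rm-style: replace every lowercase (soft-masked) base with 'N'.

    Assumes (hypothetically) that soft- and hard-masked files cover the same regions.
    """
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq: str) -> float:
    """Fraction of positions that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


seq = "ACGTacgtNNacGT"
print(soft_to_unmasked(seq))           # ACGTACGTNNACGT
print(soft_to_hard_masked(seq))        # ACGTNNNNNNNNGT
print(round(masked_fraction(seq), 3))  # 0.429
```

The sketch only illustrates the difference: soft-masking keeps every base (just lowercased), while hard-masking replaces repeat bases with N, so that sequence information is lost.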

So the Primary_assembly is the reference genome, and I don't need to download the entire set of GRCh38 project files, right?


Yes, and probably no (not sure what you mean by GRCh38 project files).


I mean the GRCh38 download from the Illumina iGenomes website; the files total about 14.2 GB.


That bundle includes pre-built indexes for Bowtie 1, Bowtie 2, and BWA. You could download that bundle and save yourself the time of building your own indexes.

8.6 years ago
Denise CS

The primary assembly contains all chromosomes plus unplaced (and unlocalised) scaffolds. It does not contain patches or haplotypes. As @genomax2 already explained, sm means soft-masked: regions of low complexity, such as repetitive regions, are in lower case. This is all explained in the README that accompanies the FASTA files on our FTP site.
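To check which kinds of sequences a primary-assembly FASTA contains (chromosomes vs scaffolds), one can tally its header lines. The sketch below assumes Ensembl-style headers whose second whitespace-separated field looks like `dna:chromosome` or `dna:scaffold`; treat that layout as an assumption to verify against your file. The sample headers are simplified, hypothetical examples with trailing fields omitted.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def iter_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA header text (without the leading '>') from an iterable of lines."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()


def classify(header: str) -> Tuple[str, str]:
    """Return (sequence name, sequence kind) from an Ensembl-style header.

    Assumes the second field has the form 'dna:<kind>'; anything else is 'unknown'.
    """
    fields = header.split()
    name = fields[0] if fields else ""
    kind = "unknown"
    if len(fields) > 1 and ":" in fields[1]:
        kind = fields[1].split(":", 1)[1]
    return name, kind


def summarize(lines: Iterable[str]) -> Counter:
    """Count sequences per kind (e.g. chromosome vs scaffold)."""
    return Counter(kind for _, kind in map(classify, iter_headers(lines)))


# Hypothetical, simplified headers; real headers carry more fields.
sample = [
    ">1 dna:chromosome",
    "ACGTACGT",
    ">KI270706.1 dna:scaffold",
    "acgtNNNN",
]
print(summarize(sample))  # Counter({'chromosome': 1, 'scaffold': 1})
```

For a real gzipped FASTA, the same function can be fed a text-mode handle from `gzip.open(path, "rt")`, since it only needs an iterable of lines.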
