How do I input one fasta file for an entire reference genome?
2.2 years ago
James • 0

I'm trying to use xenome, which is a program to differentiate host vs graft sequences in RNAseq data. The input requires that I have one fasta file for the mouse genome, and one fasta file for a human genome:

xenome index -T 8 -P idx -H mouse.fa -G human.fa

where -T is the number of threads, -P is the prefix for the output files, -H is the host FASTA file, and -G is the graft FASTA file.

I have a few questions.

  1. Are the .fna and .fa file formats the same?
  2. I can download a zipped file of all the separate chromosome fasta files. Can I input that zipped file into xenome?
  3. If I can't do #2, do I have to just combine all of the fasta files? -- How do I do that?
  4. Can I use the newest assemblies with xenome (e.g. mm11), but then use HISAT2 with the older assemblies (e.g. mm10) because they're pre-loaded?
  5. Where can I learn more about this stuff?

I'm new to bioinformatics and couldn't find straightforward answers to these questions on Google, so sorry if these are basic. Thank you!

RNAseq assembly xenograft xenome Fasta • 1.0k views
2.2 years ago

I have not used xenome, but that doesn't seem to matter much for your questions, so here are my answers:

Q1. Yes, .fna is a FASTA file whose suffix indicates that it contains nucleic acids, as opposed to amino acids (.faa). So you can just change the suffix from '.fna' to '.fa' in case xenome doesn't like the '.fna' suffix.
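To illustrate the Q1 point, here is a minimal sketch using a toy file. The file names are made up for the example, and I'm assuming a symlink is acceptable; a plain `mv mouse.fna mouse.fa` does the same job if you don't need to keep the original name.

```shell
# Toy nucleotide FASTA standing in for a downloaded .fna file (hypothetical name).
fx_dir=$(mktemp -d)
printf '>chr1 toy sequence\nACGTACGT\n' > "$fx_dir/mouse.fna"

# Point a .fa name at the same data; a symlink avoids copying a large genome file.
ln -s "$fx_dir/mouse.fna" "$fx_dir/mouse.fa"

# The content is what makes it FASTA: the first character should be '>'.
first_char=$(head -c 1 "$fx_dir/mouse.fa")
echo "first character: $first_char"
```

Either name now reads the same bytes, so the suffix is only a label.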

Q2. I would say probably not; I don't know of any tool that accepts that format. Just extract the archive and concatenate the separate chromosome FASTA files into a single FASTA file.
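A rough sketch of the extract-then-concatenate step, assuming the download is a gzip-compressed tar archive (the question doesn't say; a .zip would need `unzip` instead of `tar`). The archive and file names here are invented, and the example builds its own toy archive so it can be run as-is.

```shell
# Toy stand-in for a downloaded archive of per-chromosome FASTA files
# (hypothetical names; a real download will differ).
arc_dir=$(mktemp -d)
mkdir "$arc_dir/src"
printf '>chr1\nACGT\n' > "$arc_dir/src/chr1.fa"
printf '>chr2\nTTAA\n' > "$arc_dir/src/chr2.fa"
tar -czf "$arc_dir/chromosomes.tar.gz" -C "$arc_dir/src" chr1.fa chr2.fa

# Unpack the archive into its own directory, then join the extracted files.
mkdir "$arc_dir/extracted"
tar -xzf "$arc_dir/chromosomes.tar.gz" -C "$arc_dir/extracted"
cat "$arc_dir/extracted"/chr*.fa > "$arc_dir/mouse.fa"

# Each '>' line starts one sequence, so this counts the sequences written.
n_seqs=$(grep -c '^>' "$arc_dir/mouse.fa")
echo "sequences in mouse.fa: $n_seqs"
```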

Q3. You can simply join all the files together as below (assuming each chromosome has a unique FASTA header), which just writes each file one after the other:

cat human_chromosome*.fa > human.fa
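If you want to check the "unique headers" assumption after concatenating, something like the following should work. It's only a sketch on toy files; the file names mirror the command above and the variable names are my own.

```shell
# Toy input: two tiny FASTA files standing in for per-chromosome files.
cat_dir=$(mktemp -d)
printf '>chr1\nACGT\n' > "$cat_dir/human_chromosome1.fa"
printf '>chr2\nGGCC\n' > "$cat_dir/human_chromosome2.fa"

# Join the per-chromosome files into one multi-FASTA file.
cat "$cat_dir"/human_chromosome*.fa > "$cat_dir/human.fa"

# Count the records, then collect any header line that occurs more than once.
n_records=$(grep -c '^>' "$cat_dir/human.fa")
dup_headers=$(grep '^>' "$cat_dir/human.fa" | sort | uniq -d)

echo "records: $n_records"
if [ -n "$dup_headers" ]; then
    echo "duplicate headers found:" >&2
    echo "$dup_headers" >&2
fi
```

The record count should match the number of chromosome files you started with, and nothing should be reported as a duplicate.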

Q4. It seems the k-mer databases used for classifying the reads are built from the genomes you provide, so you can use any assembly.

Q5. Strangely, probably the toughest question...not sure there is a universal answer to that. Learning on the job?


Thanks so much! Actually got the indexing running finally. But ya Q5 -- realizing that learning on the job is the only way lol

2.2 years ago

Q2 - If you have separate chromosome files, you can just concatenate them with: cat chr1.fa chr2.fa > chr1and2.fasta
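One thing to be aware of when listing files: a glob like chr*.fa typically sorts lexically (chr1, chr10, chr2, ...). If you care about chromosome order in the combined file, you can name the files explicitly in a loop. A small sketch with made-up toy files:

```shell
# Toy per-chromosome files (hypothetical names and contents).
ord_dir=$(mktemp -d)
for c in 1 2 10 X; do
    printf '>chr%s\nACGT\n' "$c" > "$ord_dir/chr$c.fa"
done

# Concatenate in the order given here rather than in glob order.
for c in 1 2 10 X; do
    cat "$ord_dir/chr$c.fa"
done > "$ord_dir/genome.fasta"

# Show the header order of the combined file on one line.
header_order=$(grep '^>' "$ord_dir/genome.fasta" | paste -s -d ' ' -)
echo "$header_order"
```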

Q5 - Training: there's a lot of material out there these days.

YouTube has a lot on basic bioinformatics.

Another great source of training is the Galaxy Project:

https://training.galaxyproject.org/
