Question

Help with Output of FASTA Files From Excel

4.1 years ago

joseph.landry ▴ 50

Hi All,

I am going to create a custom Bowtie2 index for an sgRNA library. I have the sgRNA "barcode" sequences in an Excel file, shown below:

>ACTL6A_1
GGATAGTTTCCAAGCTATTT

>ACTL6A_3
TTTGCTAATGGTCGTTCTAC

>ACTL6A_5
GTTGAAGGACATAGCCATCG

>ACTL6A_7
ACTGCAATTCCAGTCCACGA

This goes on for 7000 sgRNA sequences. I would like to output these as individual FASTA files: one FASTA sequence per file, with each file named after the sgRNA barcode identifier in the FASTA header. For example, file 1 would contain:

>ACTL6A_1
GGATAGTTTCCAAGCTATTT

and be named ACTL6A_1.fa. Can someone help me figure out how to do this using terminal commands?

Any help would be greatly appreciated.

Thanks,

Joe

sequence • 1.8k views

Written 4.1 years ago by joseph.landry ▴ 50 • updated 4.1 years ago by Aimin Li ▴ 30

Why do you need to output them as individual files? Just copy and paste the data into a programmer's editor (Notepad or Notepad++ on Windows, or a plain-text editor such as TextEdit in plain-text mode on macOS). Save the file as plain text and use it as the input for Bowtie2 indexing. A multi-FASTA file is the standard input for aligner indexing programs.

Reply by GenoMax 147k • 4.1 years ago
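Copy-and-paste works when the sheet already holds FASTA-formatted lines, as in the question. If the identifiers and sequences were instead in two separate columns, one option is to save the sheet as CSV and build the multi-FASTA file with a short script. A minimal Python sketch, assuming a hypothetical two-column CSV export (name, sequence) with no header row; the function name `csv_to_fasta` is illustrative only:

```python
import csv
import io


def csv_to_fasta(csv_text: str) -> str:
    """Convert two-column CSV text (name, sequence) into multi-FASTA text.

    Assumes no header row (hypothetical export layout). Blank or
    incomplete rows are skipped, and a '>' already present on the
    name is not doubled.
    """
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete rows
        name = row[0].strip().lstrip(">")
        seq = row[1].strip().upper()
        records.append(f">{name}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__":
    example = "ACTL6A_1,GGATAGTTTCCAAGCTATTT\nACTL6A_3,TTTGCTAATGGTCGTTCTAC\n"
    print(csv_to_fasta(example), end="")
```

The returned text can be written to a single file (for example a hypothetical `sgRNA_library.fa`) and passed to bowtie2-build as one multi-FASTA input.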

I am assuming that I need a separate file for each sgRNA so that Bowtie2 adds the FASTA header information to each alignment in the BAM file. I will then use featureCounts to count how many reads from the FASTQ sequencing file aligned to each sgRNA.

Reply by joseph.landry ▴ 50 • 4.1 years ago

Your assumption is crazy. Bowtie will take the name of each sequence from the header of each sequence (the part after the ">"), not from the file name!

And you hardly need featureCounts to count up how many reads aligned to each sequence. samtools idxstats will do that.

Reply by swbarnes2 14k • 4.1 years ago
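To illustrate the idxstats route: `samtools idxstats` on a sorted, indexed BAM prints one tab-separated line per reference (reference name, sequence length, mapped reads, unmapped reads), plus a final `*` line for unplaced unmapped reads. A minimal Python sketch that turns that output into per-sgRNA counts; the function name and the file names in the usage comment are hypothetical:

```python
import csv
import io
import sys
from typing import Dict


def parse_idxstats(text: str) -> Dict[str, int]:
    """Map each reference (sgRNA) name to its number of mapped reads.

    Expects samtools idxstats output: tab-separated columns of
    reference name, sequence length, mapped reads, unmapped reads.
    The final '*' line (unplaced unmapped reads) is skipped.
    """
    counts: Dict[str, int] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0] == "*":
            continue
        counts[row[0]] = int(row[2])  # third column = mapped reads
    return counts


if __name__ == "__main__":
    # Usage (hypothetical names):
    #   samtools idxstats sample.bam > sample.idxstats.tsv
    #   python idxstats_counts.py sample.idxstats.tsv
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for name, n in parse_idxstats(fh.read()).items():
                print(f"{name}\t{n}")
```

Because each sgRNA is its own reference sequence in the index, the mapped-read column already is the per-sgRNA count, with no annotation file needed.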

Thanks for insulting me.

Reply by joseph.landry ▴ 50 • 4.1 years ago

a separate file for each sgRNA to get Bowtie2 to add the FASTA header information to the alignment in the BAM file.

No. That information comes from the fasta headers in the multi-fasta file.

Reply by GenoMax 147k • 4.1 years ago
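Since the reference names come from the headers, it can help to list them before indexing and check for duplicates, which would make per-sgRNA counts ambiguous. A minimal Python sketch; it assumes, as is typical for aligners and SAM output, that a reference name is the header text after `>` up to the first whitespace. The function name is hypothetical:

```python
from collections import Counter
from typing import List, Optional, Tuple


def reference_names(fasta_text: str) -> Tuple[List[str], List[str]]:
    """Return (names in file order, names that occur more than once).

    A name is taken as the header text after '>' up to the first
    whitespace (an assumption matching typical aligner behaviour).
    """
    names: List[str] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    duplicates = [n for n, c in Counter(names).items() if c > 1]
    return names, duplicates


if __name__ == "__main__":
    demo = ">ACTL6A_1\nGGATAGTTTCCAAGCTATTT\n>ACTL6A_3\nTTTGCTAATGGTCGTTCTAC\n"
    names, dups = reference_names(demo)
    print(names, dups)
```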

Thanks, I think I understand now. When I previously built a custom Bowtie2 index, I loaded each chromosome as a separate .fa file, so I did not realize that I could have used a single multi-FASTA file instead.

Reply by joseph.landry ▴ 50 • 4.1 years ago

That is correct. You can use a multi-fasta file for index creation with all aligners.

Reply by GenoMax 147k • 4.1 years ago
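For the per-chromosome case described above, the separate .fa files can be merged into one multi-FASTA file by plain concatenation, as long as each file ends with a newline so the next `>` header starts on its own line. A minimal Python sketch; the function names and file names are hypothetical:

```python
import sys
from pathlib import Path
from typing import Iterable


def join_fasta_texts(texts: Iterable[str]) -> str:
    """Concatenate FASTA texts, adding a trailing newline where missing."""
    parts = []
    for text in texts:
        if text and not text.endswith("\n"):
            text += "\n"  # keep the next '>' header on a new line
        parts.append(text)
    return "".join(parts)


def concat_fasta_files(paths: Iterable[str], out_path: str) -> None:
    """Write the concatenation of the given FASTA files to out_path."""
    merged = join_fasta_texts(Path(p).read_text() for p in paths)
    Path(out_path).write_text(merged)


if __name__ == "__main__":
    # Usage (hypothetical names): python concat_fasta.py genome.fa chr1.fa chr2.fa
    if len(sys.argv) > 2:
        concat_fasta_files(sys.argv[2:], sys.argv[1])
```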

score 0 · Answer 1 · 2020-10-30

4.1 years ago

Aimin Li ▴ 30

FYI, I have just done this using the shell command awk:

https://www.cnblogs.com/emanlee/p/13905310.html

Posted by Aimin Li ▴ 30 • 4.1 years ago
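For readers who want the splitting the question originally asked for without awk, here is a minimal Python sketch of the same idea (not a transcription of the linked post). It assumes each record's name is the header text after `>` up to the first whitespace and writes each record to `<name>.fa`; the function names and paths are hypothetical. If a name appears twice, the later record overwrites the earlier one.

```python
import sys
from pathlib import Path
from typing import Dict, Optional


def split_fasta(fasta_text: str) -> Dict[str, str]:
    """Split multi-FASTA text into {name: single-record FASTA text}.

    Leading/trailing whitespace and blank lines are ignored, so a
    stray space before '>' (as in the question) is tolerated.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else None
            if name is not None:
                records[name] = line + "\n"  # a duplicate name overwrites
        elif name is not None:
            records[name] += line + "\n"  # sequence line(s) of current record
    return records


def write_records(records: Dict[str, str], out_dir: str) -> None:
    """Write each record to <out_dir>/<name>.fa, e.g. ACTL6A_1.fa."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in records.items():
        (out / f"{name}.fa").write_text(text)


if __name__ == "__main__":
    # Usage (hypothetical names): python split_fasta.py sgRNA_library.fa out_dir
    if len(sys.argv) > 2:
        write_records(split_fasta(Path(sys.argv[1]).read_text()), sys.argv[2])
```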