Hi All,
I am going to create a custom Bowtie2 index for an sgRNA library. I have the sequences of the sgRNA "barcodes" in an Excel file, shown below:
>ACTL6A_1
GGATAGTTTCCAAGCTATTT
>ACTL6A_3
TTTGCTAATGGTCGTTCTAC
>ACTL6A_5
GTTGAAGGACATAGCCATCG
>ACTL6A_7
ACTGCAATTCCAGTCCACGA
This goes on for 7000 sgRNA sequences. I would like to output them as individual FASTA files, one sequence per file, with each file named after the sgRNA barcode identifier in its FASTA header. For example, file 1 would contain:
>ACTL6A_1
GGATAGTTTCCAAGCTATTT
and be named ACTL6A_1.fa. Can someone help me figure out how to do this using terminal commands?
Any help would be greatly appreciated.
Thanks,
Joe
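For anyone who does want one file per record, here is a minimal Python sketch of the split described above. It assumes the library has already been saved as a plain-text multi-FASTA file (the names `sgRNA_library.fa`, `per_guide/` and `split_fasta.py` are hypothetical). Each file is named after the header text up to the first whitespace, with any character other than letters, digits, `_`, `.` and `-` replaced by `_`. Records that share an ID would overwrite each other. As the replies below explain, this step is not actually needed for bowtie2.

```python
import re
import sys
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a (multi-)FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir):
    """Write each record to <out_dir>/<ID>.fa and return the list of paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for header, seq in read_fasta(path):
        parts = header.split()
        record_id = parts[0] if parts else "unnamed"  # ID = header up to first whitespace
        safe_id = re.sub(r"[^A-Za-z0-9_.-]", "_", record_id)  # keep file names portable
        target = out / f"{safe_id}.fa"
        target.write_text(f">{header}\n{seq}\n")
        written.append(target)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python split_fasta.py sgRNA_library.fa per_guide/
    for written_path in split_fasta(sys.argv[1], sys.argv[2]):
        print(written_path)
```

Run it as `python split_fasta.py sgRNA_library.fa per_guide/`; it prints the path of each file it writes, e.g. `per_guide/ACTL6A_1.fa`.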
Why do you need to output them as individual files? Just copy and paste the data into a programmer's editor (Notepad or Notepad++ on Windows, TextEdit on macOS), save the file as plain text, and use it as input for bowtie2 indexing. A multi-FASTA file is the standard input for aligner indexing programs.
I am assuming that I need a separate file for each sgRNA to get Bowtie2 to add the FASTA header information to the alignments in the BAM file. That is what I will use with featureCounts to count how many reads from the FASTQ sequencing file aligned to each sgRNA.
Your assumption is crazy. Bowtie2 takes the name of each reference sequence from its header (the part after the ">"), not from the file name!
And you don't really need featureCounts to count how many reads aligned to each sequence; samtools idxstats will do that.
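If samtools is not at hand, the same per-sgRNA tally can be sketched in Python by counting the RNAME column (column 3) of a SAM file. This minimal sketch assumes plain-text SAM input (bowtie2's own output format, before any conversion to BAM). It skips unmapped, secondary and supplementary records using the standard SAM FLAG bits, so its totals may not match `samtools idxstats` exactly if your file contains secondary or supplementary alignments.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM format specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_per_reference(sam_lines):
    """Count primary, mapped alignments per reference name (SAM column 3, RNAME)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only primary alignments of mapped reads
        counts[rname] += 1
    return counts
```

Typical use would be `with open("aligned.sam") as fh: counts = count_reads_per_reference(fh)` (the file name is hypothetical), after which `counts.most_common(10)` lists the ten sgRNAs with the most reads.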
Thanks for insulting me.
No. That information comes from the fasta headers in the multi-fasta file.
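Because the names come from the headers, it is worth checking them before indexing. Below is a minimal Python sketch. It assumes that a reference name is the header text after ">" up to the first whitespace, which is typical of bowtie2's SAM output but worth confirming against the `@SQ` lines of your own file. It also reports any name used more than once, since reads assigned to such sgRNAs could not be told apart by name.

```python
from collections import Counter


def reference_names(fasta_lines):
    """Return one name per header: the text after '>' up to the first whitespace."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            names.append(parts[0] if parts else "")  # empty header -> empty name
    return names


def duplicated_names(names):
    """Return, sorted, every name that occurs more than once."""
    return sorted(name for name, n in Counter(names).items() if n > 1)
```

For example, `duplicated_names(reference_names(open("sgRNA_library.fa")))` (hypothetical file name) returns an empty list when every sgRNA header is unique.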
Thanks. I think I understand now. When I previously created a custom Bowtie2 index, it required loading each chromosome as a separate .fa file. I did not know that I could, in principle, have used a single multi-FASTA file instead.
That is correct. You can use a multi-FASTA file for index creation with all commonly used aligners.