Converting FASTQ to FASTA

A little Perl one-liner I borrowed from The Edwards Lab that converts FASTQ to FASTA. Please note that I had to split the command across several lines to make it show properly in this blog entry.

 

$ cat file_to_convert.fq | perl -e \
'$i=0;while(<>){if(/^\@/&&$i==0){s/^\@/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' \
> output.fasta

Thanks Edwards Lab!

After I wrote the above post, a reader rightly pointed out that the one-liner is incorrect (see the post comments below). I solved the problem of converting FASTQ to FASTA with the following BioPerl script, which seems to work fine:

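use strict;
use warnings;
use Bio::SeqIO;

# Arguments: input FASTQ file, output FASTA file
my ($file1, $file2) = @ARGV;
die "Usage: $0 input.fq output.fasta\n" unless defined $file2;

# Let BioPerl parse the FASTQ records and write them back out as FASTA
my $seqin  = Bio::SeqIO->new(-format => 'fastq', -file => $file1);
my $seqout = Bio::SeqIO->new(-format => 'fasta', -file => ">$file2");

while (my $seq_obj = $seqin->next_seq) {
    $seqout->write_seq($seq_obj);
}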
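
If BioPerl is not available, the same idea can be sketched in plain Perl. This is only a minimal sketch, not a validated parser. The subroutine name fastq_to_fasta is my own invention, and the code assumes well-formed input in which each quality string is exactly as long as its sequence. FASTQ allows sequence and quality strings to wrap over several lines, and both "@" and "+" can appear in a quality string, so the sketch counts quality characters instead of counting lines.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Convert FASTQ read from $in into FASTA written to $out.
# Assumption: every record's quality string has the same length as its sequence.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    while (defined(my $header = <$in>)) {
        $header =~ s/\r?\n\z//;
        next if $header eq '';    # tolerate blank lines between records
        $header =~ s/^\@/>/
            or die "Expected a header starting with '\@', got: $header\n";
        print {$out} "$header\n";

        # Sequence lines run until the '+' separator line.
        my $seq_len = 0;
        while (defined(my $line = <$in>)) {
            $line =~ s/\r?\n\z//;
            last if $line =~ /^\+/;
            print {$out} "$line\n";
            $seq_len += length $line;
        }

        # Skip quality lines until as many characters as bases have been read.
        # Counting is needed because '@' and '+' may start a quality line.
        my $qual_len = 0;
        while ($qual_len < $seq_len && defined(my $line = <$in>)) {
            $line =~ s/\r?\n\z//;
            $qual_len += length $line;
        }
    }
    return;
}

# Run the conversion only when both file names are given.
if (@ARGV == 2) {
    my ($fastq, $fasta) = @ARGV;
    open my $in,  '<', $fastq or die "Cannot read $fastq: $!\n";
    open my $out, '>', $fasta or die "Cannot write $fasta: $!\n";
    fastq_to_fasta($in, $out);
    close $out or die "Cannot close $fasta: $!\n";
}
```

Saved under a name of your choice, it takes the input and output file names as its two arguments, just like the BioPerl script. Sequence lines are written out with the same wrapping they had in the input.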

4 comments

  1. kbradnam

    An awk solution:

    cat file.fq | awk '{print ">" substr($0,2); getline; print; getline; getline}' > file.fa

    1. gasstationwithoutpumps

      Also buggy. There is no guarantee that FASTQ files have a simple alternation of lines. Line wrap can occur (and is allowed). The FASTQ format is rather a crummy design. As it says on the Wikipedia FASTQ page:
      “The original Sanger FASTQ files also allowed the sequence and quality strings to be wrapped (split over multiple lines), but this is generally discouraged as it can make parsing complicated due to the unfortunate choice of “@” and “+” as markers (these characters can also occur in the quality string).”
