Questions tagged [fasta]

FASTA is a software package for aligning protein and nucleic acid sequences. FASTA is also the name of the text-based file format these programs use to represent peptide or nucleotide sequences.

The FASTA format (read as "fast A format") represents each nucleotide or amino acid as a single letter and lets each sequence carry a name. It has become the de facto standard for representing biological sequences in bioinformatics.

A record in FASTA format consists of a header (comment) line followed by one or more lines of sequence data (one letter per nucleotide or amino acid). Header lines begin with >. The sequence that follows is usually wrapped at a fixed width (often 60 characters, and generally no more than 80).

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
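
The record layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not a reference implementation: the function names parse_fasta and wrap_sequence are hypothetical, and it assumes plain-text input in which every header line starts with > and all other non-blank lines are sequence data.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper: assumes header lines start with '>' and that
    every other non-blank line belongs to the most recent header.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are joined later
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def wrap_sequence(sequence: str, width: int = 60) -> str:
    """Wrap a sequence at a fixed width (60 is a common choice)."""
    return "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )


if __name__ == "__main__":
    sample = [
        "> Sample nucleotide sequence",
        "AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA",
    ]
    for name, seq in parse_fasta(sample):
        print(f">{name}")
        print(wrap_sequence(seq))
```

Because parse_fasta is a generator, it holds only one record in memory at a time, so it can be fed an open file object directly when the input is large.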

743 questions

13 answers

Converting FASTQ to FASTA with SED/AWK

I have data that always comes in blocks of four in the following format (called FASTQ): @SRR018006.2016 GA2:6:1:20:650 length=36 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN +SRR018006.2016 GA2:6:1:20:650…

asked Oct 09 '09 at 07:22

neversaint

1 answer

Perl6 : What is the best way for dealing with very big files?

Last week I decided to give Perl6 a try and started to reimplement one of my programs. I have to say, Perl6 is so easy for object programming, an aspect that was very painful for me in Perl5. My program has to read and store big files, such as whole…

performance parsing grammar fasta raku

asked Aug 24 '18 at 13:03

Beuss

3 answers

Read FASTA into a dataframe and extract subsequences of FASTA file

I have a small fasta file of DNA sequences which looks like this: >NM_000016 700 200 234 ACATATTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC >NM_000775 700 124 236 CTAACCTCTCCCAGTGTGGAACCTCTATCTCATGAGAAAGCTGGGATGAG >NM_003820 700 111…

r subset bioinformatics fasta

asked Jan 21 '14 at 16:23

Paul.j

3 answers

Sequence length of FASTA file

I have the following FASTA file: >header1 CGCTCTCTCCATCTCTCTACCCTCTCCCTCTCTCTCGGATAGCTAGCTCTTCTTCCTCCT TCCTCCGTTTGGATCAGACGAGAGGGTATGTAGTGGTGCACCACGAGTTGGTGAAGC >header2 GGT >header3 TTATGAT My desired output: >header1 117 >header2 3 >header3 7 # 3…

bash awk fasta

asked Jun 02 '14 at 10:44

cucurbit

9 answers

Remove line breaks in a FASTA file

I have a fasta file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file: >accession1 ATGGCCCATG GGATCCTAGC >accession2 GATATCCATG AAACGGCTTA I'd like to convert it into…

unix awk newline bioinformatics fasta

asked Apr 06 '13 at 23:14

chimeric

4 answers

parsing a fasta file using a generator ( python )

I am trying to parse a large fasta file and I am encountering out of memory errors. Some suggestions to improve the data handling would be appreciated. Currently the program correctly prints out the names however partially through the file I get a…

python file parsing fasta

asked Oct 04 '11 at 22:57

Lamar B

4 answers

Efficient file buffering & scanning methods for large files in python

The description of the problem I am having is a bit complicated, and I will err on the side of providing more complete information. For the impatient, here is the briefest way I can summarize it: What is the fastest (least execution time) way to…

python performance io bioinformatics fasta

asked Jan 26 '11 at 03:55

eblume

4 answers

Writing fasta files using R package seqinr?

When I use write.fasta in seqinr, the file that it outputs looks like this: >Sequence name 1 >Sequence name 2 >Sequence name 3 ...etc Sequence 1 Sequence 2 Sequence 3 ...etc In other words, the sequence names are all at the beginning of the…

r fasta

asked Aug 06 '12 at 00:00

Jennifer Collins

2 answers

extract sequences from multifasta file by ID in file using awk

I would like to extract sequences from a multifasta file that match the IDs given in a separate list of IDs. FASTA file…

search awk bioinformatics multiline fasta

asked Apr 09 '18 at 11:01

Dalibor Miklík

2 answers

FASTA Algorithm Explanation

I'm trying to understand the basic steps of the FASTA algorithm for finding sequences similar to a query sequence in a database. These are the steps of the algorithm: Identify common k-words between I and J Score diagonals with k-word matches,…

bioinformatics fasta

asked Dec 03 '11 at 08:47

conmadoi

1 answer

chaos game for DNA sequences

I have tried the Mathematica code for the chaos game for DNA sequences posted at this address: http://facstaff.unca.edu/mcmcclur/blog/GeneCGR.html which looks like this: genome = Import["c:\data\sequence.fasta", "Sequence"]; genome =…

wolfram-mathematica dna-sequence fasta chaos

asked Nov 04 '11 at 12:10

Layla

3 answers

multiFASTA file processing

I was curious to know if there is any bioinformatics tool out there that can process a multiFASTA file and give me information such as the number of sequences, their lengths, nucleotide/amino acid content, etc., and maybe automatically draw descriptive plots. Also an R…

bioinformatics biopython fasta bioconductor bioperl

asked Nov 24 '09 at 10:55

Federico Giorgi

3 answers

Reading in file block by block using specified delimiter in python

I have an input_file.fa file like this (FASTA format): > header1 description data data data >header2 description more data data data I want to read in the file one chunk at a time, so that each chunk contains one header and the corresponding data,…

python python-3.x bioinformatics fasta

asked Jul 29 '16 at 09:25

Chris_Rands

2 answers

Biopython SeqIO to Pandas Dataframe

I have a FASTA file that can easily be parsed by SeqIO.parse. I am interested in extracting sequence ID's and sequence lengths. I used these lines to do it, but I feel it's waaaay too heavy (two iterations, conversions, etc.) from Bio import…

python pandas biopython fasta

asked Oct 17 '13 at 20:38

Sara

4 answers

How to find inverted repeated pattern in a FASTA sequence?

Suppose my long sequence looks like: 5’-AGGGTTTCCC**TGACCT**TCACTGC**AGGTCA**TGCA-3’ The two italic subsequences (here between the double stars) in this long sequence are together called an inverted repeat pattern. The length and the combination of…

python fasta

asked Jan 12 '13 at 21:27

user1964587
