Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (pronounced "fast A") is a text-based format used by the FASTA software to represent nucleic acid and protein sequences. Each nucleotide or amino acid is written as a single letter, and each sequence can be given a name.

The format achieved great popularity, becoming the de facto standard for representing biological sequences.

A record in FASTA format consists of a header (comment) line followed by one or more lines of sequence data, with one letter per nucleotide or amino acid. Header lines begin with >. The sequence that follows is usually wrapped at a fixed width (often 60 characters, and generally no more than 80).

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
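
The record layout described above can be read with a few lines of plain Python. The following is only a minimal sketch, not the FASTA package's own parser: the names `parse_fasta` and `format_fasta` are made up for illustration, blank lines are skipped, and any text before the first header is ignored (both are assumptions, since the description above does not cover those cases).

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are tolerated and skipped
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined, undoing the fixed-width wrapping.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one record as text, wrapping the sequence at `width` letters."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped)


# The two sample records shown above.
example = """\
> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    print(header, len(sequence))
```

For real work an established parser such as Biopython's `Bio.SeqIO` is usually the safer choice; the sketch is only meant to make the header and sequence structure concrete.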

743 questions

4 answers

Printing a sequence from a fasta file

I often need to find a particular sequence in a fasta file and print it. For those who don't know, fasta is a text file format for biological sequences (DNA, proteins, etc.). It's pretty simple, you have a line with the sequence name preceded by a…

bash grep fasta

asked Oct 01 '14 at 15:17

Colin

2 answers

Using Bio.SeqIO to write single-line FASTA

QIIME requests this (here) regarding the fasta files it receives as input: The file is a FASTA file, with sequences in the single line format. That is, sequences are not broken up into multiple lines of a particular length, but instead the entire…

python python-2.7 bioinformatics biopython fasta

asked Jun 11 '14 at 07:01

Korem

3 answers

Convert table into fasta in R

I have a table like this: >head(X) column1 column2 sequence1 ATCGATCGATCG sequence2 GCCATGCCATTG I need an output in a fasta file, looking like this: sequence1 ATCGATCGATCG sequence2 GCCATGCCATTG So, basically I need all entries of the 2nd…

r fasta

asked Apr 29 '14 at 19:56

user3586764

3 answers

append contents from one file to another with newline separation

I'm trying to, I think, replicate the cat functionality of the Linux shell in a platform-agnostic way such that I can take two text files and merge their contents in the following manner: file_1 contains: 42 bottles of beer on the wall file_2…

python python-2.7 concatenation fasta shutil

asked Dec 16 '13 at 09:45

glarue

2 answers

Using Biopython (Python) to extract sequence from FASTA file

Ok so I need to extract part of a sequence from a FASTA file, using python (biopython, http://biopython.org/DIST/docs/tutorial/Tutorial.html) I need to get the first 10 bases from each sequence and put them in one file, preserving the sequence info…

python python-2.7 biopython fasta

asked Oct 30 '12 at 03:29

user1784467

2 answers

Scala functional way of processing large-scale data with lazy collections

I am trying to figure out memory-efficient AND functional ways to process large-scale data using strings in Scala. I have read many things about lazy collections and have seen quite a few code examples. However, I run into "GC overhead…

scala memory collections lazy-evaluation fasta

asked Jun 05 '12 at 11:05

Wayne Jhukie

1 answer

Making Blast database from FASTA in Python

How can I do this? I use Biopython and have already looked at the manual. Of course I can make a BLAST database from FASTA using "makeblastdb" in standalone NCBI BLAST+, but I want to do the whole process in one program. It seems there are two possible solutions. Find a function…

python biopython fasta blast

asked Feb 18 '12 at 15:38

user1218225

2 answers

Parsing file in parallel

I am thinking about a way to parse a FASTA file in parallel. For those of you who don't know the FASTA format, here is an example: >SEQUENCE_1 MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG …

parsing concurrency parallel-processing bioinformatics fasta

asked Nov 24 '11 at 14:53

peri4n

3 answers

How do I merge two FASTA files (one file with line break) in Perl?

I have the following two FASTA files: file1.fasta >0 GAATAGATGTTTCAAATGTACCAATTTCTTTCGATT >1 GTTAAGTTATATCAAACTAAATATACATACTATAAA >2 GGGGCTGTGGATAAAGATAATTCCGGGTTCGAATAC file2.qual >0 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40…

perl bioinformatics fasta

asked Apr 10 '09 at 06:12

neversaint

2 answers

Removing lines which match with specific pattern from another file

I've got two files (I only show the beginning of these files)…

awk grep fasta

asked Feb 18 '21 at 16:41

Paillou

2 answers

Find length of a contig in one fasta, using the header of another fasta as query in python

I'm trying to find a python solution to extract the length of a specific sequence within a fasta file using the full header of the sequence as the query. The full header is stored as a variable earlier in the pipeline (i.e. "CONTIG"). I would like…

python bioinformatics biopython fasta

asked Jul 26 '20 at 22:29

Gunther

1 answer

How to remove duplicates from fasta file but keep at least one per group based on header

I have a multi-FASTA file that looks like this: (all sequences are >100 bp, span more than one line, and have the same length…

python fasta

asked Jul 25 '20 at 19:52

Xela Vi

1 answer

Pairwise alignment of multi-FASTA file sequences

I have a multi-FASTA file containing more than 10,000 sequences from next-generation sequencing, and I want to do a pairwise alignment of each sequence against every other sequence in the file and store all the results in the same new file in…

python bioinformatics biopython fasta pairwise

asked Aug 05 '19 at 16:07

Aurora

1 answer

Is there a way to collect many multiline strings delineated by a specific character into an Arraylist using the data stream in Java 8?

I have a fasta file that I want to parse into an ArrayList, each position having an entire sequence. The sequences are multiline strings, and I don't want to include the identification line in the string that I store. My current code splits each…

arraylist collections java-8 fasta multilinestring

asked Apr 27 '19 at 16:35

Sam

1 answer

Directly calling SeqIO.parse() in for loop works, but using it separately beforehand doesn't? Why?

In python this code, where I directly call the function SeqIO.parse() , runs fine: from Bio import SeqIO a = SeqIO.parse("a.fasta", "fasta") records = list(a) for asq in SeqIO.parse("a.fasta", "fasta"): print("Q") But this, where I first…

python bioinformatics biopython fasta

asked Feb 21 '19 at 02:32

Abraham Ahmad
