Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by The Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionality, such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW, EMBOSS, among others)

  • Classes for dealing with DNA, RNA and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and molecular weight calculations

amongst many, many others.
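As a quick illustration of the sequence operations listed above, here is a minimal sketch using Bio.Seq.Seq and Bio.SeqUtils.molecular_weight. It assumes a reasonably recent Biopython release; the DNA string is an arbitrary example, not taken from any question.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

# Arbitrary example coding sequence: 8 codons, the last one a TGA stop codon
dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")

print(dna.reverse_complement())        # reverse complement of the strand
print(dna.transcribe())                # DNA -> mRNA (T becomes U)
print(dna.translate())                 # standard code, '*' marks the stop: MAIVMGR*
print(dna.translate(to_stop=True))     # stop at the first stop codon: MAIVMGR
print(molecular_weight(dna, seq_type="DNA"))  # single-stranded weight in daltons
```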

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki site.

The Biopython Tutorial and Cookbook provides many examples of Biopython being used, as well as installation instructions and a FAQ section.

1128 questions

41 votes, 9 answers

How to call module written with argparse in iPython notebook

I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have never had to deal directly with argparse before. How…
Niels
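In a notebook, sys.argv holds the kernel's own command line, so a parser that reads sys.argv by default tends to fail there. A minimal sketch of the usual workaround, assuming the module exposes its parser or accepts an argument list; the parser below is hypothetical and is not Ilya Stepanov's actual interface:

```python
import argparse

def build_parser():
    # Hypothetical CLI: one positional sequence plus an optional flag
    parser = argparse.ArgumentParser(description="Build a suffix tree")
    parser.add_argument("sequence")
    parser.add_argument("--verbose", action="store_true")
    return parser

def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; an explicit list bypasses it
    return build_parser().parse_args(argv)

# From a notebook cell, pass the arguments explicitly as a list of strings
args = main(["ACGTACGT", "--verbose"])
print(args.sequence, args.verbose)
```

If the module's main() takes no argument and calls parse_args() internally, assigning a list such as ["script", "ACGTACGT"] to sys.argv before calling it is the cruder alternative.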

14 votes, 13 answers

biopython no module named Bio

FYI: this is NOT a duplicate! Before running my Python code I installed Biopython at the cmd prompt with pip install biopython. I then get an error saying 'No module named Bio' when I try to import it in Python with import Bio. The same thing happens…
Gabriel
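A common cause of this error is that pip installed the package for a different Python interpreter than the one running the script. A small diagnostic sketch, using only the standard library, that reports which interpreter is running and whether it can see the Bio package:

```python
import importlib.util
import sys

def bio_is_importable():
    """Print the running interpreter and report whether it can import Bio."""
    print("Interpreter:", sys.executable)
    # The distribution is called 'biopython', but the import name is 'Bio'
    return importlib.util.find_spec("Bio") is not None

print("Bio importable:", bio_is_importable())
```

If it reports False, installing with that same interpreter (the printed path followed by -m pip install biopython) targets the right environment.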

12 votes, 3 answers

SeqIO.parse on a fasta.gz

New to coding. New to Python/Biopython; this is my first question online, ever. How do I open a compressed fasta.gz file to extract info and perform calculations in my function? Here is a simplified example of what I'm trying to do (I've tried…
MelBel88
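SeqIO.parse expects a text-mode handle, so one approach is to open the file with gzip.open in "rt" mode and pass that handle in. A minimal sketch; the GC-content calculation is a hypothetical stand-in for the question's own (truncated) function:

```python
import gzip
from Bio import SeqIO

def gc_fractions(path):
    """Yield (record id, GC fraction) for each record in a gzipped FASTA file."""
    # "rt" decompresses and decodes to text, which SeqIO.parse needs
    with gzip.open(path, "rt") as handle:
        for record in SeqIO.parse(handle, "fasta"):
            seq = str(record.seq).upper()
            gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
            yield record.id, gc

# Usage with a hypothetical file name:
# for rec_id, gc in gc_fractions("sequences.fasta.gz"):
#     print(rec_id, round(gc, 3))
```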

12 votes, 2 answers

How to find an open reading frame in Python

I am using Python and a regular expression to find an ORF (open reading frame). Find a substring in a string that is composed ONLY of the letters ATGC (no spaces or new lines) that: starts with ATG, ends with TAG or TAA or TGA and should consider the…
Nodnin
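One regular-expression sketch of the rules in the excerpt, using only the standard library: ATG, then whole codons, then the first in-frame stop codon. It only scans the forward strand; the reverse strand could be handled by running it on the reverse complement, which is an assumption about what the truncated requirement asks for.

```python
import re

# ATG, then zero or more codons (non-greedy), then the first in-frame stop.
# The lookahead makes matches zero-width, so overlapping/nested ORFs are all found.
ORF_RE = re.compile(r"(?=(ATG(?:[ATGC]{3})*?(?:TAG|TAA|TGA)))")

def find_orfs(seq):
    """Return (start index, ORF string) pairs on the forward strand."""
    return [(m.start(), m.group(1)) for m in ORF_RE.finditer(seq)]

print(find_orfs("CCATGAAATAGCC"))
```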

12 votes, 11 answers

How do I convert the three letter amino acid codes to one letter code with python or R?

I have a fasta file as shown below. I would like to convert the three letter codes to one letter code. How can I do this with python or R? >2ppo ARGHISLEULEULYS >3oot METHISARGARGMET desired output >2ppo RHLLK >3oot MHRRM your suggestions would…
user1725152
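Biopython's Bio.SeqUtils.seq1 converts a string of three-letter codes to one-letter codes, matching the codes case-insensitively. A minimal sketch applied to the FASTA text from the question; it assumes each three-letter code sits entirely on one line, and unknown codes come out as "X" (seq1's default undef_code):

```python
from Bio.SeqUtils import seq1

def convert_fasta_text(text):
    """Convert three-letter residue codes to one-letter codes, keeping headers."""
    out = []
    for line in text.splitlines():
        # Header lines pass through unchanged; sequence lines are converted
        out.append(line if line.startswith(">") else seq1(line.strip()))
    return "\n".join(out)

example = ">2ppo\nARGHISLEULEULYS\n>3oot\nMETHISARGARGMET"
print(convert_fasta_text(example))
```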

9 votes, 7 answers

Reverse complement of DNA strand using Python

I have a DNA sequence and would like to get the reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells…
user3783999
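The core operation is a one-liner with Bio.Seq.Seq. A minimal sketch; the list below is a hypothetical stand-in for the values of one CSV column:

```python
from Bio.Seq import Seq

def revcomp(dna):
    """Return the reverse complement of a DNA string as a plain str."""
    # Seq.reverse_complement also complements IUPAC ambiguity codes
    return str(Seq(dna).reverse_complement())

column = ["ATGC", "GGTTA"]  # hypothetical column values
print([revcomp(value) for value in column])
```

Writing the result to another column is then a matter of reading the rows with Python's csv module and adding a field per row.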

9 votes, 1 answer

Traceback in Smith-Waterman algorithm with affine gap penalty

I'm trying to implement the Smith-Waterman algorithm for local sequence alignment using the affine gap penalty function. I think I understand how to initiate and compute the matrices required for calculating alignment scores, but am clueless as to…
jonwells
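A minimal sketch of the three-matrix (Gotoh-style) local alignment with traceback. It uses a simple match/mismatch score in place of a substitution matrix, and it assumes a gap of length L costs gap_open + (L - 1) * gap_extend; all parameter values are hypothetical. The key point for traceback is to track which matrix you are in, not just which cell:

```python
NEG_INF = float("-inf")

def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Local alignment with affine gaps; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # M[i][j]: best local alignment ending with a[i-1] paired to b[j-1]
    # X[i][j]: best one ending with a[i-1] paired to a gap
    # Y[i][j]: best one ending with b[j-1] paired to a gap
    M = [[0] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    def s(x, y):
        return match if x == y else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
            # The 0 lets a local alignment start fresh at this pair
            M[i][j] = s(a[i - 1], b[j - 1]) + max(
                0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if M[i][j] > best:
                best, best_pos = M[i][j], (i, j)

    # An optimal local alignment ends in a pair, so traceback starts in M
    i, j = best_pos
    state = "M"
    out_a, out_b = [], []
    while i > 0 and j > 0:
        if state == "M":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            prev = M[i][j] - s(a[i - 1], b[j - 1])
            i, j = i - 1, j - 1
            if prev == 0:
                break  # the alignment starts at the pair just added
            elif prev == M[i][j]:
                state = "M"
            elif prev == X[i][j]:
                state = "X"
            else:
                state = "Y"
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            if X[i][j] == M[i - 1][j] + gap_open:
                state = "M"  # this gap was opened here
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            if Y[i][j] == M[i][j - 1] + gap_open:
                state = "M"  # this gap was opened here
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))
```

Under these assumptions, when several tracebacks tie, the order of the comparisons decides which one is reported.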

8 votes, 3 answers

Why can't python find some modules when I'm running CGI scripts from the web?

I have no idea what could be the problem here: I have some modules from Biopython which I can import easily when using the interactive prompt or executing python scripts via the command-line. The problem is, when I try and import the same biopython…
Dave

8 votes, 1 answer

Is there a way with biopython to obtain the full abstract from a pubmed article?

I currently have the following code which queries pubmed: from Bio import Entrez Entrez.email = "kuharrw@hiram.edu" # Always tell NCBI who you are handle = Entrez.esearch(db="pubmed", term="bacteria") record = Entrez.read(handle) list =…
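A minimal sketch of pulling abstracts out of efetch results with Bio.Entrez, assuming the XML is parsed with Entrez.read, which in recent Biopython versions returns a dict with a "PubmedArticle" list. The e-mail address is a placeholder, and the fetch function needs network access:

```python
from Bio import Entrez

# Placeholder address: NCBI asks E-utilities users to identify themselves
Entrez.email = "your.name@example.org"

def fetch_pubmed_xml(pmids):
    """Download and parse PubMed records (requires network access)."""
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids), retmode="xml")
    try:
        return Entrez.read(handle)
    finally:
        handle.close()

def abstract_text(article):
    """Join the AbstractText sections of one parsed PubmedArticle ('' if absent)."""
    article_data = article["MedlineCitation"]["Article"]
    abstract = article_data.get("Abstract")
    if not abstract:
        return ""
    # Structured abstracts (background, methods, ...) arrive as several parts
    return " ".join(str(part) for part in abstract["AbstractText"])

# Usage (network call; id_list would come from Entrez.esearch):
# records = fetch_pubmed_xml(id_list)
# for article in records["PubmedArticle"]:
#     print(abstract_text(article))
```

If plain text is enough, efetch with rettype="abstract" and retmode="text" returns the formatted abstracts directly.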

7 votes, 2 answers

Is there a function that can calculate a score for aligned sequences given the alignment parameters?

I am trying to score already-aligned sequences. Let's say seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE' seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE' with the given parameters substitution matrix: blosum62, gap open penalty: -5, gap…
Jessada Thutkawkorapin
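Scoring an existing alignment is a single pass over its columns. A minimal sketch using BLOSUM62 from Bio.Align.substitution_matrices; the gap-extension value is cut off in the excerpt, so the default below is hypothetical. It also assumes a gap of length L costs gap_open + (L - 1) * gap_extend and that columns where both sequences have "-" are ignored:

```python
from Bio.Align import substitution_matrices

blosum62 = substitution_matrices.load("BLOSUM62")

def score_alignment(seq1, seq2, matrix, gap_open=-5, gap_extend=-1):
    """Score two equal-length aligned sequences with affine gap penalties."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap1 = in_gap2 = False  # currently inside a gap in seq1 / seq2
    for x, y in zip(seq1, seq2):
        if x == "-" and y == "-":
            continue  # skip gap-gap columns; any open gap carries on
        if x == "-":
            score += gap_extend if in_gap1 else gap_open
            in_gap1, in_gap2 = True, False
        elif y == "-":
            score += gap_extend if in_gap2 else gap_open
            in_gap1, in_gap2 = False, True
        else:
            score += matrix[x, y]
            in_gap1 = in_gap2 = False
    return score

seq1 = "PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE"
seq2 = "PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE"
print(score_alignment(seq1, seq2, blosum62))
```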

7 votes, 3 answers

multiFASTA file processing

I was curious to know if there is any bioinformatics tool out there able to process a multiFASTA file, giving me information such as the number of sequences, length, nucleotide/amino acid content, etc., and maybe automatically draw descriptive plots. Also an R…
Federico Giorgi
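Basic per-record statistics take only a few lines with Bio.SeqIO and collections.Counter. A minimal sketch using a small in-memory example; SeqIO.parse also accepts a file name, so a real path can be passed instead. Plotting is left out:

```python
from collections import Counter
from io import StringIO
from Bio import SeqIO

def fasta_summary(handle):
    """Return one (id, length, residue Counter) tuple per FASTA record."""
    summary = []
    for record in SeqIO.parse(handle, "fasta"):
        seq = str(record.seq).upper()
        summary.append((record.id, len(seq), Counter(seq)))
    return summary

demo = StringIO(">s1\nACGTAC\n>s2\nMKV\n")  # hypothetical two-record file
rows = fasta_summary(demo)
print("number of sequences:", len(rows))
for name, length, composition in rows:
    print(name, length, dict(composition))
```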

6 votes, 1 answer

Issue with parsing publication data from PubMed with Entrez

I am trying to use Entrez to import publication data into a database. The search part works fine, but when I try to parse: from Bio import Entrez def create_publication(pmid): handle = Entrez.efetch("pubmed", id=pmid, retmode="xml") …
apiljic

6 votes, 1 answer

Phylo BioPython building trees

I am trying to build a tree with the BioPython Phylo module. What I've done so far is this image: each name has a four-digit number followed by - and a number; this number refers to the number of times that sequence is represented. That means 1578 - 22,…
psoares
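One way to get a tree into Bio.Phylo is to write it as a Newick string and read it with Phylo.read. A minimal sketch; the tree shape, leaf names, branch lengths and counts are all made-up placeholders. It relabels each leaf with a copy number in the "name - count" style the question describes:

```python
from io import StringIO
from Bio import Phylo

# Hypothetical Newick tree; names and branch lengths are placeholders
newick = "((A:0.1,B:0.2):0.05,C:0.3);"
tree = Phylo.read(StringIO(newick), "newick")

# Hypothetical copy numbers per sequence
counts = {"A": 22, "B": 3, "C": 7}
for leaf in tree.get_terminals():
    leaf.name = f"{leaf.name} - {counts[leaf.name]}"

Phylo.draw_ascii(tree)
```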

6 votes, 2 answers

How can I extract the abstract from efetch (Biopython, Entrez)?

I am new to python and would like to extract abstracts from pubmed using the entrez system from the bio package. I got the esearch to give me my UIDs (stored in my_list_ges) and I can also download an entry using efetch. Now, however, the result is…
MaxS

6 votes, 1 answer

Can Biopython perform Seq.find() accounting for ambiguity codes

I want to be able to search a Seq object for a subsequence Seq object accounting for ambiguity codes. For example, the following should be true: from Bio.Seq import Seq from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA amb = IUPACAmbiguousDNA() s1 =…
Malonge
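Bio.Alphabet, used in the question, was removed in Biopython 1.78, so this sketch works on plain strings (str(seq) for a Seq). It expands IUPAC codes in the query into a regular expression, using a hand-written code table, and mirrors str.find. It assumes the ambiguity is in the query only, not in the sequence being searched:

```python
import re

# IUPAC nucleotide ambiguity codes (hand-written table for this sketch)
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def ambiguous_find(seq, query):
    """Return the first index where query (IUPAC codes allowed) matches seq, else -1."""
    pattern = "".join(
        IUPAC_DNA[c] if len(IUPAC_DNA[c]) == 1 else f"[{IUPAC_DNA[c]}]"
        for c in str(query).upper()
    )
    match = re.search(pattern, str(seq).upper())
    return match.start() if match else -1

print(ambiguous_find("GGATCC", "NATC"))
```

Biopython's Bio.SeqUtils module also offers nt_search, which builds a similar pattern from ambiguity codes and returns every forward-strand match position.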