Questions tagged [bioinformatics]

For programming questions related to bioinformatics. Other questions do not belong here but might be on-topic at https://bioinformatics.stackexchange.com/.

Bioinformatics is an interdisciplinary scientific field that develops methods and software tools for understanding biological data. Bioinformatics combines computer science, statistics, mathematics, and engineering to study and process various types of biological data.

There is a former Stack Exchange site specific to bioinformatics at Biostars, and a new Stack Exchange site dedicated to bioinformatics.

3434 questions
90
votes
11 answers

How much storage would be required to store a human genome?

I'm looking for the amount of storage in bytes (MB, GB, TB, etc.) required to store a single human genome. I read a few articles on Wikipedia about DNA, chromosomes, base pairs, genes, and have some rough guess, but before disclosing anything I'd…
Milan Babuškov
  • 55,232
  • 47
  • 119
  • 176
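
A rough back-of-the-envelope sketch for the storage question above. The figures are assumptions, not taken from the question: about 3.1 billion base pairs for one haploid copy, 2 bits per base (four possible bases), no compression, and no quality or annotation data. The name `genome_bytes` is made up for illustration.

```python
# Assumptions (not from the question): ~3.1e9 base pairs in one haploid copy,
# 4 possible bases (A, C, G, T) so 2 bits per base, no compression, no metadata.
BASE_PAIRS = 3_100_000_000
BITS_PER_BASE = 2


def genome_bytes(base_pairs: int = BASE_PAIRS, bits_per_base: int = BITS_PER_BASE) -> int:
    """Return the raw storage size in bytes, rounded up to a whole byte."""
    total_bits = base_pairs * bits_per_base
    return (total_bits + 7) // 8


size = genome_bytes()
print(f"{size / 1e6:.0f} MB")  # prints "775 MB" under the assumptions above
```

Real sequencing file formats store much more than the bare sequence, so sizes seen in practice can differ widely from this estimate.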
82
votes
4 answers

Remove part of string after "."

I am working with NCBI Reference Sequence accession numbers like variable a: a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2") To get information from the biomart package I need to remove the .1, .2…
Lisann
  • 5,075
  • 11
  • 37
  • 49
45
votes
10 answers

Dictionary style replace multiple items

I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages. Currently I am going about it like so: foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"), snp2 = c("AA", "AT",…
Stedy
  • 6,841
  • 14
  • 51
  • 73
42
votes
3 answers

Count occurrences of given character per cell

Question: For example, if I wanted to count the number of Ns in a column of strings, how can I do this in Google Spreadsheets on a per-cell basis (i.e. a formula that points at one cell at a time that I can drag down)? Background: I'm having to decide…
hello_there_andy
  • 1,821
  • 2
  • 17
  • 48
41
votes
9 answers

How to call module written with argparse in iPython notebook

I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have never had to deal directly with argparse before. How…
Niels
  • 1,265
  • 1
  • 12
  • 20
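
One common way to drive an argparse-based script from a notebook is to pass the argument list explicitly instead of letting argparse read `sys.argv`, which in a notebook holds the kernel's own arguments. The sketch below uses a hypothetical parser (`build_parser`, `--sequence`, `--min-length` are invented for illustration) and is not the suffix tree module from the question.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser standing in for the module's real one.
    parser = argparse.ArgumentParser(description="toy sequence tool")
    parser.add_argument("--sequence", required=True)
    parser.add_argument("--min-length", type=int, default=1)
    return parser


# Passing a list bypasses sys.argv, so this works inside a notebook cell.
args = build_parser().parse_args(["--sequence", "ATGCCG", "--min-length", "3"])
print(args.sequence, args.min_length)
```

Whether this applies depends on whether the module in question exposes its parser or a function that accepts an argument list.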
35
votes
6 answers

How to plot a gene graph for a DNA sequence say ATGCCGCTGCGC?

I need to generate a random walk based on the DNA sequence of a virus, given its base pair sequence of 2k base pairs. The sequence looks like "ATGCGTCGTAACGT". The path should turn right for an A, left for a T, go upwards for a G and downwards for a…
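
A minimal sketch of the walk described above, with two assumptions: "turn right" and "turn left" are read as unit steps along the x axis, and the downward step is assigned to C because the excerpt is cut off before naming that base. `STEPS` and `dna_walk` are illustrative names.

```python
# Step per base. A, T and G follow the excerpt; C -> down is an assumption,
# since the excerpt is truncated before naming the base for "downwards".
STEPS = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(sequence: str) -> list[tuple[int, int]]:
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        dx, dy = STEPS[base]  # raises KeyError for anything other than A/T/G/C
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


path = dna_walk("ATGCGTCGTAACGT")
print(path[-1])
```

The resulting list of points can be handed to any plotting library as separate x and y sequences.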
29
votes
12 answers

Why is Perl used so extensively in biology research?

I work as support staff in a biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk. Why is Perl…
Kevin
  • 1,943
  • 2
  • 18
  • 20
28
votes
9 answers

Clojure or Scala for bioinformatics/biostatistics/medical research

I am not a professional programmer (my area is medical research), but I am quite capable in C/C++, and various scripting languages. A while back I got intrigued by Lisp, but I never got the time to seriously learn it. After a brief exposure to R I…
kliron
  • 3,895
  • 3
  • 28
  • 43
26
votes
11 answers

Finding matching keys in two large dictionaries and doing it fast

I am trying to find corresponding keys in two different dictionaries. Each has about 600k entries. Say for example: myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' } myNames = { 'Actinobacter': '8924342' } I want…
Austin Richardson
  • 6,968
  • 10
  • 39
  • 44
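
A sketch of one way to find keys shared by two dictionaries, using set intersection on the key views. The sample data is copied from the excerpt as shown, elisions included. `shared_keys` and `paired` are illustrative names, and what the asker wanted to do with the matches is not visible in the truncated text.

```python
myRDP = {"Actinobacter": "GATCGA...TCA", "subtilus sp.": "ATCGATT...ACT"}
myNames = {"Actinobacter": "8924342"}


def shared_keys(a: dict, b: dict) -> set:
    """Keys present in both dictionaries, via set intersection of key views."""
    return a.keys() & b.keys()


common = shared_keys(myRDP, myNames)
# Pair up the values for every shared key.
paired = {key: (myRDP[key], myNames[key]) for key in common}
print(paired)
```

Key views support set operations directly, so no explicit loop over either dictionary is needed to compute the intersection.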
26
votes
15 answers

Encouraging good development practices for non-professional programmers?

In my copious free time, I collaborate with a number of scientists (mostly biologists) who develop software, databases, and other tools related to the work they do. Generally these projects are built on a one-off basis, used in-house, and eventually…
Meredith L. Patterson
  • 4,513
  • 25
  • 29
24
votes
4 answers

Save complete web page (incl css, images) using python/selenium

I am using Python/Selenium to submit genetic sequences to an online database, and want to save the full page of results I get back. Below is the code that gets me to the results I want: from selenium import webdriver URL =…
Max Power
  • 6,732
  • 6
  • 42
  • 78
24
votes
4 answers

How to subtract strings in python

Basically, if I have a string 'AJ' and another string 'AJYF', I would like to be able to write 'AJYF'-'AJ' and get 'YF'. I tried this but got a syntax error. Just on a side note, the subtractor will always be shorter than the string it is…
jay a
  • 434
  • 2
  • 5
  • 11
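
Python strings do not support the `-` operator, so one possible reading of the request is "remove the first occurrence of the shorter string". The sketch below assumes that reading; the excerpt is truncated, so the exact rule the asker wanted is not known, and `subtract` is an illustrative name.

```python
def subtract(text: str, part: str) -> str:
    """Remove the first occurrence of `part` from `text`.

    Returns `text` unchanged if `part` does not occur in it.
    """
    return text.replace(part, "", 1)


print(subtract("AJYF", "AJ"))  # prints "YF"
```

If only a leading match should be removed, `str.removeprefix` (Python 3.9+) would be the closer fit.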
24
votes
3 answers

WinError 2 The system cannot find the file specified (Python)

I have a Fortran program and want to execute it in python for multiple files. I have 2000 input files but in my Fortran code I am able to run only one file at a time. How should I call the Fortran program in python? My Script: import…
Jone
  • 311
  • 1
  • 3
  • 8
23
votes
0 answers

Differential gene expression analysis in Python

It seems that most differential gene expression packages for RNA-Seq are written in R. Examples include edgeR, limma, and DESeq. Are any similar (and easy to use) packages available for Python, or have any of the R packages been ported? The best I…
ljc
  • 741
  • 2
  • 8
  • 21
22
votes
6 answers

Inverse of Hamming Distance

*This is a brief introduction; the specific question is in bold in the last paragraph. I'm trying to generate all strings with a given Hamming distance in order to solve a bioinformatics assignment efficiently. The idea is, given a string (ie.…
JackS
  • 383
  • 2
  • 10
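
A sketch of one way to enumerate every string at an exact Hamming distance from a given string: choose which positions change, then try every substitute character at each of them. The DNA alphabet `"ACGT"` is an assumption, and `hamming_neighbors` is an illustrative name; the asker's actual constraints are cut off in the excerpt.

```python
from itertools import combinations, product


def hamming_neighbors(s: str, d: int, alphabet: str = "ACGT"):
    """Yield every string over `alphabet` at Hamming distance exactly `d` from `s`."""
    for positions in combinations(range(len(s)), d):
        # At each chosen position the replacement must differ from the original.
        choices = [[c for c in alphabet if c != s[i]] for i in positions]
        for replacement in product(*choices):
            chars = list(s)
            for i, c in zip(positions, replacement):
                chars[i] = c
            yield "".join(chars)


neighbors = list(hamming_neighbors("ACG", 1))
print(len(neighbors))
```

For a string of length n over an alphabet of size k, this produces C(n, d) × (k − 1)^d strings, so the output grows quickly with d.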