Biology, computer programming, and statistics in bioinformatics

Bioinformatics is an integrated field of science that uses math, computer science, and statistics to study biology. It includes protein structure analysis, genome sequence assembly, and programming for data analysis and storage, making it a point where a variable number of skills converge. The more knowledge and background one has in those skills, the better situated one may be in the field. Although there are too many subsets of skills to itemize in detail, as a general rule of thumb I like to think of the field as the convergence of three broad categories: biology, computer science, and statistics. That said, nearly every discipline can touch bioinformatics in some way, from physics, biochemistry, and biophysics to ethics and philosophy.

Biological Sciences in bioinformatics

At the center of bioinformatics are, of course, the biological sciences. This category can be further broken down into genetics, molecular biology, chemistry, structural biology, cancer biology, and a whole lot more. Biology is the hub into which the other skills outlined below plug. A strong knowledge of the subject is an important skill in bioinformatics: it facilitates communication with biologists and helps in understanding the underlying problem being investigated. While deep knowledge of a particular field may not be required, a fundamental understanding of the Central Dogma of Molecular Biology (DNA is transcribed into RNA, which is translated into protein) is extremely important, as is a grasp of the concepts of organic chemistry.
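To make the Central Dogma a little more concrete, here is a minimal Python sketch of the DNA to RNA to protein flow. It assumes the input is the coding strand of the DNA and uses the standard genetic code; the function names `transcribe` and `translate` are my own for illustration, not taken from any particular library.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Real genes add plenty of complications (introns, reading frames, alternative start codons) that this toy version ignores.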

Computers and Programming in bioinformatics

Bioinformatics is typically about accomplishing tasks that are difficult (or impossible) to do by hand, which is where computers enter the equation. Knowing how to ‘script’ is an essential skill, and knowing how to write production software is an added bonus.

The biological sciences have entered the realm of ‘big data’, typically in the form of ‘omics’ techniques (genomics, metabolomics, proteomics, transcriptomics) that produce huge amounts of data. The size of the data creates a downstream requirement: the results must be analyzed, stored, and visualized. None of these are trivial tasks, and all require knowledge of how to use software as well as how to deal with large data sets. It helps to know how data is structured for storage (typically in a SQL database), how to parse it (scripting and database communication), and how to visualize it (user interface design and/or knowledge of statistical tools and software).
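As a rough illustration of the storage and database communication side, the sketch below loads a few expression counts into an in-memory SQLite database and asks for the average count per gene. The table name, column names, and counts are all made up for the example; a real project would have its own schema and far more rows.

```python
import sqlite3


def build_expression_db(rows):
    """Create an in-memory database with a hypothetical 'expression' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression (gene TEXT, sample TEXT, count INTEGER)"
    )
    # Parameterized inserts keep the data separate from the SQL text.
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def mean_count_per_gene(conn):
    """Return {gene: average count} computed by the database itself."""
    cursor = conn.execute(
        "SELECT gene, AVG(count) FROM expression GROUP BY gene ORDER BY gene"
    )
    return dict(cursor.fetchall())


# Illustrative values only, not real measurements.
rows = [
    ("BRCA1", "s1", 120),
    ("BRCA1", "s2", 80),
    ("TP53", "s1", 300),
    ("TP53", "s2", 340),
]
conn = build_expression_db(rows)
print(mean_count_per_gene(conn))
```

Letting the database do the grouping and averaging means the script never has to hold the whole table in memory, which matters once the data gets big.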

Of course, the basics of computer science are important: data structures, algorithms, code syntax, command-line control, and so on. Bioinformatics is not a single-language field. While historically the language of choice has been Perl, due to its easy-to-use text parsing capabilities, any suitable language is often sufficient (Python, Java, R, C, etc.), and given the many different problems one may face, the more languages one knows the better.
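Text parsing is still the bread and butter of day-to-day scripting. As a small example, here is one way to read FASTA-formatted text and compute GC content in plain Python. The helper names `parse_fasta` and `gc_content` are hypothetical, and the parser assumes well-formed input, so treat it as a sketch and not a replacement for an established library.

```python
def parse_fasta(text):
    """Parse FASTA text into {record_id: sequence}.

    The record id is taken as the first word after '>'.
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            # Sequences may span several lines; collect the pieces.
            records[current_id].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(sequence):
    """Return the fraction of bases that are G or C (0.0 for empty input)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


example = ">seq1 toy record\nATGC\nGGCC\n>seq2\nATAT\n"
sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, seq, gc_content(seq))
```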


Statistics in bioinformatics

Statistics is inherent to bioinformatic data analysis, and virtually required for experimental design and interpretation in the biological sciences. While formal training may not always be a requirement, it is a very useful skill. The basics should be known (mean, standard deviation). More advanced topics (linear regression, Bayesian probability) may be harder to learn but are also valuable. As biology enters the realm of ‘big data’, machine learning, statistical learning, and data mining are all becoming important statistical categories.
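For a feel of the basics mentioned above, this short sketch computes a mean and sample standard deviation with Python's standard `statistics` module, then fits a simple linear regression by ordinary least squares. The `fit_line` helper and the dose/response numbers are invented for illustration only.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the covariance of x and y divided by the variance of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy data: a perfectly linear response, chosen to keep the arithmetic simple.
doses = [0, 1, 2, 3, 4]
responses = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(doses, responses)
print("mean:", mean(responses))
print("sample standard deviation:", stdev(responses))
print("slope:", slope, "intercept:", intercept)
```

In practice R or a dedicated Python statistics package would handle this, along with the confidence intervals and diagnostics that a hand-rolled fit leaves out.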
