For bioinformatics, which language should I learn first?

Captions
Hi, I'm Maria Nattestad, and welcome to the very first episode of the OMGenomics show. Our first question is one I've been asked multiple times at conferences: I want to learn bioinformatics, but which programming language should I learn first?

When we talk about learning bioinformatics, it's useful to divide students into two groups: the ones who don't want to make their own software and the ones who do. Both groups will do data analysis, run statistical tests, make plots, and use bioinformatics software made by other scientists, but the second group will also make their own bioinformatics software for the community to use. If you need to make some specialized scripts for your own research but you're not releasing anything for other researchers in your field to use, then you're in the first group.

For the first group, you are likely going to get the most use out of R. Some people are a little stuck up about R, saying it's not a real programming language, but it definitely is, and it has a lot of cool things built into it that make it ideal for bioinformatics. It has a built-in data type called a data frame that has the same column and row setup as an Excel spreadsheet, where your genes, cells, people, time points, etc. are rows while your variables are columns. This makes a lot of sense as a way to think about most kinds of data, so the Python people have actually made a package called pandas to copy some of this functionality into Python, though it doesn't work as smoothly as data frames do natively in R.

The packages available for R to do bioinformatics are also great, ranging from RNA-seq analysis to making phylogenetic trees, and these are super easy to install from CRAN or from Bioconductor. If you use the free RStudio software as your programming environment, it's even easier to manage what you're doing, and I would highly recommend RStudio, especially for beginners who are just starting out.

Another major advantage of R is ggplot2, which is the awesome package for making plots that gives you results really quickly with even minimal coding skills. I made a video course about ggplot on my personal YouTube channel, Maria Nattestad; just search for "Plotting in R for Biologists". That course includes a good getting-started guide for R in general, so you can figure out how to get your data in there and set up your environment, and you can get started in R really quickly.

Now, for bioinformaticians who want to make their own software, I would recommend that you start by learning either R or Python, but also Bash. R is great for all the reasons I just described, but if you like coding more than statistics, you may enjoy Python's style a lot more. How could you possibly know whether you enjoy coding more than statistics when you're choosing your first programming language? I would suggest trying them both and seeing which you like best. I personally enjoy coding in Python more than in R because its rules make more sense to me and it feels more like a programming language, even though R technically also is one. In my experience, it's also much easier to make a command-line tool in Python than in R. Python also has some packages for bioinformatics that are quite useful, just like R does. They have different packages, and you can decide based on your application which of these languages is actually going to be useful for you right now. As you can probably tell, I've used both R and Python a lot in my work. I use R for plotting and statistics, while I use Python for basically everything else, ranging from merging variant call sets to providing back-end algorithms for my web applications.

I mentioned that you should also learn Bash. It's very important for bioinformaticians to learn Bash, which for all of our intents and purposes as bioinformaticians is interchangeable with the shell, the command line, or the terminal. Bash is the primary way to access your data on your institution's cluster and to run most genomics and bioinformatics software. It's also very powerful for manipulating your data, like sorting, filtering, and doing calculations between columns, such as subtracting or adding columns together to get new numbers, and all of these are available through various utilities in Bash. In my experience, and from everyone else I've talked to about it, Bash is confusing and scary at first, but when you get the hang of it, you start to feel this power surging through you, and you can do things in a second that would normally take you hours to do by hand. Even two years into it, I will still learn something new in Bash that blows my mind, and I'll kick myself for having wasted an hour programming the same thing from scratch in Python. Bash has all these built-in things that make this really easy for you. And as I said, for many things that you're doing on your institution's cluster, you may need Bash to even access your files. For running things like SAMtools, aligners, and variant callers, you're just going to need Bash, so I highly recommend that you learn it as one of your first languages.

In summary, for wet-lab people who want to add bioinformatics to their toolbox, I would focus on learning R first and then applying it to your own work as much as possible. For people who want to make bioinformatics the main focus of their career and make their own tools for the rest of the community to use, I would recommend learning the trifecta of R, Python, and Bash, though you could get away with choosing between R and Python as long as you still learn Bash in addition to one of them. I can go into more depth on any of these topics, or give an introduction to any of these languages and how to get started with them, if you let me know in the comments below.

There are many other languages out there too, so here I'm going to give a brief reason why each of them is not recommended for bioinformatics beginners, or for anyone at all in some cases.

C and C++ are great for making super-optimized command-line tools like aligners and variant callers, but you'll have a much easier time learning Python as your first language and then moving to these high-performance languages for a particular problem in the future, since they are a lot harder to learn, they're more finicky, and they take a lot more code to do the same thing that you can normally do in Python with a few lines.

Another one is Perl. Perl is still what a lot of people use, but it's fading out of use because Python can accomplish the same tasks and is easier to write code in, especially for beginners.

Ruby is one of the hot languages right now, for good reason, largely because of the power of Ruby on Rails for making database-driven web applications like blogs and Twitter. Ruby, however, is not great for bioinformatics because it lacks the community support, in terms of packages, that R and Python have for bioinformatics, so you would be better off learning R or probably Python instead of Ruby.

JavaScript and PHP are great languages for web applications, but a bioinformatics web application should never be your first project when you start learning bioinformatics and learning to code. You could make a computational method in Python or R and later turn it into a web application, but that's not a project for a beginner. Also, HTML and CSS, by the way, are not programming languages, if you were thinking about them; they are actually markup and styling languages, respectively, that you will use along with JavaScript and PHP for that web application you might make someday. In the meantime, don't worry about learning JavaScript, PHP, HTML, or CSS until you have some kind of computational method you may want to turn into a web application. So this is not something for beginners, but I highly recommend it for people who already have some piles of scripts that they want to make more accessible to the community, and that's something I can talk about in more detail later. Just let me know in the comments if that's something you want to see more of.

Java is a popular language that most people have heard of, and in bioinformatics a notable example of using Java is the genome browser IGV. However, I wouldn't recommend Java for beginners because it has some issues, including memory management, that are not great for the data-intensive area of bioinformatics. Also, Python and R have many more bioinformaticians using them, so they're building packages and they can answer your questions online, which means there's a lot more help available there.

I would only recommend learning MATLAB if you're a neuroscientist and your lab already has several scripts written in MATLAB for you to use. In the field of neuroscience it can be really great, but because it is proprietary, so you have to pay money to use it, and because R and Python can do many of the same things, I would only recommend MATLAB for neuroscientists who have a wealth of scripts to use.

All right, that's all I have to say about bioinformatics programming languages for now. If you want to see more videos like this about bioinformatics, make sure to subscribe here on YouTube and sign up for updates at omgenomics.com/subscribe to get new videos, guides, and scripts about bioinformatics delivered to your email inbox every week. And if you have a question you'd like me to answer on the show, you can send it to me by going over to omgenomics.com/tv and typing in your question there. Thank you for watching, and I hope to see you next time on the OMGenomics show. That's all I have to say about that.
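The transcript mentions calculations between columns, sorting, filtering, and small command-line tools written in Python. As a rough illustration only, here is a minimal sketch of such a tool using Python's standard library. The input layout (tab-separated chromosome, start, end), the function names, and the `--min-length` flag are all assumptions made up for this example; none of them come from the video.

```python
import argparse
import csv
import sys


def add_length_column(rows, start_col=1, end_col=2):
    """Append (end - start) to each row; column indexes are 0-based."""
    return [row + [str(int(row[end_col]) - int(row[start_col]))] for row in rows]


def filter_and_sort(rows, min_length):
    """Keep rows whose last column is at least min_length, longest first."""
    kept = [row for row in rows if int(row[-1]) >= min_length]
    return sorted(kept, key=lambda row: int(row[-1]), reverse=True)


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout):
    # Hypothetical command-line interface: one optional integer flag.
    parser = argparse.ArgumentParser(
        description="Add a length column to tab-separated intervals."
    )
    parser.add_argument("--min-length", type=int, default=0,
                        help="drop intervals shorter than this")
    args = parser.parse_args(argv)

    # Read tab-separated rows, skipping blank lines.
    rows = [row for row in csv.reader(stdin, delimiter="\t") if row]
    result = filter_and_sort(add_length_column(rows), args.min_length)
    csv.writer(stdout, delimiter="\t", lineterminator="\n").writerows(result)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `interval_lengths.py`, this could be run as `python interval_lengths.py --min-length 100 < intervals.tsv`. The script reads from standard input and writes to standard output so that it can be chained with other shell utilities, in line with the Bash workflow the transcript describes.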
Info
Channel: OMGenomics
Views: 77,361
Rating: 4.9657307 out of 5
Keywords: bioinformatics, python, R (programming language), computational biology, bash, command-line, getting started in bioinformatics, biology, programming for biology, omgenomics
Id: ZZz9HROAONA
Length: 9min 42sec (582 seconds)
Published: Fri Mar 03 2017