In my quest to better understand bioinformatics, I have found myself on the topic of algorithms. After thumbing through Cormen's classic text (Introduction to Algorithms) and a couple of other online resources, I decided that a better grasp of mathematical proofs would give me the tools needed to tackle algorithms. I picked up The Nuts and Bolts of Proofs by Antonella Cupillari and worked my way through it in about a month. The topics covered were direct proof, proof by contrapositive, equivalence theorems, use of counterexamples, mathematical induction, uniqueness/existence theorems, and equality of sets/numbers.
Cupillari does an excellent job explaining how to construct proofs, and leaves the reader with plenty of exercises after each chapter, as well as a hefty section of exercises without solutions and a collection of proofs. She covers the concepts of proofs well while using no math beyond high school algebra, which keeps the emphasis on constructing arguments with mathematical logic. This allowed me to start thinking in proofs sooner, without getting bogged down in mathematical theory. Having completed the chapter exercises, with no other training in mathematical proofs, I feel very comfortable constructing the kinds of proofs that make up a large portion of the exercises in many algorithms textbooks. If you are interested in learning proofs and aren't afraid to put in a little work, I would definitely recommend this book!
A blog documenting the topics I have chosen to study in an effort to improve my technical skills and knowledge for success in the field of bioinformatics. All entries are designed to communicate the extent to which I have prepared myself in each area and the materials/methods used to do so.
Tuesday, January 24, 2012
Stanford's Machine Learning Course, ml-class.org
After two months of Octave and online videos, I finished Stanford's online course on machine learning. I am very pleased with the course and wish it could continue indefinitely. The course consisted of online videos, content quizzes, and programming assignments. Topics included linear regression, neural networks, support vector machines, and anomaly detection, all presented with an eye toward practical applications of these algorithms. The programming exercises used GNU Octave, a free language largely compatible with MATLAB, and stressed vectorization for efficient computation. This enabled students to implement a computationally expensive neural network and train it with backpropagation for optical character recognition.
I learned a lot about linear algebra in the course and developed an appreciation for its broad uses in numerical problems. Learning more about linear algebra is a priority, and I plan to take a course on it during graduate school or sooner, but for now a handle on matrix multiplication and 3D coordinates will have to do. Beyond basic linear algebra, the course was not very math heavy and required no proofs or formalism. That said, some of the vectorization required for the homework assignments was a little tricky, though nothing a little pen-and-paper work couldn't sort out.
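As an illustration of that pen-and-paper step, here is the batch gradient descent update for linear regression, first with explicit sums and then in vectorized matrix form. The notation is the standard one and is assumed here rather than quoted from the course: m training examples, a design matrix X of size m×(n+1) whose rows are the examples with a leading 1 for the intercept, a target vector y, parameters θ, and learning rate α. The key observation is that the sum over i of (Xθ − y)_i X_ij is exactly the j-th entry of Xᵀ(Xθ − y).

```latex
\begin{aligned}
h_\theta(x) &= \theta^\top x, \qquad
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,\\
\frac{\partial J}{\partial \theta_j} &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}
\;\;\Longrightarrow\;\;
\nabla_\theta J = \frac{1}{m}X^\top(X\theta - y),\\
\theta &:= \theta - \frac{\alpha}{m}X^\top(X\theta - y).
\end{aligned}
```

In Octave the last line becomes a single statement, `theta = theta - (alpha/m) * X' * (X*theta - y);`, with no loop over examples or features.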
If you want to learn about machine learning in general, or about a specific algorithm covered in the course, I recommend going to ml-class.org and watching some of the videos. I am glad I took this course and look forward to using what I learned in my research.
Monday, December 12, 2011
Introduction To Databases: a db-class.org course review.
I just finished the final for Stanford's free online database course and am satisfied with my experience. The course was open to everybody and ran from Oct. 14 to Dec. 12. It consisted of video lectures, quizzes, two online exams (a midterm and a final), and exercises executed in an online sandbox. The topics covered fall into two groups: use of existing databases and database design. Given my background, I was mostly interested in using existing databases, so that is what I will focus on here.
SQL was covered very well; the course greatly expanded my knowledge and taught me some relational algebra to understand its mathematical basis. The online SQL exercises were the most helpful part, and a few of them challenged me. The course staff at Stanford even implemented a DSL for the relational algebra exercises, which I thought was pretty cool. Going forward, I feel confident using SQL to extract information from a database, and the relational algebra way of thinking has served me well when manipulating data.frames in R.
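To show what that correspondence looks like, here is a small example using two hypothetical relations, Student(sID, sName, GPA) and Apply(sID, cName, major), chosen only for illustration. The SQL query `SELECT DISTINCT sName FROM Student NATURAL JOIN Apply WHERE major = 'CS';` maps to a projection of a selection of a natural join. `DISTINCT` is needed because relational algebra works on sets while SQL returns bags. Because `major` is an attribute of Apply only, the selection can be pushed below the join without changing the result, which is the kind of rewrite a query optimizer relies on.

```latex
\pi_{\text{sName}}\bigl(\sigma_{\text{major}=\text{'CS'}}(\text{Student} \bowtie \text{Apply})\bigr)
\;=\;
\pi_{\text{sName}}\bigl(\text{Student} \bowtie \sigma_{\text{major}=\text{'CS'}}(\text{Apply})\bigr)
```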
XML was also covered and was a major reason for my taking the course. In my research I deal with many different file formats and would ideally like to see them move to XML for the sake of consistency. We covered XPath and XQuery, both of which I have already used in my work, while XSLT was a bit more confusing. My go-to XPath application is now Kernow for sandboxing, with Perl's XML::Simple and XML::XPath for scripting. Unfortunately, some of the XML files I tried to parse were very large (> 500 MB), so I ditched these tools for the tried-and-true method of regular-expression parsing in Perl. More searching would probably turn up a better solution for large XML files, and I am open to suggestions.
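XML::Simple reads the whole document into an in-memory Perl data structure, which is a likely reason it struggles with files that size. One standard-library refinement of the regex approach, sketched here under stated assumptions, is to stream the file one record at a time instead of slurping it: setting Perl's input record separator `$/` to a closing tag makes each readline return a single record. The `entry` element, the `id` attribute, and the sample file are hypothetical. The sketch assumes records are not nested and each fits in memory, and it is pattern matching rather than real XML parsing (it ignores CDATA, comments, and entity decoding); a streaming parser such as XML::Twig is the more robust option.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read $path one record at a time by setting the input record separator ($/)
# to the closing tag, so memory use depends on record size, not file size.
# ASSUMPTIONS: records are not nested and contain no CDATA or comments that
# mention the tag; this is pattern matching, not XML parsing.
sub stream_records {
    my ($path, $tag, $callback) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    local $/ = "</$tag>";    # each <$fh> now returns text up to and including </$tag>
    my $count = 0;
    while (my $chunk = <$fh>) {
        # skip text outside records (XML declaration, root tags, trailing whitespace)
        next unless $chunk =~ m{(<\Q$tag\E\b.*?</\Q$tag\E>)}s;
        my $record = $1;     # copy before the callback runs its own regexes
        $callback->($record);
        $count++;
    }
    close $fh;
    return $count;
}

# Demo on a tiny file; the <entry> element and id attribute are hypothetical.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp qq{<?xml version="1.0"?>\n<root>\n},
           qq{  <entry id="a1"><seq>ACGT</seq></entry>\n},
           qq{  <entry id="b2"><seq>GGCC</seq></entry>\n},
           qq{</root>\n};
close $tmp;

my @ids;
my $n = stream_records($path, 'entry', sub {
    my ($record) = @_;
    push @ids, $1 if $record =~ /<entry\b[^>]*\bid="([^"]*)"/;
});
print "$n records: @ids\n";    # prints "2 records: a1 b2"
```

The callback design keeps only one record in memory at a time, so the same loop should work on a 500 MB file as long as no single record is huge.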
I took the database course to increase my computational skills, particularly SQL and XML, and through the rest of the course I feel I have gained a broad, although relatively shallow, understanding of different aspects of database design, use, and theory. I look forward to implementing an SQLite database for my own data someday, and I encourage anyone interested in databases to enroll in this course if it is offered again.
Tuesday, November 29, 2011
Perl programming: a script to run background Java programs
One task I often find myself doing is running a Java program on each member of a dataset. If the Java program is configured to run in the background, a simple loop launches Java processes faster than they finish and can easily start enough of them to crash your computer. I hacked together a solution in Perl that waits for each call to Java to finish before starting the next.
#!/usr/bin/perl
use strict;
use warnings;
$| = 1; #flush STDOUT immediately so the progress dots appear as they are printed
my @out;
#the command to run; java.sh (shown below) wraps the Java call
my $command = "./java.sh";
#simulate multiple calls with a foreach loop; each iteration stands in for one input run through the Java program
foreach my $run (1..20) {
    #open a read pipe from the command; its standard output arrives on $java
    open my $java, '-|', $command or die "Could not run $command: $!";
    print "running command $run";
    #reading until EOF blocks until every process holding the pipe open has exited,
    #including a java started in the background with &, as long as it keeps the inherited stdout
    while (my $line = <$java>) {
        print ".";          #one progress dot per line of output
        push @out, $line;   #keep the command's output
    }
    #close the pipe, reaping the wrapper script; warn if it reported failure
    close($java) or warn "$command exited with status ", $? >> 8, "\n";
    print "\n";
}
#write the collected output to a file
my $file = "out.txt";
open my $f, '>', $file or die "Could not open $file: $!";
print $f @out;
close $f;
java.sh, the wrapper called by the script (the trailing & runs Java in the background):
#!/bin/bash
java -classpath /home/user/pathto/jarFile.jar ducks.main &
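The script above simulates the dataset with `1..20`. To feed one real input per run, one option, sketched here under assumptions, is to pass each file name as an argument using the list form of a piped open, which bypasses the shell. This assumes java.sh is changed to forward its arguments, for example `java -classpath /home/user/pathto/jarFile.jar ducks.main "$@" &` (a hypothetical modification). The `run_each` helper is also hypothetical, and the demo uses `echo` as a stand-in command so it runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run $command once per input, strictly one after another, and collect its output.
# The list form of a piped open passes each argument straight to the program
# without going through a shell, so file names with spaces are safe.
sub run_each {
    my ($command, @inputs) = @_;
    my @collected;
    for my $input (@inputs) {
        open my $pipe, '-|', $command, $input
            or die "Could not run $command $input: $!";
        # Reading to EOF blocks until every process holding the pipe open
        # (including a child the command left running in the background) exits.
        push @collected, <$pipe>;
        close $pipe
            or warn "$command $input exited with status ", $? >> 8, "\n";
    }
    return @collected;
}

# Demo: 'echo' stands in for ./java.sh; real inputs might come from a glob
# such as glob('data/*.xml'), a hypothetical path.
my @results = run_each('echo', 'sample1.txt', 'sample2.txt');
print @results;    # prints sample1.txt and sample2.txt, one per line
```

Keeping the loop in a subroutine makes it easy to swap the stand-in command for `./java.sh` once the wrapper accepts an argument.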
Day 0
Hello and welcome to my blog. I intend to make this blog a collection of my ideas pertaining to all things bioinformatics. I am in the middle of applying to graduate school for bioinformatics and will keep you updated on the status of my applications, good news only :).