Bioinformatics, My Technical Learnings: 2012

Monday, June 18, 2012

Automata: A coursera.org course on the theory of computation

Just recently I finished my 4th Coursera online class, titled Automata. Taught by Jeff Ullman, this course covered the basic theory of computation and automata theory and ran for 6 weeks. Ullman used his lecture notes from the equivalent course taught at Stanford, and 6 weeks felt like a very short period of time to fully internalize the concepts behind automata. Still, I think this is the most informative and rigorous online course I have taken to date.

The course consisted of video lectures, homework quizzes, and a single final exam. Their were also some extra credit problems and programming assignments, which were of lower priority for me. The per week video material was about 2 hrs straight through, and would usually take me around 10-15 hours taking notes and pausing to reflect on the material. Unlike other online courses, the video material was presented in larger chunks, usually 20 to 30 minutes long, and I have to say I like this format better. Ullman was very active on the discussion forums, leaving no question unanswered, and should be recognized for his outstanding dedication to helping students learn the material. If he teaches another course I will consider taking it for this reason alone.

The content covered included regular expressions, deterministic finite automata, non-deterministic finite automata, context free grammars, push-down automata, Turing Machines, and complexity classes P and NP. A major theme was properties and equivalences among and between these types of languages. Perhaps the most fascinating material was that of Turing Machines and computability, specifically how we prove the undecidability of the halting problem. Also, the complexity classes P and NP were well explained, and a few reductions of common problems like knapsack were given. If I spend more time learning about this topic, I will surely study Arora and Barak's Computational Complexity: A Modern Approach, which would not be possible without first learning the topics presented in Ullman's course.

Many proofs along the way where given, and I have developed a much better sense at the theoretical limits of computation along with sharpening my math and logic skills. Understanding whether a given problem is in P, is essential in knowing if an optimal algorithm can be written in polynomial time without making many assumptions about the nature of the problem instance. Any algorithmic development I undertake will have to take this into consideration. Further, I have learned the tools to understand many topics in computer science at a deeper level than possible before the course, and am looking forward to learning about topics built upon automata thoery, like approximation algorithms and the limits of parallel computation.

Wednesday, April 25, 2012

Design and Analysis of Algorithms I: a coursera.org course with Tim Roughgarden

Looking for an algorithms for a little over a year, I was very thrilled to find out coursera.org was going to offer an algorithms course, albeit part I of II. My objective in taking the class was to gain the skills needed for the mathematical analysis of algorithms, exposure to complex problems and computational solutions, and increase my programming skill and capability. Though not a master in any of these areas, I feel the course has definitely improved my skills, and am eager to sign up for Design and Analysis of Algorithms II.

The course consisted of 5 programming assignments, 5 problems sets, a final, and some optional theory problems. The material was presented in 5 week long blocks of about 2 hours worth of video material. The programming assignments ranged for pretty difficult(calculate strongly connected components on a graph of over 800k nodes) to relatively easy, (find if two elements of an array sum to a given number). The course discussion boards and theory problems really helped solidify my understand of the material. Additional "Theory" problems were also included, and were of the same difficulty as test problems given to Stanford students taking the course. These additional problems required much thought, and solidified the concepts presented in lecture.

The content of the class consisted of basic mathematical analysis methods(big-Oh notation), the divide and conquer paradigm, master method, randomized algorithms, a handful of data structures, and graph algorithms. Analysis of each presented algorithm that included a proof of its correctness and runtime complexity. Each algorithm presented was explained in a very intuitive way, much to the credit of our instructor Tim Roughgarden.

For this course, I purchased Cormen's algorithm text book, which turned out to be a very good investment. Compared to other textbooks, like Klienberg and Tardos, Cormen was more in depth, with greatly detailed analysis of algorithms and the mathematical rigor I need to convince myself of an argument. If there is one take home skill I got from the course, its ability to sit down with Cormen, and internalize a concept. A couple of the theory problems were drawn from Cormen, and I am glad to have the some help learning how to solve problems that once seem intimidated.

In conclusion, The course did an excellent job explaining the concepts required for algorithmic analysis, and given me the skills to continue my learning by using sources such as Cormen's book. Often used in proofs in the class, and in Cormen for that matter, are generating functions. This is a topic I need to look further in to if I want to generate proofs for my own nascent algorithms. I am eagerly awaiting the next installment of the class, and look forward to taking Coursera's Automata with Jeffrey Ullman in the meantime.

Thursday, March 8, 2012

Learning Clojure and the art of functional programming...

This post describes an overview of my experience with the Clojure programming language, why I learned it, how I learned it, what it's like, and where am I going with it.

Over the past few months, I have been programming a bit in Clojure. A direct descendant of McCarthy's eternal Lisp, Clojure is a functional language that runs on the JVM, fully interlopes with Java, and has excellent concurrency support with its Software Transactional Memory. Clojure trades all the boilerplate code of brackets and semicolons common to a language like C or Java, for function calls, parens, and macros. The level of expressiveness in code is truly helpful when hashing out a complex idea or algorithm. For example, I implemented a Burrows-Wheeler transformation https://gist.github.com/2004382 in under 40 lines.

I became comfortable with Clojure by taking a problem solving approach to learn the language, with the help of the website 4Clojure.org. 4Clojure presents a programming challenge, for example implementing library functions like 'map', that requires the user to write a function to satisfies a set of unit tests. By working the problems in increasing order of difficulty, the concepts of functional programming, IMO the hardest part to learning lisp, came naturally. For a period I became active on freenode's #clojure IRC channel, where I meet some fellows who I now share Clojure advice and code with via email chain and google group, http://groups.google.com/group/clojure-hackers?hl=en As for Clojure literature, I find The Joy of Clojure most helpful for technical questions and best practices. For theory, Functional Programming Practice and Theory by MacLennan really helped me grasp the concepts of on a more abstract, mathematical rigorous level. If you want to understand the logic and correctness of FP, this book is it! Finally, Pearls of Functional Algorithm Design contains a wealth of algorithms collected from mostly academic journals and other sources implemented in Haskell and easily converted to Clojure.

Clojure, as a young language, (2007) lacks the development tools and packages of a more mature production language although using Java classes and libraries solves the problem of a small code base. The source code I have read is well documented(clojuredocs.org), and between IRC and stackoverflow.com there are enough participants to field most questions. As for an editor, I have found VimClojure very suitable for syntax highlighting, making it easy to write functions and sample data to a file that is then loaded into an adjacent REPL and evaluated. For the community, I imagine Clojure now to be a lot like Ruby before Rails came out, the people involved are highly motivated, and language shows great promise. Fortunately for me, Clojure already does everything I need, even without a killer app.

In conclusion, Clojure has opened my eyes to a completely new way of thinking about a program: as the art of manipulating program state though time, transforming the logical identity of the input into that of the desired output (more here, http://clojure.org/state ) using the most powerful tool and rudimentary tool, the function. The practice of learning a new language in a different paradigm has helped me challenge the conventions and assumptions of procedural coding, opening the door for me to learn R programming on a much deeper level. When it comes down to choosing the next language for a future project, Clojure's ability to bring functional programming to the JVM/Java is very enticing, offering a very easy way to express my ideas as programs in a way no other language can.

Tuesday, January 24, 2012

The Nuts and Bolts of Proofs by Antonella Cupillari

In my quest to better understand bioinformatics, I have found myself on the topic of algorithms. After thumbing through Cormen's classic text, and a couple other resources online, I decided that a better grasp of mathematical proofs would give me the tools needed to tackle algorithms. I picked up The Nuts and Bolts of Proofs by Antonella Cupillari, and worked my way through in about a month. The topics covered were: Direct proof, proof by contrapositive, equivalence theorems, use of counterexamples, mathematical induction, uniqueness/existence theorems, and equality of sets/numbers.
Cupillari does an excellent job explaining the how to construct proofs, and leaves the reader with plenty of exercises after each chapter, as well as an hefty section of exercises without solutions and a collection proofs. I felt he did a good job covering the concepts of proofs, using no math beyond high school algebra to place the emphasis on constructing arguments using mathematical logic. The allowed me to start thinking in proofs sooner, without being bogged down by mathematical theory. Having completed the chapter exercises, and with no other training on mathematical proofs, I feel very comfortable constructing proofs which make up a large portion of the exercises in many algorithm text books. If you are interested in learning proofs and aren't afraid to put a little work in, I would definitely recommend this book!

Stanford's Machine Learning Course, ml-class.org

After two months of octave and online videos, I finished Stanford's online course on machine learning. I am very pleased with the course, and wish it could continue indefinitely. The course consisted of online videos, content quizzes, and finally programming assignments in octave. Topics covered include linear regression, neural networks, support vector machines, anomaly detection, and were presented in a way consistent with practical applications of these algorithms. The programming exercises were taught using Octave, GNU's version of Matlab, and stressed vectorization for efficient computations. This enabling the students to implement and train a computational expensive neural network, and train it with backpropagation for optical character recognition.
I learned a lot about linear algebra in the course, and developed an appreciation for its broad uses in numerical problems. Learning more about linear algebra is a priority, and I plan to take a course on it during graduate school or sooner but for right now a handle on matrix multiplication and 3d coordinates will have to do. Beyond basic linear algebra, the course was not very math heavy, requiring no proofs or formalism. That being said, some of the vectorization required for the homework assignments was a little tricky, but nothing beyond a little pen and paper work to figure out.
If you are looking to learn more about machine learning in general, or would like to learn more about a specific algorithm covered in the course, I would recommend going to ml-class.org and watching some of the videos. I am glad I took this course, and look forward to using the knowledge I gained in my research.