Monday, December 12, 2011

Introduction To Databases: a db-class.org course review.


I just finished the final for Stanford's free online database course, and I am satisfied with my experience. The course was open to everybody and ran from Oct. 14 to Dec. 12. It consisted of video lectures, quizzes, two online tests (a midterm and a final), and exercises, which were executed in an online sandbox. The topics covered could be broken down into two groups: use of existing databases and database design. With my background, I was mostly interested in the use of existing databases, so that is what I am going to talk about.

SQL was covered very well, greatly expanding my knowledge and teaching me enough relational algebra theory to understand its mathematical basis. The online exercises in SQL were the most helpful part, and I felt pretty challenged by a few of them. The course admins at Stanford were even able to implement a DSL for relational algebra exercises, which I thought was pretty cool. Going forward, I feel pretty confident using SQL to extract information from a database, and the relational algebra way of thinking has served me well when manipulating data.frames in R.
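To make the link between relational algebra and SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (gene, expression), their columns, and the data are made up for illustration and are not from the course. The query corresponds to a join, followed by a selection, followed by a projection.

```python
import sqlite3


def human_genes_highly_expressed():
    """Return (gene_id, tissue) pairs for human genes with level > 5.

    Relational algebra equivalent (hypothetical schema):
      project[gene_id, tissue](
        select[organism = 'human' AND level > 5](gene JOIN expression))
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    cur.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY, organism TEXT)")
    cur.execute("CREATE TABLE expression (gene_id TEXT, tissue TEXT, level REAL)")
    cur.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("g1", "human"), ("g2", "mouse"), ("g3", "human")],
    )
    cur.executemany(
        "INSERT INTO expression VALUES (?, ?, ?)",
        [("g1", "liver", 8.5), ("g2", "liver", 3.1),
         ("g3", "brain", 9.2), ("g1", "brain", 1.0)],
    )
    cur.execute(
        """
        SELECT g.gene_id, e.tissue                      -- projection
        FROM gene AS g
        JOIN expression AS e ON g.gene_id = e.gene_id   -- join
        WHERE g.organism = 'human' AND e.level > 5      -- selection
        ORDER BY g.gene_id
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for gene_id, tissue in human_genes_highly_expressed():
        print(gene_id, tissue)
```

The same three steps map loosely onto merging, row filtering, and column selection on data.frames in R.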

XML was also covered, and it was a major reason for my taking the course. In my research I deal with many different file formats, and I would ideally like to see them move to XML for the sake of consistency. We covered XPath and XQuery, both of which I have already used in my work, while XSLT was a little more confusing. My go-to XPath application is now Kernow for sandboxing, with Perl's XML::Simple and XML::XPath for scripting. Unfortunately, some of the XML files I tried to parse were very large (> 500 MB), so I ditched these tools in favor of the tried-and-true method of regular expression parsing in Perl. I suppose some more looking would help me find a solution for large XML files, and I am currently open to suggestions.
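One commonly suggested approach for very large XML files is streaming (event-based) parsing, which handles one element at a time instead of loading the whole tree into memory. Below is a minimal sketch in Python's standard library rather than Perl. The element name "record" and the "id" attribute are hypothetical stand-ins for whatever the real files contain, and I have not tried this on the 500 MB files mentioned above.

```python
import io
import xml.etree.ElementTree as ET


def iter_record_ids(source, tag):
    """Yield the 'id' attribute of every <tag> element, one at a time.

    source may be a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.get("id")
            # Drop everything parsed so far so memory use stays roughly flat.
            root.clear()


if __name__ == "__main__":
    # Tiny in-memory document standing in for a large file on disk.
    sample = io.BytesIO(
        b"<records>"
        b"<record id='a'><seq>ACGT</seq></record>"
        b"<record id='b'><seq>TTGA</seq></record>"
        b"</records>"
    )
    for record_id in iter_record_ids(sample, "record"):
        print(record_id)
```

Perl has stream-oriented modules built on the same idea (XML::Twig is one), which may fit better into an existing Perl workflow.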

I was interested in taking the database course to increase my computational skills, particularly in SQL and XML, and through the rest of the course I feel like I have gained a broad, although relatively shallow, understanding of different aspects of database design, use, and theory. I look forward to implementing an SQLite database on my own data someday, and I encourage anyone interested in databases to enroll in this course if it is offered again.
 

Tuesday, November 29, 2011

Perl Programming: a script to run background Java programs

One task I often find myself doing is running a Java program on each member of a dataset. If the Java program is configured to run in the background, you can end up initiating enough simultaneous Java calls to crash your computer. I hacked together a solution in Perl that waits for each call to Java to finish before initiating the next.


 
#!/usr/bin/perl
use strict;
use warnings;
$| = 1;    # unbuffer STDOUT so the progress dots appear immediately
my @out;
# put the java command into a string
my $command = "./java.sh";
# simulate multiple calls with a foreach loop
foreach my $wait (1..20) {    # represents the inputs run through the java program
  # open a read handle on the command's output
  open my $java, '-|', $command or die "Could not run $command - $!";
  print "running command ${wait}";
  # reading until end-of-file blocks until the command (and anything it
  # backgrounded that still holds the pipe) has finished writing
  while (my $line = <$java>) {
    print ".";
    # collect every line of output, including the first one
    push @out, $line;
  }
  # close the pipe when the command is done
  close($java);
}

# print to some output file
my $file = "out.txt";
# open it, print the @out array to it, close it, print \n
open my $f, '>', $file or die "Could not open $file - $!";
print $f @out;
close $f;
print "\n";

#java.sh
#############################################
#!/bin/bash
#java -classpath /home/user/pathto/jarFile.jar ducks.main &
############################################


 

Day 0

Hello and welcome to my blog. I intend to make this blog a collection of my ideas pertaining to all things bioinformatics. I am in the middle of applying to graduate school for bioinformatics, and I will keep you updated on the status of my applications, good news only :).