Monday, December 12, 2011

Introduction to Databases: a db-class.org course review.


I just finished the final exam for Stanford's free online database course, and I am satisfied with the experience. The course was open to everyone and ran from Oct. 14 to Dec. 12. It consisted of video lectures, quizzes, two online exams (a midterm and a final), and exercises that were run in an online sandbox. The topics fall into two broad groups: using existing databases and designing databases. Given my background, I was mostly interested in using existing databases, so that is what I will focus on here.

SQL was covered very well. The course greatly expanded my knowledge of it and introduced the relational algebra that forms its mathematical basis. The online SQL exercises were the most helpful part, and a few of them challenged me quite a bit. The course staff at Stanford even implemented a DSL for the relational algebra exercises, which I thought was pretty cool. Going forward, I feel confident using SQL to extract information from a database, and the relational algebra way of thinking has served me well when manipulating data.frames in R.
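To make the link between SQL and relational algebra concrete, here is a small sketch using Python's built-in sqlite3 module. The Student and Apply tables and their rows are made up purely for illustration; each query is the SQL form of a relational algebra expression (selection, projection, natural join), which is written in the comment above it.

```python
import sqlite3

# Hypothetical example data: two small relations in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT, GPA REAL);
CREATE TABLE Apply   (sID INTEGER, cName TEXT, major TEXT);
INSERT INTO Student VALUES (1, 'Amy', 3.9), (2, 'Bob', 3.4), (3, 'Cat', 3.7);
INSERT INTO Apply VALUES (1, 'Stanford', 'CS'), (2, 'Berkeley', 'EE'),
                         (3, 'Stanford', 'Bio'), (1, 'MIT', 'CS');
""")

def select_gpa_above(threshold):
    # sigma_{GPA > threshold}(Student): selection keeps the rows matching a predicate.
    return conn.execute(
        "SELECT sID, sName, GPA FROM Student WHERE GPA > ? ORDER BY sID",
        (threshold,),
    ).fetchall()

def project_names():
    # pi_{sName}(Student): projection keeps only some columns.
    # Relational algebra works on sets, so DISTINCT removes duplicate rows.
    return conn.execute("SELECT DISTINCT sName FROM Student ORDER BY sName").fetchall()

def stanford_applicants():
    # pi_{sName}(sigma_{cName = 'Stanford'}(Student natural-join Apply)):
    # join on the shared sID column, then select, then project.
    return conn.execute(
        """SELECT DISTINCT sName
           FROM Student NATURAL JOIN Apply
           WHERE cName = 'Stanford'
           ORDER BY sName"""
    ).fetchall()

print(select_gpa_above(3.5))   # [(1, 'Amy', 3.9), (3, 'Cat', 3.7)]
print(project_names())         # [('Amy',), ('Bob',), ('Cat',)]
print(stanford_applicants())   # [('Amy',), ('Cat',)]
```

One difference worth keeping in mind: relational algebra operates on sets, while a plain SQL SELECT keeps duplicate rows, which is why the projections use DISTINCT. The same three operations correspond roughly to row subsetting, column selection, and merge() on data.frames in R.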

XML was also covered, and it was a major reason I took the course. In my research I deal with many different file formats, and ideally I would like to see them move to XML for the sake of consistency. We covered XPath and XQuery, and I have already used both in my work; XSLT was a bit more confusing. My go-to XPath tools are now Kernow for sandboxing, and Perl's XML::Simple and XML::XPath modules for scripting. Unfortunately, some of the XML files I tried to parse were very large (> 500 MB), so I ditched these tools and fell back on the tried-and-true method of parsing with regular expressions in Perl. I suppose more searching would turn up a solution for large XML files, and I am currently open to suggestions.
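For what it's worth, the usual suggestion for very large XML files is a streaming parser, which handles one element at a time and discards it instead of building the whole document in memory. The sketch below uses iterparse from Python's standard xml.etree.ElementTree module. The element and attribute names (record, id, value) are hypothetical placeholders for whatever a real file contains; in Perl, modules such as XML::Twig are built around the same idea.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag="record"):
    """Yield (id, value) pairs from each <record> element without loading the whole file.

    `source` may be a filename or a binary file-like object. The names 'record',
    'id', and 'value' are hypothetical placeholders, not part of any real format.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # find/findtext accept a limited subset of XPath path expressions.
            yield elem.get("id"), elem.findtext("value")
            # Detach finished records from the root so memory use stays roughly flat.
            root.clear()

# A small in-memory stand-in for a large file on disk.
sample = io.BytesIO(
    b"<data>"
    b"<record id='a'><value>1.5</value></record>"
    b"<record id='b'><value>2.0</value></record>"
    b"</data>"
)
for rec_id, value in iter_records(sample):
    print(rec_id, value)  # prints "a 1.5", then "b 2.0"
```

ElementTree only understands a limited subset of XPath, and namespaced documents report tags in "{uri}name" form, so the tag comparison would need adjusting for those files.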

I took the database course to strengthen my computational skills, particularly in SQL and XML, and from the rest of the course I feel I have gained a broad, if relatively shallow, understanding of different aspects of database design, use, and theory. I look forward to building an SQLite database for my own data someday, and I encourage anyone interested in databases to enroll in this course if it is offered again.