Blog_03!

Posted by Phil Colgan

Relational databases can help manage large, complex datasets where many types or groups of variables are used to describe a common subject. I am a microbial ecologist who uses metagenomics and high-throughput DNA sequencing to measure anthropogenic impact on a variety of different environments. My studies generate TONS of sequence data and metadata, with way too many variables to keep track of on one spreadsheet.

For example, say I want to study the impact of prophylactic antibiotic use in animal agriculture on the spread of antibiotic resistance genes in the environment. My main subject is going to be the metagenomic DNA that I extract from fecal or soil samples. Those metagenomes can be broken down directly into several complex tables based on sequence information, including the abundance and diversity of antibiotic resistance genes, the abundance and diversity of phylogenetic marker genes, and taxonomic information about the individual species identified. That's just the tip of the iceberg, though. I also have tons of metadata associated with each metagenome, including information about the animal the sample came from, treatment conditions, and environmental conditions. Breaking these different types of data into individual tables, all linked back to the same metagenome, could make downstream analysis much easier, since I wouldn't have to parse through one giant table.

However, you can also do these things in other programming languages like R or Python. I am not good enough with R or Python yet to say with confidence that parsing is just as easy in those languages, but it's possible. If it is, I'd rather just get really good at the languages I already use for many other things.

There is also another type of database, the non-relational (NoSQL) database, which is apparently better at scaling horizontally across clusters of servers and can be faster and cheaper than a relational database.
I think this most likely goes far beyond what I will ever need a database for, but maybe I am wrong.
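To make the idea of splitting metagenome data into linked tables a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`metagenome`, `sample_metadata`, `arg_abundance`), the column names, and the abundance values are all hypothetical, made up for illustration; a real schema would have many more tables and columns. The key idea is that every table shares a `sample_id` column, so one query can pull resistance gene abundance together with treatment and sample type without a giant spreadsheet.

```python
import sqlite3

# In-memory database for illustration; a real project would use a file path.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One row per metagenome: the "common subject" the other tables point back to.
cur.execute("""
CREATE TABLE metagenome (
    sample_id TEXT PRIMARY KEY,
    sample_type TEXT  -- e.g. 'fecal' or 'soil'
)""")

# Hypothetical metadata table: treatment of the animal or site sampled.
cur.execute("""
CREATE TABLE sample_metadata (
    sample_id TEXT REFERENCES metagenome(sample_id),
    treatment TEXT
)""")

# Hypothetical antibiotic resistance gene (ARG) table: many rows per metagenome.
cur.execute("""
CREATE TABLE arg_abundance (
    sample_id TEXT REFERENCES metagenome(sample_id),
    gene TEXT,
    abundance REAL
)""")

# Toy values, invented for this example only (not real data).
cur.executemany("INSERT INTO metagenome VALUES (?, ?)",
                [("S1", "fecal"), ("S2", "fecal"), ("S3", "soil")])
cur.executemany("INSERT INTO sample_metadata VALUES (?, ?)",
                [("S1", "antibiotic"), ("S2", "control"), ("S3", "antibiotic")])
cur.executemany("INSERT INTO arg_abundance VALUES (?, ?, ?)",
                [("S1", "tetW", 12.0), ("S1", "ermB", 3.0),
                 ("S2", "tetW", 2.0), ("S3", "tetW", 5.0)])

# Join all three tables on sample_id: total abundance of each ARG
# per treatment group, restricted to fecal samples.
rows = cur.execute("""
SELECT m.treatment, a.gene, SUM(a.abundance) AS total
FROM arg_abundance AS a
JOIN metagenome AS g ON a.sample_id = g.sample_id
JOIN sample_metadata AS m ON a.sample_id = m.sample_id
WHERE g.sample_type = 'fecal'
GROUP BY m.treatment, a.gene
ORDER BY m.treatment, a.gene
""").fetchall()

for treatment, gene, total in rows:
    print(treatment, gene, total)

conn.close()
```

Each table stays small and focused, and the join only happens when you ask a specific question. The same kind of join can also be done on data frames in R or Python, which is part of why the choice between a database and those languages is not clear-cut.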