Skip to content

Session 6(A): Data Servers on the Cloud

MSU Mara Hyena Project Database Entity Relationship Diagram

Introduction

Data servers (like Relational Databases) can be a powerful tool for even small research projects. When we say "Data Servers" or "Data Systems" we mean any server with data processing capabilities that you connect to with a client to send data, commands and receive results. The most widely used and classic example is the relational database management system (RDBMs) invented in the 1970s, but there are many other types. A central advantage of data servers is ability to handle many conncurrent connections. Connections can be from many users, a web application serving many uers, or many other concurrent processes. Like other systems (such as VMs, File storage servers, big data tools, etc), these data systems don't require cloud computing, but cloud companies offer database services such taht with a few clicks you can have a server that would take a week to provision and years to maintain. A data server could be a productive addition to your cloud architecturee, or the central aspect of your fellowship project.

Databases are used in thosands of research projects that had significant data entry burden requiring many work-hours of students typing in data, or shared systems.

Databases vs Storage

A Comparison of Databases and storage may help understand the role of a database vs simply keeping your data in files (for example Excel of CSV files).

Readings

Difference between SQL and 'NoSQL' style databases

Towards DataScience blog Introduction to Databases for Data Scientists Note this blog post may require a free account but reading in an incognito or private window worked for me

A description of SQL vs NoSQL from a company called xplenty Which Modern Database Is Right for Your Use Case? this has many adds/pop-ups but is a good read

Activities

Azure offers a database user interface for your computer called Azure Data Studio and has a tutorial for using, but you need to create a database server first.

If you are interested in using SQL for your project, consider the following two activities based on the Open Source database system Postgresql

  1. Quickstart: Create an Azure Database for PostgreSQL server by using the Azure portal
    • I suggest using a password rather than ssh key for now to make it easier, but not recommend for long-term use
    • record the admin user name and password you used when creating the database
    • in the Azure portal, visit the Postgresql Database resource page, and look at the connection strings page to use in the next tutorial
  2. Quickstart: Use Azure Data Studio to connect and query PostgreSQL

  3. In depth SQL Tutorial:
    After completing the two activities above, If you are interested in starting with SQL, this free tutorial looks pretty good. Let us know if you tried it and it was not helpful:

  4. PostgreSQL Tutorial from TutorialsPoint.com .
  5. Note: A Database server can house many "databases" , and sometimes a database is called a "schema" so you can use the database server you created above, and the same steps for how to connect to the server, then create a new database inside the server for the tutorial.
  6. In the tutorial above, the chapter "Environmental Setup" that walks you through installing Postgresql server on your laptop which also provides you a command-line interface on your laptop. You could augment that chapter by provisioning a Postgresql server in Azure (see link above) and connecting to it.
  7. If you don't want to install Postgresql on your laptop, you can either 1) use the Azure Data Studio, but only for SQL commands (not commands like \help (see the activity #2 above ), OR 2) if you are familiar with command line, the Azure Cloud shell comes with the postgresql command psql as shown in Activity #1 above.

  8. Delete the database in your resource group when you have finished with the tutorial. If you would like to estimate costs for using database, keep it for a few days, then visit the "Cost Analysis" page of your resource group in the Azure portal. SQL services can get expensive quickly compared to VMs, but installing and running postgresql server software is a daunting task and comes with security risks.

Using a Database Server in the cloud has many layers so if you are interested in using SQL feel free to reach out for help for incorporating SQL into your research.

References

A list of the commands you may use to manage a database for the Postgresql variety: https://zaiste.net/posts/postgresql-primer-for-busy-people/

Optional : Data Analytics on the Google Platform

Google Offers a service called "BigQuery" which does not require you to create a server, only an account. It can process huge amounts of data, and has several datasets available for free. It also has

You may try BigQuery for free in their Sandbox, with only a Google Account. If you have a gmail account, use that to log in with an incognito/private browser window. You may try your msu.edu email but it may or may not work.

In the following tutorial you may ignore mentions of "firebase" which is another database offering for web apps from google

Using the BigQuery sandbox

If you are a python user, Google offeres a free notebook service called "Co-Lab" and you can connect to both google drive and big query from Colab.

Google Colab + Google BigQuery + GoogleDrive is definitely cloud computing but not in the sense we usually think about it as these are all SAAS level services, chained together via a cloud computing backdrop.