Open Science Data Cloud PIRE

Providing training in data intensive computing using the Open Science Data Cloud.

Introduction to Data Intensive Science and Cloud Computing - May 30–31, 2012

A two-day tutorial at the University of Chicago


Scientific instruments are producing unprecedented amounts of data, yet our ability to manage, analyze, integrate and share this data has not kept up. As the amount of data grows, so does the opportunity to make new discoveries by integrating and analyzing existing datasets.

We will give a tutorial introduction to utility clouds and to data clouds, both of which are being used to support data intensive computing. Utility clouds provide user-provisioned, on-demand infrastructure. Amazon Web Services (AWS) is a commercial provider of utility cloud services, and groups can set up their own utility clouds with OpenStack or other software. Data clouds provide large-scale storage over commodity hardware and simple-to-use parallel processing over that storage. Hadoop is the best-known example of a data cloud. Both OpenStack and Hadoop are open source.

In this two-day tutorial, we will give an introduction to these topics using several case studies. The tutorial includes hands-on laboratory sessions each day, for which you must bring your own laptop.

Day 1

  • 08:30 – 09:30    Introduction to data intensive science and cloud computing
  • 09:30 – 10:30    Managing big data
  • 10:30 – 11:00    Morning break
  • 11:00 – 12:00    Modeling complex networks
  • 12:00 – 01:30 pm    Lunch
  • 01:30 – 02:30    Hands-on AWS laboratory
  • 02:30 – 03:30    Processing genomic data
  • 03:30 – 04:00    Break
  • 04:00 – 05:00    Parallel processing frameworks for big data

Day 2

  • 08:30 – 09:30    Introduction to R
  • 09:30 – 10:30    Processing text data
  • 10:30 – 11:00    Morning break
  • 11:00 – 12:00    Processing earth science image data
  • 12:00 – 01:30 pm    Lunch
  • 01:30 – 02:30    Hands-on Hadoop laboratory
  • 02:30 – 03:30    Globus Online
  • 03:30 – 04:00    Break
  • 04:00 – 05:00    Using the Open Science Data Cloud

Location: Computation Institute, Room 240 A/B.  The Computation Institute is located in the Searle Chemistry Laboratory at 5735 South Ellis Avenue, Chicago, IL 60637.

For more information:  Please contact Cindy Rogowski at cindy.rogowski (at)

Sponsors:  The two-day workshop is sponsored by the Open Science Data Cloud, a cloud-based platform for managing, analyzing, integrating and sharing scientific datasets that is operated by the Open Cloud Consortium.  The workshop is also sponsored by the Computation Institute and the Institute for Genomics and Systems Biology at the University of Chicago.  Additional funding is provided by the NSF through its PIRE Program (NSF Award #1129076).

This event requires registration.  Your registration is complete only once you have received a confirmation email.