Open Science Data Cloud PIRE

Providing training in data intensive computing using the Open Science Data Cloud.

Presentations

Here are presentations contributed by PIRE fellows showcasing their OSDC PIRE sponsored projects. You can also view presentations of PIRE research on the OSDC-PIRE YouTube Channel.

2015 PIRE fellow presentations

Melissa Bica worked on Rendezview, an interactive visualization tool, at AIST in Japan. [slides] [report]

Rendezview: An Interactive Visual Mining Tool for Discerning Flock Relationships in Social Media Data, Melissa Bica and Kyoung-Sook Kim, Supercomputing 2015

Shelby Matlock worked with researchers at AIST in Japan on Hydra, a high-throughput virtual screening data visualization and analysis tool. [report]

Hydra, a high-throughput virtual screening data visualization and analysis tool, Shelby Matlock, Curtis Sera, Yasuhiro Watashiba, Kohei Ichikawa, and Jason Haga

Alex Moreno worked on a machine learning project at UvA in Amsterdam. Here is the abstract of his paper, in submission, on “Approximate Bayesian Computation.”

Approximate Bayesian Computation (ABC) is a framework for performing likelihood-free posterior inference for simulation models. Variational inference (VI) is an appealing alternative to the inefficient sampling approaches commonly used in ABC. However, stochastic VI is highly sensitive to the variance of the gradient estimators and its ABC log-likelihood is biased. We draw upon recent advances in reparameterizations of the variational lower-bound (Kingma & Welling, 2014) and likelihood-free inference using deterministic simulations (Meeds & Welling, 2015) to produce both low variance and unbiased gradient estimators of the variational lower-bound. By then exploiting automatic differentiation libraries (Bergstra et al., 2010; Abadi et al., 2015) we can avoid nearly all model specific gradient derivations. We demonstrate performance on three problems and compare two to an existing variational ABC algorithm (Tran et al., 2015): inferring the success probability of Bernoulli trials, inferring the rate of an exponential distribution, and inferring parameters of a stochastic simulator representing blowfly populations. Our results demonstrate the correctness and efficiency of our algorithm.
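
For readers new to ABC, the sampling baseline that the abstract describes as inefficient can be sketched as plain rejection ABC on the Bernoulli example. This is only a minimal illustration and not the variational method of the paper: the function names, the uniform prior, the tolerance `epsilon`, and the observed counts are all assumptions chosen for the example.

```python
import random


def simulate(theta, n, rng):
    """Simulate n Bernoulli(theta) trials and return the number of successes."""
    return sum(1 for _ in range(n) if rng.random() < theta)


def rejection_abc(observed_successes, n, num_draws, epsilon, seed=0):
    """Rejection ABC: keep prior draws whose simulated data lie near the observation.

    No likelihood is evaluated; the simulator is the only model access needed.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_draws):
        theta = rng.random()  # draw from an assumed Uniform(0, 1) prior
        if abs(simulate(theta, n, rng) - observed_successes) <= epsilon:
            accepted.append(theta)
    return accepted


# Hypothetical data: 70 successes observed in 100 trials.
samples = rejection_abc(observed_successes=70, n=100, num_draws=20000, epsilon=2)
posterior_mean = sum(samples) / len(samples)
print(f"accepted {len(samples)} of 20000 draws, posterior mean ~ {posterior_mean:.3f}")
```

Most prior draws are rejected, which is the inefficiency that motivates alternatives such as the variational approach described in the abstract.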

Ryan Mork learned about workflow management systems at UvA in Amsterdam. His paper “Contemporary Challenges for Data-Intensive Scientific Workflow Management Systems” was presented at the WORKS 2015 workshop.

  • Ryan Mork, Paul Martin, Zhiming Zhao, Contemporary Challenges for Data-Intensive Scientific Workflow Management Systems, WORKS 2015 [paper]

Nam Pho worked in Brazil at the University of Sao Paulo on data transfer using software defined networking.

  • Nam Pho et al., Data Transfer in a Science DMZ using SDN with Applications for Precision Medicine in Cloud and High-performance Computing, 2015 [paper]

Genevieve Shattow put together a code repository for fetching data from the 1000 Genomes project while working with researchers at the University of Edinburgh who are developing dispel4py. [GitHub repo]

Theano Stavrinos learned about software defined networking for data transfers at UvA in Amsterdam. Here is an abstract of her work, “Optimization of Data Transfers within Software-defined Networks,” in preparation:

In this paper we look into different performance aspects of data transfers within single-domain software-defined networks. We study how to derive optimal data transfer paths for such networks while simultaneously taking into account different QoS-related performance criteria. We present the model of the given network using an underlying topology graph, and discuss different approaches to solve the problem in which flows are assigned to paths (and not links) while different constraints are satisfied. We first define a multi-constrained path selection/multi-commodity flow algorithm, which may be used by an SDN controller to accommodate traffic requirements known in advance. As the measurements performed indicate that traffic may be characterized as heavy-tailed (“mice” and “elephants”), we derive heuristics that distinguish short-lived flows from throughput-bound ones. Finally, we discuss the size-dependent flow discrimination solution in which the separation of small and large flows across the same path is used.
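
To make the idea of selecting whole paths under several constraints concrete, here is a small brute-force sketch over a topology graph. It is not the algorithm from the paper in preparation: the toy topology, the two constraints (a delay bound and a minimum bandwidth), and the function name `feasible_paths` are hypothetical choices for illustration.

```python
def feasible_paths(graph, src, dst, max_delay, min_bandwidth):
    """Enumerate simple paths from src to dst that satisfy both constraints.

    graph maps node -> {neighbor: (link_delay, link_bandwidth)}.
    Path delay is the sum of link delays; path bandwidth is the bottleneck link.
    Returns (delay, bandwidth, path) tuples sorted by increasing delay.
    """
    results = []

    def dfs(node, path, delay, bottleneck):
        # Prune as soon as either constraint is violated.
        if delay > max_delay or bottleneck < min_bandwidth:
            return
        if node == dst:
            results.append((delay, bottleneck, path))
            return
        for nxt, (link_delay, link_bw) in graph.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no revisited nodes)
                dfs(nxt, path + [nxt], delay + link_delay, min(bottleneck, link_bw))

    dfs(src, [src], 0, float("inf"))
    return sorted(results)


# Hypothetical topology: A-C-D is fast but narrow, A-B-D is slower but wide.
topology = {
    "A": {"B": (2, 100), "C": (1, 10)},
    "B": {"D": (2, 100)},
    "C": {"D": (1, 10)},
    "D": {},
}

# A throughput-bound ("elephant") flow needs bandwidth; a short ("mouse") flow does not.
strict = feasible_paths(topology, "A", "D", max_delay=5, min_bandwidth=50)
relaxed = feasible_paths(topology, "A", "D", max_delay=5, min_bandwidth=5)
print(strict)
print(relaxed[0][2])
```

Exhaustive enumeration is only practical on small graphs; it is used here because it keeps the constraint checks easy to read.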

2014 PIRE fellow presentations

Comparing Algorithms for Detecting Abrupt Change Points in Data, Cody L. Buntain, Christopher Natoli, Miroslav Zivkovic, Supercomputing 2014
Hydra: An HTML5-Based Application for High-Throughput Visualization of Ligand Docking, Yuan Zhao, Jason Haga, Supercomputing 2014
Verify Security: Static Analysis of dispel4py Workflows and Python Programs to Insure Secure Information Flows, Joshua Eisenberg, 2014
Wearable Cloud-based 3D Human Motion Detection Device Using MEMS Inertial/Magnetic Sensors, Fatemeh Abyarjoo, Armando Barreto, Zhiming Zhao, Heidi Alvarez, Naphtali Rishe, Francisco R. Ortega, 2014
OSDC-ENVRI-GBIF Interoperation, Weiwei Zhang, Ana Oprescu, Massimo Argenti, Lourens Veen, Heidi Alvarez, 2014
Experience with Data Transfer Applications Between OSDC and USP Science DMZ, Nathaniel Butler, Michael Lewis, Fernando Redigolo, Dino Magri, Teresa Cristina Carvalho, 2014
  • Rosa Filgueira, Iraklis Klampanos, Amrey Krause, Mario David, Alexander Moreno, and Malcolm Atkinson. 2014. dispel4py: a Python framework for data-intensive scientific computing. In Proceedings of the 2014 International Workshop on Data Intensive Scalable Computing Systems (DISCS ‘14). IEEE Press, Piscataway, NJ, USA, 9-16. (doi:10.1109/DISCS.2014.12)
    [Paper Abstract]

  • Alexander Moreno and Tucker Balch, “Speeding up large-scale financial recomputation with memoization”, In Proceedings of the 7th Workshop on High Performance Computational Finance, WHPCF 2014, pages 17–22, Piscataway, NJ, USA, 2014. IEEE Press. (doi:10.1109/WHPCF.2014.9)
    [Paper Abstract]

2013 PIRE fellow presentations

Extending Rock Physics to the Cloud and Beyond, Ashley Zebrowski & Shantenu Jha, 2013
RockPy: A Python Library for Rock Physics, Joshua Eisenberg, 2013
Using Large Scale Clustering To Detect Potential Fire Regions, Michael Lewis, 2013
Distributed Processing of Workflow Applications Using the Storm Framework, Joseph Korpela, 2013
Visualizing high dimensional music collections using t-SNE, Kevin Crimi, 2013
Performance Assessment of Large Astronomical Databases, Maria Patterson, 2013
Uva Data Service: Towards an Unified Architecture for Scientific Data Aggregation, Pedro D Maldonado, 2013, (XSEDE14)
  • Eric Griffis, Paul Martin, James Cheney, “Semantics and Provenance for Processing Element Composition in Dispel Workflows”, Accepted Workshop Paper for WORKS 2013, 8th Workshop On Workflows in Support of Large-Scale Science, Nov. 17, 2013, Denver, Colorado
    [Paper Abstract] 

  • Ana-Maria Oprescu, Paola Grosso, Pedro Bello-Maldonado, Yuri Demchenko, Cees de Laat, “BigDataBus: Towards a Big Data Aggregation and Exchange Platform for eScience”, Proceedings of the Cracow ‘13 Grid Workshop, November 4-6, 2013, Krakow, Poland, pages 97-99, Oct 2013, ISBN 978-83-61433-08-8.
    [Paper Abstract]

  • Ashley Zebrowski & Shantenu Jha, “Managing Complex Infrastructure requirements when computing in the Cloud”, Submitted to IEEE CCGrid 2014, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, Illinois
    [Paper Abstract] 

  • M.T. Patterson, I. Klampanos, R. Collins, N. Cross, R. Mann, M. Holliman, and M. Atkinson, “Scalability Performance Assessment of Large Astronomical Databases: A comparison of row-oriented vs column-oriented databases for the Vista Variables in the Via Lactea (VVV) Survey”, Submitted, Supercomputing 2013, Denver, CO
    [Paper Abstract]

  • Noah Dunkan, “Anomalous sounds in real-time audio streams”, Submitted to Interspeech 2014, 15th Annual Conference of the International Speech Communication Association, 14-18 September 2014, MAX Atria @ Singapore EXPO

2012 PIRE fellow presentations

PRAS-DT: Portable, Reliable and Automatic Streaming Data Transfer with Globus Online, Christine Harvey, 2012
  • Sandra Gesing, Malcolm Atkinson, Iraklis Klampanos, Michelle Galea, Michael R. Berthold, Roberto Barbera, Diego Scardaci, Gabor Terstyanszky, Tamas Kiss, and Peter Kacsuk. 2013. “The demand for consistent web-based workflow editors”, In Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science (WORKS ‘13). ACM, New York, NY, USA, 112-123. DOI=10.1145/2534248.2534260
    [Paper Abstract]

2011 PIRE fellow presentations

Workshop challenge presentations

In 2013, we initiated a student workshop “challenge” in which PIRE fellows work in teams on small projects to kickstart scientific collaboration and innovation around big datasets and data intensive technologies. The challenge is announced during the international workshop, and the students present their work to all attendees on the last day.

2015 Workshop student challenge

(see also Cees de Laat’s PIRE 2015 Page)

Team | Title | Presentation | Score | Rank
Steven Rapp, Melissa Bica, Theano Stavrinos, Race Clark | gui4dispel4py: A dispel4py graphical user interface on the OSDC | (slides) (paper) (github) | 264 | 4
Nam Pho, Josh Miller | A Reproducible and Automated Deployment of an HPC Application on a Private Cloud | (slides) (paper) (github) | 269 | 3
Ryan Mork, Genevieve Shattow | Lasso: A meta search process to find collaborators | (slides) (paper) (github) (website) | 277 | 2
Grace Lu, Shelby Matlock, Jennifer Piscionere | Daisy: Data Made Easy | (slides) (paper) (website) | 281 | 1

2014 Workshop student challenge

(see also Cees de Laat’s PIRE 2014 Page)

Team | Title | Presentation | Score | Rank
Race Clark, Chris Natoli, Jill Hardy, William Matthews | Using OSDC to Advance Public Understanding of Temperature Extremes | (slides) (paper) (github) | 32 | 2
Nelson Auner, Cody Buntain | Mayfly - Rapid, Accessible, Reproducible Research | (slides) (paper) (github) | 38 | 1
Alexander Moreno, Keval Shah, Yuan Zhao | Automatic Variable Detection and Formatting for Cross-disciplinary Data Set Compatibility | (slides) (paper) (github) | 32 | 2
Nathaniel Butler, Michael Lewis, Weiwei Zhang | Tourist Buddy | (slides) (paper) | 34 | 2
Eric Griffis, Josh Eisenberg | Client-side plug-ins for Tukey | (slides) (paper) | 31 | 3

2013 Workshop student challenge

(see also Cees de Laat’s PIRE 2013 Page)

Team | Title | Presentation | Score | Rank
Zac Flamig, Warren Cole, and Rafael Suarez | Examining Vegetation Recovery Time after a Small Scale Disaster using MODIS Data and the OSDC | (Slides) | 39 | 4
Pedro D Bello Maldonado and Matthew Greenway | Integrating the UvA Data Service with the OSDC | (Slides) (Paper) | 28 | 9
Joshua Miller, Spencer Claxton, Alice Mukora, and Eric Griffis (First Place) | Stratosphere: A multilevel data-driven social-network for cloud computing | (Paper) | 42.5 | 1
Joseph Korpela and Matt Greenway | Augmenting OSDC’s Datascope with a Finderscope | (Paper) | |
Warren Cole and Sandra Gesing | Retroviral Links to Cancer | (Slides) (Paper) | 32 | 6
Maria T. Patterson, Joshua D. Eisenberg, and Rafael Suarez (Third Place) | The addition of solar activity (space weather) data to the Open Science Data Cloud in order to facilitate cross-disciplinary studies | (Slides) (Paper) | 39.5 | 3
Joshua Eisenberg and Maria Patterson | Cloud Query | (Slides) (Paper) | 30 | 8
Michael Lewis and Matthew Greenway | Extending OSDC toolset for cross disciplinary discoveries | (Slides) | 30.5 | 7
Christine Harvey and Rafael Suarez (Second Place) | Organ Procurement and Transplantation Network (OPTN) Database on the OSDC | (Slides) (Paper) | 40 | 2