The presentations below were contributed by PIRE fellows and showcase their OSDC PIRE-sponsored projects. Presentations of PIRE research are also available on the OSDC-PIRE YouTube Channel.
2015 PIRE fellow presentations
Shelby Matlock worked with researchers at AIST in Japan on Hydra, a data visualization and analysis tool for high-throughput virtual screening. [report]
Alex Moreno worked on a machine learning project at UvA in Amsterdam. Here is an abstract for his paper in submission on “Approximate Bayesian Computation.”
Approximate Bayesian Computation (ABC) is a framework for performing likelihood-free posterior inference for simulation models. Variational inference (VI) is an appealing alternative to the inefficient sampling approaches commonly used in ABC. However, stochastic VI is highly sensitive to the variance of the gradient estimators and its ABC log-likelihood is biased. We draw upon recent advances in reparameterizations of the variational lower-bound (Kingma & Welling, 2014) and likelihood-free inference using deterministic simulations (Meeds & Welling, 2015) to produce both low variance and unbiased gradient estimators of the variational lower-bound. By then exploiting automatic differentiation libraries (Bergstra et al., 2010; Abadi et al., 2015) we can avoid nearly all model specific gradient derivations. We demonstrate performance on three problems and compare two to an existing variational ABC algorithm (Tran et al., 2015): inferring the success probability of Bernoulli trials, inferring the rate of an exponential distribution, and inferring parameters of a stochastic simulator representing blowfly populations. Our results demonstrate the correctness and efficiency of our algorithm.
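For readers unfamiliar with ABC, the following is a minimal sketch of the plain rejection-sampling form of ABC that the abstract contrasts with variational inference, applied to the Bernoulli success-probability problem it mentions. It is not the algorithm from the paper. The function names, the uniform prior, the tolerance `epsilon`, and the observed data (70 successes in 100 trials) are all assumptions chosen for illustration.

```python
import random


def simulate(theta, n, rng):
    """Run the simulator: n Bernoulli(theta) trials, return the success count."""
    return sum(1 for _ in range(n) if rng.random() < theta)


def abc_rejection(observed_successes, n, num_samples, epsilon, seed=0):
    """Rejection ABC: keep prior draws whose simulated data lie within
    epsilon of the observed data. No likelihood is ever evaluated."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < num_samples:
        theta = rng.random()  # assumed prior: Uniform(0, 1)
        simulated = simulate(theta, n, rng)
        if abs(simulated - observed_successes) <= epsilon:
            accepted.append(theta)
    return accepted


# Illustrative data (hypothetical): 70 successes observed in 100 trials.
samples = abc_rejection(observed_successes=70, n=100, num_samples=500, epsilon=2)
posterior_mean = sum(samples) / len(samples)
print(f"approximate posterior mean of theta: {posterior_mean:.3f}")
```

Most prior draws are rejected here, which is the inefficiency that motivates replacing sampling with a variational approach.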
Ryan Mork learned about workflow management systems at UvA in Amsterdam. His paper “Contemporary Challenges for Data-Intensive Scientific Workflow Management Systems” was presented at the WORKS 2015 workshop.
- Ryan Mork, Paul Martin, Zhiming Zhao, Contemporary Challenges for Data-Intensive Scientific Workflow Management Systems, WORKS 2015 [paper]
Nam Pho worked in Brazil at the University of Sao Paulo on data transfer using software-defined networking.
- Nam Pho et al., Data Transfer in a Science DMZ using SDN with Applications for Precision Medicine in Cloud and High-performance Computing, 2015 [paper]
Genevieve Shattow put together a code repository for fetching data from the 1000 Genomes project, while working with researchers at the University of Edinburgh developing dispel4py.
Theano Stavrinos learned about software-defined networking for data transfers at UvA in Amsterdam. Here is an abstract of her work, “Optimization of Data Transfers within Software-defined Networks,” in preparation:
In this paper we look into different performance aspects of data transfers within single-domain software-defined networks. We study how to derive optimal data transfer paths for such networks while simultaneously taking into account different QoS-related performance criteria. We present the model of the given network using an underlying topology graph, and discuss different approaches to solve the problem in which flows are assigned to paths (and not links) while different constraints are satisfied. We first define a multi-constrained path selection/multi-commodity flow algorithm, which may be used by a SDN controller to accommodate traffic requirements known in advance. As the measurements performed indicate that traffic may be characterized as heavy-tailed (“mice” and “elephants”), we derive heuristics that identify short-lived flows from throughput-bound ones. Finally, we discuss the size-dependent flow discrimination solution in which the separation of small and large flows across the same path is used.
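As a rough illustration of what size-dependent flow discrimination could look like, the sketch below separates “mice” from “elephants” with a single byte-count threshold. It is an assumption made for illustration, not the heuristic derived in the paper. The `Flow` type, the threshold value, and the example flows are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff: flows that have moved more than this many bytes are
# treated as "elephants". The value is illustrative, not taken from the paper.
ELEPHANT_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class Flow:
    flow_id: str
    bytes_sent: int


def classify(flow, threshold=ELEPHANT_THRESHOLD_BYTES):
    """Label a flow by the number of bytes it has transferred so far."""
    return "elephant" if flow.bytes_sent > threshold else "mouse"


def split_flows(flows, threshold=ELEPHANT_THRESHOLD_BYTES):
    """Partition flows into (mice, elephants) so each group can be handled separately."""
    mice = [f for f in flows if classify(f, threshold) == "mouse"]
    elephants = [f for f in flows if classify(f, threshold) == "elephant"]
    return mice, elephants


# Made-up example flows.
flows = [
    Flow("dns-lookup", 512),
    Flow("genome-transfer", 50 * 1024**3),
    Flow("ssh-session", 200_000),
]
mice, elephants = split_flows(flows)
print("mice:", [f.flow_id for f in mice])
print("elephants:", [f.flow_id for f in elephants])
```

A controller could then queue or schedule the two groups differently, even when they share a path.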
2014 PIRE fellow presentations
Rosa Filgueira, Iraklis Klampanos, Amrey Krause, Mario David, Alexander Moreno, and Malcolm Atkinson. 2014. dispel4py: a Python framework for data-intensive scientific computing. In Proceedings of the 2014 International Workshop on Data Intensive Scalable Computing Systems (DISCS ’14). IEEE Press, Piscataway, NJ, USA, 9-16. (doi:10.1109/DISCS.2014.12)
Alexander Moreno and Tucker Balch, “Speeding up large-scale financial recomputation with memoization”, In Proceedings of the 7th Workshop on High Performance Computational Finance, WHPCF 2014, pages 17–22, Piscataway, NJ, USA, 2014. IEEE Press. (doi:10.1109/WHPCF.2014.9)
2013 PIRE fellow presentations
Eric Griffis, Paul Martin, James Cheney, “Semantics and Provenance for Processing Element Composition in Dispel Workflows”, Accepted Workshop Paper for WORKS 2013, 8th Workshop On Workflows in Support of Large-Scale Science, Nov. 17, 2013, Denver, Colorado
Ana-Maria Oprescu, Paola Grosso, Pedro Bello-Maldonado, Yuri Demchenko, Cees de Laat, “BigDataBus: Towards a Big Data Aggregation and Exchange Platform for eScience”, Proceedings of the Cracow ’13 Grid Workshop, November 4-6, 2013, Krakow, Poland, pages 97-99, Oct 2013, ISBN 978-83-61433-08-8.
Ashley Zebrowski & Shantenu Jha, “Managing Complex Infrastructure requirements when computing in the Cloud”, Submitted to IEEE CCGrid 2014, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, Illinois
M.T. Patterson, I. Klampanos, R. Collins, N. Cross, R. Mann, M. Holliman, and M. Atkinson, “Scalability Performance Assessment of Large Astronomical Databases: A comparison of row-oriented vs column-oriented databases for the Vista Variables in the Via Lactea (VVV) Survey”, Submitted, Supercomputing 2013, Denver, CO
Noah Dunkan, “Anomalous sounds in real-time audio streams”, Submitted to Interspeech 2014, 15th Annual Conference of the International Speech Communication Association, 14-18 September 2014, MAX Atria @ Singapore EXPO
2012 PIRE fellow presentations
- Sandra Gesing, Malcolm Atkinson, Iraklis Klampanos, Michelle Galea, Michael R. Berthold, Roberto Barbera, Diego Scardaci, Gabor Terstyanszky, Tamas Kiss, and Peter Kacsuk. 2013. “The demand for consistent web-based workflow editors”, In Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science (WORKS ‘13). ACM, New York, NY, USA, 112-123. DOI=10.1145/2534248.2534260
2011 PIRE fellow presentations
- José Ricardo da Silva Junior, Esteban W. Gonzalez Clua, Anselmo Montenegro, Marcos Lage, Marcelo de Andrade Dreux, Mark Joselli, Paulo A. Pagliosa & Christine Lucille Kuryla, “A heterogeneous system based on GPU and multi-core CPU for real-time fluid and rigid body simulation”, International Journal of Computational Fluid Dynamics, Volume 26, Issue 3, 2012, pages 193-204, 18 May 2012, DOI:10.1080/10618562.2012.683789
Workshop challenge presentations
In 2013, we initiated a student workshop “challenge” in which PIRE fellows work in teams on small projects to kickstart scientific collaboration and innovation around big datasets and data-intensive technologies. The challenge is announced during the international workshop, and the students present their work to all attendees on the last day.
2015 Workshop student challenge
|Team||Project||Materials||Score||Place|
|Steven Rapp, Melissa Bica, Theano Stavrinos, Race Clark||gui4dispel4py: A dispel4py graphical user interface on the OSDC||(slides) (paper) (github)||264||4|
|Nam Pho, Josh Miller||A Reproducible and Automated Deployment of an HPC Application on a Private Cloud||(slides) (paper) (github)||269||3|
|Ryan Mork, Genevieve Shattow||Lasso: A meta search process to find collaborators||(slides) (paper) (github) (website)||277||2|
|Grace Lu, Shelby Matlock, Jennifer Piscionere||Daisy: Data Made Easy||(slides) (paper) (website)||281||1|
2014 Workshop student challenge
|Team||Project||Materials||Score||Place|
|Race Clark, Chris Natoli, Jill Hardy, William Matthews||Using OSDC to Advance Public Understanding of Temperature Extremes||(slides) (paper) (github)||32||2|
|Nelson Auner, Cody Buntain||Mayfly - Rapid, Accessible, Reproducible Research||(slides) (paper) (github)||38||1|
|Alexander Moreno, Keval Shah, Yuan Zhao||Automatic Variable Detection and Formatting for Cross-disciplinary Data Set Compatibility||(slides) (paper) (github)||32||2|
|Nathiel Butler, Michael Lewis, Weiwei Zhang||Tourist Buddy||(slides) (paper)||34||2|
|Eric Griffis, Josh Eisenberg||Client-side plug-ins for Tukey||(slides) (paper)||31||3|
2013 Workshop student challenge
|Team||Project||Materials||Score||Place|
|Zac Flamig, Warren Cole, and Rafael Suarez||Examining Vegetation Recovery Time after a Small Scale Disaster using MODIS Data and the OSDC||(Slides)||39||4|
|Pedro D Bello Maldonado and Matthew Greenway||Integrating the UvA Data Service with the OSDC||(Slides) (Paper)||28||9|
|Joshua Miller, Spencer Claxton, Alice Mukora, and Eric Griffis (First Place)||Stratosphere: A multilevel data-driven social-network for cloud computing||(Paper)||42.5||1|
|Joseph Korpela and Matt Greenway||Augmenting OSDC’s Datascope with a Finderscope||(Paper)|
|Warren Cole and Sandra Gesing||Retroviral Links to Cancer||(Slides) (Paper)||32||6|
|Maria T. Patterson, Joshua D. Eisenberg, and Rafael Suarez (Third Place)||The addition of solar activity (space weather) data to the Open Science Data Cloud in order to facilitate cross-disciplinary studies||(Slides) (Paper)||39.5||3|
|Joshua Eisenberg and Maria Patterson||Cloud Query||(Slides) (Paper)||30||8|
|Michael Lewis and Matthew Greenway||Extending OSDC toolset for cross disciplinary discoveries||(Slides)||30.5||7|
|Christine Harvey and Rafael Suarez (Second Place)||Organ Procurement and Transplantation Network (OPTN) Database on the OSDC||(Slides) (Paper)||40||2|