Themes

The four themes of the Polar RCN are designed to yield technical results while also facilitating the coordination and exchange of information and skills among an increasingly cognizant, communicative, and collaborative polar-HPDC community. Embedded within all of these themes is the unifying mission to understand and articulate the HPDC needs of the polar science community.

Current and Future Challenges in Polar Science Using High-Resolution Imagery

One of the major current challenges for polar cyberinfrastructure is managing and fully exploiting the volume of high-resolution commercial imagery now being collected over the polar regions. The data volume, network bandwidth, and compute-intensive processing routines needed to georectify and stitch together high-resolution images (currently manual work duplicated in research labs across the globe) create a substantial demand for computing resources. Meeting this demand will require efficient connections between data centers and HPDC resources, as well as the subsequent scientific use of those HPDC resources.

The Polar RCN will provide critical links between the HPDC community and the polar remote sensing community to ensure that the potential of polar imagery is realized. The computational resources required to expedite processing of polar imagery (for instance, supercomputing, cloud computing, and cluster computing) already exist, but the polar remote sensing community is largely unaware of these resources and lacks the tools to fully utilize them. Many organizations (e.g., PGC, XSEDE) provide considerable resources that can contribute to this goal, but none alone can provide the training or networking (among PIs, or between senior researchers and students/postdocs) required to ensure sufficient computing awareness, competency, interoperability, and tools across the bulk of its user base. The Polar RCN aims to integrate and coordinate across many of these existing activities to better serve the polar science community.

Build and Strengthen Partnerships between Polar and Cyber/Computer Scientists

The Polar RCN aims to be a preeminent forum for the creation of partnerships among HPDC, polar science, and data management experts. The Steering Committee and PI team, each with a balanced mix of representatives from these disciplines, will work together to capture the emerging cyberinfrastructure needs of the polar science community. This can only be done through close collaboration among key stakeholders in the polar, data, and HPDC communities, which the Polar RCN activities are designed to enable.

Effective collaboration among the HPDC, data, and polar communities does not currently exist, but it is crucial for solving problems in cross-disciplinary, data-intensive scenarios, for example, the vast amount of raw image data yet to be transformed into usable continent-scale products. Likewise, many data-intensive applications have not yet been adapted for HPDC environments, mainly because of limited communication between these communities. Collaboration is needed between polar scientists and experts in parallel and distributed implementations of data-intensive applications, the latter being key to HPDC applications. The Polar RCN will organize Community Workshops and Hackathons with the participation of polar, HPDC, and data researchers. These activities, integrated and coordinated by the Steering Committee, will build relationships and lay the socio-technical groundwork for enhanced, HPDC-enabled polar science.

Education and Training

Among the numerous barriers to effective development and use of HPDC in the polar sciences is a lack of relevant training for polar scientists. In scientific domains with heavy integration of HPDC resources, faculty and students work in teams with computing professionals to take advantage of, and contribute to, emerging capabilities in HPDC, data, and software. Two training levels are important. The first is training in basic HPDC environments and techniques, such as understanding the uses of, and the methods for accessing (“logging in” to), national and local supercomputing systems; writing simple job submission and control scripts; managing files; using data analysis tools; and loading appropriate software modules.

The second level is one of participatory and bi-directional learning, creating an environment for ongoing collaboration through a multidisciplinary computational community. A diverse community in which distributed teams conduct interdisciplinary research differs from the environment often found in a single academic department. This type of collaborative science requires a fundamental change in how we train the next generation of researchers. While the RCN cannot itself create the new programs of study needed to provide both depth in polar science domain knowledge and sufficient breadth in the computer science and applied mathematics underpinning HPDC, such potential programs will be discussed as part of this RCN's roadmap development exercise. These discussions would align with, and build on, similar conversations at the March 2015 Intelligent Systems for Geosciences (IS-Geo) Workshop held in Washington, DC.

In addition to strategic technical development and broad consultation with stakeholders, educating the next generation of polar scientists in the opportunities and potential of HPDC resources is key to the eventual uptake of HPDC in the polar sciences. To this end, the RCN will facilitate awareness of, and participation in, existing HPDC training opportunities. While these existing training activities all aim to reduce barriers, the RCN additionally offers a direct partnership and collaborative venue for education and training. We will use this opportunity to build a shared understanding of a minimal set of skills. These skills include an elementary understanding of core computer science concepts such as shared computing environments, simple scripting (e.g., Python), data management, and visualization techniques. Because these skills are necessary (but not sufficient), the RCN will use a specialized Workshop to connect participants with adequate training in these areas.

Data Management and Cyberinfrastructure

In an ideal case, any scientist anywhere in the world could go to any data center interface and obtain high-resolution images processed on the fly with HPDC resources, or interact with an HPDC resource and point to any dataset for inclusion in a model. Working toward that goal, we see data management challenges and opportunities as intertwined with all of the RCN’s themes. The emerging polar-HPDC conversation is supported in part by advances in open data sharing, web-accessible repository platforms, metadata standardization, and interoperable systems. Progress in open data sharing made during the International Polar Year, together with advances in technology, set the stage for web-accessible data resources. That said, the idealized case described above involves more than just web-accessible data. Data management issues are critical to future polar HPDC advances.

International, interdisciplinary work in data sharing, discovery, transfer, handling, processing, representation, comparability, and standardization will impact, and will be impacted by, HPDC uptake in the polar science community. Integrating polar data management perspectives and expertise through the RCN activities will help ensure that solutions are science-driven, aligned, and coordinated. For example, workflows for HPDC research would benefit from data provenance stretching back through initial data capture, description, processing, and storage. Data centers and repositories that capture and share this lineage can help make workflow documentation complete. Likewise, repositories’ understanding of HPDC system needs and connections will inform ongoing data infrastructure design and development. In the opposite direction, maintaining proper data citation when data are reused in HPDC-based workflows will be important for analyzing data reuse metrics.