It has become something of a cliché for IT organizations to claim that they offer a hybrid cloud environment when, in most cases, they are actually providing parallel environments with little functional crossover between local and elastic resources. Without a shared “namespace,” users must duplicate data and modify applications should they wish to compute in the cloud. Of course, this might be understandable in the presence of sensitive data or intellectual property, but those exceptions aside, organizations should work towards a seamless access environment that doesn’t require end users to move data and alter workflows simply to run remotely. While Virtual Private Clouds can help local IT shops carve out cloud resources that appear to be extensions of the local network, not much has been accomplished if using them requires a lengthy data upload over a network choked with general campus traffic. (Lack of a dedicated research network is a big problem, but that’s another article.) It is always easier to bring the computation to the data than vice versa, especially when it comes to genomic and biomedical computing.
What Are Your Use Cases?
One of the first things a classic IT support person will ask a user is “What is your use case?” In a typical IT shop this is a reasonable question, since the workloads are usually static, access patterns are predictable, and storage performance is rarely an issue. Thus, a suitable architecture can easily be defined that also doesn’t require an adjustment to existing governance policies. (Music to an IT manager’s ears!) However, it is largely lost on the classic IT analyst that in research computing, determining an appropriate environment can be, in and of itself, an ongoing research project involving frequent experimentation before the ideal setup is identified. And even that setup might require subsequent change. This is why research computation deserves its own leadership, one that understands that “spiky” and volatile workloads, in addition to obnoxiously large data sources, are de rigueur in research computation. Unfortunately, this is a persistent problem. It is also a reason that many investigators choose to access cloud resources independently of any institutional path: they have a direct route to an elastic resource without having to jump through local policy hoops that were probably designed for a generic web server or departmental database as opposed to a dynamic, high-throughput computational resource.
Overcoming the Past
All of this said, it is not trivial to overlay an existing heterogeneous environment with a common namespace. Outside of the technical realm, it takes careful planning and a willingness to adjust policies to accommodate wider variation in workloads. At a technical level, it involves the reconciliation of naming conventions, filesystem parameters, and any APIs that might be in use. This is in addition to linkage to virtual machines and instances (both local and remote). The good news is that there are software stacks to help address these issues while providing clean management tools and interfaces for intelligent provisioning and allocation of resources. It’s been my experience that the various technical solutions can be applied as long as key resources are carefully identified up front. That identification can sometimes be a primary obstacle to adoption of a true hybrid cloud, as it involves looking at longstanding policies and procedures that were probably developed for an earlier time when resources were static. It also forces IT shops to examine their inventory and legacy services. But this is going to be essential anyway as organizations move towards the cloud.
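To make the idea of a common namespace a little more concrete, here is a minimal Python sketch of the naming side of the problem: researchers use one logical path, and a small resolver decides whether that path lives on local storage or in the cloud. Everything in it is hypothetical and for illustration only, including the Mount and Namespace names, the /research prefixes, and the file:// and s3:// locations. Real software stacks do far more (permissions, caching, data movement), so treat this as a sketch of the concept rather than a description of any particular product.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Mount:
    """One storage backend attached to the shared namespace (hypothetical)."""
    logical_prefix: str  # the path researchers see
    backend_uri: str     # where the data actually lives (assumed URI scheme)


class Namespace:
    """Maps logical paths to backend locations using longest-prefix matching."""

    def __init__(self, mounts):
        # Sort so the most specific (deepest) prefix is checked first.
        self._mounts = sorted(
            mounts,
            key=lambda m: len(PurePosixPath(m.logical_prefix).parts),
            reverse=True,
        )

    def resolve(self, logical_path: str) -> str:
        path = PurePosixPath(logical_path)
        for mount in self._mounts:
            try:
                relative = path.relative_to(PurePosixPath(mount.logical_prefix))
            except ValueError:
                continue  # this mount does not cover the path
            if str(relative) == ".":
                return mount.backend_uri
            return f"{mount.backend_uri.rstrip('/')}/{relative}"
        raise KeyError(f"no mount covers {logical_path}")


# Hypothetical layout: active data on a local cluster, archive in object storage.
namespace = Namespace([
    Mount("/research", "file:///mnt/cluster/research"),
    Mount("/research/archive", "s3://example-research-archive"),
])

print(namespace.resolve("/research/lab1/data.csv"))
print(namespace.resolve("/research/archive/2019/reads.fastq"))
```

The point of the design is that the investigator's code only ever refers to /research/... paths. Where a given prefix physically lives becomes an administrative decision that can change without anyone rewriting a workflow.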
In wrapping this up, computational researchers have the ability to go directly to the cloud to accomplish their work, but this assumes they have the requisite knowledge to do so. Ideally, there would be institutional resources available to make the management and analysis of their data easier while staying within the policies of the associated institution. Moreover, accessing data and writing code should not require the investigator to change how they work, which is why a true hybrid setup is necessary to animate campus research computation. Need help with this? Let me know.