Keynotes | BDCAT 2020

New Horizons in IoT Workflows Provisioning in Edge and Cloud Datacentres for Fast Data Analytics: The Osmotic Computing Approach

Rajiv Ranjan

Newcastle University, UK

Abstract: Supporting Internet of Things (IoT) workflow enactment and execution on a combination of computational resources at the network edge and in a datacentre remains a challenge. Increasing volumes of data generated by smartphones and IoT devices (which can vary significantly in scope and capability) need to be processed in a timely manner. Current practice uses edge nodes (e.g. sensors or other low-capacity devices) as a means to acquire and collect data (i.e. as an "observation" mechanism); this data is subsequently transmitted to a datacentre/cloud for analysis and insight. The limitations of relying on a large-scale, centralised datacentre (such as speed of response for latency-sensitive applications) are increasingly being recognised, and a number of paradigms -- such as fog computing, edge computing, and Cloud-of-Things -- have emerged to address them. All of these propose the use of dedicated servers (with varying capacity and capability) within micro/nano datacentres at the network edge, both to overcome the latency constraints associated with moving data to a central facility and to exploit the increasing, currently under-used, computational capability within edge devices.
Osmotic Computing is a paradigm for 'holistically planning and provisioning' time- and network-latency-sensitive IoT applications across cloud and edge, depending on the type and scope of the data analytics activity, the underlying programming model, the location and configuration of available edge and cloud resources, run-time uncertainties, and IoT application-specific performance objectives. Osmotic Computing represents a paradigm shift, as it extends the realm of the Cloud to the network edge to enable the enactment and provisioning of IoT applications over the combined infrastructure. The keynote talk will have the following outline:
1. Overview of the research challenges involved with composing and orchestrating complex IoT workflows in Osmotic Computing (cloud-edge continuum) infrastructure.
2. Discuss two case studies in the smart cities domain to understand how Osmotic Computing can be applied to create and compose next-generation IoT applications.
3. Discuss our experience with running the United Kingdom's largest IoT infrastructure, namely the Urban Observatory (http://www.urbanobservatory.ac.uk).

Biography: Professor Rajiv Ranjan is an Australian-British computer scientist, of Indian origin, known for his research in Distributed Systems (Cloud Computing, Big Data, and the Internet of Things). He is University Chair Professor of Internet of Things research in the School of Computing at Newcastle University, United Kingdom. He is an internationally established scientist in the area of Distributed Systems, having published about 300 scientific papers, and has secured more than AUD $24 million (GBP £12 million+) in competitive research grants from both public and private agencies. He is an innovator with strong and sustained academic and industrial impact and a globally recognized R&D leader with a proven track record. He serves on the editorial boards of top-quality international journals including IEEE Transactions on Computers (2014-2016), IEEE Transactions on Cloud Computing, ACM Transactions on the Internet of Things, The Computer Journal (Oxford University Press), Computing (Springer), and Future Generation Computer Systems. He led the Blue Skies section (department) of IEEE Cloud Computing (2014-2019), where his principal role was to identify and write about the most important, cutting-edge research issues at the intersection of multiple, interdependent research disciplines within distributed systems, including the Internet of Things, Big Data Analytics, Cloud Computing, and Edge Computing. He is one of the most highly cited authors in computer science and software engineering worldwide (h-index=54, g-index=180, and 18,000+ Google Scholar citations; h-index=42 and 10,000+ Scopus citations; and h-index=35 and 6,700+ Web of Science citations).


AI and Science Workflow Automation

Ewa Deelman

University of Southern California Information Sciences Institute, USA

Abstract: AI is making inroads into science domains, changing the way people analyze data, make predictions, and make new discoveries. This talk will examine the science workflow from research question formulation, through hypothesis generation and experimentation, to the generation of findings and publication. As in other areas of our lives, automation is increasing scientific productivity, enabling researchers to analyze vast amounts of data (from remote sensors, instruments, etc.) and to conduct large-scale simulations of the underlying physical phenomena. These applications comprise thousands of computational tasks and process large, heterogeneous datasets, which are often distributed across the globe.
The talk will explore where AI-driven automation is impacting that workflow today and how it may impact it in the future, focusing in particular on computational experimentation and data analysis, where scientific workflow management systems are often used today. Computational workflows have emerged as a flexible representation for declaratively expressing the complexity of such applications, with their data and control dependencies. Automation technologies have enabled the execution of these workflows in an efficient and robust fashion. Up to now, automation has been based on a variety of algorithms and heuristics that transform workflows to optimize their performance and improve their fault tolerance. However, with the recent increase in the use of AI for automation, new solutions for workflow management systems can be explored. This talk describes some of the unsolved problems in workflow management and considers potential applications of AI to address these challenges.

Biography: Ewa Deelman received her PhD in Computer Science from the Rensselaer Polytechnic Institute in 1998. Following a postdoc at the UCLA Computer Science Department, she joined the University of Southern California's Information Sciences Institute (ISI) in 2000, where she serves as a Research Director and leads the Science Automation Technologies group. She is also a Research Professor at the USC Computer Science Department and an AAAS and IEEE Fellow.
The USC/ISI Science Automation Technologies group explores the interplay between automation and the management of scientific workflows, including resource provisioning and data management. Dr. Deelman pioneered workflow planning for computations executing in distributed environments. Her group has led the design and development of the Pegasus Workflow Management software and conducts research in job scheduling and resource provisioning in distributed systems, workflow performance modeling, provenance capture, and the use of cloud platforms for science.
Dr. Deelman is the founder of the annual Workshop on Workflows in Support of Large-Scale Science (WORKS), which is held in conjunction with the SC conference. In 2015, Dr. Deelman received the HPDC Achievement Award for her contributions to the area of scientific workflows.


Integrating Big Data, Data Science and Cyber Security with Applications in Internet of Transportation and Infrastructures

Bhavani Thuraisingham

The University of Texas at Dallas

Abstract: The collection, storage, manipulation, analysis and retention of massive amounts of data have given rise to new technologies, including big data analytics and data science. It is now possible to analyze massive amounts of data and extract useful nuggets. However, the collection and manipulation of this data also raise serious security and privacy considerations. Various regulations are being proposed for handling big data so that the privacy of individuals is not violated. Furthermore, the massive amounts of data being stored may also be vulnerable to cyber attacks.
Big Data, Data Science and Security are being integrated to solve many security and privacy challenges. For example, machine learning techniques are being applied to security problems such as malware analysis and insider threat detection. However, there is also a major concern that the machine learning techniques themselves could be attacked; therefore, machine learning techniques are being adapted to handle adversarial attacks, an area known as adversarial machine learning. In addition, the privacy of individuals can be violated through these machine learning techniques, since it is now possible to gather and analyze vast amounts of data; in response, privacy-enhanced data science techniques are being developed.
With the advent of the web, computing systems are now being used in every aspect of our lives, from mobile phones to autonomous vehicles. It is now possible to collect, store, manage, and analyze vast amounts of sensor data emanating from numerous devices and sensors, including from various transportation systems. Such systems are collectively known as the Internet of Transportation, which is essentially the Internet of Things for Transportation, where multiple autonomous transportation systems are connected through the web and coordinate their activities. However, security and privacy for the Internet of Transportation and the infrastructures that support it remain a challenge. Due to the large volumes of heterogeneous data being collected from numerous devices, traditional cyber security techniques such as encryption are not efficient enough to secure the Internet of Transportation. Some physics-based solutions being developed are showing promise, and more recently, developments in Data Science are also being examined for securing the Internet of Transportation and its supporting infrastructures.
To assess the developments on the integration of Big Data, Data Science and Security over the past decade and apply them to the Internet of Transportation, the presentation will focus on three aspects. First, it will examine the developments in applying Data Science techniques to detecting cyber security problems such as insider threats, as well as the advances in adversarial machine learning; some developments in privacy-aware and policy-based data management frameworks will also be discussed. Second, it will discuss the developments in securing the Internet of Transportation and its supporting infrastructures and examine the privacy implications. Finally, it will describe ways in which Big Data, Data Science and Security could be incorporated into the Internet of Transportation and Infrastructures.

Biography: Dr. Bhavani Thuraisingham is the Founders Chair Professor of Computer Science and the Executive Director of the Cyber Security Research and Education Institute at the University of Texas at Dallas (UTD). She is also a visiting Senior Research Fellow at King's College, University of London, and an elected Fellow of the ACM, IEEE, AAAS, NAI and BCS. For the past 35 years her research interests have been in integrating cyber security and artificial intelligence/data science (in earlier years, computer security and data management/mining). She has received several awards, including the IEEE CS 1997 Technical Achievement Award, the ACM SIGSAC 2010 Outstanding Contributions Award, the IEEE ComSoc Communications and Information Security 2019 Technical Recognition Award, the IEEE CS Services Computing 2017 Research Innovation Award, the ACM CODASPY 2017 Lasting Research Award, the IEEE ISI 2010 Research Leadership Award, the 2017 Dallas Business Journal Women in Technology Award, and the ACM SACMAT 10-Year Test of Time Awards for 2018 and 2019 (for papers published in 2008 and 2009). She co-chaired the Women in Cyber Security Conference (WiCyS) in 2016, delivered the featured address at the 2018 Women in Data Science (WiDS) conference at Stanford University, and serves as Co-Director of both the Women in Cyber Security and Women in Data Science Centers at UTD. Her 40-year career includes industry (Honeywell), a federal research laboratory (MITRE), the US government (NSF) and US academia. Her work has resulted in 130+ journal articles, 300+ conference papers, 150+ keynote and featured addresses, six US patents, and fifteen books, as well as technology transfer of her research to commercial products and operational systems. She received her PhD from the University of Wales, Swansea, UK, and the prestigious earned higher doctorate (D.Eng) from the University of Bristol, UK.


Big Data, Internet of Things, and AI – Three Sides of the Same Coin?

Samee U. Khan

Mississippi State University & National Science Foundation, USA

Abstract: All around us, we see decisions being made courtesy of data generated by a myriad of devices (or things) connected to the Internet. These devices come in various shapes, sizes, forms, and capabilities to form the Internet of Things, and they are producing data at an almost alarming rate – rich, complex, and correlated data; but to what end? This is the main question we will attempt to address in this talk, by revisiting some topics of interest in big data, the Internet of Things, and AI, which I consider tightly coupled entities.

Biography: Samee U. Khan received a PhD in 2007 from the University of Texas. Currently, he is the Department Head and the James W. Bagley Chair in Electrical & Computer Engineering at Mississippi State University (MSU). Before arriving at MSU, he was the Cluster Lead (2016-2020) for Computer Systems Research at the National Science Foundation and the Walter B. Booth Professor at North Dakota State University. His research interests include the optimization, robustness, and security of computer systems. His work has appeared in over 400 publications. He is Associate Editor-in-Chief of IT Professional, and an Associate Editor of IEEE Transactions on Cloud Computing, the Journal of Parallel and Distributed Computing, and ACM Computing Surveys.


Distributed Network Big Data Processing for Knowledge Discovery

Geyong Min

University of Exeter, U.K.

Abstract: With the ever-increasing migration of business services to the Cloud, the past years have witnessed explosive growth in the volume of network data, driven by the popularization of smart mobile devices and pervasive content-rich multimedia applications and creating a critical issue of Internet traffic flooding. How to handle this ever-increasing network traffic has become a pressing challenge. This talk will present innovative big data modelling and processing technologies, as well as a distributed data processing platform, developed to support data acquisition from different domains and to achieve effective representation and efficient analysis of heterogeneous big data. This open big data processing platform has the potential to uncover valuable insights and knowledge hidden in rich network big data for improving the design, operation, management, and intelligence of Cloud computing systems and the future Internet.

Biography: Professor Geyong Min is Chair in High Performance Computing and Networking at the University of Exeter. His research interests include Computer Networks, Cloud and Edge Computing, Mobile and Ubiquitous Computing, Systems Modelling and Performance Engineering. His recent research has been supported by the European Horizon 2020 programme, UK EPSRC, the Royal Society, the Royal Academy of Engineering, and industrial partners. He has published more than 200 research papers in leading international journals, including IEEE/ACM Transactions on Networking, IEEE Journal on Selected Areas in Communications, IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, and IEEE Transactions on Wireless Communications, and at reputable international conferences such as SIGCOMM-IMC, INFOCOM, and ICDCS. He is an Associate Editor of several international journals, e.g., IEEE Transactions on Computers and IEEE Transactions on Cloud Computing. He has served as General Chair or Program Chair of a number of international conferences in the area of Information and Communications Technologies.


Cloud Computing for Sprinting Peak Services

Minyi Guo

Shanghai Jiao Tong University, China

Abstract: Many internet applications exhibit a "sprinting peak load" characteristic, that is, requests can increase a thousandfold between adjacent time units. WeChat red packets on New Year's Eve and Alibaba's "Double Eleven" shopping carnival on e-commerce platforms are applications of this kind. Traditional cloud systems cannot satisfy the requirements of such internet services because they lack efficient, specialized mechanisms. In this talk, targeting such applications, we first identify the principal shortcomings of traditional cloud systems. We then seek to improve request latency, storage throughput, container expansion speed, and fault tolerance to satisfy the requirements of sprinting peak load services. The system we developed has been applied in many real sprinting peak load scenarios.

Biography: Minyi Guo received BSc and ME degrees in computer science from Nanjing University, China, and a PhD degree in computer science from the University of Tsukuba, Japan. He is currently a Chair Professor at Shanghai Jiao Tong University (SJTU), China. Before joining SJTU, Dr. Guo was a professor in the School of Computer Science and Engineering, University of Aizu, Japan. Dr. Guo received the National Science Fund for Distinguished Young Scholars from NSFC in 2007. His present research interests include parallel/distributed computing, compiler optimizations, big data and cloud computing. He has more than 400 publications in major journals and international conferences, and has published 5 books in these areas. He has received 7 best/highlight paper awards from international conferences, including ASPLOS 2017 and ICCD 2018. He is on the editorial boards of IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Cloud Computing and the Journal of Parallel and Distributed Computing. Dr. Guo is a Fellow of the IEEE, a Distinguished Member of the ACM, and a member of the Academy of Europe.


Blockchain and Confidential Computing

Gilles Fedak

iExec

Abstract: Confidential Computing refers to hardware and software solutions that isolate user data within the user's own computing environment and prevent access by other cloud servers or applications. It has become critical as most companies move their computing to the public cloud: new technologies are needed to ensure that users' privacy and data security are not compromised on the cloud network. Top technology companies, in both the cloud and computer domains, have joined hands and pooled their resources to meet these challenges. iExec is a decentralized, blockchain-based cloud-computing marketplace that connects cloud-resource sellers with cloud-resource buyers, providing the scalability, security, and ease of access to cloud computing resources that companies need. It relies on Ethereum smart contracts to create a virtual cloud infrastructure for high-performance computing services.

Biography: Gilles Fedak is co-founder and CEO of iExec and a permanent INRIA research scientist at ENS Lyon, France. After receiving his PhD from the University of Paris-Sud in 2003, he held a postdoctoral fellowship at the University of California, San Diego. He has produced pioneering software and algorithms in the field of Grid and Cloud Computing that allow people to easily harness large parallel systems consisting of thousands of machines distributed across the Internet (XtremWeb, MPICH-V, BitDew, SpeQulos, Xtrem-MapReduce, Active Data).


Enabling Personalized Medicine with Data Science – Sharing Experiences from a Large Pharma Research Organization

Asif Jan

Roche, Switzerland

Abstract: Given the enormous increase in healthcare data volumes, our ability to effectively share, integrate and analyze these data is critical to advancing our understanding of disease and to bringing affordable and efficacious treatments to patients. Due to the breadth and depth of healthcare data across various modalities, such as clinical, genomics, imaging and digital sensors, we need to move beyond traditional methods and bring in advanced ML/AI implementations to benefit maximally from the richness of the collected data. As part of the drug development life-cycle, vast amounts of clinical trial data are collected in order to identify targets of interest, discover biomarkers to stratify patients who could benefit from the drug, and study the safety and benefit profile of the drug. Furthermore, after the drug is brought to market, data on its use in a broader population are collected in a wide range of real-world data sources including, but not limited to, electronic medical records, disease registries, health insurance claims, and digital devices. In my talk, I will share with you the opportunities and challenges of deploying data science within the Pharma industry to leverage the wealth of medical data generated in clinical trials and real-world clinical environments.

Biography: Dr. Asif Jan is a Group Director in Personalized Healthcare (PHC) Data Science at Roche, Switzerland, where he leads a multi-disciplinary team of scientists specializing in computer science, neuroscience, and statistics. The team applies a variety of statistical and machine learning methods to real-world datasets (e.g. electronic medical records, health insurance claims, disease registries) to fulfill the evidence and data analysis needs of the Neuroscience disease area at Roche. Previously, he was Head of Data Science at Roche Diagnostics, where he led a team of quantitative scientists supporting in-vitro diagnostics (IVD) and clinical decision support (CDS) product development and defined a data strategy enabling the use of real-world data in Roche Diagnostics. Earlier, he held a number of roles overseeing technology strategy development, enterprise and solution architecture, and program management at Roche and in other research organizations. Asif has vast experience building and leading data science teams in Pharma, Diagnostics, and industrial research institutes, tackling complex scientific and business problems. He strongly believes that the intersection of strategy, technology and data science is needed to fight disease and thereby enable better care for everyone.
Asif holds a PhD in Neuroscience (informatics) from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, and an MPhil in Computer Science from the University of the West of England (UWE), Bristol, United Kingdom.