Rashnu: Data-Dependent Order-Fairness

Mohammad Javad Amiri, Heena Nagda, Shubhendra Pal Singhal, Boon Thau Loo
In Submission

Abstract. Distributed data management systems use State Machine Replication (SMR) to provide fault tolerance. The SMR algorithm enables Byzantine fault-tolerant (BFT) protocols to guarantee safety and liveness despite the malicious failure of nodes. However, SMR does not prevent adversarial manipulation of transaction order, where the order assigned by a malicious leader differs from the order in which transactions are received from clients. While order-fairness has recently been studied in a few protocols, such protocols rely on synchronized clocks, suffer from liveness issues, or incur significant performance overhead. This paper presents Rashnu, a high-performance fair ordering protocol. Rashnu is motivated by the observation that fair ordering between two transactions is needed only when both transactions access a shared resource. Based on this observation, we define the notion of data-dependent order-fairness, where replicas capture only the order of data-dependent transactions and the leader uses these orders to propose a dependency graph that represents fair ordering among transactions. Replicas then execute transactions using the dependency graph, which enables parallel execution of independent transactions. We implemented a prototype of Rashnu on top of HotStuff; our experimental evaluation demonstrates the efficiency of Rashnu compared to the state-of-the-art order-fairness protocol and its low overhead compared to HotStuff.
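The idea of ordering only data-dependent transactions can be illustrated with a minimal Python sketch. It assumes each transaction declares up front the set of keys it accesses and that transactions arrive in a single received order; the names Tx, build_dependency_graph, and execution_levels are hypothetical, and this is not Rashnu's actual protocol, which works with orders reported by multiple replicas under Byzantine faults. Edges are added only between transactions that share a key, so transactions with no path between them fall into the same execution level and could run in parallel.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Tx:
    tx_id: str
    keys: FrozenSet[str]  # data items the transaction accesses (assumed known in advance)


def build_dependency_graph(received: List[Tx]) -> Dict[str, Set[str]]:
    """Order only data-dependent transactions: add an edge a -> b when a and b
    share a key and a was received before b."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    last_on_key: Dict[str, str] = {}
    for tx in received:
        for key in tx.keys:
            prev = last_on_key.get(key)
            if prev is not None:
                edges[prev].add(tx.tx_id)  # a set avoids duplicate edges when two txs share several keys
            last_on_key[key] = tx.tx_id
    return edges


def execution_levels(received: List[Tx], edges: Dict[str, Set[str]]) -> List[List[str]]:
    """Topologically layer the graph (Kahn's algorithm); each level holds
    mutually independent transactions that could execute in parallel."""
    indegree = {tx.tx_id: 0 for tx in received}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    level = [t for t, d in indegree.items() if d == 0]
    levels = []
    while level:
        levels.append(sorted(level))
        nxt = []
        for t in level:
            for s in edges.get(t, set()):
                indegree[s] -= 1
                if indegree[s] == 0:
                    nxt.append(s)
        level = nxt
    return levels


received = [
    Tx("t1", frozenset({"x"})),
    Tx("t2", frozenset({"y"})),
    Tx("t3", frozenset({"x", "y"})),
    Tx("t4", frozenset({"z"})),
]
graph = build_dependency_graph(received)
print(execution_levels(received, graph))  # [['t1', 't2', 't4'], ['t3']]
```

Here t1, t2, and t4 touch disjoint data and need no relative order, while t3 must follow both t1 and t2 because it shares x with t1 and y with t2.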

The Bedrock of Byzantine Fault Tolerance: A Unified Platform for BFT Protocol Design and Implementation

Mohammad Javad Amiri, Chenyuan Wu, Divyakant Agrawal, Amr El Abbadi, Boon Thau Loo, Mohammad Sadoghi
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI), Santa Clara, CA, 2024.

Abstract. Byzantine Fault-Tolerant (BFT) protocols have recently been extensively used by decentralized data management systems with non-trustworthy infrastructures, e.g., permissioned blockchains. BFT protocols cover a broad spectrum of design dimensions, from infrastructure settings such as the communication topology to more technical features such as commitment strategy, and even fundamental social choice properties like order-fairness. The proliferation of different BFT protocols has rendered it difficult to navigate the BFT landscape, let alone determine the protocol that best meets application needs. This paper presents Bedrock, a unified platform for BFT protocol design, analysis, implementation, and experimentation. Bedrock proposes a design space consisting of a set of design choices capturing the trade-offs between different design space dimensions and providing fundamentally new insights into the strengths and weaknesses of BFT protocols. Bedrock enables users to analyze and experiment with BFT protocols within the space of plausible choices, evolve current protocols to design new ones, and even uncover previously unknown protocols. Our experimental results demonstrate the capability of Bedrock to uniformly evaluate BFT protocols in new ways that were not possible before due to the diverse assumptions made by these protocols. The results validate Bedrock's ability to analyze and derive BFT protocols.

AdaChain: A Learned Adaptive Blockchain

Chenyuan Wu, Bhavana Mehta, Mohammad Javad Amiri, Ryan Marcus, Boon Thau Loo
The 49th International Conference on Very Large Data Bases (VLDB), Vancouver, Canada, 2023
DOI: 10.14778/3594512.3594531

Abstract. This paper presents AdaChain, a learning-based blockchain framework that adaptively chooses the best permissioned blockchain architecture to optimize effective throughput for dynamic transaction workloads. AdaChain addresses a challenge in Blockchain-as-a-Service (BaaS) environments, where a large variety of smart contracts with different workload characteristics are deployed. AdaChain uses reinforcement learning to adapt automatically to the underlying, dynamically changing workload. When a promising architecture is identified, AdaChain switches from the current architecture to the promising one at runtime in a way that respects correctness and security concerns. Experimentally, we show that AdaChain can converge quickly to optimal architectures under changing workloads and significantly outperform fixed architectures in terms of the number of successfully committed transactions, all while incurring low additional overhead.

Towards Adaptive Fault-Tolerant Sharded Databases

Bhavana Mehta, Neelesh C. A, Prashanth S. Iyer, Mohammad Javad Amiri, Boon Thau Loo, Ryan Marcus
Applied AI for Database Systems and Applications Workshop (AIDB), in conjunction with VLDB, Vancouver, 2023.

Abstract. Data fragmentation and replication schemes play an important role in making parallel and transactional databases scalable and reliable. Existing data schemes generally assume a trusted environment where a node may fail, but no node will act adversarially. Here, we present our vision for RLShard, a reinforcement learning-powered fragmentation and replication scheme for transactional databases in Byzantine environments capable of adapting to dynamic workloads. We first describe the implications of Byzantine environments on data fragmentation schemes. Then, we explore two different system architectures for RLShard: a centralized architecture that relies on a trusted administrative domain and a fully decentralized architecture that uses collaborative reinforcement learning. Based on our first-cut design, we outline open research challenges towards our vision of adaptive fault-tolerant sharded databases.

Chemistry behind Agreement

Suyash Gupta, Mohammad Javad Amiri, Mohammad Sadoghi
The 13th Conference on Innovative Data Systems Research (CIDR), Amsterdam, The Netherlands, 2023

Abstract. Agreement protocols have been extensively used by distributed data management systems to provide robustness and high availability. The broad spectrum of design dimensions, applications, and fault models has resulted in different flavors of agreement protocols. This proliferation of agreement protocols has made it hard to argue about their correctness and has unintentionally created a disparity in understanding their design. To address this disparity, we study the chemistry behind agreement and present a unified framework that simplifies expressing different agreement protocols. Specifically, we extract the essential elements of agreement and define atoms that connect these elements. We illustrate how these elements can help explain and design various agreement protocols.

FlexChain: An Elastic Disaggregated Blockchain

Chenyuan Wu, Mohammad Javad Amiri, Jared Asch, Heena Nagda, Qizhen Zhang, Boon Thau Loo
The 49th International Conference on Very Large Data Bases (VLDB), PVLDB 16(01), pp. 23-36, Vancouver, Canada, 2023
DOI: 10.14778/3342263.3342275

Abstract. While permissioned blockchains enable a family of data center applications, existing systems suffer from imbalanced loads across compute and memory, exacerbating the underutilization of cloud resources. This paper presents FlexChain, a novel permissioned blockchain system that addresses this challenge by physically disaggregating CPUs, DRAM, and storage devices to process different blockchain workloads efficiently. Disaggregation allows blockchain service providers to upgrade and expand hardware resources independently to support a wide range of smart contracts with diverse CPU and memory demands. Moreover, it ensures efficient resource utilization and hence prevents resource fragmentation in a data center. We have explored the design of execute-order-validate (XOV) blockchain systems in a disaggregated fashion and developed a tiered key-value store that can elastically scale its memory and storage. Our design significantly speeds up the execution stage. We have also leveraged several techniques to parallelize the validation stage in FlexChain to further improve overall blockchain performance. Our evaluation results show that FlexChain can provide independent compute and memory scalability while incurring at most 12.8% disaggregation overhead. FlexChain achieves throughput almost identical to that of state-of-the-art distributed approaches, with significantly lower memory and CPU consumption for compute-intensive and memory-intensive workloads, respectively.

Saguaro: An Edge Computing-enabled Hierarchical Permissioned Blockchain

Mohammad Javad Amiri, Ziliang Lai, Liana Patel, Boon Thau Loo, Eric Lo, Wenchao Zhou
The 39th International Conference on Data Engineering (ICDE), pp. 259-272, Anaheim, California, 2023.
DOI: 10.1109/ICDE55515.2023.00027

Abstract. We present Saguaro, a permissioned blockchain system designed specifically for edge computing networks. Saguaro leverages the hierarchical structure of edge computing networks and presents several techniques to reduce the overhead of wide-area communication. First, Saguaro proposes coordinator-based and optimistic protocols to process cross-domain transactions with low latency, where the lowest common ancestor of the involved domains either coordinates the protocol or detects inconsistencies. Second, data are collected over the hierarchy, enabling higher-level domains to aggregate their sub-domain data. Finally, transactions initiated by mobile edge devices are processed without relying on high-level fog and cloud servers. Our experimental results across a wide range of workloads demonstrate the scalability of Saguaro in supporting a range of cross-domain and mobile transactions.
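In a tree-shaped hierarchy, the domain that handles a cross-domain transaction can be found with a lowest-common-ancestor lookup. The Python sketch below assumes the hierarchy is given as a hypothetical child-to-parent map (parent) with cloud, fog, and edge domain names invented for illustration; it shows only how the responsible domain might be selected, not Saguaro's coordinator-based or optimistic protocol itself.

```python
from typing import Dict, List, Optional


def path_to_root(domain: str, parent: Dict[str, str]) -> List[str]:
    """Return the domain followed by its ancestors, ending at the root."""
    path = [domain]
    while domain in parent:
        domain = parent[domain]
        path.append(domain)
    return path


def lowest_common_ancestor(domains: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Lowest domain that is an ancestor of (or equal to) every involved domain."""
    paths = [path_to_root(d, parent) for d in domains]
    common = set(paths[0]).intersection(*paths[1:])
    for node in paths[0]:  # walking upward, the first shared node is the lowest one
        if node in common:
            return node
    return None


# Hypothetical hierarchy: one cloud, two fog domains, three edge domains.
parent = {"fog1": "cloud", "fog2": "cloud",
          "edge1": "fog1", "edge2": "fog1", "edge3": "fog2"}
print(lowest_common_ancestor(["edge1", "edge2"], parent))  # fog1
print(lowest_common_ancestor(["edge1", "edge3"], parent))  # cloud
```

A transaction touching only edge1 and edge2 can be handled within fog1, whereas one spanning edge1 and edge3 reaches the cloud level, which illustrates why transactions among nearby domains need not involve higher-level servers.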

Ziziphus: Scalable Data Management Across Byzantine Edge Servers

Mohammad Javad Amiri, Daniel Shu, Sujaya Maiyya, Divyakant Agrawal, Amr El Abbadi
The 39th International Conference on Data Engineering (ICDE), pp. 490-502, Anaheim, California, 2023.
DOI: 10.1109/ICDE55515.2023.00044

Abstract. While edge computing brings computation and data closer to users to improve response time, it distributes edge servers across wide-area networks, which increases communication latency between the servers. Synchronizing globally distributed edge servers, especially in the presence of Byzantine servers, becomes very costly due to the high communication complexity of Byzantine fault-tolerant consensus protocols. In this paper, we present Ziziphus, a geo-distributed system that partitions edge servers into fault-tolerant zones, where each zone processes transactions initiated by nearby clients locally. Global synchronization among zones is required only in special situations, e.g., migration of clients from one zone to another. On one hand, the two-level architecture of Ziziphus confines the malicious behavior of nodes within zones, so a much cheaper protocol suffices at the top level for global synchronization. On the other hand, Ziziphus processes local transactions within zones by edge servers closer to clients, resulting in enhanced performance. Ziziphus further introduces zone clusters to enhance scalability: instead of running global synchronization among all zones, only the zones of a single cluster are synchronized.

Declarative Smart Contracts

Haoxian Chen, Gerald Whitters, Mohammad Javad Amiri, Yuepeng Wang, Boon Thau Loo
The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Singapore, 2022.
DOI: 10.1145/3540250.3549121

Abstract. This paper presents Decon, a declarative programming language for implementing smart contracts and specifying contract-level properties. Driven by the observation that smart contract operations and contract-level properties can be naturally expressed as relational constraints, Decon models each smart contract as a set of relational tables that store transaction records. This relational representation of smart contracts enables convenient specification of contract properties, facilitates run-time monitoring of potential property violations, and brings clarity to contract debugging via data provenance. Specifically, a Decon program consists of a set of declarative rules and violation query rules over the relational representation, describing the smart contract implementation and contract-level properties, respectively. We have developed a tool that can compile Decon programs into executable Solidity programs, with instrumentation for run-time property monitoring. Our case studies demonstrate that Decon can implement realistic smart contracts such as ERC20 and ERC721 digital tokens. Our evaluation results show that Decon incurs marginal overhead compared to the open-source reference implementation: a 14% median gas overhead for execution and another 16% median gas overhead for run-time verification.
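The relational view can be illustrated in plain Python, outside Decon itself: transaction records are stored as rows, state such as balances is derived from them, and a violation query returns whatever breaks a property. The relation names (mints, transfers), the non-negative-balance property, and the helper functions are hypothetical choices for an ERC20-like token; actual Decon programs are written as declarative rules and compiled to Solidity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Mint = Tuple[str, int]            # (owner, amount)
Transfer = Tuple[str, str, int]   # (sender, receiver, amount)


def balances(mints: List[Mint], transfers: List[Transfer]) -> Dict[str, int]:
    """Derive each owner's balance from the recorded mint and transfer rows."""
    bal: Dict[str, int] = defaultdict(int)
    for owner, amount in mints:
        bal[owner] += amount
    for sender, receiver, amount in transfers:
        bal[sender] -= amount
        bal[receiver] += amount
    return dict(bal)


def negative_balance_violations(mints: List[Mint], transfers: List[Transfer]) -> List[str]:
    """Violation query: owners whose derived balance is negative."""
    return sorted(o for o, b in balances(mints, transfers).items() if b < 0)


mints = [("alice", 100)]
transfers = [("alice", "bob", 30)]
print(balances(mints, transfers))                     # {'alice': 70, 'bob': 30}
print(negative_balance_violations(mints, transfers))  # []

# A faulty implementation that records an over-spending transfer is caught by the query.
transfers.append(("bob", "carol", 50))
print(negative_balance_violations(mints, transfers))  # ['bob']
```

Because balances are derived from stored records rather than updated in place, the offending transfer row remains available as provenance when a violation is reported.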

Qanaat: A Scalable Multi-Enterprise Permissioned Blockchain System with Confidentiality Guarantees

Mohammad Javad Amiri, Boon Thau Loo, Divyakant Agrawal, Amr El Abbadi
The 48th International Conference on Very Large Data Bases (VLDB), PVLDB 15(11), pp. 2839-2852, Sydney, Australia, 2022.
DOI: 10.14778/3551793.3551835

Abstract. Today's large-scale data management systems need to address distributed applications' confidentiality and scalability requirements among a set of collaborative enterprises. This paper presents Qanaat, a scalable multi-enterprise permissioned blockchain system that guarantees the confidentiality of enterprises in collaboration workflows. Qanaat presents data collections that enable any subset of enterprises involved in a collaboration workflow to keep their collaboration private from other enterprises. A transaction ordering scheme is also presented to enforce only the necessary and sufficient constraints on transaction order to guarantee data consistency. Furthermore, Qanaat supports data consistency across collaboration workflows where an enterprise can participate in different collaboration workflows with different sets of enterprises. Finally, Qanaat presents a suite of consensus protocols to support intra-shard and cross-shard transactions within or across enterprises.

PReVer: Towards Private Regulated Verified Data

Mohammad Javad Amiri, Tristan Allard, Divyakant Agrawal, Amr El Abbadi
25th International Conference on Extending Database Technology (EDBT), pp. 454-461, Edinburgh, UK [online], 2022.
DOI: 10.48786/edbt.2022.40

Abstract. Data privacy has garnered significant attention recently due to diverse applications that store sensitive data in untrusted infrastructure. From a data management point of view, the focus has been on the privacy of stored data and the privacy of querying data at a large scale. However, databases are not solely query engines on static data; they must support updates on dynamically evolving datasets. In this paper, we lay out a vision for privacy-preserving dynamic data. In particular, we focus on dynamic data that might be stored remotely on untrusted providers. Updates arrive at a provider and are verified and incorporated into the database based on predefined constraints. Depending on the application, the content of the stored data, the content of the updates, and the constraints may be private or public. We then propose PReVer, a universal framework for managing regulated dynamic data in a privacy-preserving manner. We explore a set of research challenges that PReVer needs to address in order to guarantee the privacy of data, updates, and/or constraints and to ensure the consistent and verifiable execution of updates. This broadens privacy-preserving data management from the narrow perspective of private queries on static datasets to the larger space of private management of dynamic data.

Separ: Towards Regulating Future of Work Multi-Platform Crowdworking Environments with Privacy Guarantees

Mohammad Javad Amiri, Joris Duguépéroux, Tristan Allard, Divyakant Agrawal, Amr El Abbadi
The 30th Web Conference (WWW’21), pp. 1891-1903, Ljubljana, Slovenia [online], 2021.
DOI: 10.1145/3442381.3449858

Abstract. Crowdworking platforms provide the opportunity for diverse workers to execute tasks for different requesters. The popularity of the "gig" economy has given rise to independent platforms that provide competing and complementary services. Workers as well as requesters with specific tasks may need to work for, or use the services of, multiple platforms, resulting in the rise of multi-platform crowdworking systems. Recently, there has been increasing interest from governmental, legal, and social institutions in enforcing regulations, such as minimal and maximal work hours, on crowdworking platforms. Platforms within multi-platform crowdworking systems, therefore, need to collaborate to enforce cross-platform regulations. While collaborating to enforce global regulations requires the transparent sharing of information about tasks and their participants, the privacy of all participants needs to be preserved. In this paper, we propose an overall vision exploring the regulation, privacy, and architecture dimensions for the future of work multi-platform crowdworking environments. We then present Separ, a multi-platform crowdworking system that enforces a large sub-space of practical global regulations on a set of distributed independent platforms in a privacy-preserving manner. Separ enforces privacy using lightweight and anonymous tokens, while transparency is achieved using fault-tolerant blockchain ledgers shared among multiple platforms. The privacy guarantees of Separ against covert adversaries are formalized and thoroughly demonstrated, while the experiments reveal the efficiency of Separ in terms of performance and scalability.

SharPer: Sharding Permissioned Blockchains Over Network Clusters

Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
ACM SIGMOD International Conference on Management of Data, pp. 76-88, Xi'an, Shaanxi, China [online], 2021.
DOI: 10.1145/3448016.3452807

Abstract. Scalability is one of the main roadblocks to business adoption of blockchain systems. Despite recent intensive research on using sharding techniques to enhance the scalability of blockchain systems, existing solutions do not efficiently address cross-shard transactions. In this paper, we introduce SharPer, a permissioned blockchain system that improves scalability by clustering the nodes and assigning different data shards to different clusters, where each data shard is replicated on the nodes of a cluster. SharPer supports both intra-shard and cross-shard transactions and simultaneously processes intra-shard transactions of different clusters as well as cross-shard transactions with non-overlapping clusters. In SharPer, the blockchain ledger is formed as a directed acyclic graph and each cluster maintains only a view of the ledger. SharPer incorporates decentralized flattened protocols to establish cross-shard consensus. Furthermore, SharPer provides deterministic safety guarantees. The experimental results reveal the efficiency of SharPer in terms of performance and scalability, especially in workloads with a low percentage of cross-shard transactions (typical settings).
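The point that transactions with non-overlapping clusters can proceed simultaneously can be illustrated with a small scheduling sketch in Python. It assumes each transaction is tagged with the set of clusters whose shards it touches, and it groups transactions into rounds so that transactions in the same round share no cluster while transactions that do share a cluster keep their arrival order. The function name schedule_rounds and the round-based model are illustrative assumptions; SharPer itself reaches cross-shard agreement through its decentralized flattened consensus protocols rather than a central scheduler.

```python
from typing import Dict, List, Set, Tuple


def schedule_rounds(transactions: List[Tuple[str, Set[str]]]) -> List[List[str]]:
    """Place each transaction in the earliest round after the last round used by
    any of its clusters: txs in one round are cluster-disjoint, and txs sharing a
    cluster stay in arrival order."""
    last_round: Dict[str, int] = {}  # cluster -> latest round that involved it
    rounds: List[List[str]] = []
    for tx_id, clusters in transactions:
        r = 1 + max((last_round.get(c, -1) for c in clusters), default=-1)
        if r == len(rounds):
            rounds.append([])
        rounds[r].append(tx_id)
        for c in clusters:
            last_round[c] = r
    return rounds


txs = [
    ("t1", {"C1"}),          # intra-shard on cluster C1
    ("t2", {"C2"}),          # intra-shard on cluster C2
    ("t3", {"C1", "C2"}),    # cross-shard, overlaps t1 and t2
    ("t4", {"C3", "C4"}),    # cross-shard, disjoint from everything above
    ("t5", {"C1"}),          # must follow t3 on C1
]
print(schedule_rounds(txs))  # [['t1', 't2', 't4'], ['t3'], ['t5']]
```

The cross-shard transaction t4 runs alongside the intra-shard transactions t1 and t2 because it involves different clusters, while t3 waits for both C1 and C2 to be free.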

Permissioned Blockchains: Properties, Techniques and Applications

Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
ACM SIGMOD International Conference on Management of Data, pp. 2813-2820, Xi'an, Shaanxi, China [online], 2021.
DOI: 10.1145/3448016.3457539

Abstract. The unique features of blockchains such as immutability, transparency, provenance, and authenticity have been used by many large-scale data management systems to deploy a wide range of distributed applications including supply chain management, healthcare, and crowdworking in a permissioned setting. Unlike permissionless settings, e.g., Bitcoin, where the network is public, and anyone can participate without a specific identity, a permissioned blockchain system consists of a set of known, identified nodes that might not fully trust each other. While the characteristics of permissioned blockchains are appealing to a wide range of large-scale data management systems, these systems have to satisfy four main requirements: confidentiality, verifiability, performance, and scalability. Various approaches have been developed in industry and academia to satisfy these requirements with varying assumptions and costs. The focus of this tutorial is on presenting many of these techniques while highlighting the trade-offs among them. We demonstrate the practicality of such techniques in real life by presenting three different applications, namely Supply Chain Management, Large-scale Databases, and Multi-platform Crowdworking, and show how those techniques can be utilized to meet the requirements of such applications.

Large-Scale Data Management using Permissioned Blockchains

Mohammad Javad Amiri
PhD Thesis, University of California, Santa Barbara.
ProQuest ID: Amiri_ucsb_0035D_14775

Abstract. The unique features of blockchain such as transparency, provenance, and authenticity are used by many large-scale data management systems to deploy a wide range of distributed applications including supply chain management, healthcare, and crowdsourcing in a permissioned setting. Unlike permissionless settings, e.g., Bitcoin, where the network is public, and anyone can participate without a specific identity, a permissioned blockchain consists of a set of known, identified nodes that might not fully trust each other. While the characteristics of permissioned blockchains are appealing to a wide range of large-scale data management systems, these systems have to deal with five important challenges: confidentiality, verifiability, performance, scalability, and fault tolerance. Confidentiality of data is required in many collaborative large-scale data management applications where collaboration between enterprises, e.g., cross-enterprise transactions, should be visible to all enterprises, whereas the internal data of each enterprise, e.g., internal transactions, might be confidential. Besides confidentiality, in many multi-enterprise systems, e.g., crowdworking environments, participants need to verify transactions that are initiated by other enterprises to ensure some predefined global constraints on the entire system. Thus, the system needs to support verifiability while preserving the confidentiality of transactions. Verifiability will grow in importance as crowdworking applications become more popular and the need for regulation arises. Large-scale data management applications also require high performance in terms of throughput and latency. Scalability is one of the main obstacles to business adoption of blockchain systems. To support a large-scale data management application, a blockchain system should be able to scale efficiently by adding more resources to the system. Finally, large-scale data management systems must provide fault tolerance.
Fault-tolerant protocols are the main building block of large-scale data management systems. However, in spite of years of intensive research, existing fault-tolerant protocols do not adequately address hybrid environments consisting of trusted and untrusted servers, which are widely used by enterprises. In this dissertation, we propose several techniques and develop different systems to address all five main challenges of large-scale data management using permissioned blockchains. We have developed systems called CAPER, SEPAR, ParBlockchain, SharPer, and SeeMoRe to address the confidentiality, verifiability, performance, scalability, and fault tolerance requirements of large-scale data management, respectively.

SeeMoRe: A Fault-Tolerant Protocol for Hybrid Cloud Environments

Mohammad Javad Amiri, Sujaya Maiyya, Divyakant Agrawal, Amr El Abbadi
The 36th International Conference on Data Engineering (ICDE), pp. 1345-1356, Dallas [online], 2020.
DOI: 10.1109/ICDE48307.2020.00120

Abstract. Large-scale data management systems utilize State Machine Replication to provide fault tolerance and to enhance performance. Fault-tolerant protocols are extensively used in the distributed database infrastructure of large enterprises such as Google, Amazon, and Facebook. However, in spite of years of intensive research, existing fault-tolerant protocols do not adequately address hybrid cloud environments consisting of private and public clouds, which are widely used by enterprises. In this paper, we consider a private cloud consisting of non-malicious nodes (crash-only failures) and a public cloud with possible malicious failures. We introduce SeeMoRe, a hybrid State Machine Replication protocol that uses the knowledge of where crash and malicious failures may occur in a public/private cloud environment to improve overall performance. SeeMoRe has three different modes, which can be used depending on the private cloud load and the communication latency between the public and private clouds. SeeMoRe can dynamically transition from one mode to another. Furthermore, an extensive evaluation reveals that SeeMoRe’s performance is close to that of state-of-the-art crash fault-tolerant protocols while tolerating malicious failures.

Modern Large-Scale Data Management Systems after 40 Years of Consensus

Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
The 36th International Conference on Data Engineering (ICDE), pp. 1794-1797, Dallas [online], 2020.
DOI: 10.1109/ICDE48307.2020.00172

Abstract. Modern large-scale data management systems utilize consensus protocols to provide fault tolerance. Consensus protocols are extensively used in the distributed database infrastructure of large enterprises such as Google, Amazon, and Facebook as well as permissioned blockchain systems like IBM’s Hyperledger Fabric. In the last four decades, numerous consensus protocols have been proposed to cover a broad spectrum of distributed database systems. On one hand, distributed networks might be synchronous, partially synchronous, or asynchronous, and on the other hand, infrastructures might consist of crash-only nodes, Byzantine nodes, or both. In addition, a consensus protocol might follow a pessimistic or optimistic strategy to process transactions. Furthermore, while traditional consensus protocols assume an a priori known set of nodes, in permissionless blockchains nodes are assumed to be unknown. Finally, consensus protocols have explored a variety of performance trade-offs between the number of phases/messages (latency), the number of required nodes, message complexity, and the activity level of participants. In this tutorial, we discuss consensus protocols that are used in modern large-scale data management systems, classify them into different categories based on their assumptions on network synchrony, failure model of nodes, etc., and elaborate on their main advantages and limitations.

Blockchains and Databases: Opportunities and Challenges for the Permissioned and the Permissionless

Divyakant Agrawal, Amr El Abbadi, Mohammad Javad Amiri, Sujaya Maiyya, Victor Zakhary
24th European Conference on Advances in Databases and Information Systems (ADBIS), LNCS 12245, pp. 3-7, Lyon, 2020 [Invited paper].
DOI: 10.1007/978-3-030-54832-2_1

Abstract. Bitcoin is a successful and interesting example of a global scale peer-to-peer cryptocurrency that integrates many techniques and protocols from cryptography, distributed systems, and databases. The main underlying data structure is blockchain, a scalable fully replicated structure that is shared among all participants and guarantees a consistent view of all user transactions by all participants in the system. In a blockchain, nodes agree on their shared states across a large network of untrusted participants. Although originally devised for cryptocurrencies, recent systems exploit its many unique features such as transparency, provenance, fault tolerance, and authenticity to support a wide range of distributed applications. Bitcoin and other cryptocurrencies use permissionless blockchains. In a permissionless blockchain, the network is public, and anyone can participate without a specific identity. Many other distributed applications, such as supply chain management and healthcare, are deployed on permissioned blockchains consisting of a set of known, identified nodes that still might not fully trust each other. This paper illustrates some of the main challenges and opportunities from a database perspective in the many novel and interesting application domains of blockchains. These opportunities are illustrated using various examples from recent research in both permissionless and permissioned blockchains. Two main themes unite the various examples: (1) the important role of distribution and consensus in managing large scale systems and (2) the need to tolerate malicious failures. The advent of cloud computing and large data centers shifted large scale data management infrastructures from centralized databases to distributed systems. One of the main challenges in designing distributed systems is the need for fault-tolerance. 
Cloud-based systems typically assume trusted infrastructures, since data centers are owned by the enterprises managing the data, and hence the design typically only assumes and tolerates crash failures. The advent of blockchain and the underlying premise that copies of the blockchain are distributed among untrusted entities has shifted the focus of fault-tolerance from tolerating crash failures to tolerating malicious failures. These interesting and challenging settings pose great opportunities for database researchers.

Blockchain System Foundations

Mohammad Javad Amiri, Sujaya Maiyya, Victor Zakhary, Divyakant Agrawal, Amr El Abbadi
The 35th Brazilian Symposium on Databases (SBBD), Brazil [online], 2020 [Invited Tutorial].

Abstract. The rise of Bitcoin and other peer-to-peer cryptocurrencies has opened many interesting and challenging problems in cryptography, distributed systems, and databases. The main underlying data structure is blockchain, a scalable fully replicated structure that is shared among all participants and guarantees a consistent view of all user transactions by all participants in the system. In this tutorial, we discuss the basic protocols used in blockchain and elaborate on its main advantages and limitations. To overcome these limitations, we provide the necessary distributed systems background in managing large-scale fully replicated ledgers, using Byzantine Agreement protocols to solve the consensus problem. Finally, we expound on some of the most recent proposals to design scalable and efficient blockchains in both permissionless and permissioned settings. The focus of the tutorial is on the distributed systems and database aspects of the recent innovations in blockchains.

CAPER: A Cross-Application Permissioned Blockchain

Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
The 45th International Conference on Very Large Data Bases (VLDB), PVLDB 12(11), pp. 1385-1398, Los Angeles, 2019.
DOI: 10.14778/3342263.3342275

Abstract. Despite recent intensive research, existing blockchain systems do not adequately address all the characteristics of distributed applications. In particular, distributed applications collaborate with each other following service level agreements (SLAs) to provide different services. While collaboration between applications, e.g., cross-application transactions, should be visible to all applications, the internal data of each application, e.g., internal transactions, might be confidential. In this paper, we introduce CAPER, a permissioned blockchain system to support both internal and cross-application transactions of collaborating distributed applications. In CAPER, the blockchain ledger is formed as a directed acyclic graph where each application accesses and maintains only its own view of the ledger, including its internal and all cross-application transactions. CAPER also introduces three consensus protocols to globally order cross-application transactions between applications with different internal consensus protocols. The experimental results reveal the efficiency of CAPER in terms of performance and scalability.
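The per-application view of the DAG ledger can be sketched in a few lines of Python. Assumptions: each ledger entry records its owning application (None marks a cross-application transaction) and the ids of its parent entries; the LedgerEntry type and view_of function are hypothetical, and CAPER's consensus protocols and confidentiality mechanisms are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    tx_id: str
    app: Optional[str]              # owning application; None marks a cross-application tx
    parents: Tuple[str, ...] = ()   # DAG edges to earlier entries


def view_of(app: str, ledger: List[LedgerEntry]) -> List[str]:
    """Entries an application stores: its own internal transactions plus all
    cross-application transactions, in ledger order."""
    return [e.tx_id for e in ledger if e.app == app or e.app is None]


ledger = [
    LedgerEntry("a1", "A"),
    LedgerEntry("b1", "B"),
    LedgerEntry("x1", None, ("a1", "b1")),  # cross-application tx depends on both apps
    LedgerEntry("a2", "A", ("x1",)),
]
print(view_of("A", ledger))  # ['a1', 'x1', 'a2']
print(view_of("B", ledger))  # ['b1', 'x1']
```

Both views contain x1, so cross-application transactions are visible to every application, while A's internal entries a1 and a2 never appear in B's view.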

ParBlockchain: Leveraging Transaction Parallelism in Permissioned Blockchain Systems

Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
The 39th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 1337-1347, Dallas, Texas, 2019.
DOI: 10.1109/ICDCS.2019.00133

Abstract. Many existing blockchains do not adequately address all the characteristics of distributed system applications and suffer from serious architectural limitations resulting in performance and confidentiality issues. While recent permissioned blockchain systems have tried to overcome these limitations, their focus has mainly been on workloads with no contention, i.e., no conflicting transactions. In this paper, we introduce OXII, a new paradigm for permissioned blockchains to support distributed applications that execute concurrently. OXII is designed for workloads with (different degrees of) contention. We then present ParBlockchain, a permissioned blockchain designed specifically in the OXII paradigm. The evaluation of ParBlockchain using a series of benchmarks reveals that its performance in workloads with any degree of contention is better than that of state-of-the-art permissioned blockchain systems.
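
One way to picture contention-aware execution is a dependency graph over an ordered block: transactions that conflict must run in block order, while independent ones can run in parallel. The Python sketch below assumes each transaction's read and write sets are known before execution; `Txn`, `dependency_graph`, and `parallel_stages` are hypothetical names, and this illustrates the idea rather than ParBlockchain's implementation.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    name: str
    reads: frozenset
    writes: frozenset


def conflicts(a: Txn, b: Txn) -> bool:
    # Conflict if one transaction writes a key that the other reads or writes.
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def dependency_graph(block):
    """Map each transaction to the earlier transactions in the block it conflicts with."""
    return {tj.name: {ti.name for ti in block[:j] if conflicts(ti, tj)}
            for j, tj in enumerate(block)}


def parallel_stages(block, deps):
    """Group transactions into stages; no two members of a stage depend on each other."""
    level = {}
    for tx in block:  # block order is a topological order of the graph
        level[tx.name] = 1 + max((level[d] for d in deps[tx.name]), default=-1)
    stages = {}
    for name, lvl in level.items():
        stages.setdefault(lvl, []).append(name)
    return [stages[k] for k in sorted(stages)]


block = [
    Txn("t1", frozenset({"x"}), frozenset({"x"})),
    Txn("t2", frozenset({"y"}), frozenset({"y"})),
    Txn("t3", frozenset({"x", "y"}), frozenset()),
    Txn("t4", frozenset(), frozenset({"z"})),
]
print(parallel_stages(block, dependency_graph(block)))  # [['t1', 't2', 't4'], ['t3']]
```

Here `t3` reads keys written by `t1` and `t2`, so it waits for both, while `t4` touches an unrelated key and runs alongside them.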

VIEW: An Incremental Approach to Verify Evolving Workflows

Mohammad Javad Amiri, Divyakant Agrawal
The 34th ACM/SIGAPP Symposium on Applied Computing (SAC), pp. 85-93, Cyprus, 2019.
DOI: 10.1145/3297280.3297291

Abstract. Business processes are typically compositions of services (activities and tasks) and play a key role in every enterprise. Business processes must change to react quickly and adequately to internal and external events. Moreover, each business process is required to satisfy certain desirable properties such as soundness, consistency, or user-defined linear temporal logic (LTL) constraints. This paper focuses on the verification of evolving processes: given a business process, a change operation, and a set of LTL constraints, check whether all execution sequences of the evolved process satisfy all the given constraints. We propose a technique to incrementally check and verify the constraints of evolving business processes. Furthermore, we develop a framework to model, evolve, and verify business processes and conduct a study to evaluate the effect of process characteristics on the performance of verification approaches. Experiments reveal several interesting factors concerning performance and scalability.
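
The verification problem stated above can be made concrete with a brute-force Python sketch: enumerate every execution sequence of a small acyclic process and check an LTL response constraint, G(a -> F b), on each finite trace, before and after a change operation. The process encoding and the names `traces`, `response`, and `verify` are hypothetical, and this sketch re-checks all traces from scratch; the paper's incremental technique is not reproduced here.

```python
def traces(graph, start, end):
    """Yield every activity sequence from `start` to `end`.
    Assumes the process graph is acyclic (loops are not handled)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            yield path
            continue
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))


def response(trace, a, b):
    """Finite-trace check of the LTL response pattern G(a -> F b):
    every occurrence of `a` is followed (at the same or a later step) by `b`."""
    pending = False
    for act in trace:
        if act == a:
            pending = True
        if act == b:
            pending = False
    return not pending


def verify(graph, start, end, a, b):
    return all(response(t, a, b) for t in traces(graph, start, end))


original = {"receive": ["check"], "check": ["approve"], "approve": ["ship"]}
# Change operation: add an alternative "reject" branch after "check".
evolved = {"receive": ["check"], "check": ["approve", "reject"],
           "approve": ["ship"], "reject": ["ship"]}
print(verify(original, "receive", "ship", "check", "approve"))  # True
print(verify(evolved, "receive", "ship", "check", "approve"))   # False
```

The evolved process violates the constraint because the trace through `reject` never reaches `approve`.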

On Sharding Permissioned Blockchains

Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
The Second IEEE International Conference on Blockchain, pp. 282-285, Atlanta, 2019.
DOI: 10.1109/Blockchain.2019.00044

Abstract. Permissioned blockchain systems rely mainly on Byzantine fault-tolerant protocols to establish consensus on the order of transactions. While Byzantine fault-tolerant protocols mostly guarantee consistency (safety) in an asynchronous network using 3f + 1 machines to tolerate the simultaneous malicious failure of any f nodes, in many systems, e.g., blockchain systems, the number of available nodes (resources) is much larger than 3f + 1. To utilize these extra resources, in this paper we introduce a model that leverages transaction parallelism by partitioning the nodes into clusters (partitions) and processing independent transactions on different partitions simultaneously. The model also shards the blockchain ledger, assigns different shards of the ledger to different clusters, and supports both intra-shard and cross-shard transactions. Since more than one cluster is involved in each cross-shard transaction, the ledger is formed as a directed acyclic graph.
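
The arithmetic behind this partitioning is simple: with n nodes and a per-cluster fault bound f, at most floor(n / (3f + 1)) disjoint clusters can each tolerate f Byzantine nodes. The Python sketch below assumes every cluster must independently satisfy the 3f + 1 bound and, for simplicity, leaves leftover nodes unassigned; the function names are hypothetical and not taken from the paper.

```python
def max_tolerated_faults(cluster_size: int) -> int:
    """Largest f with cluster_size >= 3f + 1."""
    return (cluster_size - 1) // 3


def partition(nodes, f):
    """Split nodes into disjoint clusters of exactly 3f + 1 nodes.
    Leftover nodes (fewer than 3f + 1) are returned separately."""
    size = 3 * f + 1
    k = len(nodes) // size
    clusters = [nodes[i * size:(i + 1) * size] for i in range(k)]
    return clusters, nodes[k * size:]


def classify(shards_touched):
    """A transaction touching one shard is intra-shard; more than one, cross-shard."""
    return "intra-shard" if len(set(shards_touched)) == 1 else "cross-shard"


nodes = [f"n{i}" for i in range(14)]
clusters, spare = partition(nodes, f=1)
print(len(clusters), spare)                 # 3 ['n12', 'n13']
print(max_tolerated_faults(7))              # 2
print(classify([0]), classify([0, 2]))      # intra-shard cross-shard
```

Fourteen nodes with f = 1 yield three clusters of four, rather than a single cluster that would process all transactions sequentially.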

M-DB: A Continuous Data Processing and Monitoring Framework for IoT Applications

Vaibhav Arora, Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
The 12th IEEE International Conference on Internet of Things (iThings), pp. 1096-1105, Atlanta, 2019.
DOI: 10.1109/iThings/GreenCom/CPSCom/SmartData.2019.00187

Abstract. IoT devices influence many different spheres of society and are predicted to have a huge impact on our future. Extracting real-time insights from diverse sensor data and dealing with the underlying uncertainty of sensor data are two main challenges of the IoT ecosystem. In this paper, we propose a data processing architecture, M-DB, to effectively integrate and continuously monitor uncertain and diverse IoT data. M-DB consists of three components: (1) model-based operators (MBOs), data management abstractions that let IoT application developers integrate data from diverse sensors and that support event detection and statistical aggregation; (2) M-Stream, a dataflow pipeline that combines model-based operators to perform computations reflecting the uncertainty of the underlying data; and (3) M-Store, a storage layer that separates the computation of application logic from physical sensor data management, to effectively deal with missing or delayed sensor data. M-DB is designed and implemented over Apache Storm and Apache Kafka, two open-source distributed event processing systems. The application examples presented throughout the paper and the evaluation results show that M-DB provides a real-time data-processing architecture that can cater to the diverse needs of IoT applications.
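
To give a feel for what a model-based event-detection operator over uncertain data might compute, the Python sketch below fuses the available readings with inverse-variance weighting, skips missing ones, and raises an event only when the probability that the true value exceeds a threshold is high enough. The Gaussian noise model, the 0.9 cutoff, and the names `Reading`, `fuse`, `prob_above`, and `detect_event` are assumptions for illustration; this is not M-DB's operator API.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: Optional[float]  # None models a missing or delayed reading
    sigma: float            # assumed Gaussian noise std of the sensor


def fuse(readings):
    """Inverse-variance weighted mean and std of the readings that arrived."""
    present = [r for r in readings if r.value is not None]
    if not present:
        return None
    weights = [1 / r.sigma ** 2 for r in present]
    total = sum(weights)
    mean = sum(w * r.value for w, r in zip(weights, present)) / total
    return mean, math.sqrt(1 / total)


def prob_above(mean, std, threshold):
    """P(X > threshold) for X ~ N(mean, std^2)."""
    return 0.5 * (1 - math.erf((threshold - mean) / (std * math.sqrt(2))))


def detect_event(readings, threshold, min_prob=0.9):
    est = fuse(readings)
    if est is None:          # no data: do not raise an event
        return False
    return prob_above(*est, threshold) >= min_prob


readings = [Reading(30.0, 1.0), Reading(32.0, 1.0), Reading(None, 1.0)]
print(detect_event(readings, 25.0))  # True
print(detect_event(readings, 31.0))  # False
```

Reporting a probability instead of a raw comparison is one way an operator can make the uncertainty of the underlying data visible to downstream computations.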

Towards Global Asset Management in Blockchain Systems

Victor Zakhary, Mohammad Javad Amiri, Sujaya Maiyya, Divyakant Agrawal, Amr El Abbadi
Blockchain and Distributed Ledger Workshop (BCDL), in conjunction with VLDB, Los Angeles, 2019.

Database and Distributed Computing Foundations of Blockchains

Sujaya Maiyya, Victor Zakhary, Mohammad Javad Amiri, Divyakant Agrawal, Amr El Abbadi
ACM SIGMOD International Conference on Management of Data, pp. 2036-2041, The Netherlands, 2019.
DOI: 10.1145/3299869.3314030

Abstract. The rise of Bitcoin and other peer-to-peer cryptocurrencies has opened many interesting and challenging problems in cryptography, distributed systems, and databases. The main underlying data structure is the blockchain, a scalable, fully replicated structure that is shared among all participants and guarantees a consistent view of all user transactions by all participants in the system. In this tutorial, we discuss the basic protocols used in blockchains and elaborate on their main advantages and limitations. To overcome these limitations, we provide the necessary distributed systems background in managing large-scale, fully replicated ledgers, using Byzantine Agreement protocols to solve the consensus problem. Finally, we expound on some of the most recent proposals to design scalable and efficient blockchains in both permissionless and permissioned settings. The focus of the tutorial is on the distributed systems and database aspects of the recent innovations in blockchains.

On Similarity of Object-Aware Workflows

Mohammad Javad Amiri, Mahnaz Koupaee, Divyakant Agrawal
The 13th IEEE International Conference on Service-Oriented System Engineering (SOSE), pp. 84-89, San Francisco, 2019.
DOI: 10.1109/SOSE.2019.00021

Abstract. Business processes (workflows) are typically compositions of services (activities and tasks) and play a key role in every enterprise. Finding similar processes in process repositories helps enterprises reduce their costs and improve their performance. The similarity of different business processes has been measured based on activity labels and structural factors. However, inaccurate and incomplete labels and the existence of multiple labels for similar activities affect the accuracy of the existing methods. Furthermore, with recent advances in business process management and the development of innovative paradigms such as artifact-centric and decision-aware process modeling, data has become an inseparable part of process modeling. While data objects and the way they are accessed have recently been used to measure the similarity of activities, this approach does not address activities with different granularities. In this paper, we present an approach to measure the similarity of business processes based on the similarity of the life cycles of their objects. The experiments show the effectiveness of the approach in improving the accuracy of the process similarity task.
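
As a rough illustration of comparing processes through the life cycles of their objects, the Python sketch below represents each object's life cycle as an ordered list of states, compares two life cycles by the Jaccard similarity of their state transitions, and averages each object's best match in both directions. This measure and the names `transitions`, `lifecycle_similarity`, and `process_similarity` are stand-ins chosen for illustration, not the measure defined in the paper.

```python
def transitions(life_cycle):
    """Set of (state, next_state) pairs in an object's life cycle."""
    return set(zip(life_cycle, life_cycle[1:]))


def lifecycle_similarity(lc1, lc2):
    """Jaccard similarity of the two life cycles' transition sets."""
    t1, t2 = transitions(lc1), transitions(lc2)
    if not t1 and not t2:
        return 1.0
    return len(t1 & t2) / len(t1 | t2)


def process_similarity(p1, p2):
    """p1, p2: object name -> life cycle. Average each object's best match, symmetrized."""
    def directed(a, b):
        if not a:
            return 0.0
        return sum(max((lifecycle_similarity(lc, other) for other in b.values()), default=0.0)
                   for lc in a.values()) / len(a)
    return (directed(p1, p2) + directed(p2, p1)) / 2


lc1 = ["created", "paid", "shipped"]
lc2 = ["created", "paid", "cancelled"]
print(process_similarity({"order": lc1}, {"purchase": lc2}))  # 0.3333333333333333
```

Because the comparison uses object states rather than activity names, the two processes are related even though their objects are labeled differently.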

Automatic Test Cases Generation from Business Process Models

Arezoo Yazdani, Mohammad Javad Amiri, Saeed Parsa, Mahnaz Koupaee
Journal of Requirements Engineering 24(1), pp. 119-132, 2019.
DOI: 10.1007/s00766-018-0304-3

Abstract. Traditional test case generation approaches focus on design and implementation models, while a large percentage of software errors are caused by a lack of understanding in the early phases. One of the most important models in the early phases of software development is the business process model, which closely resembles the real world and captures the requirements precisely. The aim of this paper is to present a model-based approach to automatically generate test cases from business process models. We first model business processes and convert them to state graphs. Then, the graphs are traversed and transformed into the input format of the "Spec Explorer" tool, which generates the test cases. Furthermore, we conduct a study to evaluate the impact of process characteristics on the performance of the proposed method.

Object-aware Identification of Microservices

Mohammad Javad Amiri
IEEE International Conference on Services Computing (SCC), pp. 253-256, San Francisco, 2018 [Short Paper].
DOI: 10.1109/SCC.2018.00042

Abstract. Microservices is an architectural style inspired by service-oriented computing that structures an application as a collection of cohesive and loosely coupled components that implement business capabilities. One of today's problems in designing microservice architectures is decomposing a system into cohesive, loosely coupled, and fine-grained microservices. Identification of microservices is usually performed intuitively, based on the experience of the system designers; however, if the functionalities of a system are highly interconnected, decomposing the system into appropriate microservices is a challenging task. To tackle this challenge, we present a microservice identification method that decomposes a system using a clustering technique. To this end, we model a system as a set of business processes and take into account two aspects of the dependency between functionalities: structural dependency and data object dependency. Furthermore, we conduct a study to evaluate the effect of process characteristics on the accuracy of identification approaches.
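
A minimal Python sketch of combining the two dependency aspects is shown below. It assumes structural dependency is 1 when two tasks are directly connected by control flow, data object dependency is the Jaccard similarity of the objects they access, and tasks whose weighted score reaches a threshold are merged into the same candidate microservice (union-find over the thresholded graph). The weights, threshold, and the name `identify_services` are hypothetical, and the paper's clustering technique may differ.

```python
from itertools import combinations


def identify_services(tasks, flow_edges, objects, w_struct=0.4, threshold=0.55):
    """tasks: task names; flow_edges: set of (a, b) control-flow edges;
    objects: task -> set of data objects it reads or writes."""
    parent = {t: t for t in tasks}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(tasks, 2):
        structural = 1.0 if (a, b) in flow_edges or (b, a) in flow_edges else 0.0
        union = objects[a] | objects[b]
        data = len(objects[a] & objects[b]) / len(union) if union else 0.0
        if w_struct * structural + (1 - w_struct) * data >= threshold:
            parent[find(a)] = find(b)      # merge the two tasks' groups

    groups = {}
    for t in tasks:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


tasks = ["place_order", "check_stock", "pay", "refund"]
flow = {("place_order", "check_stock"), ("check_stock", "pay")}
objects = {"place_order": {"Order"}, "check_stock": {"Order", "Stock"},
           "pay": {"Payment"}, "refund": {"Payment"}}
services = identify_services(tasks, flow, objects, w_struct=0.4, threshold=0.55)
print(services)  # [['pay', 'refund'], ['place_order', 'check_stock']]
```

Here `check_stock` and `pay` are adjacent in the control flow but share no data, so the combined score keeps them apart, while `pay` and `refund` group together through the shared `Payment` object.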

Data-driven Business Process Similarity

Mohammad Javad Amiri, Mahnaz Koupaee
Journal of IET Software 11(6), pp. 309-318, 2017.
DOI: 10.1049/iet-sen.2016.0256

Abstract. Although measuring the similarity of business processes based on activity labels and structural and behavioral factors can be effective, inexact and incomplete labels and the existence of multiple labels for similar activities make it challenging to determine similar processes. Recent attempts to consider data in business process management, and the support for data modeling in business process standards, have led to the creation of multiple business process models with data access. In this paper, we present a method for measuring business process similarity that takes data into account: first, the similarity of activities is measured according to their structures and behaviors in a process as well as their data access; then, based on the similarity of activities, the similarity of processes is determined using the proposed algorithm.
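
The two-step structure of the method (activity similarity first, then process similarity) can be sketched in Python. Here activity similarity is assumed to be a weighted mix of the Jaccard similarity of adjacent activity labels (structural context) and of (object, access mode) pairs (data access), and process similarity uses a greedy one-to-one matching normalized by the larger process. The weighting, the greedy matching, and the names are assumptions for illustration; the paper's algorithm is not reproduced.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def activity_similarity(x, y, alpha=0.5):
    """x, y: dicts with 'neighbors' (labels of adjacent activities)
    and 'data' (set of (object, access_mode) pairs)."""
    return alpha * jaccard(x["neighbors"], y["neighbors"]) + (1 - alpha) * jaccard(x["data"], y["data"])


def process_similarity(p, q, alpha=0.5):
    """Greedy one-to-one matching of activities by descending similarity,
    normalized by the size of the larger process."""
    if not p and not q:
        return 1.0
    pairs = sorted(((activity_similarity(p[a], q[b], alpha), a, b) for a in p for b in q),
                   reverse=True)
    used_p, used_q, total = set(), set(), 0.0
    for score, a, b in pairs:
        if a not in used_p and b not in used_q:
            used_p.add(a)
            used_q.add(b)
            total += score
    return total / max(len(p), len(q))


p = {"A": {"neighbors": {"B"}, "data": {("Order", "write")}},
     "B": {"neighbors": {"A"}, "data": {("Order", "read")}}}
q = {"Create": {"neighbors": {"Check"}, "data": {("Order", "write")}},
     "Check": {"neighbors": {"Create"}, "data": {("Order", "read")}}}
print(activity_similarity(p["A"], q["Create"]))  # 0.5
print(process_similarity(p, q))                  # 0.5
```

Even though no labels match between `p` and `q`, the shared data access pattern still yields a nonzero similarity, which is the motivation for bringing data into the measure.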

Scalable Structure-free Data Fusion on Wireless Sensor Networks

Mahnaz Koupaee, Mohammad Reza Kangavari, Mohammad Javad Amiri
Journal of Supercomputing 73(12), pp. 5105-5124, 2017.
DOI: 10.1007/s11227-017-2072-0

Abstract. Recent advancements in sensor technology and wireless networks, and consequently in wireless sensor networks, together with their growing range of applications in different fields, have made these networks highly important. One of the most important challenges of such networks is the distributed management of the huge amount of data produced by sensors, so as to reduce data traffic in the network and minimize energy consumption. In this research, a distributed, dynamic fusion algorithm is introduced. Since the proposed method is dynamic, the number of neighbors sending data to a node is not known in advance. Therefore, to increase the chance that data from different sources meet, a waiting time is calculated for each node. At the end of its waiting time, the node performs data fusion and sends the fused data to the best neighbor, chosen by the proposed best-neighbor algorithm. This procedure continues until the data reaches the sink. The proposed algorithm, while being scalable and convergent, outperforms similar methods in terms of the number of transmissions, traffic load, and energy consumption.
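
The per-node behavior described above (wait, fuse, forward to the best neighbor) can be sketched in Python. The waiting time is taken as a given input because the paper's formula is not reproduced, fusion is assumed to be a mean with a count, and the best-neighbor rule (fewest hops to the sink, ties broken by remaining energy) is a hypothetical stand-in for the paper's algorithm; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Neighbor:
    node_id: str
    hops_to_sink: int
    residual_energy: float   # remaining energy, assumed known to the node


@dataclass
class FusionNode:
    wait_time: float                          # seconds; given here, not computed
    buffer: List[float] = field(default_factory=list)
    deadline: Optional[float] = None

    def receive(self, value: float, now: float) -> None:
        if self.deadline is None:             # the first arrival opens a waiting window
            self.deadline = now + self.wait_time
        self.buffer.append(value)

    def ready(self, now: float) -> bool:
        return self.deadline is not None and now >= self.deadline

    def fuse(self):
        """Return (mean, count) of buffered readings and reset the window.
        The count lets the next hop combine partial means correctly."""
        fused = (sum(self.buffer) / len(self.buffer), len(self.buffer))
        self.buffer.clear()
        self.deadline = None
        return fused


def best_neighbor(neighbors):
    """Hypothetical rule: fewest hops to the sink, ties broken by most remaining energy."""
    return min(neighbors, key=lambda n: (n.hops_to_sink, -n.residual_energy))


node = FusionNode(wait_time=2.0)
node.receive(10.0, now=0.0)
node.receive(14.0, now=1.5)
print(node.ready(now=1.9), node.ready(now=2.0))  # False True
print(node.fuse())                                 # (12.0, 2)
print(best_neighbor([Neighbor("a", 3, 5.0), Neighbor("b", 2, 1.0),
                     Neighbor("c", 2, 4.0)]).node_id)  # c
```

Sending one fused (mean, count) message instead of two raw readings is where the savings in transmissions come from in this toy model.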

Multifaceted Service Identification: Process, Requirement and Data

Mohammad Javad Amiri, Saeed Parsa, Amir Mohammad Zade Lajevardi
Computer Science and Information Systems 13(2), pp. 335-358, 2016.
DOI: 10.2298/CSIS151105011A

Abstract. Service identification is one of the most important phases in service-oriented development methodologies. Although several service identification methods have tried to identify services automatically or semi-automatically, they do not take various aspects of the business domain into account simultaneously. To overcome this issue, this article combines three strategies, drawn from three different aspects of the business domain, for the semi-automated identification of services. First, the interconnections between tasks within the business processes are considered. Then, another kind of task dependency is determined based on common supporting requirements, and finally, given the significant impact of data in the business domain, the last set of task relations is specified. To combine these three strategies, task-task matrices are used as a common language, and services are eventually identified by clustering the final task-task matrix.
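
The final combination-and-clustering step can be sketched in Python. It assumes each strategy yields an n x n task-task matrix with entries in [0, 1] and that the matrices are combined by a weighted sum; average-linkage agglomerative clustering is used as one plausible choice, since the abstract does not name the clustering algorithm. The weights, threshold, and the names `combine` and `cluster` are hypothetical.

```python
def combine(matrices, weights):
    """Weighted sum of equally sized task-task matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n)]
            for i in range(n)]


def cluster(sim, threshold):
    """Average-linkage agglomerative clustering on a task-task similarity matrix:
    repeatedly merge the most similar pair of clusters until no pair reaches `threshold`."""
    clusters = [[i] for i in range(len(sim))]

    def link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


process = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
requirement = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
data = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
sim = combine([process, requirement, data], weights=[0.5, 0.25, 0.25])
print(cluster(sim, threshold=0.5))  # [[0, 1], [2, 3]]
```

Tasks 2 and 3 share no requirement but are linked through process flow and data, so they still end up in the same candidate service, illustrating why combining the three views matters.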