Publications

Actively Measuring Personal Cloud Storage

Raúl Gracia-Tinedo, Marc Sánchez-Artigas, Adrián Moreno-Martínez, Cristian Cotes-González, Pedro García-López

IEEE 6th International Conference on Cloud Computing. June 27-July 2, 2013, Santa Clara Marriott, CA, USA

Abstract

The Personal Cloud model is a mainstream service that meets the growing demand of millions of users for reliable off-site storage. However, despite their broad adoption, very little is known about the quality of service (QoS) of Personal Clouds.

In this paper, we present a measurement study of three major Personal Clouds: DropBox, Box and SugarSync. By actively accessing free accounts through their REST APIs, we analyzed important aspects that characterize their QoS, such as transfer speed, variability and failure rate. Our measurement, conducted over two months, is the first to deeply analyze many facets of these popular services and reveals new insights, such as important performance differences among providers, the existence of daily transfer speed patterns and sudden service breakdowns.

We believe that the present analysis of Personal Clouds is of interest to researchers and developers with diverse concerns about Cloud storage, since our observations can help them to understand and characterize the nature of these services.

Activity Stereotypes, or How to Cope with Disconnection during Trust Bootstrapping

Marc Sánchez-Artigas, Blas Herrera

IEEE Transactions on Parallel and Distributed Systems

Abstract

Trust-based systems have been proposed as a means to fight against malicious agents in peer-to-peer networks, volunteer and grid computing systems, among others. However, there still exist some issues that have been generally overlooked in the literature. One of them is the question of whether punishing disconnecting agents is effective. In this paper, we investigate this question for those initial cases where prior direct and reputational evidence is unavailable, which is referred to in the literature as trust bootstrapping. First, we demonstrate that there is no universally optimal penalty for disconnection and that the effectiveness of this punishment is markedly dependent on the uptime and downtime session lengths. Second, to minimize the effects of an improper selection of the disconnection penalty, we propose to incorporate predictions into the trust bootstrapping process. These predictions, based on the current activity of the agents, shorten the trust bootstrapping time when direct and reputational information is lacking.

Boosting Content Delivery with BitTorrent in Online Cloud Storage Services

Rahma Chaabouni, Pedro García-López, Marc Sánchez-Artigas, Sandra Ferrer-Celma

IEEE International Conference on Peer-to-Peer Computing. September 9-11, 2013 - Trento, Italy.

Abstract

In the last few years, we have witnessed a rush of online storage services entering the market. From a technical point of view, most of them use HTTP as a transfer protocol and miss the opportunity to benefit from the common interest of users in the same content.

In this demo, we demonstrate that the integration of BitTorrent in Cloud systems for content distribution can offload storage servers from doing much of the serving. To assess the performance gain, we have modified a local Cloud based on OpenStack Swift to accommodate BitTorrent and we developed a monitoring tool to visualize transfers in real time.

C3P: Context-Aware Crowdsourced Cloud Privacy

Hamza Harkous, Rameez Rahman, Karl Aberer

Privacy Enhancing Technologies Symposium (PETS) 2014

Abstract

Due to the abundance of attractive services available on the cloud, people are placing an increasing amount of their data online on different cloud platforms. However, given the recent large-scale attacks on users' data, privacy has become an important issue. Ordinary users cannot be expected to manually specify which of their data is sensitive, or to take appropriate measures to protect such data. Furthermore, most people are usually not aware of the privacy risk that different shared data items can pose. In this paper, we present a novel conceptual framework in which privacy risk is automatically calculated using the sharing context of data items. To overcome ignorance of privacy risk on the part of most users, we use a crowdsourcing-based approach. We use Item Response Theory (IRT) on top of this crowdsourced data to determine the sensitivity of items and diverse attitudes of users towards privacy. First, we determine the feasibility of IRT for the cloud scenario by asking workers on Amazon mTurk for feedback on various sharing scenarios. We obtain a good fit of the responses with the theory, and thus show that IRT, a well-known psychometric model for educational purposes, can be applied to the cloud scenario. Then, we present a lightweight mechanism such that users can crowdsource their sharing contexts with the server and determine the risk of sharing particular data item(s) privately. Finally, we use the Enron dataset to simulate our conceptual framework and also provide experimental results using synthetic data. We show that our scheme converges quickly and provides accurate privacy risk scores under varying conditions.

Cloud Storage Service Benchmarking: Methodologies and Experimentations

Enrico Bocchi, Marco Mellia, Sofiane Sarni

IEEE CloudNet. October 8-10, 2014.

Abstract

Data storage is one of today’s fundamental services, with companies, universities and research centers needing to store large amounts of data every day. Cloud storage services are emerging as a strong alternative to local storage, allowing customers to save the costs of buying and maintaining expensive hardware. Several solutions are available on the market, the most famous being Amazon S3. However, it is rather difficult to access information about each service's architecture, performance, and pricing. To shed light on storage services from the customer perspective, we propose a benchmarking methodology, apply it to four popular offers (Amazon S3, Amazon Glacier, Windows Azure Blob and Rackspace Cloud Files), and compare their performance. Each service is analysed as a black box and benchmarked through crafted workloads. We take the perspective of a customer located in Europe, looking for possible service providers and the optimal data center in which to deploy its applications. Finally, we complement the analysis by comparing the actual and forecast costs faced when using each service.

According to the collected results, all services show weaknesses related to some workload, with no all-round winner, e.g., some offers provide excellent or poor performance when exchanging large or small files. For all services, it is of paramount importance to accurately select the data center in which to deploy the applications, with throughput that varies by factors from 2x to 10x. The methodology presented here (and the tools implementing it) is instrumental for potential customers to identify the most suitable offer for their needs.

Cloud-as-a-Gift: Effectively Exploiting Personal Cloud Free Accounts via REST APIs

Raúl Gracia-Tinedo, Marc Sánchez-Artigas, Pedro García-López

IEEE 6th International Conference on Cloud Computing. June 27-July 2, 2013, Santa Clara Marriott, CA, USA.

Abstract

Personal Clouds, such as DropBox and Box, provide open REST APIs for developers to create clever applications that make their service even more attractive. These APIs are a powerful abstraction that makes it possible for applications to transparently manage data from user accounts, blurring the lines between a Personal Cloud service and storage IaaS. In addition, Personal Clouds offer free accounts to lure new users, which normally include reduced storage space and unlimited transfers.

However, the unintended consequence of combining open APIs and free accounts is that these companies are exposing automated access to a free storage infrastructure, which may lead to abuse by malicious parties. By exploiting the freemium API service, users may fraudulently consume resources or they can use free accounts as a Cloud storage layer to support abusive applications. We call this vulnerability the storage leeching problem.

In this paper, we show how easy it is to implement a file-sharing application able to distribute digital content by abusing Personal Clouds. Making use of open APIs, this application transparently aggregates the limited-space free accounts from multiple providers into a single larger storage layer, while achieving better transfer speed than that received from one provider alone. This demonstrates that free accounts can be easily exploited to obtain a practical Cloud storage service, and therefore, the potential impact of storage leeching.

Continuation Complexity: A Callback Hell for Distributed Systems

Edgar Zamora-Gómez, Pedro García-López, Rubén Mondéjar

Euro-Par 2015 Workshops, LNCS 9523, pp. 1–13, 2015.

Abstract

Designing and validating large-scale distributed systems is still a complex issue. The asynchronous event-based nature of distributed communications makes these systems complex to implement, debug and test. In this article, we introduce the continuation complexity problem, which arises when synchronous invocations must be converted to asynchronous event code. This problem appears in most Actor libraries where communication is mainly asynchronous, and where a synchronous call to another Actor would block the current Actor, precluding the processing of incoming messages. We propose here a novel parallel invocation abstraction that considerably simplifies the continuation complexity problem for distributed actor systems requiring non-blocking synchronous invocations. Our parallel abstraction extends the message passing concurrency model to support concurrent interleavings of method executions within a single Actor. We present here two candidate solutions for implementing such parallel calls: one based on threads and locking, and another based on green threads and continuations. We evaluated the simplicity of our solution by implementing a well-known distributed algorithm, Chord (a ring-based structured overlay). We compare our Actor implementation of Chord with three different simulators (PeerSim, PlanetSim, Macedon). This validation demonstrates that our approach is superior in simplicity (fewer LoC) and complexity (lower McCabe complexity), suggesting its great potential for distributed systems scenarios.

CoShare: A Cost-Effective Data Sharing System for Data Center Networks

H. Zhuang, I. Filali, R. Rahman, K. Aberer

IEEE CIC 2015

Abstract

Numerous research groups and other organizations collect data from popular data sources such as online social networks. This leads to the problem of data islands, wherein all this data is isolated and lying idle, without any use to the community at large. Using existing centralized solutions such as Dropbox to replicate data to all interested parties is prohibitively costly, given the large size of datasets. A practical solution is to use a Peer-to-Peer (P2P) approach to replicate data in a self-organized manner. However, existing P2P approaches focus on minimizing downloading time without taking into account the bandwidth cost. In this paper, we present CoShare, a P2P-inspired, decentralized, cost-effective sharing system for data replication. CoShare allows users to specify their requirements on data sharing tasks and maps these requirements into resource requirements for data transfer. Through extensive simulations, we demonstrate that CoShare finds the desirable tradeoffs for a given cost and performance while varying user requirements and request arrival rates.

Decentralizing the Cloud: How Can Small Datacenters Cooperate?

Hao Zhuang, Rameez Rahman, Karl Aberer

IEEE P2P'14. September 8-12, 2014. London, England.

Abstract

Cloud computing has become pervasive due to attractive features such as on-demand resource provisioning and elasticity. Most cloud providers are centralized entities that employ massive data centers. However, in recent times, due to increasing concerns about privacy and data control, many small data centers (SDCs) established by different providers are emerging in an attempt to meet demand locally. These SDCs can suffer from resource inelasticity due to their relatively scarce resources, resulting in a loss of performance and revenue. In this paper we propose a decentralized cloud model in which a group of SDCs can cooperate with each other to improve performance. Moreover, we design a general strategy function for the SDCs to evaluate the performance of cooperation based on different dimensions of resource sharing. Through extensive simulations using a realistic data center model, we show that the strategies based on reciprocity are more effective than the other strategies considered, e.g., those using prediction on historical data. Our results show that the reciprocity-based strategy can thrive in a heterogeneous environment with competing strategies.

Dissecting UbuntuOne: Autopsy of a Global-scale Personal Cloud Back-end

Raúl Gracia-Tinedo, Yongchao Tian, Josep Sampé, Hamza Harkous, John Lenton, Pedro García-López

Abstract

Personal Cloud services, such as Dropbox or Box, have been widely adopted by users. Unfortunately, very little is known about the internal operation and general characteristics of Personal Clouds since they are proprietary services. In this paper, we focus on understanding the nature of Personal Clouds by presenting the internal structure and a measurement study of UbuntuOne (U1). We first detail the U1 architecture, core components involved in the U1 metadata service hosted in the datacenter of Canonical, as well as the interactions of U1 with Amazon S3 to outsource data storage. To our knowledge, this is the first research work to describe the internals of a large-scale Personal Cloud. Second, by means of tracing the U1 servers, we provide an extensive analysis of its back-end activity for one month. Our analysis includes the study of the storage workload, the user behavior and the performance of the U1 metadata store. Moreover, based on our analysis, we suggest improvements to U1 that can also benefit similar Personal Cloud systems. Finally, we contribute our dataset to the community, which is the first to contain the back-end activity of a large-scale Personal Cloud. We believe that our dataset provides unique opportunities for extending research in the field.
