Friday, August 19, 2005

Storage and Querying of E-Commerce Data

Here is the summary (written by me) of this paper.



Summary of Paper

Storage and Querying of E-Commerce Data

Muhammad Yahya

Computer and Information System Engineering Department

NED University of Engineering and Technology, Karachi.

muhammad.yahya@gmail.com


1 Abstract

New generation of e-commerce applications require data schemas that are constantly evolving and sparsely populated. The conventional horizontal row representation fails to meet these requirements. The Horizontal Data can be transformed in to Vertical representation, in which each row has Object Identifier and attribute name-value pair.

2 Introduction

In Large e-commerce systems, attributes in some table keeps on increasing with new items. And so the performance was becoming bottle-neck as most of the columns had no values. This problem was also giving rise to change the schema again and again to accommodate new commodities.

2.1 Issues

The following problems were faced in horizontal representation

· Large number of columns: As many columns required as the attributes of objects, and this count keeps on increasing with new objects

· Sparsity: Nulls in most of the fields

· Schema Evolution: Frequent alteration would be required in the design

· Performance: Only few columns are required from wide records

2.2 Vertical Representation

As a solution of above problem, Vertical representation was proposed to be used in large e-commerce systems. And the table will have the following columns in it.

Oid

Key

Val

Oid: Object Identifier

Key: Attribute Name

Val: Attribute Name

The vertical table contains tuples for only those attributes that are present in an object. Writing SQL queries against this vertical representation was difficult and error-prone. Also most of the tools written are for Horizontal representation.

The basic idea behind this research was to develop a logical layer over Vertical data, i.e. horizontal data would be visible to the user, and there would be no need to re-design previously built systems. Few alternatives to this design are given in paper.


Horizontal

Oid

A1

A2

A3

1

a

b

-

2

-

c

d

3

-

-

a

4

b

-

d

- corresponds to null value

Vertical

Oid

Key

Val

1

A1

a

1

A2

b

2

A2

c

2

A3

d

3

A3

a

4

A1

b

4

A3

d

The approach is to define a horizontal view; over a vertical table. The user poses regular SQL queries over these views, which are translated into queries that run against the underlying vertical table. Transformations are given in the form of Relational Algebra. Some standard algebraic operations are needed to be re-written. All the transformations and Re-writings are done without assumptions, so they all are valid and proved.

2.3 Alternative Representations

Following were the alternatives present


· Split a horizontal table into as many 2-ary tables as the number of columns i.e. as many tables as the number of columns

· Create one table for each new category

· Create one table for common attributes and per category separate tables for non-common attributes.

· Represent you data in 3-ary table i.e. the Vertical Representation


3 Transformations

The view to user is Horizontal, and all the queries made by user are against that view, behind this logical view is the vertical representation. This vertical view is transparent to the user.

A Transformation is made to these queries that allow fetching of data from vertical table. In this manner the user and the old running systems are saved from re-writing the queries.

4 Implementation

A non-intrusive enablement layer is built on top of the database engine.

The enablement layer parses the SQL query, validates it, and transforms it to another SQL query that runs against the underlying vertical table.

Three types of implementations are proposed that work with Vertical representation.


· Vertical SQL

· Vertical UDF

· Schema SQL

5 Performance Experiments

Very Large numbers of experiments were performed on different alternatives, and results for projection, selection, join and aggregation operations are given in [1], and the results are shown with the help of graphs.

The performance results showed that VerticalSQL uniformly outperforms HorizontalSQL but slightly under performs Binary. The vertical representation uniformly outperforms horizontal representation for sparse data

The performance of the vertical representation is sensitive to the choice made for clustering the data. Clustering on Key has much higher performance than clustering on Oid.

The Vertical representation can outperform binary representations for other operations as well, if table function could provide some extra functionality.

6 Conclusion

Emerging applications such as e-commerce and portals are creating new threats and opportunities for database technology.

The new generation of applications requires data schemas that are rapidly evolving and sparsely populated.

The flexible design of vertical representation makes it easily manageable as compared to sparsely populated horizontal table. Design of an enablement layer hides the complexity of the queries over the vertical table and gives a horizontal view of the vertical representation to the user (application).

The performance of Vertical System also depends on the type of clustering of data.

Comparison of the three representations


Horizontal

Vertical

Binary

Manageability

+

+

-

Flexibility

-

+

-

Performance

-

+

+


The following enhancements[1] can improve the performance of Vertical representation.

· Partial indices

· Enhanced table functions

· First class treatment of table functions

· Native support for v2h and h2v operations

7 References

[1] Storage and Querying of E-Commerce Data

Rakesh Agrawal Amit Somani Yirong Xu

IBM Almaden Research Center

650 Harry Road, San Jose, CA 95120

http://www.almaden.ibm.com/software/quest/Publications/papers/vldb01_ecom.pdf



[1] See [1] for description of these capabilities

2 comments:

  1. Looks cool! I might read it someday! :)

    ReplyDelete
  2. I'm sure that someday will never come, so just say the summary looks good ... whiz!

    btw, how much did ya get outta 10 for the summary yahya man?

    ReplyDelete