DBMS Performance Evaluation - DS-Project-I

Context

DBMS can help us manage data conveniently and significantly improve the efficiency of data retrieval.

PostgreSQL is a popular open-source RDBMS known for its robustness, advanced features, and strong compliance with SQL standards. openGauss is an enterprise-grade open-source RDBMS developed by Huawei, designed for high performance, security, and scalability in demanding business environments.

Preface

Latest Report

Tsawke's Blog

Github

Environment

System: Alibaba Cloud Linux 3.2104 LTS 64位

IP: 47.115.128.238

openGauss (docker)

PostgreSQL

Datasets

Clickstream

The Clickstream dataset aggregates counts of (referrer, resource) pairs from Wikipedia request logs, showing how readers arrive at an article and what they click next.

Run DownloadDatasets.sh to download all the datasets.

To import datasets to PostgreSQL, run:

Similar when importing to openGauss.

Results:

What are the unique advantages of a DBMS compared with data operations in files?

Purpose: Find all events that contains 'main' in 'curr'(current).

PostgreSQL

During 5 tests, the average result is 24662.1ms.

C++

During 5 tests, the average result is 20116.2ms.

Purpose: Update every '_' in 'curr' to '^'.

PostgreSQL

The result is 225019.1ms.

C++

The result is 61515.7ms.

PostgreSQL

The result is 101260.8ms.

C++

The result is 69683.3ms.

Conclusion

Overall, C++ streaming program beat the DBMS(PostgreSQL) on all tests.

And there're multiple reasons why DBMS seems to be slower than C++:

Thus, the results doesn't negate the DBMS strengths.

At some circumstances include reusable queries, concurrency, strong consistency, complex joins/transactions, e.t.c., DBMS will perform significantly better than data operations in files.

Which DBMS is better? PostgreSQL or openGauss, and by which standard?

Preparation

Comparison of Select

Results:

PostgreSQL: 24662.1ms.

openGauss: 27012.6ms.

Comparison of Update

Results:

PostgreSQL: 225019.1ms.

openGauss: 244217.9ms.

Comparison of Table Join

Results:

PostgreSQL: 180567.2ms.

openGauss: 191124.3ms.

Comparison of Top-K Query

Results:

PostgreSQL: 97628.7ms.

openGauss: 115431.5ms.

Conclusion

Overall, we can conclude that PostgreSQL and openGauss have almost the same efficiency. They are very similar and openGauss is usually a little bit slower than PostgreSQL. Therefore, at some specific circumstances, like looser settings, openGauss will perhaps performs better, but still a little.

DBMS Strengthens

As results showcased, DBMS performs slower than C++ in select operation, but if we use prefix search (ILIKE 'main%'), DBMS will perform much better.

Better Join Operations

DBMS will perform better when joining large tables, no matter PostgreSQL, openGauss or other DBMS.

Extensions

With pg_trgm, or some other extensions, we can obtain better operation efficiency, even still slower than C++ at some circumstances.

Ultimate Conclusion

Overall, C++ streaming performs better than DBMS baselines for it's more basic and trivial. Nevertheless, in reusable, concurrent or consistency-critical circumstances with proper design, DBMS can significantly outperform than C++ streaming (Not mentioned too much in this report due to the requirements). And PostgreSQL and openGauss performed almost the same and PostgreSQL is a little bit faster in most circumstances. While openGauss still has its own advantages.

Remarks

How to connect?

PostgreSQL

openGauss