
I'm working on a project for university. Do you know what kind of algorithms I could implement that would help with the proper design and general performance of a database? So far I have come up with an algorithm that helps the user pick candidate keys, and an algorithm for normalization up to 3NF. Do you have any other ideas or suggestions? Thanks.
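For context, the core step behind most candidate-key and 3NF algorithms like the ones mentioned above is computing the closure of an attribute set under a set of functional dependencies. This is a hypothetical sketch (the schema, attribute names, and FDs are invented for illustration), not the OP's actual implementation:

```java
import java.util.*;

// Sketch: compute the closure of a set of attributes under a set of
// functional dependencies (FDs). If the closure of a candidate set
// covers all attributes of the relation, that set is a superkey.
public class AttributeClosure {

    // An FD "X -> Y" is stored as lhs -> rhs.
    record FD(Set<String> lhs, Set<String> rhs) {}

    static Set<String> closure(Set<String> attrs, List<FD> fds) {
        Set<String> result = new HashSet<>(attrs);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (FD fd : fds) {
                // If the LHS is covered, fold the RHS into the closure.
                if (result.containsAll(fd.lhs()) && result.addAll(fd.rhs())) {
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example relation R(A, B, C, D) with FDs A -> B and B -> C.
        List<FD> fds = List.of(
            new FD(Set.of("A"), Set.of("B")),
            new FD(Set.of("B"), Set.of("C")));
        Set<String> all = Set.of("A", "B", "C", "D");

        // {A, D}+ = {A, B, C, D}, so {A, D} is a superkey of R.
        System.out.println(closure(Set.of("A", "D"), fds).equals(all)); // prints true
    }
}
```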

Olteanu Radu
  • What do you mean by `algorithms`? An algorithm is just a logical construction, which might be implemented in any programming language. I actually have no idea what you meant by it. – Roberto Hernandez Aug 05 '20 at 10:25
  • For example, I created a Java program that connects to a small Oracle database and tries to normalize it up to 3NF. I'm talking about code that can be run against a database to check for problems in the design and improve it. – Olteanu Radu Aug 05 '20 at 10:34
  • Code in which language? – Roberto Hernandez Aug 05 '20 at 10:35
  • Java mostly, but this is not a restriction. I just need some ideas/pseudocode for things I could implement to check the database for issues that affect its performance, or that could improve it. – Olteanu Radu Aug 05 '20 at 10:36
  • I'm not sure I believe an algorithm can design a database either, but I suppose you could try: choosing best type of tables and indexes e.g. IOT vs normal, partitioning? Re candidate keys and normalization, this is just instinctive to anyone who has worked with databases for a while. – Tony Andrews Aug 05 '20 at 11:56

3 Answers


This is like asking how you can figure out how to make a car more efficient. It's such a broad question that it's essentially unanswerable. There are so many moving parts to a car, and each one has its own problems. You really need to understand what each component is doing. In the case of databases, you need to understand the data before you try to fix it. And if you want a good answer, you have to ask the right questions.

A good question should include context on what you are working with, and what you are trying to do. And when it comes to data manipulation, the details are extremely important. How is your data represented? What kind of infrastructure are you working with? What purpose does the data serve, and what processes use this data? If you are working with floating point numbers, are your processes tolerant of small rounding errors? Would your organization even let you make changes to how the data is stored?

In general, adding algorithms to improve database performance is probably unnecessary. Databases are already designed out of the box to be simple and efficient. If there were a known method to increase efficiency in general, without any drawbacks, there's no reason the designers of the system wouldn't have implemented it already.

  • This question is clearly too broad, shows no research, and should be downvoted, close-voted/flagged, and not answered. – philipxy Aug 06 '20 at 16:09

I am posting this as an answer because there is no room to say it in the comment section. You need to understand a basic principle of database design and data model construction: what is your database for? That is the main question, and believe it or not, sometimes even people with experience make this mistake.

As you were saying, 3NF can be good for OLTP systems, but it would be horrendous for a Data Warehouse or a Reporting Database, where the queries are huge and run as big batch operations. In those systems denormalization almost always gives better results.

Once you know what your database is for, you can start to apply some "best practices", but even here there is a lot of room for interpretation, and worse, the same principle can be good in one place but very bad in another. Let me give you an example from my own experience.

Eight years ago I started a project where we had to design a database for a financial application. After some analysis, we decided to use a star model, or dimension-fact model. We decided to create indexes (including bitmap indexes) on some tables, even though we had to rebuild them during batch runs to avoid performance degradation.

The funny thing is that after some months I realised the indexes were useless: the users were running queries that accessed the whole data set, mostly analytics and aggregations. Consequence: I dropped all the indexes.

Is that a good thing to do? In general, no, but in my scenario it was the best option, and performance improved a lot, both in batch and in the user experience.

In summary, as an old friend of mine who worked in Oracle Support used to tell me: "Performance is an art, my friend, not a science."

Roberto Hernandez

There are too many database algorithms to list, but below is a structured way of thinking about classes of algorithms that affect database performance.

Algorithm analysis is a helpful way of categorizing and thinking about many database performance problems. While most performance problems are solved with best practices and trial-and-error, we'll never truly understand why one solution is better than another without understanding the algorithms behind them. Below is a list of functions that describe the algorithmic complexity of different database operations, ordered from fastest to slowest.

  1. O(1/N) – Batching to reduce overhead for bulk collect, sequences, fetching rows
  2. O(1) - Hashing for hash partitioning, hash clusters, hash joins
  3. O(LOG(N)) – Index access for b-trees
  4. 1/((1-P)+P/N) – Amdahl's Law and its implications for parallelizing large data warehouse workloads
  5. O(N) - Full table scans, hash joins (in theory)
  6. O(N*LOG(N)) – Full table scan versus repeated index reads, sorting, global versus local indexes, gathering statistics (distinct approximations and partition birthday problems)
  7. O(N^2) – Cross joins, nested loops, parsing
  8. O(N!) – Join order
  9. O(∞) – The optimizer (satisficing and avoiding the halting problem)

One suggestion - based on the way you phrased your question and comments, you're thinking of a database as merely a place to store data. But the most interesting parts of a database emerge when you think of it as a joining machine. There's not much to optimize about data sitting around; the real work happens when data is combined.
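To make the "joining machine" idea concrete, here is a hypothetical sketch of a hash join (the table contents are invented): build an O(1)-lookup hash table on the smaller input, then probe it once per row of the larger input, giving O(M + N) overall instead of the O(M * N) of a naive nested loop.

```java
import java.util.*;

// Sketch: a hash join over two in-memory "relations". Real databases
// add partitioning, spilling to disk, and vectorization, but the
// build/probe structure is the same.
public class HashJoin {

    record Row(int key, String value) {}

    static List<String> hashJoin(List<Row> build, List<Row> probe) {
        // Build phase: hash the smaller relation on the join key.
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row r : build) {
            table.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        // Probe phase: one hash lookup per row of the larger relation.
        List<String> out = new ArrayList<>();
        for (Row p : probe) {
            for (Row b : table.getOrDefault(p.key(), List.of())) {
                out.add(b.value() + "-" + p.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> depts = List.of(new Row(10, "SALES"), new Row(20, "IT"));
        List<Row> emps  = List.of(new Row(10, "KING"), new Row(20, "SCOTT"),
                                  new Row(20, "ADAMS"), new Row(30, "FORD"));
        // Dept 30 has no match, so FORD is dropped (inner join).
        System.out.println(hashJoin(depts, emps));
        // prints [SALES-KING, IT-SCOTT, IT-ADAMS]
    }
}
```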

The above list is based on Chapter 16 of my book, Pro Oracle SQL Development. You can read an early version of the entire chapter for free here. While the chapter mostly stands alone, it requires an advanced understanding of Oracle. But each of the topics could be the basis for a lifetime of academic study, so you only need to pick one.

Jon Heller