Sorgulama optimizasyonunda SQL kullanımı ve bir uygulama

Çatalbaş, Semra
Süreli Yayın başlığı
Süreli Yayın ISSN
Cilt Başlığı
Fen Bilimleri Enstitüsü
Bu çalışmada Standard bir veri tabanı alt dili olan SQL kullanılarak sorgulama optimizasyonu konusu incelenmiştir. öncelikle sorgulama optimizasyonu için neden SQL * in seçildiğini açıklayabilmek amacıyla SQL dilinin bir çok alanda tasarımcılara sağladığı imkanlar ve avantajları anlatılmıştır. SQL dilinin pek çok özelliğinin anlatıldığı kısımlarda anlaşılabilirliği arttırmak amacıyla örnekler verilmiştir. Çalışma kapsamında örnek bir sistem seçilmiş ve üzerinde sorgulama optimizasyonu gerçekleştirilmiştir. Bu örnekte tasarım esnasında kullanılan yöntemde veri tabam, yaşam süreci olarak kabul edilmekte ve veri ortak bir kaynak olarak düşünülmektedir. Yöntem gereksinim formülasyonu ile sistemin bilgi ve işlem gereksinimlerini belirleyerek, mantıksal tasarımdan ve gerçekleştirme adımlarından sonra veri tabanının verimliliğini belirlenen gereksinimleri baz alarak incelemektedir. Mantıksal kayıt erişim sayılan ve taşıma hacmi değerleri yaklaşımı ile tasarımın performansı incelenmekte ve gerekli görüldüğü takdirde bu değerleri azaltmak ve öncelikli işlemlerin performansım artırabilmek amacıyla düzenlemeler yapılmaktadır. Seçilen örnek sistemde de bu yaklaşım kullanılarak bir veri tabam tasarımı yapılmıştır. Tasarım oluşturulduktan sonra, sistem bilgi ve işlem gereksinimlerine göre incelenerek veri tabanı yapısı bilgisayar ortamında denenmiştir. Yapı, hesaplanan değerlere ve amaca göre yeniden düzenlenmiş ve ulaşılan yeni yapı da bilgisayar ortamında denenmiştir. Her iki yapının eklerde verilen sonuçlan karşılaştırılarak sistemde bir iyileşme olduğu belirlenmiştir.
Database management systems have evolved to the point of general acceptance and wide application ; however a major problem still facing the user is the effective utilization of these systems. Important to achieving effective database usability and responsiveness is the design of the of database. The main objective is to present data as a common, shared and centrally controlled source. The target is to decrease data maintenance and application development costs. Security, data integrity and reduced data redundancy because of increased concurrent is the goals of the database design process. In order to get these advantages of databases, one have to design databases carefully and correctly. The design process is the most important, because all of other processes depend on it. This thesis presents how SQL, Structured Query Language, which is a standard database sub language, can be used in query optimization. Various aspects of SQL which provides several benefits and advantages is explained detaily and query optimization is done on an example database by using SQL. SQL is an important language as it is the first and so far only standard database language to gain wide acceptance. It provides language facilities that enable the definition of data and maintaining the defined data in computer systems. Data Definition Language part of the SQL allows definition of data structures and the specification of access control. Data Manipulation Language part is used to instruct the database management systems on how to manipulate die actual data stored according to their definition in the database. Since database systems and application programs traditionally forms the information systems offered to clients, database part of the information systems must be designed carefully to obtain the best performance from the information system. The maintenance of data in database can only be performed through use of a database management system and SQL language describes how database management system can be addressed, i.e. which syntax to use when communicating database management systems and what the functionality exhibited will be, i.e. which meaning and semantics are evolved. For the correct and efficient updating of data in the database needs to be structured in a particular manner, called normalized form. Data in unnormalized form can be transformed into normalized form in three steps. First step is the removal of repeating groups and in the second step partial key dependencies must be removed. After removal of transitively dependent columns, data arrives the third normal form. The normalized data structure can simply be written into SQL data definition statements to create the tables. Security is another important aspect of the database design process and deals with two major problem : the maintenance of the operational services and its speedy restatement after any failure and the prevention of any unauthorized use of system components. According to the SQL standard, database management systems must have a mechanism in place to maintain database integrity and to undo SQL command. Furthermore, SQL describes the transaction concept and provides control statements for supporting this concept. All of this is aimed of controlling the correct functioning of the information system. To meet the second objective SQL offers support in the form of access control facilities and allows the specification of privileges that can be granted and revoked from the clients of the information system. The process of database and application design must ensure that the information system support its clients adequately. In a database environment, data must be integral, consistent and accurate state in order to meet the requirements from the clients. SQL provides facilities for defining integrity constraints in an SQL database in order to simplify and control the maintenance of integrity rules. Integrity has to do whether the data is self-consistent and agrees with its definition. This covers both the physical and logical aspect of the database. Column and table constraints and assertions can be specified in SQL to require the database management system to control the logical aspect of the database. Using SQL 's referential integrity which ensures the relationship between two tables, application programs can be isolated from this kind of controls. Designing an application in such a way that all referential integrity checks are easily identified and removed from the application routines should make the task of upgrading an application far easier. xi Even if not all of the constraints are implemented in the database management system, it is a good idea to define them all in terms of SQL as reference model for the application programmers and subsequent maintainers of the system. Because SQL provides a concise and unambiguous definition which can be great benefit in the development and maintenance phase of the system. Missing information is one factor that is all too often forgotten when databases are designed and queried. SQL provides some explicit support for the concept of missing information with the null state. The null value, is in fact, a status and a column has one of two status, valued or null. SQL has also a facility to replace a null value with a default value when the column is first created. Another important and powerful mechanism of SQL is the view mechanism which can be used for various purposes. Views combines portions of one or more tables into a single table. This allows the presentation of data stored in a database more suitable for differing groups of users. Since view mechanism bring together of separate tables into a derived table via query expression, extracting data from or browse through normalized structures becomes easier. Because of its nature normalization leads us to an increased number of tables and makes it more complex to extract and brose data from the tables. Morever, selecting positively portions offered to users by using views, irrelevant data is filtered out and only relevant information is made visible to the user. A further way of pleasing the user is to provide assistance with the derivation of values from the selected rows and columns in the view. Calculation of minima, maxima, averages or counts based on sets of rows can be specified in views. Also, a view with columns containing the results of operations performed on others, visible or invisible, columns of the view can be specified. By its nature view mechanism limits very precisely the scope and the visibility of the of data in the database. Views offer control over rows as well as columns, thereby, allowing effective control at the record and field occurrence level. Usage of views can also provides some enhancements on physical environment. Defining views that contain only the columns required in the application makes it possible for SQL to project the required data alone in the memory buffers. Using SQL in distributed environments has also several advantages. Nowadays we need to consider programs that access data on many different computers in a network. Not all of these computers need to be under the control XII of the same organization and thus it is both impossible and undesirable to have a single schema that defines all of the data in the database. Indeed, in some large networks consisting of several thousand nodes it may be impossible to determine what the complete definition of all the data stored is, as it is highly likely that not all the nodes of the network are available at any given moment in time. SQL deliberately uses a rather vague definition of database in order to avoid this conceptual problem. In SQL, a database is defined as being a collection of all the data described by all the schemas in all the catalogs referenced is an SQL session. A session is an execution of an application program, a database is all the data defined by the all definitions that happen to be accessed by a program. SQL also offers commands so that a client can setup communication with servers and allows establishing multiple sessions between a client and different servers. In distributed database environments SQL database management system takes all responsibility for all features like fragmentation and allocation of data. Apart from all these concerns above, there are many opportunities to enhance the performance of the query processing itself, such as flattenning the query. Many optimizers work more efficiently on flat queries than on nested ones. Queries which are formulated in natural languages must be translated into SQL language with some rules. A problem given in natural languages must be correctly formulated in SQL. Query can be formulated by a user and translated into computer languages by an analyst. A method which consists of four steps, is defined for arriving a correctly formulated query. In each step of the method, a single aspect of problem of formulation into SQL is concentrated. Both the user and analyst must work cooperatively to transform the user 's query into a format suitable for a query language processor. In the first step, after user formulates the query, analyst and user try to understand and eradicate all ambiguity from the query. During the dialog between them, user must make clear to the analyst which meaning is intended. After having reached agreement about the nature and meaning of the query, in the second step, all the terms as posed by the user must be replaced by terms from the database definition. Third step transforms the query into the prenex normal form and in the final step analyst or a translation program translates query in the prenex normal form into the syntax of SQL. XIII Generating SQL syntax from the prenex normal form gives a flat query instead of nested query. A nested query is the one where sub queries exist,perhaps,further sub queries and so on and in a flat query there are a few sub queries if possible none. A flat query is more readable than nested one which contributes correct and maintainable coding. A second benefit of flat SQL is the ability of optimizers in database management system to generate a more efficient code. Certain optimizers always generate inside out in nested SQL queries and thereby, frequently cannot achieve an optimal solution. In the thesis a sample database design study is included. The system designed for a multinational company which sales computer products which are imported from ite plants in different countries. The general statements of the company desires to handle all of the operations on sales, importing, invoicing and commissions of salespeople. In order to measure or project the performance of the design logical record access method is used. Logical record access counts are made to estimate the expected number of occurences of each record type retrieved explicitly. When weighted by processing frequencies, they show which application will probably require the most I/O access to and from the database. Transport volume is the total amount of data flow between then application programs and database management systems. The volume can be calculated for each record type by multiplying that record type ' s size and its logical record access number. This may be summed over all record types to calculate the transport volume in bytes for the application. To obtain the global transport volume, various application ' s transport volumes should be weighted by the application frequencies and summed. By noting which applications are dominant in terms of frequency of execution, logical record accesses per unit time and transport volume, one can determine where design improvements will be most effective. In the example, first a database design had been done and implemented, then the system examined in computer environment to test the database performance. Logical record access counts and transport volumes calculated. Improvements has xiv been done on the design and new design re-implemented and tested. Comparison charts and CPU time outputs can be find at appendices. The main objective of this study is to express benefits of using SQL in query optimization.
Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1994
Anahtar kelimeler
Mühendislik Bilimleri, Bilgisayar yazılım dilleri, SQL, Sorgulama dili, Veri tabanı tasarımı, Engineering Sciences, Computer software languages, Structured query language, Query lanquage, Database design