Semantik veri modellerinde bellekte kalıcı girişler için indeks seçimi

thumbnail.default.alt
Tarih
1993
Yazarlar
Sürücü, Eda
Süreli Yayın başlığı
Süreli Yayın ISSN
Cilt Başlığı
Yayınevi
Fen Bilimleri Enstitüsü
Özet
Veritabanının fiziksel olarak düzenlenmesindeki en büyük problemlerden biri mümkün olan en iyi indeks kümesini seçebilmektir. Bu araştırmada bütün bilgi girişlerinin bellekte kalıcı olduğu durumda problem ele alınmıştır. Ele alınan veri modeli ilişkisel modele genişleyebilen bir semantik veri modelidir. Bu modelde giriş semasıyla birlikte bir kısmi eşsorgulamalar kümesi ele alınmaktadır. Problem iki bölüm halinde incelenmektedir. İlk bölümde verilen kısmi eşsorgulamalara cevap verebilecek bir başlangıç indeksler kümesi araştırılmaktadır. İkinci bölümde ise indeks kümesi rafine edilerek sorgulamalara cevap verebilecek en iyi indeks kümesi saptanır. Uygulanan algoritma optimal olup, bitin sorgulamalara cevap verebilecek en küçük indeks kümesinin bulunması garanti edilmektedir. Ayrıca elde edilen indeks kümesi ile tüm sorgulamalar verimli bir şekilde çalışabilmektedir.
Nowadays, databases are very important tools for application development. The importanat thing is not to set up a database but to provide most effective data reports for end users. At this point, application programmers face with a serious problem of choosing the index set that best supports a search requirements. In this paper, it is suggested a new solution to this problem. Now we summarize the steps of application development and how we choose the best index set. The solution is optimal. First of all, we must determine the database structure carefully. Some criters are important while setting up the database. Semantic data model concept is very useful when setting up the database because it provides greater flexibility in describing information. Moreover, this data model can extend relational model in at least two ways. They are summarized as followed. First separate concepts of "domain" and "relation" is combined into a single concept of "class". The second way is generalized taxonomy. In tihs way the relational model is extended to allow classes are organized in a generalization taxonomy by adding directed edges among classes to indicate superclass relationships. We represent our data model by a directed acyclic graph, called a class schema. There is also important definitions that will be used in hte data model. Briefly, these definitions are as followed: Class Schema : A database is definde by a direct acyclic graph S, called a class schema and it is represented in the form below: C{P1:C1,..., Pn:Cn} Multiple Inheritance : If a class has the properties of more than one class generally it is referred as multiple inheritance. Immediate Superclass : Class C2 is an immediate superclass of class CI, written Cl^> C2, if includes an arc from class CI to class C2. - IV Superclass : A class C2 is an immediate superclass of class CI, if C2 = Cl or C2 is an immediate superclass of class CI, if C2 = Cl or C2 is an immediate superclass of Cl. Partial Match Query : In a class schema S, a partial match query is defined with the form find each c given {pfl,..., pfm} Enrollment^: Student, C: Course, Mark: Integer} Course {CName: String, TaughtBy: Teacher} Integer {} String {} Person {PName: String, Dept: String} Student{Year: x Integer} Teacher{Salary: Integer} Î Graduate {Supervisor: Professor} Professor {Office: String} Tutor{} ChairmanO Figure 1. A class schema After setting up the data models according to the concepts and definitions above, we will examine the properties of memory resident indexes. In our data model, there is two purposes of an index. 1- They are used for accessing records so the search efficiency is increased. 2- As a value location mapping the index can support judicious placement or clustering of records on the mass store. An index is only used for accessing path, that is not a factor in record placement, is called as secondary index. To provide index entries to be approximately clustered, thefentries must be stored in an index file, and an accession list is usually introduced as cross referencing the address of records that has the same index value. When evaluating partial match query, it may be needed additional search conditions in order to supply more than one property. If index. If an index include more than one property, it is called combined index. We will both use single and combined indexes depend on our search requirements. Indexes will be determined depend on search requirements. For our data model, determined requirements summarized below: 1 - find each Person given {PName} 2 - find each Student given {PName} 3 - find each Teacher given {PName} 4 - find each Person given {Dept} 5 - find each Student given {Dept} 6 - find each Graduate given {Dept} 7 - find each Tutor given {Dept} 8 - find each Teacher given {Dept} 9 - find each Professor given {Dept} 10 - find each Graduate given {Dept, Supervisor, PName} 11 - find each Tutor given {Dept, Supervisor, PName} 12- find each Enrollment given {C.CName} 13 - find each Enrollment given {S. PName, C.CName} 14 - find each Course given {TaughtBy} We have a semantic data model and search requirements. Now we must determine the indexes. There is some iterative methods for choosing indexes. We can give the two known example to this. One of them proposed by Farley and Schuster and the other Hammer and Chan. But these methods are feasible only when there is a small finite set of possible indexes for a relation. So this hasn't been true for our data model. Because our search space of indexes is enlarged three ways. First, we allow combined index structure, second we allow path functions as search conditions. And third we use a new search condition called subclass sort. In our example, first we search an initial choice of indexes and we are doing by using a procedure called Findlndices. And later we are going to refine this procedure in order to index maintenance. The application of Findlndices procedure is as followed. First we set up a directed graph G(V,A) by using the partial match queries as nodes. But each node, called Q, must satisfy the conditions below: 1 - C1(Q2) => Cl(Ql) 2 - Cond(Ql) kapsama Cond(Q2) At the second step we remove redundant vertices from the graph. But this step has no effect on our example. So we go to the third 3tep. We create a bipartite graph G(V, A) consisting of two copies the graph G. If an arc doesn't VI exist between Qn and Qn we remove the Ql -> Q2 from the A. By doing this we find the maximum matching in G. At this step a very known maximum matching algorithm of FordFulkerson is used. At the last step indexes are selected using bipartite graph. There is some rules for selecting indexes. After these rules applied, our initial selection of indexes is as followed: * index on Person sorted by PName, Student * index on Teacher sorted by PName * index on Person sorted by Dept, Tutor, Professor * index on Student sorted by Dept, Graduate, Supervisor, PName, Tutor * index on Graduate sorted by Dept, Tutor * index on Enrollment sorted by C.CName, S. PName * index on Course sorted by TaughtBy But in the case of functional dependencies are included in the class schema, our initial index selection may not be the best set supporting search requirements. First we explain the path functional dependency by giving an example before improve the index selection. If each person is declared to have a unique name and each teacher supervise at most one Graduate student in the class schema, these conditions are called path functional dependencies and characterized as below: Person (PName -> Id) Graduate (Supervisor -> Id) Now the second step of the algorithm can be modified by using path functional dependency. Findlndices gains effect by means of this. Depend on the path functional dependency, some arcs can be eliminate from the graph. When we use improved second step for Findlndices, the result will change and number of indexes will be able to decrease. But they support the all search requirements in the best way. An additional data modelling about this subject is replaced after the main text. The data model includes, an electronic data processing department organization schema of a company. The model is also planned according to the rules and concepts of the main text. The organization schema is in figure S.2. : VI 1 The search requirements: Person given {EName} Project Employee given {EName} Project Chief given {EName} Department Manager.{EName} Service Chief given {EName} Person given {Dept} Project Employee given {Dept} Project Chief given {Dept} Service Chief given {Dept} Department Manager {Dept} Project Employee given {Dept, PName} Project Chief given {Dept, PName} Project Employee given {PName} Project Chief given {PName} Project given {Project Chief} Assignment {Project employee, Project} Project {PName: String, Director: Project Chief} IntegerO StringO Employee {EName : String, Department : String } / Project Employee Project Chief \ Department Manager EDP Coordinator I Service Chief General Manager Figure 2. An organization schema of an EDP Department The initial index set are as followed: 1 - Index on employee sorted by EName, Project 2 - Index on Service Chief sorted by EName 3 - Index on Department Manager sorted by EName 4 - Index on Project Chief sorted by EName 5 - Index on employee sorted by department, Project Employee, Project Chief 6 - Index on Project Employee sorted by department, Department Manager, Servis Chief, Project Name viii But now we define two path functional dependencies. First each person has a unique name. And second, each person can work at most one project. Employee (EName -> Id) Employee (PName -> Id) After these PFDs applied result index set will be as followed: 1 - Index on Person sorted by EName 2 - Index on Person sorted by department, project employee, project chief 3 - Index on Project Employee sorted by department, Project Name, Project Chief We examine the index selection on a data model and all indexes are memory resident. One advantage of using memory resident index is having an enlarged search space of possibi-lities. The problem is separated into two part. First, we have determined the initial index set that best support the given set of partial match queries (i.e. search requirements). In the second part, we have refined the initial choice of index set. This program is very useful for all application developers. As a result, this method is useful for application developers.
Açıklama
Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1993
Anahtar kelimeler
Mühendislik Bilimleri, Bellek, Sistem analizi, Veri modelleri, İndeks seçimi, Engineering Sciences, Memory, System analysis, Data models, Index selection
Alıntı