Veri tabanı yönetim sistemleri

thumbnail.default.alt
Tarih
1990
Yazarlar
İclal, Yıldırım
Süreli Yayın başlığı
Süreli Yayın ISSN
Cilt Başlığı
Yayınevi
Fen Bilimleri Enstitüsü
Institute of Science and Technology
Özet
Veri tabam, büyük veri topluluklarının yönetilmesi ve düzenlenmesi için kullanılan, sistematik bir yaklaşımdır. Bir veri tabanı yönetim sisteminde yalnız veriler değil, veriler arasındaki bir takım ilişkilerde saklanır. Veri tabanı yönetim sisteminin önemli işlevleri, veri tabanı içindeki herhangi bir bilginin yerinin bulunması ve on na erişilmesi, veri kayıtlarının istenilen sırada sıralanması ve indeks yaratmaktır. Bu çalışmada veri kayıtlarının girişi, aranması ve si linmesi için lineer hash adresleme metodu kullanılmıştır. Lineer hash adresleme metodu çok yeni bir metoddur ve henüz birçok sistemde kullanılmamıştır. Dosyaya ne kadar kayıt eklenirse eklensin, yükleme fak törü sabit tutularak bellek kullanımı ve kayıtlara tek tek e- rişme açısından etkin bir metod olduğu gözlenmiştir. Bu metod- la Anahtar alanların birbirlerine bağlanması bağlı liste ya pısı kullanılarak gerçekleştirilmiştir. Lineer hash metodu anlatılırken B+ ağaçları ve diğer metod arla mukayese edilmiştir. Veri alanlarının sıralanmasında ise quick-sort yöntemi kullanılmıştır.
This work a program that helps you collect and manage infor mation or data What is a database: A database is organized collection of related information or data. You probably encounter several databases every day. You take care of your personal business with database such as your cheek book, adress book, and phone book. Each of databasses organizes data for easy storape and retrieval or access. A simple axample of database is a business olient card file. Each client's name, address, and phone number an a separate card. Each record all data for person. Before you can put data into a dotabase file, you must define its structure. This includes the names of fields in a record, the number of characters in a field, and the type of information (alpha numeric, nlimeric, Topical so on) allowed in each field. In a data base, all data for a farticulary entry is called a record, each re cord contains all data for one person. Each item of information within a record is called a field. The name or title of a field called field name, the type of entry allowed for each field called field type and the maximum number of characters in a field is called field with. VIII In a database manapemant system very impartant subjects are sorting, searching, indeksing, deleting. We shall study the mest popüler Indekxing methods in current use: The B- tree and Lineer hashing. Almost no other indexing schemes except B- trees and hashing are used today on large files, even for large files, the B trees is a two-disk-occess method. That is, f=s+r+btt+s+r+d... Fetch time (Tf) assumes that the internal no des of the B+-trees are 2400-byte nodes so we use b..(block trans fer time) and d. t is data transfer t- andr, read time to sector of a disk. fer time) and d. t is data transfer time, s seek time to track This is much more officient than the pile file, where we had to read through half the file on average. It is more efficient than the sorted fie, where we had to use binary search. Since single-re cord fetches are so efficient and since range queries can also be answered with B -trees, this is the most popular indejing method in use today. The reason B -trees are a two-disk-access method is that the fanout, fo, for internal nodes is high. If we have mem blocks in memory all but the parent-of-leaf level and the leaves fit in memory when mem> ~ (fo)2 With a fan-out of 140 and 10 megabytes, or 4167 blocks of 2400 bytes each, in memory, if bk is less than about 80 million buckets, IX we have a two-disk-access method. None of teh files we use as examp les exceed this limit. Some of our examples will be small enough for a one-disk-access method, but to make analyses ismpler, we shall al ways assume two disk accesses. Since the records are stored in order within the leaves, pri mary B -trees can be used to list the records in order. This time is Tx and the number of data buket is bk. T w(primary)= bkx(s+r+dtt) This is like reading the file randomly by bucket since the leaves ara not in order on the disk. Hashing with Buckets and Chaining: In summary, if the growth of a file can be predicted, hashing with buckets and chains is an excellent access method. It is fast for fetching, inserting, and deleting, but not for sequential opera tions. Hashing cannot be used for range queries. Usually, rifht after a hash table has been created, fetching tekes about one disk access. This is the method of choice for many files and many database nanagement systems. When most of the opera tions on the file are individual-record fetches, hashing is the op timal file structure. When there is a mixture of sequential operations, range searc hes, and individual -record fetches, a B -tree may perform better than a hash table. It 's a good idea to make a rough estimate of the time needed for a given mij of operations to decide whether to use hashing or B -trees or some combination. Linear Hashing: Several new methods have been proposed to avoid reorganizati on of hashed files. One such method is linear hashing (Li twin 1981). Linear hashing maintains a constant load fastor, even when many new records are added to the file. It does so by incrementally adding new buckets to the primary area. Thus as new records are added, new space for the records in, the primary area is also added. The load factor, which is the ratio between the number of records and the number of spaces for records in the primary area, remains cons tant. Maintaining a constant load factor as the file grows is a revolutionary idea. Fetching a Record In linear hashing, teh last binary bits in the hash number are used for placing the record. That is, we first calculate a lar ge number from the key, just as before. Then we place the record in a bucket according to the last k binary digits in the hash number. For example, we could begin with eight buckets and use the last th ree bts in teh hash number that was calculated. If the number en ded in 000, it would be placed in the first bucket. If it ended in 001, the second; in 010, the third; and so on. When we exp and the table, we split an existing bucket which holds all the records which end in a particular k digits into two buckets using the k+1 last digits of the hash number. This way, for example, the bucket with records whose hash number ended in 010 would be split into one bucket with records whose hash number ends in 0010 and another bucket with records whose hash number ends in 1010 After such splits, some of thi buckets hold records where k digits are used and some where k+1 digits are used. XI The splits are handled following an algorithm which we shall out line in treating insertion. For fetching, we need only know that some of the buckets use k digits, and some use k+1 di gits, nd we can tell the difference by checking whether or not the last k digits are smaller than a given volue, the boundary value. The boundary value and the current value of k are kept in teh file header. This method is very close in time to sorting and merging. using one disk drive. With the sorting, we created sorted segments and then merged them, using at least two reads and writes of the whole file. With sorting, the large number of disk accesses comes during reading in for merging. It helps to sort by hash value or else to partition by hash value, in order to construct a hsh table from ejisting large files. If we try to construct the table by successive insertions, it will take too much time. In traditional hashing, reorganizing occurs often. So it - is very i-portan to have an efficient way to create new hash tables For linear hashing, we have to create the table only once. Still, we can save a considerable amount of time by sorting by the bits expected to be used in the table, instead of doing succes sive insertions. We gave an algirithm for deciding how many bits to use for creating a linear-hashing table.
Açıklama
Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1990
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 1990
Anahtar kelimeler
Sistem analizi, Veri tabanı yönetim sistemi, System analysis, Database management system
Alıntı