Please use this identifier to cite or link to this item: http://hdl.handle.net/11527/417
Title: Bilişsel Robotlar İçin Öğrenme Güdümlü Sembolik Planlama
Other Titles: Learning Guided Symbolic Planning For Cognitive Robots
Authors: Talay, Sanem Sarıel
Yıldız, Petek
10007032
Bilgisayar Mühendisliği
Computer Engineering
Keywords: Yapay zeka
görev planlama
planlamada güvenlik
bilişsel robotlar
robotik
Artificial intelligence
task planning
safety in planning
cognitive robots
robotics
Issue Date: 19-Jul-2013
Publisher: Fen Bilimleri Enstitüsü
Institute of Science and Technology
Abstract: Cognitive robots plan in order to generate the sequence of actions needed to reach given goals. These actions can fail for various reasons when executed in the real world, and in such cases the plan being executed fails along with the failed action. Robust task execution that minimizes undesired situations requires continual planning, execution, monitoring, reasoning and learning to be handled together. This work investigates how robustness in task execution can be achieved by exploiting execution experience. The proposed approach is based on a learning-guided planning system. The system starts with high-level planning, the construction of the plan needed to reach the goal state; in this work the forward-chaining temporal planner TLPlan is used. TLPlan generates the valid plan required for robust task execution by using the planning domain, the planning problem and previous planning experience. The robot executes each action in the plan in turn, using the defined action models. Monitoring is performed by checking whether the executed actions produce their expected outcomes; if unexpected effects appear in the environment, or not all expected effects occur, the executed action is assumed to have failed. Once an action is determined to have failed during execution, the reasoning and learning processes begin: inferences are made about the failed action, and hypotheses are generated by considering the action's parameters, the objects it affects and the action itself. The learning process is applied after every action execution, generating hypotheses with Inductive Logic Programming for both successful and failed cases. By using the generated hypotheses in replanning, the robot benefits from its experience to produce more successful plans in subsequent tasks.
For successful cases the execution model is stored, while for failed cases the task execution process is repeated by replanning. This thesis proposes a safe planning system that exploits previous experience in new tasks. The system guides the planner according to the failure hypotheses. To incorporate experience into the planning stage, hypothesis information is encoded into the planning domain in three different ways: as an action precondition, as a search control rule, and as an action cost update. The first two methods prevent the selection of an action in subsequent tasks by considering action-parameter pairs that failed in previous executions; a planning domain updated with these methods leads replanning to select alternative actions when they exist. The third method reduces the probability that a failed action is selected by increasing its cost. This method does not strictly prevent the selection of the failed action, and it produces the lowest-cost plan with A* search. The three methods were applied to various planning problems and their performance compared. Based on the results, a hybrid approach combining the action-precondition and cost-update methods was adopted. In this way, a cognitive robot can use its previous action execution failures to perform safe planning that minimizes the selection of failed actions in new tasks. Experimental results with a Pioneer 3-DX mobile robot show that the system is reliable and robust.
Cognitive robots plan in order to construct the sequence of actions required to achieve their objectives. These actions may fail in several ways when executed in the physical world. To minimize undesired situations in plan execution, continual planning, execution, monitoring, reasoning and learning processes should be integrated within a single framework. In this work, we investigate how robustness can be ensured by learning from experimentation, and we propose a learning-based guidance system for safe planning. The robot gains its experience from action execution failures through lifelong experimental learning in the physical world. Inductive Logic Programming (ILP) is used as the learning method to frame hypotheses about failure situations; it provides a first-order logic representation of the robot's experience. The robot uses this experience to construct heuristics that guide its planner in future decisions. The performance of the learning-guided planning process is analyzed on our Pioneer 3-DX robot. Automated planning forms the basis of a robot's cognitive abilities: planning is a search procedure that constructs a valid plan for achieving objectives. Planning in complex domains with a large number of actions or objects may be intractable, making it hard to find a solution. Efficient search in such domains is possible through domain control knowledge that reduces search complexity. Control knowledge can be specified in terms of search control rules, macro-operators, plan operators, cases and policies, and methods exist that learn stochastic models of domain operators and control rules for planning. The main objective of this study is to develop methods that exploit experience in the robot's future tasks; that is, learning is used as a tool to guide the robot's future action selections.
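The role of search control knowledge described above can be pictured as a reject rule that prunes unpromising successors during forward search. The sketch below is a minimal Python illustration, not TLPlan's linear-temporal-logic syntax; the operator encodings and the rule itself are assumptions made for the example.

```python
def expand(state, operators, reject_rules):
    """Generate successor states, skipping any pruned by a control rule."""
    successors = []
    for op in operators:
        nxt = frozenset((state - op["del"]) | op["add"])
        if any(rule(nxt) for rule in reject_rules):
            continue  # branch judged unpromising; prune it from the search tree
        successors.append((op["name"], nxt))
    return successors

# Illustrative operators in STRIPS-like add/delete form.
ops = [{"name": "drop", "add": {"on_floor"}, "del": {"holding"}},
       {"name": "place", "add": {"on_table"}, "del": {"holding"}}]
# Reject rule: never consider states where the object ends up on the floor.
rules = [lambda s: "on_floor" in s]
print(expand(frozenset({"holding"}), ops, rules))  # only 'place' survives
```

Pruning in this style shrinks the search space without changing which goal states are reachable through the remaining branches.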
In this work we focus on developing a learning method that maps action execution contexts to failure cases; such a mapping is needed either to handle failures or to prevent them for robustness. Since the robot observes its execution and environment throughout its lifetime, an incremental and continual approach is needed. We propose a safe planning system that learns from execution experiments and guides the planner according to the learned failure hypotheses. The results of the learning process are then used to guide future planning decisions for robust execution. Our approach differs from earlier work in the way learning outcomes are applied: action execution experience gained in the real world is fed back to the robot to improve its performance on future tasks. Context situations (the actions, the objects of interest and their relations) are considered when deriving hypotheses, which are expressed as first-order logical sentences. The derived hypotheses are then used to devise heuristics for planning, and the planner is guided in its search by encoding these heuristics into the planning domain. Three different methods are used to encode the heuristics in the planning domain; after investigating these alternative guidance methods, we propose a hybrid heuristic guidance method for failure prevention. We show that, even without model-based failure isolation, robustness can be ensured by experience-based learning and learning-based guidance of the planner, which presents alternative solutions to the failed cases. The system starts with a high-level planning process, the generation of a plan to attain the given goals. We use TLPlan, a forward-chaining temporal planner, for task planning; this planner uses search control formulas expressed as linear temporal logic sentences.
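The mapping from execution contexts to failure cases can be pictured as hypothesis coverage: a learned hypothesis, expressed as a set of literals, covers a candidate action whose context contains all of them. The predicates, gripper and object names below are hypothetical, chosen only to illustrate the idea, not taken from the thesis.

```python
def matches(hypothesis, context):
    """A hypothesis covers a context if all of its literals hold in it."""
    return hypothesis.issubset(context)

# Hypothetical learned hypothesis: pickup with gripper g1 fails on heavy objects.
failure_hypothesis = {("action", "pickup"), ("gripper", "g1"), ("heavy", "obj1")}

# Context of a candidate grounded action considered during planning.
candidate = {("action", "pickup"), ("gripper", "g1"),
             ("heavy", "obj1"), ("on_table", "obj1")}

print(matches(failure_hypothesis, candidate))  # True: predicted to fail
```

A planner can query such coverage checks during search to steer away from groundings that previous experience predicts will fail.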
These rules enable the planner to reduce its search space by pruning unpromising branches that lead to dead ends and suboptimal plans in the search tree. TLPlan generates the valid plan required for robust task execution by using the planning domain, the planning problem and previous planning experience. The robot executes each action in the plan using previously defined action models. Monitoring is performed by checking whether the expected outcomes of the actions hold. If an undesired effect occurs, or if not all desired effects occur in the environment, the action is considered failed. After an action is identified as failed, the reasoning and learning processes start: inferences about the failed action are made, and new hypotheses are generated by considering the parameters of the action, the objects affected by the action and the action itself. The learning process is applied after each action execution, and new hypotheses are generated with Inductive Logic Programming (ILP) for both successful and unsuccessful situations. ILP lets the robot build up its experience by observing different execution states over its lifetime. The framed hypotheses involve known or observed features of objects, their relations and the observable features of the world state as the context of a failure. By using the generated hypotheses in replanning, the robot executes future tasks more successfully. While the execution model is kept for successful situations, the whole task execution process is repeated by replanning for unsuccessful situations. The lifelong learning procedure continually frames new hypotheses during execution; these hypotheses are then used to improve the performance of the robot on its future tasks. We show that a hybrid heuristic method for updating the planning domain ensures both robustness and completeness in future tasks.
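The monitoring step described above, checking whether expected outcomes hold and whether any undesired effects appeared, can be sketched as a simple set comparison over state literals. The predicate names are illustrative assumptions, not the thesis's actual action models.

```python
def action_failed(expected, observed, undesired=frozenset()):
    """Failed if some expected effect is missing from the observed state,
    or some undesired effect appeared in it."""
    return not expected <= observed or bool(undesired & observed)

expected = {("holding", "cup")}
observed = {("on_table", "cup")}  # the grasp did not succeed
print(action_failed(expected, observed))  # True: expected effect missing
```

A failure signal from this check is what triggers the reasoning and learning processes on the executed action.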
To benefit from experience in the planning stage, this knowledge needs to be encoded into the planning domain. We analyze three ways of encoding the hypotheses: (i) deriving new search control formulas, (ii) updating the models of the operators corresponding to the failed actions, and (iii) setting an adaptive cost computation method for the operators. In the first approach, the selection of a failed operator is completely abandoned in the specific contexts defined by the hypotheses. In the second approach, the preconditions of the failed operators are updated to prevent their selection in specific branches during search, based on the failure cases of the corresponding action-parameter pairs. A domain updated with either of these two methods leads the planner to select an alternative action when one exists. In the third approach, the cost values of the failed operators are updated to set preference models, reducing the probability that the failed action is selected in similar cases; this method does not strictly prevent the selection of the action, and it generates the lowest-cost plan using A* search. The three methods are applied to different planning problems in which the robot manipulates several objects in the environment, and their performance is analyzed. Search control formulas extracted from the failure hypotheses represent reject rules in plan search: the planner only expands the nodes in the search tree that satisfy the control rules. The precondition update prevents the selection of the actions specified in the failure hypotheses. The cost update method increases the cost of the action by a factor to discourage its selection in future plans. The first two methods fail to find a valid plan when there is no alternative to a failed action; based on the results obtained, a hybrid method combining the precondition and cost update methods is developed.
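The third encoding, the cost update, can be illustrated with a toy uniform-cost search (A* with a zero heuristic): raising the cost of an operator that failed in execution makes the planner prefer an alternative without forbidding the failed operator outright. The state space, action names and cost factor below are invented for illustration.

```python
import heapq

def plan(start, goal, actions, costs):
    """Uniform-cost search over a toy state space; each action maps a state
    to its successor state."""
    frontier = [(0.0, start, [])]
    seen = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, cost
        if state in seen:
            continue
        seen.add(state)
        for name, effect in actions.items():
            if state in effect:  # action applicable in this state
                heapq.heappush(frontier,
                               (cost + costs[name], effect[state], path + [name]))
    return None, float("inf")

# Two alternative actions achieve the goal from 'start'.
actions = {"push": {"start": "goal"}, "carry": {"start": "goal"}}
costs = {"push": 1.0, "carry": 2.0}
print(plan("start", "goal", actions, costs))   # picks 'push'

costs["push"] *= 10   # cost update after 'push' failed in execution
print(plan("start", "goal", actions, costs))   # now prefers 'carry'
```

Because the failed action is only penalized, not removed, the search remains complete: if no alternative exists, the penalized action can still be chosen.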
The performance of the learning-guided planning process is analyzed on our Pioneer 3-DX mobile robot. The results reveal that the hypotheses framed for failure cases are sound and ensure safety and robustness in the robot's future tasks. Our approach to robust task execution includes an experience-based learning method, ILP, to learn from action execution failures. The learning process frames hypotheses that relate different contexts to failure situations; whenever the robot obtains new observations, it can incrementally revise its hypothesis space. The derived hypotheses specify the observable attributes of the objects, the relations among them and the relevant facts of the world. The results of the learning process are then used to guide future planning decisions for robust execution. We show that a hybrid heuristic method for updating the planning domain ensures both robustness and completeness in future tasks.
Description: Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2013
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2013
URI: http://hdl.handle.net/11527/417
Appears in Collections: Bilgisayar Mühendisliği Lisansüstü Programı - Yüksek Lisans (Computer Engineering Graduate Program - M.Sc.)

Files in This Item:
File: 13831.pdf (5.45 MB, Adobe PDF) - View/Open


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.