Data Mining Course Design Report
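XIAN TECHNOLOGICAL UNIVERSITY
Course Design Report
Course: Data Mining
Major: Information Management and Information Systems
Class: 130513
Name: Jia Dandan (賈丹丹)
Student ID: 130513117
Supervisor: Li Gang (李剛)
Grade:
Date: January 3, 2016

Preface
Data mining is the extraction of useful information from large volumes of data. Guided by specific requirements, it locates the information needed within vast amounts of data so that it can serve a particular purpose. Some foreign experts predict that, as data keep accumulating and computers become ever more widely used, data mining will grow into a new industry in China within the next 5 to 10 years.

In artificial intelligence, data mining is also commonly called knowledge discovery in databases (KDD); others regard data mining as one basic step of the KDD process. The knowledge discovery process consists of three stages: (1) data preparation, (2) data mining, and (3) presentation and interpretation of results. Data mining can interact with users or with a knowledge base.

Data mining finds regularities in large amounts of data by analyzing the individual data items. It involves three steps: data preparation, pattern search and pattern representation. Data preparation selects the required data from the relevant sources and integrates them into a dataset for mining; pattern search uses some method to uncover the patterns the dataset contains; pattern representation presents the discovered patterns in a form that users can understand as easily as possible (for example, through visualization).

Classification in data mining captures two kinds of knowledge: characteristic knowledge about the properties shared by things of the same class, and discriminative knowledge about the differences between classes. The most typical classification method is decision-tree classification, a supervised learning method that builds a decision tree from a set of examples. It first builds a tree from a training subset (called the window). If that tree does not classify every object correctly, some of the exceptions are added to the window, and the process repeats until a correct decision set is obtained. The result is a tree whose leaf nodes are class names and whose internal nodes are attributes, with one branch for each possible value of the attribute.

Contents
1 Business Understanding
2 Data Understanding
2.1 Data Description (English)
2.2 Reading the Data
2.3 Browsing the Data
2.4 Specifying the Role of Each Variable
2.5 Examining the Distribution of Each Variable
3 Data Preparation
3.1 Reclassifying the Data
3.2 Balancing the Data
4 Building Decision Tree Models
4.1 Introduction to the C5.0, CART and CHAID Algorithms
4.2 Model Building
4.3 Model Results
4.4 Analysis of Model Results
5 Model Evaluation
6 Summary
Appendix 1: zoo.data
Appendix 2: zoo.names

1 Business Understanding
A zoo keeps a large number of animals of many species. Classifying the zoo's animals by their features makes those features easier to observe and analyze, supports more rational management of the animals, and provides a reference for looking up animal information in the future.

2 Data Understanding
The dataset is a zoo animal dataset obtained from the UCI Machine Learning Repository website. It records the features of 99 zoo animals (the raw file holds 101 records; see 2.3): hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, legs, tail, domestic and catsize. The task is to use data mining to classify these animals into 7 types.

2.1 Data Description (English)
Source:
Creator: Richard Forsyth
Donor: Richard S. Forsyth, 8 Grosvenor Avenue, Mapperley Park, Nottingham NG3 5DX, 0602-621676
Data Set Information:
A simple database containing 17 Boolean-valued attributes. The "type" attribute appears to be the class attribute.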
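Here is a breakdown of which animals are in which type: (I find it unusual that there are 2 instances of "frog" and one of "girl"!)
Class# - Set of animals:
1 - (41) aardvark, antelope, bear, boar, buffalo, calf, cavy, cheetah, deer, dolphin, elephant, fruitbat, giraffe, girl, goat, gorilla, hamster, hare, leopard, lion, lynx, mink, mole, mongoose, opossum, oryx, platypus, polecat, pony, porpoise, puma, pussycat, raccoon, reindeer, seal, sealion, squirrel, vampire, vole, wallaby, wolf
2 - (20) chicken, crow, dove, duck, flamingo, gull, hawk, kiwi, lark, ostrich, parakeet, penguin, pheasant, rhea, skimmer, skua, sparrow, swan, vulture, wren
3 - (5) pitviper, seasnake, slowworm, tortoise, tuatara
4 - (13) bass, carp, catfish, chub, dogfish, haddock, herring, pike, piranha, seahorse, sole, stingray, tuna
5 - (4) frog, frog, newt, toad
6 - (8) flea, gnat, honeybee, housefly, ladybird, moth, termite, wasp
7 - (10) clam, crab, crayfish, lobster, octopus, scorpion, seawasp, slug, starfish, worm
Attribute Information:
1. animal name: Unique for each instance
2. hair: Boolean
3. feathers: Boolean
4. eggs: Boolean
5. milk: Boolean
6. airborne: Boolean
7. aquatic: Boolean
8. predator: Boolean
9. toothed: Boolean
10. backbone: Boolean
11. breathes: Boolean
12. venomous: Boolean
13. fins: Boolean
14. legs: Numeric (set of values: {0,2,4,5,6,8})
15. tail: Boolean
16. domestic: Boolean
17. catsize: Boolean
18. type: Numeric (integer values in range [1,7])
Relevant Papers:
Forsyth's PC/BEAGLE User's Guide.

2.2 Reading the Data
Load the data into Modeler. From the Sources palette, add a Var. File node and set its parameters; on its File tab, specify zoo.txt as the file to read.

2.3 Browsing the Data
From the Output palette, add a Table node to the stream and run it to generate a data table. Browsing the data revealed two erroneous entries: there are two records named frog and one named girl, so one of the frog records and the girl record were deleted.
Figure: Output of the Table node

2.4 Specifying the Role of Each Variable
animal name, hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, legs, tail, domestic and catsize are the model's input variables, and type is its target variable. From the Field Ops palette, add a Type node to the stream and set its parameters to assign each variable's role.
Figure: Parameters of the Type node

2.5 Examining the Distribution of Each Variable
From the Output palette, add a Data Audit node to the stream and run it to generate a data table.
Figure: Output of the Data Audit node
The output shows 99 samples. Every variable except animal name is numeric, and every variable except animal name, legs and type is Boolean. For these variables Modeler reports basic descriptive statistics such as the minimum, maximum, mean, standard deviation and skewness. legs shows a wide gap between its minimum (0) and maximum (8). The bar charts of the numeric variables show that type 1 has the most samples. Overall the data quality is good, with no missing values.

3 Data Preparation
3.1 Reclassifying the Data
hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, tail, domestic and catsize each record whether an animal has the corresponding feature, so the codes 0 and 1 are not descriptive. A Reclassify node recodes 0 and 1 as No and Yes.
Figure: Settings tab of the Reclassify node
From the Output palette, connect a Table node to the Reclassify node and run it to generate the reclassified data table:
Figure: Output of the Table node

3.2 Balancing the Data
The data contain many type 1 records and relatively few records of the other types, so the samples are balanced.
Figure: Settings tab of the Balance node
From the Output palette, connect a Table node to the Balance node and run it to generate the balanced data table:
Figure: Output of the Table node

4 Building Decision Tree Models
Models are built with three algorithms: C5.0, CART and CHAID.

4.1 Introduction to the C5.0, CART and CHAID Algorithms
(1) C5.0: C5.0 is a decision-tree algorithm. Its lineage begins with ID3, proposed by J. R. Quinlan in 1979, which handles mainly discrete attributes. ID3 was later refined into C4.5, which adds discretization of continuous attributes. C5.0 is C4.5 adapted to large datasets, improved mainly in execution speed and memory usage. C5.0 is one of the classic decision-tree algorithms: it can build multi-way trees, its target variable must be categorical, and it can produce either a decision tree or a rule set. A C5.0 model splits the samples on the field that yields the largest information gain. Each subset produced by the first split is split again, usually on a different field, and this repeats until the subsets cannot be split any further. Finally, the lowest-level splits are re-examined, and subsets that contribute nothing significant to the model are removed, that is, pruned.
Advantages: C5.0 is robust when data are missing or there are many input fields; it is easier to understand than some other model types, and the rules it derives have intuitive interpretations; it also offers powerful techniques for improving classification accuracy.
Splitting criterion: C5.0 uses the reduction in information entropy to choose the best splitting variable and split threshold.
(2) CART: CART (Classification And Regression Tree) uses binary recursive partitioning, splitting the current sample set into two subsets so that every non-leaf node has exactly two branches. The resulting tree is therefore a simple binary tree. CART examines every variable and every possible split value to find the best split. For a discrete attribute with values $\{x, y, z\}$, there are three possible splits, $\{x,y\} \mid \{z\}$, $\{x,z\} \mid \{y\}$ and $\{y,z\} \mid \{x\}$, excluding the split into the empty set and the full set. For continuous attributes CART introduces split points: if an attribute takes $n$ distinct sorted values $a_1 < a_2 < \dots < a_n$, there are $n-1$ candidate split points, each the midpoint of two adjacent values, $(a_i + a_{i+1})/2$. All candidate splits of every attribute are ranked by how much they reduce impurity (the heterogeneity, or mix of different classes, within a node). CART usually applies post-pruning: branches are cut from the fully grown tree by deleting the subtrees below nodes, and the lowest unpruned nodes become leaves.
(3) CHAID: CHAID (Chi-square Automatic Interaction Detection) automatically searches among many independent variables for the one that produces the greatest difference. CHAID can build non-binary trees, in which some splits have more than two branches. A CHAID model requires a single target and one or more input fields; weight and frequency fields can also be specified. CHAID builds a classification tree by using the chi-square statistic to determine the best split. Starting from the dependent variable at the root node, it splits on each independent variable (which must be categorical or ordinal, i.e. discrete; continuous variables such as age or income must first be recoded as categorical or ordinal) and computes the chi-square statistic of each split. If the splits on several variables are significant, their significance levels (p-values) are compared and the most significant split is used to form the child nodes. CHAID automatically merges categories of an independent variable to maximize significance. Each final leaf node corresponds to one segment (a market segment, in marketing applications).

4.2 Model Building
(1) From the Modeling palette, add C5.0, C&R Tree and CHAID nodes to the stream and set the main parameters of each algorithm.
Figure: Model tab of the C5.0 node
Figure: Analyze tab of the C5.0 node
Figure: Build Options tab of the C&R Tree node (1)
Figure: Build Options tab of the C&R Tree node (2)
Figure: Build Options tab of the C&R Tree node (3)
Figure: Build Options tab of the C&R Tree node (4)
Figure: Build Options tab of the C&R Tree node (6)
Figure: Build Options tab of the CHAID node (1)
Figure: Build Options tab of the CHAID node (2)
Figure: Build Options tab of the CHAID node (3)
Figure: Build Options tab of the CHAID node (4)
Figure: Build Options tab of the CHAID node (5)
(2) The resulting stream is shown in the figure below.
Figure: Stream for animal classification

4.3 Model Results
The results of each algorithm are shown in text form and in graphical (tree) form:
Figure: C5.0 results, text view
Figure: C5.0 results, tree view
Figure: CART results, text view
Figure: CART results, tree view
Figure: CHAID results, text view
Figure: CHAID results, tree view

4.4 Analysis of Model Results
(1) Analysis of the C5.0 model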
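Section 4.3 presents the C5.0 tree only as figures. As a hedged illustration, the rule path this subsection reads off that tree can be transcribed into nested Python conditions and checked against a few records copied from Appendix 1. The names `parse_record`, `c50_rule_path` and `SAMPLES` are hypothetical; this is a hand transcription of the reported rules, not code exported from Modeler and not an implementation of the C5.0 algorithm itself.

```python
from typing import Dict, Union

# Field order of a zoo.data record (Attribute Information, items 1 to 18).
FIELDS = ["animal_name", "hair", "feathers", "eggs", "milk", "airborne",
          "aquatic", "predator", "toothed", "backbone", "breathes",
          "venomous", "fins", "legs", "tail", "domestic", "catsize", "type"]

Record = Dict[str, Union[str, int]]


def parse_record(line: str) -> Record:
    """Split one comma-separated record; every field after the name becomes an int."""
    values = line.strip().split(",")
    record: Record = {"animal_name": values[0]}
    record.update({f: int(v) for f, v in zip(FIELDS[1:], values[1:])})
    return record


def c50_rule_path(a: Record) -> int:
    """Hypothetical hand transcription of the reported C5.0 rule path."""
    if a["feathers"]:
        return 2                        # feathers -> type 2
    if a["backbone"]:
        if a["milk"]:
            return 1                    # backbone and milk -> type 1
        if a["fins"]:
            return 4                    # backbone, no milk, fins -> type 4
        return 3 if a["tail"] else 5    # tail -> type 3, no tail -> type 5
    if a["airborne"]:
        return 6                        # no backbone, airborne -> type 6
    if a["predator"]:
        return 7                        # no backbone, predator -> type 7
    return 7 if a["legs"] == 0 else 6   # legs 0 -> type 7; 2/4/5/6/8 -> type 6


# Records copied from Appendix 1; the last field is the labelled type.
SAMPLES = [
    "aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1",
    "bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4",
    "chicken,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2",
    "pitviper,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,3",
    "toad,0,0,1,0,0,1,0,1,1,1,0,0,4,0,0,0,5",
    "gnat,0,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6",
    "slug,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7",
]

if __name__ == "__main__":
    for line in SAMPLES:
        rec = parse_record(line)
        print(rec["animal_name"], "labelled", rec["type"],
              "predicted", c50_rule_path(rec))
```

Run as a script, it prints the labelled and predicted type of each sample, and all seven agree. Applied to the newt record in Appendix 1 (which has a tail), the transcribed rules return type 3 rather than the labelled type 5.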
The model identifies ten influencing factors, including feathers, tail, backbone, milk, fins, legs, predator and airborne. feathers is the most important attribute, while legs, predator and fins are of little importance. To classify an animal, therefore, first check whether it has feathers.
If it has feathers, it belongs to type 2 regardless of the other factors. If it has no feathers, check whether it has a backbone. With a backbone: if it has milk, it is type 1; without milk, it is type 4 if it has fins; without fins, it is type 3 if it has a tail and type 5 if it does not.
Without a backbone: if it is airborne, it is type 6; if not, it is type 7 if it is a predator; if it is not a predator either, it is type 7 when legs is 0 and type 6 when legs is 2, 4, 5, 6 or 8.
(2) Analysis of the CART model
The model identifies three influencing factors: feathers, legs and airborne. feathers is the most important; the other attributes are far less important than feathers.
If an animal has feathers, it belongs to type 2 regardless of the other factors; if it has no feathers, it is assigned to type 1.
(3) Analysis of the CHAID model
The model identifies ten influencing factors, including legs, hair, aquatic, fins and toothed. legs is the most important, and fins and toothed are the least important.
When the number of legs is 0: an animal with hair is type 1 regardless of the other factors. Without hair, an animal without teeth is type 7; with teeth, it is type 3 if it has no fins and type 4 if it has fins.
When the number of legs is 2: an animal without hair is type 2 regardless of the other factors, and an animal with hair is type 1.
When the number of legs is 4: an animal with hair is type 1 regardless of the other factors. Without hair, a non-aquatic animal is type 3; an aquatic animal is type 7 if it has no teeth and type 5 if it has teeth.
When the number of legs is 5 or 8: the animal is type 7 regardless of the other factors.
When the number of legs is 6: a non-aquatic animal is type 6, and an aquatic animal is type 7.

5 Model Evaluation
From the Output palette of the node toolbox, add an Analysis node and connect it to the generated model nodes. Run the Analysis node to obtain the evaluation results.
Figure: Analysis results for C5.0
Figure: Analysis results for CART
Figure: Analysis results for CHAID
The C5.0 and CHAID models reach prediction accuracies of 98.75% and 100% respectively, which is satisfactory. The CART model reaches only 51.25%, which is unsatisfactory; this is consistent with its tree, described in 4.4 (2), assigning every animal to either type 1 or type 2, so it can never correctly classify the other five types.

6 Summary
Data mining applies well beyond this task; for example, analyzing audit data to summarize normal patterns for anomaly detection helps improve the accuracy and completeness of intrusion detection systems. This course design used decision-tree classification with three decision-tree algorithms, C5.0, CART and CHAID. Their results and prediction accuracies differ, which shows that each data mining method has its own emphasis. For real-world data mining, a single method is unlikely to give satisfactory results; several methods should be combined so that the strengths of each offset the weaknesses of the others.
Through this course design I gained an overall understanding of data mining techniques. I ran into various problems while building the models, but by thinking them through carefully and consulting references I completed the project, which has laid a good foundation for further study of data mining.

Appendix 1: zoo.data
aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
calf,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
carp,0,0,1,0,0,1,0,1,1,0,0,1,0,1,1,0,4
catfish,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
cavy,1,0,0,1,0,0,0,1,1,1,0,0,4,0,1,0,1
cheetah,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
chicken,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2
chub,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
clam,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,7
crab,0,0,1,0,0,1,1,0,0,0,0,0,4,0,0,0,7
crayfish,0,0,1,0,0,1,1,0,0,0,0,0,6,0,0,0,7
crow,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,0,2
deer,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
dogfish,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4
dolphin,0,0,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1
dove,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2
duck,0,1,1,0,1,1,0,0,1,1,0,0,2,1,0,0,2
elephant,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
flamingo,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,1,2
flea,0,0,1,0,0,0,0,0,0,1,0,0,6,0,0,0,6
frog,0,0,1,0,0,1,1,1,1,1,0,0,4,0,0,0,5
frog,0,0,1,0,0,1,1,1,1,1,1,0,4,0,0,0,5
fruitbat,1,0,0,1,1,0,0,1,1,1,0,0,2,1,0,0,1
giraffe,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
girl,1,0,0,1,0,0,1,1,1,1,0,0,2,0,1,1,1
gnat,0,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6
goat,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
gorilla,1,0,0,1,0,0,0,1,1,1,0,0,2,0,0,1,1
gull,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2
haddock,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4
hamster,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,0,1
hare,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,0,1
hawk,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,0,2
herring,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
honeybee,1,0,1,0,1,0,0,0,0,1,1,0,6,0,1,0,6
housefly,1,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6
kiwi,0,1,1,0,0,0,1,0,1,1,0,0,2,1,0,0,2
ladybird,0,0,1,0,1,0,1,0,0,1,0,0,6,0,0,0,6
lark,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2
leopard,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
lion,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
lobster,0,0,1,0,0,1,1,0,0,0,0,0,6,0,0,0,7
lynx,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
mink,1,0,0,1,0,1,1,1,1,1,0,0,4,1,0,1,1
mole,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,0,1
mongoose,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
moth,1,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6
newt,0,0,1,0,0,1,1,1,1,1,0,0,4,1,0,0,5
octopus,0,0,1,0,0,1,1,0,0,0,0,0,8,0,0,1,7
opossum,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,0,1
oryx,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
ostrich,0,1,1,0,0,0,0,0,1,1,0,0,2,1,0,1,2
parakeet,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2
penguin,0,1,1,0,0,1,1,0,1,1,0,0,2,1,0,1,2
pheasant,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2
pike,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4
piranha,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
pitviper,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,3
platypus,1,0,1,1,0,1,1,0,1,1,0,0,4,1,0,1,1
polecat,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
pony,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
porpoise,0,0,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1
puma,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
pussycat,1,0,0,1,0,0,1,1,1,1,0,0,4,1,1,1,1
raccoon,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
reindeer,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
rhea,0,1,1,0,0,0,1,0,1,1,0,0,2,1,0,1,2
scorpion,0,0,0,0,0,0,1,0,0,1,1,0,8,1,0,0,7
seahorse,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4
seal,1,0,0,1,0,1,1,1,1,1,0,1,0,0,0,1,1
sealion,1,0,0,1,0,1,1,1,1,1,0,1,2,1,0,1,1
seasnake,0,0,0,0,0,1,1,1,1,0,1,0,0,1,0,0,3
seawasp,0,0,1,0,0,1,1,0,0,0,1,0,0,0,0,0,7
skimmer,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2
skua,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2
slowworm,0,0,1,0,0,0,1,1,1,1,0,0,0,1,0,0,3
slug,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7
sole,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4
sparrow,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2
squirrel,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,0,1
starfish,0,0,1,0,0,1,1,0,0,0,0,0,5,0,0,0,7
stingray,0,0,1,0,0,1,1,1,1,0,1,1,0,1,0,1,4
swan,0,1,1,0,1,1,0,0,1,1,0,0,2,1,0,1,2
termite,0,0,1,0,0,0,0,0,0,1,0,0,6,0,0,0,6
toad,0,0,1,0,0,1,0,1,1,1,0,0,4,0,0,0,5
tortoise,0,0,1,0,0,0,0,0,1,1,0,0,4,1,0,1,3
tuatara,0,0,1,0,0,0,1,1,1,1,0,0,4,1,0,0,3
tuna,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4
vampire,1,0,0,1,1,0,0,1,1,1,0,0,2,1,0,0,1
vole,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,0,1
vulture,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,1,2
wallaby,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,1,1
wasp,1,0,1,0,1,0,0,0,0,1,1,0,6,0,0,0,6
wolf,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
worm,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7
wren,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2

Appendix 2: zoo.names
1. Title: Zoo database
2. Source Information
   - Creator: Richard Forsyth
   - Donor: Richard S. Forsyth, 8 Grosvenor Avenue, Mapperley Park, Nottingham NG3 5DX, 0602-621676
   - Date: 5/15/1990
3. Past Usage:
   - None known other than what is shown in Forsyth's PC/BEAGLE User's Guide.
4. Relevant Information:
   - A simple database containing 17 Boolean-valued attributes. The "type" attribute appears to be the class attribute. Here is a breakdown of which animals are in which type: (I find it unusual that there are 2 instances of "frog" and one of "girl"!)
   Class# Set of animals:
   1 (41) aardvark, antelope, bear, boar, buffalo, calf, cavy, cheetah, deer, dolphin, elephant, fruitbat, giraffe, girl, goat, gorilla, hamster, hare, leopard, lion, lynx, mink, mole, mongoose, opossum, oryx, platypus, polecat, pony, porpoise, puma, pussycat, raccoon, reindeer, seal, sealion, squirrel, vampire, vole, wallaby, wolf
   2 (20) chicken, crow, dove, duck, flamingo, gull, hawk, kiwi, lark, ostrich, parakeet, penguin, pheasant, rhea, skimmer, skua, sparrow, swan, vulture, wren
   3 (5) pitviper, seasnake, slowworm, tortoise, tuatara
   4 (13) bass, carp, catfish, chub, dogfish, haddock, herring, pike, piranha, seahorse, sole, stingray, tuna
   5 (4) frog, frog, newt, toad
   6 (8) flea, gnat, honeybee, housefly, ladybird, moth, termite, wasp
   7 (10) clam, crab, crayfish, lobster, octopus, scorpion, seawasp, slug, starfish, worm
5. Number of Instances: 101
6. Number of Attributes: 18 (animal name, 15 Boolean attributes, 2 numerics)
7. Attribute Information: (name of attribute and type of value domain)
   1. animal name: Unique for each instance
   2. hair: Boolean
   3. feathers: Boolean
   4. eggs: Boolean
   5. milk: Boolean
   6. airborne: Boolean
   7. aquatic: Boolean
   8. predator: Boolean
   9. toothed: Boolean
   10. backbone: Boolean
   11. breathes: Boolean
   12. venomous: Boolean
   13. fins: Boolean
   14. legs: Numeric (set of values: {0,2,4,5,6,8})
   15. tail: Boolean
   16. domestic: Boolean
   17. catsize: Boolean
   18. type: Numeric (integer values in range [1,7])
8. Missing Attribute Values: None
9. Class Distribution: Given above
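The report performs loading (2.2), clean-up (2.3) and 0/1 recoding (3.1) with Modeler nodes. As a tool-independent sketch only, the Python below applies the same three steps to records in the layout described in zoo.names above (animal name, 15 Boolean fields, legs, type). The function names `load_zoo`, `drop_bad_records` and `recode_booleans` and the four-line sample are hypothetical; the report does not say which frog record it removed, so keeping the first one is an assumption.

```python
import csv
import io
from typing import Dict, List, Union

# Column order follows "Attribute Information" in zoo.names.
FIELDS = ["animal_name", "hair", "feathers", "eggs", "milk", "airborne",
          "aquatic", "predator", "toothed", "backbone", "breathes",
          "venomous", "fins", "legs", "tail", "domestic", "catsize", "type"]
NUMERIC = {"legs", "type"}  # the two non-Boolean attributes

Record = Dict[str, Union[str, int]]


def load_zoo(text: str) -> List[Record]:
    """Parse zoo.data-style lines; every field after the name becomes an int."""
    records: List[Record] = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # tolerate blank lines
        rec: Record = {"animal_name": row[0]}
        for name, value in zip(FIELDS[1:], row[1:]):
            rec[name] = int(value)
        records.append(rec)
    return records


def drop_bad_records(records: List[Record]) -> List[Record]:
    """Mirror step 2.3: drop 'girl' and keep only the first record of each name."""
    seen = set()
    kept: List[Record] = []
    for rec in records:
        name = rec["animal_name"]
        if name == "girl" or name in seen:
            continue
        seen.add(name)
        kept.append(rec)
    return kept


def recode_booleans(records: List[Record]) -> List[Record]:
    """Mirror step 3.1: map Boolean 0/1 to 'No'/'Yes'; legs and type stay numeric."""
    out: List[Record] = []
    for rec in records:
        new = dict(rec)
        for name in FIELDS[1:]:
            if name not in NUMERIC:
                new[name] = "Yes" if rec[name] == 1 else "No"
        out.append(new)
    return out


# Four records copied from Appendix 1, including both frogs and the girl.
SAMPLE = """aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
frog,0,0,1,0,0,1,1,1,1,1,0,0,4,0,0,0,5
frog,0,0,1,0,0,1,1,1,1,1,1,0,4,0,0,0,5
girl,1,0,0,1,0,0,1,1,1,1,0,0,2,0,1,1,1
"""

if __name__ == "__main__":
    for rec in recode_booleans(drop_bad_records(load_zoo(SAMPLE))):
        print(rec["animal_name"], rec["hair"], rec["legs"], rec["type"])
```

On the sample it prints `aardvark Yes 4 1` and `frog No 4 5`. Passing the full contents of zoo.data instead of `SAMPLE` should leave 99 records, matching the count reported in 2.5.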