Graduation Thesis Foreign Literature Translation (Chinese-English), Computer Science and Technology: Preprocessing and Mining Web Log Data for Web Personalization


Nanjing University of Science and Technology, Taizhou Institute of Science and Technology
Graduation Design (Thesis) Foreign-Language Material Translation

Department: Computer Science and Technology
Major: Computer Science and Technology
Name:          Student ID:
Source of the foreign material: Dipartimento di Informatica, Universita di Pisa
Attachments: 1. Translated text of the foreign material; 2. The foreign-language original.
Advisor's comments:          Signature:     Date:
Note: please bind this cover page together with the attachments.

Attachment 1: Translated text of the foreign material (a Chinese rendering of the paper reproduced in full in Attachment 2)

Attachment 2: The foreign-language original

Preprocessing and Mining Web Log Data for Web Personalization

M. Baglioni (1), U. Ferrara (2), A. Romei (1), S. Ruggieri (1), and F. Turini (1)

(1) Dipartimento di Informatica, Universita di Pisa, Via F. Buonarroti 2, 56125 Pisa, Italy
{baglioni,romei,ruggieri,turini}@di.unipi.it

(2) KSolutions S.p.A., Via Lenin 132/26, 56017 S. Martino Ulmiano (PI), Italy
ferrara@ksolutions.it

Abstract. We describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of the navigational behaviour of a web site's users. The models are inferred from the access logs of a web server by means of data and web mining techniques. The extracted knowledge is deployed to the purpose of offering a personalized and proactive view of the web services to users. We first describe the preprocessing steps on access logs necessary to clean, select and prepare data for knowledge extraction. Then we show two sets of experiments: the first one tries to predict the sex of a user based on the visited web pages, and the second one tries to predict whether a user might be interested in visiting a section of the site.

Keywords: knowledge discovery, web mining, classification.

1 Introduction

According to [10], Web Mining is the use of data mining techniques to automatically discover and extract information from web documents and services. A common taxonomy of web mining defines three main research lines: content mining, structure mining and usage mining.
The distinction between those categories is not a clear cut, and very often approaches use a combination of techniques from different categories.

Content mining covers data mining techniques to extract models from web object contents, including plain text, semi-structured documents (e.g., HTML or XML), structured documents (digital libraries), dynamic documents, and multimedia documents. The extracted models are used to classify web objects, to extract keywords for use in information retrieval, and to infer the structure of semi-structured or unstructured objects.

Structure mining aims at finding the underlying topology of the interconnections between web objects. The model built can be used to categorize and to rank web sites, and also to find out similarity between them.

Usage mining is the application of data mining techniques to discover usage patterns from web data. Data is usually collected from users' interaction with the web, e.g. web/proxy server logs, user queries, registration data. Usage mining tools [3,4,9,15] discover and predict user behavior, in order to help the designer to improve the web site, to attract visitors, or to give regular users a personalized and adaptive service.

In this paper, we describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of the navigational behavior of users for the purpose of web site personalization [6]. We have collected and preprocessed access logs from a medium-large national web portal, vivacity.it, over a period of five months. The portal includes a national area (www.vivacity.it) with news, forums, jokes, etc., and more than 30 local areas (e.g., www.roma.vivacity.it) with city-specific information, such as local news, restaurant addresses, theatre programming, bus timetables, etc.

The preprocessing steps include data selection, cleaning and transformation, and the identification of users and of user sessions [2].
The result of preprocessing is a data mart of web accesses and registration information. Starting from preprocessed data, web mining aims at pattern discovery by adapting methods from statistics, data mining, machine learning and pattern recognition. Among the basic data mining techniques [7], we mention association rules, discovering groups of objects that are frequently requested together by users; clustering, grouping users with similar browsing patterns, or grouping objects with similar content or access patterns; classification, where a profile is built for users belonging to a given class or category; and sequential patterns, namely sequences of requests which are common for many users.

In the ClickWorld project, several of the mentioned methods are currently being used to extract useful information for proactive personalization of web sites. In this paper, we describe two sets of classification experiments. The first one aims at extracting a classification model able to discriminate the sex of a user based on the set of web pages visited. The second experiment aims at extracting a classification model able to discriminate those users that visit pages regarding e.g. sport or finance from those that typically do not.

2 Preprocessing for Web Personalization

We have developed a data mart of web logs specifically to support web personalization analysis. The data mart is populated starting from a web log data warehouse (such as those described in [8,16]) or, more simply, from raw web/proxy server log files.
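As a rough illustration of the first experiment above (discriminating the sex of a user from the set of pages visited), a minimal classifier over page-category sets can be sketched as follows. The choice of naive Bayes, the training records, the category names and the labels are all illustrative assumptions; the paper does not state here which classification algorithm ClickWorld actually used.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical training data: (set of visited page categories, sex label).
# Category names and labels are illustrative, not taken from ClickWorld data.
TRAIN = [
    ({"news", "finance"}, "M"),
    ({"news", "sport"}, "M"),
    ({"shopping", "photo"}, "F"),
    ({"shopping", "jokes"}, "F"),
]

def train_nb(records):
    """Collect class priors and per-class category counts."""
    class_counts = Counter(label for _, label in records)
    cat_counts = defaultdict(Counter)
    vocab = set()
    for cats, label in records:
        vocab |= cats
        cat_counts[label].update(cats)
    return class_counts, cat_counts, vocab

def predict(model, cats):
    """Score each class in log space with add-one smoothing; return the best."""
    class_counts, cat_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = log(n / total)
        denom = sum(cat_counts[label].values()) + len(vocab)
        for c in cats:
            score += log((cat_counts[label][c] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train_nb(TRAIN)
```

In practice such a model would be trained on the registered users (whose sex is known from the registration form) and applied to the anonymous ones.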
In this section, we describe a number of preprocessing and coding steps performed for data selection, comprehension, cleaning and transformation. While some of them are general data preparation steps for web usage mining [2,16], it is worth noting that in many of them a form of domain knowledge must necessarily be included in order to clean, correct and complete the input data according to the web personalization requirements.

2.1 User registration data

In addition to web access logs, our given input includes personal data on a subset of users, namely those who are registered to the vivacity.it website (registration is not mandatory). For a registered user, the system records the following information: sex, city, province, civil status, birth date. This information is provided by the user in a web form at the time of registration and, as one could expect, the quality of the data is up to the user's fairness. As a preprocessing step, improbable data are detected and removed, such as birth dates in the future or in the remote past. Also, some additional input fields were not imported into the data mart, since almost all values were left as the default choice in the web form. In other words, those fields were considered not to be useful in discriminating user choices and preferences.

In order to avoid forcing users to type in their login and password at each visit, the vivacity.it web site adopts cookies. If a cookie is provided by the user's browser, then authentication is not required. Otherwise, after authentication, a new cookie is sent to the user's browser. With this mechanism, it is possible to track any user unless she deletes the cookies on her system. In addition, if the user is registered, the association login-cookie is available in the input data, and then it is possible to track the user also after she deletes the cookies. This mechanism allows for detecting non-human users, such as system diagnosis and monitoring programs.
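The non-human-user check described above amounts to counting distinct cookies per registered login and flagging logins whose count is implausibly high. A minimal sketch follows; the threshold value and the sample observations are assumptions for illustration only.

```python
from collections import Counter

# Assumed threshold: no ordinary human user accumulates this many cookies.
BOT_THRESHOLD = 1000

def suspicious_logins(login_cookie_pairs, threshold=BOT_THRESHOLD):
    """login_cookie_pairs: iterable of (login, cookie_id) observations
    from the access log. Returns the set of logins whose number of
    distinct cookies exceeds the threshold."""
    counts = Counter()
    seen = set()
    for login, cookie in login_cookie_pairs:
        if (login, cookie) not in seen:  # count each distinct cookie once
            seen.add((login, cookie))
            counts[login] += 1
    return {login for login, n in counts.items() if n > threshold}
```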
By checking the number of cookies assigned to each user, we discovered that the user login test009 was assigned more than 24,000 distinct cookies. This is possible only if the user is some program that automatically deletes assigned cookies, e.g. a system diagnosis program.

2.2 Web URL

Resources in the World Wide Web are uniformly identified by means of URLs (Uniform Resource Locators). The syntax of an http URL is:

http://host.domain:port/abs_path?query

where host.domain:port is the name of the server site. The TCP/IP port is optional (the default port is 80), and abs_path is the absolute path of the requested resource in the server filesystem. We further consider abs_path of the form path/filename.extension, i.e. consisting of the filesystem path, filename and file extension. query is an optional collection of parameters, to be passed as input to a resource that is actually an executable program, e.g. a CGI script.

On the one side, there are a number of normalizations that must be performed on URLs in order to remove irrelevant syntactic differences (e.g., the host can be in IP format or host format: 131.114.2.91 is the same host as kdd.di.unipi.it). On the other side, there are some web server programs that adopt non-standard formats for passing parameters. The vivacity.it web server program is one of them. For instance, in the following URL:

http://roma.vivacity.it/speciali/EditColonnaSpeciale/1,3478,|DX,00.html

the file name 1,3478,|DX,00 contains a code for the local web site (1 stands for roma.vivacity.it), a web page id (3478) and its specific parameters (DX). The form above has been designed for efficient machine processing. For instance, the web page id is a key for a database table where the page template is found, while the parameters allow for retrieving the web page content in some other table.
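The non-standard filename format above can be reverse-engineered with a small parser. This sketch is based solely on the single example given in the paper, so the exact grammar of the fields (in particular the `|`-prefixed parameter token and the trailing `00`) is an assumption.

```python
import re
from urllib.parse import urlparse

# Assumed pattern "<site code>,<page id>,<params>,00.html", following the
# paper's example 1,3478,|DX,00.html on roma.vivacity.it.
FILENAME_RE = re.compile(r"^(\d+),(\d+),(.*),00\.html$")

def parse_vivacity_url(url):
    """Extract site code, page id and parameters from a vivacity.it URL,
    or return None if the filename does not follow the format."""
    path = urlparse(url).path
    m = FILENAME_RE.match(path.rsplit("/", 1)[-1])
    if not m:
        return None
    site_code, page_id, params = m.groups()
    return {"site_code": int(site_code),
            "page_id": int(page_id),
            "params": params.strip("|")}
```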
Unfortunately, this is a nightmare when mining clickstreams of URLs. Syntactic features of URLs are of little help: we need some semantic information, or ontology [5,13], assigned to URLs. At best, we can expect that an application-level log is available, i.e. a log of accesses to semantically relevant objects. An example of an application-level log is one recording that the user entered the site from the home page, then visited a sport page with news on a soccer team, and so on. This would require a system module monitoring user steps at a semantic level of granularity. In the ClickWorld project such a module is called ClickObserve. Unfortunately, however, the module is a deliverable of the project, and it was not available for collecting data at the beginning of the project. Therefore, we decided to extract both syntactic and semantic information from URLs via a semi-automatic approach. The adopted approach consists in reverse-engineering URLs, starting from the web site designer's description of the meaning of each URL path, web page id and web page parameters.
Using a PERL script, starting from the designer's description, we extracted the following information from the original URLs:

- the local web server (i.e., vivacity.it, roma.vivacity.it, etc.), which provides us with some spatial information about user interests;
- a first-level classification of URLs into 24 types, some of which are: home, news, finance, photo galleries, jokes, shopping, forum, pubs;
- a second-level classification of URLs depending on the first-level one, e.g. URLs classified as shopping may be further classified as book shopping or pc shopping and so on;
- a third-level classification of URLs depending on the second-level one, e.g. URLs classified as book shopping may be further classified as programming book shopping or narrative book shopping and so on;
- a parameter information, further detailing the three-level classification, e.g. URLs classified as programming book shopping may have the ISBN book code as parameter;
- the depth of the classification, i.e. 1 if the URL has only a first-level classification, 2 if the URL has first- and second-level classifications, and so on.

Of course, the adopted approach was mainly a heuristic one, with the hierarchical ontology designed a posteriori. Also, the designed ontology does not exploit any content-based classification, i.e. the description of an elementary object such as a sport news item with id 12345 is its code (i.e., first level is news, second level is sport, parameter information 12345), with no reference to the content of the news (was the news reporting about any specific player?).
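The three-level classification and depth extraction described above can be sketched as a lookup over a page-id taxonomy. The concrete mapping below is hypothetical: the real one was derived from the site designers' descriptions via the PERL script, and the page ids and category names here are invented for illustration.

```python
# Hypothetical taxonomy: page id -> (first, second, third) level categories;
# None marks a level that is absent for that page type.
TAXONOMY = {
    3478: ("news", "sport", None),
    9001: ("shopping", "book shopping", "programming book shopping"),
    1200: ("home", None, None),
}

def classify(page_id, parameter=None):
    """Return the classification levels, the optional parameter (e.g. an
    ISBN code) and the depth of the classification, or None if unknown."""
    levels = TAXONOMY.get(page_id)
    if levels is None:
        return None
    present = [level for level in levels if level is not None]
    return {"levels": present, "parameter": parameter, "depth": len(present)}
```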
