畢業(yè)設(shè)計(jì)論文 外文文獻(xiàn)翻譯 中英文對(duì)照 計(jì)算機(jī)科學(xué)與技術(shù) 預(yù)處理和挖掘Web日志數(shù)據(jù)網(wǎng)站個(gè)性化


Nanjing University of Science and Technology, Taizhou Institute of Science and Technology
Graduation Project (Thesis) Foreign Literature Translation

Department: Computer Science and Technology
Major: Computer Science and Technology
Name:
Student ID:
Source of the original text: Dipartimento di Informatica, Universita di Pisa
Attachments: 1. Translation of the foreign text; 2. Original foreign text.
Advisor's comments:            Signature:            Date:
Note: please bind this cover page together with the attachments.

Attachment 1: Translation of the foreign text

Preprocessing and Mining Web Log Data for Web Personalization

Abstract: We describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of the navigational behaviour of a web site's users. The models are inferred from the access logs of a web server by means of data and web mining techniques. The extracted knowledge is deployed for offering personalized and proactive web services to users. First, we describe the preprocessing steps on access logs that are necessary to clean, select and prepare the data for knowledge extraction. Then we show two sets of experiments: the first tries to predict the sex of a user based on the visited web pages; the second tries to predict whether a user might be interested in visiting a section of the site.

Keywords: knowledge discovery, web mining, classification.

1 Introduction

Web mining is the use of data mining techniques to automatically discover and extract information from web documents and services. A common taxonomy of web mining distinguishes three main research lines: content mining, structure mining and usage mining. There is no sharp boundary between these categories, and approaches very often combine techniques from different categories. Content mining covers data mining techniques that extract models from the contents of web objects, including plain text, semi-structured documents (e.g., HTML or XML), structured documents (digital libraries), dynamic documents and multimedia documents. The extracted models are used to classify web objects, to extract keywords for information retrieval, and to infer the structure of semi-structured or unstructured objects. Structure mining aims at discovering the underlying topology of the interconnections between web objects. The models built can be used to categorize and rank web sites, and to find similarities between them. Usage mining is the application of data mining techniques to discover usage patterns from web data. The data is usually collected from users' interactions with the web, e.g. web/proxy server logs, user queries and registration data. Usage mining tools discover and predict user behavior, in order to help designers improve the web site, to attract visitors, or to give regular users a personalized and adaptive service.

In this paper, we describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of user behavior for the purpose of web site personalization. We collected and preprocessed access logs from the medium-large national portal vivacity.it over a period of five months. The portal includes a national area (www.vivacity.it) with news, forums, jokes, etc., and more than 30 local areas (e.g., www.roma.vivacity.it) with city-specific information, such as local news, restaurant addresses, theatre programming and bus timetables. The preprocessing steps include data selection, cleaning and transformation, and the identification of users and user sessions. The result of preprocessing is a data mart of web accesses and registration information. Starting from the preprocessed data, web mining aims at pattern discovery by adapting methods from statistics, data mining, machine learning and pattern recognition. Among the basic data mining techniques, we mention association rules, which discover groups of objects that are frequently requested together by users; clustering, which groups users with similar browsing patterns, or groups objects with similar content or access patterns; classification, where a profile is built for the users belonging to a given class or category; and sequential patterns, namely sequences of requests that are common to many users. In the ClickWorld project, several of the above methods are currently being used to extract information useful for the proactive personalization of web sites. In this paper, we describe two sets of classification experiments. The first aims at extracting a classification model able to discriminate the sex of a user based on the set of web pages visited. The second aims at extracting a classification model able to discriminate the users who visit pages about, e.g., sport or finance from those who typically do not.

2 Preprocessing for Web Personalization

We have developed a data mart of web logs specifically to support web personalization analysis. The data mart is populated from a web log data warehouse, such as the ones described in the literature, or, more simply, from raw web/proxy server log files. In this section, we describe a number of preprocessing and coding steps performed for data selection, comprehension, cleaning and transformation. While some of them are general data preparation steps for web usage mining, it is worth noting that in many of them a form of domain knowledge must be included in order to clean, correct and complete the input data according to the web personalization requirements.

2.1 User registration data

In addition to the web access logs, the input we are given includes personal data on a subset of the users, namely those who registered to the vivacity.it website (registration is not mandatory). For a registered user, the system records the following information: sex, city, province, civil status and birth date. This information is provided by the user in a web form at registration time and, as one could expect, the quality of the data is up to the user's fairness. As a preprocessing step, improbable data are detected and removed, such as birth dates in the future or in the remote past. Moreover, some additional input fields were not imported into the data mart, because almost all their values had been left as the default choice of the web form. In other words, those fields were considered not useful for discriminating user choices and preferences.

In order to avoid users having to type their login and password at each visit, the vivacity.it web site adopts cookies. If a cookie is provided by the user's browser, then authentication is not required. Otherwise, after authentication, a new cookie is sent to the user's browser. With this mechanism, a user can be tracked as long as she does not delete the cookies on her system. Moreover, if the user is registered, the association between login and cookie is available in the input data, and the user can then be tracked even after she deletes the cookies. This mechanism also makes it possible to detect non-human users, such as system diagnosis and monitoring programs. By checking the number of cookies assigned to each user, we discovered that the user login test009 had been assigned more than 24,000 distinct cookies. This is possible only if the user is some program that automatically deletes the assigned cookies, e.g. a system diagnosis program.

2.2 Web URLs

On the one side, a number of normalizations must be performed on URLs in order to remove irrelevant syntactic differences. For example, the host can be in IP format or in hostname format: 131.114.2.91 is the same host as kdd.di.unipi.it. On the other side, some web server programs adopt non-standard formats for passing parameters, and the vivacity.it web server program is one of them. For instance, in the URL http://roma.vivacity.it/speciali/EditColonnaSpeciale/1,3478,|DX,00.html the file name 1,3478,|DX,00 contains a code for the local web site, a web page id (3478) and its specific parameters (DX). This form was designed for efficient machine processing. For instance, the web page id is a key for a database table where the page template is found, while the parameters allow the web page content to be retrieved from some other table. Unfortunately, this is a nightmare when mining clickstreams of URLs. The syntactic features of URLs are of little help: we need some semantic information, or ontology, assigned to the URLs. At best, we can expect an application-level log to be available, i.e. a log of accesses to semantically relevant objects. For example, an application-level log would record that the user entered the site from the home page, then visited a sport page with news about a soccer team, and so on. This would require a system module that monitors user steps at a semantic level of granularity. In the ClickWorld project such a module is called ClickObserve. Unfortunately, however, that module is a deliverable of the project, and it was not available for collecting data at the beginning of the project. Therefore, we decided to extract both syntactic and semantic information from the URLs through a semi-automatic approach. The approach consists in reverse-engineering the URLs, starting from the web site designer's description of the meaning of each URL path, web page id and web page parameters. Using a PERL script, starting from the designer's description we extracted the following information from the original URLs: the local web server (i.e., vivacity.it or roma.vivacity.it, etc.), which gives us some spatial information about user interests; a first-level classification of URLs into 24 types, some of which are: home, news, finance, photos, jokes, shopping, forum, pubs; a second-level classification of URLs depending on the first-level one, e.g. URLs classified as shopping can be further classified as book shopping, PC shopping and so on; a third-level classification of URLs depending on the second-level one, e.g. URLs classified as book shopping can be further classified as programming book shopping, narrative book shopping and so on; parameter information, further detailing the three-level classification, e.g. URLs classified as programming book shopping may have the ISBN book code as a parameter; and the depth of the classification, i.e. 1 if the URL has only a first-level classification, 2 if it has a first- and a second-level classification, and so on. Of course, the adopted approach is mainly a heuristic one, with the hierarchical ontology designed a posteriori. Moreover, the designed ontology does not exploit any content-based classification: the description of, say, the sport news item with code 12345 is its code (i.e., first level news), with no reference to the content of the news.

Attachment

2: Original foreign text

Preprocessing and Mining Web Log Data for Web Personalization

M. Baglioni (1), U. Ferrara (2), A. Romei (1), S. Ruggieri (1), and F. Turini (1)

(1) Dipartimento di Informatica, Universita di Pisa, Via F. Buonarroti 2, 56125 Pisa, Italy
{baglioni,romei,ruggieri,turini}@di.unipi.it
(2) KSolutions S.p.A., Via Lenin 132/26, 56017 S. Martino Ulmiano (PI), Italy
ferrara@ksolutions.it

Abstract. We describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of the navigational behaviour of a web site's users. The models are inferred from the access logs of a web server by means of data and web mining techniques. The extracted knowledge is deployed to the purpose of offering a personalized and proactive view of the web services to users. We first describe the preprocessing steps on access logs necessary to clean, select and prepare data for knowledge extraction. Then we show two sets of experiments: the first one tries to predict the sex of a user based on the visited web pages, and the second one tries to predict whether a user might be interested in visiting a section of the site.

Keywords: knowledge discovery, web mining, classification.

1 Introduction

According to [10], Web Mining is the use of data mining techniques to automatically discover and extract information from web documents and services. A common taxonomy of web mining defines three main research lines: content mining, structure mining and usage mining. The distinction between those categories is not a clear cut, and very often approaches use a combination of techniques from different categories.

Content mining covers data mining techniques to extract models from web object contents, including plain text, semi-structured documents (e.g., HTML or XML), structured documents (digital libraries), dynamic documents and multimedia documents. The extracted models are used to classify web objects, to extract keywords for use in information retrieval, and to infer the structure of semi-structured or unstructured objects.

Structure mining aims at finding the underlying topology of the interconnections between web objects. The model built can be used to categorize and to rank web sites, and also to find out similarity between them.

Usage mining is the application of data mining techniques to discover usage patterns from web data. Data is usually collected from users' interaction with the web, e.g. web/proxy server logs, user queries, registration data. Usage mining tools [3,4,9,15] discover and predict user behavior, in order to help the designer to improve the web site, to attract visitors, or to give regular users a personalized and adaptive service.

In this paper, we describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of the navigational behavior of users for the purpose of web site personalization [6]. We have collected and preprocessed access logs from a medium-large national web portal, vivacity.it, over a period of five months. The portal includes a national area (www.vivacity.it) with news, forums, jokes, etc., and more than 30 local areas (e.g., www.roma.vivacity.it) with city-specific information, such as local news, restaurant addresses, theatre programming, bus timetables, etc.

The preprocessing steps include data selection, cleaning and transformation and the identification of users and of user sessions [2]. The result of preprocessing is a data mart of web accesses and registration information. Starting from preprocessed data, web mining aims at pattern discovery by adapting methods from statistics, data mining, machine learning and pattern recognition. Among the basic data mining techniques [7], we mention association rules, discovering groups of objects that are frequently requested together by users; clustering, grouping users with similar browsing patterns, or grouping objects with similar content or access patterns; classification, where a profile is built for users belonging to a given class or category; and sequential patterns, namely sequences of requests which are common to many users.
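The identification of users and user sessions is only cited here ([2]); a common heuristic, not necessarily the one used by ClickWorld, is to close a session after a fixed inactivity gap. A minimal Python sketch under that assumption (the 30-minute timeout is illustrative):

# Sketch: split one user's time-ordered requests into sessions using a
# 30-minute inactivity timeout (assumed threshold; the paper does not fix one).
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

def sessionize(requests):
    """requests: list of (timestamp, url) pairs for one user, sorted by time.
    Returns a list of sessions, each a list of (timestamp, url) pairs."""
    sessions, current, last_time = [], [], None
    for ts, url in requests:
        if last_time is not None and ts - last_time > TIMEOUT:
            sessions.append(current)   # gap too large: close the session
            current = []
        current.append((ts, url))
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

# Example: two requests 40 minutes apart fall into two separate sessions.
t0 = datetime(2002, 11, 5, 10, 0)
print(len(sessionize([(t0, "/home"), (t0 + timedelta(minutes=40), "/news")])))  # 2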

In the ClickWorld project, several of the mentioned methods are currently being used to extract useful information for proactive personalization of web sites. In this paper, we describe two sets of classification experiments. The first one aims at extracting a classification model able to discriminate the sex of a user based on the set of web pages visited. The second experiment aims at extracting a classification model able to discriminate those users that visit pages regarding e.g. sport or finance from those that typically do not.
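The paper does not specify, at this point, the features or the learner used in these experiments; as a rough illustration only, the following Python sketch trains a decision tree on invented bag-of-page-categories data (all names and values below are assumptions):

# Sketch of the first experiment's setup: predict a registered user's sex
# from the set of pages visited. Feature encoding (one boolean column per
# page category) and classifier choice are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

page_categories = ["news", "finance", "sport", "shopping"]
X = [                  # toy data: one row per user, 1 = category visited
    [1, 1, 0, 0],      # user A
    [0, 0, 1, 0],      # user B
    [1, 0, 1, 1],      # user C
    [0, 1, 0, 1],      # user D
]
y = ["F", "M", "M", "F"]   # sex labels taken from registration data

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[1, 0, 0, 1]]))  # predicted sex for a new visit profile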

2 Preprocessing for Web Personalization

We have developed a data mart of web logs specifically to support web personalization analysis. The data mart is populated starting from a web log data warehouse (such as those described in [8,16]) or, more simply, from raw web/proxy server log files. In this section, we describe a number of preprocessing and coding steps performed for data selection, comprehension, cleaning and transformation. While some of them are general data preparation steps for web usage mining [2,16], it is worth noting that in many of them a form of domain knowledge must necessarily be included in order to clean, correct and complete the input data according to the web personalization requirements.

2.1 User registration data

In addition to web access logs, our given input includes personal data on a subset of users, namely those who are registered to the vivacity.it website (registration is not mandatory). For a registered user, the system records the following information: sex, city, province, civil status, birth date. This information is provided by the user in a web form at the time of registration and, as one could expect, the quality of the data is up to the user's fairness. As a preprocessing step, improbable data are detected and removed, such as birth dates in the future or in the remote past. Also, some additional input fields were not imported in the data mart since almost all values were left as the default choice in the web form. In other words, those fields were considered not to be useful in discriminating user choices and preferences.
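A minimal Python sketch of these two cleaning steps; the plausibility window and the default-value threshold are assumptions, since the paper only says "future or remote past" and "almost all values":

# Sketch: drop improbable birth dates and fields stuck at their default.
# The 110-year window and 98% threshold are assumed, not stated by the paper.
from datetime import date

def plausible_birth_date(born, today=date(2003, 1, 1), max_age=110):
    return born <= today and born.year >= today.year - max_age

def mostly_default(values, default, threshold=0.98):
    """True if the field should be dropped: nearly all users kept the default."""
    return sum(v == default for v in values) / len(values) >= threshold

print(plausible_birth_date(date(2010, 5, 1)))  # False: in the future
print(plausible_birth_date(date(1850, 5, 1)))  # False: remote past
print(plausible_birth_date(date(1975, 5, 1)))  # True
print(mostly_default(["n/a"] * 99 + ["Pisa"], "n/a"))  # True: drop the field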

In order to avoid users having to type in their login and password at each visit, the vivacity.it web site adopts cookies. If a cookie is provided by the user's browser, then authentication is not required. Otherwise, after authentication, a new cookie is sent to the user's browser. With this mechanism, it is possible to track any user as long as she does not delete the cookies on her system. In addition, if the user is registered, the association login-cookie is available in the input data, and then it is possible to track the user also after she deletes the cookies. This mechanism allows for detecting non-human users, such as system diagnosis and monitoring programs. By checking the number of cookies assigned to each user, we discovered that the user login test009 was assigned more than 24,000 distinct cookies. This is possible only if the user is some program that automatically deletes assigned cookies, e.g. a system diagnosis program.
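A minimal Python sketch of this cookie-count check, which is what exposed test009; the flagging threshold is an assumed value, not one stated by the paper:

# Sketch: flag logins that accumulate an implausible number of distinct
# cookies, like test009 with its 24,000+. The threshold is assumed.
from collections import defaultdict

def suspicious_logins(login_cookie_pairs, threshold=1000):
    cookies = defaultdict(set)
    for login, cookie in login_cookie_pairs:
        cookies[login].add(cookie)          # count distinct cookies per login
    return {login for login, cs in cookies.items() if len(cs) >= threshold}

# Example: a diagnosis program that discards its cookie on every visit.
pairs = [("test009", f"cookie-{i}") for i in range(1500)]
pairs += [("alice", "cookie-a"), ("alice", "cookie-b")]
print(suspicious_logins(pairs))  # {'test009'}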

2.2 Web URLs

Resources in the World Wide Web are uniformly identified by means of URLs (Uniform Resource Locators). The syntax of an http URL is:

http://<host.domain>:<port>/<abs_path>?<query>

where <host.domain>:<port> is the name of the server site. The TCP/IP port is optional (the default port is 80), and <abs_path> is the absolute path of the requested resource in the server filesystem. We further consider <abs_path> of the form <path>/<filename>.<extension>, i.e. consisting of the filesystem path, filename and file extension. <query> is an optional collection of parameters, to be passed as input to a resource that is actually an executable program, e.g. a CGI script.
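Standard-form URLs can be decomposed into exactly these components with Python's urllib.parse; a small sketch (the example URL is illustrative):

# Sketch: decompose a standard http URL into host, port, abs_path
# (split into path / filename . extension) and query, as defined above.
from urllib.parse import urlsplit
import posixpath

url = "http://kdd.di.unipi.it:80/docs/index.html?lang=it"
parts = urlsplit(url)
path, name = posixpath.split(parts.path)        # "/docs", "index.html"
filename, extension = posixpath.splitext(name)  # "index", ".html"

print(parts.hostname, parts.port or 80)  # kdd.di.unipi.it 80
print(path, filename, extension)         # /docs index .html
print(parts.query)                       # lang=it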

On the one side, there are a number of normalizations that must be performed on URLs, in order to remove irrelevant syntactic differences (e.g., the host can be in IP format or host format: 131.114.2.91 is the same host as kdd.di.unipi.it). On the other side, there are some web server programs that adopt non-standard formats for passing parameters. The vivacity.it web server program is one of them. For instance, in the following URL:

http://roma.vivacity.it/speciali/EditColonnaSpeciale/1,3478,|DX,00.html

the file name 1,3478,|DX,00 contains a code for the local web site (1 stands for roma.vivacity.it), a web page id (3478) and its specific parameters (DX). The form above has been designed for efficient machine processing. For instance, the web page id is a key for a database table where the page template is found, while the parameters allow for retrieving the web page content in some other table.
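As described below, the project handled such URLs with a PERL script driven by the site designer's description; the following Python re-creation is only a sketch covering the single file-name pattern shown above:

# Sketch: parse the non-standard vivacity.it file name
# "<site_code>,<page_id>,<params>,00.html". Field meanings follow the text;
# the regex is an assumption covering only the illustrated pattern.
import re

PATTERN = re.compile(r"^(\d+),(\d+),(.*),00\.html$")

def parse_vivacity_filename(filename):
    m = PATTERN.match(filename)
    if m is None:
        return None
    site_code, page_id, params = m.groups()
    return {"site_code": int(site_code),  # 1 stands for roma.vivacity.it
            "page_id": int(page_id),      # key of the page-template table
            "params": params}             # template-specific parameters

print(parse_vivacity_filename("1,3478,|DX,00.html"))
# {'site_code': 1, 'page_id': 3478, 'params': '|DX'}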

Unfortunately, this is a nightmare when mining clickstreams of URLs. Syntactic features of URLs are of little help: we need some semantic information, or ontology [5,13], assigned to URLs. At best, we can expect that an application-level log is available, i.e. a log of accesses to semantically relevant objects. An example of an application-level log is one recording that the user entered the site from the home page, then visited a sport page with news on a soccer team, and so on. This would require a system module monitoring user steps at a semantic level of granularity. In the ClickWorld project such a module is called ClickObserve. Unfortunately, however, the module is a deliverable of the project, and it was not available for collecting data at the beginning of the project. Therefore, we decided to extract both syntactic and semantic information from URLs via a semi-automatic approach.

The adopted approach consists in reverse-engineering URLs, starting from the web site designer's description of the meaning of each URL path, web page id and web page parameters. Using a PERL script, starting from the designer's description we extracted from the original URLs the following information (a sketch of the resulting record follows the list):

- the local web server (i.e., vivacity.it or roma.vivacity.it etc.), which provides us with some spatial information about user interests;
- a first-level classification of URLs into 24 types, some of which are: home, news, finance, photo galleries, jokes, shopping, forum, pubs;
- a second-level classification of URLs depending on the first-level one, e.g. URLs classified as shopping may be further classified as book shopping or pc shopping and so on;
- a third-level classification of URLs depending on the second-level one, e.g. URLs classified as book shopping may be further classified as programming book shopping or narrative book shopping and so on;
- parameter information, further detailing the three-level classification, e.g. URLs classified as programming book shopping may have the ISBN book code as a parameter;
- the depth of the classification, i.e. 1 if the URL has only a first-level classification, 2 if the URL has first- and second-level classification, and so on.
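As an illustration of the record just described, here is a minimal Python sketch; the lookup table stands in for the designer's description, and its entries are invented examples consistent with the text:

# Sketch: assemble the multi-level classification record extracted per URL.
# ONTOLOGY is a stand-in for the designer-provided description; entries are
# invented. Each page id maps to up to three classification levels.
ONTOLOGY = {
    3478: ("news", None, None),                # first-level only
    5120: ("shopping", "book shopping", "programming book shopping"),
}

def classify(server, page_id, params):
    levels = ONTOLOGY.get(page_id, (None, None, None))
    depth = sum(level is not None for level in levels)
    return {"server": server,            # spatial information (local site)
            "level1": levels[0], "level2": levels[1], "level3": levels[2],
            "params": params,            # e.g. an ISBN code for book pages
            "depth": depth}              # 1, 2 or 3

print(classify("roma.vivacity.it", 3478, "|DX")["depth"])          # 1
print(classify("vivacity.it", 5120, "ISBN-8850321520")["depth"])   # 3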

Of course, the adopted approach was mainly a heuristic one, with the hierarchical ontology designed a posteriori. Also, the designed ontology does not exploit any content-based classification, i.e. the description of an elementary object such as the sport news with id 12345 is its code (i.e., first-level is news, second-level is sport, parameter information 12345), with no reference to the content of the news (was the news reporting about any specific player?).
