7ET023 Presentation

7ET023 – MSc Dissertation
Student Name: Colin Hopson
Student Number: 0482647
Course Title: MSc Computer Science (Internet Engineering)
Research Question: What is the most suitable web mining technique for a specified business and mobile application case study?
Contents:
1 – Introduction to the subject of web mining and techniques
2 – Overview of research conducted (both theory and practical)
3 – Software applications on which to test web mining techniques
4 – Demonstration (Digital Solutions and Repairs)
5 – Evaluating results (suitability and practicality)
1 – Introduction to the subject of web mining and techniques
• Sequential research of techniques for an empirical study
• Initial research into data mining (databases)
• Previous knowledge of web services (RSS, REST, etc.)
• Research into theory of web mining
Web usage mining – logs to examine navigation patterns
Web structure mining – examine link hierarchy
Web content mining – “the discovery of useful information from the Web by examining the data that is contained in the Web site” (Pendharkar, 2003, p. 243)
• Data extraction from HTML (machine learning algorithms)
Wrapper Induction
Semi-Automatic Extraction
* Pendharkar, P.C. (2003) Managing data mining technologies in organizations: techniques and applications, Idea Group Pub, Hershey.
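Of the three categories above, web usage mining is the easiest to illustrate in a few lines. The sketch below is not taken from the dissertation: it assumes a hypothetical, already-parsed access log of (visitor, page) pairs in time order, and uses Python purely for illustration. It counts how often visitors move directly from one page to another, which is one simple way to surface navigation patterns.

```python
from collections import Counter

# Hypothetical access-log records: (visitor id, requested path), in time order.
SAMPLE_LOG = [
    ("visitor-a", "/"),
    ("visitor-a", "/repairs"),
    ("visitor-b", "/"),
    ("visitor-a", "/contact"),
    ("visitor-b", "/repairs"),
    ("visitor-b", "/products"),
]


def count_transitions(records):
    """Count how often visitors move directly from one page to another."""
    last_page = {}            # visitor id -> most recent page seen
    transitions = Counter()   # (from_page, to_page) -> number of occurrences
    for visitor, page in records:
        if visitor in last_page:
            transitions[(last_page[visitor], page)] += 1
        last_page[visitor] = page
    return transitions


transitions = count_transitions(SAMPLE_LOG)
most_common_step = transitions.most_common(1)[0]
print(most_common_step)
```

With this sample data the most frequent step is from the home page to `/repairs`, taken by both visitors.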
2 – Overview of research conducted (both theory and practical)
• Researching Theory of Data and Web Mining
Empirical research method to acquire knowledge,
Research into data mining, web mining, data extraction algorithms, etc.,
Sequential investigation of applicable techniques.
• Artefact Design and Development
E-commerce prototype website (Digital Solutions and Repairs),
Mobile application (Mobile Shopper).
• Practical Research to Implement Techniques
Resolution of web services (Amazon APIs),
HTML extraction technique using XML; DOM; XPath; PHP Arrays,
Consuming Google API with REST; DOM; XPath; PHP Arrays,
Third-Party Software (Newprosoft and Automation Anywhere),
Functionality of XSLT.
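The HTML extraction technique listed above (load the page into a DOM, query it with XPath, collect the results into arrays) can be sketched as follows. This is an illustrative stand-in, not the dissertation's PHP code: it uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup. The sample page, class names and field names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing standing in for a retailer's page.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product">
      <span class="title">Laptop battery</span>
      <span class="price">24.99</span>
    </div>
    <div class="product">
      <span class="title">USB cable</span>
      <span class="price">3.50</span>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Return one dict per product block found in the markup."""
    root = ET.fromstring(markup.strip())
    products = []
    # The XPath expressions are tied to this page's layout; a different
    # site (or a redesign of this one) would need different expressions.
    for node in root.findall(".//div[@class='product']"):
        title = node.find("span[@class='title']")
        price = node.find("span[@class='price']")
        products.append({"title": title.text, "price": float(price.text)})
    return products


products = extract_products(SAMPLE_PAGE)
print(products)
```

Because the queries are written against one specific page structure, the extractor only works for that one source and breaks when the markup changes.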
3 – Software applications on which to test web mining techniques
4 – Demonstration (Digital Solutions and Repairs)
• Web Mining Technique 1
Amazon API
(coded class/methods)
• Web Mining Technique 2
HTML Extraction
(DOMDocument, XPath and PHP Arrays)
• Web Mining Technique 3
Google API
(REST, DOMDocument, XPath and PHP Arrays)
• Web Mining Technique 4
Third-Party Software
(Automation Anywhere and Newprosoft)
• Web Mining Technique 5
None Implemented, but XSLT investigated
Website Demonstration >>>
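Technique 3 above consumes a REST web service and reads the XML reply into arrays. A minimal sketch of that pattern is shown below, in Python for illustration only. The endpoint URL, the `q` and `key` parameter names, and the response layout are all hypothetical and do not describe the real Google or Amazon APIs. No network call is made; a canned response stands in for the service reply.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; a real service defines its own URL and parameters.
BASE_URL = "https://api.example.com/products"


def build_request_url(query, api_key):
    """Build a REST-style GET URL carrying the search term and the API key."""
    return BASE_URL + "?" + urlencode({"q": query, "key": api_key})


# Canned XML reply standing in for what such a service might return.
SAMPLE_RESPONSE = """<results>
  <item><id>101</id><name>Laptop charger</name><seller>Shop A</seller></item>
  <item><id>102</id><name>Laptop charger 90W</name><seller>Shop B</seller></item>
</results>"""


def parse_items(xml_text):
    """Turn each <item> element into a dict keyed by its child tag names."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in item}
            for item in root.findall("item")]


url = build_request_url("laptop charger", "DEMO-KEY")
items = parse_items(SAMPLE_RESPONSE)
print(url)
print(items)
```

Unlike page-specific HTML extraction, the same parsing code covers every seller the service returns, but only for the fields the service chooses to expose.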
5 – Evaluating results (suitability and practicality)
• Web Mining Technique 1: Amazon API
Requires registration and associate keys,
Product Advertising API meets most requirements (and more),
ASINs assist administration system,
Top quality delivery and discounts,
Regular updates although lengthy documentation.
• Web Mining Technique 2: HTML Extraction
No cost, but requires programming knowledge,
Bespoke algorithm specific for HTML format,
Limited to one online organisation.
• Web Mining Technique 3: Google API
Requires registration and associate keys,
Searches products from many online organisations,
GoogleId does not assist administration system,
Web service retrieves limited product information,
Top security measures, but lengthy documentation.
• Web Mining Technique 4: Third-Party Software
Limited free trial with subscription costs,
Possible difficulty integrating with the administration system.
• Web Mining Technique 5: XSLT investigated
Limited free trial with subscription costs,
Integration difficulties with administration system.
SUMMARY
• Study of web mining and some of its techniques
Empirical study, data mining, web services, web content mining, data extraction algorithms.
• Sequential research conducted (theory and practical)
Web services (APIs), HTML extraction, Third-Party software, XSLT.
• E-commerce prototype website and mobile application
‘Digital Solutions and Repairs’ and ‘Mobile Shopper’.
• Demonstration of web mining techniques
DSR computer repairs administration system.
• Evaluation of web mining techniques investigated
Comparison between APIs, HTML extraction, third-party software and XSLT.
Questions?