How Data and Machine Learning are Used within This Industry

Coursework Instructions:

Please upload your response to the lecture on Enterprise Search (Lecture6: Abhishek Singh Tomar) answering the base questions listed in the course information.

Prepare a short (~2-3 pages, 12-point, single-spaced) report that addresses as many of the following questions as are relevant:

• Describe the market sector or sub-space covered in this lecture.

• What data science related skills and technologies are commonly used in this sector?

• How are data and computing related methods used in typical workflows in this sector? Illustrate with an example.

• What are the data science related challenges one might encounter in this domain?

• What do you find interesting about the nature of data science opportunities in this domain?

In addition,

(i) What's the difference between a forward index and an inverted index? (10 pts of the 80 C+R points in the rubric)

(ii) Describe the high-level architectural components of web search. (10 pts of the 80 C+R points in the rubric)

(iii) Also, answer the following multiple-choice questions. You can list the question number and the letter corresponding to the correct choice as your answer in your report. (2x5 = 10 pts of the 80 C+R points in the rubric)

Q1: Based on the lecture, there are 3 different actors in Web Search: the search engine users, the search engine providers, and the advertisers. Different actors have different expectations of web search results. Select the INCORRECT statement about web search actors’ expectations.

A. Search engine users want high-quality search results and fast response time

B. Search engine providers want to attract more users and reduce operational costs

C. Advertisers on search engines want to attract more users to their sites

D. Advertisers on search engines want to increase ad revenue

Q2: Based on the lecture, the indexing system performs several tasks. Which of these is NOT a task of the indexing system?

A. Performs information extraction, filtering, and classification on downloaded web pages

B. Provides meta-data, metrics, and other kinds of feedback to the crawling and query processing systems

C. Based on the query data, indexes the retrieved textual content of pages for ranking

D. Converts the pages in the web repository into appropriate index structures that facilitate searching the textual content of pages

Q3: Based on the lecture, there are many textual content processing techniques used in the Query Interpretation System. Select all the text processing techniques mentioned in the lecture in this context.

1. Spelling correction

2. Stop-words removal

3. Word tokenization

4. Word stemming

5. Lemmatization

6. Geotagging

A. 1,2,3,4,5 B. 1,2,3,4,6 C. 1,2,4,5,6 D. All of them

Q4: Based on the lecture, there are several processing steps in the Query Interpretation System. Select the steps in the order of the pipeline described in the lecture.

1. Spelling Correction

2. Normalization

3. Segmentation

4. Annotation

5. Stemming

6. Term Expansion

7. Query Rewriting

A. 1,2,3,4,5,6,7 B. 2,1,3,5,4,6,7 C. 2,1,3,4,5,6,7 D. 1,2,3,5,4,6,7

Q5: Based on the lecture, Machine Learning has many use cases in the enterprise. Select ALL the Machine Learning use case scenarios mentioned.

1. Transformational HR Services

2. Self-Driving Customer Service

3. Conversational Bots

4. Student services

A. 1,2,3 B. 1,2,4 C. 2,3,4 D. All of them

Coursework Sample Content Preview:

How Data and Machine Learning are Used within This Industry
Market Sector
Lecture 6 explores the enterprise market sector, specifically how Data and Machine Learning are used within this industry. Enterprise Search is used in this lecture as a case study to demonstrate the various ML and information retrieval techniques used within the enterprise sector.
Description
Enterprise content is distributed across multiple data sources in various formats. For example, in a company, employee data can be found in the employee database, the company’s internal collaboration system, and the finance ERP. This wide distribution of content can make retrieving relevant information challenging. Enterprise Search addresses the problem by letting enterprises find relevant content across multiple data sources without worrying about where the information is stored.
Key Data Science Skills and Technologies
Indexing
Indexing systems are an essential part of enterprise search. Their significance goes beyond information extraction, filtering, and classification: they enable users to search the textual content of webpages by converting the pages in web repositories into appropriate index structures. Through pipelines, indexing systems pre-process documents and perform other extraction tasks on webpages. To excel in enterprise search, data science professionals must be conversant with full-text indexing and with text document properties such as metadata and structure. They must also understand the use of pipelines for document processing, term selection, and the removal of non-content-bearing words (stop words).
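To make the idea of "index structures" concrete, the following is a minimal sketch of the two structures most often contrasted in search: a forward index (document to terms) and an inverted index (term to documents). The sample documents, the whitespace tokenizer, and the function names are illustrative assumptions, not taken from the lecture.

```python
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; real systems use richer tokenizers.
    return text.lower().split()

def build_forward_index(docs):
    # Forward index: document id -> list of terms in that document.
    return {doc_id: tokenize(text) for doc_id, text in docs.items()}

def build_inverted_index(forward_index):
    # Inverted index: term -> sorted list of document ids containing it.
    postings = defaultdict(set)
    for doc_id, terms in forward_index.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Hypothetical enterprise documents, for illustration only.
docs = {
    1: "employee payroll record",
    2: "finance payroll report",
    3: "employee onboarding guide",
}
forward = build_forward_index(docs)
inverted = build_inverted_index(forward)
print(forward[1])           # ['employee', 'payroll', 'record']
print(inverted["payroll"])  # [1, 2]
```

Answering a keyword query with the forward index would require scanning every document, whereas the inverted index returns the matching document ids with a single lookup, which is why search systems typically query the inverted form.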
Deep Learning
Deep learning is an essential data science technology in the enterprise market because it supports query interpretation, which influences how users’ queries are matched to relevant documents. Query term expansion, one of the mechanisms used in a query interpretation system, applies deep learning together with synonyms, tokenization, and the inverted index to maximize the successful matching of queries to results (documents).
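As a rough illustration of query term expansion, the sketch below adds synonyms to a query before matching. It deliberately swaps the learned model for a small hand-written lookup table, so it shows only the shape of the step, not a deep learning method; the `SYNONYMS` table and the `expand_query` function are hypothetical names introduced here.

```python
# Hypothetical synonym table; a production system might learn these
# relationships from data rather than hard-code them.
SYNONYMS = {
    "salary": ["pay", "wage"],
    "staff": ["employee"],
}

def expand_query(query):
    # Keep each original term, then append its synonyms without duplicates.
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + SYNONYMS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query("staff salary"))
# ['staff', 'employee', 'salary', 'pay', 'wage']
```

Each expanded term can then be looked up in the index, so a document that mentions "employee pay" can still match a query for "staff salary".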
Machine Learning
The benefits of machine learning in the enterprise sector are extensive. It plays a significant part in enterprise search because it is involved in processes such as tokenization, stopword removal, stemming, and query interpretation. Supervised ML is particularly valuable when data science professionals need to develop a model that addresses the needs of their users.
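The text-processing steps named above (tokenization, stopword removal, stemming) can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the stop-word list is a tiny sample, and `naive_stem` is a toy suffix stripper written for this example, not the Porter stemmer or any library routine.

```python
import re

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "and", "to"}

def tokenize(text):
    # Lowercase the text and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    # Drop tokens that carry little content on their own.
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    # Toy stemmer: strip one common suffix if at least 3 characters remain.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    # Run the three steps in order: tokenize, filter, stem.
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]

print(preprocess("Searching the reports for employees"))
# ['search', 'report', 'employee']
```

Applying the same pipeline to both documents and queries keeps their terms comparable, so "reports" in a query can match "report" in an indexed document.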
Use of Data and Computing-related methods
All three high-level architectural components of web search use data and computing-related methods to execute their tasks. Because queries follow statistical patterns, these methods are often leveraged to speed up search. For example, tailoring search results to a specific user is a complex process that requires several computing-related methods. Search engines also often build link and word-frequency data and run it through their ML algorithms.
Challenges in Enterprise Search
Redundancy
Redundancy is a common problem in the enterprise sector because project teams traditionally load data marts from existing data sources into the data lake and then add their own data to match their needs. While this approach is convenient, simple, and fast, it contributes to the proliferation of similar data marts. Blindly copying data from other sources and adding team-specific modifications causes extreme levels of redundancy.
Reliance on Tribal Knowledge
Various actors in the enterprise sector, including companies, rely heavily on...

Updated on January 26, 2024
