Table of Contents
- Foreword
- Preface
- Why We Wrote This Book
- The Philosophy
- Scope
- Who Should Read This Book
- What You Will Learn
- Structure of the Book
- How to Read This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Further Information
- Acknowledgments
- I. Foundations
- 1. NLP: A Primer
- NLP in the Real World
- NLP Tasks
- What Is Language?
- Building Blocks of Language
- Phonemes
- Morphemes and lexemes
- Syntax
- Context
- Why Is NLP Challenging?
- Ambiguity
- Common knowledge
- Creativity
- Diversity across languages
- Machine Learning, Deep Learning, and NLP: An Overview
- Approaches to NLP
- Heuristics-Based NLP
- Machine Learning for NLP
- Naive Bayes
- Support vector machine
- Hidden Markov Model
- Conditional random fields
- Deep Learning for NLP
- Recurrent neural networks
- Long short-term memory
- Convolutional neural networks
- Transformers
- Autoencoders
- Why Deep Learning Is Not Yet the Silver Bullet for NLP
- An NLP Walkthrough: Conversational Agents
- Wrapping Up
- 2. NLP Pipeline
- Data Acquisition
- Text Extraction and Cleanup
- HTML Parsing and Cleanup
- Unicode Normalization
- Spelling Correction
- System-Specific Error Correction
- Pre-Processing
- Preliminaries
- Sentence segmentation
- Word tokenization
- Frequent Steps
- Stemming and lemmatization
- Other Pre-Processing Steps
- Text normalization
- Language detection
- Code mixing and transliteration
- Advanced Processing
- Feature Engineering
- Classical NLP/ML Pipeline
- DL Pipeline
- Modeling
- Start with Simple Heuristics
- Building Your Model
- Building THE Model
- Evaluation
- Intrinsic Evaluation
- Extrinsic Evaluation
- Post-Modeling Phases
- Deployment
- Monitoring
- Model Updating
- Working with Other Languages
- Case Study
- Wrapping Up
- 3. Text Representation
- Vector Space Models
- Basic Vectorization Approaches
- One-Hot Encoding
- Bag of Words
- Bag of N-Grams
- TF-IDF
- Distributed Representations
- Word Embeddings
- Pre-trained word embeddings
- Training our own embeddings
- CBOW
- SkipGram
- Going Beyond Words
- Distributed Representations Beyond Words and Characters
- Universal Text Representations
- Visualizing Embeddings
- Handcrafted Feature Representations
- Wrapping Up
- II. Essentials
- 4. Text Classification
- Applications
- A Pipeline for Building Text Classification Systems
- A Simple Classifier Without the Text Classification Pipeline
- Using Existing Text Classification APIs
- One Pipeline, Many Classifiers
- Naive Bayes Classifier
- Logistic Regression
- Support Vector Machine
- Using Neural Embeddings in Text Classification
- Word Embeddings
- Subword Embeddings and fastText
- Document Embeddings
- Deep Learning for Text Classification
- CNNs for Text Classification
- LSTMs for Text Classification
- Text Classification with Large, Pre-Trained Language Models
- Interpreting Text Classification Models
- Explaining Classifier Predictions with Lime
- Learning with No or Less Data and Adapting to New Domains
- No Training Data
- Less Training Data: Active Learning and Domain Adaptation
- Case Study: Corporate Ticketing
- Practical Advice
- Wrapping Up
- 5. Information Extraction
- IE Applications
- IE Tasks
- The General Pipeline for IE
- Keyphrase Extraction
- Implementing KPE
- Practical Advice
- Named Entity Recognition
- Building an NER System
- NER Using an Existing Library
- NER Using Active Learning
- Practical Advice
- Named Entity Disambiguation and Linking
- NEL Using Azure API
- Relationship Extraction
- Approaches to RE
- RE with the Watson API
- Other Advanced IE Tasks
- Temporal Information Extraction
- Event Extraction
- Template Filling
- Case Study
- Wrapping Up
- 6. Chatbots
- Applications
- A Simple FAQ Bot
- A Taxonomy of Chatbots
- Goal-Oriented Dialog
- Chitchats
- A Pipeline for Building Dialog Systems
- Dialog Systems in Detail
- PizzaStop Chatbot
- Building our Dialogflow agent
- Testing our agent
- Deep Dive into Components of a Dialog System
- Dialog Act Classification
- Identifying Slots
- Response Generation
- Dialog Examples with Code Walkthrough
- Datasets
- Dialog act prediction
- Loading the dataset
- Models
- Slot identification
- Loading the dataset
- Models
- Other Dialog Pipelines
- End-to-End Approach
- Deep Reinforcement Learning for Dialogue Generation
- Human-in-the-Loop
- Rasa NLU
- A Case Study: Recipe Recommendations
- Utilizing Existing Frameworks
- Open-Ended Generative Chatbots
- Wrapping Up
- 7. Topics in Brief
- Search and Information Retrieval
- Components of a Search Engine
- A Typical Enterprise Search Pipeline
- Setting Up a Search Engine: An Example
- A Case Study: Book Store Search
- Topic Modeling
- Training a Topic Model: An Example
- What's Next?
- Text Summarization
- Summarization Use Cases
- Setting Up a Summarizer: An Example
- Practical Advice
- Recommender Systems for Textual Data
- Creating a Book Recommender System: An Example
- Practical Advice
- Machine Translation
- Using a Machine Translation API: An Example
- Practical Advice
- Question-Answering Systems
- Developing a Custom Question-Answering System
- Looking for Deeper Answers
- Wrapping Up
- III. Applied
- 8. Social Media
- Applications
- Unique Challenges
- NLP for Social Data
- Word Cloud
- Tokenizer for SMTD
- Trending Topics
- Understanding Twitter Sentiment
- Pre-Processing SMTD
- Removing markup elements
- Handling non-text data
- Handling apostrophes
- Handling emojis
- Split-joined words
- Removal of URLs
- Nonstandard spellings
- Text Representation for SMTD
- Customer Support on Social Channels
- Memes and Fake News
- Identifying Memes
- Fake News
- Wrapping Up
- 9. E-Commerce and Retail
- E-Commerce Catalog
- Review Analysis
- Product Search
- Product Recommendations
- Search in E-Commerce
- Building an E-Commerce Catalog
- Attribute Extraction
- Direct attribute extraction
- Indirect attribute extraction
- Product Categorization and Taxonomy
- Product Enrichment
- Product Deduplication and Matching
- Attribute match
- Title match
- Image match
- Review Analysis
- Sentiment Analysis
- Aspect-Level Sentiment Analysis
- Supervised approach
- Unsupervised approach
- Connecting Overall Ratings to Aspects
- Understanding Aspects
- Recommendations for E-Commerce
- A Case Study: Substitutes and Complements
- Latent attribute extraction from reviews
- Product linking
- Wrapping Up
- 10. Healthcare, Finance, and Law
- Healthcare
- Health and Medical Records
- Patient Prioritization and Billing
- Pharmacovigilance
- Clinical Decision Support Systems
- Health Assistants
- Electronic Health Records
- HARVEST: Longitudinal report understanding
- Question answering for health
- Outcome prediction and best practices
- Mental Healthcare Monitoring
- Medical Information Extraction and Analysis
- Finance and Law
- NLP Applications in Finance
- Financial sentiment
- Risk assessments
- Accounting and auditing
- NLP and the Legal Landscape
- Legal entity extraction with LexNLP
- Wrapping Up
- IV. Bringing It All Together
- 11. The End-to-End NLP Process
- Revisiting the NLP Pipeline: Deploying NLP Software
- An Example Scenario
- Building and Maintaining a Mature System
- Finding Better Features
- Iterating Existing Models
- Code and Model Reproducibility
- Troubleshooting and Interpretability
- Monitoring
- Minimizing Technical Debt
- Automating Machine Learning
- auto-sklearn
- Google Cloud AutoML and other techniques
- The Data Science Process
- The KDD Process
- Microsoft Team Data Science Process
- Making AI Succeed at Your Organization
- Team
- Right Problem and Right Expectations
- Data and Timing
- A Good Process
- Other Aspects
- Peeking over the Horizon
- Final Words
- Index