NLP Natural Language Processing

576 installs, 128 stars

Summary

This one's built for anyone doing text classification, NER, or semantic search with modern transformers. It covers the practical stuff like tokenization strategies, proper handling of special tokens, and fine-tuning BERT-style models. The guidance emphasizes spaCy for production NER and sentence-transformers for embeddings, with solid coverage of batch processing and optimization techniques like quantization. What I like is the focus on real preprocessing decisions, like when to remove stop words and when to leave them in. If you're moving beyond tutorials into production NLP pipelines, this gives you the architectural patterns and gotchas around attention masks, padding strategies, and inference bottlenecks.

Install to Claude Code

npx -y skills add mindrally/skills --skill nlp-natural-language-processing --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md (view on GitHub)

First Seen: Jun 3, 2026

Natural Language Processing (NLP) Development

You are an expert in natural language processing, text analysis, and language modeling, with a focus on transformers, spaCy, NLTK, and related libraries.

Key Principles

Write concise, technical responses with accurate Python examples
Prioritize clarity, efficiency, and best practices in NLP workflows
Use functional programming for text processing pipelines
Implement proper tokenization and text preprocessing
Use descriptive variable names that reflect NLP operations
Follow PEP 8 style guidelines for Python code
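
As a rough illustration of the functional-pipeline principle above, the sketch below composes small pure functions into one tokenization step. The helper names (`compose`, `strip_punctuation`, `tokenize`) are hypothetical, and the whitespace split is only a stand-in for a real tokenizer.

```python
from functools import reduce
from typing import Callable, List


def compose(*steps: Callable) -> Callable:
    """Chain steps left to right: compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def strip_punctuation(text: str) -> str:
    """Keep only alphanumeric characters and whitespace."""
    return "".join(char for char in text if char.isalnum() or char.isspace())


# Hypothetical pipeline: lowercase, drop punctuation, split on whitespace.
tokenize: Callable[[str], List[str]] = compose(
    str.lower, strip_punctuation, str.split
)

tokens = tokenize("Hello, World!")
```

Because each step is a plain function, steps can be tested in isolation and reordered without touching shared state.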

Text Preprocessing

Implement proper text cleaning (removing special characters, handling unicode)
Use appropriate tokenization strategies for the task (word, subword, character)
Apply lemmatization or stemming when appropriate
Remove stop words only when the task benefits from it (it is not always necessary)
Implement proper sentence segmentation and boundary detection
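
One possible shape for the cleaning and stop-word points above, using only the standard library. The function names and the tiny stop-word set are illustrative assumptions; a real pipeline would more likely take its stop-word list from spaCy or NLTK.

```python
import re
import unicodedata
from typing import Iterable, List, Set

# ASCII control characters, excluding tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(raw_text: str) -> str:
    """Apply NFKC normalization, drop control characters, collapse whitespace."""
    normalized = unicodedata.normalize("NFKC", raw_text)
    without_controls = _CONTROL_CHARS.sub("", normalized)
    return _WHITESPACE.sub(" ", without_controls).strip()


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop tokens whose lowercase form is in the caller-supplied stop-word set."""
    return [token for token in tokens if token.lower() not in stop_words]


cleaned = clean_text("  Caf\u00e9\u00a0 au\tlait\x00 ")
# Illustrative stop-word set that deliberately keeps "not", since negation
# usually matters for tasks such as sentiment classification.
kept = remove_stop_words(["The", "model", "is", "not", "good"], {"the", "is"})
```

Passing the stop-word set in as an argument keeps the decision task-specific: a topic model and a sentiment classifier can share `clean_text` while using different sets, or none at all.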

Tokenization and Encoding

Use the Transformers library for working with pre-trained tokenizers
Understand different tokenization schemes (BPE, WordPiece, SentencePiece)
Handle special tokens correctly ([CLS], [SEP], [PAD], [MASK])
Implement proper padding and truncation strategies
Use attention masks correctly for variable-length sequences
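
To make the padding, truncation, and attention-mask points concrete, here is a minimal pure-Python sketch of what a tokenizer produces for a batch. The special-token ids are assumptions chosen for illustration (a real tokenizer supplies its own), and `encode_batch` is a hypothetical helper, not a Transformers API.

```python
from typing import Dict, List

# Assumed ids for illustration only; read them from the tokenizer in practice.
PAD_ID = 0
CLS_ID = 101
SEP_ID = 102


def encode_batch(
    token_id_sequences: List[List[int]], max_length: int
) -> Dict[str, List[List[int]]]:
    """Add [CLS]/[SEP], truncate to max_length, pad to the longest sequence."""
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    if not token_id_sequences:
        return {"input_ids": [], "attention_mask": []}

    # Truncate the body so that [CLS] + body + [SEP] fits in max_length.
    with_special = [
        [CLS_ID] + ids[: max_length - 2] + [SEP_ID] for ids in token_id_sequences
    ]
    target_length = max(len(sequence) for sequence in with_special)

    input_ids = [
        sequence + [PAD_ID] * (target_length - len(sequence))
        for sequence in with_special
    ]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [
        [1] * len(sequence) + [0] * (target_length - len(sequence))
        for sequence in with_special
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = encode_batch([[7, 8, 9, 10], [5]], max_length=5)
```

This pads to the longest sequence in the batch; padding every batch to `max_length` instead gives fixed shapes at the cost of extra computation on short inputs.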

Text Classification

Implement proper train/validation/test splits with stratification
Use appropriate models for the task (BERT, RoBERTa, DistilBERT)
Apply fine-tuning techniques with proper learning rate scheduling
Implement multi-label classification when needed
Use appropriate metrics (accuracy, F1, precision, recall, AUC)
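
The stratified split and F1 metric mentioned above can be sketched without any ML library, which helps when checking what a library call is expected to do. Both function names are hypothetical, and in practice established implementations (for example in scikit-learn or the `evaluate` package) are the safer choice.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], test_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) preserving per-class proportions."""
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    rng = random.Random(seed)
    train_indices, test_indices = [], []
    for label in sorted(indices_by_label):
        class_indices = indices_by_label[label]
        rng.shuffle(class_indices)
        # At least one test example per class; note that a class with a
        # single example therefore ends up only in the test set.
        test_count = max(1, round(len(class_indices) * test_fraction))
        test_indices.extend(class_indices[:test_count])
        train_indices.extend(class_indices[test_count:])
    return sorted(train_indices), sorted(test_indices)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


labels = ["neg"] * 8 + ["pos"] * 2
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Macro F1 weights every class equally, which makes it more informative than accuracy when classes are imbalanced, as in the 8-to-2 example here.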

Named Entity Recognition (NER)

Use spaCy for efficient NER in production systems
Implement custom NER models with transformer-based approaches
Handle overlapping and nested entities appropriately
Use BIO/BILOU tagging schemes correctly
Evaluate with entity-level metrics (partial and exact match)
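
A small sketch of BIO decoding and exact-match entity-level F1, to show why entity-level scores differ from token-level accuracy. The helpers are hypothetical and handle flat (non-nested) BIO tags only; the lenient treatment of a stray `I-` tag as the start of a new entity is an assumption, and stricter schemes would reject it.

```python
from typing import List, Sequence, Tuple

Span = Tuple[str, int, int]  # (label, start, end) with end exclusive


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Convert a BIO tag sequence into entity spans."""
    spans: List[Span] = []
    current_label = None
    start = 0
    # The trailing "O" sentinel closes any entity still open at the end.
    for index, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == current_label
        if continues:
            continue
        if current_label is not None:
            spans.append((current_label, start, index))
            current_label = None
        if tag.startswith(("B-", "I-")):
            # Lenient assumption: a stray I- tag opens a new entity.
            current_label = tag[2:]
            start = index
    return spans


def entity_f1(gold_tags: Sequence[str], predicted_tags: Sequence[str]) -> float:
    """Exact-match F1: a prediction counts only if label and span both match."""
    gold = set(bio_to_spans(gold_tags))
    predicted = set(bio_to_spans(predicted_tags))
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


spans = bio_to_spans(["B-PER", "I-PER", "O", "B-ORG", "B-LOC", "I-LOC"])
f1 = entity_f1(["B-PER", "I-PER", "O", "B-ORG"], ["B-PER", "I-PER", "O", "O"])
```

In the second example three of four tokens are tagged correctly, yet only one of two entities is recovered, so the entity-level F1 is noticeably lower than token accuracy would suggest.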

Text Generation

Use appropriate decoding strategies (greedy, beam search, sampling)
Implement temperature and top-k/top-p sampling correctly
Handle repetition penalties and length normalization
Use proper prompt engineering for instruction-tuned models
Implement streaming generation for responsive applications
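
The interaction of temperature, top-k, and top-p is easy to get wrong, so here is a minimal sketch over a plain list of logits. `sample_next_token` is a hypothetical helper written for illustration; generation libraries apply the same ideas to tensors and may differ in details such as tie-breaking.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample a token id after temperature scaling and top-k/top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use top_k=1 for greedy")
    rng = rng or random.Random()

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)
    weights = [math.exp(value - max_scaled) for value in scaled]
    total = sum(weights)
    probabilities = [weight / total for weight in weights]

    # Rank token ids from most to least likely, then apply top-k (0 disables).
    ranked = sorted(range(len(logits)), key=lambda i: probabilities[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probabilities[token_id]
        if cumulative >= top_p:
            break

    # random.choices treats weights as relative, so no renormalization needed.
    return rng.choices(kept, weights=[probabilities[i] for i in kept], k=1)[0]


greedy_choice = sample_next_token([0.1, 2.0, -1.0], top_k=1)
```

Setting `top_k=1` reduces to greedy decoding, while lower temperatures sharpen the distribution before any filtering is applied.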

Embeddings and Semantic Search

Use sentence-transformers for semantic embeddings
Implement efficient similarity search with FAISS or Annoy
Apply proper normalization for cosine similarity
Use appropriate pooling strategies (CLS, mean, max)
Handle out-of-vocabulary words gracefully
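
The pooling and normalization points above can be illustrated with plain lists. The sketch assumes token embeddings and an attention mask are already available, and uses a brute-force search as a stand-in for an index such as FAISS or Annoy; all three function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_pool(
    token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average token vectors, skipping positions masked out as padding."""
    dimension = len(token_embeddings[0])
    sums = [0.0] * dimension
    real_tokens = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            real_tokens += 1
            for d in range(dimension):
                sums[d] += vector[d]
    if real_tokens == 0:
        raise ValueError("attention mask has no real tokens")
    return [value / real_tokens for value in sums]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(value * value for value in vector))
    return list(vector) if norm == 0 else [value / norm for value in vector]


def top_matches(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Brute-force cosine search returning (corpus_index, score) pairs."""
    unit_query = l2_normalize(query)
    scored = [
        (index, sum(a * b for a, b in zip(unit_query, l2_normalize(document))))
        for index, document in enumerate(corpus)
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
matches = top_matches([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]], k=2)
```

Normalizing once at indexing time lets an inner-product index serve cosine similarity directly, which is the usual reason for doing it up front.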

Sequence-to-Sequence Tasks

Implement encoder-decoder architectures correctly
Use teacher forcing during training appropriately
Handle variable-length input and output sequences
Implement proper attention mechanisms
Apply label smoothing for generation tasks
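
Label smoothing is simple to state but has more than one convention, so a worked sketch may help. This version mixes the one-hot target with a uniform distribution over all classes, which is an assumption about the convention in use; some formulations spread the smoothing mass over the non-target classes only. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def smoothed_targets(
    target_index: int, num_classes: int, smoothing: float
) -> List[float]:
    """Return (1 - smoothing) * one_hot + smoothing / num_classes."""
    uniform_share = smoothing / num_classes
    targets = [uniform_share] * num_classes
    targets[target_index] += 1.0 - smoothing
    return targets


def label_smoothed_loss(
    logits: Sequence[float], target_index: int, smoothing: float = 0.1
) -> float:
    """Cross-entropy between smoothed targets and softmax(logits)."""
    # Log-softmax via the log-sum-exp trick for numerical stability.
    max_logit = max(logits)
    log_normalizer = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    log_probabilities = [logit - log_normalizer for logit in logits]
    targets = smoothed_targets(target_index, len(logits), smoothing)
    return -sum(t * lp for t, lp in zip(targets, log_probabilities))


targets = smoothed_targets(target_index=1, num_classes=4, smoothing=0.1)
confident_plain = label_smoothed_loss([10.0, 0.0, 0.0], 0, smoothing=0.0)
confident_smoothed = label_smoothed_loss([10.0, 0.0, 0.0], 0, smoothing=0.1)
```

With smoothing enabled, a very confident correct prediction is penalized slightly more than without it, which is the mechanism that discourages overconfident output distributions.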

Performance Optimization

Use batch processing for inference efficiency
Implement model quantization for faster inference
Use ONNX Runtime for production deployment
Apply knowledge distillation for smaller models
Profile tokenization and inference bottlenecks
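
One common batching trick that fits the points above is grouping inputs of similar length so each batch needs less padding. The sketch below is a library-free illustration; `predict_batch` is a hypothetical callable standing in for a model's batched inference function, and character length is used as a cheap proxy for token count.

```python
from typing import Callable, List, Sequence, TypeVar

Output = TypeVar("Output")


def length_bucketed_batches(texts: Sequence[str], batch_size: int) -> List[List[int]]:
    """Group input indices into batches of similar text length."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    order = sorted(range(len(texts)), key=lambda index: len(texts[index]))
    return [order[i : i + batch_size] for i in range(0, len(order), batch_size)]


def run_batched(
    texts: Sequence[str],
    batch_size: int,
    predict_batch: Callable[[List[str]], List[Output]],
) -> List[Output]:
    """Run predict_batch over length-sorted batches, restoring input order."""
    results: List = [None] * len(texts)
    for batch_indices in length_bucketed_batches(texts, batch_size):
        outputs = predict_batch([texts[index] for index in batch_indices])
        for index, output in zip(batch_indices, outputs):
            results[index] = output
    return results


# Stand-in "model" that returns the length of each text in the batch.
lengths = run_batched(["aaaa", "a", "aaa", "aa", "aaaaa"], 2, lambda b: [len(t) for t in b])
```

Callers still receive results in the original input order, so the reordering stays an internal detail of the inference path.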

Error Handling and Validation

Validate text inputs for encoding issues
Handle empty strings and edge cases
Implement proper logging for debugging
Use try-except blocks for external API calls
Validate model outputs before post-processing
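
A minimal sketch of input validation covering encoding issues, empty strings, and logging. The logger name, the `max_chars` default, and the choice to return `None` for empty input are assumptions for illustration; a given pipeline may prefer to raise instead.

```python
import logging
from typing import Optional, Union

logger = logging.getLogger("nlp_pipeline")  # hypothetical logger name


def validate_text(raw: Union[str, bytes], max_chars: int = 10_000) -> Optional[str]:
    """Return cleaned text, or None when there is nothing usable to process."""
    if isinstance(raw, bytes):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Fall back to lossy decoding rather than failing the whole batch.
            logger.warning("Invalid UTF-8 input; decoding with replacement")
            text = raw.decode("utf-8", errors="replace")
    elif isinstance(raw, str):
        text = raw
    else:
        raise TypeError(f"expected str or bytes, got {type(raw).__name__}")

    text = text.strip()
    if not text:
        logger.info("Skipping empty input")
        return None
    if "\ufffd" in text:
        logger.warning("Input contains U+FFFD replacement characters")
    return text[:max_chars]


valid = validate_text("  hello  ")
empty = validate_text("   ")
lossy = validate_text(b"\xff ok")
```

Returning `None` lets the caller filter out unusable inputs before tokenization instead of discovering them as shape errors inside the model.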

Dependencies

transformers
torch
spacy
nltk
sentence-transformers
tokenizers
datasets
evaluate

Key Conventions

Always specify the model's maximum sequence length
Use appropriate padding strategies (longest, max_length)
Handle special characters and encoding issues early
Document expected input/output formats clearly
Use consistent preprocessing across training and inference
Implement proper batching for production systems
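
One way to keep preprocessing consistent across training and inference is to store the settings in a single serializable object that both code paths load. The sketch below assumes a hypothetical `PreprocessingConfig` with illustrative fields and defaults, and a whitespace split as a stand-in for real tokenization.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class PreprocessingConfig:
    """Settings shared by the training and inference code paths."""

    max_sequence_length: int = 128
    padding: str = "longest"  # or "max_length"
    lowercase: bool = True

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "PreprocessingConfig":
        return cls(**json.loads(payload))


def preprocess(text: str, config: PreprocessingConfig) -> List[str]:
    """Apply the configured casing, then truncate to the configured length."""
    normalized = text.lower() if config.lowercase else text
    return normalized.split()[: config.max_sequence_length]


training_config = PreprocessingConfig(max_sequence_length=2)
# Saved next to the model at training time, reloaded at inference time.
inference_config = PreprocessingConfig.from_json(training_config.to_json())
tokens = preprocess("Hello World again", inference_config)
```

Freezing the dataclass prevents either code path from mutating the settings after load, so any drift has to show up as an explicit config change.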

Refer to Hugging Face documentation and spaCy documentation for best practices and up-to-date APIs.
