NLP & Linguistics Resource Links
NLP Toolkits & Libraries
- Gensim: Topic modeling and vector space modeling
- spaCy: Industrial-strength NLP in Python
- NLTK: Natural Language Toolkit for teaching and research
- Stanza: Stanford's neural NLP library
- Apache OpenNLP: Machine-learning toolkit for NLP in Java
- TextBlob: Simple Python API for common NLP tasks, built on NLTK
Linguistic Resources
- Glottolog: Comprehensive catalogue and bibliography of the world's languages and language families
- Ethnologue: Reference data on the world's living languages
- Universal Dependencies: Cross-linguistically consistent treebanks annotated for POS, morphology, and dependency syntax
- PHOIBLE: Phonological inventories of the world's languages
- WALS: World Atlas of Language Structures
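Universal Dependencies treebanks are distributed in the CoNLL-U format: one word per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. The reader below is a minimal stdlib-only sketch, not an official UD tool; it assumes you only need the basic dependency tree, so multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `8.1`) are skipped. The `Token` type and `read_conllu` name are hypothetical.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    lemma: str
    upos: str
    head: int      # 0 means the word attaches to the artificial root
    deprel: str


def read_conllu(lines) -> Iterator[list[Token]]:
    """Yield one list of Tokens per sentence from CoNLL-U text lines."""
    sentence: list[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1");
        # they are not part of the basic dependency tree.
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              int(cols[6]), cols[7]))
    if sentence:
        yield sentence  # the input may not end with a blank line


sample = """# text = Dogs bark.
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

for sent in read_conllu(sample.splitlines()):
    for tok in sent:
        # Word IDs are 1-based and consecutive, so HEAD indexes the list directly.
        head = "ROOT" if tok.head == 0 else sent[tok.head - 1].form
        print(f"{tok.form} --{tok.deprel}--> {head}")
```

For real treebank files, pass an open file object (opened with `encoding="utf-8"`) instead of `sample.splitlines()`.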
Corpora & Datasets
- Common Crawl: Massive open web-crawl data
- COCA: Corpus of Contemporary American English
- Europarl: Multilingual parallel corpus of European Parliament proceedings
- OpenSubtitles: Subtitle-based multilingual parallel corpus
- LDC (Linguistic Data Consortium): Specialized speech and text corpora, mostly distributed under license
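Parallel corpora such as Europarl are commonly distributed as sentence-aligned plain-text files, one per language, where line *i* of one file translates line *i* of the other. The function below is a minimal sketch for pairing such files; it assumes that layout and assumes an empty line on either side means the pair should be dropped. The function name `aligned_pairs` and the file names in the comment are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator


def aligned_pairs(src_lines: Iterable[str],
                  tgt_lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Pair line i of the source file with line i of the target file.

    Pairs where either side is empty are skipped. A length mismatch
    between the two inputs raises an error, because it means the
    line-by-line alignment cannot be trusted.
    """
    sentinel = object()
    for src, tgt in zip_longest(src_lines, tgt_lines, fillvalue=sentinel):
        if src is sentinel or tgt is sentinel:
            raise ValueError("inputs have different numbers of lines")
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            yield src, tgt


# With real files (hypothetical names), iterate over both open files:
#   with open("corpus.de", encoding="utf-8") as de, \
#        open("corpus.en", encoding="utf-8") as en:
#       for de_sent, en_sent in aligned_pairs(de, en):
#           ...
de = ["Guten Morgen.", "", "Vielen Dank."]
en = ["Good morning.", "Welcome.", "Thank you very much."]
print(list(aligned_pairs(de, en)))
```

Raising on a length mismatch, rather than silently truncating as plain `zip` would, is a deliberate choice: a single missing line shifts every later pair out of alignment.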
Ontology & Taxonomy Tools
- Protégé: Open-source ontology editor from Stanford
- SKOS (Simple Knowledge Organization System): W3C standard for representing thesauri, taxonomies, and other knowledge organization systems
- Wikidata: Collaboratively edited structured knowledge base
- WordNet: Lexical database of English
- BabelNet: Multilingual encyclopedic dictionary and semantic network
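SKOS organizes concepts into hierarchies much as WordNet does with hypernyms. `skos:broader` links a concept to a more general one and is not itself transitive. `skos:broaderTransitive` is its transitive super-property, so repeatedly following `broader` links yields a concept's transitive ancestors. The function below sketches that traversal over a plain Python dict. The vocabulary and the function name are hypothetical, and a real project would load SKOS data with an RDF library rather than hand-written dicts.

```python
from collections import deque


def broader_transitive(broader: dict[str, list[str]], concept: str) -> set[str]:
    """Return every concept reachable from `concept` via broader links.

    Breadth-first search with a visited set, so cycles in messy data
    cannot cause an infinite loop.
    """
    seen: set[str] = set()
    queue = deque(broader.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        queue.extend(broader.get(parent, []))
    return seen


# Hypothetical toy vocabulary: concept -> directly broader concepts.
broader = {
    "dependency parsing": ["syntactic analysis"],
    "syntactic analysis": ["natural language processing"],
    "topic modeling": ["natural language processing", "machine learning"],
    "natural language processing": ["artificial intelligence"],
    "machine learning": ["artificial intelligence"],
}

print(sorted(broader_transitive(broader, "topic modeling")))
# ['artificial intelligence', 'machine learning', 'natural language processing']
```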
Text Analysis & Semantic Tools
- Voyant Tools: Web-based text analysis suite
- Sketch Engine: Corpus manager and text analysis software
- MeaningCloud: SaaS for semantic analysis
- TextRazor: Natural language processing API
- AllenNLP: NLP research library from AI2 (Allen Institute for AI)
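A concordance, or keyword-in-context (KWIC) view, is a core feature of corpus tools such as Sketch Engine and Voyant Tools: every hit for a search term is listed with a fixed window of surrounding words. The function below sketches the idea only. It assumes naive regex tokenization (runs of word characters and apostrophes) rather than the linguistic tokenizers and indexes those tools actually use, and the name `kwic` is hypothetical.

```python
import re


def kwic(text: str, keyword: str, window: int = 3) -> list[str]:
    """Return one keyword-in-context line per case-insensitive hit.

    `window` is the number of tokens shown on each side of the hit.
    """
    tokens = re.findall(r"[\w']+", text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the hits line up in one column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


text = ("A corpus is a large collection of texts. Each corpus is annotated, "
        "and the corpus manager indexes it for fast search.")
for line in kwic(text, "corpus"):
    print(line)
```

Note that punctuation is dropped by the tokenizer, so context windows run across sentence boundaries ("of texts Each [corpus]"); real concordancers typically keep punctuation as tokens.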
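APIs & Developer Resources
- Hugging Face Transformers: Pre-trained language models and APIs
- OpenAI API: Language models and embeddings
- Google Cloud Natural Language API: Cloud-based entity, sentiment, and syntax analysis
- IBM Watson NLP: Enterprise NLP services (entities, keywords, sentiment)
- Microsoft Azure Text Analytics (now part of Azure AI Language): Sentiment analysis, key-phrase extraction, and entity recognition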
Multilingual Resources
- PanLex: Lexical translation data for thousands of languages
- Linguee / DeepL Dictionary: Multilingual translations shown in context
- Tatoeba: Crowdsourced database of example sentences and their translations
- OPUS: Open collection of parallel multilingual corpora
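Tatoeba's bulk downloads keep sentences and translation links in separate files, so building a bilingual list is a join on sentence IDs. The function below is a sketch under an assumed tab-separated layout: sentences as `id, language code, text` and links as `sentence id, translation id`. Check the current export documentation before relying on it. The function name, the three-letter language codes, and the sample records are illustrative assumptions.

```python
from typing import Iterable, Iterator


def translation_pairs(sentence_lines: Iterable[str], link_lines: Iterable[str],
                      src_lang: str, tgt_lang: str) -> Iterator[tuple[str, str]]:
    """Yield (source, target) sentence pairs for one language direction.

    Assumed layouts (one record per line, tab-separated):
      sentences: <id> <lang> <text>
      links:     <sentence id> <translation id>
    """
    # Index every sentence by id; maxsplit=2 keeps any tabs inside the text.
    by_id: dict[str, tuple[str, str]] = {}
    for line in sentence_lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3:
            sid, lang, text = parts
            by_id[sid] = (lang, text)
    # Join each link against the index and keep only the requested direction.
    for line in link_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        src, tgt = by_id.get(parts[0]), by_id.get(parts[1])
        if src and tgt and src[0] == src_lang and tgt[0] == tgt_lang:
            yield src[1], tgt[1]


# Hypothetical sample records in the assumed layout.
sentences = ["1\teng\tHello.", "2\tfra\tBonjour.", "3\tdeu\tHallo."]
links = ["1\t2", "2\t1", "1\t3", "3\t1"]
print(list(translation_pairs(sentences, links, "eng", "fra")))
```

With the full exports, pass open file objects instead of lists. The sentence index holds every sentence in memory, so filtering `sentence_lines` down to the two languages of interest first keeps it much smaller.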