The National Centre for Text Mining (NaCTeM)
The National Centre for Text Mining (NaCTeM) is the first publicly funded text mining centre in the world. We provide text mining services in response to the requirements of the UK academic community. NaCTeM is operated by the University of Manchester.
On our website, you can find pointers to sources of information about text mining, such as links to:
- text mining services provided by NaCTeM
- software tools, both those developed by the NaCTeM team and by other text mining groups
- seminars, general events, conferences and workshops
- tutorials and demonstrations
- text mining publications
NaCTeM Software Tools
The National Centre for Text Mining bases its service systems on a number of text mining software tools.
- Part-of-speech (POS) taggers
- Named entities/terms
  - AnatomyTagger — an open-source entity mention tagger for anatomical entities
  - Named-entity Recognizer — Part of the GENIA Tagger
  - NEMine — Recognizes gene/protein names in text.
  - Yeast MetaboliNER — Recognizes yeast metabolite names in text.
  - ACELA — Tool for efficient annotation of named entities
  - Smart dictionary lookup — machine learning-based gene/protein name lookup
  - Smart Dictionary Lookup Tool Web Service — Looks up term variations of a given gene/protein name based on an automatically trained similarity measure
  - Term Normalization Tool — Normalizes terms with string rewriting rules automatically generated from a dictionary.
  - DECA — A species disambiguation system for biological named entities
  - RF-TermAlign — a bilingual dictionary extraction tool that uses a Random Forest method to learn string similarity of terms between a source and target language.
- Other tools
  - EventMine — A machine learning-based event extraction system.
  - brat — A free, open-source, web-based tool for text annotation visualisation and editing.
  - Cafetiere — An easy-to-use text mining system for carrying out text mining on your own document collection
  - Sentence and paragraph breaker — An accurate sentence and paragraph detector based on heuristic rules
  - Clinical Document Classification — automatic document classification demo
  - Sentiment Analysis Tool — Analyses sentiment of input text.
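
To give a feel for what "heuristic rules" can mean for a sentence and paragraph breaker like the one listed above, here is a minimal Python sketch. It is not the NaCTeM tool and makes no claim about how that tool works: the function names, the abbreviation list and the splitting rules are all assumptions chosen for illustration.

```python
import re

# Hypothetical abbreviation list for illustration only (not from any NaCTeM tool).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "al.", "fig.", "vs."}


def split_paragraphs(text):
    """Rule 1: one or more blank lines separate paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Rule 2: '.', '!' or '?' followed by whitespace and an uppercase letter
    ends a sentence, unless the token ending there is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s+[A-Z])", paragraph):
        end = match.end()
        # The token that ends at this punctuation mark, lowercased.
        last_token = paragraph[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "Dr." does not end a sentence
        sentences.append(paragraph[start:end].strip())
        start = end
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith studied p53. The results were clear.\n\nA second paragraph follows."
    for para in split_paragraphs(sample):
        print(split_sentences(para))
```

Rules of this kind are easy to inspect and extend, but they depend heavily on the abbreviation list and on capitalisation, which is one reason real systems tune them to the target domain.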