Tokenization
This demo illustrates Celatro's ability to quickly tokenize various types of
documents. To handle different languages and the writing systems they use,
Celatro bases its tokenization on language-specific alphabets.
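Celatro's own alphabet definitions and tokenizing rules are not described here. The following is only a minimal Python sketch of the general idea. It assumes that a token is a maximal run of characters belonging to the selected language's alphabet. The `ALPHABETS` table and the `tokenize` function are hypothetical illustrations, not Celatro's API.

```python
import string
from typing import List

# Hypothetical per-language alphabets (assumption, not Celatro's real data).
ALPHABETS = {
    "en": frozenset(string.ascii_letters),
    "it": frozenset(string.ascii_letters + "àèéìíîòóùúÀÈÉÌÍÎÒÓÙÚ"),
}


def tokenize(text: str, language: str) -> List[str]:
    """Split text into maximal runs of characters from the language's alphabet.

    Any character outside the alphabet (space, punctuation, digit, or a letter
    the alphabet does not include) ends the current token and is discarded.
    """
    alphabet = ALPHABETS[language]
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in alphabet:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:  # flush a token that runs to the end of the text
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(tokenize("Perché è così?", "it"))  # ['Perché', 'è', 'così']
    print(tokenize("Perché è così?", "en"))  # ['Perch', 'cos']
```

Running the same Italian sentence through the English alphabet breaks words at the accented letters. This shows why the tokenizer needs the right alphabet for each language.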
Instructions
To use this demo, first choose a language from the drop-down menu, then use
the following controls to specify the text to be tokenized:
- Use Demo File: Select a demo file from a drop-down list offering three great
  works of literature in the selected language.
- Upload File: Specify a URL or browse to a folder and select a file (text,
  XML, HTML, or RTF files only; must be smaller than 2 MB).
- Specify a URL: Specify any document you can locate by URL.
- Write Some Text: Type or paste the text to be tokenized.
- Select Encoding: Depending on your previous choices, you might also need to
  select the encoding to apply from a drop-down list: UTF-8 or UTF-16.
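The upload and encoding controls amount to validating a file and then decoding its bytes into text. Here is a minimal Python sketch. It assumes the listed file types map to the extensions shown and that "2 MB" means 2 MiB (2 * 1024 * 1024 bytes). `load_upload` is a hypothetical helper, not part of Celatro.

```python
from pathlib import PurePath

# Assumed extensions for "text, XML, HTML or RTF" files.
ALLOWED_EXTENSIONS = {".txt", ".xml", ".html", ".htm", ".rtf"}
MAX_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB
# The two encodings offered by the demo, mapped to Python codec names.
CODECS = {"UTF-8": "utf-8", "UTF-16": "utf-16"}


def load_upload(filename: str, data: bytes, encoding: str) -> str:
    """Validate an uploaded file and decode its bytes into text."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if len(data) >= MAX_BYTES:  # the file must be strictly smaller than the limit
        raise ValueError("file must be smaller than 2 MB")
    if encoding not in CODECS:
        raise ValueError(f"unsupported encoding: {encoding}")
    # Python's "utf-16" codec uses a byte-order mark, if present, to pick endianness.
    return data.decode(CODECS[encoding])
```

If the bytes do not match the chosen encoding, `bytes.decode` raises `UnicodeDecodeError`, which is a subclass of `ValueError`. Callers can therefore handle every rejection with one `except ValueError`.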
After configuring your input and clicking Submit, scroll down to view the
results, which show the outcome of Celatro's tokenization: a) the ordinal
number of each token and b) the token itself.
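The two-column results layout (an ordinal number followed by the token) can be sketched in Python as follows. This assumes ordinals start at 1 and that a tab separates the two columns. `number_tokens` and `format_results` are hypothetical names, not Celatro's output code.

```python
from typing import Iterable, List, Tuple


def number_tokens(tokens: Iterable[str]) -> List[Tuple[int, str]]:
    """Pair each token with its 1-based ordinal position."""
    return list(enumerate(tokens, start=1))


def format_results(tokens: Iterable[str]) -> str:
    """Render one 'ordinal<TAB>token' line per token."""
    return "\n".join(f"{n}\t{tok}" for n, tok in number_tokens(tokens))


if __name__ == "__main__":
    print(format_results(["Nel", "mezzo", "del", "cammin"]))
```

Numbering is kept separate from formatting, so the same `(ordinal, token)` pairs could feed an HTML table instead of plain text.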