CORPORA of text in the Azerbaijani language
$30-250 USD
Paid on delivery
BONUS PROJECT: Data Collection and Pre-processing
SUBJECT / AREA: "Parliament's Regulations, Policy Making, Decrees"
- Only Azerbaijani text is allowed (any text in other languages must be excluded);
- All final text must be delivered in a plain-text file;
- Sentences shorter than 3 words must be excluded;
- Only complete sentences should be used;
- Poetry is not allowed;
- Each sentence must start with a letter (the first character cannot be a digit or any other symbol such as "-", "_", "(", ")", etc.);
- Format is one sentence per line: each sentence must start on a new line and end with ".";
- Broken sentences (when a sentence has an EOF in the middle) are not allowed;
- Only a single space between words;
- All page numbers, headers, titles, etc. must be excluded; only sentences are allowed;
- If applicable, headers and footers must be removed;
- The final text file must contain at least 15,000 lines (sentences).
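
The mechanical rules in the list above (single spaces, minimum of 3 words, first character is a letter, trailing ".") could be checked with a small script. The following is only a minimal sketch, not part of the project brief: the function names `normalize_spaces`, `is_valid_sentence`, and `filter_corpus` are hypothetical, and the sketch does not attempt language detection, poetry detection, or checking whether a sentence is complete, all of which would still need separate handling.

```python
MIN_WORDS = 3  # sentences shorter than 3 words are excluded


def normalize_spaces(line: str) -> str:
    """Collapse any run of whitespace into a single space and trim the ends."""
    return " ".join(line.split())


def is_valid_sentence(sentence: str) -> bool:
    """Apply only the mechanical rules from the requirements list."""
    if not sentence:
        return False
    # The first character must be a letter (not a digit, dash, bracket, etc.).
    if not sentence[0].isalpha():
        return False
    # Each sentence must end with a full stop.
    if not sentence.endswith("."):
        return False
    # Exclude sentences with fewer than MIN_WORDS words.
    if len(sentence.split()) < MIN_WORDS:
        return False
    return True


def filter_corpus(lines):
    """Return the normalized lines that pass the mechanical checks."""
    kept = []
    for raw in lines:
        sentence = normalize_spaces(raw)
        if is_valid_sentence(sentence):
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    sample = [
        "  Bu   qanun  qüvvəyə minir. ",   # extra spaces, otherwise fine
        "12 Maddə qüvvəyə minir.",         # starts with a digit
        "Qısa cümlə.",                     # only two words
        "Bu cümlə nöqtəsiz qalır",         # no trailing full stop
    ]
    for sentence in filter_corpus(sample):
        print(sentence)
```

After filtering, the line count of the result can be compared against the 15,000-sentence minimum before writing the output file, one sentence per line.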
Project ID: #16958923