Corpora of text in the Azerbaijani language

Closed Posted 5 years ago Paid on delivery

BONUS PROJECT: Data Collection and Pre-processing

SUBJECT / AREA: "Parliament's Regulations, Policy Making, Decrees"

- Only Azerbaijani text is allowed (text in any other language must be excluded);

- All final text must be in a text file;

- Sentences shorter than 3 words must be excluded;

- Only complete sentences should be used;

- Poems and poetry are not allowed;

- Each sentence must start with a letter (the first character can't be a digit or any other symbol such as "-, _, (, ), ..." etc.);

- The format is one sentence per line: each sentence must start on a new line and end with ".";

- Broken sentences (where a sentence has an EOF in the middle) are not allowed;

- Only a single space is allowed between words;

- All page numbers, headers, titles, etc. must be excluded (only sentences are allowed);

- If applicable, headers and footers must be removed;

- The text file should contain at least 15000 lines (sentences);
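
The mechanical rules above (word count, first character, final ".", single spacing, minimum line count) could be checked with a short Python sketch like the one below. Everything in it is an assumption for illustration: the function names, the `MIN_LINES` constant and the alphabet check are hypothetical and not part of the posting. In particular, testing letters against the Azerbaijani Latin alphabet is only a rough proxy for "Azerbaijani only", and the sketch does not detect poetry or incomplete sentences, which still need manual review or a proper language-identification tool.

```python
import re
import unicodedata

# Assumption: Azerbaijani Latin alphabet (32 letters), listed in both cases so
# that no case conversion is needed (the dotted capital "İ" lowercases to two
# code points in Python, which would break a naive lower() check).
AZ_LETTERS = set("abcçdeəfgğhxıijkqlmnoöprsştuüvyz"
                 "ABCÇDEƏFGĞHXIİJKQLMNOÖPRSŞTUÜVYZ")

MIN_WORDS = 3       # sentences shorter than 3 words are excluded
MIN_LINES = 15000   # required minimum number of sentences in the final file


def normalize(line: str) -> str:
    """Compose accented letters (NFC) and collapse whitespace to single spaces."""
    line = unicodedata.normalize("NFC", line)
    return re.sub(r"\s+", " ", line).strip()


def is_valid_sentence(sentence: str) -> bool:
    """Apply the mechanical rules to one already-normalized sentence."""
    if len(sentence.split()) < MIN_WORDS:
        return False
    if not sentence[0].isalpha():      # must start with a letter
        return False
    if not sentence.endswith("."):     # must end with "."
        return False
    # Rough language proxy: every letter must belong to the assumed alphabet.
    return all(ch in AZ_LETTERS for ch in sentence if ch.isalpha())


def clean_lines(lines):
    """Normalize candidate sentences and keep only those that pass the checks."""
    normalized = (normalize(line) for line in lines)
    return [s for s in normalized if is_valid_sentence(s)]


def meets_size_requirement(sentences) -> bool:
    """True when the corpus has at least MIN_LINES sentences."""
    return len(sentences) >= MIN_LINES


sample = [
    "Milli  Məclis   qanun qəbul edir.",  # extra spaces, otherwise fine
    "Maddə 1.",                           # fewer than 3 words
    "1. Bu qanun qüvvəyə minir.",         # starts with a digit
    "- Bu qərar dərc edilir.",            # starts with a symbol
    "This law enters into force.",        # contains "w", not in the alphabet
    "Qərar dərc edildiyi gündən",         # no final "."
]
cleaned = clean_lines(sample)
```

With this sample only the first sentence survives, written with single spaces. Writing the result with `"\n".join(cleaned)` to a UTF-8 file would give the one-sentence-per-line layout; text in another language that happens to use only these letters would still pass, so the alphabet check should not be relied on alone.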

Python

Project ID: #16958923

About the project

1 bid External project Active 5 years ago