
Looking for a Python developer to help me finish a search engine with tf-idf, cosine similarity and queries, WITHOUT libraries such as sklearn

€8-30 EUR

Cancelled
Posted over 3 years ago


Paid on delivery
I am looking for a Python developer, preferably an expert in NLP, to help me finish a search engine for one of my college courses. The first part of the code, an inverted index, is already done. Please DO NOT change any part of the pre-existing code except where instructed. It is important to keep the posting lists as they are: DO NOT shorten them.

As I only have a limited number of characters here, I have attached a file that contains a more detailed job description with examples, as well as a screenshot of what the result should look like. Please read the instructions carefully and have a look at the screenshot before bidding. It is of great importance to follow the instructions (e.g. NOT using libraries for certain parts). This task should not be too much trouble for a skilled developer.

Here is the rough outline of what needs to be done:

- The tokens need to be stemmed using snowballstemmer for German. This MUST be done in a separate function; do not stem in the same function in which the tokens are counted. I have noted in the code where to add this part. Stemming also has to be applied to the queries. So, for example, if you type "eating" into a query (both inverted index AND cosine similarity), anything starting with "eat" should be printed out.
- tf-idf needs to be calculated. MOST IMPORTANTLY: you CANNOT use any libraries for this. So DO NOT use sklearn, TfidfVectorizer or anything like that. Each part (tf, idf, tf-idf) needs to be calculated in a separate function. I have noted where to add these in the code as well. If you use a library like TfidfVectorizer, or anything else that does the same, I cannot accept the code.
- Cosine similarity has to be calculated; this also MUST be done in a function, with NO libraries (no sklearn, etc.). It has to be calculated from whatever is typed into a query, compared against the texts in the corpus. This query has to be reachable from the main function by typing "2" in the menu (the menu is already implemented; please find the corresponding part in the main function to add the query).

The user should be able to search for words and then see the cosine similarity, tf, idf, and the final tf-idf for the Top N (e.g. Top 10) ranked documents, with document names AND document IDs for each result (please view the screenshot for this).

After the user chooses the tf-idf option in the menu (already implemented; tf-idf is chosen by entering "2"), the overall top 10 results (or any other number) for tf-idf should be printed first, without a query (no cosine similarity here, as it is used for queries only). It should look something like this:

Documents: [id: name (|d|)]
0: text1, 1: text2, 2: text3, ...
dictionary: [term: idf | (doc: tf), (doc: tf), (doc: tf), ...]

Then it should ask the user to type in a query. The result should look something like this (using cosine similarity):

Query: food
Top 3 containing the queried word(s):
filename1 (file ID, tf | idf)
filename2 (file ID, tf | idf)
filename3 (file ID, tf | idf)

(Please view the screenshot for details; you will understand what I mean.)

The user should be able to type in more than one word, but the texts don't have to contain every single one of the words typed in to appear in the results.

The attached screenshot, a commented screenshot, and the more detailed project description will give you more details. Please consult these if you need more information. I have also provided some of the texts I am working with.

Please note that the code has to be as simple as possible, nothing too hard or fancy. It should also be quite fast, as I have to go through almost 4000 texts. To test the query with the texts I provided, I recommend searching for "vater sohn" and checking whether cosine similarity works.
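To make the outline above concrete, here is a minimal sketch of the tf, idf, tf-idf and cosine similarity steps using only the Python standard library. It is not the client's code: every function name (`compute_tf`, `compute_idf`, `compute_tfidf`, `cosine_similarity`, `rank_documents`) is hypothetical, tf is assumed to be a raw count, idf is assumed to be log10(N / df), and the token lists are assumed to have been stemmed already in a separate step. The posting does not specify any of these choices, so the attached instructions would take precedence.

```python
import math
from collections import Counter


def compute_tf(tokens):
    """Raw term frequency: how often each term occurs in one token list."""
    return dict(Counter(tokens))


def compute_idf(docs_tokens):
    """Inverse document frequency per term, assumed here to be log10(N / df)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log10(n_docs / df) for term, df in doc_freq.items()}


def compute_tfidf(tf, idf):
    """Weight each tf by its idf; terms unknown to the corpus get weight 0."""
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty or all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_tokens, docs_tokens, names, top_n=10):
    """Return up to top_n (score, doc_id, name) tuples, best match first."""
    idf = compute_idf(docs_tokens)
    query_vec = compute_tfidf(compute_tf(query_tokens), idf)
    scored = []
    for doc_id, tokens in enumerate(docs_tokens):
        doc_vec = compute_tfidf(compute_tf(tokens), idf)
        score = cosine_similarity(query_vec, doc_vec)
        if score > 0.0:  # a document needs at least one weighted query term
            scored.append((score, doc_id, names[doc_id]))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


# Tiny made-up corpus (already tokenized and stemmed), for illustration only.
docs = [["vater", "sohn", "haus"], ["vater", "haus"], ["mutter", "haus"]]
names = ["text1", "text2", "text3"]
for score, doc_id, name in rank_documents(["vater", "sohn"], docs, names, top_n=3):
    print(f"{name} (ID {doc_id}, cosine {score:.3f})")
```

Because a document only needs a non-zero score to be listed, a multi-word query such as "vater sohn" also returns texts that contain just one of the words. For a corpus of almost 4000 texts, the idf table and the document vectors could be computed once and reused across queries instead of being rebuilt inside `rank_documents`; they are recomputed here only to keep the sketch short.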
Project ID: 26972532

About the project

1 proposal
Remote project
Active 4 years ago

1 freelancer is bidding an average of €490 EUR for this job
I have 3+ years of experience as a Python programmer and have worked on several Machine Learning projects mainly targeting the domain of Computer Vision and Digital Image Processing. Get effective Python programming / Machine Learning / Computer Vision / Deep Learning / Digital Image Processing / Algorithms & Design solutions
€490 EUR in 7 days
0.0 (1 review)

About the client

Birkenfeld, Germany
5.0
2
Member since May 30, 2020

