Shockingly inefficient Perl script on Google n-gram

$30-250 USD

Completed
Posted over 10 years ago

Paid on delivery
Some colleagues developed a Perl script that compares the similarity of two sentences using Google n-grams. The n-gram files are huge and, although we do not know Perl ourselves, we believe nothing has been done to optimize retrieval from them. Each sentence comparison now takes an average of 7 minutes; with about 500,000 sentence pairs to compare, the task would take almost 7 years to run. We need the speed improved by two orders of magnitude, to an average of 4.2 seconds per comparison. We suspect that a simple initial indexing of the n-gram files at the start of the process may take care of the problem. It would be OK for the system to take up to an hour at startup to do any indexing and storing in memory. Up to 20 GB of memory may be used to store the indexed data.
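The "index once at startup" idea can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Perl of the original script, which is not shown here; a Perl hash would play the same role as the dictionary. The file layout (one `n-gram<TAB>count` pair per line) and the names `build_index` and `lookup` are assumptions made for illustration, and whether the full data set fits in 20 GB depends on the actual files.

```python
import io
from typing import Dict, Iterable, TextIO


def build_index(files: Iterable[TextIO]) -> Dict[str, int]:
    """Read every n-gram file once and return an in-memory lookup table."""
    index: Dict[str, int] = {}
    for handle in files:
        for line in handle:
            line = line.rstrip("\n")
            # Assumed layout: "<n-gram>\t<count>"; split on the last tab.
            ngram, _, count = line.rpartition("\t")
            if not ngram:
                continue  # blank line or no tab: skip it
            index[ngram] = index.get(ngram, 0) + int(count)
    return index


def lookup(index: Dict[str, int], ngram: str) -> int:
    """Constant-time lookup; n-grams that were never seen count as 0."""
    return index.get(ngram, 0)


# A tiny in-memory stand-in for one n-gram file.
sample = io.StringIO("the cat\t120\nthe dog\t95\ncat sat\t40\n")
index = build_index([sample])
print(lookup(index, "the cat"))   # 120
print(lookup(index, "no match"))  # 0
```

After the one-time load, each lookup is a hash access instead of a scan through a file, which is where the hoped-for speed-up would come from.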
Project ID: 5337779

About the project

10 proposals
Remote project
Active 10 years ago

Awarded to:
My name is Elias Hamaz, a Perl coder based in London, UK. I can load the n-gram files into a reference tree so that queries are served from RAM. I can then modify the code to query the tree. My initial assessment is:
1. The 20 GB limit means that a file can be in memory only while its data is being queried.
2. A maximum of 2 files will be in memory at one time.
3. The order of the list of comparisons can be optimised so that queries on a particular file are performed sequentially, minimising the number of disk read operations.
Please get in touch to discuss the details of the comparison process. Regards, Elias Hamaz
$164 USD in 1 day
5.0 (1 review)
2.7
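The reordering idea in the proposal above, grouping lookups so that each n-gram file is read only once, could look roughly like the following sketch. It is in Python for brevity, not Perl. The rule mapping an n-gram to a file (`shard_for`, by first letter) and the `load_shard` callback are hypothetical; the real partitioning of the n-gram files is not described in the posting. The sketch holds one file in memory at a time, which stays within the two-file limit mentioned in the proposal.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Table = Dict[str, int]


def shard_for(ngram: str) -> str:
    """Hypothetical rule: one file per first letter of the n-gram."""
    return ngram[:1].lower()


def batched_lookup(
    ngrams: Iterable[str], load_shard: Callable[[str], Table]
) -> Dict[str, int]:
    """Group lookups by file so that every file is loaded exactly once."""
    by_shard: Dict[str, List[str]] = defaultdict(list)
    for ngram in ngrams:
        by_shard[shard_for(ngram)].append(ngram)

    counts: Dict[str, int] = {}
    for shard in sorted(by_shard):
        table = load_shard(shard)  # one disk read per file
        for ngram in by_shard[shard]:
            counts[ngram] = table.get(ngram, 0)
        del table  # drop the reference before loading the next file
    return counts


# Stand-in for files on disk, plus a loader that records each read.
FAKE_FILES = {"t": {"the cat": 120, "the dog": 95}, "c": {"cat sat": 40}}
loads: List[str] = []


def load_shard(name: str) -> Table:
    loads.append(name)
    return FAKE_FILES.get(name, {})


counts = batched_lookup(["the cat", "cat sat", "the dog", "the cat"], load_shard)
print(counts["the dog"])  # 95
print(loads)              # ['c', 't']
```

Four lookups touch two files and cause two loads; without grouping, the same request order would alternate between the files.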
10 freelancers are bidding an average of $181 USD for this job
Definitely an interesting issue, I'd be glad to take the challenge and work on it :) Thank you. Is it a Linux system you're working on? (PS. Good that you aren't in a Tom Sawyer mood right now: you'd reverse the bid, to award the job to the bidder offering the most :) )
$200 USD in 5 days
4.9 (27 reviews)
5.2
I'm interested in this project. I'm an experienced (15+ years) Perl developer and Linux administrator. The bid is just for 2-3 hours of work; it may or may not be enough to solve the problem. I cannot guarantee that without seeing the code. Regards.
$77 USD in 3 days
4.8 (17 reviews)
5.5
Hi, I have experience with Perl and have done such string comparisons before. Indexing can save a lot of time, yes.
$222 USD in 3 days
5.0 (1 review)
2.5
I have optimized mime-64bit encryption Perl scripts with 1-pass decoding/encoding. This might also need hardware tuning. I can provide a portfolio of work.
$255 USD in 7 days
4.0 (1 review)
0.8
I am new to Freelancer but have extensive experience working with Perl. I have carried out many automation/optimization projects in Perl for my employer. I would like to understand your full requirements and will then provide you with my approach; you need only award me the project if you are satisfied with it. I assure you that I will meet your expectations.
$155 USD in 10 days
0.0 (0 reviews)
0.0
I have extensive knowledge of Perl and of creating indexed data structures that allow efficient data comparisons and manipulations. Based on the project description, I propose using a nested hash structure to first load the n-gram data (actual implementation details depend on your data files, such as your "n-" number and how many files are being used) before reading in your sentences for comparison. Given sample data (n-gram files and comparison sentence input files) and your output requirements, I am confident I can deliver an efficient solution to help you achieve your goal in a timely manner. I look forward to discussing this in detail at your earliest convenience.
$222 USD in 3 days
0.0 (0 reviews)
0.0
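A nested hash of the kind this bid mentions can be pictured as one level of hashing per word of the n-gram. The sketch below uses Python dictionaries in place of Perl hashes; the `COUNT` marker key and the function names are assumptions for illustration, not taken from any delivered solution.

```python
from typing import Any, Dict

# Marker key holding the count at a node. It is a non-string object,
# so it cannot collide with any word used as a key.
COUNT = object()

Tree = Dict[Any, Any]


def insert(tree: Tree, ngram: str, count: int) -> None:
    """Walk or create one nested level per word, then store the count."""
    node = tree
    for word in ngram.split():
        node = node.setdefault(word, {})
    node[COUNT] = count


def count_of(tree: Tree, ngram: str) -> int:
    """Follow the words down the tree; a missing branch means count 0."""
    node = tree
    for word in ngram.split():
        node = node.get(word)
        if node is None:
            return 0
    return node.get(COUNT, 0)


tree: Tree = {}
insert(tree, "the cat", 120)
insert(tree, "the cat sat", 30)
insert(tree, "the dog", 95)

print(count_of(tree, "the cat"))      # 120
print(count_of(tree, "the cat sat"))  # 30
print(count_of(tree, "the"))          # 0 (prefix only, no count stored)
```

N-grams that share leading words share branches, so common prefixes are stored once; whether this saves memory compared with a flat hash depends on the data and the language runtime.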
I have 4 years of experience in Unix and Perl, and I can modify the Perl script. My bid is low only to gain experience on freelancer.com, not because I am inefficient. If you send the Perl script, I can tell you how long it will actually take to modify it. You pay only if the end result is satisfactory. Thanks, Santhanalekshmi
$35 USD in 5 days
0.0 (0 reviews)
0.0

About the client

Boulder, United States
4.9
13
Payment method verified
Member since Jun 29, 2007