
Fast Webpage Crawler and Scraper

$30-250 USD

Closed
Posted about 4 years ago

Paid on delivery
I need a crawler that will crawl a list of domains that I will load from a CSV file. The crawler needs to crawl ONLY THE LANDING PAGE, not the entire site. It must capture the following, output a CSV file, and store that file in Dropbox:

1) Does the URL have Google Analytics code? Yes or no. Search for "Google Analytics" in the page source.
2) Is there a link to a privacy policy on the page? Yes or no. Search for the word "Privacy" in the link text.
3) How many unique internal URL links are present on the page? Return the link count.
4) Is the URL secure (SSL)? Yes or no.
5) Is the URL mobile-friendly? Yes or no. Search for meta name="viewport" in the page source.
6) Is the domain parked? Yes or no. Look for keywords or phrases in the source code.
7) Is a phone number present on the page? Yes or no. Capture the phone number.
8) The URL being crawled.

The crawler must be capable of crawling 70,000 URLs per hour. To be successful, the script will be tested using 70,000 URLs in one hour.
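The eight checks could be sketched roughly as follows in Python. The sketch uses only the standard library and assumes the page has already been downloaded. It is a minimal sketch, not the client's or any bidder's code, and several pieces are hypothetical:

- The names `analyze_landing_page` and `LandingPageParser`.
- The parked-domain phrases.
- The North American phone pattern.
- Treating "www.example.com" and "example.com" as the same host.

It also swaps the posting's plain string search for the viewport tag for an HTML-parser check, so attribute order and quoting do not matter.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Markers searched for in the lowercased page source. The posting asks for the
# literal text "Google Analytics"; the two script hosts are an added assumption.
GA_MARKERS = ("google analytics", "google-analytics.com", "googletagmanager.com/gtag/js")

# Hypothetical parked-domain phrases; the posting only says "keywords or phrases".
PARKED_PHRASES = ("this domain is for sale", "buy this domain", "domain is parked")

# Assumed North American format, e.g. (555) 123-4567 or 555.123.4567.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


class LandingPageParser(HTMLParser):
    """Collects <a> links with their text, the viewport meta tag and visible text."""

    def __init__(self):
        super().__init__()
        self.links = []            # (href, anchor text) pairs
        self.has_viewport = False
        self.text_parts = []       # visible text, used for the phone search
        self._href = None          # href of the <a> element currently open
        self._anchor_text = []
        self._skip_depth = 0       # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.has_viewport = True
        elif tag == "a":
            self._href = attrs.get("href")
            self._anchor_text = []
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._anchor_text).strip()))
            self._href = None
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        self.text_parts.append(data)
        if self._href is not None:
            self._anchor_text.append(data)


def _host(netloc):
    """Lowercased host with a leading "www." removed (an assumption)."""
    host = netloc.lower()
    return host[4:] if host.startswith("www.") else host


def analyze_landing_page(url, html):
    """Run checks 1-8 on one landing page; `url` is where the HTML was served from."""
    parser = LandingPageParser()
    parser.feed(html)
    parser.close()
    source = html.lower()
    page_host = _host(urlparse(url).netloc)

    internal = set()
    for href, _text in parser.links:
        absolute = urlparse(urljoin(url, href))
        # Only http(s) links to the same host count; mailto:, tel: and
        # javascript: links fall out on the scheme test.
        if absolute.scheme in ("http", "https") and _host(absolute.netloc) == page_host:
            # Drop the #fragment so "/about" and "/about#team" count once.
            internal.add(absolute._replace(fragment="").geturl())

    phone = PHONE_RE.search(" ".join(parser.text_parts))
    return {
        "url": url,
        "google_analytics": any(m in source for m in GA_MARKERS),
        "privacy_link": any("privacy" in text.lower() for _href, text in parser.links),
        "internal_links": len(internal),
        "ssl": urlparse(url).scheme == "https",
        "mobile_friendly": parser.has_viewport,
        "parked": any(p in source for p in PARKED_PHRASES),
        "phone": phone.group(0) if phone else "",
    }
```

The SSL check here only reads the scheme of the URL the page was finally served from. The fetcher therefore has to report the post-redirect URL.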
Project ID: 23485154

About the project

12 proposals
Remote project
Active 4 years ago

12 freelancers are bidding on average $191 USD for this job
Hi there, I am a scraping expert and have done more than 500 scraping projects; please check my feedback. Can we discuss more details about this project? Then I will provide example data or a script for you. Thanks, Lin
$160 USD in 5 days
5.0 (410 reviews)
8.1
Hi. I read the project description and have a few questions. 1. Do you need the script as well, or the data only? 2. What format do you want for the output data? Is CSV OK? We can do other formats as well. 3. Which fields do you want to extract from the website? 4. How many results/URLs are there? 5. Can you share the CSV with the URLs? Let's get in touch and we can provide a sample. Thanks, waiting for these details and hoping to collaborate.
$200 USD in 4 days
5.0 (69 reviews)
7.5
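On the output-format question in this bid, the posting already fixes the shape: a CSV of domains in, and a CSV with one row per URL out. Below is a minimal sketch with Python's `csv` module. It assumes one domain per row in the first column. The column names, `read_domains`, `write_results`, and the https default for bare domains are hypothetical.

```python
import csv

# Hypothetical column names, one per check in the posting (checks 1-8).
FIELDS = ["url", "google_analytics", "privacy_link", "internal_links",
          "ssl", "mobile_friendly", "parked", "phone"]


def read_domains(csv_file, has_header=False):
    """Return unique URLs from the first column, skipping blank rows."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)
    seen, urls = set(), []
    for row in reader:
        if not row or not row[0].strip():
            continue
        value = row[0].strip()
        # Assumption: bare domains such as "example.com" are tried over https.
        if "://" not in value:
            value = "https://" + value
        if value not in seen:
            seen.add(value)
            urls.append(value)
    return urls


def _cell(value):
    """Render booleans as the posting's yes/no; leave counts and text alone."""
    if value is True:
        return "yes"
    if value is False:
        return "no"
    return value


def write_results(rows, csv_file):
    """Write one CSV line per result dict, in FIELDS order."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({name: _cell(row.get(name, "")) for name in FIELDS})
```

With real files, open them with `newline=""`, as the `csv` module documentation recommends. The Dropbox upload is not shown here.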
Hi, I can deliver a multi-threaded desktop tool that processes 70k URLs per hour. Thanks
$400 USD in 3 days
5.0 (99 reviews)
7.6
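Several bids propose a multi-threaded tool. Below is a minimal sketch of that approach with Python's `concurrent.futures.ThreadPoolExecutor`. The names `crawl_with_threads` and `fetch_landing_page` are hypothetical, and so are the 32-worker default, the 10-second timeout, the 1 MB read cap and the UTF-8 decoding. None of these figures come from the posting.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_landing_page(url, timeout=10):
    """Hypothetical blocking fetcher: returns (final URL after redirects, HTML)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Read at most 1 MB; decoding as UTF-8 is an assumption for simplicity.
        html = resp.read(1_000_000).decode("utf-8", errors="replace")
        return resp.url, html


def crawl_with_threads(urls, fetch=fetch_landing_page, max_workers=32):
    """Fetch every URL on a thread pool; results come back in input order."""

    def safe_fetch(url):
        try:
            return url, fetch(url), None
        except Exception as exc:  # one dead domain must not stop the batch
            return url, None, repr(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps up to max_workers fetches running and preserves order.
        return list(pool.map(safe_fetch, urls))
```

Each result is a `(url, result, error)` tuple. A failed domain therefore still produces a row instead of silently dropping out of the output.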
Hello, I have 13 years of experience developing tools like this; you can check my profile reviews. I can build such a script for you quickly. You don't need 15 parallel threads or a special VPS for it: it can be done with a single thread and will work on any machine with a good internet connection. Please give me a list of 1K URLs; I want to run a benchmark to measure the scraping time. Let me know if you're interested. Thanks.
$180 USD in 3 days
5.0 (14 reviews)
6.3
Scraping 70,000 URLs in an hour depends entirely on the hosting this bot runs on. You'll need at least 15 parallel threads, which many VPS, VM, and dedicated hosting providers offer. I have experience with all of the above. Thanks! Relevant experience: Linux, PHP 7+, cURL, proxies, Excel, data parsing, API integration. Scripts, solutions, frameworks. Over 120 crawler and parser jobs completed (here and elsewhere). Proxies provided. Ask about hosting!
$230 USD in 2 days
4.9 (67 reviews)
5.4
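The disagreement between the single-thread and 15-thread bids comes down to arithmetic. By Little's law, the number of requests that must be in flight equals the arrival rate times the average time each request takes. 70,000 URLs per hour is about 19.4 URLs per second, so a single blocking thread would need every landing page to finish in roughly 51 ms. The average page times used below (0.75 s and 2 s) are illustrative assumptions, not measurements.

```python
import math


def required_concurrency(urls_per_hour, avg_seconds_per_url):
    """Little's law: requests in flight = arrival rate x average time per request."""
    urls_per_second = urls_per_hour / 3600
    return math.ceil(urls_per_second * avg_seconds_per_url)


# Time budget per URL if requests run strictly one after another.
SERIAL_BUDGET_SECONDS = 3600 / 70_000

print(round(SERIAL_BUDGET_SECONDS, 4))      # 0.0514
print(required_concurrency(70_000, 0.75))   # 15
print(required_concurrency(70_000, 2.0))    # 39
```

The "15 threads" figure therefore holds only if pages average about 0.75 s. Slow or dead domains that run into a timeout push the requirement up quickly. A single thread can still reach this rate if it is asynchronous, because one event loop can keep many requests in flight.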
I can make a multi-threaded desktop application that downloads as many pages as possible in parallel. However, its speed depends entirely on your internet speed and the response times of the websites it downloads from. I typically complete projects within 3-5 hours, on the same day. My skills include PHP, HTML5, CSS3, JavaScript, jQuery, WordPress themes and plugins, web scrapers and automation bots, user scripts, macros, and much more. If you have any questions or concerns, feel free to message me via chat to clarify any details.
$150 USD in 1 day
4.8 (20 reviews)
5.0
Hi, I'm an expert in highly responsive websites built with optimal web technologies. I could do the job perfectly. I will build this project as a C# desktop application with buttons, a progress bar, and multithreading. Everything is clear and detailed. Can we discuss more details about this project? Then I will provide example data or a script for you. I'm at your disposal for any further information. Waiting for your response!
$200 USD in 7 days
5.0 (17 reviews)
4.8
Hi, I'm an expert at web scraping with Python. The task is clear: we need multi-threaded or asynchronous programming to achieve that speed. It also depends on your network bandwidth, but that should be fine. Contact me in chat to begin. Thanks, Pandelis
$100 USD in 3 days
4.9 (15 reviews)
4.8
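The asynchronous approach mentioned in this bid keeps many requests in flight on a single thread. Below is a minimal `asyncio` sketch. It assumes a hypothetical `fetch` coroutine supplied by whichever async HTTP client is chosen. The name `crawl_async`, the limit of 50 and the 10-second timeout are also assumptions.

```python
import asyncio


async def crawl_async(urls, fetch, limit=50, timeout=10.0):
    """Run fetch(url) for every URL, never more than `limit` at once.

    `fetch` is a placeholder coroutine, not a specific library's API.
    """
    semaphore = asyncio.Semaphore(limit)

    async def one(url):
        async with semaphore:
            try:
                # wait_for cancels a slow site instead of letting it hold a slot.
                return url, await asyncio.wait_for(fetch(url), timeout), None
            except Exception as exc:  # timeouts and fetch errors become data
                return url, None, repr(exc)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(one(u) for u in urls))
```

Bounding concurrency with a semaphore, rather than starting all 70,000 requests at once, keeps open sockets and DNS lookups within what the machine and network can handle.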
Hey, I can provide this scraper built with Python + Scrapy. If you have a faster solution than Scrapy in mind, I would like to hear what it is. I will integrate the Dropbox SDK to upload the results there, and I will wrap everything up in Docker. BUT, 70k requests per hour really depends on your internet connection and, more importantly, on each website's bandwidth. If these two can deliver 70k requests per hour, the code will not be the limitation. Note: Scrapy already makes concurrent requests; it's the best tool I know for this. Vali
$100 USD in 1 day
5.0 (3 reviews)
2.6
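Since this bid proposes Scrapy, throughput mostly comes down to a few project settings. Below is a sketch of a `settings.py` tuned for one request per domain, loosely following the broad-crawl advice in Scrapy's documentation. The setting names are Scrapy's own, but every value is an assumption to benchmark, not a tested configuration.

```python
# settings.py for a Scrapy project (sketch; all values are assumptions to tune).

# Total requests Scrapy keeps in flight at once; Scrapy's default is 16.
CONCURRENT_REQUESTS = 64
# Each landing page is on its own domain, so this per-domain cap rarely binds.
CONCURRENT_REQUESTS_PER_DOMAIN = 8

# Give up on slow sites quickly instead of waiting for the 180 s default.
DOWNLOAD_TIMEOUT = 15
DNS_TIMEOUT = 10
# DNS lookups run in Twisted's thread pool (default size 10); widen it
# because nearly every request goes to a new host.
REACTOR_THREADPOOL_MAXSIZE = 20

# One retry (default is 2) so dead domains do not eat into the hour.
RETRY_ENABLED = True
RETRY_TIMES = 1
COOKIES_ENABLED = False

# Cap response size in bytes; a landing page rarely needs more than this.
DOWNLOAD_MAXSIZE = 5 * 1024 * 1024
LOG_LEVEL = "INFO"
```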
I will develop a spider for you using the Python Scrapy framework. The framework supports asynchronous web requests, which will meet the 70,000/hr requirement. Text me to discuss further.
$120 USD in 3 days
0.0 (0 reviews)
0.0

About the client

Sheb, United States
4.7
21
Payment method verified
Member since Jun 16, 2009

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)