Web crawler or web scraper for AliExpress to extract the data from each product within a specific category, with filters applied (listing URL with filters applied)
$30-250 USD
Paid on delivery
1. The web crawler or web scraper must be able to crawl and extract all the data from each product within a specific category, with filters applied, from that listing URL. The crawler must let you define how many products to crawl from a listing URL, for example the first 1,000 products, or products 500 to 1,500 within that listing.
2. The crawler should extract the following information from each product:
- photos
- stock
- title
- description
- specifications
- orders
- price
- product URL
- reference or ID
- category and subcategory
- publisher
- publisher feedback
- variants, with their photos, stock, and references or IDs (where applicable; variants exist when a product comes in different colors or sizes, each with its own photos, prices, and stock)
3. The data for each product must be saved in a database (MongoDB, PostgreSQL, MariaDB, etc.) so it can be consumed by another application.
4. The crawling system should handle several "threads" at the same time so it can scale to crawl and process multiple listing URLs simultaneously (for example, 5 listing URLs from different categories, each with different filters and a different maximum number of products to extract).
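Requirement 1 above (capping the crawl at the first N products, or a range such as 500 to 1,500) reduces to computing which listing pages to request. A minimal sketch, assuming AliExpress-style pagination with a fixed page size; the default of 60 products per page is an assumption, not something the posting confirms:

```python
def pages_for_range(first_product: int, last_product: int, page_size: int = 60):
    """Return the 1-based listing pages that cover products
    [first_product, last_product] (1-based, inclusive).

    page_size=60 is an assumed AliExpress page size; adjust it to
    whatever the target listing actually paginates by.
    """
    if not 1 <= first_product <= last_product:
        raise ValueError("need 1 <= first_product <= last_product")
    first_page = (first_product - 1) // page_size + 1
    last_page = (last_product - 1) // page_size + 1
    return list(range(first_page, last_page + 1))

# "the first 1,000 products"  -> pages 1..17 (17 * 60 = 1020 >= 1000)
# "products 500 to 1,500"     -> pages 9..25
```

The crawler then requests only those pages and discards any leading or trailing products outside the requested range.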
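The field list in requirement 2 maps naturally onto a record type, which also keeps variants attached to their parent product. A sketch using Python dataclasses; the field names here are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Variant:
    variant_id: str                     # variant reference / ID (e.g. one color or size)
    photos: list = field(default_factory=list)
    price: float = 0.0
    stock: int = 0

@dataclass
class Product:
    product_id: str                     # reference or ID
    url: str                            # product URL
    title: str = ""
    description: str = ""
    photos: list = field(default_factory=list)
    specifications: dict = field(default_factory=dict)
    price: float = 0.0
    stock: int = 0
    orders: int = 0
    category: str = ""
    subcategory: str = ""
    publisher: str = ""
    publisher_feedback: str = ""
    variants: list = field(default_factory=list)   # list of Variant

# Example record; asdict() yields a plain dict ready for MongoDB
# or a PostgreSQL/MariaDB JSON column.
p = Product(product_id="100500", url="https://example.com/item/100500",
            variants=[Variant(variant_id="100500-red", price=9.99, stock=3)])
doc = asdict(p)
```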
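For requirement 3, keying each document on the product ID and upserting keeps repeat crawls from creating duplicates. A minimal sketch of the MongoDB path; the collection and field names are assumptions, and with pymongo the pair below feeds straight into `collection.update_one(filter, update, upsert=True)`:

```python
def mongo_upsert_args(product: dict):
    """Build the (filter, update) pair for an idempotent upsert keyed
    on product_id, so re-crawling a listing updates existing documents
    instead of duplicating them."""
    if "product_id" not in product:
        raise KeyError("product must carry a product_id")
    return {"product_id": product["product_id"]}, {"$set": product}

# Usage with pymongo (not imported here, to keep the sketch dependency-free):
#   from pymongo import MongoClient
#   col = MongoClient()["aliexpress"]["products"]
#   flt, upd = mongo_upsert_args(product)
#   col.update_one(flt, upd, upsert=True)

flt, upd = mongo_upsert_args({"product_id": "100500", "price": 9.99})
```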
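Requirement 4 (several listing jobs in flight at once, each with its own URL, filters, and product cap) is a natural fit for a worker pool. A sketch using `concurrent.futures`; the `fetch` callable is injected so the example runs without network access, and a real crawler would do the HTTP requests and parsing there:

```python
from concurrent.futures import ThreadPoolExecutor

def crawl_listing(job: dict, fetch):
    """Crawl one listing job: fetch its products and apply the
    per-listing cap. fetch(url) must return a list of products."""
    products = fetch(job["url"])
    return products[: job["max_products"]]

def crawl_all(jobs, fetch, workers: int = 5):
    """Run several listing jobs concurrently, e.g. 5 listing URLs
    from different categories with different product limits."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: crawl_listing(job, fetch), jobs))

# Stubbed fetch so the sketch runs offline.
fake_fetch = lambda url: [f"{url}#p{i}" for i in range(10)]
jobs = [
    {"url": "https://example.com/cat/phones", "max_products": 3},
    {"url": "https://example.com/cat/watches", "max_products": 5},
]
results = crawl_all(jobs, fake_fetch)
```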
Infrastructure requirements:
1. The infrastructure should preferably run on AWS or another hosting provider that allows data crawling.
2. Use residential or regular proxies, or one of the popular proxy services such as Crawlera, ProxyCrawl, ScrapeHero, or Luminati, to avoid bans and blocks and to ensure that every request succeeds and no product in a listing URL is missed.
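Whichever proxy source is chosen, the per-request plumbing looks the same: pick a proxy from the pool for each request so a ban on one IP does not stall the crawl. A sketch shaped for the `proxies` argument of the Requests library; the pool entries are placeholders, not real endpoints:

```python
import random

def proxies_for_request(pool, rng=random):
    """Pick one proxy per request, returning a dict in the shape
    requests.get(url, proxies=...) expects."""
    proxy = rng.choice(pool)
    return {"http": proxy, "https": proxy}

# Placeholder pool; with a paid service this would be the
# provider's gateway endpoint(s) instead.
pool = ["http://user:pass@proxy1.example:8000",
        "http://user:pass@proxy2.example:8000"]
chosen = proxies_for_request(pool)
```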
Note: The development should be well documented and the code commented, so the crawler is easy to understand, extend with new functions, and maintain.
Optional but recommended technologies are:
Python, Scrapy, Selenium, AWS, proxy services such as Crawlera, lxml, Beautiful Soup, Requests, etc.
Other optional but recommended libraries:
Pillow, schedule, Twisted, time, json, and pymongo
Project ID: #26576057
About the project
8 freelancers are bidding an average of $208 for this job
Hello, my name is Puru. I have 6+ years of experience providing integrated development solutions, including web automation and web scraping, with expertise in Python, bs4, Scrapy, and Selenium. I have scraped data from sites More
Hi there, I have scraped Amazon, AliExpress, Yellow Pages, Yelp, etc. I have unlimited internet. I have 6 systems. I can do it. I have done many related projects like this. If you provide this work, it will he More
Hello, I'm a full-stack developer and a software engineer. I have read your description: "1. The web crawler or web scraper has to be capable to crawl and extract all the data from each prod" but I need more detail via c More
Everything is clear. But I'm not good with AWS infrastructure at the moment. I can study a bit to be able to complete this part of the project, and I would be happy if you can provide some information about it.
Hey, I'm Arnav, an experienced Software Engineer / Full-stack Developer with a skillset comprising web development (frontend & backend), automation, machine learning, deep learning, data mining, data analysis, API More
Hello there, I hope this proposal finds you in the best of health. I saw your job posting and I would like to take this on and provide you with the most efficient results. I have hands-on knowledge of web scrap More
Hello, I'm an experienced Python developer; my main specialisations are parsing and crawling. I work with MongoDB and SQL. If you choose me, I guarantee that I will do the work perfectly and on time.