HTML Scraper Script (repost)

Completed. Posted Aug 18, 2009. Paid on delivery.

I need a quick script to pull down two sets of authenticated HTML pages, parse them and return the results in CSV format.

This needs to be a command-line script that runs under OS X and Linux. My order of preference for the language is 1. Python, 2. Bash shell, 3. Perl.

The provided details include a Wireshark capture session covering a login, a download of the orders listing, and downloads of two sample orders.
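A minimal sketch of the authenticated session in Python, the poster's first-choice language. It assumes the site uses an ordinary form-encoded POST login that sets a session cookie; the login URL and the form field names (`LOGIN_URL`, `userid`, `password`) are hypothetical placeholders that would have to be read from the Wireshark capture. The opener keeps cookies in a `CookieJar`, so later page requests made through it stay logged in.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# HYPOTHETICAL values: the real login URL and form field names must be taken
# from the Wireshark capture; these are placeholders, not BrickLink's own.
LOGIN_URL = "https://example.invalid/login"
USER_FIELD = "userid"
PASS_FIELD = "password"


def make_opener():
    """Return an opener that stores cookies (e.g. a session cookie) and resends them."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(username, password, url=LOGIN_URL):
    """Build a form-encoded POST request carrying the credentials."""
    body = urllib.parse.urlencode({USER_FIELD: username, PASS_FIELD: password})
    return urllib.request.Request(url, data=body.encode("ascii"), method="POST")


def login(opener, username, password):
    """Submit the login form; cookies set by the server stay in the opener's jar."""
    with opener.open(build_login_request(username, password)) as response:
        return response.read().decode("utf-8", errors="replace")
```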

## Deliverables

I need a script that combines a couple of sets of data and outputs information about orders in a given date range in CSV format. Generally, I will run it through cron shortly after midnight to get the previous day's order information so that I can paste it into a tracking spreadsheet. However, I would also like the ability to analyze orders over a longer date range if desired.

The software should be a command-line script with the following runtime options:

-u (BrickLink username)

-p (BrickLink password)

-h (help screen showing usage examples)

-d1 (Start date, optional)

-d2 (End date, optional)

The start and end dates are optional, but if either is specified, both must be. Dates are specified in YYYYMMDD format.

By default, only -u and -p are required, and the date range is the previous day only.
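One way to implement these options is with Python's standard `argparse` module, sketched below; the helper names (`parse_date`, `build_parser`, `resolve_range`) are hypothetical. argparse accepts single-dash multi-character options such as `-d1`, adds `-h` itself, and here the epilog carries the usage examples. The both-or-neither rule and the previous-day default are checked in a separate function because argparse has no built-in way to require two options together.

```python
import argparse
import datetime
import sys


def parse_date(text):
    """argparse type: turn a YYYYMMDD string into a datetime.date."""
    try:
        return datetime.datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid date {text!r}; expected YYYYMMDD") from None


def build_parser():
    parser = argparse.ArgumentParser(
        description="Export BrickLink orders for a date range as CSV.",
        epilog=(
            "examples:\n"
            "  %(prog)s -u NAME -p SECRET       (previous day only)\n"
            "  %(prog)s -u NAME -p SECRET -d1 20090801 -d2 20090817"
        ),
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    # argparse adds -h automatically; the epilog above supplies the examples.
    parser.add_argument("-u", dest="username", required=True, help="BrickLink username")
    parser.add_argument("-p", dest="password", required=True, help="BrickLink password")
    parser.add_argument("-d1", dest="start", type=parse_date, help="start date, YYYYMMDD")
    parser.add_argument("-d2", dest="end", type=parse_date, help="end date, YYYYMMDD")
    return parser


def resolve_range(start, end, today):
    """Require -d1/-d2 together or not at all; default to the previous day only."""
    if (start is None) != (end is None):
        raise ValueError("-d1 and -d2 must be specified together")
    if start is None:
        yesterday = today - datetime.timedelta(days=1)
        return yesterday, yesterday
    return start, end


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    try:
        start, end = resolve_range(args.start, args.end, datetime.date.today())
    except ValueError as err:
        parser.error(str(err))  # prints usage and exits with status 2
    # Placeholder: fetching, parsing and CSV output would happen here.
    print(f"orders from {start} to {end}", file=sys.stderr)


if __name__ == "__main__":
    main()
```

Run it as `script.py -u NAME -p SECRET` for the previous day, or add `-d1 20090801 -d2 20090817` for a range (`script.py` stands in for whatever the script is named).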

First, you'll need to get a list of order numbers for the desired date range using this URL:

[url removed, login to view]
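Because the listing URL was removed from the posting, the page's HTML structure is unknown. The sketch below assumes, hypothetically, that each order in the listing links to its orderDetail page with the order number in a query parameter named `ID`; the regular expression would need adjusting to the real markup. Collecting the numbers into a set drops duplicate links to the same order.

```python
import re

# HYPOTHETICAL pattern: assumes links like orderDetail...?ID=5555555.
# [?&;] also accepts "&amp;ID=" as it appears inside HTML attributes.
ORDER_LINK = re.compile(r"orderDetail[^\"'>]*?[?&;]ID=(\d+)", re.IGNORECASE)


def order_numbers(listing_page):
    """Return the distinct order numbers found in the listing HTML, as integers."""
    return {int(number) for number in ORDER_LINK.findall(listing_page)}
```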

Note that orders can be *updated*, which causes the dates (and other values) to change, but the order number remains the same. Please sort the output by order number in ascending order. This will help me catch order updates and replace the existing values, so that I don't double-count an order that was placed on one day and updated the next.
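The sort has to be numeric rather than textual: as strings, "999" sorts after "1000". A minimal sketch, assuming each output row is a list whose second field (index 1) is the order number, right after the constant BrickLink. The hypothetical `merge_updates` helper mirrors the spreadsheet step described above: a later row for the same order number replaces the earlier one instead of being counted twice.

```python
def sort_by_order_number(rows):
    """Sort rows by the order number in column 1, compared as integers."""
    return sorted(rows, key=lambda row: int(row[1]))


def merge_updates(existing_rows, new_rows):
    """Combine two batches; a newer row replaces an older one with the same order number."""
    merged = {row[1]: row for row in existing_rows}
    merged.update({row[1]: row for row in new_rows})  # updates win
    return sort_by_order_number(merged.values())
```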

After getting the list of orders for the date range, you'll need to pull the order details for each order; these contain the information to be parsed for the output. Most, but not all, of this information also appears on the previous page and would probably be easier to parse from there. However, the orderReceived listing can be configured to show less detail or to show details in a different order, which would break the parsing. For that reason, I'd like everything to be parsed from the orderDetail page, which uses this URL:

[url removed, login to view]

where 5555555 represents an order number parsed from the previous page.
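A sketch of the per-order fetch, assuming an opener that is already logged in (for example, a urllib opener with a cookie jar). The real orderDetail URL was removed from the posting, so `ORDER_DETAIL_TEMPLATE` is a hypothetical placeholder with `{order_id}` where a number such as 5555555 goes. Decoding as UTF-8 is also an assumption; the page's actual character set should be checked.

```python
# HYPOTHETICAL template: the real orderDetail URL is not given in the posting.
ORDER_DETAIL_TEMPLATE = "https://example.invalid/orderDetail?ID={order_id}"


def order_detail_url(order_id, template=ORDER_DETAIL_TEMPLATE):
    """Substitute an order number into the detail-page URL template."""
    return template.format(order_id=order_id)


def fetch_order_details(opener, order_ids):
    """Yield (order_id, html) for each order, lowest order number first.

    `opener` is assumed to be an already logged-in urllib opener.
    """
    for order_id in sorted(order_ids):
        with opener.open(order_detail_url(order_id)) as response:
            # Assumption: UTF-8 pages; undecodable bytes are replaced, not fatal.
            yield order_id, response.read().decode("utf-8", errors="replace")
```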

On the orderDetail page, the following text may appear at the bottom of the page:

"First batch of this order has been referred by [url removed, login to view]."

If this text appears, the PeerOn field should read "Y"; otherwise it should read "N".
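A minimal sketch of the Peeron check. It reduces the page to plain text with a crude regex tag strip (not a full HTML parser) so that a link inside the sentence does not break the match, then looks for the fixed start of the sentence, because the link text itself was removed from the posting. It searches the whole page rather than only the bottom, which assumes the sentence does not appear anywhere else.

```python
import html
import re

PEERON_PHRASE = "First batch of this order has been referred by"


def page_text(page):
    """Crudely reduce HTML to text: drop tags, decode entities, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", page)
    return " ".join(html.unescape(text).split())


def peeron_flag(page):
    """Return "Y" if the referral sentence appears on the orderDetail page, else "N"."""
    return "Y" if PEERON_PHRASE in page_text(page) else "N"
```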

All of the order information should be output in CSV format using the following fields, in this order:

"BrickLink", Order Number, Order Date, Number of Parts, Number of Lots, Grand Total, Shipping, Insurance, Charge 1, Charge 2, Coupon, Credit, Payment Method, Username, E-mail Address, Orders in Store, PeerOn

The "Coupon" field can be a Y/N field. If a coupon was used, the phrase "Coupon Applied" appears between the "Order Total" and "Buyer Information" sections.
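The coupon check can follow a similar approach: find the "Order Total" and "Buyer Information" headings in the page text and look for "Coupon Applied" between them. This sketch assumes both headings appear as plain text. It raises an error if either heading is missing rather than silently reporting "N", on the assumption that a missing heading means the page layout has changed.

```python
import html
import re


def page_text(page):
    """Crudely reduce HTML to text: drop tags, decode entities, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", page)
    return " ".join(html.unescape(text).split())


def coupon_flag(page):
    """Return "Y" if "Coupon Applied" sits between Order Total and Buyer Information."""
    text = page_text(page)
    start = text.find("Order Total")
    end = text.find("Buyer Information", start)
    if start == -1 or end == -1:
        # Assumption: missing headings mean the layout changed, so fail loudly.
        raise ValueError("Order Total / Buyer Information headings not found")
    return "Y" if "Coupon Applied" in text[start:end] else "N"
```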

The "BrickLink" item is a constant that appears without quotes at the beginning of each line. It represents the source of the sale: I sell in multiple locations, but all of these orders come from BrickLink.
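Python's `csv` module handles the quoting described above. With the default `QUOTE_MINIMAL` setting, a field is quoted only if it contains the delimiter, the quote character, or a line-break character, so the constant BrickLink is written without quotes while a value containing a comma is quoted. In this sketch each order is assumed to be a dict keyed by the field names, which is a hypothetical representation.

```python
import csv
import io
import sys

# Field order as listed in the posting; the constant BrickLink is prepended per row.
FIELDS = [
    "Order Number", "Order Date", "Number of Parts", "Number of Lots",
    "Grand Total", "Shipping", "Insurance", "Charge 1", "Charge 2",
    "Coupon", "Credit", "Payment Method", "Username", "E-mail Address",
    "Orders in Store", "PeerOn",
]


def write_orders(orders, out=sys.stdout):
    """Write one CSV line per order, each starting with the constant BrickLink.

    `orders` is assumed to be an iterable of dicts keyed by the names in FIELDS.
    """
    writer = csv.writer(out, lineterminator="\n")  # QUOTE_MINIMAL is the default
    for order in orders:
        writer.writerow(["BrickLink"] + [order[name] for name in FIELDS])
```

Writing to standard output lets cron redirect the result to a file for pasting into the spreadsheet.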

Engineering, Linux, Mac OS, MySQL, Perl, PHP, Project Management, Python, Software Architecture, Software Testing

Project ID: #2841302

About the project

6 bids, External project, Active Aug 18, 2009

Awarded to:

nunos

See private message.

$63.74 USD in 14 days
(14 reviews)
3.7

6 freelancers are bidding an average of $58 for this job

jeremiahdodds

See private message.

$51 USD in 14 days
(26 reviews)
4.9
mreznik

See private message.

$63.75 USD in 14 days
(22 reviews)
3.9
ivicamunitic

See private message.

$42.5 USD in 14 days
(11 reviews)
3.8
jvkoder

See private message.

$63.75 USD in 14 days
(8 reviews)
2.5
pwhelan

See private message.

$63.75 USD in 14 days
(0 reviews)
0.0