I want a program that builds an analytical index for an HTML page.
To decide which words belong in the analytical index, the program will use as its reference a database in which we store all relevant keywords.
The program will search the HTML file for these keywords.
Each time it finds one of them, it will do the following:
- take the 2 words before and the 2 words after the keyword found
- note how many lines separate the keyword from the nearest h2 title before it
- note the content of that title
- rewrite the HTML file, inserting a reference at that point so that the analytical index can later link to the place where the word was found
At this point, for each keyword found in the HTML file, the system will create an index entry containing the 2 words before and the 2 words after the keyword, together with an internal link from the analytical index to the place in the file where the word was found.
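The two core steps above (collecting the surrounding words and marking the spot for a later link) could be sketched roughly as follows. This is only an illustration under assumptions: the function names `find_contexts` and `add_anchor`, the `idx-1` style anchor ids, and the simple word pattern are all hypothetical, and the text passed to `find_contexts` is assumed to have its HTML tags already stripped.

```python
import re

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def find_contexts(text, keywords, width=2):
    """Return (keyword, phrase) pairs; phrase is the keyword with up to
    `width` words on each side (fewer at the edges of the text)."""
    wanted = {k.lower() for k in keywords}
    words = WORD_RE.findall(text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() in wanted:
            start = max(0, i - width)
            phrase = " ".join(words[start:i + width + 1])
            hits.append((word.lower(), phrase))
    return hits

def add_anchor(html_line, keyword, anchor_id):
    """Insert an empty <a id="..."> just before the first whole-word match.
    Caveat: this naive version does not check whether the match sits
    inside a tag or attribute; a real implementation should."""
    pattern = re.compile(r"\b(" + re.escape(keyword) + r")\b", re.IGNORECASE)
    return pattern.sub(
        lambda m: f'<a id="{anchor_id}"></a>{m.group(1)}', html_line, count=1
    )

sample = "We study the effects of magnetism in people and animals."
print(find_contexts(sample, ["magnetism", "neurological"]))
# [('magnetism', 'effects of magnetism in people')]
print(add_anchor("<p>and the magnetism of the</p>", "magnetism", "idx-1"))
# <p>and the <a id="idx-1"></a>magnetism of the</p>
```

In a full program the keyword list would come from the database rather than a literal list, and a proper HTML parser would probably be safer than regular expressions for the rewriting step.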
Example: the following lines show the output when the database contains only 2 words, "magnetism" and "neurological":
Analytical index of important words found in this text
** Word: "Magnetism" **
1 instance found in "effects of magnetism in people"
At line 45, beginning of the chapter: the power of magnetism
1 instance found in "and the magnetism of the"
At line 130, end of the chapter: Conclusions
** Word: "neurological" **
1 instance found in "is the neurological level of"
At line 131, end of the chapter: Conclusions
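One possible way to turn collected entries into an index block like the example above, with internal links, is sketched here. The entry layout (phrase, line, part, chapter, anchor id), the function name `render_index`, and the choice of a `<div>` wrapper are assumptions made for illustration, not part of the specification.

```python
from html import escape

def render_index(entries):
    """Render the index as HTML.
    entries: dict mapping keyword -> list of
             (phrase, line, part, chapter, anchor_id) tuples (hypothetical layout)."""
    parts = ['<div id="analytical-index">']
    for word in sorted(entries):
        parts.append(f"<p><strong>Word: &quot;{escape(word)}&quot;</strong></p>")
        parts.append("<ul>")
        for phrase, line, part, chapter, anchor_id in entries[word]:
            # Each entry links back to the anchor inserted next to the keyword.
            parts.append(
                f'<li><a href="#{escape(anchor_id)}">{escape(phrase)}</a>, '
                f"line {line}, {part} of the chapter: {escape(chapter)}</li>"
            )
        parts.append("</ul>")
    parts.append("</div>")
    return "\n".join(parts)

entries = {
    "neurological": [
        ("is the neurological level of", 131, "end", "Conclusions", "idx-3"),
    ],
}
print(render_index(entries))
```

The index deliberately avoids using an h2 tag for its own heading here, so that it would not be mistaken for a chapter if the output file were processed again.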
The chapter name will be taken from the nearest h2 tag found before the word.
The line number will be counted starting from this tag.
In this way we will know the number of lines in each chapter, and we will also be able to add the following information:
- beginning of the chapter (first third)
- middle of the chapter (second third)
- end of the chapter (third third)
This works because the HTML files we have to process are divided into chapters, each one beginning with an h2 tag.
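The chapter lookup and the thirds rule might be sketched like this. It assumes each h2 title sits on a single line of the file, counts lines from the line after the h2, and uses a plain substring match for brevity; `chapter_part` and `locate` are hypothetical names.

```python
import re

# Assumption: the opening and closing h2 tags are on the same line.
H2_RE = re.compile(r"<h2[^>]*>(.*?)</h2>", re.IGNORECASE)

def chapter_part(line_no, chapter_len):
    """Classify a 1-based line offset within a chapter into thirds."""
    if line_no * 3 <= chapter_len:
        return "beginning"
    if line_no * 3 <= chapter_len * 2:
        return "middle"
    return "end"

def locate(lines, keyword):
    """Return (chapter title, line offset from the h2, part) for every
    chapter line that contains the keyword (case-insensitive substring)."""
    # First pass: find where each chapter starts.
    starts = []
    for i, line in enumerate(lines):
        m = H2_RE.search(line)
        if m:
            starts.append((i, m.group(1).strip()))
    # Second pass: scan each chapter body, skipping the h2 line itself.
    results = []
    for n, (start, title) in enumerate(starts):
        end = starts[n + 1][0] if n + 1 < len(starts) else len(lines)
        length = end - start - 1
        for offset in range(1, length + 1):
            if keyword.lower() in lines[start + offset].lower():
                results.append((title, offset, chapter_part(offset, length)))
    return results

doc = [
    "<h2>The power of magnetism</h2>",
    "effects of magnetism in people",
    "some other line",
    "yet another line",
    "<h2>Conclusions</h2>",
    "first line",
    "second line",
    "and the magnetism of the",
]
print(locate(doc, "magnetism"))
# [('The power of magnetism', 1, 'beginning'), ('Conclusions', 3, 'end')]
```

Text that appears before the first h2 is ignored in this sketch, since it belongs to no chapter; whether such text should be indexed is left open by the description.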
I want a well-made program, as it will become part of a larger program.