<#xxx> Tags

A simple template would look like this:

<tr> <td><#START></td> <td><#TITLE></td> </tr>

The parser searches for this pattern in the HTML source and reports the number of times it finds it.

When you ask it to parse a certain occurrence, it gets the text from the HTML source located where the <#START> and <#TITLE> tags are, and passes it into an IParserData object using the SetElement(string tag, string value) method.

In this case tag = "#START" or "#TITLE", and value is the text located in the HTML source at this location. Characters can be put in front of and behind the <#> tags to remove part of the text.

For example, "-<#START>." searches for the '-' and the '.' and passes whatever lies between them as the value string to the SetElement method. To use more than one character in front or behind, use the syntax <#TAGNAME:front,back>, where front and back are search strings (either can be empty). If no search strings or characters are given, the parser goes on to the next tag. Of course, extra parsing can be done in the IParserData object; you just need to create a new class that implements this interface.
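The front/back trimming described above can be pictured with a small sketch. It is written in Python for brevity (the parser itself is C#) and is not the parser's real code: the function name extract_between and the choice to return None when a search string is missing are assumptions made for illustration.

```python
def extract_between(text, front="", back=""):
    """Return the part of text after the first `front` and before the next `back`.

    An empty search string means "do not trim on that side".
    Hypothetical helper: it only mimics the behaviour described in the text.
    """
    start = 0
    if front:
        idx = text.find(front)
        if idx == -1:
            return None  # front search string not present (assumed behaviour)
        start = idx + len(front)
    end = len(text)
    if back:
        idx = text.find(back, start)
        if idx == -1:
            return None  # back search string not present (assumed behaviour)
        end = idx
    return text[start:end]


# Single characters, as in "-<#START>."
print(repr(extract_between("Mon - 20:15. News", "-", ".")))
# Multi-character search strings, as allowed by <#TAGNAME:front,back>
print(repr(extract_between("Start: 20:15 (live)", "Start: ", " (")))
```

The first call keeps the leading space after the '-', which is the kind of leftover that an IParserData class would normally trim afterwards.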

The tag names can be anything; they just need to appear in your template, and the IParserData class must know what to do with them. I have made a very simple ParserData class which just stores each tag/value pair in a Dictionary, so the values can be retrieved by tag name later. It accepts any tag and value pair.

An example IParserData class, however, looks like this:
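As a rough picture of the dictionary-backed approach, here is a minimal Python sketch (the real ParserData class is C#). The class name DictParserData and the get_element method are hypothetical; only the idea of a SetElement(tag, value) call that stores pairs comes from the description above.

```python
class DictParserData:
    """Stores every tag/value pair it is given, keyed by tag name."""

    def __init__(self):
        self._elements = {}

    def set_element(self, tag, value):
        # Mirrors SetElement(string tag, string value): accept any pair.
        self._elements[tag] = value

    def get_element(self, tag):
        # Hypothetical accessor: returns None for tags that were never set.
        return self._elements.get(tag)


data = DictParserData()
data.set_element("#START", "20:15")
data.set_element("#TITLE", "Evening News")
print(data.get_element("#TITLE"))
```

Because nothing is interpreted at storage time, this style works with any template, at the cost of leaving all clean-up to the caller.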

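switch (tag)
{
  case "#START":
    // Parse the time string found at the <#START> position
    BasicTime startTime = GetTime(element);
    break;
  case "#TITLE":
    // Strip surrounding spaces, newlines and tabs from the title
    _title = element.Trim(' ', '\n', '\t');
    break;
  ...
}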

It does extra parsing of the element values, for example trimming the spaces and other junk or parsing the time values from strings.

The Tags variable tells the parser which HTML tags are interesting; all other tags are ignored. Each tag is identified by the first character of its HTML tag name.

So "T" = all table tags, "I" = img, "D" = div, "!" = comment, etc.

I treat all table tags as one group. Multiple tags can of course be given, e.g. "TSD" (table, span, div).

So in this example I would use "T", as all the tags are table tags (i.e. they start with the letter T). This means that the real HTML source could have other tags in it, but the parser would still match it because it would just ignore those tags.
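The effect of the Tags setting can be sketched as a filter over the tags in the source. This is a Python illustration only, not the parser's implementation: the function name interesting_tags and the simple regular expression (which does not handle '>' inside attributes or comments) are assumptions.

```python
import re

# Matches an opening tag, closing tag or comment and captures the first
# character of the tag name ("!" for comments).
TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z!])[^>]*>")


def interesting_tags(html, tags):
    """Keep only the tags whose name starts with one of the characters in `tags`."""
    wanted = set(tags.upper())
    return [m.group(0) for m in TAG_RE.finditer(html)
            if m.group(1).upper() in wanted]


source = '<tr><td><b>20:15</b></td><td><a href="#">News</a></td></tr>'
print(interesting_tags(source, "T"))
```

With "T" the b and a tags disappear, so the remaining sequence lines up with the simple tr/td template even though the source has extra markup.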

In general, use as few tags as are required to make the template unique to the data. Using too many tags can mean that small changes to the page require template changes. Tags that define structure, such as table tags, are good choices because the structure doesn't often change.

<Zxx> tags

This tag is used to make a template for a variable structure and to deal with optional information. Some websites add extra information by changing the HTML structure (e.g. adding extra table rows).

With this tag, regex code can be used.

A <z> tag must also have an end tag </z>. Together these indicate the start and end of the area which is considered optional.

Example:

<tr> <td><#START></td> <td><#TITLE></td> <z(><td><#OPTIONAL></td></z)?> </tr>

In this example the simple regex ( )? is used to indicate that this part is optional.

In regex, ( )? is the same as ( ){0,1}, i.e. 0 or 1 times. At the moment the system has problems with any number greater than 1, as it causes an imbalance between the template and the source. (If really required, I can look into fixing this.)

I have not needed or tried other regex code. It will accept any valid regex code, but whether it parses or not is another question.
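To see what the ( )? group means in plain regex terms, the following Python sketch matches a row's tag sequence with and without the optional cell. How the parser turns a template into a regex internally is not documented here, so the pattern below is an assumption made purely to show the 0-or-1 behaviour; ROW and row_matches are hypothetical names.

```python
import re

# Assumed shape: the example template reduced to its table tags, with the
# <z(> ... </z)?> markers written as an ordinary optional regex group.
ROW = re.compile(r"<tr><td></td><td></td>(<td></td>)?</tr>")


def row_matches(tag_sequence):
    """True if the whole tag sequence fits the row pattern."""
    return ROW.fullmatch(tag_sequence) is not None


print(row_matches("<tr><td></td><td></td></tr>"))                      # no optional cell
print(row_matches("<tr><td></td><td></td><td></td></tr>"))             # one optional cell
print(row_matches("<tr><td></td><td></td><td></td><td></td></tr>"))    # two extra cells
```

The third sequence does not fit, because ? allows the group at most once.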

For more details on regex try this site: http://www.regular-expressions.info/

This can be used in the test program.

<*xxx> tags

There are currently only 2 <*> tags: <*MATCH> and <*VALUE>. These tags must be used as a pair.

These tags require an extra list, which is passed into the HtmlParser class together with the template if required.

This list has a Match value and a Field value, both strings.

This tag set is used as follows:

Template: <table> <z(> <tr> <td><*MATCH></td> <td><*VALUE></td> </tr> </z)?> <z(> <tr> <td><*MATCH></td> <td><*VALUE></td> </tr> </z)?>

List:
MATCH   FIELD
Time    #TIME
Date    #DATE

In this case the parser will try to match the text located by the <*MATCH> tag against the list of match strings, and then store the text located by the following <*VALUE> tag in the corresponding field.
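The lookup step can be sketched as follows, again in Python rather than the parser's C#. The function name store_pairs, the use of exact matching after trimming whitespace, and the sample rows are all assumptions; the source only says that the matched text selects the field that receives the value.

```python
# The extra list from the example: Match string -> Field name.
MATCH_LIST = {"Time": "#TIME", "Date": "#DATE"}


def store_pairs(pairs, match_list):
    """pairs: (match_text, value_text) tuples as located by <*MATCH>/<*VALUE>."""
    fields = {}
    for match_text, value_text in pairs:
        field = match_list.get(match_text.strip())  # exact match is assumed
        if field is not None:
            fields[field] = value_text.strip()
    return fields


# Hypothetical rows in arbitrary order, including one the list does not know.
rows = [("Date", "12 May"), ("Time", " 20:15 "), ("Channel", "One")]
print(store_pairs(rows, MATCH_LIST))
```

Rows can appear in any order, and a row whose label is not in the list is simply skipped in this sketch.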
