
This page explains the structure and elements used by a WebEPG grabber file.

<?xml version="1.0" encoding="utf-8"?>

The XML declaration. It defines the XML version (1.0) and the encoding used.

<Grabber>

<Info treatErrorAsWarning="" language="" availableDays="" timezone="" version="" />

| Attribute | Type | Example | Description |
|---|---|---|---|
| treatErrorAsWarning | Boolean | true | If set to true and WebEPG runs into an error (for example, there are no programs on a channel for a day), it continues grabbing with the next day. If set to false, WebEPG stops processing the current channel and continues with the next channel. |
| language | String | ru | |
| availableDays | Integer | 14 | The number of days of TV listings available on the site. For example, if availableDays="7" but the user set Grab Days = 14 in the WebEPG plugin, only 7 days would be grabbed. |
| timezone | String | GMT Standard Time | The name of the time zone for which the listings are provided. |
| version | String | 2.0 | |

<Channels>

All channels for this site are listed in child elements.

<Channel id="" siteId="" />

The information for each channel

| Attribute | Type | Example | Description |
|---|---|---|---|
| Required: | | | |
| id | String | svt1@svt.se | Should match a channel id configured in channels.xml. If different grabbers are getting EPG data for the same channel they should use the same Channel id. |
| siteId | String | SVT1 | The identifier for the channel on the site. This will be used by the [ID] variable in the Site element to construct the URL used to download EPG data. |

</Channels>

End of Channels section.

<Listing type="">

| Attribute | Value(s) | Description |
|---|---|---|
| Required: | | |
| type | Html, Xml, Data | The type of the listing format of the target EPG data. Would normally be set to Html to grab EPG listings from a website. |

<Site url="" post="" external="" encoding="" delay="" user-agent=""/>

| Attribute | Type | Example | Description |
|---|---|---|---|
| Required: | | | |
| url | URL | http://tvguide.com/[ID]/[YYYY]-[MM]-[DD] | The URL with variables to be used when grabbing EPG data for the different channels and days. See the table below for an explanation of the variables. Note: Replace all ampersands (&) in the URL with &amp; |
| external | Boolean | false | Use an external browser (IE) for downloading page data. Will load certain JavaScript sections. |
| Optional: | | | |
| post | | | |
| encoding | String | utf-8 | Normally auto-detected. If special characters look wrong in your EPG Guide, try setting the correct encoding. For example ISO-8859-1 or UTF-8. |
| delay | Integer | 1000 | Time in milliseconds to wait between each HTTP request. Might be useful if the website stops responding when too many requests are made in a short period of time. |
| user-agent | String | Mozilla/5.0 (Windows NT 6.2; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0 | WebEPG connects to the target website, and since it isn't a proper web browser some sites may reject the connection. You can use the User-Agent string to simulate a web user in each HTTP request. If not specified, the default will be used. Default is Mozilla/4.0 (compatible; MSIE 6.0;  WindowsNT 5.0; .NET CLR 1 .1.4322) |

 

The url and post attributes can contain variables, which are replaced during grabbing to construct different URLs for the different channels and days.

 

| Tag | Description |
|---|---|
| [ID] | Site channel ID, taken from the siteId attribute in the Channels section |
| [LIST_OFFSET] | Offset position in a list that spans more than one page. Starts at 0, and MaxCount is added for each following page. Used together with MaxCount. If the number of listings on a page is less than MaxCount, WebEPG stops looking for more pages. |
| [PAGE_OFFSET] | Same as LIST_OFFSET, but only 1 is added for each new page instead of MaxCount. |
| [DAY_OFFSET] | Offset of the day from today (0). Use the startOffset attribute of the Search element to change the start. |
| [YYYY] | Year |
| [MONTH] | Month full name (e.g. January) |
| [MM] | Month with leading 0 |
| [_M] | Month without leading 0 |
| [WEEKDAY] | Day of week full name (e.g. Monday). Weekday names can be changed by including a WeekDayNames section in the Search element. |
| [DAY_OF_WEEK] | Day of week as a number. 0 = Sunday, 6 = Saturday. Specifying the startOffset attribute in the Search element will shift the first day of the week by the same number of days. E.g. when startOffset is set to 2: 0 = Friday, 6 = Thursday. |
| [DD] | Day with leading 0 |
| [_D] | Day without leading 0 |
| [EPOCH_TIME] | Number of seconds since 1/1/1970 8:00:00 AM |
| [EPOCH_DATE] | Number of days since 1/1/1970 8:00:00 AM |
| [DAY_NAME] | A string for the name. For example: today, tomorrow, etc. Requires the DayNames child element of Search. |
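
As an illustration of how the variables combine, here is a small sketch of two Site elements. The first reuses the example URL from the attribute table above; the second URL, including its query parameter names, is purely hypothetical and only shows how ampersands are escaped. The expansion in the comment assumes siteId="SVT1" and a grab date of 1 November 2013.

```xml
<!-- With siteId="SVT1" and a grab date of 1 November 2013, this url
     would expand to http://tvguide.com/SVT1/2013-11-01 -->
<Site url="http://tvguide.com/[ID]/[YYYY]-[MM]-[DD]" />

<!-- Hypothetical query-string URL: each ampersand is written as an entity -->
<Site url="http://tvguide.com/listing?channel=[ID]&amp;day=[DAY_OFFSET]" delay="1000" />
```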

<Search startOffset="" maxlistings="" listStart="" startPage="" endPage="" language="" weekday="">

| Attribute | Value(s) | Description |
|---|---|---|
| Optional: | | |
| startOffset | Integer | Used for configuration of [DAY_OFFSET] and [DAY_OF_WEEK]. |
| maxlistings | Integer | Used for configuration of [LIST_OFFSET] and [PAGE_OFFSET]. |
| listStart | Integer | Used for configuration of [LIST_OFFSET]. |
| startPage | Integer | Used for configuration of [PAGE_OFFSET]. |
| endPage | Integer | Used for configuration of [PAGE_OFFSET]. |
| language | String | Language to use for [WEEKDAY]. Must be a specific country/language, not a neutral language group. For example "es-ES", not just "es". |
| weekday | dddd, ddd | Format for weekday (long or short). |

<DayNames>

<Day>value</Day>

The name of the day to be used with [DAY_NAME] tag in URL.

</DayNames>

End of DayNames.

<WeekDayNames>

Optional section to redefine weekday names. If present these will be used instead of the weekday format specified above.

<WeekDay>value</WeekDay>

The name of each day to be used with the [WEEKDAY] tag in the URL. The first day is Sunday by default, but can be shifted by setting startOffset. Increasing startOffset shifts the days backwards. E.g. when startOffset = 1, the first day is Saturday.

</WeekDayNames>

End of weekday names section.

</Search>

End of Search element.
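
A Search element with both optional child sections might look like the sketch below. The day names are illustrative, and the assumption that the first Day entry corresponds to today (offset 0) is not stated on this page, so treat it as a guess to verify against a working grabber.

```xml
<Search startOffset="0">
  <!-- Values substituted for [DAY_NAME]; assumed to be ordered by day offset -->
  <DayNames>
    <Day>today</Day>
    <Day>tomorrow</Day>
  </DayNames>
  <!-- Values substituted for [WEEKDAY]; first entry is Sunday when startOffset is 0 -->
  <WeekDayNames>
    <WeekDay>sun</WeekDay>
    <WeekDay>mon</WeekDay>
    <WeekDay>tue</WeekDay>
    <WeekDay>wed</WeekDay>
    <WeekDay>thu</WeekDay>
    <WeekDay>fri</WeekDay>
    <WeekDay>sat</WeekDay>
  </WeekDayNames>
</Search>
```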

<Html>

Must match the listing type. Child elements contain tools for parsing HTML web pages.

<Template name="" start="" end="">

Every grabber must have at least one Template element. It contains the TemplateText element which will be used to match fields, such as #START and #TITLE, on the target web site. It's possible to include different templates for use on different target web pages. For example #TITLE could be matched on the main web page and #DESCRIPTION from a subpage.

 

| Attribute | Type | Example | Description |
|---|---|---|---|
| Required: | | | |
| name | String | default | The template name. A template named "default" is required. |
| Optional: | | | |
| start | String | <!-- Program --> | A string to search for which signifies the start of the listing area. E.g. a heading or the class used by the list. Everything before this string will be ignored. |
| end | String | <!-- End Content --> | A string to search for which signifies the end of the listing area. Everything after this string will be ignored. |

<SectionTemplate tags="">

| Attribute | Type | Example | Description |
|---|---|---|---|
| Required: | | | |
| tags | String | TSD | The first letter of each HTML tag to be used for matching. Letters must be in upper case. Multiple tags are given in a string. All other HTML tags can be ignored when creating the Template Text. |

Some common tags:

| Letter | Tag(s) |
|---|---|
| T | All table tags <table>, <tr>, <td>, <th>, etc |
| D | <div> |
| S | <span> |
| P | <p> |
| H | <h1>, <h2>, etc |
| I | <img> |
| A | <a> |

Although the first letter is not unique for every different HTML tag, it is generally good enough to build a unique template for finding data on the page.

<TemplateText>

The template consists of the HTML tags and data fields that make up the program listing. It can contain any HTML tags; however, ONLY those listed in the tags attribute of the SectionTemplate are used for matching, and the others are ignored. Only the element name of a tag is used for matching; attributes are ignored. For example, the template "<SPAN class="class1">" will match any <SPAN> tag, not only those with class="class1". Even so, it is useful to write self-descriptive template text rather than the shortest possible one.
The special template tags are used by WebEPG to locate the required data.

See WebEPG Template for detailed information on how to create the TemplateText.

 

| Tag | Description |
|---|---|
| Required: | |
| #START or #STARTXMLTV | Program start time. Possible #START time formats: hh:MM am/pm, HH:MM, HH.MM, HHhMM. #STARTXMLTV format: 20080113011500 |
| #TITLE | Program title. |
| Optional: | |
| #END | Program end time. |
| #ENDXMLTV | Program end time in XMLTV format. |
| #DESCRIPTION | Program description text. |
| #DAY | Program day (required if not part of page look up). |
| #MONTH | |
| #SUBTITLE | Program or series episode name. |
| #GENRE | Program genre. |
| #EPISODE | Episode number. |
| #SEASON | Season number. |
| #ACTORS | List of actors. |
| *MATCH | Dynamic tag used by MatchList to find a text string in the HTML code. |
| *VALUE | Dynamic tag used in combination with *MATCH to store the matched text in the field specified in the Match element. |
| Z | Used to make a template for a variable structure and deal with optional information. See Dynamic Templates for more information. |

</TemplateText>

End of the TemplateText.
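
To make the pieces concrete, here is a minimal sketch of a Template that matches a two-column table row. It assumes the field tags are written in the <#START> form (as used for <#MONTH> further down this page) and that markup inside attributes and TemplateText is entity-escaped so the grabber file stays well-formed XML; check both assumptions against an existing grabber. The sample row in the comment is invented.

```xml
<!-- Invented page row this template is meant to match:
     <tr><td>20:00</td><td>News</td></tr> -->
<Template name="default" start="&lt;!-- Program --&gt;" end="&lt;!-- End Content --&gt;">
  <!-- tags="T": only table tags take part in matching -->
  <SectionTemplate tags="T">
    <TemplateText>
      &lt;tr&gt;
        &lt;td&gt;&lt;#START&gt;&lt;/td&gt;
        &lt;td&gt;&lt;#TITLE&gt;&lt;/td&gt;
      &lt;/tr&gt;
    </TemplateText>
  </SectionTemplate>
</Template>
```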

<MatchList>

List of Match elements used together with the *MATCH and *VALUE dynamic template tags to grab text from HTML code where the normal tags can't be used. See Dynamic Templates for more information.

<Match field="" match="" />

| Attribute | Type | Example | Description |
|---|---|---|---|
| Required: | | | |
| field | #FIELD | #ACTORS | The matched text of the *VALUE dynamic tag will be stored in this field. |
| match | String | Cast: | This string is used by the *MATCH dynamic tag to search for text in a relative position to the *VALUE tag. |

</MatchList>

End of MatchList
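
Using the example values from the table above, a MatchList could be sketched as follows. This only shows the Match element itself; where the *MATCH and *VALUE tags go inside the TemplateText is covered by the Dynamic Templates page and is not shown here.

```xml
<MatchList>
  <!-- Text located via the "Cast:" label is stored in the #ACTORS field -->
  <Match field="#ACTORS" match="Cast:" />
</MatchList>
```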

</SectionTemplate>

End of the SectionTemplate

</Template>

End of the Template

<DataPreference>

<Preference template="" title="" subtitle="" genre="" description="" />

| Attribute | Value(s) | Description |
|---|---|---|
| Required: | | |
| template | template name | The name of the template |
| title | 0-3 | Preference of this value |
| subtitle | 0-3 | Preference of this value |
| genre | 0-3 | Preference of this value |
| description | 0-3 | Preference of this value |

</DataPreference>

<Sublinks>

Sublinks are linked pages that contain extra data, that may not be provided on the main listing page. Optional.

<Sublink search="" template="">

| Attribute | Value(s) | Example | Description |
|---|---|---|---|
| Required: | | | |
| search | Search string | /guiden/expand | String to identify the correct <A href> tag for this sublink. Only part of the target link needs to be specified. WebEPG will automatically use the entire href attribute to download the subpage. |
| template | Template name | details | Name of the template to use for this sublink. Must match a template name. |

<Link url="" post="" external="" encoding="" user-agent=""/>

Optional. Only required if URL cannot be built from the main site URL. See Site url for details.

| Attribute | Value(s) | Example | Description |
|---|---|---|---|
| url | URL | http://www.sol.no/guiden/expand.cgi?[1] | [1] can be used to match unique parts of the link, such as an ID for the show. |
| post | | | |
| external | Boolean | false | Use an external browser (IE) for downloading page data. Will load certain JavaScript sections. |
| user-agent | String | Mozilla/5.0 (Windows NT 6.2; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0 | WebEPG connects to the target website, and since it isn't a proper web browser some sites may reject the connection. You can use the User-Agent string to simulate a web user in each HTTP request. If not specified, the default will be used. Note that if you specify user-agent in the Site element it will NOT be propagated here. |

Example

To match the following HTML code:

<a href="http://www.tvtoday.de/programm/detail/?sid=107036189606&format=detail">

we may use an url attribute like:

http://www.tvtoday.de/programm/detail/?sid=[1]&amp;format=detail
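
Put together, the Sublinks section for this example might be sketched as below. The search value and the template name "details" are illustrative choices, and a Template with that name is assumed to exist elsewhere in the Html section.

```xml
<Sublinks>
  <!-- search: part of the href that identifies the detail link -->
  <Sublink search="/programm/detail/" template="details">
    <!-- [1] stands for the unique part of the link (the sid value) -->
    <Link url="http://www.tvtoday.de/programm/detail/?sid=[1]&amp;format=detail" />
  </Sublink>
</Sublinks>
```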

</Sublink>

End of this Sublink.

</Sublinks>

End of the Sublinks section.

<Searches>

<Search match="" field="" remove="" />

| Attribute | Value(s) | Description |
|---|---|---|
| Required: | | |
| match | regex search | Regex to find data |
| field | #Field name | Name of the field used to store the data |
| remove | true/false | Remove data from store. Stops data being added to other fields |

This command searches the whole section of the source page that matches the template (all tags, their attributes and values). It finds the text matching the given regular expression and stores it in the given field. If remove is set to true, the whole matched text is cut out of the source page, so it will not be part of the output from template parsing. This can also be used to remove undesired parts of descriptions, titles, etc. More than one search can be used; however, only the latest match will be used.

Same fields as for TemplateText are allowed.

Example:

<Search match="\([0-9]{1,3}[,][0-9]{0,3}\)" field="#EPISODE" remove="true" />
<Search match="\([0-9]{1,3}\)" field="#EPISODE" remove="true" />
<Search match="\([0-9]{1,3}[/][0-9]{0,3}\)" field="#EPISODE" remove="true" />

This set of searches looks for an episode number in any of the forms (N), (N1,N2) or (N/Count). The episode number will be removed.

</Searches>

End of Searches section

<DateTime>

<Month>value</Month>

Used for matching the <#MONTH> tag in a template. Only required if the <#MONTH> tag is used in a template.

Value is the text as found on the site. There must be 12 months in the correct order (Jan-Dec).

</DateTime>

End of DateTime

</Html>

End of the Html section

<Xml channel="" xpath="">

Must match the listing type. Child elements contain tools for parsing XML results.

 

| Attribute | Description |
|---|---|
| Required: | |
| channel | Filter to apply to the list of EPG-elements |
| xpath | XPath expression which returns the EPG-elements |

<TemplateText>

The template consists of the HTML tags and data fields that make up the program listing. It can contain any HTML tags; however, ONLY those listed in the tags attribute of the SectionTemplate are used for matching, and the others are ignored. Only the element name of a tag is used for matching; attributes are ignored. For example, the template "<SPAN class="class1">" will match any <SPAN> tag, not only those with class="class1". Even so, it is useful to write self-descriptive template text rather than the shortest possible one.
The special template tags are used by WebEPG to locate the required data.

See WebEPG Template for detailed information on how to create the TemplateText.

 

| Tag | Description |
|---|---|
| Required: | |
| #START or #STARTXMLTV | Program start time. Possible #START time formats: hh:MM am/pm, HH:MM, HH.MM, HHhMM. #STARTXMLTV format: 20080113011500 |
| #TITLE | Program title. |
| Optional: | |
| #END | Program end time. |
| #ENDXMLTV | Program end time in XMLTV format. |
| #DESCRIPTION | Program description text. |
| #DAY | Program day (required if not part of page look up). |
| #MONTH | |
| #SUBTITLE | Program or series episode name. |
| #GENRE | Program genre. |
| #EPISODE | Episode number. |
| #SEASON | Season number. |
| #ACTORS | List of actors. |

</TemplateText>

End of the TemplateText.

<Fields>

Mapping from xml-attributes to EPG-fields

This contains a number of "Field" nodes, with these attributes:

<Field name="" xmlname="" />

| Attribute | Description |
|---|---|
| Required: | |
| name | EPG field (e.g. #START or #TITLE) |
| xmlname | Name of the XML attribute |

</Fields>

End of the Fields

 

Example:

<Xml channel="id=28" xpath="airing">
    <Fields>
        <Field name="#START" xmlname="air_time" />
        <Field name="#TITLE" xmlname="title" />
        <Field name="#DESCRIPTION" xmlname="description" />
    </Fields>
</Xml>

</Xml>

End of Xml section

<JSON channel="" xpath="">

To be able to parse JSON data returned from the web source.

| Attribute | Description |
|---|---|
| Required: | |
| channel | Filter to apply to the list of EPG-elements |
| xpath | XPath expression which returns the EPG-elements. Note that this is a custom implementation of XPath, and not all possibilities are supported (yet). |

<Fields>

Mapping from JSON-attributes to EPG-fields. This contains a number of "Field" nodes, with these attributes:

<Field name="" jsonname="" />

| Attribute | Description |
|---|---|
| Required: | |
| name | EPG field (e.g. #START or #TITLE) |
| jsonname | JSON attribute |

</Fields>

End of the Fields

 

Example:

<JSON channel="channel/id=28" xpath="airing">
    <Fields>
        <Field name="#START" jsonname="air_time" />
        <Field name="#TITLE" jsonname="title" />
        <Field name="#DESCRIPTION" jsonname="episode/original_title" />
    </Fields>
</JSON>

</JSON>

End of JSON section

<Data>

Must match listing type.

</Listing>

End of Listing section

<Actions>

<Modify channel="" field="" search="" action="">value</Modify>

| Attribute | Value(s) | Description |
|---|---|---|
| Required: | | |
| channel | * or channel id | The channel on which the modify will be performed. (* = all channels) |
| field | field to modify | |
| search | search string | |
| action | Replace/Remove | |
| value | string to replace | Only required for Replace action. |

</Actions>

End of the Actions section
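
A rough sketch of an Actions section with one Remove and one Replace follows. The search strings and the replacement text are invented, and this page does not say whether search is treated as literal text or as a regular expression, so the example sticks to plain words that read the same either way.

```xml
<Actions>
  <!-- Remove the word LIVE from the title on every channel (no value needed) -->
  <Modify channel="*" field="#TITLE" search="LIVE" action="Remove" />
  <!-- Replace one genre label with another on a single channel -->
  <Modify channel="svt1@svt.se" field="#GENRE" search="Film" action="Replace">Movie</Modify>
</Actions>
```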

</Grabber>

End of the grabber config
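
For orientation, the elements described on this page nest roughly as in the skeleton below. It is a sketch built from the example values in the tables above (the language value and the template text are illustrative), with all optional sections left out; it is not a grabber for a real site.

```xml
<?xml version="1.0" encoding="utf-8"?>
<Grabber>
  <Info treatErrorAsWarning="true" language="en" availableDays="14"
        timezone="GMT Standard Time" version="2.0" />
  <Channels>
    <!-- id matches channels.xml; siteId replaces [ID] in the Site url -->
    <Channel id="svt1@svt.se" siteId="SVT1" />
  </Channels>
  <Listing type="Html">
    <Site url="http://tvguide.com/[ID]/[YYYY]-[MM]-[DD]" />
    <!-- Element name matches the Listing type -->
    <Html>
      <Template name="default">
        <SectionTemplate tags="T">
          <TemplateText>
            &lt;td&gt;&lt;#START&gt;&lt;/td&gt;&lt;td&gt;&lt;#TITLE&gt;&lt;/td&gt;
          </TemplateText>
        </SectionTemplate>
      </Template>
    </Html>
  </Listing>
</Grabber>
```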

Further Information

Changelog

| Change | Date | Release |
|---|---|---|
| WebEPG Grabber | 2013/11/01 | 1.6.0 |
