Yadabyte
 
 
 

Part 6: Designing and Testing Raid Scripts

Introduction

To write Raid Scripts efficiently, test them as you build them. To do this, use the program RaidScriptTester.exe, which is installed along with NewsRaider in C:\artciles\scripts\test\. For the time being, all Raid Scripts operate in C:\artciles\scripts\ or C:\artciles\scripts\test\.

To be picked up by the Raid Script engine, a script's file name must end with raid.txt, for example WorldNewsraid.txt or WorldNews_raid.txt.
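The naming rule above can be sketched in Python. This is only an illustration of the "ends with raid.txt" check, not the engine's actual code; the function name find_raid_scripts is hypothetical, and treating the match as case-insensitive is an assumption.

```python
import tempfile
from pathlib import Path


def find_raid_scripts(folder):
    """Return the sorted names of files in `folder` that end with 'raid.txt'."""
    # Assumption: the comparison ignores case, as file names usually do on Windows.
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.name.lower().endswith("raid.txt")
    )


if __name__ == "__main__":
    # Build a throwaway folder so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("WorldNewsraid.txt", "WorldNews_raid.txt", "notes.txt"):
            (Path(tmp) / name).write_text("")
        # notes.txt is skipped because it does not end with raid.txt.
        print(find_raid_scripts(tmp))
```

Pointing the function at your scripts test folder should list the same files that RaidScriptTester.exe offers, provided the case-insensitivity assumption holds.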

Using RaidScriptTester.exe

When you run RaidScriptTester.exe you will see a list of available Raid Scripts. These are all the files in C:\artciles\scripts\test\ that end with raid.txt.

If you select a script from the list and click the Test Raid Script button, the script will be processed and the output will be placed in the C:\artciles\scripts\test\ folder as a single file called:

ARTICLE_TEST_OUTPUT.txt

This file can be opened in a text editor or imported into TomeRaider 3 for Windows (http://www.tomeraider.com) so you can see if it has worked properly.

If you have used the command TEST_LINK_PROCESSING then another file will be created called:

LINK_TEST_OUTPUT.txt

All output from testing Raid Scripts goes into one of these two test files.

The Designing and Testing process

Stage 1: Determine which Links are needed from the start page.

Once you have a site you wish to raid, the first thing to do is work out which links are articles and which are not. The best way to see this is simply to get all of the links from the page using the INCLUDE_ALL_LINKS command.

The following example shows how you can do this:

//Example 2
BEGIN_DEF
START_URL "http://www.cnn.com"
SOURCE "CNN"
CALL Start
END

BEGIN_PROCESS Start
DOWNLOAD_PAGE
//Collect every link on the start page
INCLUDE_ALL_LINKS
GET_LINKS
//Write the collected links to LINK_TEST_OUTPUT.txt
TEST_LINK_PROCESSING
END

This will create the text file LINK_TEST_OUTPUT.txt in the scripts test folder, containing all of the links from CNN.com. You can then fine-tune the list using EXCLUDE_LINKS commands. If there are a huge number of links to exclude, it may be better not to use INCLUDE_ALL_LINKS at all. Instead, specify exactly which links you want using INCLUDE_LINKS commands, and use EXCLUDE_LINKS along with them.
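The include-then-exclude idea can be sketched in Python as follows. This is not Raid Script and makes no claim about how the engine matches links. The function filter_links is hypothetical, patterns are assumed to be plain substrings, and two of the sample URLs are made up for illustration.

```python
def filter_links(links, include=None, exclude=()):
    """Keep links matching any include pattern, then drop any matching an exclude pattern."""
    # include=None plays the role of INCLUDE_ALL_LINKS: every link is a candidate.
    if include is None:
        candidates = list(links)
    else:
        candidates = [u for u in links if any(p in u for p in include)]
    # Exclusion is applied last, so it overrides inclusion.
    return [u for u in candidates if not any(p in u for p in exclude)]


if __name__ == "__main__":
    links = [
        "http://www.cnn.com/2005/WORLD/americas/04/20/ecuador/index.html",
        "http://www.cnn.com/2005/SPORT/example/index.html",  # hypothetical URL
        "http://www.cnn.com/services/example.html",  # hypothetical URL
    ]
    # Narrow inclusion first, then remove the unwanted section.
    for url in filter_links(links, include=["/2005/"], exclude=["/SPORT/"]):
        print(url)
```

Starting from everything and excluding works well when few links are unwanted. Starting from a narrow include list keeps the exclude list short when most links are unwanted.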

Stage 2: Designing the Article Processing

Once you have your set of article links, it is time to write the script for processing the individual articles. You need to get at least two pieces of information from the article page:

  1. The article title
  2. The article text

To make testing easier, extract a single article first rather than waiting until all articles have been extracted. This saves a lot of time. Example 2.1 below shows how to do this.

//Example 2.1
BEGIN_DEF
//START_URL "http://www.cnn.com"
//Temporarily start from a single article for testing
START_URL "http://www.cnn.com/2005/WORLD/americas/04/20/ecuador/index.html"
SOURCE "CNN"
//CALL Start
CALL GetArticle
END

BEGIN_PROCESS Start
DOWNLOAD_PAGE
INCLUDE_ALL_LINKS
GET_LINKS
//TEST_LINK_PROCESSING
//REPEAT_FOR_ALL_LINKS GetArticle
END

BEGIN_PROCESS GetArticle
DOWNLOAD_PAGE
ARTICLE_FROM "<!--endclickprintexclude--><p>" TO "<!--endclickprintinclude-->"
//Images
INCLUDE_IMAGES = ".jpg"
//Acquire Title
FIND_LINE "<title>CNN.com -"
VAR=LINE
VAR_REMOVE_FROM VAR_START TO "<title>CNN.com - "
VAR_REMOVE_FROM " -" TO VAR_END
TITLE=VAR
WRITE_ARTICLE
END

Notice that START_URL has been changed from CNN's index page to a specific article, and that the DEF block now calls the GetArticle process instead of Start. This writes the output to the ARTICLE_TEST_OUTPUT.txt file. Once you are satisfied with the formatting of the article, change START_URL back to CNN's index page and make the DEF block call Start again. Then uncomment the line //REPEAT_FOR_ALL_LINKS GetArticle and you have a working script.
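To see what the title steps in GetArticle are doing, here is a rough Python equivalent. It is a sketch under assumptions. It assumes FIND_LINE takes the first line containing the text, and that each VAR_REMOVE_FROM removes the delimiter text along with everything between the two ends. The function name extract_title and the sample page are made up for illustration.

```python
def extract_title(page_html, marker="<title>CNN.com - "):
    """Mimic the FIND_LINE / VAR_REMOVE_FROM steps of Example 2.1 on a page's text."""
    # FIND_LINE "<title>CNN.com -": take the first line containing the text.
    var = None
    for line in page_html.splitlines():
        if "<title>CNN.com -" in line:
            var = line  # VAR=LINE
            break
    if var is None:
        return None

    # VAR_REMOVE_FROM VAR_START TO "<title>CNN.com - ":
    # drop everything from the start up to and including the marker.
    idx = var.find(marker)
    if idx != -1:
        var = var[idx + len(marker):]

    # VAR_REMOVE_FROM " -" TO VAR_END:
    # drop everything from the first " -" to the end of the line.
    idx = var.find(" -")
    if idx != -1:
        var = var[:idx]

    return var  # TITLE=VAR


if __name__ == "__main__":
    # Made-up sample page, not a real CNN title.
    sample = "<html>\n<title>CNN.com - Sample headline - Apr 20, 2005</title>\n</html>"
    print(extract_title(sample))
```

A headline that itself contains " -" would be cut short at that point, so check ARTICLE_TEST_OUTPUT.txt for truncated titles when testing.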

Modifying Other People's Raid Scripts

Learning Raid Script so you can write your own scripts is great, but it is also useful for tailoring other people's Raid Scripts to your own needs.

For example, you might have a script for your favorite news site but you don't want any sports news, so you could change their script to exclude such articles.


 
Copyright © 2005 Yadabyte. All Rights Reserved.