
YASS - Yet Another Site Searcher

26 Feb 2004
A small single-site web crawler with a built-in scheduler.

Introduction

Want an effective search routine for a website whose content lives both in static HTML/ASPX files and in SQL Server or other databases?

Background

After having been the programmer on about... well... A LOT of websites, I got fed up with the existing search engines on the market, both freeware and commercial, and writing a custom search engine for every site just to search our database content was getting tedious. So I decided to write my own search engine, based on the same concepts as a normal spider/web crawler.
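A spider starts at one page, follows its links, and indexes each page it reaches exactly once. The sketch below shows that loop for a single site. It is hypothetical and not the YASS source: `CrawlSketch` and its `getLinks` delegate are illustrative names, and fetching pages and extracting `href` values from the HTML are abstracted behind that delegate so the example runs without network access. Links are resolved with `System.Uri`, which also handles parent-folder links such as `../default.aspx`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-site crawler sketch (not the YASS implementation).
public static class CrawlSketch
{
    // Resolves each href against the page it was found on and keeps only
    // http(s) links on the same host. Fragments (#top) are dropped so that
    // "page.aspx#top" and "page.aspx" count as one page.
    public static List<Uri> SameSiteLinks(Uri page, IEnumerable<string> hrefs)
    {
        var result = new List<Uri>();
        foreach (string href in hrefs)
        {
            Uri target;
            if (!Uri.TryCreate(page, href, out target)) continue;
            if (target.Scheme != Uri.UriSchemeHttp && target.Scheme != Uri.UriSchemeHttps) continue;
            if (!string.Equals(target.Host, page.Host, StringComparison.OrdinalIgnoreCase)) continue;
            result.Add(new Uri(target.GetLeftPart(UriPartial.Query)));
        }
        return result;
    }

    // Breadth-first crawl from the entry point; every reachable page is
    // visited once. getLinks (hypothetical) returns the raw hrefs on a page.
    public static List<Uri> Crawl(Uri entryPoint, Func<Uri, IEnumerable<string>> getLinks)
    {
        var visited = new HashSet<Uri> { entryPoint };
        var order = new List<Uri>();
        var queue = new Queue<Uri>();
        queue.Enqueue(entryPoint);
        while (queue.Count > 0)
        {
            Uri page = queue.Dequeue();
            order.Add(page); // a real indexer would fetch and index the page here
            foreach (Uri link in SameSiteLinks(page, getLinks(page)))
            {
                if (visited.Add(link)) queue.Enqueue(link);
            }
        }
        return order;
    }
}
```

The visited set is what keeps the crawl finite: pages that link back to each other (for example a "home" link on every page) are queued only the first time they are seen.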

The idea came with a few headaches, though:

  • Speed: crawling through hundreds of pages was slow, even on a fast server
  • No custom software on the server: developers using third-party hosting can't always persuade the hosting company to run scheduled tasks on their server
  • No SQL Server dependency
  • EASY IMPLEMENTATION!

Speed: we all want it, but crawling through a whole website in real time can't deliver it, so I decided to build a caching search engine that does the crawling in advance and answers searches from its cache. With the AWESOME threading capabilities of .NET, the 2nd requirement became quite easy to solve: the indexer runs in a background thread inside the web application itself. The 3rd requirement was solved in about 2 seconds: DataSets/DataTables... learn to use/love 'em :). Number 4... well, I'm lazy :)
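The caching idea could look roughly like the sketch below. It assumes a hypothetical index schema (one row per Url/Word pair with a Hits count) and a simple hit-count ranking; `IndexCacheSketch` is an illustrative name, and YASS's real storage format and ranking may differ. The background indexer would fill the table and save it to disk as XML with `DataTable.WriteXml`, which fits the XML folder setting described under Requirements; searches then only read the cached table, so no crawling happens at query time and no SQL Server is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical DataTable-based index cache (not the YASS source).
public static class IndexCacheSketch
{
    // Assumed schema: one row per (Url, Word) pair with a hit count.
    public static DataTable CreateIndexTable()
    {
        var table = new DataTable("WordIndex");
        table.Columns.Add("Url", typeof(string));
        table.Columns.Add("Word", typeof(string));
        table.Columns.Add("Hits", typeof(int));
        return table;
    }

    // Writing the schema inline lets Load restore column names and types.
    public static void Save(DataTable index, string path)
    {
        index.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    public static DataTable Load(string path)
    {
        var index = new DataTable();
        index.ReadXml(path);
        return index;
    }

    // Ranks each URL by the total hits of the query words it contains
    // and returns (Url, Rank) rows, highest rank first.
    public static DataTable Search(DataTable index, string query)
    {
        var words = new HashSet<string>(query.ToLowerInvariant()
            .Split(new[] { ' ', '\t', ',', '.' }, StringSplitOptions.RemoveEmptyEntries));
        var scores = new Dictionary<string, int>();
        foreach (DataRow row in index.Rows)
        {
            if (!words.Contains(((string)row["Word"]).ToLowerInvariant())) continue;
            string url = (string)row["Url"];
            int current;
            scores.TryGetValue(url, out current);
            scores[url] = current + (int)row["Hits"];
        }

        var result = new DataTable("Results");
        result.Columns.Add("Url", typeof(string));
        result.Columns.Add("Rank", typeof(int));
        foreach (KeyValuePair<string, int> score in scores)
            result.Rows.Add(score.Key, score.Value);

        // Sort by rank, then by Url so equal ranks come out in a stable order.
        var view = new DataView(result) { Sort = "Rank DESC, Url ASC" };
        return view.ToTable();
    }
}
```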

Using the code

YASS is VERY easy to implement on your website. Here's how to search the site:

C#
DataTable result = SiteSearch.Search("Search words");

That's it! You now have a DataTable containing the matching URLs and their rankings. Calling the indexing service itself is just as basic:

C#
// this runs the indexer once, in a background thread
IndexerSchedule.Install(0);

// this runs the indexer in a background thread every hour
IndexerSchedule.Install(60*60); // the argument is the interval in seconds

Pretty easy, eh? :)
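How could an `Install(seconds)` call like this behave, with 0 meaning "run once"? A minimal sketch, assuming a `System.Threading.Timer` whose callback runs on a thread-pool (background) thread; `IndexerScheduleSketch` and its `runIndexer` parameter are hypothetical and not the YASS implementation. Install is made idempotent so that a second call in the same application does nothing, and a tick is skipped if the previous indexing run is still busy.

```csharp
using System;
using System.Threading;

// Hypothetical scheduler sketch (not the YASS IndexerSchedule class).
public static class IndexerScheduleSketch
{
    private static Timer timer;   // kept in a static field so it is not garbage-collected
    private static int installed; // 0 = not installed yet, 1 = installed
    private static int running;   // 1 while an indexing run is in progress

    // intervalSeconds == 0 runs the job once; otherwise it repeats on that interval.
    // Returns false if a schedule was already installed.
    public static bool Install(int intervalSeconds, Action runIndexer)
    {
        if (Interlocked.CompareExchange(ref installed, 1, 0) != 0)
            return false;

        TimerCallback tick = _ =>
        {
            // Skip this tick if the previous run has not finished yet.
            if (Interlocked.CompareExchange(ref running, 1, 0) != 0) return;
            try { runIndexer(); }
            catch (Exception) { /* a real indexer would log this; the schedule keeps running */ }
            finally { Interlocked.Exchange(ref running, 0); }
        };

        TimeSpan period = intervalSeconds > 0
            ? TimeSpan.FromSeconds(intervalSeconds)
            : Timeout.InfiniteTimeSpan; // no period: the timer fires exactly once
        timer = new Timer(tick, null, TimeSpan.Zero, period);
        return true;
    }
}
```

Catching exceptions inside the callback matters: an unhandled exception on a thread-pool thread would take down the worker process, and with it the whole web application.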

The downside is that every time the ASP.NET worker process on the server is restarted or killed, the indexer thread disappears with it. There's an easy solution, though: put the call in Application_Start in your Global.asax, so the schedule is installed again whenever the application starts.

C#
protected void Application_Start(Object sender, EventArgs e)
{
    IndexerSchedule.Install(60*60);
}

I've included a very simple example, as well as my yass.cs source, but please don't hit me: it's VERY messy and will be cleaned up later.

Requirements

The only things you have to do are to put yass.dll in your bin folder and add these three keys to the appSettings section of your Web.config:

XML
<appSettings>
  <add key="YASSHost" value="http://localhost" />
  <add key="YASSEntrypoint" value="/default.aspx" />
  <add key="YASSXmlDir" value="c:\inetpub\wwwroot\yass\xml\" />
</appSettings>

Make sure the folder specified in YASSXmlDir has read/write permissions for the account the ASP.NET worker process runs under; otherwise indexing will fail.
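A small startup check can catch a missing write permission before the first indexing run fails. The sketch is hypothetical: `YassSettingsCheck` is not part of YASS, and reading the values from Web.config (`ConfigurationSettings.AppSettings` in ASP.NET 1.x, `ConfigurationManager.AppSettings` in later versions) is left to the caller. It also shows one way the YASSHost and YASSEntrypoint values can be combined into a single start URL.

```csharp
using System;
using System.IO;

// Hypothetical startup check for the YASS settings (not part of yass.dll).
public static class YassSettingsCheck
{
    // Combines e.g. "http://localhost" and "/default.aspx" into one absolute URL.
    public static Uri StartUrl(string host, string entrypoint)
    {
        return new Uri(new Uri(host), entrypoint);
    }

    // Returns true if the process can create, read back and delete a file in dir.
    public static bool CanReadWrite(string dir)
    {
        string probe = Path.Combine(dir, "yass-probe-" + Guid.NewGuid().ToString("N") + ".tmp");
        try
        {
            File.WriteAllText(probe, "probe");
            bool readable = File.ReadAllText(probe) == "probe";
            File.Delete(probe);
            return readable;
        }
        catch (UnauthorizedAccessException) { return false; } // no write permission
        catch (IOException) { return false; }                 // includes DirectoryNotFoundException
    }
}
```

Called from Application_Start before installing the schedule, a `false` result could be logged or shown to the site administrator instead of letting the background indexer fail silently.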

Future

So far, my future plans are:

  • Make it faster: it seems to slow down at approximately 1,200 pages
  • Support more entry points in Web.config
  • Make a SQL Server plug-in
  • Make the DataTable return "teaser" text under each URL
  • Support excluding URLs/file types in Web.config
  • Clean up my yass.cs code and make it readable for people who can't read Danish

History

1.0 - First hack, done in 3 days.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
Denmark
Web developer based in Holstebro, Denmark.
Been developing various web solutions over the past 6 years; demo coder in my spare time.

Comments and Discussions

 
  • Question: Data.xml (yasser, 10-Feb-09 4:44)
  • Question: error (rahuldot, 18-May-08 21:39)
  • General: Good Job (Member 1200321, 21-Jul-04 8:05)
  • General: Sorry guys (Kenneth "fessor" Christensen, 23-Mar-04 21:43)
  • General: Re: Sorry guys (Member 1200321, 21-Jul-04 15:08)
  • General: I have added some things and created it as a VS-project (Esben Sundgaard, 5-Mar-04 8:47)
  • General: Re: I have added some things and created it as a VS-project (Member 1200321, 22-Jul-04 10:17)
  • General: Bug with parent folder links (Jos Branders, 1-Mar-04 1:47)
  • General: Re: Bug with parent folder links (Kenneth "fessor" Christensen, 1-Mar-04 1:51)
  • General: Re: Bug with parent folder links (Kenneth "fessor" Christensen, 2-Mar-04 0:01)
  • General: Re: Bug with parent folder links (Jos Branders, 2-Mar-04 0:38)
  • General: ASP.NET style demo page (Jos Branders, 29-Feb-04 5:01)
  • General: Re: ASP.NET style demo page (Kenneth "fessor" Christensen, 29-Feb-04 8:46)
  • General: Re: ASP.NET style demo page (tsmyrnio, 8-Mar-04 19:45)
  • General: Nice Solution... (Matthew Hazlett, 27-Feb-04 0:23)
  • General: Re: Nice Solution... (Kenneth "fessor" Christensen, 27-Feb-04 0:27)
