Dedian
-- focusing on search engine development

A PhD student at the University of Auckland has proposed an idea for checking the quality of your OO design in Java. The key point is to use a directed graph to analyze the dependencies among all Java classes: the more classes involved in a dependency cycle, the worse the design.

Several Java open-source packages have been examined in his research report...
Though it is not the only metric for judging OO design, I'd say it is an interesting thought.
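To make the idea concrete, here is a minimal sketch (not from the report; the class name and the map-based graph representation are my own assumptions) of detecting a dependency cycle with a depth-first search over a directed class-dependency graph:

```java
import java.util.*;

public class CycleDetector {
    // deps.get(c) lists the classes that class c depends on
    public static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> visiting = new HashSet<String>(); // nodes on the current DFS path
        Set<String> done = new HashSet<String>();     // nodes fully explored
        for (String node : deps.keySet()) {
            if (dfs(node, deps, visiting, done)) return true;
        }
        return false;
    }

    private static boolean dfs(String node, Map<String, List<String>> deps,
                               Set<String> visiting, Set<String> done) {
        if (done.contains(node)) return false;
        if (!visiting.add(node)) return true; // back edge: node is already on the path
        List<String> next = deps.get(node);
        if (next != null) {
            for (String n : next) {
                if (dfs(n, deps, visiting, done)) return true;
            }
        }
        visiting.remove(node);
        done.add(node);
        return false;
    }
}
```

For example, a graph with A -> B -> A reports a cycle, while A -> B alone does not.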
posted @ 2006-06-08 03:05 Dedian 阅读(984) | 评论 (0)编辑 收藏
 
Unlike collection types such as Vector or List, a Map (Hashtable or HashMap) accesses a value by a key. If we want to retrieve all the values that have been put into a Map, one simple way is to use a Collection, optionally with an Iterator. Here is some sample code (it retrieves only the values and skips the keys), assuming a variable HashMap<String, ComplexDataType> links:

Collection<ComplexDataType> c = links.values(); // values(), not value()
Vector<ComplexDataType> v = new Vector<ComplexDataType>(c);
for (int i = 0; i < v.size(); i++)
{
    ComplexDataType tempData = v.get(i); // no cast needed with generics
    dosomethingwith(tempData);
}


P.S. A Map provides three collection views: keySet, entrySet, and the values collection; we can use any of them.
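For completeness, a small sketch of iterating via the entrySet view, which hands you key and value together (ComplexDataType is just a stand-in in the text, so String is used here):

```java
import java.util.*;

public class MapViews {
    // Collect all values of a map by walking the entrySet view
    public static List<String> collectValues(Map<String, String> links) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, String> e : links.entrySet()) {
            out.add(e.getValue()); // e.getKey() is also available here at no extra lookup cost
        }
        return out;
    }
}
```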
posted @ 2006-06-02 07:16 Dedian 阅读(337) | 评论 (0)编辑 收藏
 
These questions are very useful for Java newbies and for anyone preparing for interviews for Java programming positions; really handy stuff.

reference:
http://www.allapplabs.com/interview_questions/java_interview_questions.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_2.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_3.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_4.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_5.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_6.htm
posted @ 2006-06-02 06:14 Dedian 阅读(384) | 评论 (0)编辑 收藏
 
1. Reading text from standard input
try
{
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String str;
    System.out.print("> some prompt ");
    while ((str = in.readLine()) != null) // readLine() returns null at end of input
    {
        dosomethingwith(str);
        System.out.print("> some prompt ");
    }
}
catch (IOException e)
{
    // handle or at least log the exception; don't swallow it silently in real code
}

2. Reading text from a file
try
{
    BufferedReader in = new BufferedReader(new FileReader("filename"));
    String str;
    while ((str = in.readLine()) != null)
    {
        dosomethingwith(str);
    }
    in.close();
}
catch (IOException e)
{
}

3. Reading a file into a byte array

// Returns the contents of the file in a byte array.
public static byte[] getBytesFromFile(File file) throws IOException
{
    InputStream is = new FileInputStream(file);

    // Get the size of the file
    long length = file.length();

    // An array cannot be created with a long length; it needs an int.
    // Before casting, ensure the file is not larger than Integer.MAX_VALUE.
    if (length > Integer.MAX_VALUE)
    {
        throw new IOException("File is too large: " + file.getName());
    }

    // Create the byte array to hold the data
    byte[] bytes = new byte[(int) length];

    // Read in the bytes
    int offset = 0;
    int numRead = 0;
    while (offset < bytes.length
           && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0)
    {
        offset += numRead;
    }

    // Ensure all the bytes have been read in
    if (offset < bytes.length)
    {
        throw new IOException("Could not completely read file " + file.getName());
    }

    // Close the input stream and return the bytes
    is.close();
    return bytes;
}

4. Writing to a file

try
{
    BufferedWriter out = new BufferedWriter(new FileWriter("filename"));
    out.write("some string");
    out.close();
}
catch (IOException e)
{
}
Note: If the file does not already exist, it is automatically created.

5. Appending to a file

try
{
    BufferedWriter out = new BufferedWriter(new FileWriter("filename", true));
    out.write("appending String");
    out.close();
}
catch (IOException e)
{
}

6. Using a Random Access File

try
{
    File f = new File("filename");
    RandomAccessFile raf = new RandomAccessFile(f, "rw");

    // Read a character
    char ch = raf.readChar();

    // Seek to end of file
    raf.seek(f.length());

    // Append to the end
    raf.writeChars("aString");
    raf.close();
}
catch (IOException e)
{
}


reference:
http://javaalmanac.com/egs/java.io/pkg.html
posted @ 2006-05-31 08:12 Dedian 阅读(559) | 评论 (1)编辑 收藏
 
volatile

The volatile keyword is used on variables that may be modified simultaneously by other threads. This warns the compiler to fetch them fresh each time, rather than caching them in registers. This also inhibits certain optimisations that assume no other thread will change the values unexpectedly. Since other threads cannot see local variables, there is never any need to mark local variables volatile.

quote from:

http://mindprod.com/jgloss/volatile.html
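A minimal sketch of the visibility problem that volatile addresses (the class and method names here are my own): one thread spins on a flag that another thread clears; marking the flag volatile guarantees the writer's update becomes visible to the spinning reader instead of being cached in a register.

```java
public class VolatileFlag {
    // Without volatile, the JIT may cache this field and the spin loop might never terminate
    private volatile boolean running = true;

    public void stop() {
        running = false;
    }

    // Spins until some thread calls stop(), counting iterations
    public long spinUntilStopped() {
        long count = 0;
        while (running) {
            count++;
        }
        return count;
    }
}
```

Note that local variables never need volatile, since other threads can't see them anyway.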
posted @ 2006-05-25 04:45 Dedian 阅读(302) | 评论 (1)编辑 收藏
 
Though the release is still under voting, it was originally proposed by Doug Cutting and has received only positive votes so far, so it is very likely we will get a 2.0 release this Friday. Some bugs have been fixed and deprecated code has been removed in this approaching version.
posted @ 2006-05-24 09:00 Dedian 阅读(223) | 评论 (0)编辑 收藏
 
Twenty years ago
I was collecting praise from teachers and parents, wearing little red flowers, and winning competition certificates
My current boss was probably catching fish in a pond, snatching cicadas from trees, and pestering his parents for lollipops
 
Ten years ago
I started dating, walking under the moonlight along paths nobody else walked, and hesitantly learning to write poetry
My current boss was probably grinding gloomily through high-school textbooks, or perhaps also starting to pass little notes to the girl at the next desk
 
Today, ten years later
The girlfriend has become my wife, and I grind away writing inscrutable code in the little corner my current boss provides
The girl at the next desk has become a memory, and my boss sits in a bright, spacious room not ten meters away, watching me and a hundred people who in his eyes are much the same as me slave over code for him, while he nods along to music that may or may not be rock
 
Ten years from now
 
Ending 1:
My wife is still my wife, and I am still grinding out code, except that a child who looks a little like me now tugs at my arm, clamoring to play games on my computer
Countless pretty girls drift through the building, and my current boss, inside what may or may not be a room a hundred meters away, holds meetings with a few fat-faced shareholders to discuss the survival of me and a thousand people like me
 
Ending 2:
My wife is still my wife; by scrimping and saving we finally open the first company we have ever owned, and I sit in my own bright office watching a hundred young fellows, as young as I was twenty years ago, working away at the revolution in full swing
The pretty girls still drift by; my old boss, in an even taller building, discusses with a few fat-faced shareholders how to acquire the company of the former underling who has become a small-time boss: me.
 
Ending 3:
My wife is still my wife, but I own a company of my own, its office full of former colleagues, my current boss among them, who strategize for me in the air-conditioned room or grind out code quite unlike the code of ten years ago
A certain pretty girl has finally become a pretty young matron; my old boss, having run his business into the ground, sells his company to the man who once ground out code under him, and I give him a decent position so he can support a family, marry, and raise children.
 
P.S. The function Likely(Ending n) (1<=n<=3) is strictly monotonically decreasing, with an upper bound of 0.0001
 
P.S.
The reminiscence above is pure fantasy; my boss is not Chinese and had none of the boyhood or youth I imagined for him. Since he can't read Chinese, fantasizing in Chinese here carries no risk of handing him any leverage. The point of writing it is to express my admiration for him, young as he is (I do hope he can understand this one sentence of Chinese), along with the bit of ambition that a comfortable life has not yet extinguished in me.
posted @ 2006-05-20 13:28 Dedian 阅读(271) | 评论 (0)编辑 收藏
 
Oops! My laptop, a Compaq Presario R3230, is not working now (it worked fine yesterday evening): blue screen, and it hangs at disk checking... When I reboot in safe mode, it still hangs at multi(0)disk(0)rdisk(0)partition(1)\windows\system32\drivers\atisgkaf.sys. I guess something is wrong with my video driver, but how can I fix the problem without wiping out the documents on my hard drive?

I tried googling it, and it seems some other people have hit the same problem. These steps are suggested:

1.  Insert the QuickRestore CD into the CD drive and restart the
    system.
2.  When the red Compaq logo appears, press and hold the Caps
    Lock key.  The next screen will be a blinking QuickRestore screen.
3.  When the QuickRestore text stops blinking, press and hold the
    Num Lock key.

But where can I get a QuickRestore CD? The bundled CD doesn't seem to be in my room any more... has anybody thought about this?
posted @ 2006-05-20 04:32 Dedian 阅读(184) | 评论 (0)编辑 收藏
 
Work has recently gotten me interested in search engines. A few observations:
 
1. Making your blog's hit count explode is actually very simple: write a trivial web crawler that keeps visiting your own homepage (sending HTTP requests). Many hit counters work exactly this way, counting requests without verifying the IP address. Don't believe it? Just press F5 to refresh your own page and watch. Keep this up and beating 老徐's hit count is no problem at all. Then again, Sina plays games with hit counts anyway, since they can edit the counters themselves, so there's no point competing with them at this.
 
2. A high hit count does not mean a high PageRank. PageRank is a fairly technical term. Back in the day, Google's two young founders, Larry Page (funnily enough, his surname really is Page, as if he were destined to be the boss of pages) and Sergey Brin, built their fortune on the PageRank research they did at Stanford, and now, still young, they can go toe to toe with MS. Google's PageRank algorithm is of course a trade secret. But the web is full of clever people: someone reverse-engineered an explanation of PageRank from Google's observable search behavior plus probabilistic modeling, and it spread widely online. Just Google "PageRank Uncovered" (by Chris Ridings and Mike Shishigin) to find that paper. Supposedly someone used the mechanisms described there to thoroughly game Google's engine, though that can no longer be verified, since Google keeps improving itself.
 
3. Simply put, PageRank is a key measure of how important your site or page is. The core idea is to look at how many pages link to yours, and especially how many important pages link to yours. In other words, if 老徐's blog, thanks to its traffic and its influence across the Chinese blogosphere, reaches a PageRank of 10 and thus counts as a very important page, and you are lucky enough to be added to her blogroll, i.e. her important page links to yours, then your own PageRank will rise. That is a very simplified example, of course; the actual formula is not that simple (you can find it online if you're interested), and even then links are only one factor. Still, this explains why so many people fight for the first comment on celebrity blogs, or deliberately post outrageous remarks to attract attention, and why advertising has moved onto blogs.
 
4. The idea behind PageRank actually comes from everyday life. Say I want to buy a computer and I'd like someone knowledgeable to tell me which one. I know Xiao Wang knows computers, so I ask him. Xiao Wang says: the dedian brand is good, buy that. Fine, I say, I'll buy it. But how do you know? Where is it reviewed? What are its strengths? Xiao Wang says: well... I'm not too sure either, I heard it from Xiao Li, go ask him. At that moment, even though I've never met Xiao Li, his standing in my eyes shoots way up: even Xiao Wang defers to him...
 
5. So if you want your page or site to have influence, do everything you can to get others to link to you and cite you. There is another approach: keep citing other people's articles, where by "citing" I don't mean embedding their links in your own page, but getting your page embedded in theirs. How? That is exactly what the trackback feature of many blogs does. Look closely and you'll see that whenever you trackback someone's blog, your blog's address is left on their page (comments work the same way). These days, though, most blogs let owners disable trackbacks and comments. Sina also seems to have started meddling: celebrity blogs apparently can't be trackbacked anymore. But Sina's blogs are unfriendly to most search engines anyway, so leave them alone. MSN Spaces, on the other hand, seems workable: one could write code that automatically visits pages, fetches each blog's permalink, and then runs a piece of MSN's own JavaScript to perform the trackback. That's just a recent idea of mine; I haven't written the code yet. If it works there, it should work on many other blog platforms too. The idea came to me from the junk sites that keep showing up in my own trackbacks.
 
6. That said, more and more online services are starting to block this kind of unfriendly behavior. For example, if you hate people spamming your blog with trackbacks and comments, you can sign up for a moderation-style service: another site collects the comments first, filters them, and only then posts them to your blog. In short, when the forest of the web grows large enough, every kind of bird turns up.
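To make the link-voting idea in point 3 concrete, here is a toy power-iteration sketch of PageRank. The damping factor 0.85 and the even split for dangling pages follow the usual textbook formulation; this is certainly not Google's actual algorithm:

```java
import java.util.Arrays;

public class SimplePageRank {
    // links[i] holds the indices of the pages that page i links to
    public static double[] rank(int[][] links, int iterations, double damping) {
        int n = links.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n); // start with uniform rank
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n); // the random-jump share
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    // dangling page: spread its rank evenly over all pages
                    for (int j = 0; j < n; j++) next[j] += damping * pr[i] / n;
                } else {
                    // each outgoing link carries an equal share of page i's rank
                    for (int j : links[i]) next[j] += damping * pr[i] / links[i].length;
                }
            }
            pr = next;
        }
        return pr;
    }
}
```

In a three-page graph where pages 1 and 2 both link only to page 0, page 0 ends up with the highest rank: many pages "voting" for one page makes that page important.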
posted @ 2006-05-19 16:15 Dedian 阅读(1524) | 评论 (3)编辑 收藏
 
+ Webcrawler
   
    -- study open source code
          purpose: analyze code structure and basic components
          focus on: Nutch (http://lucene.apache.org/nutch/)
                    & HTMLParser (http://htmlparser.sourceforge.net/)
                    & GData (http://code.google.com/apis/gdata/overview.html)

    -- understand the PageRank idea
       related articles:
       http://en.wikipedia.org/wiki/PageRank
       http://www.thesitewizard.com/archive/google.shtml
       paper: "PageRank Uncovered" by Chris Ridings and Mike Shishigin
       http://www.rankforsales.com/n-aa/095-seo-may-31-03.html (about Chris Ridings & SEO)
       http://en.wikipedia.org/wiki/Web_crawler (basic idea of a crawler)
      
    -- get familiar with the RSS & Atom protocols

    -- sample coding:
       Interface: Scheduler for fetching web links
       Interface: Web page parser/Analyzer --> to deal with XML-based websites (weblogs or news sites, RSS & Atom) --> parser classes based on a SAX parser
       Interface: Extractor/Fetcher --> to get links from a page
       Interface: Collector --> check whether a URL is duplicated and save it in a URL database with a certain data structure
       Interface: InformationProcessor --> PageRank should be one important factor --> (under consideration)
       Interface: Policies (Filter) --> will serve the Collector and InformationProcessor --> (under consideration)

+ Indexer/Searcher (almost done, based on Lucene)
posted @ 2006-05-19 09:40 Dedian 阅读(294) | 评论 (1)编辑 收藏
 
Motivation:

Often, if you want to check or analyze source code, or contribute to an open source community, you will download the source code of a project and load (or import) it into your own IDE (assuming you don't want to use CVS or SVN).

The following is my favorite way to do that in Eclipse:

1. Create a new blank Java project:

File -> New -> Project... -> Java Project -> Next > -> enter the project name (project layout: Create separate source and output folders) -> click Finish

2. Right-click the source folder "src" -> Import... -> select File system -> choose the correct source code folder, i.e. where you put the downloaded source code, by clicking the top "Browse..." button (the source code folder means the root folder, so the folder structure is kept as the package structure) -> Finish

3. If you import the wrong source code folder, delete the whole project and redo it (merely deleting a few broken packages is no use).

Note:

If an Ant build file (something like build.xml) is included in the source code package, that's even better: just use File -> New -> Project... -> Java Project from Existing Ant Buildfile.
posted @ 2006-05-19 02:58 Dedian 阅读(248) | 评论 (0)编辑 收藏
 

The behavior of a web crawler is the outcome of a combination of policies:

  • A selection policy that states which pages to download.
  • A re-visit policy that states when to check for changes to the pages.
  • A politeness policy that states how to avoid overloading websites.
  • A parallelization policy that states how to coordinate distributed web crawlers.

cite from:

http://en.wikipedia.org/wiki/Web_crawler
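As an illustration of the politeness policy (my own sketch, not from the article): a crawler can record the last fetch time per host and refuse to hit the same host again until a minimum delay has passed.

```java
import java.util.HashMap;
import java.util.Map;

public class PolitenessPolicy {
    private final long delayMillis;                   // minimum gap between fetches to one host
    private final Map<String, Long> lastFetch = new HashMap<String, Long>();

    public PolitenessPolicy(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Returns true (and records the fetch) if the host may be fetched at nowMillis
    public synchronized boolean mayFetch(String host, long nowMillis) {
        Long last = lastFetch.get(host);
        if (last != null && nowMillis - last < delayMillis) {
            return false; // too soon: be polite and back off
        }
        lastFetch.put(host, nowMillis);
        return true;
    }
}
```

In a real crawler the scheduler would requeue a refused URL rather than drop it, and would also consult robots.txt.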
posted @ 2006-05-18 06:34 Dedian 阅读(181) | 评论 (0)编辑 收藏
 
Problem Description:

I want to build the GData source code under Eclipse. It contains code that creates type-specific maps, and the Eclipse IDE complains with something like: Syntax error, parameterized types are only available if source level is 5.0

Reason:

The feature of creating a type-specific map is only supported at source level 5.0.

Solution:

Adjust the IDE compiler configuration:
Window > Preferences > Java > Compiler > Compiler compliance level => 5.0

Note:
1. type-specific map: a map that holds only keys and values of the declared types
    example:
Map<Integer, String> map = new HashMap<Integer, String>();

map.put(1, "first");
map.put(2, "second");
2. Once source level 5.0 is applied, watch for type-safety warnings on collection types such as Vector, List, Stack, or Map.
That means code you could write under level 1.4 like this:

private Vector myList = new Vector();
...
myList.add(str);

had better become something like this under level 5.0:

private Vector<String> myList = new Vector<String>();


posted @ 2006-05-17 09:41 Dedian 阅读(398) | 评论 (0)编辑 收藏
 

1. Develop a search engine solely for weblogs (the main work will be on the WebCrawler; the Indexer and Searcher parts are done for XML-based information retrieval)

Motivation:
    a. Weblogs are more and more popular recently
    b. Though there are some weblog search engines, such as Technorati and Blogdigger, there still seems to be a lot of work left to do
    c. The weblog feed formats (RSS 2.0 & Atom) are XML-based and fairly standardized, which is very close to my current job on XML-based information retrieval
    d. Easily extensible to crawling other XML-based information websites besides weblogs
    
HOWTO:
         a. Utilize GData for feeding XML-based information
or      b. Use some open source crawlers + Lucene (a similar idea appears in this article)
or      c. Develop my own simple crawler package and merge it into my Shemy project, a clustered search engine design based on Lucene

         Likelihood: c > a > b (because most open source crawlers are built to deal with much more complex web pages/links, while weblog feeds are simpler, so their crawler can be lighter)

Requirement/Functionality Analysis: (in progress)

Schedule: (in progress)

2. Explore performance tuning of search to improve the Shemy kernel
posted @ 2006-05-17 06:36 Dedian 阅读(237) | 评论 (0)编辑 收藏
 
Definition:

A class within another class.

Example:

class EnclosingClass
{
    ...
    class ANestedClass
    {
        ...
    }
}

Purpose:

Reflect and enforce the relationship between two classes (especially when the nested class makes sense only in the context of its enclosing class, or when it relies on the enclosing class for its function).

Interesting features:

1. An instance of InnerClass can exist only within an instance of EnclosingClass
2. An InnerClass instance has direct access to the instance variables and methods of its enclosing instance
3. There are two special kinds of inner classes: local classes and anonymous classes
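A small sketch of features 1 and 2 (the field and method names are my own): the inner class reads the enclosing instance's private field directly, and an instance is created through an enclosing instance.

```java
public class EnclosingClass {
    private int value = 42;

    public class ANestedClass {
        // direct access to the enclosing instance's private field
        public int readEnclosingValue() {
            return value;
        }
    }
}
```

Outside the enclosing class, the inner instance is created with the qualified syntax: new EnclosingClass().new ANestedClass().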

reference:
http://java.sun.com/docs/books/tutorial/java/javaOO/nested.html
posted @ 2006-05-16 08:22 Dedian 阅读(321) | 评论 (0)编辑 收藏
 
Copyright © Dedian. Powered by 博客园; template by 沪江博客.