This post shows how to use Scala and Lucene to create a basic search application. One of the big benefits of Scala is that it has full access to Java libraries, which gives you a tremendous number of existing resources and technology to draw on. This example doesn’t tap into the full power of Lucene, but it highlights how easy it is to incorporate a Java library into a Scala project.
This example is based on a Twitter analysis app I’ve been noodling on, which uses Lucene. The code below takes a list of tweets from a text file and creates an index that you can search and extract info from.
All the code and a working demo app are available here: https://github.com/mkaz/Scala-and-Lucene
For this example, the data are simply lines in a file; each line is a tweet to be indexed. The indexer loops through the file, creates a Lucene document from each line, and adds it to the index.
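// Imports for the Lucene 3.x-style API used in this post
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.{Document, Field}
import org.apache.lucene.index.IndexWriter
import org.apache.lucene.store.NIOFSDirectory
import org.apache.lucene.util.Version

// The analyzer tokenizes and lowercases text; the directory is where the index lives on disk
val analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT)
val directory = new NIOFSDirectory(new java.io.File("tmp/lucene"))
val writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED)

// Each line of the file becomes one document in the index
val fileLines = io.Source.fromFile("data/tweets.txt").getLines().toList
fileLines foreach { line =>
  writer.addDocument(simpleDoc(line))
}
writer.close() // commit the changes so readers and searchers can see them

/** Simple Lucene Document */
private def simpleDoc(text: String) = {
  val doc = new Document()
  doc.add(new Field("tweet", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES))
  doc
}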
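To make the idea of an index a little more concrete, here is a toy sketch in plain Scala of what indexing conceptually produces: a map from each term to the lines that contain it. This is not how Lucene is implemented. ToyIndex, tokenize, and build are names made up for illustration, and the tokenizer is only a rough stand-in for StandardAnalyzer (it doesn’t remove stop words, for one).

```scala
// Toy model of an inverted index; NOT Lucene's implementation.
object ToyIndex {
  // Rough stand-in for an analyzer: lowercase the text, split on anything
  // that is not a letter or digit, and drop empty strings.
  def tokenize(text: String): List[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toList

  // Map each term to the set of line numbers (document ids) that contain it.
  def build(lines: Seq[String]): Map[String, Set[Int]] = {
    val postings = for {
      (line, id) <- lines.zipWithIndex
      term <- tokenize(line)
    } yield term -> id
    postings.groupBy(_._1).map { case (term, pairs) =>
      term -> pairs.map(_._2).toSet
    }
  }
}
```

For example, ToyIndex.build(Seq("Scala is fun", "Lucene and Scala")) maps "scala" to Set(0, 1) and "lucene" to Set(1).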
This example extracts the terms from the index and sorts them by document frequency, that is, the number of tweets each term appears in. You could use this to see which terms are popular or to spot common trends.
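import org.apache.lucene.index.IndexReader

val allTerms = collection.mutable.HashMap[String, Int]()
val reader = IndexReader.open(directory, true) // true = open read-only

// create map of term -> number of documents containing it
// (term.toString includes the field name, e.g. "tweet:scala")
val terms = reader.terms
while (terms.next()) {
  allTerms += terms.term.toString -> terms.docFreq()
}
terms.close()
reader.close()

// sort by frequency, least frequent first, and print
allTerms.toList sortBy { _._2 } foreach {
  case (key, value) =>
    println(key + ": " + value)
}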
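The sort above is ascending, so the most popular terms end up at the bottom of the output. If you only want the top few, a small helper along these lines could do it. This is a sketch in plain Scala: TopTerms and top are hypothetical names, and it works on any map of term counts, not just one built from Lucene.

```scala
object TopTerms {
  // Return the n most frequent terms, highest count first.
  // Ties are broken alphabetically so the order is stable.
  def top(counts: Map[String, Int], n: Int): List[(String, Int)] =
    counts.toList.sortBy { case (term, count) => (-count, term) }.take(n)
}
```

With the map built earlier, TopTerms.top(allTerms.toMap, 10) would give the ten terms that appear in the most tweets.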
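Here is a basic search using Lucene’s term query, where q holds the term to search for. A term query matches terms exactly as they are stored in the index and does not run the query text through the analyzer.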
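import org.apache.lucene.index.Term
import org.apache.lucene.search.{IndexSearcher, TermQuery}

val searcher = new IndexSearcher(directory, true) // true = open read-only
// q is the search term; TermQuery does not analyze it, so it must match the
// indexed form (StandardAnalyzer lowercases terms)
val query = new TermQuery(new Term("tweet", q))

// perform search, return top 10
val docs = searcher.search(query, 10)
docs.scoreDocs foreach { hit =>
  val d = searcher.doc(hit.doc) // load the stored document for this hit
  println(d.get("tweet"))
  println()
}
searcher.close()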
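One thing that can trip you up with a term query: because the search term is not analyzed, searching for "Scala" finds nothing when the analyzer stored the term as "scala". The toy sketch below illustrates that with a plain Scala map standing in for the index. TermLookup, normalize, and lookup are made-up names for illustration, not part of Lucene.

```scala
// Toy illustration of exact term matching; NOT Lucene's API.
object TermLookup {
  // Hypothetical helper: trim and lowercase user input so it has the same
  // shape as terms produced by a lowercasing analyzer.
  def normalize(q: String): String = q.trim.toLowerCase

  // Exact lookup, similar in spirit to a term query: no analysis is applied
  // to `term`, so it must match an indexed term character for character.
  def lookup(index: Map[String, Set[Int]], term: String): Set[Int] =
    index.getOrElse(term, Set.empty[Int])
}
```

Given an index of Map("scala" -> Set(0, 2)), looking up "Scala" returns an empty set, while looking up TermLookup.normalize(" Scala ") returns Set(0, 2). In the Lucene code, the equivalent move would be to lowercase q before building the Term.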
This was just an introduction to using Lucene and Scala, showing how easy it is to leverage existing Java libraries. Lucene is a very powerful tool and provides numerous ways to store and query data; a more complex search app would use these features in more depth. Here’s an example using Lucene to perform Spatial Distance Searches and another example using Lucene to create Summarizations.
To learn more about Lucene, you should check out Lucene in Action or other books available on the subject.
I hope you found it useful.