- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Introduction
- Installing Groovy on Windows
- Installing Groovy on Linux and OS X
- Executing Groovy code from the command line
- Using Groovy as a command-line text file editor
- Running Groovy with invokedynamic support
- Building Groovy from source
- Managing multiple Groovy installations on Linux
- Using groovysh to try out Groovy commands
- Starting groovyConsole to execute Groovy snippets
- Configuring Groovy in Eclipse
- Configuring Groovy in IntelliJ IDEA
- Introduction
- Using Java classes from Groovy
- Embedding Groovy into Java
- Compiling Groovy code
- Generating documentation for Groovy code
- Introduction
- Searching strings with regular expressions
- Writing less verbose Java Beans with Groovy Beans
- Inheriting constructors in Groovy classes
- Defining code as data in Groovy
- Defining data structures as code in Groovy
- Implementing multiple inheritance in Groovy
- Defining type-checking rules for dynamic code
- Adding automatic logging to Groovy classes
- Introduction
- Reading from a file
- Reading a text file line by line
- Processing every word in a text file
- Writing to a file
- Replacing tabs with spaces in a text file
- Deleting a file or directory
- Walking through a directory recursively
- Searching for files
- Changing file attributes on Windows
- Reading data from a ZIP file
- Reading an Excel file
- Extracting data from a PDF
- Introduction
- Reading XML using XmlSlurper
- Reading XML using XmlParser
- Reading XML content with namespaces
- Searching in XML with GPath
- Searching in XML with XPath
- Constructing XML content
- Modifying XML content
- Sorting XML nodes
- Serializing Groovy Beans to XML
- Introduction
- Parsing JSON messages with JsonSlurper
- Constructing JSON messages with JsonBuilder
- Modifying JSON messages
- Validating JSON messages
- Converting JSON message to XML
- Converting JSON message to Groovy Bean
- Using JSON to configure your scripts
- Introduction
- Creating a database table
- Connecting to an SQL database
- Modifying data in an SQL database
- Calling a stored procedure
- Reading BLOB/CLOB from a database
- Building a simple ORM framework
- Using Groovy to access Redis
- Using Groovy to access MongoDB
- Using Groovy to access Apache Cassandra
- Introduction
- Downloading content from the Internet
- Executing an HTTP GET request
- Executing an HTTP POST request
- Constructing and modifying complex URLs
- Issuing a REST request and parsing a response
- Issuing a SOAP request and parsing a response
- Consuming RSS and Atom feeds
- Using basic authentication for web service security
- Using OAuth for web service security
- Introduction
- Querying methods and properties
- Dynamically extending classes with new methods
- Overriding methods dynamically
- Adding performance logging to methods
- Adding transparent imports to a script
- DSL for executing commands over SSH
- DSL for generating reports from logfiles
- Introduction
- Processing collections concurrently
- Downloading files concurrently
- Splitting a large task into smaller parallel jobs
- Running tasks in parallel and asynchronously
- Using actors to build message-based concurrency
- Using STM to atomically update fields
- Using dataflow variables for lazy evaluation
- Index
Concurrent Programming in Groovy
There's more...
In this short recipe, we only tried out a few of the many "parallel" methods. In the following recipes, we will see more examples of the Parallelizer in action. For a complete list of parallel operations, refer to the Javadoc page of GParsPoolUtil: http://gpars.org/1.0.0/javadoc/groovyx/gpars/GParsPoolUtil.html.
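As a quick illustration of those parallel variants, the following sketch (assuming GPars is on the classpath) applies collectParallel and findAllParallel to an ordinary list inside a withPool block:

```groovy
import groovyx.gpars.GParsPool

def nums = (1..10).toList()
def squares
def evens
GParsPool.withPool {
    // Inside withPool, GPars enriches collections with *Parallel variants
    squares = nums.collectParallel { it * it }
    evens = nums.findAllParallel { it % 2 == 0 }
}
assert squares.sort() == (1..10).collect { it * it }
assert evens.sort() == [2, 4, 6, 8, 10]
```

The results are sorted before comparing only to stay agnostic about ordering; outside the withPool block the parallel methods are no longer available on the collections.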
See also
- Downloading files concurrently
- http://gpars.codehaus.org/
- http://www.gpars.org/guide/
- http://gpars.org/1.0.0/javadoc/groovyx/gpars/GParsPoolUtil.html
- http://nlp.stanford.edu/nlp
- http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/process/PTBTokenizer.html
Downloading files concurrently
This recipe is about downloading files concurrently from the network. As with most recipes in this chapter, we will use the GPars framework to provide the concurrency features required for parallel downloading.
Getting ready
This recipe reuses the same build infrastructure created in the Processing collections concurrently recipe.
How to do it...
The download logic is completely encapsulated in a Groovy class.
1. Add a new FileDownloader class to the src/main/groovy/org/groovy/cookbook folder:
package org.groovy.cookbook

import static groovyx.gpars.GParsPool.*
import static com.google.common.collect.Lists.*

class FileDownloader {

    static final int POOL_SIZE = 25
    static pool

    FileDownloader() {
        pool = createPool(POOL_SIZE)
    }

    private void downloadFile(String remoteUrl, String localUrl) {
        new File("$localUrl").withOutputStream { out ->
            new URL(remoteUrl).withInputStream { from ->
                out << from
            }
        }
    }

    private void parallelDownload(Map fromTo) {
        withExistingPool(pool) {
            fromTo.eachParallel { from, to ->
                downloadFile(from, to)
            }
        }
    }

    void download(Map fromTo, int maxConcurrent) {
        if (maxConcurrent > 0) {
            use(MapPartition) {
                List maps = fromTo.partition(maxConcurrent)
                maps.each { downloadMap ->
                    parallelDownload(downloadMap)
                }
            }
        } else {
            parallelDownload(fromTo)
        }
    }
}

class MapPartition {
    static List partition(Map delegate, int size) {
        def rslt = delegate.inject( [ [:] ] ) { ret, elem ->
            (ret.last() << elem).size() >= size ? ret << [:] : ret
        }
        rslt.last() ? rslt : rslt[0..-2]
    }
}
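The MapPartition category is plain Groovy, so its splitting behavior can be checked in isolation. This hypothetical sketch inlines the same inject-based logic as a closure and verifies that a five-entry map split into chunks of two yields chunks of sizes 2, 2, and 1:

```groovy
// Standalone check of the inject-based splitting logic used by MapPartition
def partition = { Map m, int size ->
    def rslt = m.inject([ [:] ]) { ret, elem ->
        // Append the entry to the last sub-map; start a new one when full
        (ret.last() << elem).size() >= size ? ret << [:] : ret
    }
    // Drop the trailing empty map left over when the split is exact
    rslt.last() ? rslt : rslt[0..-2]
}

def chunks = partition([a: 1, b: 2, c: 3, d: 4, e: 5], 2)
assert chunks.size() == 3
assert chunks.collect { it.size() } == [2, 2, 1]
assert chunks[2] == [e: 5]
```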
2. Let's write a unit test to verify our newly created class. Don't forget to place the test in the src/test/groovy/org/groovy/cookbook folder:
package org.groovy.cookbook

import org.junit.*

class FileDownloaderTest2 {

    static final DOWNLOAD_BASE_DIR = '/tmp'
    static final TEST_SERVICE =
        'https://androidnetworktester.googlecode.com'
    static final TEST_URL =
        "${TEST_SERVICE}/files/1mb.txt?cache="

    def downloader = new FileDownloader()
    Map files

    @Before
    void before() {
        files = [:]
        (1..5).each {
            files.put(
                "${TEST_URL}1.${it}",
                "${DOWNLOAD_BASE_DIR}/${it}MyFile.txt"
            )
        }
    }

    @Test
    void testSerialDownload() {
        long start = System.currentTimeMillis()
        files.each { k, v ->
            new File(v) << k.toURL().text
        }
        long timeSpent = System.currentTimeMillis() - start
        println "TIME NOPAR: ${timeSpent}"
    }

    @Test
    void testParallelDownload() {
        long start = System.currentTimeMillis()
        downloader.download(files, 0)
        long timeSpent = System.currentTimeMillis() - start
        println "TIMEPAR: ${timeSpent}"
    }

    @Test
    void testParallelDownloadWithMaxConcurrent() {
        long start = System.currentTimeMillis()
        downloader.download(files, 3)
        long timeSpent = System.currentTimeMillis() - start
        println "TIMEPAR MAX 3: ${timeSpent}"
    }
}
3. As usual, execute the test by issuing the following command in your shell: gradle -i clean test
4. The results are highly dependent on your network latency, but you should see an output similar to the following:
TIME NOPAR: 635
TIMEPAR: 391
TIMEPAR MAX 3: 586
How it works...
The FileDownloader class uses the Parallel Arrays implementation offered by GPars.
This implementation provides parallel variants of the common Groovy iteration methods such as each, collect, and findAll. Every time you come across a collection that is slow to process, consider using the parallel collection methods. Although enabling collections for parallel processing imposes a certain overhead (mostly the cost of initializing a thread pool), the gain frequently outweighs the inefficiency of processing a collection sequentially. GPars gives you two options here:
- GParsPool, which uses a thread pool based on the "fork/join" algorithm
- GParsExecutorsPool, which uses the Java 5 executors

In the majority of cases, the first option is more efficient, but it is always worth trying both thread pools to verify which one performs better for a specific case.
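Switching between the two pool types is mostly a matter of changing the class you call. The following sketch (assuming GPars is on the classpath) runs the same parallel iteration on an executor-backed pool:

```groovy
import groovyx.gpars.GParsExecutorsPool

// Same parallel iteration style, but backed by Java executors
// instead of a fork/join pool
GParsExecutorsPool.withPool(4) {
    def lengths = ['ant', 'bee', 'cats'].collectParallel { it.size() }
    assert lengths.sort() == [3, 3, 4]
}
```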
The FileDownloader class resorts to GParsPool, which is initialized in the class constructor. Pool creation is an expensive operation and accounts for most of the overhead imposed by the parallel framework.
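Because pool creation is costly, the pattern followed by the class (create the pool once, then reuse it via withExistingPool) can be sketched on its own, again assuming GPars is on the classpath:

```groovy
import static groovyx.gpars.GParsPool.*

// Create the pool once and reuse it across calls, as FileDownloader does,
// instead of paying the pool-creation cost on every parallel block
def pool = createPool(4)
try {
    withExistingPool(pool) {
        assert [1, 2, 3].collectParallel { it * 10 }.sort() == [10, 20, 30]
    }
    withExistingPool(pool) {
        assert (1..4).everyParallel { it > 0 }
    }
} finally {
    pool.shutdown()
}
```

Shutting the pool down in a finally block keeps worker threads from outliving the script; FileDownloader itself leaves its static pool alive for the lifetime of the class.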