Hi,
I am using the Amazon Elasticsearch Service and a separate Spark cluster on EMR, and I am trying to run the elasticsearch-hadoop Apache Spark writing example. However, on submitting the job from my local machine, I first see the following logged at INFO level:
16/06/22 13:56:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
16/06/22 13:56:26 INFO HttpMethodDirector: Retrying request
16/06/22 13:56:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
16/06/22 13:56:26 INFO HttpMethodDirector: Retrying request
and later on I get the following exception.
Stack trace:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:190)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[search-spark-elasticsearch-5cengch4w4hghs5coa5xu2tyoq.us-east-1.es.amazonaws.com:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:414)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:122)
at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:564)
at org.elast
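The stack trace shows the connector trying the AWS endpoint on port 9200, but the Amazon Elasticsearch Service only exposes its endpoint over HTTP/HTTPS (ports 80/443), not on 9200. A minimal sketch of the configuration I would expect to be needed, assuming a placeholder endpoint and using the documented `es.port` and `es.nodes.wan.only` settings:

```scala
// Hedged sketch, not a confirmed fix: the AWS ES endpoint is a placeholder.
val esconf = Map(
  // AWS-managed endpoint; individual data nodes are not reachable from outside
  "es.nodes" -> "search-spark-elasticsearch-xxxxx.xxxx.es.amazonaws.com",
  // AWS Elasticsearch Service serves HTTPS on 443, not the default 9200
  "es.port" -> "443",
  // restrict the connector to the declared nodes only (WAN/cloud setup)
  "es.nodes.wan.only" -> "true"
)
```

With `es.nodes.wan.only` alone, the connector would still default to port 9200, which matches the `tried [[...amazonaws.com:9200]]` line in the trace above.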
Following is the code that I am running.
Code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.elasticsearch.spark._

object SparkElasticSearch extends App {
  val conf = new SparkConf().setAppName("ElasticSearchTest").setMaster("local[*]")
  val sc = new SparkContext(conf)

  val esconf = Map(
    "es.nodes" -> "search-spark-elasticsearch-xxxxx.xxxx.es.amazonaws.com",
    "es.nodes.wan.only" -> "true"
  )

  val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
  val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

  sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esconf)
}
Maven dependency:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>2.3.2</version>
</dependency>
Is it a bug, or am I missing something here?