JavaScript encapsulation & the module pattern
Encapsulation is one of the key features of object-oriented programming languages. In languages like Java, it is a very straightforward concept to implement. Since I know JavaScript is considered an OO...
jQuery Deferred – one step closer to desktop apps
Every time I forget why I like jQuery, they remind me. Not too long ago I came across jQuery Deferred (even though it was already added in jQuery 1.5) and I immediately liked it. I feel this...
Best code convention syndrome
Developers tend to think that one coding convention is better than another in terms of readability. Some people think that adding a line break before the curly braces is more coherent. Some like...
How to properly collect AWS EMR metrics?
Working with AWS EMR has a lot of benefits. But when it comes to metrics, AWS currently does not supply a proper solution for collecting cluster metrics from EMR. Well, there is AWS CloudWatch of...
The right way to use Spark and JDBC
A while ago I had to read data from a MySQL table, do a bit of manipulation on that data, and store the results on disk. The obvious choice was to use Spark; I was already using it for other stuff...
Quick tip: Easily find data on the data lake when using AWS Glue Catalog
Finding data on the data lake can sometimes be a challenge. At my current workplace (ZipRecruiter) we have hundreds of tables on the data lake, and the number is growing each day. We store the data on AWS S3...
Coalesce with care… Coalesce Vs. Repartition in SparkSQL
Here is a quick Spark SQL riddle for you: what do you think could be problematic in the following Spark code (assume that the Spark session was configured in an ideal way)? sparkSession.sql("select * from...
Spark and Small Files
In my previous post I showed this short code example: sparkSession.sql("select * from my_website_visits where post_id=317456").write.parquet("s3://reports/visits_report") And I asked what may be...
Parquet data filtering with Pandas
When it comes to filtering data from Parquet files using pandas, several strategies can be employed. While it’s widely recognized that partitioning data can significantly enhance the efficiency of...
Data Engineering: Strategies for data retrieval on multi-dimensional data
You’ve likely heard about the benefits of partitioning data by a single dimension to boost retrieval performance. It’s a common practice in relational databases, NoSQL databases, and, notably, data...