Hacker News | whost49's comments

I hope to be able to finish this quickly, because I want to focus on my other project (https://github.com/yousseb/atfal-ai & https://github.com/yousseb/atfal-site), which is an effort to track missing children/people and use AI to help find them. I love Meld, but as a person who has lost his brother in a way he couldn't come back from - and I keep losing family in Gaza - I want to help reunite families who still have the chance to be reunited. Everyone is welcome to help.


Drew, thanks for sharing your settings. It looks like you are using a 2GB RAM system.

How many unicorn workers are you using? For 2GB systems, I'd recommend at most 2 rather than the default 3. Some discussion about this is here: https://gitlab.com/gitlab-org/omnibus-gitlab/issues/1279
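For reference, the Omnibus setting for this is `worker_processes` in `/etc/gitlab/gitlab.rb`, applied with `sudo gitlab-ctl reconfigure`:

```ruby
# /etc/gitlab/gitlab.rb: cap Unicorn at 2 workers on a 2GB host,
# then run `sudo gitlab-ctl reconfigure` to apply
unicorn['worker_processes'] = 2
```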

We'll be doing more work in the coming months to profile and reduce GitLab's memory usage, so that all your tools can run comfortably in the 2GB range.


Hi. I have a 2GB Synology NAS where I run GitLab via Docker... I may mod the NAS to, say, 6GB or 8GB, but I'm trying to avoid that.

My GitLab backup is a ~5GB monolithic .tar.gz daily. Is there any reason the tar.gz is one huge file? Would using Linux split stop these massive worker threads and the heavy device utilization? http://unix.stackexchange.com/a/61776/86052

An added benefit of splitting to a configurable max size is that certain cloud vendors have a max file-size limit... like 5GB :/ ... when replicating the backup.
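For what it's worth, splitting and reassembling with coreutils `split`, as in that answer, looks roughly like this (filenames and chunk size are placeholders):

```shell
# Split the backup into 4GB pieces named gitlab_backup.tar.gz.aa, .ab, ...
split -b 4G gitlab_backup.tar.gz gitlab_backup.tar.gz.

# Reassemble before restoring; the pieces concatenate back byte-for-byte
cat gitlab_backup.tar.gz.* > gitlab_backup.tar.gz
```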


I have it set to 2 already :-) Unfortunately I still hit ENOMEM at least once a day.


Strange, your screenshot shows 0.5GB free. I recommend configuring swap to prevent the system from falling over. It shouldn't actively swap much with 2GB unless it's making a backup or something like that.
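A minimal swap-file setup on a Debian-ish box looks like this (size and path are just examples):

```shell
# Create and enable a 2GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make it survive reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```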


For your use case, I think Fluentd may work fine. LogZoom currently deals with structured JSON log data received from hundreds of hosts around the world. It could be modified to handle arbitrary logs (and wrap a structure around them) and to integrate with Docker, but that was not the goal here.


> If you're around I'd love to hear specifically what you mean by this. Internally Logstash is very thread friendly, we only recommend multiple processes when you want either greater isolation or greater fault tolerance.

Right, we considered using multiple Logstash processes, but we really didn't want to run three instances of Logstash requiring three relatively heavyweight JVMs. The total memory consumption of a single JVM running Logstash is higher than that of three separate instances of LogZoom.

We looked at the Filebeat Redis output as well. First, it didn't seem to support encryption or client authentication out of the box. But what we really wanted was a way to make Logstash duplicate the data into two independent queues so that Elasticsearch and S3 outputs could work independently.


Thanks for the thoughtfully considered response :).

Regarding security with Redis: did you read the docs here? https://www.elastic.co/guide/en/logstash/current/plugins-out... Logstash does support Redis password auth (as does Filebeat). Regarding the encryption point: since Redis doesn't support SSL itself, are you using spiped as the official Redis docs recommend?
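For anyone curious, the spiped tunnel the Redis docs describe is roughly this (hosts, ports, and key path are placeholders):

```shell
# Shared 32-byte secret; copy the same file to both ends
dd if=/dev/urandom bs=32 count=1 of=redis.key

# On the Redis server: decrypt traffic arriving on 6380, forward to local Redis
spiped -d -s '[0.0.0.0]:6380' -t '[127.0.0.1]:6379' -k redis.key

# On each client: encrypt local 6379 connections out to the server's 6380
spiped -e -s '[127.0.0.1]:6379' -t '[redis.example.com]:6380' -k redis.key
```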

Regarding the two queues, I would like to clarify that you can do this with the:

Filebeat -> Logstash -> Redis -> Logstash -> (outputs) technique.

If you declare two Logstash Redis outputs in the first 'shipper' Logstash, you can write to two separate queues and have the second 'indexer' read from both.
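Concretely, the shipper's output section might look something like this (host and key names are placeholders):

```
# shipper Logstash: write every event to two independent Redis lists
output {
  redis { host => "redis.example.com" data_type => "list" key => "logs-es" }
  redis { host => "redis.example.com" data_type => "list" key => "logs-s3" }
}
```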

It is true that if one output is down we will pause processing, but you can use multiple processes to work around that. It is possible that in the near future we will support multiple pipelines in a single process (which we already do internally in our master branch for metrics, just not in a publicly exposed way yet).

Regarding JVM overhead: that's a fair point about memory. The JVM does have a cost. That said, memory and VMs are cheap these days, and that cost is fixed. One thing to be careful of: we often see people surprised to find a stray 100MB event going through their pipeline due to an application bug. Having that extra memory is a good idea regardless. We have many users increasing their heap size far beyond what the JVM requires simply to handle weird bursts of jumbo logs.


Thanks for that information. There's no doubt Logstash can do a lot, and it sounds like with the multiple pipeline feature Logstash will make it easier to do what we wanted to do in a single process.

In the past, we've also been burned by many Big Data solutions running out of heap space, so adding more processes whose reliability depended on tuning JVM parameters yet again did not appeal to us.


Heka looks good and does a lot more. It doesn't appear to support Redis and S3 out of the box, though, so even had we known about Heka beforehand, we would probably have had to evaluate, learn, and modify third-party plugins.


No, we had not considered Gollum - it definitely looks like a possible solution and one we might have chosen. I think the name of the project makes it hard to find, unfortunately.


Is there a reason why you didn't just use collectd, which is written in C and has lots of plugins, including statsd, InfluxDB, Graphite, etc.?


We use collectd extensively and it is wonderful software. Brubeck and collectd do very different jobs.


What does Brubeck do that the collectd statsd plugin can't?

https://collectd.org/wiki/index.php/Plugin:StatsD


Aclima - http://www.aclima.io - San Francisco, CA

==========

Aclima is an early-stage company based in San Francisco that designs and deploys distributed, large-scale sensor networks to empower people with actionable environmental quality data. Our end-to-end solutions collect, process and analyze real-time streaming data from thousands of sensors, enabling a level of environmental awareness that has never been possible before. We believe our technologies can redefine the way we imagine and manage our buildings, communities, and cities, helping us improve our collective well-being. We are looking for passionate engineers to help build, scale, and improve our platform.

We have no required list of skills or years of experience. Instead, we’re looking for engineers who are smart and get things done. Our engineering culture values rapid iteration, continuous improvement, and as much automation as is sensible. We work in a relaxed, purpose-driven atmosphere with flexible hours and competitive perks.

Positions open:

* Full-time Backend Engineer

Our stack includes: Python, Git, MariaDB, Cassandra, Nginx, NSQ, Redis, Ansible

- solid understanding of functional programming languages and distributed systems

* Full-time Frontend Developer

- solid understanding of core JavaScript, HTML5 and CSS3

- experience building well-structured web applications

- a passion for user-driven interaction design and delightful user experiences

- excitement about data visualization, mobile design, and responsive design

* Full-time UI/UX Designer

- Someone with at least 2-3 years of experience, heavy UX chops, and expertise in CSS-based design, peppered with JavaScript-based interaction design/development.

- Someone who is excited and passionate about data visualization, mobile design, and responsive design and has strong opinions about all three.

* Full-time DevOps Engineer

- some experience writing shell and Python scripts

- Debian/Ubuntu, Jenkins, Locust, JMeter, Google Cloud/AWS, Ansible experience a plus

==========

If you’re up for the challenge, contact us: jobs@aclima.io

Apply directly: http://boards.greenhouse.io/aclima




