-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Linstor volume plugin #4994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linstor volume plugin #4994
Conversation
|
Thanks for the PR @rp- is this ready for review/testing? |
|
Yes, but I'm also open for design discussions as most of the code was done by looking at other storage plugins and trying to understand(debug) how cloudstack handles most things. |
|
@rp- a can you at least raise a documentation PR (https://github.com/apache/cloudstack-documentation) to explain how to use this and some details on how this was tested. It would be great if you can share a design document of this feature (for example, https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158873968 a new storage plugin). I can help kick regression tests in the meanwhile. |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 514 |
|
@blueorangutan test |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-604)
|
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/LinstorStoragePool.java
Outdated
Show resolved
Hide resolved
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/LinstorStoragePool.java
Outdated
Show resolved
Hide resolved
...n/java/org/apache/cloudstack/storage/datastore/driver/LinstorPrimaryDataStoreDriverImpl.java
Outdated
Show resolved
Hide resolved
| break; | ||
| case TEMPLATE: | ||
| s_logger.debug("createAsync - creating template"); | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the driver doesn't create snapshot / template here, better do not handle these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I wouldn't completely remove the cases, but I added an errMsg, so the callback gets an error answer,
If I would remove the cases, it would return an InvalidDataobject error, which it isn't
.../org/apache/cloudstack/storage/datastore/lifecycle/LinstorPrimaryDataStoreLifeCycleImpl.java
Outdated
Show resolved
Hide resolved
.../org/apache/cloudstack/storage/datastore/lifecycle/LinstorPrimaryDataStoreLifeCycleImpl.java
Show resolved
Hide resolved
@rhtyd |
|
@rp- can you join our mailing list and drop a line to dev@ http://cloudstack.apache.org/mailing-lists.html I think someone will access will grant you permission to add/edit pages (cc @DaanHoogland - do you have management access or do we ask ASF infra?) @blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 523 |
|
|
||
| StringJoiner sj = new StringJoiner(System.getProperty("line.separator")); | ||
| reader.lines().iterator().forEachRemaining(sj::add); | ||
| result = sj.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As result is first used here, you could declare it here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not even need this whole function if there is a way how I could get the running agent node/host name.
Do you know if there is something like that within the agent code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a little search and found a similar usage in Agent.java
cloudstack/agent/src/main/java/com/cloud/agent/Agent.java
Lines 475 to 478 in 6b9f3fb
| final Script command = new Script("hostname", 500, s_logger); | |
| final OutputInterpreter.OneLineParser parser = new OutputInterpreter.OneLineParser(); | |
| final String result = command.execute(parser); | |
| final String hostname = result == null ? parser.getLine() : addr.toString(); |
If we go deeper we may find another implementations.
I would suggest you to create an agent utility and implement a method to retrieve the hostname, then use it instead of this whole method.
...ns/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/LinstorStorageAdaptor.java
Outdated
Show resolved
Hide resolved
...ns/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/LinstorStorageAdaptor.java
Outdated
Show resolved
Hide resolved
| public boolean disconnectPhysicalDisk(String volumePath, KVMStoragePool pool) | ||
| { | ||
| s_logger.debug("Linstor: disconnectPhysicalDisk " + pool.getUuid() + ":" + volumePath); | ||
| // TODO remove diskless/diskful on node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to register TODO in issues instead of comments in the code.
| public boolean disconnectPhysicalDisk(Map<String, String> volumeToDisconnect) | ||
| { | ||
| s_logger.debug("Linstor: disconnectPhysicalDisk map"); | ||
| // TODO remove diskless/diskful on node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to register TODO in issues instead of comments in the code.
|
|
||
| final QemuImgFile srcFile = new QemuImgFile(sourcePath, sourceFormat); | ||
|
|
||
| // create linstor resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name is clear and auto explanatory, I think we do not need this comment.
| s_logger.error("Failed to copy " + srcFile.getFileName() + " to " + | ||
| destFile.getFileName() + " the error was: " + e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can pass the exception as parameter here.
|
@rp- I noticed that you added a lot of comments to the code. Most of they only describe what the next line is doing, e.q.: //verify if var is true then print it in console
if (var) {
System.out.println("var is true");
}These types of comments do not improve debugging and goes against a clean code. However, there are comments that explain a bit of the process, but they could be resolved with an extraction of the code to a method with a corresponding name, javadoc and etc. Also, there are Comments should be a last resort, not the rule. If we take all the comments and sort then in a logic way, we could write a book 😄 . |
| int timeout) | ||
| { | ||
| s_logger.info("Linstor: createDiskFromTemplate"); | ||
| // if source is linstor lvm-thin snapshot we could snapshot and restore from, but do plain copy for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rp- I do not know Linstor and do not know why you choose to use this way, but I am wondering why do a plain copy instead of a snapshot...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well it kinda is explained in the comment.
Linstor only supports snapshots from thin-lvm and ZFS backends, but with ZFS there exists a parent-child relation between snapshots. This means it could happen that you wan't to delete an old volume which once was your base of a template, but you can't deleted it, because you have several snapshots from it. (only way is to reverse the parent-child relation)
So from this experience we decided in other plugins to do rather plain volume copies because none of those problems would arise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rp- ok, understood.
I know from qemu that there is a way to coalesce (merge) or collapse the snapshots (cheatsheet). Could ZFS have anything like that?
|
I added a basic PR for documentation and also a first version of the design document: Btw. do you have some storage test guidelines? Like what basic functions should work and how to test them? |
@rp- Please check if Linstor storage plugin supports all the functionalities mentioned in the cwiki. Can you add some storage references in cwiki ? |
What do you mean with "storage references" ? |
@rp- I mean some references, for the docs to know about the Linstor storage. I found one mentioned in cwiki page, and updated the references section. |
|
@rp- here are some links for you for learning/references: cc @Philipp-Reisner Here are some high-level lifecycle operations a typical storage provider can support: (it also depends if which ones of the following operations are even possible/allowed by the storage control plane)
Most of the above operations already have a high level CloudStack API, generally the storage provider/plugin implements the backend for these operations. In some cases, if an operation is not allowed (for example, raw-disk snapshots on KVM) you may need to add conditional checks or refactor the CloudStack service layers (API, managers, storage sub-system... which largely defines the policy) and hypervisor sub-system (KVM+storage specific storage data motion strategy, LibvirtStoragePoolDef, handling of config drive, VM/disk migration, KVM specific linstore storage pool manager + storage processor + storage adaptor classes, i.e. they implement how storage/hypervisor specific behaviour are executed). In general you would setup a dev-test env, and while adding new storage pool in CloudStack add a label to that pool (primary storage) and then create disk offerings with the label - this way you can force VM's root&data disks to be created on the pool. These are called storage tags (we've something similar called host tags which can be used to map compute/service offerings with hypervisor hosts). Lastly, you want to write Marvin-based integration tests specific to your storage provider. Example implementations for references: Hope this helps. |
b1fa765 to
7ff45f4
Compare
|
@blueorangutan test centos7 xenserver-71 |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + xenserver-71) has been kicked to run smoke tests |
|
@blueorangutan test centos7 vmware-67u3 |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests |
This commit adds a volume(primary) storage plugin for the Linstor SDS. Currently it can create/delete/migrate volumes, snapshots should be possible, but currently don't work for RAW volume types in cloudstack.
ea7efff to
81ec81a
Compare
|
@rhtyd
Fixed/Added
Yes, unfortunately the cloudbyte plugin uses an old version of jersey client and I introduced a newer one which didn't build |
|
Thanks @rp- |
|
Trillian test result (tid-2047)
|
|
Thanks @rp- @rhtyd - smoke tests passed, let's wait for @sureshanaparti's review |
|
@blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Trillian test result (tid-2049)
|
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1273 |
|
@blueorangutan test |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-2058)
|
|
Merged based on 2xlgtms, and on smoketests and Travis passing. We haven't tested Linstor itself which is tested by @rp- |
@nvazquez reviewed the PR, LGTM. |
|
@rp- @Philipp-Reisner can you check a potential Linstor regression - https://markmail.org/message/tc4qkm2izzcjqahn |
|
We are trying to deploy linstor driver for primary storage and it is not wokring. We have configured 3 linstor nodes with ZFS, ACS 4.16 First we get an error that linstor was not installed in the host, so we install it and the error disappear, we think is not correct that Host has to be installed with the software because they just to be a "client". But there aren't any documentation about this. Then, when that error disappear, we try to create a VM and the log in the host show this: INFO When we go to the linstor dashboard, we see that Linstor creates a Resource Definition, then a resource into and finally the volumen, until this moment it all looking good, but inmediatly, deletes all resources created (resource definition, resource and volume). Linstor and Cloudstack are connecting fine when we add the primary storage. Error from the linstor side: ============================================================ Application: LINBIT® LINSTOR ============================================================ Reported error:Category: RuntimeException Error message: Resource definition 'cs-c3bbbcbb-142a-4f28-bf1b-fd0748c7a6f7' not found. Error context: Asynchronous stage backtrace: Call backtrace: Suppressed exception 1 of 1:Category: RuntimeException Error message: Error context: Call backtrace: END OF ERROR REPORT. |
|
Hi @rp- @Philipp-Reisner can you please advise on the reported issue? Could be a configuration issue or a bug in CloudStack that needs fixing? |
|
@nvazquez I'm on it via a different communication channel But I suspect a configuration problem, I'll report back here what solved the case. |
Description
This PR adds a Linstor volume(primary storage) cloudstack driver to cloudstack.
Basic operations as creating/migrating/deleting are working.
Snapshots work with Linstor for most storage pool types, but will fail because cloudstack doesn't seem to support
snapshots of RAW image types(suggestions/ideas are welcome)
It also has a commit to port elastistor jersey-client to version 2.26, so it will not collide with Linstor dependencies (I can put that in a separate PR if needed)
Design doc: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Linstor+storage+plugin
Fixes: #4403
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
How Has This Been Tested?
4 Node cloudstack cluster (1 Mgmt node, 3 KVM-hosts) with equivalent Linstor setup(1 Controller, 3 Satellites)
Attaching volumes to VM's, migrating volumes between NFS storage and Linstor, creating templates from Linstor volumes